# Implementation: WebLLM Completion Request (@mlc-ai/web-llm)
| Knowledge Sources | |
|---|---|
| Domains | NLP, API_Protocol |
| Last Updated | 2026-02-14 22:30 GMT |
## Overview
Concrete tooling, provided by the @mlc-ai/web-llm library, for constructing OpenAI-compatible text completion requests.
## Description
The Completions class and its associated types define the raw text completion API for web-llm. Unlike the chat completion API, which uses structured message arrays, this API accepts a single prompt string and generates text continuations. The types are adapted from the openai-node library to maintain API compatibility.
The type hierarchy consists of:

- `CompletionCreateParamsBase` -- the base interface containing all request fields, including the prompt, generation parameters, and penalty parameters
- `CompletionCreateParamsNonStreaming` -- extends the base with `stream?: false | null`
- `CompletionCreateParamsStreaming` -- extends the base with `stream: true`
- `CompletionCreateParams` -- union type of the streaming and non-streaming variants
- `Completion` -- the response type containing an array of `CompletionChoice` objects with the generated text
- `CompletionChoice` -- an individual choice with `text`, `finish_reason`, and optionally `logprobs`
- `Completions` -- a wrapper class that delegates `create()` calls to `engine.completion()`
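The way the `stream` literal types let a caller discriminate between the two parameter variants can be sketched with simplified local mirrors of the interfaces above (illustration only; real applications import the actual types from `@mlc-ai/web-llm`):

```typescript
// Simplified local mirrors of the request interfaces (hypothetical names,
// for illustration; not the library's own declarations).
interface ParamsBase {
  prompt: string;
  max_tokens?: number | null;
  stream?: boolean | null;
}
interface ParamsNonStreaming extends ParamsBase {
  stream?: false | null;
}
interface ParamsStreaming extends ParamsBase {
  stream: true;
}
type Params = ParamsNonStreaming | ParamsStreaming;

// A type guard mirroring how an overloaded create() can narrow the union:
// the literal type of `stream` decides which variant a request belongs to.
function isStreaming(p: Params): p is ParamsStreaming {
  return p.stream === true;
}

const nonStreamingReq: Params = { prompt: "Hello", max_tokens: 10 };
const streamingReq: Params = { prompt: "Hello", stream: true };
console.log(isStreaming(nonStreamingReq), isStreaming(streamingReq));
```

This is the same discrimination pattern the `create()` overloads rely on: a `stream: true` request resolves to the streaming return type, everything else to a single `Completion`.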
WebLLM-specific extensions not in OpenAI's API include:

- `repetition_penalty` -- multiplicative penalty for repeated tokens
- `ignore_eos` -- when true, generation continues past stop tokens until `max_tokens` is reached
- `extra_body.enable_latency_breakdown` -- includes per-stage timing in usage statistics
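A request exercising these extension fields might look like the following sketch (the prompt and parameter values are illustrative, not taken from the library's documentation):

```typescript
// Hedged sketch: a request object using WebLLM's extension fields.
// Field names follow the CompletionCreateParamsBase interface; the
// prompt and values are made up for illustration.
const extendedRequest = {
  prompt: "List three prime numbers:",
  max_tokens: 64,
  repetition_penalty: 1.2, // values > 1 penalize repeated tokens multiplicatively
  ignore_eos: true, // keep generating past stop tokens until max_tokens
  extra_body: { enable_latency_breakdown: true }, // per-stage timing in usage
};
console.log(extendedRequest.repetition_penalty, extendedRequest.ignore_eos);
```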
## Usage
Import these types when building applications that need direct text generation from a prompt string rather than structured chat conversations. Use `engine.completions.create()` to execute a completion request. This API is appropriate for text continuation, code completion, and other tasks where multi-turn message formatting is unnecessary.
## Code Reference
### Source Location
- Repository: Mlc_ai_Web_llm
- File: src/openai_api_protocols/completion.ts
- Lines: 1-381
### Signature
```typescript
export class Completions {
  private engine: MLCEngineInterface;

  constructor(engine: MLCEngineInterface);

  create(request: CompletionCreateParamsNonStreaming): Promise<Completion>;
  create(request: CompletionCreateParamsStreaming): Promise<AsyncIterable<Completion>>;
  create(request: CompletionCreateParams): Promise<AsyncIterable<Completion> | Completion>;
}

export interface CompletionCreateParamsBase {
  prompt: string;
  echo?: boolean | null;
  frequency_penalty?: number | null;
  logit_bias?: Record<string, number> | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  max_tokens?: number | null;
  n?: number | null;
  presence_penalty?: number | null;
  repetition_penalty?: number | null;
  seed?: number | null;
  stop?: string | null | Array<string>;
  stream?: boolean | null;
  stream_options?: ChatCompletionStreamOptions | null;
  temperature?: number | null;
  top_p?: number | null;
  ignore_eos?: boolean;
  model?: string | null;
  extra_body?: {
    enable_latency_breakdown?: boolean | null;
  };
}

export interface Completion {
  id: string;
  choices: Array<CompletionChoice>;
  created: number;
  model: string;
  object: "text_completion";
  system_fingerprint?: string;
  usage?: CompletionUsage;
}

export interface CompletionChoice {
  finish_reason: ChatCompletionFinishReason | null;
  index: number;
  logprobs?: ChatCompletion.Choice.Logprobs | null;
  text: string;
}

export function postInitAndCheckFields(
  request: CompletionCreateParams,
  currentModelId: string,
): void;
```
### Import

```typescript
import {
  CompletionCreateParams,
  CompletionCreateParamsNonStreaming,
  CompletionCreateParamsStreaming,
  Completion,
  CompletionChoice,
} from "@mlc-ai/web-llm";
```
## I/O Contract

### Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | The prompt string to generate completions for |
| stream | boolean | No | If true, returns an async iterable of partial completion chunks |
| temperature | number | No | Sampling temperature (0 to 2); higher values increase randomness |
| top_p | number | No | Nucleus sampling threshold (0 to 1) |
| max_tokens | number | No | Maximum tokens to generate |
| stop | string \| string[] | No | Stop sequence(s) that terminate generation |
| n | number | No | Number of completions to generate (must be 1 when streaming) |
| echo | boolean | No | Echo back the prompt in addition to the completion |
| frequency_penalty | number | No | Penalizes tokens by frequency in generated text (-2.0 to 2.0) |
| presence_penalty | number | No | Penalizes tokens that have appeared at all (-2.0 to 2.0) |
| repetition_penalty | number | No | Multiplicative penalty for repeated tokens |
| seed | number | No | Integer seed for deterministic generation |
| logprobs | boolean | No | Whether to return log probabilities of output tokens |
| top_logprobs | number | No | Number of most likely tokens to return with logprobs (0 to 5) |
| logit_bias | Record<string, number> | No | Token ID to bias value mapping (-100 to 100) |
| model | string | No | Model ID; optional if only one model is loaded |
### Outputs

| Name | Type | Description |
|---|---|---|
| Completion | Completion | Response object with id, choices array, created timestamp, model, and usage |
| CompletionChoice.text | string | The generated text for each choice |
| CompletionChoice.finish_reason | ChatCompletionFinishReason | Reason generation stopped: "stop" or "length" |
| CompletionChoice.logprobs | Logprobs | Token-level log probabilities (if requested) |
| usage | CompletionUsage | Token counts for prompt and completion |
Validation rules (enforced by `postInitAndCheckFields()`):

- Fields `suffix`, `user`, and `best_of` are unsupported and will throw `UnsupportedFieldsError`
- Streaming with `n > 1` is not allowed (throws `StreamingCountError`)
- `seed` must be an integer if provided (throws `SeedTypeError`)
- `stream_options` requires `stream: true` (throws `InvalidStreamOptionsError`)
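The rules above can be sketched as a local reimplementation (illustration only; the real checks live in `postInitAndCheckFields()` in `src/openai_api_protocols/completion.ts`, and the dedicated error classes are simplified here to plain `Error`s):

```typescript
// Hedged sketch of the documented validation rules, reimplemented locally.
// Error classes are collapsed to plain Error for brevity.
function checkCompletionFields(req: Record<string, unknown>): void {
  // suffix, user, and best_of are unsupported fields
  for (const field of ["suffix", "user", "best_of"]) {
    if (req[field] !== undefined) {
      throw new Error(`Unsupported field: ${field}`);
    }
  }
  // streaming with n > 1 is not allowed
  if (req.stream === true && typeof req.n === "number" && req.n > 1) {
    throw new Error("n must be 1 when streaming");
  }
  // seed must be an integer if provided
  if (req.seed != null && !Number.isInteger(req.seed as number)) {
    throw new Error("seed must be an integer");
  }
  // stream_options requires stream: true
  if (req.stream_options != null && req.stream !== true) {
    throw new Error("stream_options requires stream: true");
  }
}
```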
## Usage Examples

### Basic Text Completion

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// 1. Create an engine with a model
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// 2. Create a non-streaming completion
const completion = await engine.completions.create({
  prompt: "The capital of France is",
  max_tokens: 50,
  temperature: 0.7,
});

// 3. Read the generated text
console.log(completion.choices[0].text);
console.log("Finish reason:", completion.choices[0].finish_reason);
```
### Streaming Completion

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// Stream text completion chunks
const stream = await engine.completions.create({
  prompt: "Once upon a time,",
  max_tokens: 200,
  temperature: 0.8,
  stream: true,
  stream_options: { include_usage: true },
});

// Accumulate chunk text as it arrives (web-llm runs in the browser,
// where Node's process.stdout is unavailable)
let generated = "";
for await (const chunk of stream) {
  if (chunk.choices[0]?.text) {
    generated += chunk.choices[0].text;
  }
  if (chunk.usage) {
    console.log("Usage:", chunk.usage);
  }
}
console.log(generated);
```
### Completion with Logprobs

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const completion = await engine.completions.create({
  prompt: "The meaning of life is",
  max_tokens: 30,
  temperature: 0,
  logprobs: true,
  top_logprobs: 3,
  echo: true,
});

// Access logprobs for each token
const choice = completion.choices[0];
console.log("Text:", choice.text);
if (choice.logprobs) {
  console.log("Token logprobs:", choice.logprobs);
}
```