Implementation: @mlc-ai/web-llm ChatCompletionRequest
Overview
ChatCompletionRequest is the TypeScript interface provided by @mlc-ai/web-llm for constructing OpenAI-compatible chat completion request objects. It mirrors the OpenAI Chat Completion API request format and supports messages, streaming, generation parameters, penalty parameters, tool calling, structured output, and log probability output.
Description
The request type hierarchy consists of:
- ChatCompletionRequestBase -- The base interface containing all request fields
- ChatCompletionRequestNonStreaming -- Extends base with stream?: false | null
- ChatCompletionRequestStreaming -- Extends base with stream: true
- ChatCompletionRequest -- Union type of the streaming and non-streaming variants
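The hierarchy above can be sketched in a few lines of TypeScript. This is a simplified local reconstruction with the field set trimmed to the essentials (the real interfaces live in src/openai_api_protocols/chat_completion.ts); it shows how the stream field acts as a discriminant between the two variants:

```typescript
// Simplified sketch of the request type hierarchy; field set trimmed.
interface ChatCompletionRequestBase {
  messages: Array<{ role: string; content: string }>;
  stream?: boolean | null;
}

interface ChatCompletionRequestNonStreaming extends ChatCompletionRequestBase {
  stream?: false | null;
}

interface ChatCompletionRequestStreaming extends ChatCompletionRequestBase {
  stream: true;
}

type ChatCompletionRequest =
  | ChatCompletionRequestNonStreaming
  | ChatCompletionRequestStreaming;

// Checking `stream` narrows the union to one variant.
function isStreaming(
  req: ChatCompletionRequest
): req is ChatCompletionRequestStreaming {
  return req.stream === true;
}

const req: ChatCompletionRequest = {
  messages: [{ role: "user", content: "hi" }],
  stream: true,
};
console.log(isStreaming(req)); // true
```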
The messages array accepts four message types via the ChatCompletionMessageParam union:
- ChatCompletionSystemMessageParam -- role: "system", content: string
- ChatCompletionUserMessageParam -- role: "user", content: string | Array<ChatCompletionContentPart> (array form for VLM image input)
- ChatCompletionAssistantMessageParam -- role: "assistant", optional tool_calls
- ChatCompletionToolMessageParam -- role: "tool", tool_call_id: string
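As plain object literals, the four message shapes look like this. The image content-part layout mirrors OpenAI's format, which web-llm follows; the URL and tool_call_id values are illustrative:

```typescript
// The four message shapes of ChatCompletionMessageParam as plain objects;
// the real types come from @mlc-ai/web-llm. Values are illustrative.
const history = [
  { role: "system", content: "You are concise." },
  {
    role: "user",
    // Array form carries content parts, used for VLM image input.
    content: [
      { type: "text", text: "Describe this image." },
      { type: "image_url", image_url: { url: "https://example.com/cat.png" } },
    ],
  },
  { role: "assistant", content: "A cat sitting on a windowsill." },
  { role: "tool", content: '{"ok":true}', tool_call_id: "call_0" },
];
console.log(history.length); // 4
```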
The ResponseFormat type supports four modes:
- type: "text" -- Free-form text (default)
- type: "json_object" -- Valid JSON output, with an optional schema string
- type: "grammar" -- Output constrained by an EBNF grammar string
- type: "structural_tag" -- Tag-delimited constraint blocks
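As a rough sketch, the four modes can be written as object literals like the following. The schema and grammar strings are illustrative placeholders, and the exact field layout of the structural_tag mode (tags/triggers) is an assumption, not confirmed from the source:

```typescript
// Sketch of the four ResponseFormat modes as plain object literals.
const textFormat = { type: "text" };

const jsonFormat = {
  type: "json_object",
  // Optional JSON schema, passed as a string (placeholder schema).
  schema: '{"type":"object","properties":{"answer":{"type":"string"}}}',
};

const grammarFormat = {
  type: "grammar",
  // EBNF grammar string constraining output (placeholder rule).
  grammar: 'root ::= "yes" | "no"',
};

const structuralTagFormat = {
  type: "structural_tag",
  // Tag-delimited constraint blocks; this field layout is an assumption.
  tags: [{ begin: "<call>", schema: "{}", end: "</call>" }],
  triggers: ["<call>"],
};

const modes = [textFormat, jsonFormat, grammarFormat, structuralTagFormat];
console.log(modes.length); // 4
```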
WebLLM-specific extensions not in OpenAI's API include:
- repetition_penalty -- Multiplicative penalty for repeated tokens
- ignore_eos -- When true, generation continues past stop tokens until max_tokens
- extra_body.enable_thinking -- Controls thinking token generation for Qwen3 models
- extra_body.enable_latency_breakdown -- Includes per-stage timing in usage statistics
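A request object exercising these extensions might look like the following sketch; the field names come from the list above, while the specific values are illustrative:

```typescript
// Sketch of a request using the WebLLM-specific extension fields.
const requestWithExtensions = {
  messages: [{ role: "user", content: "Count to ten." }],
  repetition_penalty: 1.1, // values > 1 discourage repeated tokens
  ignore_eos: false, // true would generate past stop tokens up to max_tokens
  max_tokens: 128,
  extra_body: {
    enable_thinking: false, // suppress thinking tokens (Qwen3 models)
    enable_latency_breakdown: true, // per-stage timing in usage statistics
  },
};
console.log(requestWithExtensions.extra_body.enable_latency_breakdown); // true
```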
Code Reference
- Repository: https://github.com/mlc-ai/web-llm
- File: src/openai_api_protocols/chat_completion.ts
- ChatCompletionRequestBase: Lines 91-287
- ChatCompletionRequest type: Lines 305-307
- Message types: Lines 710-788
- ResponseFormat: Lines 1194-1223
- Validation: postInitAndCheckFields() at lines 418-601
Type Signature
```typescript
export interface ChatCompletionRequestBase {
  messages: Array<ChatCompletionMessageParam>;
  stream?: boolean | null;
  stream_options?: ChatCompletionStreamOptions | null;
  n?: number | null;
  frequency_penalty?: number | null;
  presence_penalty?: number | null;
  repetition_penalty?: number | null;
  max_tokens?: number | null;
  stop?: string | null | Array<string>;
  temperature?: number | null;
  top_p?: number | null;
  logit_bias?: Record<string, number> | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  seed?: number | null;
  tool_choice?: ChatCompletionToolChoiceOption;
  tools?: Array<ChatCompletionTool>;
  response_format?: ResponseFormat;
  ignore_eos?: boolean;
  model?: string | null;
  extra_body?: {
    enable_thinking?: boolean | null;
    enable_latency_breakdown?: boolean | null;
  };
}

export type ChatCompletionRequest =
  | ChatCompletionRequestNonStreaming
  | ChatCompletionRequestStreaming;
```
Import
```typescript
import {
  ChatCompletionRequest,
  ChatCompletionRequestNonStreaming,
  ChatCompletionRequestStreaming,
  ChatCompletionMessageParam,
} from "@mlc-ai/web-llm";
```
I/O Contract
| Direction | Name | Type | Required | Description |
|---|---|---|---|---|
| Input | messages | Array<ChatCompletionMessageParam> | Yes | Conversation history; last message must be from user or tool |
| Input | stream | boolean | No | If true, returns an async iterable of chunks instead of a complete response |
| Input | temperature | number | No | Sampling temperature (0 to 2); defaults to model config value |
| Input | top_p | number | No | Nucleus sampling threshold (0 to 1); defaults to model config value |
| Input | max_tokens | number | No | Maximum tokens to generate; must be > 0 |
| Input | stop | string[] | No | Stop sequence(s) that terminate generation |
| Input | tools | Array<ChatCompletionTool> | No | Function definitions for tool calling (limited to supported models) |
| Input | response_format | ResponseFormat | No | Output format constraint (text, json_object, grammar, structural_tag) |
| Output | -- | ChatCompletionRequest | -- | Request object ready for engine.chat.completions.create() |
Validation rules (enforced by postInitAndCheckFields()):
- System message must be first if present
- Last message must be from user or tool
- Streaming with n > 1 is not allowed
- seed must be an integer if provided
- schema requires type: "json_object"
- grammar requires type: "grammar" and vice versa
- tools only supported for specific model IDs (Hermes-2-Pro, Hermes-3)
- stream_options requires stream: true
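A few of these rules can be sketched as a small standalone checker. This is a simplified local re-implementation for illustration, not the library's actual postInitAndCheckFields(); the Msg and Req types here are trimmed stand-ins:

```typescript
// Simplified sketch of a subset of the validation rules above.
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

interface Req {
  messages: Msg[];
  stream?: boolean | null;
  n?: number | null;
  seed?: number | null;
}

// Returns an error message for the first violated rule, or null if ok.
function checkRequest(req: Req): string | null {
  if (req.messages.length === 0) return "messages must be non-empty";
  // System message, if present, must come first.
  const sysIdx = req.messages.findIndex((m) => m.role === "system");
  if (sysIdx > 0) return "System message must be the first message";
  // Last message must be from user or tool.
  const last = req.messages[req.messages.length - 1];
  if (last.role !== "user" && last.role !== "tool")
    return "Last message must be from user or tool";
  // Streaming with n > 1 is not allowed.
  if (req.stream && (req.n ?? 1) > 1)
    return "Streaming with n > 1 is not allowed";
  // seed must be an integer if provided.
  if (req.seed != null && !Number.isInteger(req.seed))
    return "seed must be an integer";
  return null;
}

console.log(checkRequest({ messages: [{ role: "user", content: "hi" }] })); // null
```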
Usage Example
```typescript
import {
  ChatCompletionMessageParam,
  ChatCompletionRequestNonStreaming,
  ChatCompletionRequestStreaming,
} from "@mlc-ai/web-llm";

// Simple single-turn request
const messages: ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

const request: ChatCompletionRequestNonStreaming = {
  messages,
  temperature: 0.7,
  max_tokens: 256,
};

// Multi-turn conversation request, streamed with usage statistics
const multiTurnMessages: ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "Write a hello world in Python." },
  { role: "assistant", content: "print('Hello, World!')" },
  { role: "user", content: "Now make it a function." },
];

const multiTurnRequest: ChatCompletionRequestStreaming = {
  messages: multiTurnMessages,
  temperature: 0.3,
  max_tokens: 512,
  stream: true,
  stream_options: { include_usage: true },
};

// Structured JSON output request
const jsonRequest: ChatCompletionRequestNonStreaming = {
  messages: [
    { role: "system", content: "Output valid JSON with keys: name, age." },
    { role: "user", content: "Tell me about Albert Einstein." },
  ],
  response_format: {
    type: "json_object",
    schema:
      '{"type":"object","properties":{"name":{"type":"string"},"age":{"type":"number"}},"required":["name","age"]}',
  },
  temperature: 0,
};
```