
Implementation: @mlc-ai/web-llm ChatCompletionRequest

From Leeroopedia

Overview

ChatCompletionRequest is the TypeScript request type provided by @mlc-ai/web-llm for constructing OpenAI-compatible chat completion request objects. It mirrors the OpenAI Chat Completion API request format and supports messages, streaming, generation parameters, penalty parameters, tool calling, structured output, and log probability output.

Description

The request type hierarchy consists of:

  • ChatCompletionRequestBase -- The base interface containing all request fields
  • ChatCompletionRequestNonStreaming -- Extends base with stream?: false | null
  • ChatCompletionRequestStreaming -- Extends base with stream: true
  • ChatCompletionRequest -- Union type of streaming and non-streaming variants
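The hierarchy above can be sketched with simplified local types (illustrative mirrors, not the library's full definitions) to show how the literal type of `stream` discriminates the union:

```typescript
// Simplified mirrors of the web-llm request hierarchy. The real interfaces
// carry many more fields; only the discriminant is shown here.
interface RequestBase {
  messages: Array<{ role: string; content: string }>;
}
interface RequestNonStreaming extends RequestBase {
  stream?: false | null;
}
interface RequestStreaming extends RequestBase {
  stream: true;
}
type Request = RequestNonStreaming | RequestStreaming;

// TypeScript narrows the union from the literal type of `stream`.
function describeRequest(req: Request): string {
  return req.stream === true ? "streaming" : "non-streaming";
}

const streaming: Request = { messages: [], stream: true };
const plain: Request = { messages: [] };
console.log(describeRequest(streaming)); // "streaming"
console.log(describeRequest(plain)); // "non-streaming"
```

This is why the engine can give `create()` a different return type (async iterable vs. complete response) depending on which variant it receives.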

The messages array accepts four message types via the ChatCompletionMessageParam union:

  • ChatCompletionSystemMessageParam -- role: "system", content: string
  • ChatCompletionUserMessageParam -- role: "user", content: string | Array<ChatCompletionContentPart> (array form for VLM image input)
  • ChatCompletionAssistantMessageParam -- role: "assistant", optional tool_calls
  • ChatCompletionToolMessageParam -- role: "tool", tool_call_id: string
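A conversation using all four roles might look like the following standalone sketch (plain objects matching the shapes listed above; the tool-call payload values are illustrative):

```typescript
// One message of each role. The tool message answers the assistant's
// tool call by echoing its id in tool_call_id.
const conversation = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Run the tests." },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      {
        id: "call_0",
        type: "function",
        function: { name: "run_tests", arguments: "{}" },
      },
    ],
  },
  { role: "tool", tool_call_id: "call_0", content: "All tests passed." },
] as const;

console.log(conversation.length); // 4
```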

The ResponseFormat type supports four modes:

  • type: "text" -- Free-form text (default)
  • type: "json_object" -- Valid JSON output, with optional schema string
  • type: "grammar" -- Output constrained by EBNF grammar string
  • type: "structural_tag" -- Tag-delimited constraint blocks
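As plain objects, the four modes look roughly like this (the schema and grammar strings are illustrative, and `structural_tag` takes additional tag-definition fields omitted here):

```typescript
// Free-form text (the default when response_format is omitted).
const textFormat = { type: "text" } as const;

// Valid JSON; the optional schema is a JSON Schema string.
const jsonFormat = {
  type: "json_object",
  schema: '{"type":"object","properties":{"answer":{"type":"string"}}}',
} as const;

// Output constrained by an EBNF grammar string.
const grammarFormat = {
  type: "grammar",
  grammar: 'root ::= "yes" | "no"',
} as const;

// Tag-delimited constraint blocks (tag definitions omitted in this sketch).
const tagFormat = { type: "structural_tag" } as const;
```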

WebLLM-specific extensions not in OpenAI's API include:

  • repetition_penalty -- Multiplicative penalty for repeated tokens
  • ignore_eos -- When true, generation continues past stop tokens until max_tokens
  • extra_body.enable_thinking -- Controls thinking token generation for Qwen3 models
  • extra_body.enable_latency_breakdown -- Includes per-stage timing in usage statistics
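A request exercising these WebLLM-only fields could look like the sketch below; the field names follow the ChatCompletionRequestBase signature on this page, while the values are illustrative:

```typescript
// Sketch of a request using the WebLLM-specific extensions.
const request = {
  messages: [{ role: "user" as const, content: "Count to five." }],
  repetition_penalty: 1.1, // multiplicative; > 1 discourages repeats, 1.0 is neutral
  ignore_eos: false, // true would force generation to run until max_tokens
  extra_body: {
    enable_thinking: false, // suppress Qwen3 thinking tokens
    enable_latency_breakdown: true, // include per-stage timings in usage
  },
};

console.log(request.extra_body.enable_latency_breakdown); // true
```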

Code Reference

  • Repository: https://github.com/mlc-ai/web-llm
  • File: src/openai_api_protocols/chat_completion.ts
  • ChatCompletionRequestBase: Lines 91-287
  • ChatCompletionRequest type: Lines 305-307
  • Message types: Lines 710-788
  • ResponseFormat: Lines 1194-1223
  • Validation: postInitAndCheckFields() at lines 418-601

Type Signature

export interface ChatCompletionRequestBase {
  messages: Array<ChatCompletionMessageParam>;
  stream?: boolean | null;
  stream_options?: ChatCompletionStreamOptions | null;
  n?: number | null;
  frequency_penalty?: number | null;
  presence_penalty?: number | null;
  repetition_penalty?: number | null;
  max_tokens?: number | null;
  stop?: string | null | Array<string>;
  temperature?: number | null;
  top_p?: number | null;
  logit_bias?: Record<string, number> | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  seed?: number | null;
  tool_choice?: ChatCompletionToolChoiceOption;
  tools?: Array<ChatCompletionTool>;
  response_format?: ResponseFormat;
  ignore_eos?: boolean;
  model?: string | null;
  extra_body?: {
    enable_thinking?: boolean | null;
    enable_latency_breakdown?: boolean | null;
  };
}

export type ChatCompletionRequest =
  | ChatCompletionRequestNonStreaming
  | ChatCompletionRequestStreaming;

Import

import {
  ChatCompletionRequest,
  ChatCompletionRequestNonStreaming,
  ChatCompletionRequestStreaming,
  ChatCompletionMessageParam,
} from "@mlc-ai/web-llm";

I/O Contract

  • messages (Array<ChatCompletionMessageParam>, required) -- Conversation history; the last message must be from user or tool
  • stream (boolean, optional) -- If true, returns an async iterable of chunks instead of a complete response
  • temperature (number, optional) -- Sampling temperature (0 to 2); defaults to the model config value
  • top_p (number, optional) -- Nucleus sampling threshold (0 to 1); defaults to the model config value
  • max_tokens (number, optional) -- Maximum tokens to generate; must be > 0
  • stop (string | Array<string>, optional) -- Stop sequence(s) that terminate generation
  • tools (Array<ChatCompletionTool>, optional) -- Function definitions for tool calling (limited to supported models)
  • response_format (ResponseFormat, optional) -- Output format constraint (text, json_object, grammar, structural_tag)

The completed ChatCompletionRequest object is passed to engine.chat.completions.create().

Validation rules (enforced by postInitAndCheckFields()):

  • System message must be first if present
  • Last message must be from user or tool
  • Streaming with n > 1 is not allowed
  • seed must be an integer if provided
  • schema requires type: "json_object"
  • grammar requires type: "grammar" and vice versa
  • tools only supported for specific model IDs (Hermes-2-Pro, Hermes-3)
  • stream_options requires stream: true
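A few of these rules can be sketched as a standalone checker; this mirrors the intent of postInitAndCheckFields(), not its actual code:

```typescript
// Standalone sketch of a subset of the validation rules listed above.
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

function checkRequest(req: {
  messages: Msg[];
  stream?: boolean;
  n?: number;
  seed?: number;
}): string[] {
  const errors: string[] = [];
  req.messages.forEach((m, i) => {
    if (m.role === "system" && i !== 0) errors.push("system message must be first");
  });
  const last = req.messages[req.messages.length - 1];
  if (last && last.role !== "user" && last.role !== "tool")
    errors.push("last message must be from user or tool");
  if (req.stream && (req.n ?? 1) > 1) errors.push("streaming with n > 1 is not allowed");
  if (req.seed !== undefined && !Number.isInteger(req.seed))
    errors.push("seed must be an integer");
  return errors;
}

// A deliberately invalid request triggers three of the checks.
const problems = checkRequest({
  messages: [
    { role: "user", content: "hi" },
    { role: "assistant", content: "hello" },
  ],
  stream: true,
  n: 2,
  seed: 0.5,
});
console.log(problems.length); // 3
```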

Usage Example

import {
  ChatCompletionMessageParam,
  ChatCompletionRequestNonStreaming,
  ChatCompletionRequestStreaming,
} from "@mlc-ai/web-llm";

// Simple single-turn request
const messages: ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

const request: ChatCompletionRequestNonStreaming = {
  messages,
  temperature: 0.7,
  max_tokens: 256,
};

// Multi-turn streaming request
const multiTurnMessages: ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "Write a hello world in Python." },
  { role: "assistant", content: "print('Hello, World!')" },
  { role: "user", content: "Now make it a function." },
];

const multiTurnRequest: ChatCompletionRequestStreaming = {
  messages: multiTurnMessages,
  temperature: 0.3,
  max_tokens: 512,
  stream: true,
  stream_options: { include_usage: true },
};

// Structured JSON output request (schema requires type: "json_object")
const jsonRequest: ChatCompletionRequestNonStreaming = {
  messages: [
    { role: "system", content: "Output valid JSON with keys: name, age." },
    { role: "user", content: "Tell me about Albert Einstein." },
  ],
  response_format: {
    type: "json_object",
    schema: '{"type":"object","properties":{"name":{"type":"string"},"age":{"type":"number"}},"required":["name","age"]}',
  },
  temperature: 0,
};
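A streaming request like multiTurnRequest above is consumed with a for await loop. In the sketch below, engine.chat.completions.create() is replaced by a mock async generator so the example runs standalone; the chunk shape (choices[0].delta.content) follows the OpenAI chunk format that web-llm mirrors:

```typescript
// Minimal chunk shape for a streamed response.
type Chunk = { choices: Array<{ delta: { content?: string } }> };

// Stand-in for the engine's streaming response (illustrative pieces).
async function* mockStream(): AsyncGenerator<Chunk> {
  for (const piece of ["def ", "hello():\n", "    print('hi')"]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Accumulate delta.content from each chunk into the full completion.
async function collect(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}

collect(mockStream()).then((text) => console.log(text));
```

With the real engine, the same loop runs over the async iterable returned by engine.chat.completions.create(multiTurnRequest).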
