Implementation:Mlc ai Web llm Completion Request

From Leeroopedia
Knowledge Sources
Domains NLP, API_Protocol
Last Updated 2026-02-14 22:30 GMT

Overview

A concrete tool for constructing OpenAI-compatible text completion requests, provided by the @mlc-ai/web-llm library.

Description

The Completions class and associated types define the raw text completion API for web-llm. Unlike the chat completion API which uses structured message arrays, this API accepts a single prompt string and generates text continuations. The types are adapted from the openai-node library to maintain API compatibility.

The type hierarchy consists of:

  • CompletionCreateParamsBase -- The base interface containing all request fields including prompt, generation parameters, and penalty parameters
  • CompletionCreateParamsNonStreaming -- Extends base with stream?: false | null
  • CompletionCreateParamsStreaming -- Extends base with stream: true
  • CompletionCreateParams -- Union type of streaming and non-streaming variants
  • Completion -- The response type containing an array of CompletionChoice objects with generated text
  • CompletionChoice -- Individual choice with text, finish_reason, and optional logprobs
  • Completions -- Wrapper class that delegates create() calls to engine.completion()
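
The streaming and non-streaming variants differ only in the type of their `stream` field, so the union can be narrowed with an ordinary runtime check. The following is a minimal, self-contained sketch of that pattern using stand-in fields, not the library's full interfaces:

```typescript
// Stand-in versions of the web-llm param types, trimmed to the fields
// needed to show how the union discriminates on `stream`.
interface CompletionCreateParamsBase {
  prompt: string;
  max_tokens?: number | null;
  stream?: boolean | null;
}
interface CompletionCreateParamsNonStreaming extends CompletionCreateParamsBase {
  stream?: false | null;
}
interface CompletionCreateParamsStreaming extends CompletionCreateParamsBase {
  stream: true;
}
type CompletionCreateParams =
  | CompletionCreateParamsNonStreaming
  | CompletionCreateParamsStreaming;

// A type guard lets callers (and the create() overloads) pick the right
// return type based on the request shape.
function isStreaming(
  req: CompletionCreateParams,
): req is CompletionCreateParamsStreaming {
  return req.stream === true;
}
```

This is the same discrimination the `create()` overloads in the Signature section perform at the type level.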

WebLLM-specific extensions not in OpenAI's API include:

  • repetition_penalty -- Multiplicative penalty for repeated tokens
  • ignore_eos -- When true, generation continues past stop tokens until max_tokens
  • extra_body.enable_latency_breakdown -- Includes per-stage timing in usage statistics
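
Taken together, a request exercising these extensions might look like the following sketch. It is a plain object using the field names from the Signature section; the values are illustrative, and passing the object to `engine.completions.create()` is shown under Usage Examples:

```typescript
// Sketch of a completion request using the WebLLM-specific fields.
const request = {
  prompt: "List three prime numbers:",
  max_tokens: 64,
  repetition_penalty: 1.1, // values > 1 discourage repeated tokens
  ignore_eos: false,       // set true to generate past stop tokens until max_tokens
  extra_body: {
    enable_latency_breakdown: true, // include per-stage timings in `usage`
  },
};
```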

Usage

Import these types when building applications that need direct text generation from a prompt string rather than structured chat conversations. Use engine.completions.create() to execute a completion request. This API is appropriate for text continuation, code completion, and other tasks where multi-turn message formatting is unnecessary.

Code Reference

Source Location

Signature

export class Completions {
  private engine: MLCEngineInterface;

  constructor(engine: MLCEngineInterface);

  create(request: CompletionCreateParamsNonStreaming): Promise<Completion>;
  create(request: CompletionCreateParamsStreaming): Promise<AsyncIterable<Completion>>;
  create(request: CompletionCreateParams): Promise<AsyncIterable<Completion> | Completion>;
}

export interface CompletionCreateParamsBase {
  prompt: string;
  echo?: boolean | null;
  frequency_penalty?: number | null;
  logit_bias?: Record<string, number> | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  max_tokens?: number | null;
  n?: number | null;
  presence_penalty?: number | null;
  repetition_penalty?: number | null;
  seed?: number | null;
  stop?: string | null | Array<string>;
  stream?: boolean | null;
  stream_options?: ChatCompletionStreamOptions | null;
  temperature?: number | null;
  top_p?: number | null;
  ignore_eos?: boolean;
  model?: string | null;
  extra_body?: {
    enable_latency_breakdown?: boolean | null;
  };
}

export interface Completion {
  id: string;
  choices: Array<CompletionChoice>;
  created: number;
  model: string;
  object: "text_completion";
  system_fingerprint?: string;
  usage?: CompletionUsage;
}

export interface CompletionChoice {
  finish_reason: ChatCompletionFinishReason | null;
  index: number;
  logprobs?: ChatCompletion.Choice.Logprobs | null;
  text: string;
}

export function postInitAndCheckFields(
  request: CompletionCreateParams,
  currentModelId: string,
): void;

Import

import {
  CompletionCreateParams,
  CompletionCreateParamsNonStreaming,
  CompletionCreateParamsStreaming,
  Completion,
  CompletionChoice,
} from "@mlc-ai/web-llm";

I/O Contract

Inputs

  • prompt (string, required) -- The prompt string to generate completions for
  • stream (boolean, optional) -- If true, returns an async iterable of partial completion chunks
  • temperature (number, optional) -- Sampling temperature (0 to 2); higher values increase randomness
  • top_p (number, optional) -- Nucleus sampling threshold (0 to 1)
  • max_tokens (number, optional) -- Maximum number of tokens to generate
  • stop (string | string[], optional) -- Stop sequence(s) that terminate generation
  • n (number, optional) -- Number of completions to generate (must be 1 when streaming)
  • echo (boolean, optional) -- Echo back the prompt in addition to the completion
  • frequency_penalty (number, optional) -- Penalizes tokens by their frequency in the generated text (-2.0 to 2.0)
  • presence_penalty (number, optional) -- Penalizes tokens that have appeared at all (-2.0 to 2.0)
  • repetition_penalty (number, optional) -- Multiplicative penalty for repeated tokens
  • seed (number, optional) -- Integer seed for deterministic generation
  • logprobs (boolean, optional) -- Whether to return log probabilities of output tokens
  • top_logprobs (number, optional) -- Number of most likely tokens to return with logprobs (0 to 5)
  • logit_bias (Record<string, number>, optional) -- Token ID to bias value mapping (-100 to 100)
  • model (string, optional) -- Model ID; optional if only one model is loaded

Outputs

  • Completion (Completion) -- Response object with id, choices array, created timestamp, model, and usage
  • CompletionChoice.text (string) -- The generated text for each choice
  • CompletionChoice.finish_reason (ChatCompletionFinishReason) -- Reason generation stopped: "stop" or "length"
  • CompletionChoice.logprobs (Logprobs) -- Token-level log probabilities (if requested)
  • usage (CompletionUsage) -- Token counts for the prompt and completion

Validation rules (enforced by postInitAndCheckFields()):

  • Fields suffix, user, and best_of are unsupported and will throw UnsupportedFieldsError
  • Streaming with n > 1 is not allowed (throws StreamingCountError)
  • seed must be an integer if provided (throws SeedTypeError)
  • stream_options requires stream: true (throws InvalidStreamOptionsError)
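
These rules can be mirrored in application code before the engine call. The following is an illustrative sketch only, not the library's actual postInitAndCheckFields implementation; the thrown Error messages merely echo the documented error class names:

```typescript
// Minimal request shape covering the fields the documented checks touch.
interface CompletionRequest {
  prompt: string;
  n?: number | null;
  seed?: number | null;
  stream?: boolean | null;
  stream_options?: object | null;
  suffix?: string;
  user?: string;
  best_of?: number;
}

// Re-implementation of the documented validation rules, for illustration.
function checkFields(req: CompletionRequest): void {
  // Fields accepted by OpenAI but unsupported by web-llm.
  for (const field of ["suffix", "user", "best_of"] as const) {
    if (req[field] !== undefined) {
      throw new Error(`UnsupportedFieldsError: ${field} is not supported`);
    }
  }
  // Streaming responses carry a single choice, so n must be 1.
  if (req.stream && (req.n ?? 1) > 1) {
    throw new Error("StreamingCountError: n must be 1 when streaming");
  }
  // seed is fed to an integer RNG state.
  if (req.seed != null && !Number.isInteger(req.seed)) {
    throw new Error("SeedTypeError: seed must be an integer");
  }
  // stream_options is meaningless for a non-streaming request.
  if (req.stream_options != null && req.stream !== true) {
    throw new Error("InvalidStreamOptionsError: stream_options requires stream: true");
  }
}
```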

Usage Examples

Basic Text Completion

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// 1. Create engine with a model
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// 2. Create a non-streaming completion
const completion = await engine.completions.create({
  prompt: "The capital of France is",
  max_tokens: 50,
  temperature: 0.7,
});

// 3. Read generated text
console.log(completion.choices[0].text);
console.log("Finish reason:", completion.choices[0].finish_reason);

Streaming Completion

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// Stream text completion tokens
const stream = await engine.completions.create({
  prompt: "Once upon a time,",
  max_tokens: 200,
  temperature: 0.8,
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.text) {
    // process.stdout.write is Node-specific; in a browser, append the
    // chunk text to the DOM instead.
    process.stdout.write(chunk.choices[0].text);
  }
  if (chunk.usage) {
    console.log("\nUsage:", chunk.usage);
  }
}

Completion with Logprobs

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const completion = await engine.completions.create({
  prompt: "The meaning of life is",
  max_tokens: 30,
  temperature: 0,
  logprobs: true,
  top_logprobs: 3,
  echo: true,
});

// Access logprobs for each token
const choice = completion.choices[0];
console.log("Text:", choice.text);
if (choice.logprobs) {
  console.log("Token logprobs:", choice.logprobs);
}
