Implementation:Mlc ai Web llm Completion Request

From Leeroopedia
Knowledge Sources
Domains NLP, API_Protocol
Last Updated 2026-02-14 22:30 GMT

Overview

A concrete tool for constructing OpenAI-compatible text completion requests, provided by the @mlc-ai/web-llm library.

Description

The Completions class and associated types define the raw text completion API for web-llm. Unlike the chat completion API which uses structured message arrays, this API accepts a single prompt string and generates text continuations. The types are adapted from the openai-node library to maintain API compatibility.

The type hierarchy consists of:

  • CompletionCreateParamsBase -- The base interface containing all request fields including prompt, generation parameters, and penalty parameters
  • CompletionCreateParamsNonStreaming -- Extends base with stream?: false | null
  • CompletionCreateParamsStreaming -- Extends base with stream: true
  • CompletionCreateParams -- Union type of streaming and non-streaming variants
  • Completion -- The response type containing an array of CompletionChoice objects with generated text
  • CompletionChoice -- Individual choice with text, finish_reason, and optional logprobs
  • Completions -- Wrapper class that delegates create() calls to engine.completion()
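
The streaming and non-streaming variants differ only in the type of their `stream` field, so the union can be narrowed with an ordinary runtime check. The following is a minimal, self-contained sketch of that pattern using stand-in fields, not the library's full interfaces:

```typescript
// Stand-in versions of the web-llm param types, trimmed to the fields
// needed to show how the union discriminates on `stream`.
interface CompletionCreateParamsBase {
  prompt: string;
  max_tokens?: number | null;
  stream?: boolean | null;
}
interface CompletionCreateParamsNonStreaming extends CompletionCreateParamsBase {
  stream?: false | null;
}
interface CompletionCreateParamsStreaming extends CompletionCreateParamsBase {
  stream: true;
}
type CompletionCreateParams =
  | CompletionCreateParamsNonStreaming
  | CompletionCreateParamsStreaming;

// A type guard lets callers (and the create() overloads) pick the right
// return type based on the request shape.
function isStreaming(
  req: CompletionCreateParams,
): req is CompletionCreateParamsStreaming {
  return req.stream === true;
}
```

This is the same discrimination the `create()` overloads in the Signature section perform at the type level.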

WebLLM-specific extensions not in OpenAI's API include:

  • repetition_penalty -- Multiplicative penalty for repeated tokens
  • ignore_eos -- When true, generation continues past stop tokens until max_tokens
  • extra_body.enable_latency_breakdown -- Includes per-stage timing in usage statistics
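
Taken together, a request exercising these extensions might look like the following sketch. It is a plain object using the field names from the Signature section; the values are illustrative, and passing the object to `engine.completions.create()` is shown under Usage Examples:

```typescript
// Sketch of a completion request using the WebLLM-specific fields.
const request = {
  prompt: "List three prime numbers:",
  max_tokens: 64,
  repetition_penalty: 1.1, // values > 1 discourage repeated tokens
  ignore_eos: false,       // set true to generate past stop tokens until max_tokens
  extra_body: {
    enable_latency_breakdown: true, // include per-stage timings in `usage`
  },
};
```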

Usage

Import these types when building applications that need direct text generation from a prompt string rather than structured chat conversations. Use engine.completions.create() to execute a completion request. This API is appropriate for text continuation, code completion, and other tasks where multi-turn message formatting is unnecessary.

Code Reference

Source Location

Signature

export class Completions {
  private engine: MLCEngineInterface;

  constructor(engine: MLCEngineInterface);

  create(request: CompletionCreateParamsNonStreaming): Promise<Completion>;
  create(request: CompletionCreateParamsStreaming): Promise<AsyncIterable<Completion>>;
  create(request: CompletionCreateParams): Promise<AsyncIterable<Completion> | Completion>;
}

export interface CompletionCreateParamsBase {
  prompt: string;
  echo?: boolean | null;
  frequency_penalty?: number | null;
  logit_bias?: Record<string, number> | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  max_tokens?: number | null;
  n?: number | null;
  presence_penalty?: number | null;
  repetition_penalty?: number | null;
  seed?: number | null;
  stop?: string | null | Array<string>;
  stream?: boolean | null;
  stream_options?: ChatCompletionStreamOptions | null;
  temperature?: number | null;
  top_p?: number | null;
  ignore_eos?: boolean;
  model?: string | null;
  extra_body?: {
    enable_latency_breakdown?: boolean | null;
  };
}

export interface Completion {
  id: string;
  choices: Array<CompletionChoice>;
  created: number;
  model: string;
  object: "text_completion";
  system_fingerprint?: string;
  usage?: CompletionUsage;
}

export interface CompletionChoice {
  finish_reason: ChatCompletionFinishReason | null;
  index: number;
  logprobs?: ChatCompletion.Choice.Logprobs | null;
  text: string;
}

export function postInitAndCheckFields(
  request: CompletionCreateParams,
  currentModelId: string,
): void;

Import

import {
  CompletionCreateParams,
  CompletionCreateParamsNonStreaming,
  CompletionCreateParamsStreaming,
  Completion,
  CompletionChoice,
} from "@mlc-ai/web-llm";

I/O Contract

Inputs

  • prompt (string, required) -- The prompt string to generate completions for
  • stream (boolean, optional) -- If true, returns an async iterable of partial completion chunks
  • temperature (number, optional) -- Sampling temperature (0 to 2); higher values increase randomness
  • top_p (number, optional) -- Nucleus sampling threshold (0 to 1)
  • max_tokens (number, optional) -- Maximum number of tokens to generate
  • stop (string | string[], optional) -- Stop sequence(s) that terminate generation
  • n (number, optional) -- Number of completions to generate (must be 1 when streaming)
  • echo (boolean, optional) -- Echo back the prompt in addition to the completion
  • frequency_penalty (number, optional) -- Penalizes tokens by their frequency in the generated text (-2.0 to 2.0)
  • presence_penalty (number, optional) -- Penalizes tokens that have appeared at all (-2.0 to 2.0)
  • repetition_penalty (number, optional) -- Multiplicative penalty for repeated tokens
  • seed (number, optional) -- Integer seed for deterministic generation
  • logprobs (boolean, optional) -- Whether to return log probabilities of output tokens
  • top_logprobs (number, optional) -- Number of most likely tokens to return with logprobs (0 to 5)
  • logit_bias (Record<string, number>, optional) -- Token ID to bias value mapping (-100 to 100)
  • model (string, optional) -- Model ID; optional if only one model is loaded

Outputs

  • Completion (Completion) -- Response object with id, choices array, created timestamp, model, and usage
  • CompletionChoice.text (string) -- The generated text for each choice
  • CompletionChoice.finish_reason (ChatCompletionFinishReason) -- Reason generation stopped: "stop" or "length"
  • CompletionChoice.logprobs (Logprobs) -- Token-level log probabilities (if requested)
  • usage (CompletionUsage) -- Token counts for the prompt and completion

Validation rules (enforced by postInitAndCheckFields()):

  • Fields suffix, user, and best_of are unsupported and will throw UnsupportedFieldsError
  • Streaming with n > 1 is not allowed (throws StreamingCountError)
  • seed must be an integer if provided (throws SeedTypeError)
  • stream_options requires stream: true (throws InvalidStreamOptionsError)
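
These rules can be mirrored in application code before the engine call. The following is an illustrative sketch only, not the library's actual postInitAndCheckFields implementation; the thrown Error messages merely echo the documented error class names:

```typescript
// Minimal request shape covering the fields the documented checks touch.
interface CompletionRequest {
  prompt: string;
  n?: number | null;
  seed?: number | null;
  stream?: boolean | null;
  stream_options?: object | null;
  suffix?: string;
  user?: string;
  best_of?: number;
}

// Re-implementation of the documented validation rules, for illustration.
function checkFields(req: CompletionRequest): void {
  // Fields accepted by OpenAI but unsupported by web-llm.
  for (const field of ["suffix", "user", "best_of"] as const) {
    if (req[field] !== undefined) {
      throw new Error(`UnsupportedFieldsError: ${field} is not supported`);
    }
  }
  // Streaming responses carry a single choice, so n must be 1.
  if (req.stream && (req.n ?? 1) > 1) {
    throw new Error("StreamingCountError: n must be 1 when streaming");
  }
  // seed is fed to an integer RNG state.
  if (req.seed != null && !Number.isInteger(req.seed)) {
    throw new Error("SeedTypeError: seed must be an integer");
  }
  // stream_options is meaningless for a non-streaming request.
  if (req.stream_options != null && req.stream !== true) {
    throw new Error("InvalidStreamOptionsError: stream_options requires stream: true");
  }
}
```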

Usage Examples

Basic Text Completion

import { CreateMLCEngine } from "@mlc-ai/web-llm";

// 1. Create engine with a model
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// 2. Create a non-streaming completion
const completion = await engine.completions.create({
  prompt: "The capital of France is",
  max_tokens: 50,
  temperature: 0.7,
});

// 3. Read generated text
console.log(completion.choices[0].text);
console.log("Finish reason:", completion.choices[0].finish_reason);

Streaming Completion

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// Stream text completion tokens
const stream = await engine.completions.create({
  prompt: "Once upon a time,",
  max_tokens: 200,
  temperature: 0.8,
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.text) {
    // process.stdout.write is Node-specific; in a browser, append the
    // chunk text to the DOM instead.
    process.stdout.write(chunk.choices[0].text);
  }
  if (chunk.usage) {
    console.log("\nUsage:", chunk.usage);
  }
}

Completion with Logprobs

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const completion = await engine.completions.create({
  prompt: "The meaning of life is",
  max_tokens: 30,
  temperature: 0,
  logprobs: true,
  top_logprobs: 3,
  echo: true,
});

// Access logprobs for each token
const choice = completion.choices[0];
console.log("Text:", choice.text);
if (choice.logprobs) {
  console.log("Token logprobs:", choice.logprobs);
}
