Principle: MLC-AI WebLLM Text Completion Configuration
| Knowledge Sources | Value |
|---|---|
| Domains | NLP, API_Protocol |
| Last Updated | 2026-02-14 22:30 GMT |
Overview
Technique for constructing structured request objects that specify a text prompt and generation parameters for raw text completion inference, following the OpenAI Completions API specification.
Description
Text completion configuration involves assembling a request object that encodes a prompt string and generation controls for direct text continuation. Unlike chat completion which operates on structured message arrays with role-based turns, text completion accepts a single prompt and generates text that continues from it.
The request object encodes:
- Prompt -- A single string from which the model generates continuations
- Generation parameters -- Controls for the autoregressive sampling process including temperature, top_p, max_tokens, and stop sequences
- Penalty parameters -- Frequency penalty, presence penalty, and repetition penalty to discourage repetitive outputs
- Output configuration -- Streaming mode toggle, echo setting to include the prompt in output, and log probability reporting
- Diagnostic options -- Seed for deterministic generation and latency breakdown reporting
The response contains an array of CompletionChoice objects, each with a text field containing the generated continuation, a finish_reason, and optional logprobs.
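The request and response shapes described above can be sketched as TypeScript types. Field names follow the OpenAI Completions API conventions the section cites; the actual web-llm type definitions may differ in detail.

```typescript
// Sketch of the request object, modeled on the OpenAI Completions API.
interface CompletionRequest {
  prompt: string;             // single string to continue
  temperature?: number;       // 0 to 2
  top_p?: number;             // 0 to 1
  max_tokens?: number;
  stop?: string[];            // stop sequences
  frequency_penalty?: number; // -2.0 to 2.0
  presence_penalty?: number;  // -2.0 to 2.0
  stream?: boolean;           // streaming mode toggle
  echo?: boolean;             // include the prompt in the output
  logprobs?: number;          // log probability reporting
  seed?: number;              // deterministic generation
}

// Sketch of one element of the response's choices array.
interface CompletionChoice {
  text: string;
  finish_reason: "stop" | "length";
  logprobs?: object;
}

// Example request for a short, single-line continuation.
const request: CompletionRequest = {
  prompt: "The quick brown fox",
  temperature: 0.7,
  top_p: 0.95,
  max_tokens: 16,
  stop: ["\n"],
};
```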
Usage
Use text completion configuration when the task requires direct text generation from a prompt without multi-turn conversation structure. Appropriate for:
- Text continuation -- Completing a partial sentence or paragraph
- Code completion -- Generating code from a partial snippet
- Prompt-based generation -- Tasks where chat message formatting adds unnecessary overhead
- Logprob analysis -- Examining token-level probabilities for a prompt and its continuation
Prefer the chat completion API instead when the task involves multi-turn conversations, system instructions, tool calling, or structured output with JSON schemas.
Theoretical Basis
Text completion follows the same autoregressive generation mechanism as chat completion, but without the conversation template layer:
Request Processing
- The prompt string is tokenized directly without chat template formatting
- Generation parameters (temperature, top_p, penalties) are applied identically to chat completion
- The postInitAndCheckFields() validator enforces constraints before generation begins
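The validation step can be sketched as a range check over the documented parameter bounds. This is a hypothetical stand-in for web-llm's internal postInitAndCheckFields(), which may enforce additional constraints.

```typescript
// Hypothetical validator mirroring the parameter ranges documented below;
// not the library's actual implementation.
function checkCompletionFields(req: {
  temperature?: number;
  top_p?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  repetition_penalty?: number;
}): void {
  const inRange = (v: number | undefined, lo: number, hi: number, name: string) => {
    if (v !== undefined && (v < lo || v > hi)) {
      throw new Error(`${name} must be in [${lo}, ${hi}], got ${v}`);
    }
  };
  inRange(req.temperature, 0, 2, "temperature");
  inRange(req.top_p, 0, 1, "top_p");
  inRange(req.frequency_penalty, -2, 2, "frequency_penalty");
  inRange(req.presence_penalty, -2, 2, "presence_penalty");
  if (req.repetition_penalty !== undefined && req.repetition_penalty <= 0) {
    throw new Error(`repetition_penalty must be > 0, got ${req.repetition_penalty}`);
  }
}
```

Running the check before dispatching the request surfaces configuration errors early, rather than mid-generation.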
Sampling Parameters
- temperature (0 to 2) -- Controls randomness of sampling; lower values produce more deterministic output
- top_p (0 to 1) -- Nucleus sampling; considers only tokens in the top p probability mass
- seed -- Integer seed for deterministic generation; seeding is per-request, not per-choice
Penalty Parameters
- frequency_penalty (-2.0 to 2.0) -- Penalizes tokens by their count in generated text
- presence_penalty (-2.0 to 2.0) -- Penalizes tokens that have appeared at all
- repetition_penalty (> 0) -- Multiplicative penalty for repeated tokens
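How the three penalties adjust a single token's logit can be sketched with the OpenAI-style formulas; the exact order of application inside web-llm is an assumption here.

```typescript
// Sketch of per-token penalty application. `count` is how many times the
// token has already appeared in the generated text.
function applyPenalties(
  logit: number,
  count: number,
  frequencyPenalty: number, // subtracted once per occurrence
  presencePenalty: number,  // subtracted once if the token appeared at all
  repetitionPenalty: number // multiplicative, must be > 0
): number {
  let adjusted = logit - count * frequencyPenalty;
  if (count > 0) adjusted -= presencePenalty;
  // A repetition penalty > 1 divides positive logits and multiplies
  // negative ones, so repeated tokens always become less likely.
  if (count > 0) {
    adjusted = adjusted > 0 ? adjusted / repetitionPenalty : adjusted * repetitionPenalty;
  }
  return adjusted;
}
```

Note the difference the table of ranges implies: frequency_penalty scales with the count, presence_penalty is a flat one-time deduction, and repetition_penalty acts multiplicatively.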
Differences from Chat Completion
| Aspect | Text Completion | Chat Completion |
|---|---|---|
| Input | Single prompt string | Array of role-tagged messages |
| Template | No chat template applied | Uses model's conversation template |
| Tool calling | Not supported | Supported via tools field |
| Response format | Not supported | Supports JSON, grammar, structural tags |
| Echo | Can echo prompt in output | Not applicable |
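The contrast in the table comes down to the shape of the request body. A side-by-side sketch, with field names following the OpenAI API conventions used throughout this section:

```typescript
// Text completion: a single prompt string, no chat template.
const textCompletionRequest = {
  prompt: "function add(a, b) {",
  max_tokens: 32,
  echo: true, // may return the prompt followed by the continuation
};

// Chat completion: role-tagged messages run through the model's
// conversation template before tokenization.
const chatCompletionRequest = {
  messages: [
    { role: "system", content: "You are a coding assistant." },
    { role: "user", content: "Write an add function." },
  ],
  max_tokens: 32,
};
```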