Implementation:InternLM Lmdeploy GenerationConfig
| Knowledge Sources | |
|---|---|
| Domains | Text_Generation, Sampling |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A concrete tool, provided by the LMDeploy library, for parameterizing the sampling strategy used during text generation.
Description
The GenerationConfig dataclass controls how output tokens are sampled and when generation stops. It covers temperature, top-k, top-p, repetition penalty, output length limits, and stop conditions, plus advanced features such as multiple completions per prompt (n) and logprobs output.
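To make the interaction of temperature, top-k, and top-p concrete, here is a minimal pure-Python sketch of how these three settings can narrow a next-token distribution. It illustrates the general technique only. The function name `filter_distribution`, the toy logits, and the convention that `top_k=0` disables the top-k filter are assumptions of this sketch, not LMDeploy's actual kernels.

```python
import math
from typing import Dict


def filter_distribution(logits: Dict[str, float], temperature: float = 0.8,
                        top_k: int = 50, top_p: float = 1.0) -> Dict[str, float]:
    """Return the renormalized token distribution left after filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    # Top-k: keep only the k most probable tokens (0 disables it here).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"the": 2.0, "a": 1.0, "cat": 0.5, "zebra": -1.0}
greedy_like = filter_distribution(toy_logits, top_k=1)
nucleus = filter_distribution(toy_logits, temperature=1.0, top_k=0, top_p=0.9)
print(greedy_like)
print(nucleus)
```

With `top_k=1` only the most likely token survives, which is what greedy decoding amounts to. With `top_p=0.9` the least likely token ("zebra") falls outside the nucleus and is dropped.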
Usage
Import this class when you need to control generation behavior. Create an instance and pass it as the gen_config argument to Pipeline.__call__() or pipe.stream_infer(). Each prompt in a batch can have its own GenerationConfig: pass a list with one config per prompt.
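The per-prompt calling convention can be sketched without loading a model. In the example below, `SketchConfig` is a hypothetical stand-in that mirrors a few documented GenerationConfig fields, and `fake_pipe` is a hypothetical stub in place of a real pipeline. Only the shape of the call (a list of prompts plus a `gen_config` list of equal length) is meant to carry over.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SketchConfig:
    """Hypothetical stand-in for GenerationConfig (subset of documented fields)."""
    max_new_tokens: int = 512
    temperature: float = 0.8
    do_sample: bool = False


def run_batch(pipe: Callable, prompts: List[str],
              configs: List[SketchConfig]) -> List[str]:
    """Call a pipeline-like object with exactly one config per prompt."""
    if len(prompts) != len(configs):
        raise ValueError("need exactly one config per prompt")
    return pipe(prompts, gen_config=configs)


def fake_pipe(prompts: List[str], gen_config: List[SketchConfig]) -> List[str]:
    """Hypothetical stub that reports which settings each prompt received."""
    return [
        f"{prompt!r}: do_sample={cfg.do_sample}, max_new_tokens={cfg.max_new_tokens}"
        for prompt, cfg in zip(prompts, gen_config)
    ]


prompts = ["What is the speed of light?", "Write a haiku about autumn."]
configs = [
    SketchConfig(max_new_tokens=64),  # greedy, short factual answer
    SketchConfig(max_new_tokens=128, temperature=0.9, do_sample=True),  # sampled
]
results = run_batch(fake_pipe, prompts, configs)
for line in results:
    print(line)
```

Checking the list lengths up front is a design choice of this sketch. It surfaces a mismatch between prompts and configs before any generation work starts.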
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/messages.py
- Lines: L24-181
Signature
@dataclass
class GenerationConfig:
    n: int = 1  # Number of completions per prompt
    max_new_tokens: int = 512  # Max output tokens
    top_p: float = 1.0  # Nucleus sampling threshold
    top_k: int = 50  # Top-k filtering
    temperature: float = 0.8  # Sampling temperature
    repetition_penalty: float = 1.0  # Repetition penalty factor
    ignore_eos: bool = False  # Continue past EOS token
    random_seed: int = None  # Reproducibility seed
    stop_words: List[str] = None  # Stop word strings
    bad_words: List[str] = None  # Banned word strings
    min_new_tokens: int = None  # Min tokens before stop check
    skip_special_tokens: bool = True  # Strip special tokens from output
    logprobs: int = None  # Top logprobs per token to return
    response_format: Dict = None  # JSON schema for guided output
    do_sample: bool = False  # Enable stochastic sampling
Import
from lmdeploy import GenerationConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| max_new_tokens | int | No | Maximum output token count (default: 512) |
| temperature | float | No | Sampling temperature (default: 0.8) |
| top_p | float | No | Nucleus sampling threshold (default: 1.0) |
| top_k | int | No | Top-k candidates (default: 50) |
| repetition_penalty | float | No | Penalty for repeated tokens (default: 1.0) |
| stop_words | List[str] | No | Strings that trigger generation stop (default: None) |
| do_sample | bool | No | Enable sampling vs greedy (default: False) |
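As a rough illustration of what a repetition penalty does, the sketch below applies one widely used formulation: logits of already generated tokens are divided by the penalty when positive and multiplied by it when negative. Whether LMDeploy's backends use exactly this rule is an assumption here. The helper `apply_repetition_penalty` and the toy logits are hypothetical.

```python
from typing import Dict, Iterable


def apply_repetition_penalty(logits: Dict[str, float],
                             generated: Iterable[str],
                             penalty: float = 1.0) -> Dict[str, float]:
    """Lower the logits of tokens that have already been generated."""
    adjusted = dict(logits)  # leave the caller's mapping untouched
    for tok in set(generated):
        if tok in adjusted:
            value = adjusted[tok]
            # Positive logits shrink, negative logits move further from zero.
            adjusted[tok] = value / penalty if value > 0 else value * penalty
    return adjusted


logits = {"cat": 2.0, "dog": 1.5, "eel": -1.0}
penalized = apply_repetition_penalty(logits, ["cat", "eel"], penalty=1.1)
unchanged = apply_repetition_penalty(logits, ["cat", "eel"], penalty=1.0)
print(penalized)
print(unchanged == logits)
```

Under this formulation a penalty of 1.0 is a no-op, which is consistent with 1.0 being the documented default. Values above 1.0 make repeated tokens less likely.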
Outputs
| Name | Type | Description |
|---|---|---|
| GenerationConfig | dataclass | Configuration instance passed to Pipeline |
Usage Examples
Deterministic Output
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')

# Greedy decoding for factual responses
gen_config = GenerationConfig(
    max_new_tokens=256,
    temperature=0.0,
    do_sample=False
)
response = pipe('What is the speed of light?', gen_config=gen_config)
print(response.text)
Creative Sampling
from lmdeploy import GenerationConfig

# High-temperature sampling for creative text
gen_config = GenerationConfig(
    max_new_tokens=1024,
    temperature=0.9,
    top_p=0.95,
    top_k=40,
    do_sample=True,
    repetition_penalty=1.1
)
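When sampling is enabled, outputs vary from run to run unless a seed is fixed, which is the role of the random_seed field. The sketch below shows the underlying idea with Python's standard `random` module. The helper `sample_tokens` and the fixed toy distribution are hypothetical, and a real model recomputes the distribution at every step rather than reusing one.

```python
import random
from typing import Dict, List, Optional


def sample_tokens(probs: Dict[str, float], steps: int,
                  seed: Optional[int] = None) -> List[str]:
    """Draw `steps` tokens from a fixed distribution using a private RNG."""
    rng = random.Random(seed)  # seed=None lets Python pick a non-deterministic seed
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=steps)


toy_probs = {"the": 0.6, "a": 0.25, "cat": 0.15}
first = sample_tokens(toy_probs, steps=8, seed=1234)
second = sample_tokens(toy_probs, steps=8, seed=1234)
print(first == second)  # True: same seed, same draws
```

Using a private `random.Random` instance instead of the module-level functions keeps the seeded stream isolated from any other code that draws random numbers.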