Implementation:InternLM Lmdeploy GenerationConfig

Knowledge Sources	LMDeploy Pipeline API
Domains	Text_Generation, Sampling
Last Updated	2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for parameterizing the sampling strategy used during text generation.

Description

The GenerationConfig dataclass controls all aspects of token sampling including temperature, top-k, top-p, repetition penalty, output length limits, stop conditions, and advanced features like n-best generation and logprobs output.
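To make the interplay of temperature, top-k, and top-p concrete, the following is a minimal pure-Python sketch over a toy logit table. It is not LMDeploy's implementation: the function name `filter_distribution`, the order in which the filters are applied, and the convention that `top_k=0` disables top-k filtering are all assumptions made for illustration.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=1.0):
    """Return renormalized token probabilities after temperature, top-k, and top-p.

    Illustrative sketch only; real engines may order or combine these steps differently.
    """
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")

    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}

    # Numerically stable softmax (subtract the maximum before exponentiating).
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, weight / total) for tok, weight in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # Top-k: keep only the k most likely tokens (assumption: 0 means "no limit").
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they form a probability distribution.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(filter_distribution(toy_logits, temperature=1.0, top_k=2, top_p=1.0))
```

With `top_k=2` only the two highest-scoring tokens survive, and lowering `temperature` shifts more probability mass onto the top token.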

Usage

Import this class when you need to control generation behavior. Create an instance and pass it as the gen_config argument to Pipeline.__call__() or pipe.stream_infer(). Each prompt in a batch can have its own GenerationConfig.
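The per-prompt pairing described above can be sketched without loading a model. The example below uses a hypothetical stand-in dataclass, `ConfigSketch`, carrying a few fields with the defaults listed in the Signature section, and a hypothetical helper, `align_configs`, that either shares one config across the batch or checks that a list of configs matches the prompts one to one. Neither name is part of LMDeploy.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for GenerationConfig (subset of fields only)."""
    max_new_tokens: int = 512
    temperature: float = 0.8
    do_sample: bool = False


def align_configs(
    prompts: Sequence[str],
    gen_config: Union[ConfigSketch, Sequence[ConfigSketch]],
) -> List[Tuple[str, ConfigSketch]]:
    """Pair every prompt with a config: one shared instance, or one per prompt."""
    if isinstance(gen_config, ConfigSketch):
        # A single config is reused for every prompt in the batch.
        configs = [gen_config] * len(prompts)
    else:
        configs = list(gen_config)
        if len(configs) != len(prompts):
            raise ValueError(
                f"expected {len(prompts)} configs, got {len(configs)}"
            )
    return list(zip(prompts, configs))


prompts = ["Summarize this paragraph.", "Write a limerick."]
per_prompt = [
    ConfigSketch(max_new_tokens=128),                 # short, greedy answer
    ConfigSketch(temperature=1.0, do_sample=True),    # looser, sampled answer
]
for prompt, config in align_configs(prompts, per_prompt):
    print(prompt, "->", config)
```

With LMDeploy installed, the corresponding call would presumably pass a list of real GenerationConfig instances as gen_config alongside the list of prompts; confirm the accepted types against the installed version before relying on this.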

Code Reference

Source Location

Repository: lmdeploy
File: lmdeploy/messages.py
Lines: L24-181

Signature

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class GenerationConfig:
    n: int = 1                                  # Number of completions per prompt
    max_new_tokens: int = 512                   # Max output tokens
    top_p: float = 1.0                          # Nucleus sampling threshold
    top_k: int = 50                             # Top-k filtering
    temperature: float = 0.8                    # Sampling temperature
    repetition_penalty: float = 1.0             # Repetition penalty factor
    ignore_eos: bool = False                    # Continue past EOS token
    random_seed: int = None                     # Reproducibility seed
    stop_words: List[str] = None                # Stop word strings
    bad_words: List[str] = None                 # Banned word strings
    min_new_tokens: int = None                  # Min tokens before stop check
    skip_special_tokens: bool = True            # Strip special tokens from output
    logprobs: int = None                        # Top logprobs per token to return
    response_format: Dict = None                # JSON schema for guided output
    do_sample: bool = False                     # Enable stochastic sampling

Import

from lmdeploy import GenerationConfig

I/O Contract

Inputs

Name	Type	Required	Description
max_new_tokens	int	No	Maximum output token count (default: 512)
temperature	float	No	Sampling temperature (default: 0.8)
top_p	float	No	Nucleus sampling threshold (default: 1.0)
top_k	int	No	Top-k candidates (default: 50)
repetition_penalty	float	No	Penalty for repeated tokens (default: 1.0)
stop_words	List[str]	No	Strings that trigger generation stop (default: None)
do_sample	bool	No	Enable sampling vs greedy (default: False)
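As an illustration of why a repetition_penalty of 1.0 leaves generation unchanged, here is a minimal sketch of one widely used formulation: logits of already-generated tokens are divided by the penalty when positive and multiplied by it when negative. Whether LMDeploy's backends apply exactly this rule is an assumption; the function name `apply_repetition_penalty` is hypothetical.

```python
def apply_repetition_penalty(logits, generated, penalty=1.0):
    """Down-weight tokens that already appear in the generated sequence.

    Illustrative sketch of a common formulation, not LMDeploy's code.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    seen = set(generated)
    adjusted = {}
    for tok, value in logits.items():
        if tok in seen:
            # Positive logits shrink and negative logits grow more negative,
            # so a penalty above 1 always makes a repeated token less likely.
            adjusted[tok] = value / penalty if value > 0 else value * penalty
        else:
            adjusted[tok] = value
    return adjusted


toy_logits = {"the": 2.2, "cat": -1.0, "sat": 0.5}
print(apply_repetition_penalty(toy_logits, ["the", "cat"], penalty=1.1))
```

With `penalty=1.0` both branches return the original value, which matches the default in the table above having no effect.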

Outputs

Name	Type	Description
GenerationConfig	dataclass	Configuration instance passed to Pipeline

Usage Examples

Deterministic Output

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')

# Greedy decoding for factual responses
gen_config = GenerationConfig(
    max_new_tokens=256,
    temperature=0.0,
    do_sample=False
)

response = pipe('What is the speed of light?', gen_config=gen_config)
print(response.text)

Creative Sampling

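from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')

# High-temperature sampling for creative text
gen_config = GenerationConfig(
    max_new_tokens=1024,
    temperature=0.9,
    top_p=0.95,
    top_k=40,
    do_sample=True,
    repetition_penalty=1.1
)

# Illustrative prompt; pass the config the same way as in the deterministic example
response = pipe('Write a short poem about the sea.', gen_config=gen_config)
print(response.text)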

Related Pages

Implements Principle

Principle:InternLM_Lmdeploy_Generation_Configuration
