
Implementation:InternLM Lmdeploy GenerationConfig

From Leeroopedia


Knowledge Sources
Domains Text_Generation, Sampling
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for parameterizing the sampling strategy used during text generation.

Description

The GenerationConfig dataclass controls token sampling and output handling: temperature, top-k, top-p, repetition penalty, output length limits, stop conditions, and advanced features such as multiple completions per prompt (n) and per-token logprobs output.
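As a rough illustration of how these knobs interact, the sketch below applies temperature, top-k, and top-p to a list of logits in plain Python. It is not LMDeploy's implementation (the library's kernels may order or combine the steps differently); the function names softmax, filter_top_k_top_p, and sample_token are hypothetical and exist only for this example.

```python
import math
import random
from typing import Dict, List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs: List[float], top_k: int, top_p: float) -> Dict[int, float]:
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    norm = sum(probs[i] for i in kept)  # renormalize the surviving candidates
    return {i: probs[i] / norm for i in kept}


def sample_token(logits: List[float], temperature: float = 0.8, top_k: int = 50,
                 top_p: float = 1.0, do_sample: bool = False,
                 rng: Optional[random.Random] = None) -> int:
    """Pick one token id; greedy (argmax) unless do_sample is True."""
    if not do_sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    candidates = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]
```

In this sketch a lower temperature sharpens the distribution before filtering, while smaller top_k or top_p values shrink the candidate set; with do_sample left at False none of the three has any effect.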

Usage

Import this class when you need to control generation behavior. Create an instance and pass it as the gen_config argument of Pipeline.__call__() or pipe.stream_infer(). Each prompt in a batch can have its own GenerationConfig.
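One way to prepare a separate configuration for each prompt is to derive variants from a shared base with dataclasses.replace. The sketch below uses a hypothetical stand-in dataclass (StandInGenerationConfig) that mirrors only a few of the fields listed on this page, so that it runs without LMDeploy installed; the helper per_prompt_configs is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class StandInGenerationConfig:
    """Hypothetical stand-in mirroring a few fields of lmdeploy.GenerationConfig."""
    max_new_tokens: int = 512
    top_p: float = 1.0
    top_k: int = 50
    temperature: float = 0.8
    do_sample: bool = False
    stop_words: Optional[List[str]] = None


def per_prompt_configs(prompts: List[str], base: StandInGenerationConfig,
                       overrides: Dict[int, dict]) -> List[StandInGenerationConfig]:
    """Return one config per prompt; overrides maps a prompt index to field overrides."""
    return [replace(base, **overrides.get(i, {})) for i in range(len(prompts))]


prompts = ['What is the speed of light?', 'Write a haiku about autumn.']
base = StandInGenerationConfig(max_new_tokens=256)

# Prompt 0 keeps the greedy base settings; prompt 1 switches to sampling.
configs = per_prompt_configs(
    prompts, base, {1: {'do_sample': True, 'temperature': 0.9, 'top_p': 0.95}}
)
```

With the real class, the resulting list would presumably be passed as gen_config alongside the list of prompts, one entry per prompt; check the LMDeploy version in use before relying on that.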

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/messages.py
  • Lines: L24-181

Signature

@dataclass
class GenerationConfig:
    n: int = 1                                  # Number of completions per prompt
    max_new_tokens: int = 512                   # Max output tokens
    top_p: float = 1.0                          # Nucleus sampling threshold
    top_k: int = 50                             # Top-k filtering
    temperature: float = 0.8                    # Sampling temperature
    repetition_penalty: float = 1.0             # Repetition penalty factor
    ignore_eos: bool = False                    # Continue past EOS token
    random_seed: int = None                     # Reproducibility seed
    stop_words: List[str] = None                # Stop word strings
    bad_words: List[str] = None                 # Banned word strings
    min_new_tokens: int = None                  # Min tokens before stop check
    skip_special_tokens: bool = True            # Strip special tokens from output
    logprobs: int = None                        # Top logprobs per token to return
    response_format: Dict = None                # JSON schema for guided output
    do_sample: bool = False                     # Enable stochastic sampling

Import

from lmdeploy import GenerationConfig

I/O Contract

Inputs

Name                Type       Required  Description
max_new_tokens      int        No        Maximum output token count (default: 512)
temperature         float      No        Sampling temperature (default: 0.8)
top_p               float      No        Nucleus sampling threshold (default: 1.0)
top_k               int        No        Top-k candidates (default: 50)
repetition_penalty  float      No        Penalty for repeated tokens (default: 1.0)
stop_words          List[str]  No        Strings that trigger generation stop
do_sample           bool       No        Enable sampling vs greedy (default: False)
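To make the repetition_penalty entry more concrete, the sketch below shows one widely used convention for applying such a penalty to logits: tokens that were already generated have positive logits divided by the penalty and negative logits multiplied by it. This is an assumption about the general technique, not a statement of how LMDeploy applies the parameter internally; apply_repetition_penalty is a hypothetical helper.

```python
from typing import Iterable, List


def apply_repetition_penalty(logits: List[float], generated_ids: Iterable[int],
                             penalty: float = 1.0) -> List[float]:
    """Return a copy of logits with previously generated tokens made less likely."""
    out = list(logits)
    for token_id in set(generated_ids):  # penalize each seen token once
        if out[token_id] > 0:
            out[token_id] /= penalty     # shrink positive scores
        else:
            out[token_id] *= penalty     # push negative scores further down
    return out
```

Under this convention a penalty of 1.0 leaves the logits unchanged, which matches the default listed in the table, and values above 1.0 discourage repeats.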

Outputs

Name              Type       Description
GenerationConfig  dataclass  Configuration instance passed to Pipeline

Usage Examples

Deterministic Output

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')

# Greedy decoding for factual responses
gen_config = GenerationConfig(
    max_new_tokens=256,
    temperature=0.0,
    do_sample=False
)

response = pipe('What is the speed of light?', gen_config=gen_config)
print(response.text)

Creative Sampling

from lmdeploy import GenerationConfig

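# High-temperature sampling for creative text
gen_config = GenerationConfig(
    max_new_tokens=1024,
    temperature=0.9,
    top_p=0.95,
    top_k=40,
    do_sample=True,       # without this, decoding stays greedy
    repetition_penalty=1.1
)

# Pass it to an existing pipeline, e.g. pipe(prompt, gen_config=gen_config)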

Related Pages

Implements Principle
