Implementation:InternLM Lmdeploy Response Dataclass

Knowledge Sources	LMDeploy
Domains	LLM_Inference, Data_Structures
Last Updated	2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for encapsulating the results of an inference request: the generated text, token counts, and the finish reason.

Description

The Response dataclass packages all output information from a single inference request. It provides text output, token statistics, finish reason, optional logprobs, and an extend() method for incremental streaming aggregation.
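
To make the streaming aggregation concrete, here is a minimal sketch of how an extend()-style merge could behave. SketchResponse is a hypothetical stand-in, not LMDeploy's class, and the merge rules are assumptions: each chunk is taken to carry only its new text and token IDs, along with a running total in generate_token_len. Check lmdeploy/messages.py for the actual behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchResponse:
    """Hypothetical stand-in for Response; not LMDeploy's implementation."""
    text: str
    generate_token_len: int
    input_token_len: int
    finish_reason: Optional[str] = None
    token_ids: List[int] = field(default_factory=list)

    def extend(self, other: "SketchResponse") -> "SketchResponse":
        # Assumption: chunks carry incremental text and token IDs.
        self.text += other.text
        self.token_ids.extend(other.token_ids)
        # Assumption: counts are running totals, so the latest chunk wins.
        self.generate_token_len = other.generate_token_len
        self.input_token_len = other.input_token_len
        self.finish_reason = other.finish_reason
        return self


# Simulated stream of two chunks; the last one carries the finish reason.
chunks = [
    SketchResponse("Hello", 1, 4, None, [101]),
    SketchResponse(", world", 3, 4, "stop", [102, 103]),
]

merged = SketchResponse(text="", generate_token_len=0, input_token_len=0)
for chunk in chunks:
    merged.extend(chunk)

print(merged.text, merged.finish_reason)
```

Returning self from extend() lets a caller chain merges or fold a whole stream into one object.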

Usage

Response objects are returned by Pipeline.__call__() and yielded by Pipeline.stream_infer(). Access response.text for the generated string, response.finish_reason to check completion status, and response.generate_token_len for usage tracking.
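
As an illustration of usage tracking, the sketch below sums token counts over several responses. summarize_usage and its output keys are hypothetical names chosen for this example, and SimpleNamespace objects stand in for real Response instances so the snippet runs without a model. It relies only on the input_token_len and generate_token_len attributes described above.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def summarize_usage(responses: Iterable) -> Dict[str, int]:
    """Sum prompt and completion token counts over Response-like objects."""
    items = list(responses)  # materialize so the iterable can be read twice
    prompt = sum(r.input_token_len for r in items)
    completion = sum(r.generate_token_len for r in items)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Stand-ins for Response objects; only the two count attributes are needed.
fake_responses = [
    SimpleNamespace(input_token_len=12, generate_token_len=30),
    SimpleNamespace(input_token_len=8, generate_token_len=5),
]

usage = summarize_usage(fake_responses)
print(usage)
```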

Code Reference

Source Location

Repository: lmdeploy
File: lmdeploy/messages.py
Lines: L460-547

Signature

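from dataclasses import dataclass, field
from typing import Dict, List, Literal, Optional

import torch

@dataclass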
class Response:
    text: str                                              # Generated text
    generate_token_len: int                                # Output token count
    input_token_len: int                                   # Input token count
    finish_reason: Optional[Literal['stop', 'length']] = None  # Stop reason
    token_ids: List[int] = field(default_factory=list)     # Output token IDs
    logprobs: List[Dict[int, float]] = None                # Per-token logprobs
    logits: torch.Tensor = None                            # Raw logits tensor
    last_hidden_state: torch.Tensor = None                 # Hidden state
    index: int = 0                                         # Batch position index

    def extend(self, other: 'Response') -> 'Response':
        """Merge another response into this one (for streaming)."""
        ...

Import

from lmdeploy.messages import Response

I/O Contract

Inputs

Name	Type	Required	Description
text	str	Yes	Generated text content
generate_token_len	int	Yes	Number of tokens generated
input_token_len	int	Yes	Number of input tokens (includes template)

Outputs

Name	Type	Description
text	str	The generated text
finish_reason	'stop', 'length', or None	Why generation ended; defaults to None
generate_token_len	int	Output token count
input_token_len	int	Input token count
token_ids	List[int]	Raw output token IDs
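
A caller will often branch on finish_reason. The sketch below shows one way to do that. describe_finish and its return labels are hypothetical, and the reading of 'stop' as a normal end and 'length' as hitting a token limit follows common convention rather than anything stated on this page.

```python
from types import SimpleNamespace


def describe_finish(response) -> str:
    """Map a Response-like object's finish_reason to a coarse status label."""
    if response.finish_reason == "stop":
        return "complete"    # assumed: generation ended on its own
    if response.finish_reason == "length":
        return "truncated"   # assumed: a token limit cut the output short
    return "unfinished"      # None: no finish reason has been set


# Stand-ins for Response objects; only finish_reason is needed here.
statuses = [
    describe_finish(SimpleNamespace(finish_reason=reason))
    for reason in ("stop", "length", None)
]
print(statuses)
```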

Usage Examples

Response Inspection

from lmdeploy import pipeline

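# Build a pipeline for the chat model, then run a single prompt.
pipe = pipeline('internlm/internlm2_5-7b-chat')
response = pipe('What is AI?')  # a single string prompt returns a single Response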

print(f"Text: {response.text}")
print(f"Input tokens: {response.input_token_len}")
print(f"Output tokens: {response.generate_token_len}")
print(f"Finish reason: {response.finish_reason}")
print(f"Token IDs: {response.token_ids[:10]}...")

pipe.close()

Related Pages

Implements Principle

Principle:InternLM_Lmdeploy_Response_Processing
