Implementation:InternLM Lmdeploy Response Dataclass

From Leeroopedia
Revision as of 15:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_Response_Dataclass.md)


Knowledge Sources
Domains LLM_Inference, Data_Structures
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool, provided by the LMDeploy library, for encapsulating inference results: generated text, token counts, and finish reasons.

Description

The Response dataclass packages all output information from a single inference request. It provides text output, token statistics, finish reason, optional logprobs, and an extend() method for incremental streaming aggregation.
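The incremental aggregation attributed to extend() can be pictured with a minimal sketch. MiniResponse is a hypothetical stand-in, and its merge rules (append text and token IDs, add output counts, adopt the newer finish reason) are assumptions for illustration. This page does not show how LMDeploy implements extend(), nor whether streamed chunks carry per-chunk or cumulative counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniResponse:
    """Hypothetical stand-in that mirrors a few Response fields."""
    text: str
    generate_token_len: int
    input_token_len: int
    finish_reason: Optional[str] = None
    token_ids: List[int] = field(default_factory=list)

    def extend(self, other: "MiniResponse") -> "MiniResponse":
        # Assumed merge rules: append text and token IDs, add output
        # counts, and adopt the newer chunk's finish reason.
        self.text += other.text
        self.token_ids.extend(other.token_ids)
        self.generate_token_len += other.generate_token_len
        self.finish_reason = other.finish_reason
        return self


# Two made-up streaming chunks for the same request.
chunks = [
    MiniResponse("Hello", 1, 5, None, [101]),
    MiniResponse(", world", 2, 5, "stop", [102, 103]),
]

# Fold the chunks into one aggregate object.
merged = MiniResponse("", 0, 5)
for chunk in chunks:
    merged.extend(chunk)
```

Returning self from extend() lets calls be chained, which is one plausible reason for the `-> 'Response'` return annotation in the signature.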

Usage

Response objects are returned by Pipeline.__call__() and yielded by Pipeline.stream_infer(). Access response.text for the generated string, response.finish_reason to check why generation ended, and response.generate_token_len for usage tracking.
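As a rough illustration of usage tracking and completion checks, the sketch below totals token counts over a batch and lists the responses that stopped on the length limit. The summarize helper is hypothetical, and the SimpleNamespace objects are stand-ins carrying the same attribute names as Response, so the example runs without loading a model.

```python
from types import SimpleNamespace


def summarize(responses):
    """Total token usage and collect indices of length-limited responses."""
    total_in = sum(r.input_token_len for r in responses)
    total_out = sum(r.generate_token_len for r in responses)
    # finish_reason == 'length' means generation stopped at the token limit.
    truncated = [r.index for r in responses if r.finish_reason == "length"]
    return {
        "input_tokens": total_in,
        "output_tokens": total_out,
        "truncated": truncated,
    }


# Stand-in objects; in practice these would be Response instances.
fake_responses = [
    SimpleNamespace(index=0, input_token_len=12, generate_token_len=30,
                    finish_reason="stop"),
    SimpleNamespace(index=1, input_token_len=15, generate_token_len=512,
                    finish_reason="length"),
]

usage = summarize(fake_responses)
```

Because the helper reads only attribute names documented on this page, it should work unchanged on real Response objects.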

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/messages.py
  • Lines: L460-547

Signature

@dataclass
class Response:
    text: str                                              # Generated text
    generate_token_len: int                                # Output token count
    input_token_len: int                                   # Input token count
    finish_reason: Optional[Literal['stop', 'length']] = None  # Stop reason
    token_ids: List[int] = field(default_factory=list)     # Output token IDs
    logprobs: List[Dict[int, float]] = None                # Per-token logprobs
    logits: torch.Tensor = None                            # Raw logits tensor
    last_hidden_state: torch.Tensor = None                 # Hidden state
    index: int = 0                                         # Batch position index

    def extend(self, other: 'Response') -> 'Response':
        """Merge another response into this one (for streaming)."""
        ...
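One possible use of the token_ids and logprobs fields together is scoring a whole generation. The sketch below sums the log-probability of each emitted token. sequence_logprob is a hypothetical helper, and it assumes that logprobs[i] contains an entry for token_ids[i], which this page does not guarantee.

```python
import math
from typing import Dict, List


def sequence_logprob(token_ids: List[int],
                     logprobs: List[Dict[int, float]]) -> float:
    """Sum the log-probability of each emitted token.

    Assumes logprobs[i] has an entry for token_ids[i]; this is an
    assumption of the sketch, not a documented guarantee.
    """
    if len(token_ids) != len(logprobs):
        raise ValueError("token_ids and logprobs must have the same length")
    return sum(step[tok] for tok, step in zip(token_ids, logprobs))


# Made-up values shaped like the fields in the signature above.
token_ids = [7, 42]
logprobs = [{7: -0.5, 9: -1.2}, {42: -0.25, 3: -2.0}]

total = sequence_logprob(token_ids, logprobs)
probability = math.exp(total)  # joint probability of the emitted sequence
```

Since logprobs defaults to None, real code would need to check for None before calling a helper like this.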

Import

from lmdeploy.messages import Response

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| text | str | Yes | Generated text content |
| generate_token_len | int | Yes | Number of tokens generated |
| input_token_len | int | Yes | Number of input tokens (includes template) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| text | str | The generated text |
| finish_reason | 'stop', 'length', or None | Why generation ended (None by default) |
| generate_token_len | int | Output token count |
| input_token_len | int | Input token count |
| token_ids | List[int] | Raw output token IDs |
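The output fields listed above are plain Python values, so they can be copied into a dictionary for logging. The following is a minimal sketch. to_record and OUTPUT_FIELDS are hypothetical names, and the SimpleNamespace object stands in for a real Response.

```python
import json
from types import SimpleNamespace

# Field names taken from the Outputs table.
OUTPUT_FIELDS = (
    "text",
    "finish_reason",
    "generate_token_len",
    "input_token_len",
    "token_ids",
)


def to_record(response) -> dict:
    """Copy the documented output fields into a JSON-serializable dict."""
    return {name: getattr(response, name) for name in OUTPUT_FIELDS}


# Stand-in object with made-up values; finish_reason is left at None.
fake_response = SimpleNamespace(
    text="AI is ...",
    finish_reason=None,
    generate_token_len=3,
    input_token_len=8,
    token_ids=[1, 2, 3],
)

record = to_record(fake_response)
line = json.dumps(record)  # None is serialized as JSON null
```

The tensor-valued fields (logits, last_hidden_state) are deliberately left out of this sketch because they are not directly JSON-serializable.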

Usage Examples

Response Inspection

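from lmdeploy import pipeline

# Load the model and build an inference pipeline
pipe = pipeline('internlm/internlm2_5-7b-chat')
# A single string prompt returns a single Response object
response = pipe('What is AI?')

print(f"Text: {response.text}")
print(f"Input tokens: {response.input_token_len}")
print(f"Output tokens: {response.generate_token_len}")
print(f"Finish reason: {response.finish_reason}")
# Show only the first ten output token IDs
print(f"Token IDs: {response.token_ids[:10]}...")

# Release the pipeline's resources
pipe.close()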

Related Pages

Implements Principle
