
Implementation:InternLM Lmdeploy Pipeline Call

From Leeroopedia


Knowledge Sources
Domains LLM_Inference, Text_Generation
Last Updated 2026-02-07 15:00 GMT

Overview

A concrete tool for running batch text generation through the callable Pipeline interface provided by the LMDeploy library.

Description

The Pipeline.__call__() method (and its underlying infer() and stream_infer() methods) is the primary interface for generating text. It accepts single or batched prompts in multiple formats, submits them to the async engine, and returns Response objects. Prompts are sorted by length for GPU efficiency and results are reordered to match the original input order.
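The sort-and-restore behavior described above can be sketched in plain Python. This is a minimal illustration of the pattern (sort prompts by length for batching, then put outputs back in input order), not LMDeploy's actual implementation:

```python
def sorted_batch_order(prompts):
    # Pair each prompt with its original index, then sort by length
    # (longest first) so similar-length prompts batch together.
    indexed = sorted(enumerate(prompts), key=lambda p: len(p[1]), reverse=True)
    order = [i for i, _ in indexed]
    batch = [p for _, p in indexed]
    return order, batch

def restore_order(order, outputs):
    # Place each output back at its prompt's original position.
    restored = [None] * len(outputs)
    for pos, out in zip(order, outputs):
        restored[pos] = out
    return restored

prompts = ['short', 'a much longer prompt', 'mid-size one']
order, batch = sorted_batch_order(prompts)
outputs = [p.upper() for p in batch]  # stand-in for generation
print(restore_order(order, outputs))
# ['SHORT', 'A MUCH LONGER PROMPT', 'MID-SIZE ONE']
```

The restored list lines up with the original prompts, which is why callers can zip inputs against results even though batching internally reorders them.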

Usage

Call the Pipeline object directly with prompts after initialization. The direct call blocks until all prompts have finished, which suits batch processing; use stream_infer() for real-time streaming applications.

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/pipeline.py
  • Lines: L83-122 (infer), L128-162 (stream_infer), L305-309 (__call__)

Signature

class Pipeline:
    def __call__(self,
                 prompts: List[str] | str | List[Dict] | List[List[Dict]],
                 gen_config: GenerationConfig | List[GenerationConfig] | None = None,
                 **kwargs) -> Response | List[Response]:
        return self.infer(prompts, gen_config=gen_config, **kwargs)

    def infer(self, prompts, gen_config=None, do_preprocess=None,
              adapter_name=None, **kwargs) -> List[Response]:
        ...

    def stream_infer(self, prompts, gen_config=None, do_preprocess=None,
                     adapter_name=None, stream_response=True,
                     **kwargs) -> Iterator[Iterator[Response]]:
        ...

Import

from lmdeploy import pipeline, GenerationConfig

I/O Contract

Inputs

  • prompts — str | List[str] | List[Dict] | List[List[Dict]] (required): single prompt or batch, as plain strings or OpenAI-style message lists
  • gen_config — GenerationConfig | List[GenerationConfig] (optional): sampling parameters, shared or per-prompt
  • do_preprocess — bool (optional, default True): whether to apply the model's chat template
  • adapter_name — str (optional): name of the LoRA adapter to use for this request
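The four accepted prompt formats can be shown as plain Python data; no LMDeploy import is needed, and the contents below are illustrative placeholders:

```python
# str: a single plain-text prompt.
single_str = 'What is the capital of France?'

# List[str]: a batch of plain-text prompts.
batch_str = ['First prompt.', 'Second prompt.']

# List[Dict]: one conversation in OpenAI message format
# (role/content dicts, optionally including a system message).
single_messages = [
    {'role': 'system', 'content': 'You are a concise assistant.'},
    {'role': 'user', 'content': 'What is the capital of France?'},
]

# List[List[Dict]]: a batch of conversations, one message list each.
batch_messages = [
    [{'role': 'user', 'content': 'First prompt.'}],
    [{'role': 'user', 'content': 'Second prompt.'}],
]
```

String prompts are passed through the chat template when do_preprocess is True; message-format prompts let you control roles and multi-turn context explicitly.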

Outputs

  • Response | List[Response] — generated text with metadata (text, token counts, finish_reason); a single Response for a single prompt, a list for a batch
  • Iterator[Iterator[Response]] — streaming mode: the outer iterator yields one inner iterator per prompt, each yielding partial responses

Usage Examples

Batch Inference

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')

# Batch of prompts
prompts = [
    'Explain neural networks briefly.',
    'Write a Python hello world.',
    'What is the capital of France?'
]

gen_config = GenerationConfig(max_new_tokens=256, temperature=0.7)
responses = pipe(prompts, gen_config=gen_config)

for i, resp in enumerate(responses):
    print(f"Prompt {i}: {resp.text[:100]}...")
    print(f"  Tokens: {resp.generate_token_len}, Reason: {resp.finish_reason}")

Streaming Output

from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2_5-7b-chat')

for stream_outputs in pipe.stream_infer(['Tell me a story']):
    for response in stream_outputs:
        print(response.text, end='', flush=True)
print()
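The nested-iterator shape returned by stream_infer() can be sketched with plain generators standing in for the engine. This is illustrative only; real inner iterators yield Response objects, not strings:

```python
def fake_stream_infer(prompts):
    # Outer iterator: one inner iterator per prompt, mirroring
    # Iterator[Iterator[Response]]. Each inner iterator yields
    # incremental text chunks in place of partial Response objects.
    for prompt in prompts:
        yield iter(prompt.split())

# Accumulate the partial chunks per prompt into full texts.
full_texts = []
for chunks in fake_stream_infer(['Tell me a story', 'Another prompt']):
    full_texts.append(' '.join(chunks))
print(full_texts)
# ['Tell me a story', 'Another prompt']
```

Consuming each inner iterator to exhaustion before moving to the next, as above, is the simplest way to reassemble full outputs from a streamed batch.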

Related Pages

Implements Principle

Uses Heuristic
