Implementation:InternLM LMDeploy Pipeline Factory
| Knowledge Sources | |
|---|---|
| Domains | LLM_Inference, API_Design |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Concrete tool, provided by the LMDeploy library, for creating inference pipelines from model paths.
Description
The pipeline() factory function is the primary entry point for LMDeploy's Python API. It downloads or loads a model, auto-detects architecture, selects the optimal backend (TurboMind or PyTorch), and returns a ready-to-use Pipeline object wrapping an async inference engine.
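The backend-selection step described above amounts to checking whether the detected architecture is supported by TurboMind and falling back to PyTorch otherwise. The sketch below is illustrative only: the function name, the architecture set, and the return values are assumptions, not LMDeploy's actual internals.

```python
# Illustrative sketch of backend auto-selection (NOT LMDeploy's real
# implementation; the architecture list here is a made-up subset).
TURBOMIND_SUPPORTED = {'InternLM2ForCausalLM', 'LlamaForCausalLM'}

def select_backend(model_arch: str) -> str:
    """Prefer the high-throughput TurboMind engine when the model
    architecture supports it; otherwise fall back to PyTorch."""
    return 'turbomind' if model_arch in TURBOMIND_SUPPORTED else 'pytorch'
```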
Usage
Import this function when starting any LMDeploy inference workload. Pass a HuggingFace model ID or local path, and optionally a backend configuration. The function handles all initialization automatically.
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/api.py
- Lines: L15-74
Signature
def pipeline(model_path: str,
             model_name: Optional[str] = None,
             backend_config: Optional[Union[TurbomindEngineConfig,
                                            PytorchEngineConfig]] = None,
             chat_template_config: Optional[ChatTemplateConfig] = None,
             log_level: str = 'WARNING',
             max_log_len: Optional[int] = None,
             speculative_config: Optional[SpeculativeConfig] = None,
             **kwargs) -> Pipeline:
Import
from lmdeploy import pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID or local directory path |
| model_name | Optional[str] | No | Override model name for chat template lookup |
| backend_config | TurbomindEngineConfig or PytorchEngineConfig | No | Engine configuration (auto-detected if None) |
| chat_template_config | Optional[ChatTemplateConfig] | No | Custom chat template configuration |
| log_level | str | No | Logging level (default: 'WARNING') |
| max_log_len | Optional[int] | No | Max prompt characters in log output |
| speculative_config | Optional[SpeculativeConfig] | No | Speculative decoding configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| Pipeline | Pipeline | Initialized inference pipeline with async engine, ready for __call__() |
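Each element of the list returned by calling the pipeline exposes at least a `text` field plus token-count metadata. The dataclass below is a hedged stand-in to show the shape of those objects; the field names mirror LMDeploy's `Response`, but the class itself and the example `finish_reason` values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the response objects the pipeline returns;
# not the real LMDeploy class, just the fields commonly consumed.
@dataclass
class Response:
    text: str                            # generated completion
    generate_token_len: int = 0          # tokens produced
    input_token_len: int = 0             # tokens in the prompt
    finish_reason: Optional[str] = None  # e.g. 'stop' or 'length' (assumed)

r = Response(text='Hello!', generate_token_len=3, input_token_len=5)
```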
Usage Examples
Minimal Usage
from lmdeploy import pipeline

# Auto-detect everything: backend, chat template, precision
pipe = pipeline('internlm/internlm2_5-7b-chat')
responses = pipe(['Hello!', 'What is Python?'])
for r in responses:
    print(r.text)
pipe.close()
With Context Manager
from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(tp=2, session_len=8192)
with pipeline('internlm/internlm2_5-7b-chat',
              backend_config=backend_config) as pipe:
    responses = pipe(['Explain quantum computing'])
    print(responses[0].text)
# Resources automatically released
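The context-manager usage above works because the pipeline object implements `__enter__`/`__exit__` so that `close()` runs even when an exception is raised inside the `with` block. A minimal toy sketch of that contract (not LMDeploy's real class):

```python
class MiniPipeline:
    """Toy stand-in showing the context-manager contract a Pipeline relies on."""

    def __init__(self):
        self.closed = False

    def close(self):
        # In the real pipeline this releases GPU memory and shuts down
        # the async inference engine.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # propagate any exception after cleanup
```

This is why the plain-call style in the first example needs an explicit `pipe.close()`, while the `with` form does not.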
Related Pages
Implements Principle
Requires Environment
Uses Heuristic