Implementation:InternLM LMDeploy Pipeline Factory
| Knowledge Sources | |
|---|---|
| Domains | LLM_Inference, API_Design |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Concrete tool, provided by the LMDeploy library, for creating inference pipelines from model paths.
Description
The pipeline() factory function is the primary entry point for LMDeploy's Python API. It downloads or loads a model, auto-detects architecture, selects the optimal backend (TurboMind or PyTorch), and returns a ready-to-use Pipeline object wrapping an async inference engine.
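The backend-selection step described above amounts to checking whether the detected architecture is supported by TurboMind and falling back to PyTorch otherwise. The sketch below is illustrative only: the function name, the architecture set, and the return values are assumptions, not LMDeploy's actual internals.

```python
# Illustrative sketch of backend auto-selection (NOT LMDeploy's real
# implementation; the architecture list here is a made-up subset).
TURBOMIND_SUPPORTED = {'InternLM2ForCausalLM', 'LlamaForCausalLM'}

def select_backend(model_arch: str) -> str:
    """Prefer the high-throughput TurboMind engine when the model
    architecture supports it; otherwise fall back to PyTorch."""
    return 'turbomind' if model_arch in TURBOMIND_SUPPORTED else 'pytorch'
```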
Usage
Import this function when starting any LMDeploy inference workload. Pass a HuggingFace model ID or local path, and optionally a backend configuration. The function handles all initialization automatically.
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/api.py
- Lines: L15-74
Signature
def pipeline(model_path: str,
             model_name: Optional[str] = None,
             backend_config: Optional[Union[TurbomindEngineConfig,
                                            PytorchEngineConfig]] = None,
             chat_template_config: Optional[ChatTemplateConfig] = None,
             log_level: str = 'WARNING',
             max_log_len: Optional[int] = None,
             speculative_config: Optional[SpeculativeConfig] = None,
             **kwargs) -> Pipeline:
Import
from lmdeploy import pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | HuggingFace model ID or local directory path |
| model_name | Optional[str] | No | Override model name for chat template lookup |
| backend_config | TurbomindEngineConfig or PytorchEngineConfig | No | Engine configuration (auto-detected if None) |
| chat_template_config | Optional[ChatTemplateConfig] | No | Custom chat template configuration |
| log_level | str | No | Logging level (default: 'WARNING') |
| max_log_len | Optional[int] | No | Max prompt characters in log output |
| speculative_config | Optional[SpeculativeConfig] | No | Speculative decoding configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| Pipeline | Pipeline | Initialized inference pipeline with async engine, ready for __call__() |
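Each element of the list returned by calling the pipeline exposes at least a `text` field plus token-count metadata. The dataclass below is a hedged stand-in to show the shape of those objects; the field names mirror LMDeploy's `Response`, but the class itself and the example `finish_reason` values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the response objects the pipeline returns;
# not the real LMDeploy class, just the fields commonly consumed.
@dataclass
class Response:
    text: str                            # generated completion
    generate_token_len: int = 0          # tokens produced
    input_token_len: int = 0             # tokens in the prompt
    finish_reason: Optional[str] = None  # e.g. 'stop' or 'length' (assumed)

r = Response(text='Hello!', generate_token_len=3, input_token_len=5)
```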
Usage Examples
Minimal Usage
from lmdeploy import pipeline

# Auto-detect everything: backend, chat template, precision
pipe = pipeline('internlm/internlm2_5-7b-chat')
responses = pipe(['Hello!', 'What is Python?'])
for r in responses:
    print(r.text)
pipe.close()
With Context Manager
from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(tp=2, session_len=8192)
with pipeline('internlm/internlm2_5-7b-chat',
              backend_config=backend_config) as pipe:
    responses = pipe(['Explain quantum computing'])
    print(responses[0].text)
# Resources automatically released
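The context-manager usage above works because the pipeline object implements `__enter__`/`__exit__` so that `close()` runs even when an exception is raised inside the `with` block. A minimal toy sketch of that contract (not LMDeploy's real class):

```python
class MiniPipeline:
    """Toy stand-in showing the context-manager contract a Pipeline relies on."""

    def __init__(self):
        self.closed = False

    def close(self):
        # In the real pipeline this releases GPU memory and shuts down
        # the async inference engine.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # propagate any exception after cleanup
```

This is why the plain-call style in the first example needs an explicit `pipe.close()`, while the `with` form does not.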
Related Pages
Implements Principle
Requires Environment
Uses Heuristic