
Implementation:InternLM Lmdeploy Pipeline Factory

From Leeroopedia


Knowledge Sources
Domains LLM_Inference, API_Design
Last Updated 2026-02-07 15:00 GMT

Overview

Concrete tool, provided by the LMDeploy library, for creating inference pipelines from model paths.

Description

The pipeline() factory function is the primary entry point for LMDeploy's Python API. It downloads or loads a model, auto-detects its architecture, selects a backend (TurboMind when the model is supported by it, otherwise PyTorch), and returns a ready-to-use Pipeline object wrapping an async inference engine.
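
Backend selection can also be forced rather than auto-detected by passing an explicit engine config. A minimal sketch (the session length below is illustrative):

from lmdeploy import pipeline, PytorchEngineConfig

# Passing an explicit engine config bypasses backend auto-detection;
# TurbomindEngineConfig here would select the TurboMind backend instead.
pipe = pipeline('internlm/internlm2_5-7b-chat',
                backend_config=PytorchEngineConfig(session_len=4096))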

Usage

Import this function when starting any LMDeploy inference workload. Pass a HuggingFace model ID or local path, and optionally a backend configuration. The function handles all initialization automatically.

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/api.py
  • Lines: L15-74

Signature

def pipeline(model_path: str,
             model_name: Optional[str] = None,
             backend_config: Optional[Union[TurbomindEngineConfig,
                                            PytorchEngineConfig]] = None,
             chat_template_config: Optional[ChatTemplateConfig] = None,
             log_level: str = 'WARNING',
             max_log_len: int = None,
             speculative_config: Optional[SpeculativeConfig] = None,
             **kwargs) -> Pipeline:
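
For illustration, the logging-related parameters can be set directly at construction time (both values below are arbitrary):

from lmdeploy import pipeline

# Raise verbosity and truncate logged prompts to 200 characters
# (illustrative values).
pipe = pipeline('internlm/internlm2_5-7b-chat',
                log_level='INFO',
                max_log_len=200)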

Import

from lmdeploy import pipeline

I/O Contract

Inputs

All parameters except model_path are optional.

  • model_path (str, required): HuggingFace model ID or local directory path
  • model_name (Optional[str]): Override model name for chat template lookup
  • backend_config (TurbomindEngineConfig or PytorchEngineConfig): Engine configuration; auto-detected if None
  • chat_template_config (Optional[ChatTemplateConfig]): Custom chat template configuration
  • log_level (str): Logging level (default: 'WARNING')
  • max_log_len (int): Maximum number of prompt characters shown in log output
  • speculative_config (Optional[SpeculativeConfig]): Speculative decoding configuration
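
When auto-detection cannot resolve a chat template, for example with a locally fine-tuned checkpoint, one can be named explicitly via chat_template_config. A sketch, with an illustrative local path and template name:

from lmdeploy import pipeline, ChatTemplateConfig

# Local path and template name are illustrative.
pipe = pipeline('/models/my-internlm2-finetune',
                chat_template_config=ChatTemplateConfig(model_name='internlm2'))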

Outputs

  • Pipeline (Pipeline): Initialized inference pipeline with an async engine, ready for __call__()
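
Generation parameters are not part of the factory signature; they are passed per call through a GenerationConfig. A brief sketch (sampling values are illustrative):

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2_5-7b-chat')
# Per-call sampling settings; values here are illustrative.
responses = pipe(['Hello!'],
                 gen_config=GenerationConfig(max_new_tokens=128,
                                             temperature=0.7))
print(responses[0].text)
pipe.close()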

Usage Examples

Minimal Usage

from lmdeploy import pipeline

# Auto-detect everything: backend, chat template, precision
pipe = pipeline('internlm/internlm2_5-7b-chat')
responses = pipe(['Hello!', 'What is Python?'])
for r in responses:
    print(r.text)
pipe.close()

With Context Manager

from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(tp=2, session_len=8192)

with pipeline('internlm/internlm2_5-7b-chat',
              backend_config=backend_config) as pipe:
    responses = pipe(['Explain quantum computing'])
    print(responses[0].text)
# Resources automatically released
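
Streaming Output

For token-by-token output, the pipeline also exposes a streaming interface. A minimal sketch, assuming stream_infer yields incremental Response chunks whose text field holds the newly generated piece:

from lmdeploy import pipeline, GenerationConfig

with pipeline('internlm/internlm2_5-7b-chat') as pipe:
    # Each chunk carries the incremental text for its prompt.
    for chunk in pipe.stream_infer(['Explain quantum computing'],
                                   gen_config=GenerationConfig(max_new_tokens=256)):
        print(chunk.text, end='', flush=True)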

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
