
Implementation:InternLM Lmdeploy Pipeline Factory Pytorch

From Leeroopedia


Knowledge Sources
Domains LLM_Inference, Quantization
Last Updated 2026-02-07 15:00 GMT

Overview

A factory function for creating inference pipelines for SmoothQuant (W8A8) quantized models using the PyTorch backend of the LMDeploy library.

Description

This is the pipeline() factory function used specifically for SmoothQuant W8A8 model inference. SmoothQuant models must use PytorchEngineConfig because the TurboMind backend does not support the SmoothQuant weight format. The model's quantization_config is auto-detected.

Usage

Use this after quantizing a model with smooth_quant. Pass PytorchEngineConfig (not TurbomindEngineConfig) to ensure the correct backend is selected.
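The preceding quantization step is typically done with LMDeploy's lite CLI. A minimal sketch, assuming the lmdeploy CLI is installed; the model name and work directory below are illustrative placeholders:

```shell
# Quantize a model to W8A8 with SmoothQuant (paths are illustrative)
lmdeploy lite smooth_quant internlm/internlm2_5-7b-chat \
    --work-dir ./internlm2_5-7b-w8a8
```

The resulting work directory is what gets passed as model_path to pipeline() below.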

Code Reference

Source Location

  • Repository: lmdeploy
  • File: lmdeploy/api.py L15-74, lmdeploy/messages.py L297-442

Signature

# Same pipeline() factory with PyTorch backend for W8A8
pipe = pipeline(
    model_path,
    backend_config=PytorchEngineConfig(tp=N)
)

Import

from lmdeploy import pipeline, PytorchEngineConfig

I/O Contract

Inputs

Name            Type                 Required  Description
model_path      str                  Yes       Path to the SmoothQuant-quantized model
backend_config  PytorchEngineConfig  Yes       PyTorch backend config (required for SmoothQuant)

Outputs

Name      Type      Description
Pipeline  Pipeline  Inference pipeline with W8A8 kernels active

Usage Examples

from lmdeploy import pipeline, PytorchEngineConfig

# Load SmoothQuant model with PyTorch backend
backend_config = PytorchEngineConfig(
    tp=2,
    session_len=4096,
    cache_max_entry_count=0.8
)

pipe = pipeline('./internlm2_5-7b-w8a8', backend_config=backend_config)
response = pipe('What is machine learning?')
print(response.text)
pipe.close()
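Sampling parameters can be passed per call via GenerationConfig, and a list of prompts triggers batch inference. A minimal sketch, assuming the same quantized checkpoint path as above (the path and sampling values are illustrative):

```python
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

# SmoothQuant checkpoints require the PyTorch backend
pipe = pipeline('./internlm2_5-7b-w8a8',
                backend_config=PytorchEngineConfig(tp=1))

# Per-call sampling settings
gen_config = GenerationConfig(max_new_tokens=256, temperature=0.7, top_p=0.9)

# A list of prompts returns a list of responses (batch inference)
responses = pipe(['What is machine learning?', 'Explain quantization briefly.'],
                 gen_config=gen_config)
for r in responses:
    print(r.text)

pipe.close()
```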

Related Pages

Implements Principle

Requires Environment
