Implementation: InternLM LMDeploy Pipeline Factory (PyTorch)
| Knowledge Sources | |
|---|---|
| Domains | LLM_Inference, Quantization |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
A concrete use of LMDeploy's pipeline() factory for creating inference pipelines for SmoothQuant (W8A8) quantized models with the PyTorch backend.
Description
This is the pipeline() factory function used specifically for SmoothQuant W8A8 model inference. SmoothQuant models must use PytorchEngineConfig because the TurboMind backend does not support the SmoothQuant weight format. The model's quantization_config is auto-detected.
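As an illustration of what "auto-detected" means here, the sketch below shows how a quantization method could be read from a model directory's config.json. This is not LMDeploy's internal code; the `quantization_config` key layout and the `smooth_quant` method name are assumptions for this example.

```python
# Illustrative sketch (not LMDeploy internals): detect the quantization
# method recorded in a model directory's config.json.
import json
import os
import tempfile


def detect_quant_method(model_dir: str):
    """Return the quant method from config.json, or None if absent."""
    with open(os.path.join(model_dir, "config.json")) as f:
        cfg = json.load(f)
    return cfg.get("quantization_config", {}).get("quant_method")


# Demo with a temporary stand-in model directory.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({"quantization_config": {"quant_method": "smooth_quant"}}, f)
    print(detect_quant_method(d))  # smooth_quant
```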
Usage
Use this after quantizing a model with smooth_quant. Pass PytorchEngineConfig (not TurbomindEngineConfig) to ensure the correct backend is selected.
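The backend constraint above can be expressed as a simple guard. The sketch below is not part of LMDeploy; the two config classes are minimal stand-in dataclasses (mirroring the names of LMDeploy's PytorchEngineConfig and TurbomindEngineConfig) and the `check_backend_config` helper is hypothetical, shown only to make the rule concrete.

```python
# Illustrative guard (hypothetical helper, stand-in config classes):
# reject a TurboMind config when the model is SmoothQuant (W8A8) quantized.
from dataclasses import dataclass


@dataclass
class PytorchEngineConfig:   # stand-in for lmdeploy.PytorchEngineConfig
    tp: int = 1


@dataclass
class TurbomindEngineConfig:  # stand-in for lmdeploy.TurbomindEngineConfig
    tp: int = 1


def check_backend_config(backend_config, quant_method):
    """Raise if a SmoothQuant model is paired with the TurboMind backend."""
    if quant_method == "smooth_quant" and isinstance(
        backend_config, TurbomindEngineConfig
    ):
        raise ValueError(
            "SmoothQuant (W8A8) weights are not supported by TurboMind; "
            "pass PytorchEngineConfig instead."
        )
    return backend_config


cfg = check_backend_config(PytorchEngineConfig(tp=2), "smooth_quant")
```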
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/api.py L15-74, lmdeploy/messages.py L297-442
Signature
# Same pipeline() factory with PyTorch backend for W8A8
pipe = pipeline(
    model_path,
    backend_config=PytorchEngineConfig(tp=N)
)
Import
from lmdeploy import pipeline, PytorchEngineConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | Path to SmoothQuant-quantized model |
| backend_config | PytorchEngineConfig | Yes | PyTorch backend config (required for SmoothQuant) |
Outputs
| Name | Type | Description |
|---|---|---|
| Pipeline | Pipeline | Inference pipeline with W8A8 kernels active |
Usage Examples
from lmdeploy import pipeline, PytorchEngineConfig

# Load SmoothQuant model with PyTorch backend
backend_config = PytorchEngineConfig(
    tp=2,
    session_len=4096,
    cache_max_entry_count=0.8
)
pipe = pipeline('./internlm2_5-7b-w8a8', backend_config=backend_config)
response = pipe('What is machine learning?')
print(response.text)
pipe.close()