Environment:InternLM Lmdeploy Python Dependencies
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Python |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Python 3.10+ runtime with HuggingFace Transformers, accelerate, and serving framework dependencies for LMDeploy inference and quantization.
Description
This environment defines the Python-level software stack required to run LMDeploy. The dependencies span several categories: core ML libraries (PyTorch, Transformers), serving frameworks (FastAPI, uvicorn), quantization utilities (peft), communication (pyzmq, ray), and tokenization (sentencepiece, tiktoken). The package is installable via `pip install lmdeploy`, with the extras `[all]`, `[lite]` (quantization), and `[serve]` (API server).
Usage
Use this environment for all Python-level LMDeploy operations: loading models, running pipelines, serving APIs, and performing quantization. It is always required alongside the CUDA GPU runtime for CUDA deployments. For non-CUDA platforms (Ascend, MACA, Cambricon, ROCm), replace `requirements/runtime_cuda.txt` with the appropriate platform file.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | 3.10, 3.11, 3.12, or 3.13 | Defined in `setup.py` classifiers |
| OS | Linux (primary), Windows (limited) | Triton requires Linux x86_64 |
| pip | >= 21.0 | For PEP 517 builds |
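The interpreter requirement in the table above can be checked at runtime with a few lines of standard-library Python. This is a minimal sketch, not an LMDeploy API; the bounds mirror the `setup.py` classifiers listed here:

```python
import sys

# Supported interpreter range from the table above (3.10 through 3.13).
MIN_PY = (3, 10)
MAX_PY = (3, 13)

def python_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor falls in the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_PY <= major_minor <= MAX_PY
```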
Dependencies
Core ML Libraries
- `torch` >= 2.0.0, <= 2.8.0
- `torchvision` >= 0.15.0, <= 0.23.0
- `transformers` < 5.0.0
- `accelerate` >= 0.29.3
- `peft` <= 0.14.0 (LoRA adapter support)
- `safetensors` (efficient model weight loading)
- `einops` (tensor operations)
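To verify that these core packages are installed at all (before worrying about exact versions), the standard library's `importlib.metadata` is sufficient. A sketch of such a preflight check, not LMDeploy's own code:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Report which of the core dependencies listed above are missing.
core = ['torch', 'torchvision', 'transformers', 'accelerate',
        'peft', 'safetensors', 'einops']
missing = [name for name in core if installed_version(name) is None]
```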
Serving Dependencies
- `fastapi` (API framework)
- `uvicorn` (ASGI server)
- `aiohttp` (async HTTP client)
- `openai` (OpenAI-compatible client)
- `pydantic` > 2.0.0 (data validation)
Tokenization
- `sentencepiece` (SentencePiece tokenizer)
- `tiktoken` (BPE tokenizer)
Infrastructure
- `triton` >= 3.0.0, <= 3.4.0 (Linux x86_64 only; JIT kernel compilation)
- `ray` (distributed execution and multi-node)
- `pyzmq` (inter-process communication)
- `xgrammar` (grammar-guided generation)
Optional Dependencies
- `flash_attn_interface` (FlashAttention-3 for SM90+ with CUDA >= 12.3)
- `flash_mla` (Multi-head Latent Attention for SM90+)
- `fast_hadamard_transform` (required for DeepSeek V3.2 models)
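Optional accelerators like these are typically probed with a guarded import, so the rest of the stack keeps working when they are absent. A minimal sketch of that pattern (not LMDeploy's exact detection code):

```python
import importlib

def try_import(module_name):
    """Import an optional module, returning None instead of raising ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# FlashAttention-3 is only usable on SM90+ GPUs with CUDA >= 12.3,
# so its absence is expected on most machines.
flash_attn = try_import('flash_attn_interface')
HAS_FA3 = flash_attn is not None
```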
Credentials
No credentials required for package installation. See InternLM_Lmdeploy_CUDA_GPU_Runtime for model download credentials.
Quick Install
```shell
# Standard installation (CUDA)
pip install lmdeploy

# With all extras (quoted so zsh does not glob the brackets)
pip install 'lmdeploy[all]'

# Quantization only
pip install 'lmdeploy[lite]'

# API serving
pip install 'lmdeploy[serve]'

# For Ascend NPU
LMDEPLOY_TARGET_DEVICE=ascend pip install lmdeploy

# For ROCm (AMD GPU)
LMDEPLOY_TARGET_DEVICE=rocm pip install lmdeploy
```
Code Evidence
Target device selection from `setup.py:13-14`:
```python
def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')
```
Platform-specific install from `setup.py:176`:
```python
install_requires=parse_requirements(
    f'requirements/runtime_{get_target_device()}.txt'
) + extra_deps,
```
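Putting the two excerpts together: the requirements file that `pip` resolves is derived directly from the `LMDEPLOY_TARGET_DEVICE` environment variable. A standalone sketch of the same lookup:

```python
import os

def runtime_requirements(env=os.environ):
    """Reproduce setup.py's selection of the platform requirements file."""
    device = env.get('LMDEPLOY_TARGET_DEVICE', 'cuda')
    return f'requirements/runtime_{device}.txt'
```

For example, `runtime_requirements({'LMDEPLOY_TARGET_DEVICE': 'ascend'})` yields `'requirements/runtime_ascend.txt'`, and an empty environment falls back to the CUDA file.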
Triton version validation from `lmdeploy/pytorch/check_env/triton.py:6-7`:
```python
MAX_TRITON_VERSION = '3.4.0'
MIN_TRITON_VERSION = '3.0.0'
```
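The runtime check that consumes these bounds amounts to comparing version components as tuples. A hedged sketch of that comparison (LMDeploy's actual check lives in the same `check_env` module):

```python
def parse_version(v):
    """'3.4.0' -> (3, 4, 0); drops any non-numeric suffix characters."""
    parts = []
    for piece in v.split('.'):
        digits = ''.join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def triton_in_range(installed, lo='3.0.0', hi='3.4.0'):
    """True if the installed Triton version is within the supported range."""
    return parse_version(lo) <= parse_version(installed) <= parse_version(hi)
```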
Device type validation from `lmdeploy/messages.py:432`:
```python
assert self.device_type in ['cuda', 'ascend', 'maca', 'camb'], (
    f'invalid device_type: {self.device_type}')
```
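The same guard can be reproduced standalone; the allowed list below is copied verbatim from the excerpt above (note that it does not include `rocm`):

```python
# Device types accepted by the excerpt from lmdeploy/messages.py above.
ALLOWED_DEVICE_TYPES = ['cuda', 'ascend', 'maca', 'camb']

def check_device_type(device_type):
    """Raise ValueError for device types the guard above would reject."""
    if device_type not in ALLOWED_DEVICE_TYPES:
        raise ValueError(f'invalid device_type: {device_type}')
    return device_type
```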
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: Please install fast_hadamard_transform package.` | Missing optional dependency for DeepSeek V3.2 | `pip install fast_hadamard_transform` |
| `ImportError: To use LlavaVLModel, please install llava` | Missing llava package for LLaVA VLM models | `pip install llava` |
| `Could not import transformers_modules used for remote code` | Missing remote code module | Add `--trust-remote-code` flag; ensure `transformers_modules` is available |
| Triton version mismatch errors | Triton outside the 3.0.0-3.4.0 range | `pip install "triton>=3.0.0,<=3.4.0"` (quoted so the shell does not treat `>` as redirection) |
Compatibility Notes
- Triton: Only available on Linux x86_64. Not supported on ARM (`aarch64`) or Windows. Required for PyTorch backend CUDA kernels.
- peft: Pinned to <= 0.14.0 for LoRA adapter compatibility; versions above 0.14.0 may break adapter loading.
- transformers: Must be < 5.0.0. The codebase uses internal APIs that may break with major version changes.
- Platform files: Each supported platform has its own requirements file: `runtime_cuda.txt`, `runtime_rocm.txt`, `runtime_ascend.txt`, `runtime_maca.txt`, `runtime_camb.txt`.
- DeepLink (dlinfer): Non-CUDA devices (Ascend, MACA, Cambricon) require the `dlinfer` framework (`dlinfer-ascend`, `dlinfer-maca`) for device abstraction.
Related Pages
- Implementation:InternLM_Lmdeploy_TurbomindEngineConfig
- Implementation:InternLM_Lmdeploy_PytorchEngineConfig
- Implementation:InternLM_Lmdeploy_Pipeline_Factory
- Implementation:InternLM_Lmdeploy_Serve_Api_Server
- Implementation:InternLM_Lmdeploy_Calibrate
- Implementation:InternLM_Lmdeploy_Load_Image
- Implementation:InternLM_Lmdeploy_BaseChatTemplate_Messages2prompt