Environment:InternLM Lmdeploy Python Dependencies
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Python |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Python 3.10+ runtime with HuggingFace Transformers, accelerate, and serving framework dependencies for LMDeploy inference and quantization.
Description
This environment defines the Python-level software stack required to run LMDeploy. The dependencies span several categories: core ML libraries (PyTorch, Transformers), serving frameworks (FastAPI, uvicorn), quantization utilities (peft), communication (pyzmq, ray), and tokenization (sentencepiece, tiktoken). The package is installable via `pip install lmdeploy`, with the extras `[all]`, `[lite]` (quantization), and `[serve]` (API server).
Usage
Use this environment for all Python-level LMDeploy operations: loading models, running pipelines, serving APIs, and performing quantization. It is always required alongside the CUDA GPU runtime for CUDA deployments. For non-CUDA platforms (Ascend, MACA, Cambricon, ROCm), replace `requirements/runtime_cuda.txt` with the appropriate platform file.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | 3.10, 3.11, 3.12, or 3.13 | Defined in `setup.py` classifiers |
| OS | Linux (primary), Windows (limited) | Triton requires Linux x86_64 |
| pip | >= 21.0 | For PEP 517 builds |
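The interpreter requirement in the table above can be checked at runtime with a few lines of standard-library Python. This is a minimal sketch, not an LMDeploy API; the bounds mirror the `setup.py` classifiers listed here:

```python
import sys

# Supported interpreter range from the table above (3.10 through 3.13).
MIN_PY = (3, 10)
MAX_PY = (3, 13)

def python_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor falls in the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_PY <= major_minor <= MAX_PY
```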
Dependencies
Core ML Libraries
- `torch` >= 2.0.0, <= 2.8.0
- `torchvision` >= 0.15.0, <= 0.23.0
- `transformers` < 5.0.0
- `accelerate` >= 0.29.3
- `peft` <= 0.14.0 (LoRA adapter support)
- `safetensors` (efficient model weight loading)
- `einops` (tensor operations)
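To verify that these core packages are installed at all (before worrying about exact versions), the standard library's `importlib.metadata` is sufficient. A sketch of such a preflight check, not LMDeploy's own code:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Report which of the core dependencies listed above are missing.
core = ['torch', 'torchvision', 'transformers', 'accelerate',
        'peft', 'safetensors', 'einops']
missing = [name for name in core if installed_version(name) is None]
```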
Serving Dependencies
- `fastapi` (API framework)
- `uvicorn` (ASGI server)
- `aiohttp` (async HTTP client)
- `openai` (OpenAI-compatible client)
- `pydantic` > 2.0.0 (data validation)
Tokenization
- `sentencepiece` (SentencePiece tokenizer)
- `tiktoken` (BPE tokenizer)
Infrastructure
- `triton` >= 3.0.0, <= 3.4.0 (Linux x86_64 only; JIT kernel compilation)
- `ray` (distributed execution and multi-node)
- `pyzmq` (inter-process communication)
- `xgrammar` (grammar-guided generation)
Optional Dependencies
- `flash_attn_interface` (FlashAttention-3 for SM90+ with CUDA >= 12.3)
- `flash_mla` (Multi-head Latent Attention for SM90+)
- `fast_hadamard_transform` (required for DeepSeek V3.2 models)
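Optional accelerators like these are typically probed with a guarded import, so the rest of the stack keeps working when they are absent. A minimal sketch of that pattern (not LMDeploy's exact detection code):

```python
import importlib

def try_import(module_name):
    """Import an optional module, returning None instead of raising ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# FlashAttention-3 is only usable on SM90+ GPUs with CUDA >= 12.3,
# so its absence is expected on most machines.
flash_attn = try_import('flash_attn_interface')
HAS_FA3 = flash_attn is not None
```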
Credentials
No credentials required for package installation. See InternLM_Lmdeploy_CUDA_GPU_Runtime for model download credentials.
Quick Install
```shell
# Standard installation (CUDA)
pip install lmdeploy

# With all extras (quoted so zsh does not glob the brackets)
pip install 'lmdeploy[all]'

# Quantization only
pip install 'lmdeploy[lite]'

# API serving
pip install 'lmdeploy[serve]'

# For Ascend NPU
LMDEPLOY_TARGET_DEVICE=ascend pip install lmdeploy

# For ROCm (AMD GPU)
LMDEPLOY_TARGET_DEVICE=rocm pip install lmdeploy
```
Code Evidence
Target device selection from `setup.py:13-14`:
```python
def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')
```
Platform-specific install from `setup.py:176`:
```python
install_requires=parse_requirements(
    f'requirements/runtime_{get_target_device()}.txt'
) + extra_deps,
```
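Putting the two excerpts together: the requirements file that `pip` resolves is derived directly from the `LMDEPLOY_TARGET_DEVICE` environment variable. A standalone sketch of the same lookup:

```python
import os

def runtime_requirements(env=os.environ):
    """Reproduce setup.py's selection of the platform requirements file."""
    device = env.get('LMDEPLOY_TARGET_DEVICE', 'cuda')
    return f'requirements/runtime_{device}.txt'
```

For example, `runtime_requirements({'LMDEPLOY_TARGET_DEVICE': 'ascend'})` yields `'requirements/runtime_ascend.txt'`, and an empty environment falls back to the CUDA file.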
Triton version validation from `lmdeploy/pytorch/check_env/triton.py:6-7`:
```python
MAX_TRITON_VERSION = '3.4.0'
MIN_TRITON_VERSION = '3.0.0'
```
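The runtime check that consumes these bounds amounts to comparing version components as tuples. A hedged sketch of that comparison (LMDeploy's actual check lives in the same `check_env` module):

```python
def parse_version(v):
    """'3.4.0' -> (3, 4, 0); drops any non-numeric suffix characters."""
    parts = []
    for piece in v.split('.'):
        digits = ''.join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def triton_in_range(installed, lo='3.0.0', hi='3.4.0'):
    """True if the installed Triton version is within the supported range."""
    return parse_version(lo) <= parse_version(installed) <= parse_version(hi)
```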
Device type validation from `lmdeploy/messages.py:432`:
```python
assert self.device_type in ['cuda', 'ascend', 'maca', 'camb'], (
    f'invalid device_type: {self.device_type}')
```
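The same guard can be reproduced standalone; the allowed list below is copied verbatim from the excerpt above (note that it does not include `rocm`):

```python
# Device types accepted by the excerpt from lmdeploy/messages.py above.
ALLOWED_DEVICE_TYPES = ['cuda', 'ascend', 'maca', 'camb']

def check_device_type(device_type):
    """Raise ValueError for device types the guard above would reject."""
    if device_type not in ALLOWED_DEVICE_TYPES:
        raise ValueError(f'invalid device_type: {device_type}')
    return device_type
```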
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: Please install fast_hadamard_transform package.` | Missing optional dependency for DeepSeek V3.2 | `pip install fast_hadamard_transform` |
| `ImportError: To use LlavaVLModel, please install llava` | Missing llava package for LLaVA VLM models | `pip install llava` |
| `Could not import transformers_modules used for remote code` | Missing remote code module | Add `--trust-remote-code` flag; ensure `transformers_modules` is available |
| Triton version mismatch errors | Triton outside the 3.0.0-3.4.0 range | `pip install "triton>=3.0.0,<=3.4.0"` (quoted so the shell does not treat `>` as redirection) |
Compatibility Notes
- Triton: Only available on Linux x86_64. Not supported on ARM (`aarch64`) or Windows. Required for PyTorch backend CUDA kernels.
- peft: Pinned to <= 0.14.0 for LoRA adapter compatibility; versions above 0.14.0 may break adapter loading.
- transformers: Must be < 5.0.0. The codebase uses internal APIs that may break with major version changes.
- Platform files: Each supported platform has its own requirements file: `runtime_cuda.txt`, `runtime_rocm.txt`, `runtime_ascend.txt`, `runtime_maca.txt`, `runtime_camb.txt`.
- DeepLink (dlinfer): Non-CUDA devices (Ascend, MACA, Cambricon) require the `dlinfer` framework (`dlinfer-ascend`, `dlinfer-maca`) for device abstraction.
Related Pages
- Implementation:InternLM_Lmdeploy_TurbomindEngineConfig
- Implementation:InternLM_Lmdeploy_PytorchEngineConfig
- Implementation:InternLM_Lmdeploy_Pipeline_Factory
- Implementation:InternLM_Lmdeploy_Serve_Api_Server
- Implementation:InternLM_Lmdeploy_Calibrate
- Implementation:InternLM_Lmdeploy_Load_Image
- Implementation:InternLM_Lmdeploy_BaseChatTemplate_Messages2prompt