
Environment:InternLM Lmdeploy Python Dependencies

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Python
Last Updated: 2026-02-07 15:00 GMT

Overview

Python 3.10+ runtime with HuggingFace Transformers, accelerate, and serving framework dependencies for LMDeploy inference and quantization.

Description

This environment defines the Python-level software stack required to run LMDeploy. The dependencies span several categories: core ML libraries (PyTorch, Transformers), serving frameworks (FastAPI, uvicorn), quantization utilities (peft), communication (pyzmq, ray), and tokenization (sentencepiece, tiktoken). The package is installable via `pip install lmdeploy` with extras for `[all]`, `[lite]` (quantization), and `[serve]` (API server).

Usage

Use this environment for all Python-level LMDeploy operations: loading models, running pipelines, serving APIs, performing quantization. This is always required alongside the CUDA GPU runtime for CUDA deployments. For non-CUDA platforms (Ascend, MACA, Cambricon, ROCm), replace `requirements/runtime_cuda.txt` with the appropriate platform file.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| Python | 3.10, 3.11, 3.12, or 3.13 | Defined in `setup.py` classifiers |
| OS | Linux (primary), Windows (limited) | Triton requires Linux x86_64 |
| pip | >= 21.0 | For PEP 517 builds |
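A quick way to confirm the interpreter satisfies these constraints before installing (a minimal sketch; the version bounds mirror the `setup.py` classifiers above, the function name is ours):

```python
import sys

# Supported interpreter range per the setup.py classifiers: 3.10 - 3.13.
MIN_PY = (3, 10)
MAX_PY = (3, 13)

def python_supported(version_info=None):
    """Return True if the interpreter version falls in the supported range."""
    v = tuple((version_info or sys.version_info)[:2])
    return MIN_PY <= v <= MAX_PY

if __name__ == '__main__':
    print(f'Python {sys.version_info.major}.{sys.version_info.minor} '
          f'supported: {python_supported()}')
```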

Dependencies

Core ML Libraries

  • `torch` >= 2.0.0, <= 2.8.0
  • `torchvision` >= 0.15.0, <= 0.23.0
  • `transformers` < 5.0.0
  • `accelerate` >= 0.29.3
  • `peft` <= 0.14.0 (LoRA adapter support)
  • `safetensors` (efficient model weight loading)
  • `einops` (tensor operations)

Serving Dependencies

  • `fastapi` (API framework)
  • `uvicorn` (ASGI server)
  • `aiohttp` (async HTTP client)
  • `openai` (OpenAI-compatible client)
  • `pydantic` > 2.0.0 (data validation)

Tokenization

  • `sentencepiece` (SentencePiece tokenizer)
  • `tiktoken` (BPE tokenizer)

Infrastructure

  • `triton` >= 3.0.0, <= 3.4.0 (Linux x86_64 only; JIT kernel compilation)
  • `ray` (distributed execution and multi-node)
  • `pyzmq` (inter-process communication)
  • `xgrammar` (grammar-guided generation)

Optional Dependencies

  • `flash_attn_interface` (FlashAttention-3 for SM90+ with CUDA >= 12.3)
  • `flash_mla` (Multi-head Latent Attention for SM90+)
  • `fast_hadamard_transform` (required for DeepSeek V3.2 models)

Credentials

No credentials required for package installation. See InternLM_Lmdeploy_CUDA_GPU_Runtime for model download credentials.

Quick Install

# Standard installation (CUDA)
pip install lmdeploy

# With all extras (quote the extras so the shell doesn't expand the brackets)
pip install 'lmdeploy[all]'

# Quantization only
pip install 'lmdeploy[lite]'

# API serving
pip install 'lmdeploy[serve]'

# For Ascend NPU
LMDEPLOY_TARGET_DEVICE=ascend pip install lmdeploy

# For ROCm (AMD GPU)
LMDEPLOY_TARGET_DEVICE=rocm pip install lmdeploy
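The `LMDEPLOY_TARGET_DEVICE` variable works by selecting which platform requirements file `setup.py` installs from. A small sketch of that mapping (the helper name here is ours; the lookup itself mirrors the `setup.py` excerpt in Code Evidence below):

```python
import os

def target_requirements(env=None):
    """Map LMDEPLOY_TARGET_DEVICE to the requirements file setup.py installs.

    Defaults to 'cuda' when the variable is unset, matching setup.py.
    """
    env = env if env is not None else os.environ
    device = env.get('LMDEPLOY_TARGET_DEVICE', 'cuda')
    return f'requirements/runtime_{device}.txt'
```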

Code Evidence

Target device selection from `setup.py:13-14`:

def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')

Platform-specific install from `setup.py:176`:

install_requires=parse_requirements(
    f'requirements/runtime_{get_target_device()}.txt'
) + extra_deps,
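`parse_requirements` itself is not shown in the excerpt; a plausible minimal implementation (an assumption for illustration, not the actual LMDeploy source) would read the platform file and skip comments and blank lines:

```python
def parse_requirements(path):
    """Read a pip requirements file into a list of requirement strings.

    Hypothetical sketch: skips blank lines and '#' comments; the real
    LMDeploy helper may also handle '-r' includes and environment markers.
    """
    requirements = []
    with open(path) as f:
        for line in f:
            line = line.split('#', 1)[0].strip()  # drop inline comments
            if line:
                requirements.append(line)
    return requirements
```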

Triton version validation from `lmdeploy/pytorch/check_env/triton.py:6-7`:

MAX_TRITON_VERSION = '3.4.0'
MIN_TRITON_VERSION = '3.0.0'
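These constants feed a range check on the installed Triton version. A minimal sketch of such a check (the comparison logic below is an assumption for illustration, not the exact LMDeploy code):

```python
MAX_TRITON_VERSION = '3.4.0'
MIN_TRITON_VERSION = '3.0.0'

def _as_tuple(version):
    """Parse an 'X.Y.Z' string into a comparable integer tuple."""
    return tuple(int(part) for part in version.split('.'))

def triton_version_ok(installed):
    """True if the installed triton version is within the supported range."""
    return (_as_tuple(MIN_TRITON_VERSION)
            <= _as_tuple(installed)
            <= _as_tuple(MAX_TRITON_VERSION))
```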

Device type validation from `lmdeploy/messages.py:432`:

assert self.device_type in ['cuda', 'ascend', 'maca', 'camb'], (
    f'invalid device_type: {self.device_type}')

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ImportError: Please install fast_hadamard_transform package.` | Missing optional dependency for DeepSeek V3.2 | `pip install fast_hadamard_transform` |
| `ImportError: To use LlavaVLModel, please install llava` | Missing llava package for LLaVA VLM models | `pip install llava` |
| `Could not import transformers_modules used for remote code` | Missing remote code module | Add `--trust-remote-code` flag; ensure `transformers_modules` is available |
| Triton version mismatch errors | Triton outside the 3.0.0 to 3.4.0 range | `pip install "triton>=3.0.0,<=3.4.0"` |
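The optional-dependency errors above follow a common guard pattern: probe for the package and raise an actionable `ImportError` if it is absent. A generic sketch of that pattern (using stdlib `importlib`; this is not the exact LMDeploy code):

```python
import importlib.util

def require_optional(module_name, install_hint=None):
    """Import an optional dependency, or raise an actionable ImportError."""
    if importlib.util.find_spec(module_name) is None:
        hint = install_hint or f'pip install {module_name}'
        raise ImportError(
            f'Please install {module_name} package. Try: {hint}')
    return importlib.import_module(module_name)

# Example: json is always present in the stdlib, so this succeeds.
json = require_optional('json')
```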

Compatibility Notes

  • Triton: Only available on Linux x86_64. Not supported on ARM (`aarch64`) or Windows. Required for PyTorch backend CUDA kernels.
  • peft: Pinned to <= 0.14.0 for LoRA adapter compatibility. Newer versions may cause issues.
  • transformers: Must be < 5.0.0. The codebase uses internal APIs that may break with major version changes.
  • Platform files: Each supported platform has its own requirements file: `runtime_cuda.txt`, `runtime_rocm.txt`, `runtime_ascend.txt`, `runtime_maca.txt`, `runtime_camb.txt`.
  • DeepLink (dlinfer): Non-CUDA devices (Ascend, MACA, Cambricon) require the `dlinfer` framework (`dlinfer-ascend`, `dlinfer-maca`) for device abstraction.
