Principle:InternLM Lmdeploy Pytorch Engine Configuration

From Leeroopedia


Knowledge Sources
Domains LLM_Inference, Configuration
Last Updated 2026-02-07 15:00 GMT

Overview

A configuration pattern that parameterizes the PyTorch inference backend with support for data parallelism, expert parallelism, LoRA adapters, and multi-platform deployment.

Description

PyTorch Engine Configuration extends the engine configuration principle for the PyTorch-based inference backend. It supports features not available in TurboMind:

  • Data Parallelism (dp) for throughput scaling across GPU groups
  • Expert Parallelism (ep) for Mixture-of-Experts models
  • LoRA adapter support (adapters) for serving multiple fine-tuned variants
  • Multi-platform support (device_type: cuda, ascend, maca, camb) for non-NVIDIA hardware
  • Disaggregated serving (role: Hybrid, Prefill, Decode) for prefill-decode separation

This is the required backend for SmoothQuant (W8A8) quantized models and models with architectures not yet supported by TurboMind.
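As a rough illustration of the knobs listed above, the sketch below mirrors them with a plain dataclass. The actual class in LMDeploy is PytorchEngineConfig, and its exact field set varies by version; this stand-in is only for showing how the options combine.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PytorchEngineConfigSketch:
    """Illustrative stand-in for LMDeploy's PytorchEngineConfig."""
    tp: int = 1                                 # tensor parallel degree
    dp: int = 1                                 # data parallel degree
    ep: int = 1                                 # expert parallel degree (MoE)
    device_type: str = "cuda"                   # cuda / ascend / maca / camb
    adapters: Optional[Dict[str, str]] = None   # LoRA adapter name -> path
    role: str = "Hybrid"                        # Hybrid / Prefill / Decode

# A SmoothQuant (W8A8) deployment must use the PyTorch backend; the
# quantization lives in the model weights, so the config itself only
# carries parallelism and device settings.
w8a8_config = PytorchEngineConfigSketch(tp=2, device_type="cuda")
print(w8a8_config.tp, w8a8_config.device_type)
```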

Usage

Use this configuration when deploying models on non-NVIDIA hardware, when using SmoothQuant quantization, when serving LoRA adapters, or when the model architecture is only supported by the PyTorch backend. Also required for data-parallel deployments.

Theoretical Basis

The PyTorch backend configuration extends the base engine configuration with additional parallelism dimensions:

  • Tensor Parallelism (TP): Splits individual layers across GPUs
  • Data Parallelism (DP): Replicates model across GPU groups for throughput
  • Expert Parallelism (EP): Distributes MoE experts across GPUs
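A quick way to sanity-check a parallelism layout, assuming (as in common DP designs) that each data-parallel replica holds one tensor-parallel group and experts are sharded over those same ranks; the exact constraints LMDeploy enforces may differ:

```python
def world_size(tp: int, dp: int) -> int:
    # Each DP replica holds a full TP group, so total ranks = tp * dp.
    return tp * dp

def check_moe_layout(tp: int, dp: int, ep: int) -> bool:
    # Experts are sharded over existing ranks, so ep cannot exceed
    # the world size (assumed constraint, not verified against LMDeploy).
    return ep <= world_size(tp, dp)

print(world_size(tp=2, dp=2))     # → 4
print(check_moe_layout(2, 2, 4))  # → True
```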

Pseudo-code:

# Abstract parallelism strategy (LMDeploy class name: PytorchEngineConfig)
if model.is_moe:
    config = PytorchEngineConfig(tp=2, dp=2, ep=4)  # 4 GPUs (tp * dp); experts sharded over those ranks
elif need_lora:
    config = PytorchEngineConfig(tp=N, adapters={"adapter1": "/path"})
elif target_device != "cuda":
    config = PytorchEngineConfig(device_type=target_device)

Related Pages

Implemented By

Uses Heuristic
