Principle:InternLM Lmdeploy Backend Auto Selection

From Leeroopedia


Knowledge Sources
Domains LLM_Inference, Architecture_Detection
Last Updated 2026-02-07 15:00 GMT

Overview

An automatic detection mechanism that selects the optimal inference backend (TurboMind or PyTorch) based on model architecture, quantization format, and hardware constraints.

Description

Backend Auto Selection solves the problem of routing models to the correct inference engine without requiring users to know the internal capabilities of each backend. The decision logic considers:

  • Model architecture: TurboMind supports a curated list of architectures (LLaMA, InternLM, Qwen, Mistral, etc.); unsupported models fall back to PyTorch
  • Quantization format: AWQ/GPTQ models use TurboMind; SmoothQuant models require PyTorch
  • Hardware platform: Non-CUDA platforms (Ascend, Cambricon) must use PyTorch
  • Vision-language models: VLMs are detected via architecture class names and use VLAsyncEngine
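The decision rules above can be collapsed into a single dispatch function. The following is a hedged sketch: `TURBOMIND_SUPPORTED`, the quantization labels, and the platform strings are illustrative stand-ins, not lmdeploy's actual internal tables.

```python
from typing import Optional

# Illustrative placeholder for TurboMind's curated architecture list.
TURBOMIND_SUPPORTED = {
    "LlamaForCausalLM", "InternLM2ForCausalLM",
    "Qwen2ForCausalLM", "MistralForCausalLM",
}

def auto_select(arch: str, quant: Optional[str], platform: str) -> str:
    """Pick a backend from architecture, quantization, and platform."""
    if platform != "cuda":            # Ascend, Cambricon, etc. -> PyTorch
        return "pytorch"
    if quant == "smooth_quant":       # SmoothQuant requires PyTorch
        return "pytorch"
    if arch in TURBOMIND_SUPPORTED:   # covers fp16 and AWQ/GPTQ paths
        return "turbomind"
    return "pytorch"                  # unsupported architecture fallback
```

The ordering matters: hardware and quantization constraints are hard requirements, so they are checked before the architecture preference.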

The system reads the model's HuggingFace config to extract the architecture class, then looks up a mapping to determine backend support.
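The architecture lookup amounts to reading the `architectures` field from the model's `config.json`, which follows HuggingFace conventions. A minimal sketch (the helper name is ours, not lmdeploy's):

```python
import json
from pathlib import Path

def read_architecture(model_dir: str) -> str:
    """Return the first architecture class listed in config.json."""
    config = json.loads(Path(model_dir, "config.json").read_text())
    return config["architectures"][0]
```

The resulting class name (e.g. `InternLM2ForCausalLM`) is what gets checked against the backend support mapping.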

Usage

This happens automatically during pipeline initialization. Override it by explicitly passing a backend_config of the desired type (TurbomindEngineConfig or PytorchEngineConfig).
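The override reduces to type dispatch on the config object. The stub classes below stand in for lmdeploy's real `TurbomindEngineConfig` and `PytorchEngineConfig`, and `resolve_backend` is an illustrative sketch of the internal rule, not lmdeploy's actual code:

```python
# Stub stand-ins for lmdeploy's engine config classes.
class TurbomindEngineConfig:
    pass

class PytorchEngineConfig:
    pass

def resolve_backend(backend_config, auto_choice: str = "turbomind") -> str:
    """An explicit config type pins the backend; None means auto-select."""
    if isinstance(backend_config, TurbomindEngineConfig):
        return "turbomind"
    if isinstance(backend_config, PytorchEngineConfig):
        return "pytorch"
    return auto_choice
```

In real usage this corresponds to passing the config at pipeline creation, e.g. `pipeline(model_path, backend_config=PytorchEngineConfig())`.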

Theoretical Basis

Backend selection uses a Strategy Pattern with architecture-based dispatch:

# Abstract selection algorithm (simplified)
def select_backend(model_config, user_config):
    arch = model_config.architectures[0]
    # An explicitly passed engine config pins the backend.
    if isinstance(user_config, TurbomindEngineConfig):
        return 'turbomind'
    if isinstance(user_config, PytorchEngineConfig):
        return 'pytorch'
    # Otherwise dispatch on the architecture class name.
    if arch in TURBOMIND_SUPPORTED:
        return 'turbomind'
    return 'pytorch'  # fallback

Related Pages

Implemented By

Uses Heuristic
