Implementation:InternLM Lmdeploy LanguageModel

Knowledge Sources	InternLM_Lmdeploy
Domains	Language Model, Inference Engine
Last Updated	2026-02-07 15:00 GMT

Overview

Defines the LanguageModel class that encapsulates the neural network model (based on the LLaMA architecture) and provides a unified interface for running batch operations during inference.

Description

The LanguageModel class is the model abstraction layer in TurboMind. It wraps the underlying LLaMA-style transformer model and exposes a simple batch-operation interface via its Run() method.

The class uses the pimpl idiom with a private Impl struct. It is constructed with comprehensive model configuration: data type, model parameters (ModelParam), engine parameters (EngineParam), attention parameters (AttentionParam), mixture-of-experts parameters (MoeParam), an execution context, model weights (LlamaWeight), and the number of pipeline phases.

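The class is move-only: it declares a move constructor, which makes it non-copyable. It provides an explicit operator bool() so callers can check whether an instance has been initialized (a default-constructed instance has not). It also exposes read-only accessors for the model and attention parameters.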
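The combination of pimpl, move-only semantics, and explicit operator bool() can be illustrated with a minimal, self-contained sketch. ModelHandle and its layers field are hypothetical stand-ins invented for this example; they are not TurboMind types, and the real Impl holds far more state.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a pimpl-based, move-only handle.
// ModelHandle and `layers` are illustrative names, not TurboMind's.
class ModelHandle {
public:
    ModelHandle() = default;               // empty handle, converts to false
    explicit ModelHandle(int layers);      // initialized handle
    ~ModelHandle();                        // defined below, where Impl is complete
    ModelHandle(ModelHandle&&) noexcept;   // transfers ownership of Impl

    // True only when an Impl has been allocated.
    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

    int layers() const noexcept;

private:
    struct Impl;                           // opaque to users of the header
    std::unique_ptr<Impl> impl_;
};

// In a real project, everything below would live in the .cc file.
struct ModelHandle::Impl {
    int layers;
};

ModelHandle::ModelHandle(int layers): impl_(std::make_unique<Impl>(Impl{layers})) {}
ModelHandle::~ModelHandle() = default;
ModelHandle::ModelHandle(ModelHandle&&) noexcept = default;

int ModelHandle::layers() const noexcept
{
    assert(impl_ && "accessor called on an empty handle");
    return impl_->layers;
}

// Declaring the move constructor suppresses the implicit copy operations.
static_assert(!std::is_copy_constructible<ModelHandle>::value, "handle must not be copyable");
```

Under these assumptions, a default-constructed or moved-from handle converts to false, while a handle built with arguments converts to true until it is moved from.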

The Run() method takes a BatchOp, a phase index, and a TensorMap environment, so the engine can drive every stage of the batch lifecycle (setup, prepare, forward, etc.) through a single entry point. Internally, it coordinates the InputProcessor, OutputProcessor, and the core transformer forward pass.
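A single-entry-point design like this is commonly implemented as a switch over the operation, with per-phase state kept inside the object. The sketch below is a toy illustration only: ToyModel, Env, the "input"/"output" keys, and the doubling "forward pass" are all assumptions made for the example, and the local BatchOp enum lists just the three operations named in this page's usage example, not the real enum.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: the three operations named in this page's usage example.
enum class BatchOp { kSetup, kPrepare, kForward };

// Hypothetical stand-in for TensorMap: named buffers of floats.
using Env = std::map<std::string, std::vector<float>>;

class ToyModel {
public:
    explicit ToyModel(int phases): staged_(phases) {}

    // One entry point; the operation selects which lifecycle stage runs.
    void Run(BatchOp op, int phase, Env& env)
    {
        assert(phase >= 0 && phase < static_cast<int>(staged_.size()));
        std::vector<float>& staged = staged_[phase];
        switch (op) {
            case BatchOp::kSetup:
                staged.clear();            // reset per-phase state
                break;
            case BatchOp::kPrepare:
                staged = env.at("input");  // throws std::out_of_range if missing
                break;
            case BatchOp::kForward: {
                std::vector<float> out;
                out.reserve(staged.size());
                for (float x : staged) {
                    out.push_back(2.0f * x);  // placeholder for the real forward pass
                }
                env["output"] = std::move(out);  // results are written back into env
                break;
            }
        }
    }

private:
    std::vector<std::vector<float>> staged_;  // one staging buffer per phase
};
```

In this toy version, each phase owns its own staging buffer, so two phases can be at different lifecycle stages without interfering; whether the real implementation isolates phases this way is not stated in the source.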

Usage

A LanguageModel is created during TurboMind initialization and passed to the Engine constructor. The Engine and ModelExecutor then call Run() at each stage of the batch lifecycle. Other components that need model configuration details read them through the parameter accessors.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/models/language_model.h
Lines: 1-46

Signature

class LanguageModel {
public:
    ~LanguageModel();
    LanguageModel() = default;
    LanguageModel(LanguageModel&&) noexcept;

    explicit operator bool() const noexcept;

    LanguageModel(DataType              dtype,
                  const ModelParam&     model,
                  const EngineParam&    engine,
                  const AttentionParam& attn,
                  const MoeParam&       moe,
                  const Context&        ctx,
                  const LlamaWeight&    weights,
                  int                   phases);

    void Run(BatchOp op, int phase, TensorMap& env);

    const ModelParam&     model_param() const noexcept;
    const AttentionParam& attn_param() const noexcept;

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

Import

#include "src/turbomind/models/language_model.h"

I/O Contract

Inputs

Name	Type	Required	Description
dtype	DataType	Yes	Computation data type (float16, bfloat16)
model	const ModelParam&	Yes	Model architecture parameters (layers, hidden units, heads, etc.)
engine	const EngineParam&	Yes	Engine configuration (batch sizes, token limits)
attn	const AttentionParam&	Yes	Attention configuration (head dimensions, RoPE, etc.)
moe	const MoeParam&	Yes	Mixture-of-experts configuration
ctx	const Context&	Yes	Execution context with CUDA resources
weights	const LlamaWeight&	Yes	Pre-loaded model weights
phases	int	Yes	Number of pipeline phases
op (Run)	BatchOp	Yes	Batch operation to execute
phase (Run)	int	Yes	Pipeline phase index
env (Run)	TensorMap&	Yes	Environment tensor map

Outputs

Name	Type	Description
model_param()	const ModelParam&	Model architecture parameters
attn_param()	const AttentionParam&	Attention configuration parameters
env (modified via Run)	TensorMap&	Modified environment with model outputs (hidden states, logits, etc.)

Usage Examples

// Construct the language model
LanguageModel model(dtype, model_param, engine_param, attn_param, moe_param, ctx, weights, phases);

// Check initialization
if (model) {
    // Run batch operations
    model.Run(BatchOp::kSetup, phase, env);
    model.Run(BatchOp::kPrepare, phase, env);
    model.Run(BatchOp::kForward, phase, env);
}

// Access model parameters (auto avoids assuming the field's exact integer type)
const auto& mp = model.model_param();
auto hidden_units = mp.hidden_units;

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
