Implementation:InternLM Lmdeploy LanguageModel

From Leeroopedia


Knowledge Sources
Domains: Language Model, Inference Engine
Last Updated: 2026-02-07 15:00 GMT

Overview

Defines the LanguageModel class that encapsulates the neural network model (based on the LLaMA architecture) and provides a unified interface for running batch operations during inference.

Description

The LanguageModel class is the model abstraction layer in TurboMind. It wraps the underlying LLaMA-style transformer model and exposes a simple batch-operation interface via its Run() method.

The class uses the pimpl idiom with a private Impl struct. It is constructed with comprehensive model configuration: data type, model parameters (ModelParam), engine parameters (EngineParam), attention parameters (AttentionParam), mixture-of-experts parameters (MoeParam), an execution context, model weights (LlamaWeight), and the number of pipeline phases.

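The class is movable but not copyable: it declares a move constructor, which suppresses the implicit copy operations. It provides an explicit operator bool() for checking whether an instance has been initialized, and it exposes read-only accessors for the model and attention parameters.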
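The pimpl layout, the move-only semantics, and the explicit bool conversion described above can be sketched in a single file. This is a minimal illustration, not the TurboMind code: ToyModel and ToyParam are hypothetical stand-ins, and the assumption that an empty handle converts to false is ours rather than something the header confirms.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the real configuration structs (ModelParam etc.).
struct ToyParam {
    int hidden_units = 0;
};

class ToyModel {
public:
    ~ToyModel();                    // defined below, once Impl is a complete type
    ToyModel() = default;           // empty handle with no implementation attached
    ToyModel(ToyModel&&) noexcept;  // user-declared move ctor: copy operations are implicitly deleted

    explicit ToyModel(const ToyParam& param);

    // Assumption: "initialized" means an Impl object is attached.
    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

    // Only meaningful on an initialized instance.
    const ToyParam& param() const noexcept;

private:
    struct Impl;                    // the definition is hidden from users of the class
    std::unique_ptr<Impl> impl_;
};

// In a real project everything below would live in the .cc file.
struct ToyModel::Impl {
    ToyParam param;
};

ToyModel::~ToyModel() = default;
ToyModel::ToyModel(ToyModel&&) noexcept = default;
ToyModel::ToyModel(const ToyParam& param): impl_(std::make_unique<Impl>(Impl{param})) {}

const ToyParam& ToyModel::param() const noexcept { return impl_->param; }

// Compile-time checks of the copy/move properties discussed in the text.
static_assert(!std::is_copy_constructible<ToyModel>::value, "copying is disabled");
static_assert(std::is_move_constructible<ToyModel>::value, "moving is allowed");
```

Defining the destructor and move constructor out of line, after Impl is complete, is what lets std::unique_ptr hold an incomplete type in the header. In this sketch a moved-from ToyModel converts to false because the moved-from std::unique_ptr is null.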

The Run() method accepts a BatchOp, a phase index, and a TensorMap environment, so the engine can call it once per stage of its batch lifecycle (setup, prepare, forward, etc.). Internally, it coordinates the InputProcessor, the OutputProcessor, and the core transformer forward pass.
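One way to picture a single entry point that dispatches on the batch operation is the sketch below. Everything in it is hypothetical: ToyBatchOp, ToyEnv, and ToyLanguageModel stand in for BatchOp, TensorMap, and LanguageModel, the "input" and "output" keys are invented, and the work done in each case (clearing a buffer, copying input, doubling values) is a placeholder for whatever the real input processing, forward pass, and output processing do.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for BatchOp and TensorMap.
enum class ToyBatchOp { kSetup, kPrepare, kForward };
using ToyEnv = std::map<std::string, std::vector<float>>;

class ToyLanguageModel {
public:
    // One scratch buffer per pipeline phase (an assumption of this sketch).
    explicit ToyLanguageModel(int phases): buffers_(phases) {}

    // Single entry point: the operation selects the stage, the phase selects
    // the per-phase state, and env carries inputs in and outputs out.
    void Run(ToyBatchOp op, int phase, ToyEnv& env)
    {
        std::vector<float>& buf = buffers_.at(phase);  // throws on a bad phase index
        switch (op) {
            case ToyBatchOp::kSetup:
                buf.clear();  // reset the state owned by this phase
                break;
            case ToyBatchOp::kPrepare:
                buf = env.at("input");  // placeholder for input processing
                break;
            case ToyBatchOp::kForward: {
                std::vector<float> out;
                for (float x : buf) {
                    out.push_back(2.0f * x);  // placeholder for the forward pass
                }
                env["output"] = out;  // placeholder for output processing
                break;
            }
        }
    }

private:
    std::vector<std::vector<float>> buffers_;
};
```

Keeping the state indexed by phase means two phases can be at different lifecycle stages without overwriting each other's buffers, which is the property the phase argument is meant to illustrate here.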

Usage

A LanguageModel instance is created during TurboMind initialization and passed to the Engine constructor. The Engine and ModelExecutor then call Run() at each stage of the batch lifecycle. Other components that need model configuration details read them through the parameter accessors.
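The hand-off described above, where the model is built first and then given to the component that drives it, might look roughly like the following. RecordingModel and ToyEngine are hypothetical, the transfer by move is an assumption (the page only says the model is passed to the Engine constructor), and the order in which Step() visits phases and operations is illustrative rather than the real scheduling.

```cpp
#include <string>
#include <utility>
#include <vector>

enum class ToyOp { kSetup, kPrepare, kForward };  // hypothetical stand-in for BatchOp

// Hypothetical move-only model that only records how it was called.
class RecordingModel {
public:
    RecordingModel() = default;
    explicit RecordingModel(int phases): phases_(phases) {}
    RecordingModel(RecordingModel&& other) noexcept:
        phases_(std::exchange(other.phases_, 0)), log_(std::move(other.log_))
    {
    }
    RecordingModel(const RecordingModel&) = delete;

    // In this sketch a model is "initialized" when it has at least one phase.
    explicit operator bool() const noexcept { return phases_ > 0; }
    int phases() const noexcept { return phases_; }

    void Run(ToyOp op, int phase) { log_.push_back(std::to_string(phase) + ":" + Name(op)); }
    const std::vector<std::string>& log() const noexcept { return log_; }

private:
    static const char* Name(ToyOp op)
    {
        switch (op) {
            case ToyOp::kSetup: return "setup";
            case ToyOp::kPrepare: return "prepare";
            case ToyOp::kForward: return "forward";
        }
        return "unknown";
    }

    int phases_ = 0;
    std::vector<std::string> log_;
};

// Hypothetical engine that takes ownership of the model and drives it.
class ToyEngine {
public:
    explicit ToyEngine(RecordingModel model): model_(std::move(model)) {}

    // Runs every lifecycle stage once for each phase, in order.
    void Step()
    {
        for (int phase = 0; phase < model_.phases(); ++phase) {
            model_.Run(ToyOp::kSetup, phase);
            model_.Run(ToyOp::kPrepare, phase);
            model_.Run(ToyOp::kForward, phase);
        }
    }

    const RecordingModel& model() const noexcept { return model_; }

private:
    RecordingModel model_;
};
```

A caller would write `ToyEngine engine(std::move(model));` and afterwards treat the original `model` variable as empty, which is what the bool conversion makes checkable.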

Code Reference

Source Location

Signature

class LanguageModel {
public:
    ~LanguageModel();
    LanguageModel() = default;
    LanguageModel(LanguageModel&&) noexcept;

    explicit operator bool() const noexcept;

    LanguageModel(DataType              dtype,
                  const ModelParam&     model,
                  const EngineParam&    engine,
                  const AttentionParam& attn,
                  const MoeParam&       moe,
                  const Context&        ctx,
                  const LlamaWeight&    weights,
                  int                   phases);

    void Run(BatchOp op, int phase, TensorMap& env);

    const ModelParam&     model_param() const noexcept;
    const AttentionParam& attn_param() const noexcept;

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

Import

#include "src/turbomind/models/language_model.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| dtype | DataType | Yes | Computation data type (float16, bfloat16) |
| model | const ModelParam& | Yes | Model architecture parameters (layers, hidden units, heads, etc.) |
| engine | const EngineParam& | Yes | Engine configuration (batch sizes, token limits) |
| attn | const AttentionParam& | Yes | Attention configuration (head dimensions, RoPE, etc.) |
| moe | const MoeParam& | Yes | Mixture-of-experts configuration |
| ctx | const Context& | Yes | Execution context with CUDA resources |
| weights | const LlamaWeight& | Yes | Pre-loaded model weights |
| phases | int | Yes | Number of pipeline phases |
| op (Run) | BatchOp | Yes | Batch operation to execute |
| phase (Run) | int | Yes | Pipeline phase index |
| env (Run) | TensorMap& | Yes | Environment tensor map |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| model_param() | const ModelParam& | Model architecture parameters |
| attn_param() | const AttentionParam& | Attention configuration parameters |
| env (modified via Run) | TensorMap& | Modified environment with model outputs (hidden states, logits, etc.) |

Usage Examples

// Construct the language model
LanguageModel model(dtype, model_param, engine_param, attn_param, moe_param, ctx, weights, phases);

// Check initialization
if (model) {
    // Run batch operations
    model.Run(BatchOp::kSetup, phase, env);
    model.Run(BatchOp::kPrepare, phase, env);
    model.Run(BatchOp::kForward, phase, env);

    // Access model parameters (inside the check, so the model is known to be initialized)
    const auto& mp = model.model_param();
    int hidden_units = mp.hidden_units;
}
