Implementation:InternLM Lmdeploy LanguageModel
| Knowledge Sources | |
|---|---|
| Domains | Language Model, Inference Engine |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Defines the LanguageModel class that encapsulates the neural network model (based on the LLaMA architecture) and provides a unified interface for running batch operations during inference.
Description
The LanguageModel class is the model abstraction layer in TurboMind. It wraps the underlying LLaMA-style transformer model and exposes a simple batch-operation interface via its Run() method.
The class uses the pimpl idiom: all state lives in a private Impl struct held through a std::unique_ptr. The constructor takes the full model configuration: data type, model parameters (ModelParam), engine parameters (EngineParam), attention parameters (AttentionParam), mixture-of-experts parameters (MoeParam), an execution context, model weights (LlamaWeight), and the number of pipeline phases.
The class is move-only: it declares a noexcept move constructor and cannot be copied. An explicit operator bool() lets callers check whether an instance has been initialized, which matters because the default constructor produces an empty object. The class also exposes read-only accessors for the model and attention parameters.
The Run() method accepts a BatchOp, a phase index, and a TensorMap environment, which lets it plug into each stage of the engine's batch lifecycle (setup, prepare, forward, and so on). Internally, it coordinates the InputProcessor, the OutputProcessor, and the core transformer forward pass.
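The pimpl and move-only behaviour described above can be sketched in isolation. The following is a minimal illustration, not the TurboMind source: `Model`, its `int` constructor argument, and the `hidden_units()` accessor are hypothetical stand-ins chosen so the example compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for LanguageModel: same shape (pimpl, move-only,
// explicit operator bool), but none of the real TurboMind types.
class Model {
public:
    ~Model();
    Model() = default;  // empty handle: impl_ is null
    Model(Model&&) noexcept;
    explicit Model(int units);

    explicit operator bool() const noexcept { return static_cast<bool>(impl_); }

    int hidden_units() const noexcept;

private:
    struct Impl;  // defined below; in a real project this lives in the .cc file
    std::unique_ptr<Impl> impl_;
};

struct Model::Impl {
    int hidden_units;
};

// Defined where Impl is complete, because unique_ptr<Impl> needs the full
// type in order to destroy it.
Model::~Model() = default;
Model::Model(Model&&) noexcept = default;
Model::Model(int units): impl_(std::make_unique<Impl>(Impl{units})) {}

int Model::hidden_units() const noexcept
{
    assert(impl_ && "accessor called on an empty Model");
    return impl_->hidden_units;
}

// Declaring a move constructor suppresses the implicit copy constructor.
static_assert(!std::is_copy_constructible<Model>::value, "copying is disabled");
static_assert(std::is_nothrow_move_constructible<Model>::value, "moving is allowed");
```

After `Model b(std::move(a));` the source `a` converts to `false` and `b` owns the state, which is why a boolean check before use is worthwhile for this kind of handle.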
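One plausible way to structure such a single-entry-point method is a switch over the operation. The sketch below only illustrates that dispatch pattern; it is not how TurboMind implements Run(). `ToyModel` and `Env` (a plain map standing in for TensorMap) are hypothetical, the enum contains only the three values that appear in the usage example, and the "forward pass" is a placeholder that doubles its input.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real BatchOp and TensorMap live in TurboMind.
enum class BatchOp { kSetup, kPrepare, kForward };
using Env = std::map<std::string, std::vector<float>>;

class ToyModel {
public:
    explicit ToyModel(int phases): staged_(phases) {}

    // Route one stage of the batch lifecycle for one pipeline phase.
    void Run(BatchOp op, int phase, Env& env)
    {
        assert(0 <= phase && phase < static_cast<int>(staged_.size()));
        switch (op) {
            case BatchOp::kSetup:
                staged_[phase].clear();  // reset per-phase state
                break;
            case BatchOp::kPrepare:
                staged_[phase] = env.at("input");  // input processing: stage the inputs
                break;
            case BatchOp::kForward: {
                std::vector<float> out;
                for (float x : staged_[phase]) {
                    out.push_back(2.f * x);  // placeholder for the transformer forward pass
                }
                env["output"] = out;  // output processing: publish results into env
                break;
            }
        }
    }

private:
    std::vector<std::vector<float>> staged_;  // one staging buffer per phase
};
```

Keeping per-phase state indexed by `phase` is an assumption made for the sketch; it shows why the phase index travels with every call rather than being fixed at construction.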
Usage
A LanguageModel is created during TurboMind initialization and passed to the Engine constructor. The Engine and ModelExecutor then call Run() at each stage of the batch lifecycle. Other components that need model configuration details read them through the parameter accessors.
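The ownership hand-off can be pictured with a toy driver. Everything here is hypothetical (`ToyLanguageModel`, `ToyEngine`, `Step`, and the call log); the real Engine and ModelExecutor are considerably more involved. The sketch only shows a move-only style hand-off into a constructor followed by lifecycle calls in the order used in the usage example.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the TurboMind types.
enum class BatchOp { kSetup, kPrepare, kForward };

struct ToyLanguageModel {
    std::vector<std::string> log;  // records each call for inspection

    void Run(BatchOp op, int phase)
    {
        static const char* const names[] = {"setup", "prepare", "forward"};
        log.push_back(std::string(names[static_cast<int>(op)]) + "@" + std::to_string(phase));
    }
};

class ToyEngine {
public:
    // The engine takes ownership of the model by move.
    explicit ToyEngine(ToyLanguageModel model): model_(std::move(model)) {}

    // Drive the lifecycle stages, in order, for a single pipeline phase.
    void Step(int phase)
    {
        assert(phase >= 0);
        for (BatchOp op : {BatchOp::kSetup, BatchOp::kPrepare, BatchOp::kForward}) {
            model_.Run(op, phase);
        }
    }

    const std::vector<std::string>& log() const { return model_.log; }

private:
    ToyLanguageModel model_;
};
```

How the real engine interleaves phases and stages is not stated here, so the sketch deliberately leaves the choice of phase to the caller.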
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/models/language_model.h
- Lines: 1-46
Signature
class LanguageModel {
public:
    ~LanguageModel();

    LanguageModel() = default;
    LanguageModel(LanguageModel&&) noexcept;

    explicit operator bool() const noexcept;

    LanguageModel(DataType              dtype,
                  const ModelParam&     model,
                  const EngineParam&    engine,
                  const AttentionParam& attn,
                  const MoeParam&       moe,
                  const Context&        ctx,
                  const LlamaWeight&    weights,
                  int                   phases);

    void Run(BatchOp op, int phase, TensorMap& env);

    const ModelParam&     model_param() const noexcept;
    const AttentionParam& attn_param() const noexcept;

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};
Import
#include "src/turbomind/models/language_model.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dtype | DataType | Yes | Computation data type (e.g. float16, bfloat16) |
| model | const ModelParam& | Yes | Model architecture parameters (layers, hidden units, heads, etc.) |
| engine | const EngineParam& | Yes | Engine configuration (batch sizes, token limits) |
| attn | const AttentionParam& | Yes | Attention configuration (head dimensions, RoPE, etc.) |
| moe | const MoeParam& | Yes | Mixture-of-experts configuration |
| ctx | const Context& | Yes | Execution context with CUDA resources |
| weights | const LlamaWeight& | Yes | Pre-loaded model weights |
| phases | int | Yes | Number of pipeline phases |
| op (Run) | BatchOp | Yes | Batch operation to execute |
| phase (Run) | int | Yes | Pipeline phase index |
| env (Run) | TensorMap& | Yes | Environment tensor map |
Outputs
| Name | Type | Description |
|---|---|---|
| model_param() | const ModelParam& | Model architecture parameters |
| attn_param() | const AttentionParam& | Attention configuration parameters |
| env (modified via Run) | TensorMap& | Modified environment with model outputs (hidden states, logits, etc.) |
Usage Examples
// Construct the language model
LanguageModel model(dtype, model_param, engine_param, attn_param, moe_param, ctx, weights, phases);

// Check initialization before touching the model
if (model) {
    // Run batch operations for the given pipeline phase
    model.Run(BatchOp::kSetup, phase, env);
    model.Run(BatchOp::kPrepare, phase, env);
    model.Run(BatchOp::kForward, phase, env);

    // Access model parameters (only meaningful on an initialized model)
    const auto& mp = model.model_param();
    auto hidden_units = mp.hidden_units;
}