Implementation:InternLM Lmdeploy TurboMind
| Knowledge Sources | |
|---|---|
| Domains | Inference Engine, Public API |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Defines the main TurboMind public API class that serves as the top-level entry point for model loading, weight management, engine lifecycle, and inference request creation.
Description
The TurboMind class is the primary facade for the TurboMind inference engine. It uses the pimpl idiom with a private Impl struct and provides a complete lifecycle management interface.
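The pimpl layout mentioned above can be sketched in isolation. This is a minimal illustration, not TurboMind's code: `Widget` and its `name` member are hypothetical stand-ins. It shows why the destructor is declared in the header and defined next to `Impl`: `std::unique_ptr<Impl>` needs the complete type at the point where it is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class for illustration only; not part of TurboMind.
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();  // declared here, defined where Impl is a complete type
    const std::string& name() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // opaque pointer to the hidden state
};

// --- the part below would normally live in the .cc file ---
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name): impl_(std::make_unique<Impl>(Impl{std::move(name)}))
{
    assert(impl_ && "Impl must be allocated");
}

Widget::~Widget() = default;

const std::string& Widget::name() const
{
    return impl_->name;
}
```

With this layout, changes to `Impl` do not force recompilation of code that only includes the header.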
Construction: Takes a model directory path, a configuration string, and an FFI context factory (a function returning std::shared_ptr<void> used for Python/FFI integration). The FFICtxFactory type alias is defined as std::function<std::shared_ptr<void>()>.
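To illustrate how a factory of this shape behaves, here is a small self-contained sketch. Only the alias matches the header; `DemoContext`, `make_demo_factory`, and `read_device_id` are hypothetical names introduced for the example. It shows that a lambda returning `std::shared_ptr<T>` converts to a `std::shared_ptr<void>` factory, and that the consumer has to know the concrete type to cast it back.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Same alias shape as TurboMind::FFICtxFactory.
using FFICtxFactory = std::function<std::shared_ptr<void>()>;

// Hypothetical context type, for illustration only.
struct DemoContext {
    int device_id;
};

// Build a factory from a lambda that returns shared_ptr<DemoContext>.
// The result converts implicitly to shared_ptr<void> when the factory is called.
inline FFICtxFactory make_demo_factory(int device_id)
{
    return [device_id]() { return std::make_shared<DemoContext>(DemoContext{device_id}); };
}

// Recover the concrete type on the consumer side.
// The cast is unchecked, so the caller must know what the factory produces.
inline int read_device_id(const std::shared_ptr<void>& ctx)
{
    assert(ctx && "context must not be null");
    return std::static_pointer_cast<DemoContext>(ctx)->device_id;
}
```

The type-erased `std::shared_ptr<void>` still runs the correct destructor for the original object, which is likely why it suits an FFI boundary where the C++ side does not need to know the context type.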
Weight management:
- CreateWeights(int index): Allocates weight tensors for the specified device/model index.
- GetWeights(int index): Returns a TensorMap containing the model weights for inspection or modification.
- ProcessWeights(int index): Performs weight preprocessing (quantization, layout transformation, etc.).
Engine lifecycle:
- CreateEngine(int index): Instantiates an Engine for the specified device index.
- Sleep(int index, int level): Puts an engine to sleep at the specified power level (for resource management).
- WakeUp(int index, const std::vector<std::string>& tags): Wakes up a sleeping engine with optional weight tags.
Inference:
- CreateRequest(): Creates and returns a new ModelRequest for submitting inference work.
- GetScheduleMetrics(int index): Returns scheduling metrics for monitoring.
Utility:
is_dummy_node(): Returns true if this is a dummy (non-participating) node in a distributed setup.
Usage
TurboMind is the main entry point for applications that use the engine. The typical flow is: construct TurboMind with a model path and config, create and process the weights for each device index, create an engine for that index, then call CreateRequest() and submit inference requests through the ModelRequest API.
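The call order in this flow can be made explicit with a small stand-in. The sketch below does not use the real TurboMind class: `FlowTracker`, `Stage`, and `bring_up` are hypothetical, and the source does not say whether the real class checks ordering or how it reacts to out-of-order calls. The stand-in only records, per device index, that weights were created, then processed, then an engine was created.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical stand-in, NOT the real TurboMind class.
enum class Stage { kNone, kWeightsCreated, kWeightsProcessed, kEngineCreated };

class FlowTracker {
public:
    void CreateWeights(int index) { advance(index, Stage::kNone, Stage::kWeightsCreated); }
    void ProcessWeights(int index) { advance(index, Stage::kWeightsCreated, Stage::kWeightsProcessed); }
    void CreateEngine(int index) { advance(index, Stage::kWeightsProcessed, Stage::kEngineCreated); }

    // True once the full sequence has completed for this index.
    bool ready(int index) const
    {
        auto it = stage_.find(index);
        return it != stage_.end() && it->second == Stage::kEngineCreated;
    }

private:
    void advance(int index, Stage expected, Stage next)
    {
        Stage& cur = stage_[index];  // value-initialized to Stage::kNone on first use
        if (cur != expected) {
            throw std::logic_error("call out of order");
        }
        cur = next;
    }

    std::map<int, Stage> stage_;
};

// Run the documented sequence for one device index.
inline void bring_up(FlowTracker& t, int index)
{
    t.CreateWeights(index);
    t.ProcessWeights(index);
    t.CreateEngine(index);
    assert(t.ready(index));
}
```

In a multi-GPU setup the same sequence would presumably be repeated for each index, which is why the state here is tracked per index rather than globally.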
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/turbomind.h
- Lines: 1-46
Signature
class TurboMind {
public:
    using FFICtxFactory = std::function<std::shared_ptr<void>()>;

    ~TurboMind();
    TurboMind(std::string model_dir, std::string config, FFICtxFactory ffi_ctx_factory);

    void CreateWeights(int index);
    TensorMap GetWeights(int index);
    void ProcessWeights(int index);

    void CreateEngine(int index);
    void Sleep(int index, int level);
    void WakeUp(int index, const std::vector<std::string>& tags);

    bool is_dummy_node() const noexcept;
    std::shared_ptr<ScheduleMetrics> GetScheduleMetrics(int index);
    std::unique_ptr<ModelRequest> CreateRequest();

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};
Import
#include "src/turbomind/turbomind.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_dir | std::string | Yes | Path to the model directory containing weights and configuration |
| config | std::string | Yes | Configuration string (JSON or similar) for model and engine parameters |
| ffi_ctx_factory | FFICtxFactory | Yes | Factory function for creating FFI/Python context objects |
| index | int | Yes (most methods) | Device/engine index for multi-GPU setups |
Outputs
| Name | Type | Description |
|---|---|---|
| GetWeights() | TensorMap | Model weight tensors for the specified index |
| CreateRequest() | std::unique_ptr<ModelRequest> | A new ModelRequest instance for submitting inference |
| GetScheduleMetrics() | std::shared_ptr<ScheduleMetrics> | Scheduling performance metrics |
| is_dummy_node() | bool | Whether this is a non-participating node |
Usage Examples
// Initialize TurboMind.
// SomeContext, config_json, input_param, and callback are placeholders
// that the application must define.
auto ctx_factory = []() { return std::make_shared<SomeContext>(); };
TurboMind tm("/path/to/model", config_json, ctx_factory);

// Load and process weights for device 0
tm.CreateWeights(0);
tm.ProcessWeights(0);

// Create and start the engine
tm.CreateEngine(0);

// Create a request and run inference
auto request = tm.CreateRequest();
auto output = request->Forward(input_param, callback);

// Monitor scheduling metrics
auto metrics = tm.GetScheduleMetrics(0);

// Power management (the level and tag values here are illustrative)
tm.Sleep(0, 1);
tm.WakeUp(0, {"default"});