Implementation:InternLM Lmdeploy TurboMind

Knowledge Sources	InternLM_Lmdeploy
Domains	Inference Engine, Public API
Last Updated	2026-02-07 15:00 GMT

Overview

Defines the main TurboMind public API class that serves as the top-level entry point for model loading, weight management, engine lifecycle, and inference request creation.

Description

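The TurboMind class is the primary facade for the TurboMind inference engine. It uses the pimpl idiom: the header declares only a private Impl struct, held through std::unique_ptr<Impl>, so implementation details stay out of the public interface. Its public methods cover weight management, engine lifecycle, and request creation.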
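
As a rough illustration of the pimpl idiom mentioned above, the following is a minimal sketch and not the TurboMind source. The names Facade, Impl::model_dir, and model_dir() are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical facade class; it only mirrors the shape of a pimpl-based API.
class Facade {
public:
    explicit Facade(std::string model_dir);
    ~Facade();  // declared here, defined below once Impl is a complete type

    const std::string& model_dir() const noexcept;

private:
    struct Impl;                  // forward declaration: layout is hidden from users
    std::unique_ptr<Impl> impl_;  // owning pointer to the hidden implementation
};

// Everything below would normally live in the .cc file.
struct Facade::Impl {
    std::string model_dir;
};

Facade::Facade(std::string model_dir):
    impl_(std::make_unique<Impl>(Impl{std::move(model_dir)}))
{
}

// Defaulted out of line so std::unique_ptr<Impl> sees the complete Impl type.
Facade::~Facade() = default;

const std::string& Facade::model_dir() const noexcept
{
    assert(impl_ != nullptr);
    return impl_->model_dir;
}
```

The out-of-line destructor is the part that is easy to miss: std::unique_ptr needs the complete Impl type at the point where the destructor is defined, which is why the class declares ~Facade() in the header rather than relying on an implicit one.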

Construction: Takes a model directory path, a configuration string, and an FFI context factory (a function returning std::shared_ptr<void> used for Python/FFI integration). The FFICtxFactory type alias is defined as std::function<std::shared_ptr<void>()>.
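
A factory matching the FFICtxFactory alias could be sketched as follows. The alias is copied from the signature; DemoContext, MakeDemoFactory, and DeviceIdOf are hypothetical names invented for this example, and what the engine actually does with the returned context is not described here.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Same alias as in the TurboMind class signature.
using FFICtxFactory = std::function<std::shared_ptr<void>()>;

// Hypothetical context type standing in for whatever the FFI layer needs.
struct DemoContext {
    int device_id = 0;
};

// Builds a factory that produces a fresh type-erased context on every call.
FFICtxFactory MakeDemoFactory(int device_id)
{
    return [device_id]() -> std::shared_ptr<void> {
        auto ctx       = std::make_shared<DemoContext>();
        ctx->device_id = device_id;
        // shared_ptr<DemoContext> converts to shared_ptr<void>; the control
        // block still remembers how to destroy the DemoContext.
        return ctx;
    };
}

// Recovers the concrete type; only valid if the caller knows it is a DemoContext.
int DeviceIdOf(const std::shared_ptr<void>& ctx)
{
    assert(ctx != nullptr);
    return std::static_pointer_cast<DemoContext>(ctx)->device_id;
}
```

Because the return type is std::shared_ptr<void>, the caller and the engine do not need to share a header for the context type; the cost is that any code reading the context back must already know its concrete type.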

Weight management:

CreateWeights(int index): Allocates weight tensors for the specified device/model index.
GetWeights(int index): Returns a TensorMap containing the model weights for inspection or modification.
ProcessWeights(int index): Performs weight preprocessing (quantization, layout transformation, etc.).

Engine lifecycle:

CreateEngine(int index): Instantiates an Engine for the specified device index.
Sleep(int index, int level): Puts the engine at the specified index to sleep at the given sleep level (for resource management).
WakeUp(int index, const std::vector<std::string>& tags): Wakes up a sleeping engine with optional weight tags.

Inference:

CreateRequest(): Creates and returns a new ModelRequest for submitting inference work.
GetScheduleMetrics(int index): Returns scheduling metrics for monitoring.

Utility:

is_dummy_node(): Returns true if this is a dummy (non-participating) node in a distributed setup.

Usage

TurboMind is the main entry point for applications that use the engine. The typical flow is: construct TurboMind with a model path and config, create and process weights, create an engine, then call CreateRequest() to obtain a ModelRequest for submitting inference requests.
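
For a multi-GPU setup, the setup steps of that flow might be wrapped in a small helper like the sketch below. InitDevices and RecordingEngine are hypothetical names; the helper is templated on the engine type so it compiles without the TurboMind header, and it assumes the three calls are simply issued in order for each index. Whether real deployments populate tensors through GetWeights between steps, or run devices concurrently, is not stated in this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Runs the documented setup order for each device index in [0, device_count).
// Engine is any type exposing CreateWeights/ProcessWeights/CreateEngine(int).
template<class Engine>
void InitDevices(Engine& tm, int device_count)
{
    assert(device_count >= 0);
    for (int index = 0; index < device_count; ++index) {
        tm.CreateWeights(index);   // allocate weight tensors
        tm.ProcessWeights(index);  // preprocess the weights
        tm.CreateEngine(index);    // instantiate the engine for this index
    }
}

// Hypothetical stand-in that only records which calls were made, in order.
struct RecordingEngine {
    std::vector<std::string> calls;

    void CreateWeights(int index) { calls.push_back("CreateWeights(" + std::to_string(index) + ")"); }
    void ProcessWeights(int index) { calls.push_back("ProcessWeights(" + std::to_string(index) + ")"); }
    void CreateEngine(int index) { calls.push_back("CreateEngine(" + std::to_string(index) + ")"); }
};
```

With the real class, the same helper would be called as InitDevices(tm, n) on a TurboMind instance, since it only relies on the three member functions listed in the signature.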

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/turbomind.h
Lines: 1-46

Signature

class TurboMind {
public:
    using FFICtxFactory = std::function<std::shared_ptr<void>()>;

    ~TurboMind();

    TurboMind(std::string model_dir, std::string config, FFICtxFactory ffi_ctx_factory);

    void CreateWeights(int index);

    TensorMap GetWeights(int index);

    void ProcessWeights(int index);

    void CreateEngine(int index);

    void Sleep(int index, int level);

    void WakeUp(int index, const std::vector<std::string>& tags);

    bool is_dummy_node() const noexcept;

    std::shared_ptr<ScheduleMetrics> GetScheduleMetrics(int index);

    std::unique_ptr<ModelRequest> CreateRequest();

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

Import

#include "src/turbomind/turbomind.h"

I/O Contract

Inputs

Name	Type	Required	Description
model_dir	std::string	Yes	Path to the model directory containing weights and configuration
config	std::string	Yes	Configuration string (JSON or similar) for model and engine parameters
ffi_ctx_factory	FFICtxFactory	Yes	Factory function for creating FFI/Python context objects
index	int	Yes (most methods)	Device/engine index for multi-GPU setups

Outputs

Name	Type	Description
GetWeights()	TensorMap	Model weight tensors for the specified index
CreateRequest()	std::unique_ptr<ModelRequest>	A new ModelRequest instance for submitting inference
GetScheduleMetrics()	std::shared_ptr<ScheduleMetrics>	Scheduling performance metrics
is_dummy_node()	bool	Whether this is a non-participating node

Usage Examples

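// Initialize TurboMind.
// SomeContext and config_json are placeholders: substitute the application's
// FFI context type and its configuration string.
auto ctx_factory = []() { return std::make_shared<SomeContext>(); };
TurboMind tm("/path/to/model", config_json, ctx_factory);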

// Load and process weights for device 0
tm.CreateWeights(0);
tm.ProcessWeights(0);

// Create and start the engine
tm.CreateEngine(0);

// Create a request and run inference.
// input_param and callback are placeholders for arguments defined by the ModelRequest API.
auto request = tm.CreateRequest();
auto output = request->Forward(input_param, callback);

// Monitor scheduling metrics
auto metrics = tm.GetScheduleMetrics(0);

// Power management
tm.Sleep(0, 1);
tm.WakeUp(0, {"default"});

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
