

Implementation:InternLM Lmdeploy TurboMind

From Leeroopedia
Revision as of 15:16, 16 February 2026 by Admin (auto-imported from implementations/InternLM_Lmdeploy_TurboMind.md)


Knowledge Sources
Domains: Inference Engine, Public API
Last Updated: 2026-02-07 15:00 GMT

Overview

Defines the main TurboMind public API class that serves as the top-level entry point for model loading, weight management, engine lifecycle, and inference request creation.

Description

The TurboMind class is the primary facade for the TurboMind inference engine. It uses the pimpl idiom (a private Impl struct held through std::unique_ptr<Impl> impl_) and exposes methods covering construction, weight management, engine lifecycle, and inference request creation.
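The pimpl layout in the signature (struct Impl; std::unique_ptr<Impl> impl_;) follows a standard C++ pattern. The sketch below shows that pattern with a hypothetical Facade class; it is not TurboMind's actual source. The header only declares Impl, so the destructor must be defined out of line where Impl is complete, which is the usual reason such a class declares its destructor explicitly, as ~TurboMind() does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- What a public header would contain (hypothetical "facade.h") ----
class Facade {
public:
    explicit Facade(std::string name);
    ~Facade();  // declared here, defined below where Impl is complete

    std::string name() const;

private:
    struct Impl;                  // incomplete type in the header
    std::unique_ptr<Impl> impl_;  // owning pointer to the hidden state
};

// ---- What the implementation file would contain (hypothetical "facade.cc") ----
struct Facade::Impl {
    std::string name;  // private state, invisible to users of the header
};

Facade::Facade(std::string name)
    : impl_(std::make_unique<Impl>(Impl{std::move(name)})) {}

// unique_ptr<Impl> can delete Impl here because Impl is now a complete type
Facade::~Facade() = default;

std::string Facade::name() const { return impl_->name; }
```

Changing Impl's members then only requires recompiling the implementation file, not every user of the header.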

Construction: Takes a model directory path, a configuration string, and an FFI context factory (a function returning std::shared_ptr<void> used for Python/FFI integration). The FFICtxFactory type alias is defined as std::function<std::shared_ptr<void>()>.
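Because FFICtxFactory returns std::shared_ptr<void>, the factory can build any concrete context type while TurboMind only sees an opaque pointer. The sketch below illustrates that type erasure: a shared_ptr<void> obtained from make_shared still runs the concrete type's destructor. DemoContext and MakeFactory are hypothetical; what the real Python/FFI binding stores in its context is not described on this page.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Same alias as in the TurboMind signature
using FFICtxFactory = std::function<std::shared_ptr<void>()>;

// Hypothetical context type; counts live instances to show destruction
struct DemoContext {
    static inline int live = 0;  // number of DemoContext objects alive (C++17)
    std::string owner;

    explicit DemoContext(std::string o) : owner(std::move(o)) { ++live; }
    ~DemoContext() { --live; }
    DemoContext(const DemoContext&) = delete;
    DemoContext& operator=(const DemoContext&) = delete;
};

// Builds a factory; the lambda returns shared_ptr<void>, but the control block
// created by make_shared remembers how to destroy a DemoContext.
FFICtxFactory MakeFactory(std::string owner) {
    return [owner = std::move(owner)]() -> std::shared_ptr<void> {
        return std::make_shared<DemoContext>(owner);
    };
}
```

A caller would pass MakeFactory("...") as the third constructor argument; each invocation produces a fresh, independently owned context.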

Weight management:

  • CreateWeights(int index): Allocates weight tensors for the specified device/model index.
  • GetWeights(int index): Returns a TensorMap containing the model weights for inspection or modification.
  • ProcessWeights(int index): Performs weight preprocessing (quantization, layout transformation, etc.).
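The three weight calls are per index and order-dependent: weights must exist before they can be read or processed. The toy mock below models that contract. Tensor, TensorMap and MockWeights are hypothetical stand-ins, not turbomind types, and the doubling step in ProcessWeights is only a placeholder for real preprocessing such as quantization or layout changes. The mock also illustrates an assumption worth checking against the real code: if tensors are handles to shared storage, a TensorMap returned by value still lets the caller modify the model's weights.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tensor: a cheap handle to shared storage
struct Tensor {
    std::shared_ptr<std::vector<float>> data;
};
using TensorMap = std::map<std::string, Tensor>;

// Toy model of the per-index lifecycle: create -> fill -> process
class MockWeights {
public:
    void CreateWeights(int index) {
        Slot& slot = slots_[index];
        slot.map["w"] = Tensor{std::make_shared<std::vector<float>>(4, 0.f)};
        slot.processed = false;
    }

    // Returned by value, but the handles alias the model's buffers
    TensorMap GetWeights(int index) { return at(index).map; }

    void ProcessWeights(int index) {
        Slot& slot = at(index);
        for (auto& kv : slot.map)
            for (float& x : *kv.second.data) x *= 2.f;  // placeholder transform
        slot.processed = true;
    }

    bool processed(int index) { return at(index).processed; }

private:
    struct Slot {
        TensorMap map;
        bool processed = false;
    };

    // Rejects calls for an index whose weights were never created
    Slot& at(int index) {
        auto it = slots_.find(index);
        if (it == slots_.end()) throw std::logic_error("CreateWeights not called");
        return it->second;
    }

    std::map<int, Slot> slots_;
};
```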

Engine lifecycle:

  • CreateEngine(int index): Instantiates an Engine for the specified device index.
  • Sleep(int index, int level): Puts the engine at the given index to sleep at the specified sleep level, releasing resources while it is idle.
  • WakeUp(int index, const std::vector<std::string>& tags): Wakes up the sleeping engine at the given index; the tags optionally name the weights to restore.
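Because Sleep and WakeUp are separate calls, code that puts an engine to sleep must remember to wake it again, including on error paths. One way to enforce the pairing is an RAII guard. SleepGuard and MockEngine below are hypothetical helpers written only against the Sleep/WakeUp signatures on this page; the level and tag values are placeholders.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: Sleep on construction, WakeUp on destruction.
// Works with any type exposing Sleep(int, int) and
// WakeUp(int, const std::vector<std::string>&).
template <class Engine>
class SleepGuard {
public:
    SleepGuard(Engine& e, int index, int level, std::vector<std::string> tags)
        : e_(e), index_(index), tags_(std::move(tags)) {
        e_.Sleep(index_, level);  // if this throws, no WakeUp is attempted
    }
    ~SleepGuard() { e_.WakeUp(index_, tags_); }

    SleepGuard(const SleepGuard&) = delete;
    SleepGuard& operator=(const SleepGuard&) = delete;

private:
    Engine& e_;
    int index_;
    std::vector<std::string> tags_;
};

// Mock that records calls, used only to exercise the guard
struct MockEngine {
    std::vector<std::string> log;
    void Sleep(int index, int level) {
        log.push_back("sleep " + std::to_string(index) + " " + std::to_string(level));
    }
    void WakeUp(int index, const std::vector<std::string>& tags) {
        log.push_back("wake " + std::to_string(index) + " " + std::to_string(tags.size()));
    }
};
```

WakeUp runs inside a destructor, which is implicitly noexcept, so if the real WakeUp can throw, an exception there would call std::terminate; a production version might catch and log it instead.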

Inference:

  • CreateRequest(): Creates and returns a new ModelRequest for submitting inference work.
  • GetScheduleMetrics(int index): Returns a std::shared_ptr<ScheduleMetrics> holding the scheduling metrics of the engine at the given index, for monitoring.
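GetScheduleMetrics returns std::shared_ptr, so the caller shares ownership of the metrics object instead of borrowing it, and the object stays valid for as long as the caller holds it. The sketch below gathers the metrics of several engine indices into one vector. CollectMetrics and MetricsMock are hypothetical, and the active_requests field is invented for the test because the members of ScheduleMetrics are not listed here; whether the real object is a snapshot or is updated live is likewise not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: collect the metrics pointer of every engine index
template <class TM>
auto CollectMetrics(TM& tm, int device_count) {
    using Ptr = decltype(tm.GetScheduleMetrics(0));  // std::shared_ptr<...>
    std::vector<Ptr> out;
    out.reserve(static_cast<std::size_t>(device_count));
    for (int i = 0; i < device_count; ++i) out.push_back(tm.GetScheduleMetrics(i));
    return out;
}

// Mock with an invented metrics field, used only to exercise the helper
struct MetricsMock {
    struct ScheduleMetrics {
        int active_requests = 0;  // hypothetical field
    };
    std::vector<std::shared_ptr<ScheduleMetrics>> per_device;

    std::shared_ptr<ScheduleMetrics> GetScheduleMetrics(int i) { return per_device.at(i); }
};
```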

Utility:

  • is_dummy_node(): Returns true if this is a dummy (non-participating) node in a distributed setup.

Usage

TurboMind is the main entry point for applications that use the engine. A typical flow is: construct TurboMind with the model path and config; create the weights for each device index (filling them through GetWeights() as needed) and process them; create an engine; then call CreateRequest() and submit inference requests through the ModelRequest API.
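In a multi-GPU setup the same flow repeats for each device index. BringUp below is a hypothetical helper written against the method names on this page and exercised with a call-recording mock. It finishes the weights on every device before creating any engine, which is a choice made for this sketch rather than a documented requirement, and the weight-loading step is left as a comment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: run the setup sequence on every device index,
// then create one request. TM is the real class or a test mock.
template <class TM>
auto BringUp(TM& tm, int device_count) {
    for (int i = 0; i < device_count; ++i) {
        tm.CreateWeights(i);
        // ... fill tm.GetWeights(i) with weight data here (loader not shown) ...
        tm.ProcessWeights(i);
    }
    for (int i = 0; i < device_count; ++i) {
        tm.CreateEngine(i);
    }
    return tm.CreateRequest();
}

// Mock that records the order of calls
struct SetupMock {
    struct Request {};
    std::vector<std::string> calls;

    void CreateWeights(int i) { calls.push_back("CW" + std::to_string(i)); }
    void ProcessWeights(int i) { calls.push_back("PW" + std::to_string(i)); }
    void CreateEngine(int i) { calls.push_back("CE" + std::to_string(i)); }
    std::unique_ptr<Request> CreateRequest() {
        calls.push_back("CR");
        return std::make_unique<Request>();
    }
};
```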

Code Reference

Source Location

Signature

class TurboMind {
public:
    using FFICtxFactory = std::function<std::shared_ptr<void>()>;

    ~TurboMind();

    TurboMind(std::string model_dir, std::string config, FFICtxFactory ffi_ctx_factory);

    void CreateWeights(int index);

    TensorMap GetWeights(int index);

    void ProcessWeights(int index);

    void CreateEngine(int index);

    void Sleep(int index, int level);

    void WakeUp(int index, const std::vector<std::string>& tags);

    bool is_dummy_node() const noexcept;

    std::shared_ptr<ScheduleMetrics> GetScheduleMetrics(int index);

    std::unique_ptr<ModelRequest> CreateRequest();

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

Import

#include "src/turbomind/turbomind.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| model_dir | std::string | Yes | Path to the model directory containing weights and configuration |
| config | std::string | Yes | Configuration string (JSON or similar) for model and engine parameters |
| ffi_ctx_factory | FFICtxFactory | Yes | Factory function for creating FFI/Python context objects |
| index | int | Yes, for every method except the constructor, CreateRequest(), and is_dummy_node() | Device/engine index for multi-GPU setups |

Outputs

| Method | Return type | Description |
|---|---|---|
| GetWeights(index) | TensorMap | Model weight tensors for the specified index |
| CreateRequest() | std::unique_ptr<ModelRequest> | A new ModelRequest instance for submitting inference |
| GetScheduleMetrics(index) | std::shared_ptr<ScheduleMetrics> | Scheduling performance metrics |
| is_dummy_node() | bool | Whether this is a non-participating node |

Usage Examples

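// Assumed: the turbomind namespace; SomeContext, input_param and callback are
// placeholders that the caller must define.
#include <memory>
#include <string>
#include "src/turbomind/turbomind.h"

using namespace turbomind;

void RunExample(const std::string& config_json) {
    // Initialize TurboMind; the factory hands back its context as std::shared_ptr<void>
    auto ctx_factory = []() -> std::shared_ptr<void> { return std::make_shared<SomeContext>(); };
    TurboMind tm("/path/to/model", config_json, ctx_factory);

    // Allocate weight tensors for device 0, fill them, then preprocess them
    tm.CreateWeights(0);
    TensorMap weights = tm.GetWeights(0);  // fill these tensors with weight data (loader not shown)
    tm.ProcessWeights(0);

    // Create the engine for device 0
    tm.CreateEngine(0);

    // Create a request and run inference
    std::unique_ptr<ModelRequest> request = tm.CreateRequest();
    auto output = request->Forward(input_param, callback);

    // Monitor scheduling metrics
    std::shared_ptr<ScheduleMetrics> metrics = tm.GetScheduleMetrics(0);

    // Put engine 0 to sleep to release resources, then wake it (level and tag are illustrative)
    tm.Sleep(0, 1);
    tm.WakeUp(0, {"default"});
}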
