Implementation:InternLM Lmdeploy Engine
| Knowledge Sources | |
|---|---|
| Domains | Inference Engine, Orchestration |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Top-level Engine class that owns and coordinates the TurboMind inference pipeline, managing scheduling, model execution, and request processing across devices.
Description
The Engine class is the primary entry point for running inference in TurboMind. It uses the pimpl (pointer to implementation) idiom via a private Impl struct to hide internal complexity. The engine is constructed with a data type, engine parameters, a language model instance, an execution context, a gateway for request routing, a device ID, a queue ID, and a phase count.
The class is move-only (non-copyable) and provides an explicit operator bool() to check whether the engine has been properly initialized. The Start() method launches the engine's processing loop, and GetScheduleMetrics() returns shared scheduling metrics for monitoring throughput and latency.
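The ownership pattern described above can be illustrated in isolation. The following is a minimal sketch, not the TurboMind source: `MiniEngine`, its `device_id()` accessor, and the single-field `Impl` are hypothetical stand-ins chosen only to show how a pimpl class with user-declared move operations becomes move-only and how `explicit operator bool()` can report an empty handle. It assumes C++17.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for Engine; only the ownership pattern is shown.
class MiniEngine {
public:
    ~MiniEngine();  // defined below, where Impl is a complete type

    MiniEngine();  // empty handle: impl_ is null
    MiniEngine(MiniEngine&&) noexcept;
    MiniEngine& operator=(MiniEngine&&) noexcept;

    explicit MiniEngine(int device_id);  // fully constructed instance

    // True only when an implementation object is attached.
    explicit operator bool() const noexcept
    {
        return static_cast<bool>(impl_);
    }

    int device_id() const;

private:
    struct Impl;  // incomplete here, so internals stay out of the header
    std::unique_ptr<Impl> impl_;
};

// ---- everything below would normally live in the .cc file ----

struct MiniEngine::Impl {
    int device_id;
};

MiniEngine::~MiniEngine() = default;
MiniEngine::MiniEngine()  = default;
MiniEngine::MiniEngine(MiniEngine&&) noexcept = default;
MiniEngine& MiniEngine::operator=(MiniEngine&&) noexcept = default;

MiniEngine::MiniEngine(int device_id): impl_{std::make_unique<Impl>(Impl{device_id})} {}

int MiniEngine::device_id() const
{
    assert(impl_ && "device_id() called on an empty MiniEngine");
    return impl_->device_id;
}

// Declaring the move operations suppresses the implicit copy operations.
static_assert(!std::is_copy_constructible_v<MiniEngine>);
static_assert(std::is_nothrow_move_constructible_v<MiniEngine>);
```

With this layout, a default-constructed or moved-from `MiniEngine` converts to `false`, while one built with the device-ID constructor converts to `true`. Whether the real `Engine` behaves identically for moved-from objects is an assumption based on its `std::unique_ptr<Impl>` member.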
Usage
The TurboMind top-level class instantiates Engine during initialization, binding each instance to a specific device and queue. After construction, call Start() to begin processing requests that arrive through the associated Gateway.
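The construct-then-start lifecycle can be sketched with toy types. Everything in this example is hypothetical: `DemoGateway`, `DemoEngine`, `Push`, `Pop`, `Close`, and `Join` are illustrative names, and the use of a worker thread inside `Start()` is an assumption about how a processing loop might be launched, not a description of the real implementation. It assumes C++17.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical gateway: one FIFO of request ids per queue index.
class DemoGateway {
public:
    explicit DemoGateway(int n_queues): queues_(n_queues) {}

    void Push(int queue_id, int request)
    {
        {
            std::lock_guard lock{mutex_};
            queues_.at(queue_id).push_back(request);
        }
        cv_.notify_all();
    }

    void Close()
    {
        {
            std::lock_guard lock{mutex_};
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a request is available; returns nullopt once closed and drained.
    std::optional<int> Pop(int queue_id)
    {
        std::unique_lock lock{mutex_};
        auto& q = queues_.at(queue_id);
        cv_.wait(lock, [&] { return closed_ || !q.empty(); });
        if (q.empty()) {
            return std::nullopt;
        }
        int request = q.front();
        q.pop_front();
        return request;
    }

private:
    std::mutex                   mutex_;
    std::condition_variable      cv_;
    std::vector<std::deque<int>> queues_;
    bool                         closed_ = false;
};

// Hypothetical engine bound to one queue of the gateway.
class DemoEngine {
public:
    DemoEngine(DemoGateway& gateway, int queue_id): gateway_{gateway}, queue_id_{queue_id} {}

    // Joins the worker; this blocks until the gateway has been closed.
    ~DemoEngine() { Join(); }

    void Start()
    {
        assert(!worker_.joinable() && "Start() called twice");
        worker_ = std::thread([this] {
            while (auto request = gateway_.Pop(queue_id_)) {
                processed_.push_back(*request);  // a real engine would schedule and run the model here
            }
        });
    }

    void Join()
    {
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    // Only safe to read after Join() has returned.
    const std::vector<int>& processed() const { return processed_; }

private:
    DemoGateway&     gateway_;
    int              queue_id_;
    std::thread      worker_;
    std::vector<int> processed_;
};
```

In this sketch an engine bound to queue 0 never sees requests pushed to queue 1, which mirrors the idea that each engine instance serves exactly one queue of the shared gateway.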
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/engine/engine.h
- Lines: 1-47
Signature
class Engine {
public:
    ~Engine();

    Engine();
    Engine(Engine&&) noexcept;
    Engine& operator=(Engine&&) noexcept;

    explicit operator bool() const noexcept;

    Engine(DataType      dtype,
           EngineParam   param,
           LanguageModel model,
           Context&      ctx,
           Gateway&      gateway,
           int           device_id,
           int           queue_id,
           int           phases);

    void Start();

    std::shared_ptr<ScheduleMetrics> GetScheduleMetrics();

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};
Import
#include "src/turbomind/engine/engine.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dtype | DataType | Yes | Data type for model computation (e.g., float16, bfloat16) |
| param | EngineParam | Yes | Engine configuration parameters (batch size, token limits, etc.) |
| model | LanguageModel | Yes | The language model to execute |
| ctx | Context& | Yes | Shared execution context (CUDA streams, allocators) |
| gateway | Gateway& | Yes | Request routing gateway for submitting and retrieving requests |
| device_id | int | Yes | CUDA device index to run on |
| queue_id | int | Yes | Queue index within the gateway |
| phases | int | Yes | Number of pipeline phases for overlapping execution |
Outputs
| Name | Type | Description |
|---|---|---|
| ScheduleMetrics | std::shared_ptr<ScheduleMetrics> | Returned by GetScheduleMetrics(); shared handle to metrics about scheduling performance |
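Because the metrics are returned as a `std::shared_ptr`, the caller co-owns whatever object the engine hands out. How the engine updates that object is not documented here. The sketch below assumes one possible scheme, in which the producer publishes a fresh immutable snapshot after each update; `DemoMetrics`, its two fields, `DemoScheduler`, `Update`, and `GetMetrics` are all hypothetical names for illustration. It assumes C++17.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical metrics record; the real ScheduleMetrics fields are not listed on this page.
struct DemoMetrics {
    int active_requests  = 0;
    int waiting_requests = 0;
};

// Hypothetical producer that publishes a new immutable snapshot on every update.
class DemoScheduler {
public:
    void Update(int active, int waiting)
    {
        auto next = std::make_shared<const DemoMetrics>(DemoMetrics{active, waiting});
        std::lock_guard lock{mutex_};
        current_ = std::move(next);
    }

    // Returns a co-owned snapshot; it stays valid even after later updates
    // or after the scheduler itself is destroyed.
    std::shared_ptr<const DemoMetrics> GetMetrics() const
    {
        std::lock_guard lock{mutex_};
        return current_;
    }

private:
    mutable std::mutex                 mutex_;
    std::shared_ptr<const DemoMetrics> current_ = std::make_shared<const DemoMetrics>();
};
```

Under this assumption a monitoring thread can poll `GetMetrics()` without holding a lock while it reads the fields, since each snapshot is never modified after publication.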
Usage Examples
// Construct and start an engine
Engine engine(dtype, param, std::move(model), ctx, gateway, device_id, queue_id, phases);
engine.Start();
// Retrieve scheduling metrics
auto metrics = engine.GetScheduleMetrics();