Implementation:InternLM Lmdeploy DLPack
| Knowledge Sources | |
|---|---|
| Domains | Tensor Exchange, Interoperability |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Provides the DLPack v1.0 header defining the standard C-level tensor data structures (DLTensor, DLManagedTensor, DLManagedTensorVersioned) for zero-copy tensor exchange between frameworks.
Description
This is the DLPack specification header (version 1.0), a widely adopted standard for in-memory tensor exchange between deep learning frameworks (PyTorch, TensorFlow, TVM, CuPy, etc.) without data copying.
The header defines several core types:
- DLPackVersion: Holds the major and minor version numbers. A consumer must check that the major version matches the one it was built against before reading any other field of a versioned tensor.
- DLDeviceType: Enum of supported device types, including kDLCPU (1), kDLCUDA (2), kDLCUDAHost (3, pinned host memory), kDLROCM (10), kDLOneAPI (14), and kDLWebGPU (15).
- DLDevice: A (device_type, device_id) pair identifying where the tensor data resides.
- DLDataTypeCode: Enum of element type codes: kDLInt, kDLUInt, kDLFloat, kDLBfloat, kDLComplex, kDLBool, kDLOpaqueHandle.
- DLDataType: Compact 4-byte struct holding the type code, the bit width, and the lane count (greater than 1 for vectorized types).
- DLTensor: The core, non-owning tensor descriptor: a data pointer, device, number of dimensions, data type, shape array, optional strides array, and byte offset.
- DLManagedTensor (legacy, deprecated after DLPack v0.8): Wraps a DLTensor with a manager context and a deleter callback for lifetime management.
- DLManagedTensorVersioned (current standard): The versioned replacement. It keeps the manager context and deleter and adds a DLPackVersion and a flags field (for example the read-only bit, DLPACK_FLAG_BITMASK_READ_ONLY).
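The version and flag checks described in the list above can be sketched as follows. This is a minimal illustration, not code from TurboMind: `IsCompatible`, `IsReadOnly`, and `VersionCheckDemo` are hypothetical helper names, and the struct and macros are redefined locally only so the snippet compiles on its own (real code would include dlpack.h).

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of definitions from dlpack.h so the sketch is self-contained.
// Real code would include the header instead of redefining these.
#define DLPACK_MAJOR_VERSION 1
#define DLPACK_FLAG_BITMASK_READ_ONLY (1UL << 0UL)

struct DLPackVersion {
    uint32_t major;
    uint32_t minor;
};

// Hypothetical helper: true when a consumer built against DLPACK_MAJOR_VERSION
// may go on to read the remaining fields of a versioned tensor.
bool IsCompatible(const DLPackVersion& v) {
    return v.major == DLPACK_MAJOR_VERSION;
}

// Hypothetical helper: true when the producer marked the tensor as read-only.
bool IsReadOnly(uint64_t flags) {
    return (flags & DLPACK_FLAG_BITMASK_READ_ONLY) != 0;
}

// Small demonstration of both checks.
void VersionCheckDemo() {
    assert(IsCompatible(DLPackVersion{1, 0}));
    assert(IsCompatible(DLPackVersion{1, 3}));   // a different minor version is still accepted
    assert(!IsCompatible(DLPackVersion{2, 0}));  // a different major version is rejected
    assert(IsReadOnly(DLPACK_FLAG_BITMASK_READ_ONLY));
    assert(!IsReadOnly(0));
}
```

Only the major version is compared because, per the header's comments, a minor version bump keeps the ABI unchanged. On a major mismatch, a consumer would typically call the deleter and stop without touching the other fields.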
In TurboMind, DLPack is used primarily by the guided decoding module to interface with xgrammar, which uses DLTensor for bitmask exchange.
Usage
The header is included wherever tensor data is exchanged with external libraries (xgrammar) through the DLPack protocol. The DLTensor struct is constructed inline to wrap existing buffer pointers, so no data is copied.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/python/dlpack.h
- Lines: 1-322
Signature
typedef struct {
  uint32_t major;
  uint32_t minor;
} DLPackVersion;

typedef enum : int32_t {
  kDLCPU = 1,
  kDLCUDA = 2,
  kDLCUDAHost = 3,
  // ... other device types
} DLDeviceType;

typedef struct {
  DLDeviceType device_type;
  int32_t device_id;
} DLDevice;

typedef struct {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
} DLDataType;

typedef struct {
  void* data;
  DLDevice device;
  int32_t ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;
  uint64_t byte_offset;
} DLTensor;

typedef struct DLManagedTensor {
  DLTensor dl_tensor;
  void* manager_ctx;
  void (*deleter)(struct DLManagedTensor* self);
} DLManagedTensor;

struct DLManagedTensorVersioned {
  DLPackVersion version;
  void* manager_ctx;
  void (*deleter)(struct DLManagedTensorVersioned* self);
  uint64_t flags;
  DLTensor dl_tensor;
};
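To illustrate how manager_ctx and deleter cooperate in the managed structs, here is a hedged sketch of a producer that exports a 1-D float32 CPU tensor and a consumer that releases it. `FloatOwner`, `ExportFloats`, `LifetimeDemo`, and the `g_live_owners` counter are hypothetical names introduced for this example. The DLPack types are mirrored locally so the snippet stands alone; real code would include dlpack.h.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Local mirrors of the header's definitions (real code would include dlpack.h).
enum DLDeviceType : int32_t { kDLCPU = 1 };
struct DLDevice { DLDeviceType device_type; int32_t device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
    void* data; DLDevice device; int32_t ndim; DLDataType dtype;
    int64_t* shape; int64_t* strides; uint64_t byte_offset;
};
struct DLPackVersion { uint32_t major; uint32_t minor; };
struct DLManagedTensorVersioned {
    DLPackVersion version;
    void* manager_ctx;
    void (*deleter)(DLManagedTensorVersioned* self);
    uint64_t flags;
    DLTensor dl_tensor;
};

// Hypothetical owner: keeps the buffer and shape alive until the deleter runs.
struct FloatOwner {
    std::vector<float> values;
    std::vector<int64_t> shape;
    DLManagedTensorVersioned managed;
};

int g_live_owners = 0;  // instrumentation for the demo only

// Hypothetical producer: hands out a tensor backed by a heap-allocated owner.
DLManagedTensorVersioned* ExportFloats(std::vector<float> values) {
    auto* owner = new FloatOwner{};
    owner->values = std::move(values);
    owner->shape = {static_cast<int64_t>(owner->values.size())};
    owner->managed.version = {1, 0};
    owner->managed.manager_ctx = owner;  // lets the deleter find the owner again
    owner->managed.deleter = [](DLManagedTensorVersioned* self) {
        delete static_cast<FloatOwner*>(self->manager_ctx);
        --g_live_owners;
    };
    owner->managed.flags = 0;
    owner->managed.dl_tensor = DLTensor{owner->values.data(), DLDevice{kDLCPU, 0}, 1,
                                        DLDataType{2 /* kDLFloat */, 32, 1},
                                        owner->shape.data(), nullptr, 0};
    ++g_live_owners;
    return &owner->managed;
}

// Consumer side: read the tensor, then release it through the deleter.
void LifetimeDemo() {
    DLManagedTensorVersioned* t = ExportFloats({1.0f, 2.0f, 3.0f});
    assert(g_live_owners == 1);
    assert(t->dl_tensor.shape[0] == 3);
    t->deleter(t);  // the consumer is done; the producer's memory is freed
    assert(g_live_owners == 0);
}
```

Keeping the shape array inside the owner matters because DLTensor stores only a pointer to it. The array has to outlive every consumer that still holds the tensor.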
Import
#include "src/turbomind/python/dlpack.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | void* | Yes | Pointer to tensor data (device or host memory) |
| device | DLDevice | Yes | Device type and ID where data resides |
| ndim | int32_t | Yes | Number of tensor dimensions |
| dtype | DLDataType | Yes | Element data type (code, bits, lanes) |
| shape | int64_t* | Yes | Array of dimension sizes |
| strides | int64_t* | No | Strides in elements, not bytes (NULL means compact row-major layout) |
| byte_offset | uint64_t | No | Byte offset from the data pointer to the first element (usually 0) |
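As a worked example of the shape, strides, and dtype fields, the sketch below computes the strides implied by a NULL strides pointer and the number of bytes a compact tensor occupies. `CompactStrides`, `CompactByteSize`, and `StridesDemo` are hypothetical helpers, not part of dlpack.h. The size formula rounds each element up to whole bytes, which mirrors the computation sketched in the header's DLTensor comments as far as I can tell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: strides (counted in elements) that a NULL strides
// pointer implies, i.e. a compact row-major layout for the given shape.
std::vector<int64_t> CompactStrides(const std::vector<int64_t>& shape) {
    std::vector<int64_t> strides(shape.size());
    int64_t running = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        strides[i] = running;  // the innermost dimension gets stride 1
        running *= shape[i];
    }
    return strides;
}

// Hypothetical helper: bytes occupied by a compact tensor. `bits` and `lanes`
// come from DLDataType; each element is rounded up to a whole number of bytes.
std::size_t CompactByteSize(const std::vector<int64_t>& shape, uint8_t bits, uint16_t lanes) {
    std::size_t elems = 1;
    for (int64_t d : shape) {
        assert(d >= 0);
        elems *= static_cast<std::size_t>(d);
    }
    const std::size_t bytes_per_elem = (static_cast<std::size_t>(bits) * lanes + 7) / 8;
    return elems * bytes_per_elem;
}

void StridesDemo() {
    // A 2 x 3 x 4 float32 tensor: 24 elements of 4 bytes each.
    assert((CompactStrides({2, 3, 4}) == std::vector<int64_t>{12, 4, 1}));
    assert(CompactByteSize({2, 3, 4}, 32, 1) == 96);
}
```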
Outputs
| Name | Type | Description |
|---|---|---|
| DLTensor | struct | Self-describing tensor descriptor for zero-copy exchange |
Usage Examples
// Construct a DLTensor wrapping an existing buffer (as done in guided_decoding.cc)
DLTensor dlbitmask{
    bitmask_buf_.data(),                     // data pointer
    DLDevice{kDLCPU, 0},                     // CPU device
    bitmask_buf_.ndim(),                     // number of dimensions
    xgrammar::GetBitmaskDLType(),            // data type
    (int64_t*)bitmask_buf_.shape().data(),   // shape array
    nullptr,                                 // strides (compact)
    0                                        // byte offset
};
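The snippet above depends on TurboMind and xgrammar types. For readers who want to try the same wrapping pattern in isolation, here is a minimal standalone sketch. It assumes a 32-bit integer element type and a `std::vector` as the backing buffer, and `WrapInt32` and `WrapDemo` are hypothetical names. The DLPack structs are mirrored locally so the code compiles without dlpack.h.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the header's definitions (real code would include dlpack.h).
enum DLDeviceType : int32_t { kDLCPU = 1 };
struct DLDevice { DLDeviceType device_type; int32_t device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
    void* data; DLDevice device; int32_t ndim; DLDataType dtype;
    int64_t* shape; int64_t* strides; uint64_t byte_offset;
};

// Hypothetical helper: describe an existing int32 buffer as a CPU tensor.
// The caller keeps ownership of `buf` and `shape`; the DLTensor only borrows
// them, so both must outlive every use of the returned descriptor.
DLTensor WrapInt32(std::vector<int32_t>& buf, std::vector<int64_t>& shape) {
    return DLTensor{buf.data(),
                    DLDevice{kDLCPU, 0},
                    static_cast<int32_t>(shape.size()),
                    DLDataType{0 /* kDLInt */, 32, 1},
                    shape.data(),
                    nullptr,  // compact row-major layout
                    0};
}

void WrapDemo() {
    std::vector<int32_t> buf(2 * 4, 0);
    std::vector<int64_t> shape{2, 4};
    DLTensor t = WrapInt32(buf, shape);
    assert(t.data == buf.data());           // same memory, nothing was copied
    static_cast<int32_t*>(t.data)[5] = -1;  // a consumer writes through the view
    assert(buf[5] == -1);                   // the owner observes the write
}
```

Storing the shape as `int64_t` from the start avoids the pointer cast used in the TurboMind snippet, which is only safe when the source shape elements are already 64-bit integers.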