Implementation:InternLM Lmdeploy DLPack

Knowledge Sources	InternLM_Lmdeploy
Domains	Tensor Exchange, Interoperability
Last Updated	2026-02-07 15:00 GMT

Overview

Provides the DLPack v1.0 header defining the standard C-level tensor data structures (DLTensor, DLManagedTensor, DLManagedTensorVersioned) for zero-copy tensor exchange between frameworks.

Description

This file is the DLPack specification header (version 1.0). DLPack is a widely adopted standard for exchanging in-memory tensors between deep learning frameworks (PyTorch, TensorFlow, TVM, CuPy, etc.) without copying data.

The header defines several core types:

DLPackVersion: Contains major and minor version numbers. Consumers must check that the major version matches the one they were built against before accessing tensor fields.
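
The major-version check described for DLPackVersion could be sketched as follows. The struct mirrors the header's definition so the snippet compiles on its own; kSupportedMajor, IsCompatible, and RequireCompatible are hypothetical names introduced for illustration and are not part of the header.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the header's DLPackVersion, repeated so this sketch is self-contained.
struct DLPackVersion {
    uint32_t major;
    uint32_t minor;
};

// Hypothetical constant: the DLPack major version this consumer was compiled against.
constexpr uint32_t kSupportedMajor = 1;

// True when the producer's major version matches ours; minor versions may differ.
inline bool IsCompatible(const DLPackVersion& v) {
    return v.major == kSupportedMajor;
}

// Debug-build guard a consumer might call before reading any tensor fields.
inline void RequireCompatible(const DLPackVersion& v) {
    assert(IsCompatible(v) && "unsupported DLPack major version");
}
```

A consumer would typically run such a check immediately after receiving a DLManagedTensorVersioned and, on a mismatch, do nothing with the tensor beyond calling its deleter.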

DLDeviceType: Enum of supported device types including kDLCPU (1), kDLCUDA (2), kDLCUDAHost (3, pinned memory), kDLROCM (10), kDLOneAPI (14), kDLWebGPU (15), and others.

DLDevice: A (device_type, device_id) pair identifying where tensor data resides.

DLDataTypeCode: Enum of element types: kDLInt, kDLUInt, kDLFloat, kDLBfloat, kDLComplex, kDLBool, kDLOpaqueHandle.

DLDataType: Compact 4-byte struct with type code, bit width, and lane count for vectorized types.
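
To make the (code, bits, lanes) encoding concrete, the sketch below builds a few descriptors and computes the size of one element. The types are local mirrors of the header's definitions; giving the enum a fixed uint8_t underlying type is an assumption made for this sketch, and ElementBytes and the kFloat32, kInt32, and kFloat16x4 constants are hypothetical helpers, not part of DLPack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirrors of the header's definitions, repeated so this sketch is self-contained.
enum DLDataTypeCode : uint8_t {
    kDLInt = 0,
    kDLUInt = 1,
    kDLFloat = 2,
    kDLOpaqueHandle = 3,
    kDLBfloat = 4,
    kDLComplex = 5,
    kDLBool = 6
};

struct DLDataType {
    uint8_t  code;   // one of DLDataTypeCode
    uint8_t  bits;   // bit width of a single lane
    uint16_t lanes;  // 1 for scalars, >1 for vectorized types
};
static_assert(sizeof(DLDataType) == 4, "DLDataType is expected to pack into 4 bytes");

// Hypothetical helper: bytes occupied by one element, rounding sub-byte types up.
inline std::size_t ElementBytes(DLDataType t) {
    assert(t.lanes >= 1);
    return (static_cast<std::size_t>(t.bits) * t.lanes + 7) / 8;
}

// Common descriptors expressed as (code, bits, lanes).
constexpr DLDataType kFloat32{kDLFloat, 32, 1};
constexpr DLDataType kInt32{kDLInt, 32, 1};
constexpr DLDataType kFloat16x4{kDLFloat, 16, 4};  // four packed half-precision lanes
```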

DLTensor: The core tensor descriptor with a data pointer, device, number of dimensions, data type, shape array, optional strides array, and byte offset.
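
Because the strides array is optional, a consumer usually has to derive strides itself when the pointer is NULL. The following is a minimal sketch of that step, assuming strides are counted in elements rather than bytes (as DLPack specifies) and that byte_offset is added to the data pointer to reach the first element. The structs mirror the header; EffectiveStrides and FirstElement are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirrors of the header's definitions, repeated so this sketch is self-contained.
enum DLDeviceType : int32_t { kDLCPU = 1, kDLCUDA = 2, kDLCUDAHost = 3 };
struct DLDevice { DLDeviceType device_type; int32_t device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
    void*      data;
    DLDevice   device;
    int32_t    ndim;
    DLDataType dtype;
    int64_t*   shape;
    int64_t*   strides;
    uint64_t   byte_offset;
};

// Hypothetical helper: strides in elements. Returns the tensor's own strides when
// present, otherwise derives compact row-major strides from the shape.
inline std::vector<int64_t> EffectiveStrides(const DLTensor& t) {
    assert(t.ndim >= 0);
    std::vector<int64_t> out(static_cast<std::size_t>(t.ndim));
    if (t.strides != nullptr) {
        out.assign(t.strides, t.strides + t.ndim);
        return out;
    }
    int64_t running = 1;
    for (int32_t i = t.ndim - 1; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = running;  // innermost dimension has stride 1
        running *= t.shape[i];
    }
    return out;
}

// Hypothetical helper: address of the first element, honoring byte_offset.
inline void* FirstElement(const DLTensor& t) {
    return static_cast<char*>(t.data) + t.byte_offset;
}
```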

DLManagedTensor (legacy, deprecated after DLPack v0.8): Wraps a DLTensor with a manager context and deleter callback for lifetime management. New code should prefer DLManagedTensorVersioned.

DLManagedTensorVersioned (current standard): The versioned replacement adding a DLPackVersion, flags field (including read-only bitmask), manager context, and deleter.
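
The ownership handshake behind DLManagedTensorVersioned can be illustrated with a small producer and consumer pair. This is a sketch under stated assumptions, not how TurboMind or any particular framework implements it: the structs mirror the header, kFlagReadOnly stands in for the header's read-only bitmask (bit 0 of flags), and OwnedBuffer, ExportFloats, and Release are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Local mirrors of the header's definitions, repeated so this sketch is self-contained.
enum DLDeviceType : int32_t { kDLCPU = 1 };
struct DLDevice { DLDeviceType device_type; int32_t device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
    void* data; DLDevice device; int32_t ndim; DLDataType dtype;
    int64_t* shape; int64_t* strides; uint64_t byte_offset;
};
struct DLPackVersion { uint32_t major; uint32_t minor; };
struct DLManagedTensorVersioned {
    DLPackVersion version;
    void* manager_ctx;
    void (*deleter)(DLManagedTensorVersioned* self);
    uint64_t flags;
    DLTensor dl_tensor;
};

// Stand-in for the header's read-only flag (bit 0 of `flags`).
constexpr uint64_t kFlagReadOnly = 1ULL << 0;

// Hypothetical owner object; the producer keeps it alive through manager_ctx.
struct OwnedBuffer {
    std::vector<float> values;
    int64_t shape[1];
};

// Hypothetical producer: exports a 1-D float32 CPU tensor that owns its storage.
inline DLManagedTensorVersioned* ExportFloats(std::vector<float> values, bool read_only) {
    auto* owner = new OwnedBuffer{std::move(values), {0}};
    owner->shape[0] = static_cast<int64_t>(owner->values.size());

    auto* managed = new DLManagedTensorVersioned{};
    managed->version = DLPackVersion{1, 0};
    managed->manager_ctx = owner;
    // The deleter frees both the owner and the managed wrapper itself.
    managed->deleter = [](DLManagedTensorVersioned* self) {
        delete static_cast<OwnedBuffer*>(self->manager_ctx);
        delete self;
    };
    managed->flags = read_only ? kFlagReadOnly : 0;
    managed->dl_tensor = DLTensor{owner->values.data(), DLDevice{kDLCPU, 0}, 1,
                                  DLDataType{2 /* kDLFloat */, 32, 1},
                                  owner->shape, nullptr, 0};
    return managed;
}

// Consumer side: hand the tensor back to the producer exactly once when done.
inline void Release(DLManagedTensorVersioned* managed) {
    assert(managed != nullptr);
    if (managed->deleter != nullptr) {
        managed->deleter(managed);
    }
}
```

The key design point is that the consumer never frees the data pointer directly; it only calls the deleter, which lets the producer decide how the underlying storage is reclaimed.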

In TurboMind, DLPack is used primarily by the guided decoding module to interface with xgrammar, which uses DLTensor for bitmask exchange.

Usage

The header is included wherever tensor data must be exchanged with external libraries (such as xgrammar) through the DLPack protocol. A DLTensor is constructed inline to wrap an existing buffer pointer, so no data is copied.
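
The zero-copy property can be demonstrated with a self-contained sketch that wraps a std::vector and then writes through the descriptor. The structs mirror the header; WrapInt32 is a hypothetical helper, and the 1-D int32 layout is an assumption chosen for illustration rather than a statement about the buffers TurboMind actually passes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the header's definitions, repeated so this sketch is self-contained.
enum DLDeviceType : int32_t { kDLCPU = 1 };
struct DLDevice { DLDeviceType device_type; int32_t device_id; };
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
    void* data; DLDevice device; int32_t ndim; DLDataType dtype;
    int64_t* shape; int64_t* strides; uint64_t byte_offset;
};

// Hypothetical helper: describe a 1-D int32 host buffer as a DLTensor without copying.
// The caller owns both `buf` and `shape`; they must outlive the returned descriptor.
inline DLTensor WrapInt32(std::vector<int32_t>& buf, int64_t (&shape)[1]) {
    assert(!buf.empty());
    shape[0] = static_cast<int64_t>(buf.size());
    return DLTensor{buf.data(),                          // borrowed data pointer
                    DLDevice{kDLCPU, 0},                 // host memory
                    1,                                   // one dimension
                    DLDataType{0 /* kDLInt */, 32, 1},   // int32 scalar elements
                    shape,                               // borrowed shape array
                    nullptr,                             // compact layout
                    0};                                  // no byte offset
}
```

Because the descriptor only borrows the pointer, a write made through the DLTensor's data pointer is visible in the original vector, and the descriptor becomes dangling if the vector is resized or destroyed.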

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/python/dlpack.h
Lines: 1-322

Signature

typedef struct {
    uint32_t major;
    uint32_t minor;
} DLPackVersion;

typedef enum: int32_t {
    kDLCPU = 1,
    kDLCUDA = 2,
    kDLCUDAHost = 3,
    // ... other device types
} DLDeviceType;

typedef struct {
    DLDeviceType device_type;
    int32_t device_id;
} DLDevice;

typedef struct {
    uint8_t code;
    uint8_t bits;
    uint16_t lanes;
} DLDataType;

typedef struct {
    void* data;
    DLDevice device;
    int32_t ndim;
    DLDataType dtype;
    int64_t* shape;
    int64_t* strides;
    uint64_t byte_offset;
} DLTensor;

typedef struct DLManagedTensor {
    DLTensor dl_tensor;
    void* manager_ctx;
    void (*deleter)(struct DLManagedTensor* self);
} DLManagedTensor;

struct DLManagedTensorVersioned {
    DLPackVersion version;
    void* manager_ctx;
    void (*deleter)(struct DLManagedTensorVersioned* self);
    uint64_t flags;
    DLTensor dl_tensor;
};

Import

#include "src/turbomind/python/dlpack.h"

I/O Contract

Inputs

Name	Type	Required	Description
data	void*	Yes	Pointer to tensor data (device or host memory)
device	DLDevice	Yes	Device type and ID where data resides
ndim	int32_t	Yes	Number of tensor dimensions
dtype	DLDataType	Yes	Element data type (code, bits, lanes)
shape	int64_t*	Yes	Array of dimension sizes
strides	int64_t*	No	Optional strides (NULL for compact row-major)
byte_offset	uint64_t	No	Byte offset from the data pointer to the first element (usually 0; the field has no implicit default and must be set)

Outputs

Name	Type	Description
DLTensor	struct	Self-describing tensor descriptor for zero-copy exchange

Usage Examples

// Construct a DLTensor wrapping an existing buffer (as done in guided_decoding.cc)
DLTensor dlbitmask{
    bitmask_buf_.data(),                    // data pointer
    DLDevice{kDLCPU, 0},                   // CPU device
    bitmask_buf_.ndim(),                    // number of dimensions
    xgrammar::GetBitmaskDLType(),           // data type
    (int64_t*)bitmask_buf_.shape().data(),  // shape array
    nullptr,                                 // strides (compact)
    0                                        // byte offset
};

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
