

Implementation:Deepspeedai DeepSpeed Pin Tensor

From Leeroopedia


Knowledge Sources
Domains Async_IO, NVMe_Offload
Last Updated 2026-02-09 00:00 GMT

Overview

Memory manager for creating and tracking PyTorch CPU tensors backed by page-locked (pinned) memory for high-performance I/O operations.

Description

The deepspeed_pin_tensor_t class manages a pool of PyTorch CPU tensors that occupy page-locked memory using mlock(). Page-locked memory cannot be swapped to disk by the operating system, which provides two key benefits: guaranteed memory residency for asynchronous I/O operations and faster DMA transfers to/from GPUs. The manager maintains a registry of allocated pinned buffers, tracks their sizes, and ensures proper cleanup via munlock() and free() when tensors are no longer needed or when the manager is destroyed.

This implementation uses page-aligned memory allocation (via posix_memalign) combined with mlock to create buffers suitable for O_DIRECT I/O operations. The manager provides tensor allocation with either TensorOptions or ScalarType specifications, automatic cleanup in the destructor, and query capabilities to check if a tensor is managed by this system.

Usage

Use this manager when you need reusable pinned memory buffers for frequent I/O operations, such as during repeated checkpoint loading/saving or optimizer state swapping. The pinned tensors avoid allocation overhead and provide optimal performance for both disk I/O and CPU-GPU transfers.

Code Reference

Source Location

Signature

class deepspeed_pin_tensor_t {
public:
    ~deepspeed_pin_tensor_t();

    torch::Tensor alloc(const int64_t num_elem,
                        const torch::TensorOptions& options);

    torch::Tensor alloc(const int64_t num_elem,
                        const at::ScalarType& elem_type);

    bool free(torch::Tensor& locked_tensor);

    bool is_managed(const torch::Tensor& buffer);
};

Import

#include "deepspeed_pin_tensor.h"

I/O Contract

Inputs

Name           Type                  Required  Description
num_elem       int64_t               Yes       Number of elements in the tensor
options        torch::TensorOptions  Yes       Tensor options specifying dtype, device, etc.
elem_type      at::ScalarType        Yes       Scalar type (alternative to options)
locked_tensor  torch::Tensor         Yes       Tensor to free (must be managed by this manager)
buffer         torch::Tensor         Yes       Tensor to query for management status

Outputs

Name        Type           Description
tensor      torch::Tensor  Newly allocated pinned CPU tensor
freed       bool           True if the tensor was freed; false if it was not managed by this manager
is_managed  bool           True if the tensor is managed by this manager

Usage Examples

// Create pinned tensor manager
auto pinned_mgr = std::make_unique<deepspeed_pin_tensor_t>();

// Allocate pinned tensor with TensorOptions
auto options = torch::TensorOptions()
                   .dtype(torch::kFloat32)
                   .device(torch::kCPU)
                   .requires_grad(false);
auto pinned_tensor1 = pinned_mgr->alloc(1024*1024, options);

// Allocate pinned tensor with ScalarType
auto pinned_tensor2 = pinned_mgr->alloc(2048*1024, torch::kFloat16);

// Check if tensor is managed
if (pinned_mgr->is_managed(pinned_tensor1)) {
    std::cout << "Tensor is pinned and managed" << std::endl;
}

// Use pinned tensors for I/O operations
// ... perform async I/O with pinned_tensor1 ...

// Free specific tensor when done
bool freed = pinned_mgr->free(pinned_tensor1);

// Manager automatically frees all remaining tensors in destructor
// (pinned_tensor2 will be freed when pinned_mgr goes out of scope)

// Typical usage pattern with an I/O handle; the constructor arguments are
// block size, queue depth, single_submit, overlap_events, and thread count:
auto io_handle = std::make_unique<deepspeed_io_handle_t>(1024*1024, 128, false, true, 8);
// example_tensor (defined elsewhere) supplies the dtype of the pinned buffer
auto buffer = io_handle->new_cpu_locked_tensor(1024*1024, example_tensor);
io_handle->async_pread(buffer, "/nvme/state.pt", 0);
io_handle->wait();
io_handle->free_cpu_locked_tensor(buffer);
