
Principle:LMCache LMCache NIXL KV Transfer

From Leeroopedia


Knowledge Sources
Domains Networking, High_Performance_Computing
Last Updated 2026-02-09 00:00 GMT

Overview

A high-performance data transfer mechanism that uses NVIDIA NIXL for RDMA-based KV cache movement between disaggregated prefill and decode instances.

Description

NIXL KV Transfer uses the NVIDIA Inference Xfer Library (NIXL) to perform direct memory transfers between GPU/CPU buffers across nodes without CPU-mediated copies. The PDBackend orchestrates the transfer: it establishes NIXL peer connections, requests remote buffer allocation from the receiver, constructs transfer descriptors, and executes batched RDMA writes.
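The descriptor-construction step can be sketched as follows. This is a minimal illustration, not LMCache's actual code: the `XferDesc` dataclass, its fields, and `build_descriptors` are hypothetical names chosen to show how local KV buffers might be mapped to contiguous offsets in a remotely allocated buffer before a batched write.

```python
# Hypothetical sketch of transfer-descriptor construction for batched RDMA
# writes; class and function names are illustrative, not LMCache's API.
from dataclasses import dataclass

@dataclass
class XferDesc:
    """Maps one local KV-cache buffer to an offset in the remote buffer."""
    local_addr: int     # source address on the prefill node
    remote_offset: int  # destination offset in the pre-allocated remote buffer
    length: int         # bytes to write

def build_descriptors(local_addrs, lengths, remote_base=0):
    """Pack blocks at contiguous remote offsets so one batched op covers all."""
    descs, offset = [], remote_base
    for addr, length in zip(local_addrs, lengths):
        descs.append(XferDesc(addr, offset, length))
        offset += length
    return descs
```

Batching all blocks into one descriptor list lets the sender issue a single RDMA operation instead of one per KV block, which is what makes the transfer efficient at scale.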

Usage

This is the core data plane for disaggregated prefill. It operates transparently within the PDBackend when transfer_channel="nixl" is configured.
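A configuration enabling this channel might look like the sketch below. Only the `transfer_channel="nixl"` setting comes from this article; the surrounding keys and values are hypothetical placeholders showing where such an option would typically sit.

```python
# Illustrative disaggregated-prefill configuration. Only transfer_channel="nixl"
# is documented above; the other keys are hypothetical examples.
pd_config = {
    "transfer_channel": "nixl",                 # select the NIXL RDMA data plane
    "role": "sender",                           # hypothetical: prefill side pushes KV
    "peer_init_url": "tcp://decode-host:5555",  # hypothetical ZMQ metadata endpoint
}
```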

Theoretical Basis

NIXL transfer uses one-sided RDMA writes:

  1. Connection: Sender and receiver exchange NIXL agent metadata via ZMQ
  2. Registration: Both sides register their memory buffers with the NIXL agent
  3. Transfer: Sender prepares a transfer descriptor mapping local buffers to remote offsets, then calls nixl_agent.transfer(), which initiates the RDMA writes
  4. Completion: Sender polls nixl_agent.check_xfer_state() until status is "DONE"
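The sender side of steps 2-4 above can be sketched as follows. The method names `transfer()` and `check_xfer_state()` are taken from the article; the stub agent, `register_memory()`, and the polling loop are illustrative stand-ins (a real NIXL agent performs actual RDMA, and step 1's ZMQ metadata exchange is elided here).

```python
# Sketch of the sender flow with a stub standing in for the real NIXL agent.
# transfer() and check_xfer_state() mirror the names in the text; everything
# else is an illustrative assumption, not the real NIXL API.
import itertools

class StubNixlAgent:
    """Minimal stand-in that reports DONE after a few completion polls."""
    def __init__(self, polls_until_done=3):
        self._polls = itertools.count()
        self._done_at = polls_until_done

    def register_memory(self, buffers):       # step 2: register local buffers
        self.registered = list(buffers)

    def transfer(self, descriptors):          # step 3: initiate batched RDMA writes
        self.in_flight = list(descriptors)
        return "xfer-handle-0"

    def check_xfer_state(self, handle):       # step 4: poll transfer status
        return "DONE" if next(self._polls) >= self._done_at else "PENDING"

def send_kv_blocks(agent, buffers, descriptors):
    """Register, transfer, then poll until the one-sided writes complete."""
    agent.register_memory(buffers)
    handle = agent.transfer(descriptors)
    while agent.check_xfer_state(handle) != "DONE":
        pass  # a real sender would yield or sleep between polls
    return handle
```

Because the writes are one-sided, the receiver's CPU is never involved in the data path; the sender alone drives the transfer and observes completion by polling.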

Related Pages

Implemented By
