Principle: LMCache NIXL KV Transfer
| Knowledge Sources | |
|---|---|
| Domains | Networking, High_Performance_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A high-performance data transfer mechanism using NVIDIA NIXL for RDMA-based KV cache movement between disaggregated prefill and decode instances.
Description
NIXL KV Transfer uses the NVIDIA Inference Xfer Library (NIXL) to perform direct memory transfers between GPU/CPU buffers across nodes without CPU-mediated copies. The PDBackend orchestrates the transfer: it establishes NIXL peer connections, requests remote buffer allocation from the receiver, constructs transfer descriptors, and executes batched RDMA writes.
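Constructing transfer descriptors amounts to pairing each local KV block with an offset in the receiver's registered buffer. The sketch below illustrates that mapping with plain Python dataclasses; `XferDesc` and `build_descs` are illustrative names, not part of the LMCache or NIXL APIs.

```python
from dataclasses import dataclass

# Hypothetical descriptor shape: a transfer descriptor pairs a local
# buffer region with an offset into the receiver's registered memory.
@dataclass
class XferDesc:
    local_addr: int   # address of the local KV block
    remote_off: int   # offset into the receiver's registered buffer
    length: int       # bytes to write

def build_descs(block_addrs, block_len, remote_base=0):
    """Map local KV blocks onto consecutive remote offsets so one
    batched RDMA write can cover all of them."""
    return [
        XferDesc(addr, remote_base + i * block_len, block_len)
        for i, addr in enumerate(block_addrs)
    ]

descs = build_descs([0x1000, 0x2000, 0x3000], block_len=4096)
```

A batched write then submits the whole descriptor list at once, which amortizes the per-operation RDMA posting cost across many KV blocks.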
Usage
This is the core data plane for disaggregated prefill. It operates transparently within the PDBackend when `transfer_channel="nixl"` is configured.
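A configuration fragment selecting the NIXL channel might look like the following; only `transfer_channel` is taken from the text above, and the surrounding key names are assumptions for illustration.

```yaml
# Sketch of a disaggregated-prefill config enabling the NIXL data plane.
# Key names other than transfer_channel are hypothetical.
transfer_channel: "nixl"        # from the text above
# peer_host: "decode-node-0"    # assumed: receiver endpoint for ZMQ handshake
# peer_init_port: 7300          # assumed: port for exchanging agent metadata
```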
Theoretical Basis
NIXL transfer uses one-sided RDMA writes:
- Connection: Sender and receiver exchange NIXL agent metadata via ZMQ
- Registration: Both sides register their memory buffers with the NIXL agent
- Transfer: Sender prepares a transfer descriptor mapping local buffers to remote offsets, then calls `nixl_agent.transfer()`, which initiates RDMA writes
- Completion: Sender polls `nixl_agent.check_xfer_state()` until the status is `"DONE"`
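The sender-side state machine above can be sketched as follows. `StubNixlAgent` is a toy stand-in that mirrors the register/transfer/poll steps but is not the real NIXL API; it only simulates asynchronous completion so the control flow is visible.

```python
# Toy stand-in for a NIXL agent: the method names echo the steps above
# (register, transfer, check_xfer_state) but the class is a simulation,
# not the real NIXL binding.
class StubNixlAgent:
    def __init__(self):
        self._registered = {}
        self._pending = {}
        self._next_handle = 0

    def register_memory(self, name, buf):
        # Registration: pin/expose a local buffer for RDMA access.
        self._registered[name] = buf

    def transfer(self, descs):
        # Transfer: post a batched one-sided write; return a handle.
        handle = self._next_handle
        self._next_handle += 1
        # Model async RDMA completion: "DONE" after a few polls.
        self._pending[handle] = 2
        return handle

    def check_xfer_state(self, handle):
        # Completion: report in-progress until the write has landed.
        if self._pending[handle] > 0:
            self._pending[handle] -= 1
            return "PROC"
        return "DONE"

agent = StubNixlAgent()
agent.register_memory("kv_pool", bytearray(4096))
handle = agent.transfer([("local_block", 0, 4096)])
# Sender polls until the one-sided write is reported complete.
while (state := agent.check_xfer_state(handle)) != "DONE":
    pass
```

Because the write is one-sided, the receiver's CPU is not involved in the data path; only the initial metadata exchange and buffer-allocation request go over the ZMQ side channel.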