Principle: LMCache NIXL KV Transfer
| Knowledge Sources | |
|---|---|
| Domains | Networking, High_Performance_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A high-performance data transfer mechanism using NVIDIA NIXL for RDMA-based KV cache movement between disaggregated prefill and decode instances.
Description
NIXL KV Transfer uses the NVIDIA Inference Xfer Library (NIXL) to perform direct memory transfers between GPU/CPU buffers across nodes without CPU-mediated copies. The PDBackend orchestrates the transfer: it establishes NIXL peer connections, requests remote buffer allocation from the receiver, constructs transfer descriptors, and executes batched RDMA writes.
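Constructing transfer descriptors amounts to pairing each local KV block with an offset in the receiver's registered buffer. The sketch below illustrates that mapping with plain Python dataclasses; `XferDesc` and `build_descs` are illustrative names, not part of the LMCache or NIXL APIs.

```python
from dataclasses import dataclass

# Hypothetical descriptor shape: a transfer descriptor pairs a local
# buffer region with an offset into the receiver's registered memory.
@dataclass
class XferDesc:
    local_addr: int   # address of the local KV block
    remote_off: int   # offset into the receiver's registered buffer
    length: int       # bytes to write

def build_descs(block_addrs, block_len, remote_base=0):
    """Map local KV blocks onto consecutive remote offsets so one
    batched RDMA write can cover all of them."""
    return [
        XferDesc(addr, remote_base + i * block_len, block_len)
        for i, addr in enumerate(block_addrs)
    ]

descs = build_descs([0x1000, 0x2000, 0x3000], block_len=4096)
```

A batched write then submits the whole descriptor list at once, which amortizes the per-operation RDMA posting cost across many KV blocks.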
Usage
This is the core data plane for disaggregated prefill. It operates transparently within the PDBackend when `transfer_channel="nixl"` is configured.
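A configuration fragment selecting the NIXL channel might look like the following; only `transfer_channel` is taken from the text above, and the surrounding key names are assumptions for illustration.

```yaml
# Sketch of a disaggregated-prefill config enabling the NIXL data plane.
# Key names other than transfer_channel are hypothetical.
transfer_channel: "nixl"        # from the text above
# peer_host: "decode-node-0"    # assumed: receiver endpoint for ZMQ handshake
# peer_init_port: 7300          # assumed: port for exchanging agent metadata
```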
Theoretical Basis
NIXL transfer uses one-sided RDMA writes:
- Connection: Sender and receiver exchange NIXL agent metadata via ZMQ
- Registration: Both sides register their memory buffers with the NIXL agent
- Transfer: Sender prepares a transfer descriptor mapping local buffers to remote offsets, then calls `nixl_agent.transfer()`, which initiates RDMA writes
- Completion: Sender polls `nixl_agent.check_xfer_state()` until the status is `"DONE"`
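The sender-side state machine above can be sketched as follows. `StubNixlAgent` is a toy stand-in that mirrors the register/transfer/poll steps but is not the real NIXL API; it only simulates asynchronous completion so the control flow is visible.

```python
# Toy stand-in for a NIXL agent: the method names echo the steps above
# (register, transfer, check_xfer_state) but the class is a simulation,
# not the real NIXL binding.
class StubNixlAgent:
    def __init__(self):
        self._registered = {}
        self._pending = {}
        self._next_handle = 0

    def register_memory(self, name, buf):
        # Registration: pin/expose a local buffer for RDMA access.
        self._registered[name] = buf

    def transfer(self, descs):
        # Transfer: post a batched one-sided write; return a handle.
        handle = self._next_handle
        self._next_handle += 1
        # Model async RDMA completion: "DONE" after a few polls.
        self._pending[handle] = 2
        return handle

    def check_xfer_state(self, handle):
        # Completion: report in-progress until the write has landed.
        if self._pending[handle] > 0:
            self._pending[handle] -= 1
            return "PROC"
        return "DONE"

agent = StubNixlAgent()
agent.register_memory("kv_pool", bytearray(4096))
handle = agent.transfer([("local_block", 0, 4096)])
# Sender polls until the one-sided write is reported complete.
while (state := agent.check_xfer_state(handle)) != "DONE":
    pass
```

Because the write is one-sided, the receiver's CPU is not involved in the data path; only the initial metadata exchange and buffer-allocation request go over the ZMQ side channel.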