Implementation:Vllm project Vllm Sequence
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Parallel, Distributed_Inference |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Defines the IntermediateTensors data structure for passing hidden states and residuals between pipeline-parallel stages during distributed inference.
Description
This module provides the IntermediateTensors dataclass, which wraps a dictionary of named tensors (dict[str, torch.Tensor]) in a dictionary-like interface: a string key returns the named tensor, a slice key returns a new IntermediateTensors with the slice applied to every stored tensor, and items() iterates over the name-tensor pairs. It also carries an optional kv_connector_output for KV cache transfers between pipeline stages. The class is a dataclass rather than a msgspec.Struct for compatibility with PyTorch Dynamo tracing, and its __init__ is written by hand instead of being generated by the @dataclass decorator so that Dynamo can trace the constructor back to this source file.
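The interface described above can be illustrated with a small torch-free stand-in. This is only a sketch, not the vLLM implementation: NamedBatchSketch and its extra field are hypothetical names, plain Python lists play the role of tensors, and raising TypeError for unsupported keys is a choice made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class NamedBatchSketch:
    """Hypothetical stand-in for IntermediateTensors; lists stand in for tensors."""

    tensors: dict[str, list[Any]]
    extra: Any | None  # plays the role of kv_connector_output

    # Written by hand rather than generated by @dataclass,
    # mirroring the explicit __init__ described above.
    def __init__(self, tensors: dict[str, list[Any]], extra: Any | None = None) -> None:
        self.tensors = tensors
        self.extra = extra

    def __getitem__(self, key: str | slice):
        if isinstance(key, str):
            # String key: return the single named entry.
            return self.tensors[key]
        if isinstance(key, slice):
            # Slice key: apply the same slice to every entry and wrap the result.
            return self.__class__({k: v[key] for k, v in self.tensors.items()})
        # Sketch-only choice: reject any other key type explicitly.
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __setitem__(self, key: str, value: list[Any]) -> None:
        self.tensors[key] = value

    def items(self):
        return self.tensors.items()

    def __len__(self) -> int:
        # Number of named entries, not the batch size.
        return len(self.tensors)


batch = NamedBatchSketch({"hidden_states": [1, 2, 3, 4], "residual": [5, 6, 7, 8]})
first_half = batch[0:2]  # a new NamedBatchSketch with every entry sliced
```

Dispatching on the key type keeps a single subscript syntax for both use cases: looking up one named tensor and cutting the whole bundle down to a sub-batch.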
Usage
Use this data structure in pipeline-parallel model execution to transfer intermediate hidden states between pipeline stages. Every pipeline stage except the last returns an IntermediateTensors instance containing the hidden states and residuals needed by the next stage.
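The hand-off between stages can be sketched without torch or vLLM. Everything below is an assumption made for illustration: StageOutputSketch, run_stage, and run_pipeline are hypothetical names, the "layer" is toy arithmetic on lists of floats, and the stages run in a single-process loop, whereas real pipeline-parallel stages run on separate workers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StageOutputSketch:
    """Hypothetical stand-in for IntermediateTensors (lists of floats, not tensors)."""

    tensors: dict[str, list[float]]


def run_stage(
    stage_index: int,
    num_stages: int,
    inputs: list[float],
    incoming: StageOutputSketch | None,
) -> StageOutputSketch | list[float]:
    if stage_index == 0:
        # The first stage starts from the raw inputs with an empty residual.
        hidden = list(inputs)
        residual = [0.0] * len(inputs)
    else:
        # Later stages resume from what the previous stage handed over.
        assert incoming is not None
        hidden = incoming.tensors["hidden_states"]
        residual = incoming.tensors["residual"]

    # Toy "layer": fold the residual in, then scale.
    new_residual = [h + r for h, r in zip(hidden, residual)]
    new_hidden = [2.0 * x for x in new_residual]

    if stage_index < num_stages - 1:
        # Every stage except the last returns the bundle for the next stage.
        return StageOutputSketch({"hidden_states": new_hidden, "residual": new_residual})
    # The last stage returns final hidden states instead of a bundle.
    return [h + r for h, r in zip(new_hidden, new_residual)]


def run_pipeline(inputs: list[float], num_stages: int) -> list[float]:
    carried: StageOutputSketch | None = None
    for stage_index in range(num_stages):
        out = run_stage(stage_index, num_stages, inputs, carried)
        if isinstance(out, StageOutputSketch):
            carried = out
        else:
            return out
    raise ValueError("num_stages must be at least 1")


final_hidden = run_pipeline([1.0, 2.0], num_stages=2)
```

The return type of a stage therefore signals its position: a bundle of named tensors means more stages follow, while plain hidden states mean the pipeline has finished.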
Code Reference
Source Location
- Repository: vllm
- File: vllm/sequence.py
- Lines: 1-64
Signature
@dataclass
class IntermediateTensors:
    tensors: dict[str, torch.Tensor]
    kv_connector_output: KVConnectorOutput | None

    def __init__(
        self,
        tensors: dict[str, torch.Tensor],
        kv_connector_output: KVConnectorOutput | None = None,
    ) -> None: ...

    def __getitem__(self, key: str | slice): ...
    def __setitem__(self, key: str, value: torch.Tensor): ...
    def items(self): ...
    def __len__(self) -> int: ...
    def __eq__(self, other: object) -> bool: ...
    def __repr__(self) -> str: ...
Import
from vllm.sequence import IntermediateTensors
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| tensors | dict[str, torch.Tensor] | Yes | Named tensors (e.g., "hidden_states", "residual") to pass between stages |
| kv_connector_output | KVConnectorOutput or None | No | Optional KV cache connector output for cross-stage KV transfers (default None) |
Outputs
| Name | Type | Description |
|---|---|---|
| tensor | torch.Tensor | Individual tensor retrieved by string key via __getitem__ |
| sliced | IntermediateTensors | New IntermediateTensors with sliced tensors via slice key |
| items | ItemsView | Key-value pairs of all stored tensors via items() |
Usage Examples
import torch

from vllm.sequence import IntermediateTensors

# Create intermediate tensors to pass between pipeline stages
intermediate = IntermediateTensors(
    tensors={
        "hidden_states": torch.randn(32, 4096),
        "residual": torch.randn(32, 4096),
    }
)

# Access a specific tensor by name
hidden = intermediate["hidden_states"]  # torch.Tensor [32, 4096]

# Slice for a subset of the batch
batch_slice = intermediate[0:16]  # IntermediateTensors with tensors sliced to [16, 4096]

# Replace a stored tensor (new_hidden_states is a placeholder for updated activations)
new_hidden_states = torch.zeros(32, 4096)
intermediate["hidden_states"] = new_hidden_states

# Iterate over stored tensors
for name, tensor in intermediate.items():
    print(f"{name}: {tensor.shape}")