Implementation:Run llama Llama index Pooling
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, Pooling |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
Defines the Pooling enum with CLS and MEAN pooling strategies for reducing token-level embedding tensors into fixed-size vectors.
Description
The Pooling class is a string enum that provides two pooling strategies commonly used in transformer-based embedding models:
- CLS ("cls"): Extracts the embedding from the first token (the [CLS] token). For 3D arrays (batch of sequences), it returns array[:, 0]. For 2D arrays (single sequence), it returns array[0].
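- MEAN ("mean"): Computes the unweighted mean of all token embeddings. For 3D arrays, it averages along axis 1 (the sequence dimension). For 2D arrays, it averages along axis 0. The method takes only the array, with no attention mask, so any padding positions present in the input are included in the average.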
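The two strategies above reduce to plain array indexing and averaging. The following is a minimal NumPy sketch of the operations as described, not the library's source; the array contents and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical batch: 2 sequences, 3 tokens each, 4-dimensional embeddings.
batch = np.arange(24, dtype=float).reshape(2, 3, 4)

# CLS pooling on a 3D array keeps only the first token of each sequence.
cls_batch = batch[:, 0]            # shape: (2, 4)

# MEAN pooling on a 3D array averages over the sequence axis (axis 1).
mean_batch = batch.mean(axis=1)    # shape: (2, 4)

# The 2D (single sequence) case indexes and averages along axis 0 instead.
single = batch[0]                  # shape: (3, 4)
cls_single = single[0]             # shape: (4,)
mean_single = single.mean(axis=0)  # shape: (4,)
```

In both cases the sequence dimension disappears, which is why the output size depends only on the embedding dimension.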
The enum is callable: calling a member dispatches to the matching class method, so Pooling.CLS(array) calls cls_pooling and Pooling.MEAN(array) calls mean_pooling. In the signature listed under Code Reference, cls_pooling is annotated for both numpy.ndarray and torch.Tensor through @overload type annotations, while __call__ and mean_pooling are annotated for numpy.ndarray only. If an array that is neither 2D nor 3D is provided, a NotImplementedError is raised.
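The callable-enum pattern described above can be sketched as follows. SketchPooling is a hypothetical stand-in written for illustration under the behavior stated in this page; it is not the library's source, and it omits the torch.Tensor annotations.

```python
from enum import Enum

import numpy as np


class SketchPooling(str, Enum):
    """Hypothetical stand-in for Pooling, for illustration only."""

    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray:
        # Dispatch on the enum member that was called.
        if self is SketchPooling.CLS:
            return SketchPooling.cls_pooling(array)
        return SketchPooling.mean_pooling(array)

    @classmethod
    def cls_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:
            return array[:, 0]  # first token of every sequence
        if array.ndim == 2:
            return array[0]     # first token of the single sequence
        raise NotImplementedError(f"Unhandled shape {array.shape}.")

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:
            return array.mean(axis=1)  # average over the sequence axis
        if array.ndim == 2:
            return array.mean(axis=0)
        raise NotImplementedError(f"Unhandled shape {array.shape}.")
```

Because the class mixes in str, a member can also be looked up from its string value, for example SketchPooling("mean"), which is convenient when the strategy comes from a configuration file.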
Usage
Use this enum when configuring or implementing embedding models that need to reduce variable-length token-level representations into fixed-size vectors. It is particularly useful when building custom embedding wrappers around HuggingFace or other transformer models where the pooling strategy is configurable.
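As a rough illustration of the use case above, the sketch below pools sequences of different lengths into vectors of one fixed size. pool_sequences, first_token, and token_mean are hypothetical helpers, and the random arrays stand in for transformer output. Plain functions mirror the 2D behavior described on this page so the sketch runs without llama_index installed; since Pooling members are callable, Pooling.CLS or Pooling.MEAN could be passed in the same position.

```python
from typing import Callable, List

import numpy as np

PoolFn = Callable[[np.ndarray], np.ndarray]


def pool_sequences(sequences: List[np.ndarray], pooling: PoolFn) -> np.ndarray:
    """Pool each (seq_len, dim) array to (dim,) and stack into (n, dim)."""
    return np.stack([pooling(seq) for seq in sequences])


def first_token(seq: np.ndarray) -> np.ndarray:
    # Stand-in for CLS pooling on a 2D array.
    return seq[0]


def token_mean(seq: np.ndarray) -> np.ndarray:
    # Stand-in for MEAN pooling on a 2D array.
    return seq.mean(axis=0)


rng = np.random.default_rng(0)
# Two sequences with different token counts but the same embedding size.
sequences = [rng.random((5, 8)), rng.random((9, 8))]

cls_vectors = pool_sequences(sequences, first_token)   # shape: (2, 8)
mean_vectors = pool_sequences(sequences, token_mean)   # shape: (2, 8)
```

Pooling each sequence separately through the 2D path avoids padding the batch, so no padding tokens enter the mean.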
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/embeddings/pooling.py
Signature
class Pooling(str, Enum):
    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray: ...

    @classmethod
    def cls_pooling(cls, array: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]: ...

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray: ...
Import
from llama_index.core.embeddings.pooling import Pooling
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| array | np.ndarray or torch.Tensor | Yes | A 2D array (single sequence of token embeddings) or 3D array (batch of sequences of token embeddings). |
Outputs
| Name | Type | Description |
|---|---|---|
| result | np.ndarray or torch.Tensor | The pooled embedding: a 1D vector (from 2D input) or a 2D batch of vectors (from 3D input). |
Usage Examples
import numpy as np
from llama_index.core.embeddings.pooling import Pooling

# Token-level embeddings for a single sequence: (seq_len, embed_dim)
token_embeddings = np.random.rand(128, 768)

# CLS pooling: extract first token
cls_vector = Pooling.CLS(token_embeddings)  # shape: (768,)

# MEAN pooling: average all tokens
mean_vector = Pooling.MEAN(token_embeddings)  # shape: (768,)

# Batch of token embeddings: (batch_size, seq_len, embed_dim)
batch_embeddings = np.random.rand(4, 128, 768)
batch_cls = Pooling.CLS(batch_embeddings)  # shape: (4, 768)
batch_mean = Pooling.MEAN(batch_embeddings)  # shape: (4, 768)