Implementation:Run llama Llama index Pooling

Knowledge Sources	Run_llama_Llama_index
Domains	Embeddings, Pooling
Last Updated	2026-02-11 19:00 GMT

Overview

Defines the Pooling enum with CLS and MEAN pooling strategies for reducing token-level embedding tensors into fixed-size vectors.

Description

The Pooling class is a string enum that provides two pooling strategies commonly used in transformer-based embedding models:

CLS ("cls"): Extracts the embedding from the first token (the [CLS] token). For 3D arrays (batch of sequences), it returns array[:, 0]. For 2D arrays (single sequence), it returns array[0].
MEAN ("mean"): Computes the mean of all token embeddings. For 3D arrays, it averages along axis 1 (the sequence dimension). For 2D arrays, it averages along axis 0.
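The indexing and averaging described in the two items above can be checked on a tiny array with plain NumPy. This is a minimal sketch that does not call the library itself; the variable names are illustrative only.

```python
import numpy as np

# One sequence of 3 tokens with 2-dimensional embeddings: (seq_len, embed_dim).
single = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# CLS-style pooling on a 2D array: keep the first token's row.
cls_single = single[0]              # shape (2,)
# MEAN-style pooling on a 2D array: average over the token axis (axis 0).
mean_single = single.mean(axis=0)   # shape (2,)

# A batch of 2 such sequences: (batch_size, seq_len, embed_dim).
batch = np.stack([single, single * 10])

# CLS-style pooling on a 3D array: first token of every sequence.
cls_batch = batch[:, 0]             # shape (2, 2)
# MEAN-style pooling on a 3D array: average over the sequence axis (axis 1).
mean_batch = batch.mean(axis=1)     # shape (2, 2)

print(cls_single, mean_single)
print(cls_batch.shape, mean_batch.shape)
```

In both cases the sequence dimension disappears, so the output size depends only on the embedding dimension (and the batch size, for 3D input).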

The enum is callable, so a member can be used directly as a function: Pooling.CLS(array) dispatches to cls_pooling, and Pooling.MEAN(array) dispatches to mean_pooling. As the signature below shows, cls_pooling is annotated (through @overload declarations) for both numpy.ndarray and torch.Tensor inputs, while __call__ and mean_pooling are annotated for numpy.ndarray only. If an array with an unsupported number of dimensions (anything other than 2 or 3) is provided, a NotImplementedError is raised.
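To make the callable-enum behaviour concrete, here is a minimal sketch of how such a class could be written. PoolingSketch is a hypothetical stand-in built only from the description on this page, not the library's source; it handles NumPy arrays only, and the error message text is an assumption.

```python
from enum import Enum

import numpy as np


class PoolingSketch(str, Enum):
    """Hypothetical stand-in for Pooling; not the library's code."""

    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray:
        # Calling a member dispatches to the matching class method.
        if self is PoolingSketch.CLS:
            return PoolingSketch.cls_pooling(array)
        return PoolingSketch.mean_pooling(array)

    @classmethod
    def cls_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:
            return array[:, 0]  # first token of each sequence in the batch
        if array.ndim == 2:
            return array[0]     # first token of the single sequence
        raise NotImplementedError(f"Unhandled shape {array.shape}.")

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:
            return array.mean(axis=1)  # average over the sequence axis
        if array.ndim == 2:
            return array.mean(axis=0)  # average over the token axis
        raise NotImplementedError(f"Unhandled shape {array.shape}.")


tokens = np.arange(12, dtype=float).reshape(2, 3, 2)  # (batch, seq_len, embed_dim)
pooled_cls = PoolingSketch.CLS(tokens)        # member used as a function
pooled_mean = PoolingSketch("mean")(tokens)   # look up by string value, then call

try:
    PoolingSketch.MEAN(np.zeros(4))  # 1D input has no pooling rule
    error_message = None
except NotImplementedError as exc:
    error_message = str(exc)
```

Because the class also subclasses str, a member compares equal to its string value (PoolingSketch.CLS == "cls"), which is convenient when the strategy comes from a configuration file.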

Usage

Use this enum when configuring or implementing embedding models that need to reduce variable-length token-level representations into fixed-size vectors. It is particularly useful when building custom embedding wrappers around HuggingFace or other transformer models where the pooling strategy is configurable.
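The use case above, turning variable-length token sequences into fixed-size vectors with a configurable strategy, can be illustrated with a toy wrapper. Everything in this sketch is hypothetical: fake_token_embeddings stands in for a real transformer, and embed, EMBED_DIM, and the pooling string parameter are illustrative names rather than part of any library.

```python
import numpy as np

EMBED_DIM = 4  # illustrative size; real models use far more dimensions


def fake_token_embeddings(text: str) -> np.ndarray:
    """Hypothetical stand-in for a transformer: one row per whitespace token."""
    tokens = text.split()
    # Deterministic toy values derived from token length and position only.
    return np.array(
        [[len(tok) + pos + k for k in range(EMBED_DIM)]
         for pos, tok in enumerate(tokens)],
        dtype=float,
    )


def embed(text: str, pooling: str = "mean") -> np.ndarray:
    """Reduce (seq_len, EMBED_DIM) token embeddings to one (EMBED_DIM,) vector."""
    token_embeddings = fake_token_embeddings(text)
    if pooling == "cls":
        return token_embeddings[0]
    if pooling == "mean":
        return token_embeddings.mean(axis=0)
    raise ValueError(f"Unknown pooling strategy: {pooling!r}")


short = embed("hello world", pooling="mean")                        # 2 tokens
longer = embed("a much longer input sentence here", pooling="mean")  # 6 tokens
first = embed("hello world", pooling="cls")
print(short.shape, longer.shape, first.shape)
```

Inputs of different lengths produce vectors of the same size. In a wrapper built on this library, the two string branches could presumably be replaced by looking up the enum member from the configured value and calling it on the token embeddings.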

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/embeddings/pooling.py

Signature

class Pooling(str, Enum):
    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray: ...

    @classmethod
    def cls_pooling(cls, array: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]: ...

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray: ...

Import

from llama_index.core.embeddings.pooling import Pooling

I/O Contract

Inputs

Name	Type	Required	Description
array	np.ndarray or torch.Tensor	Yes	A 2D array (single sequence of token embeddings) or 3D array (batch of sequences of token embeddings).

Outputs

Name	Type	Description
result	np.ndarray or torch.Tensor	The pooled embedding: a 1D vector (from 2D input) or a 2D batch of vectors (from 3D input).

Usage Examples

import numpy as np
from llama_index.core.embeddings.pooling import Pooling

# Token-level embeddings for a single sequence: (seq_len, embed_dim)
token_embeddings = np.random.rand(128, 768)

# CLS pooling: extract first token
cls_vector = Pooling.CLS(token_embeddings)  # shape: (768,)

# MEAN pooling: average all tokens
mean_vector = Pooling.MEAN(token_embeddings)  # shape: (768,)

# Batch of token embeddings: (batch_size, seq_len, embed_dim)
batch_embeddings = np.random.rand(4, 128, 768)
batch_cls = Pooling.CLS(batch_embeddings)    # shape: (4, 768)
batch_mean = Pooling.MEAN(batch_embeddings)  # shape: (4, 768)

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
