Implementation:Run llama Llama index Pooling

From Leeroopedia
Knowledge Sources
Domains Embeddings, Pooling
Last Updated 2026-02-11 19:00 GMT

Overview

Defines the Pooling enum with CLS and MEAN pooling strategies for reducing token-level embedding tensors into fixed-size vectors.

Description

The Pooling class is a string enum that provides two pooling strategies commonly used in transformer-based embedding models:

  • CLS ("cls"): Extracts the embedding from the first token (the [CLS] token). For 3D arrays (batch of sequences), it returns array[:, 0]. For 2D arrays (single sequence), it returns array[0].
  • MEAN ("mean"): Computes the mean of all token embeddings. For 3D arrays, it averages along axis 1 (the sequence dimension). For 2D arrays, it averages along axis 0.
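The two reductions described above can be sketched with plain NumPy indexing and averaging. This is a minimal illustration on a toy array, not the library's code; the variable names are chosen here for illustration only.

```python
import numpy as np

# Toy batch: 2 sequences, 3 tokens each, 4-dimensional embeddings.
batch = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
single = batch[0]  # one sequence: shape (3, 4)

# CLS-style pooling keeps only the first token of each sequence.
cls_batch = batch[:, 0]  # shape (2, 4)
cls_single = single[0]   # shape (4,)

# MEAN-style pooling averages over the sequence dimension.
mean_batch = batch.mean(axis=1)    # shape (2, 4)
mean_single = single.mean(axis=0)  # shape (4,)

print(cls_batch.shape, mean_batch.shape)  # (2, 4) (2, 4)
print(cls_single.shape, mean_single.shape)  # (4,) (4,)
```

In both cases the sequence dimension disappears, so the output size depends only on the embedding dimension and, for batched input, the batch size.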

Enum members are callable, so they can be used directly as functions: Pooling.CLS(array) dispatches to cls_pooling and Pooling.MEAN(array) dispatches to mean_pooling. cls_pooling accepts both numpy.ndarray and torch.Tensor inputs through @overload type annotations; in the signature listed under Code Reference, __call__ and mean_pooling are annotated for numpy.ndarray only. If the input array is neither 2D nor 3D, a NotImplementedError is raised.
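To make the dispatch behavior concrete, here is a self-contained sketch of a callable string enum written from the description above. PoolingSketch is a hypothetical stand-in defined only for this example; it is not the library's source and handles NumPy arrays only.

```python
from enum import Enum

import numpy as np


class PoolingSketch(str, Enum):
    """Illustrative stand-in for the described enum; not the library source."""

    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray:
        # Dispatch on the member so PoolingSketch.CLS(array) works like a function.
        if self is PoolingSketch.CLS:
            return PoolingSketch.cls_pooling(array)
        return PoolingSketch.mean_pooling(array)

    @classmethod
    def cls_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:  # (batch, seq_len, dim) -> (batch, dim)
            return array[:, 0]
        if array.ndim == 2:  # (seq_len, dim) -> (dim,)
            return array[0]
        raise NotImplementedError(f"Unhandled shape {array.shape}.")

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray:
        if array.ndim == 3:  # average over the sequence axis
            return array.mean(axis=1)
        if array.ndim == 2:
            return array.mean(axis=0)
        raise NotImplementedError(f"Unhandled shape {array.shape}.")


# Because the enum subclasses str, a config string resolves to a member.
strategy = PoolingSketch("mean")
pooled = strategy(np.ones((2, 5, 8)))  # shape (2, 8)

# A 1D input is neither 2D nor 3D, so the sketch raises NotImplementedError.
try:
    PoolingSketch.CLS(np.ones(8))
    error_message = ""
except NotImplementedError as exc:
    error_message = str(exc)
```

Subclassing str also means a member compares equal to its value (PoolingSketch.CLS == "cls"), which can be convenient when the strategy is read from a configuration file.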

Usage

Use this enum when configuring or implementing embedding models that need to reduce variable-length token-level representations into fixed-size vectors. It is particularly useful when building custom embedding wrappers around HuggingFace or other transformer models where the pooling strategy is configurable.
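As a rough sketch of such a wrapper, the pooling strategy can be passed in as a callable. Everything named here (fake_token_encoder, embed, first_token, token_mean) is hypothetical and exists only for this example; the encoder is a deterministic stand-in for a real transformer forward pass, and plain NumPy functions stand in for the enum members so the block runs without llama_index installed.

```python
from typing import Callable, List

import numpy as np

# Hypothetical alias: any callable mapping (batch, seq_len, dim) -> (batch, dim).
PoolFn = Callable[[np.ndarray], np.ndarray]


def fake_token_encoder(texts: List[str], seq_len: int = 4, dim: int = 3) -> np.ndarray:
    """Hypothetical stand-in for a model that returns token-level embeddings."""
    total = len(texts) * seq_len * dim
    return np.arange(total, dtype=np.float64).reshape(len(texts), seq_len, dim)


def embed(texts: List[str], pooling: PoolFn) -> np.ndarray:
    """Encode texts to token embeddings, then reduce them with the given strategy."""
    token_embeddings = fake_token_encoder(texts)
    return pooling(token_embeddings)


def first_token(array: np.ndarray) -> np.ndarray:
    # Stand-in for CLS pooling on a batch.
    return array[:, 0]


def token_mean(array: np.ndarray) -> np.ndarray:
    # Stand-in for MEAN pooling on a batch.
    return array.mean(axis=1)


texts = ["first text", "second text"]
cls_vectors = embed(texts, first_token)   # shape (2, 3)
mean_vectors = embed(texts, token_mean)   # shape (2, 3)
```

Since the enum members are themselves callable, Pooling.CLS or Pooling.MEAN could be passed as the pooling argument in place of the stand-in functions.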

Code Reference

Source Location

Signature

class Pooling(str, Enum):
    CLS = "cls"
    MEAN = "mean"

    def __call__(self, array: np.ndarray) -> np.ndarray: ...

    @classmethod
    def cls_pooling(cls, array: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]: ...

    @classmethod
    def mean_pooling(cls, array: np.ndarray) -> np.ndarray: ...

Import

from llama_index.core.embeddings.pooling import Pooling

I/O Contract

Inputs

Name | Type | Required | Description
array | np.ndarray or torch.Tensor | Yes | A 2D array (single sequence of token embeddings) or 3D array (batch of sequences of token embeddings).

Outputs

Name | Type | Description
result | np.ndarray or torch.Tensor | The pooled embedding: a 1D vector (from 2D input) or a 2D batch of vectors (from 3D input).

Usage Examples

import numpy as np
from llama_index.core.embeddings.pooling import Pooling

# Token-level embeddings for a single sequence: (seq_len, embed_dim)
token_embeddings = np.random.rand(128, 768)

# CLS pooling: extract first token
cls_vector = Pooling.CLS(token_embeddings)  # shape: (768,)

# MEAN pooling: average all tokens
mean_vector = Pooling.MEAN(token_embeddings)  # shape: (768,)

# Batch of token embeddings: (batch_size, seq_len, embed_dim)
batch_embeddings = np.random.rand(4, 128, 768)
batch_cls = Pooling.CLS(batch_embeddings)    # shape: (4, 768)
batch_mean = Pooling.MEAN(batch_embeddings)  # shape: (4, 768)
