Principle:Ggml org Llama cpp Embedding Normalization

Field	Value
Principle Name	Embedding Normalization
Domain	Vector Mathematics, Similarity Metrics
Description	Theory of vector normalization and similarity metrics for embedding spaces: L2 norm, cosine similarity, and p-norm variants
Related Workflow	Embedding_Extraction

Overview

Description

The Embedding Normalization principle covers the mathematical theory behind normalizing embedding vectors and computing similarity metrics between them. Raw embedding vectors produced by transformer models have varying magnitudes that can distort similarity comparisons. Normalization projects these vectors onto a unit hypersphere (or scaled equivalent), ensuring that similarity metrics reflect semantic relatedness rather than magnitude differences.

The principle addresses:

L2 (Euclidean) normalization: Scaling vectors to unit length, the most common normalization for cosine similarity.
Max-absolute normalization: Scaling by the maximum absolute component value, useful for integer quantization.
P-norm generalization: Normalizing by the p-norm for arbitrary p values, generalizing both L1 and L2 norms.
No normalization: Passing raw embeddings through without modification, preserving the original magnitude information.
Cosine similarity computation: Computing the angle-based similarity between two vectors, the standard metric for semantic relatedness.

Usage

Normalization is applied as the final post-processing step after embedding extraction. It is essential for:

Making embedding vectors suitable for cosine similarity comparisons
Preparing embeddings for storage in vector databases that assume unit-norm vectors
Quantizing embeddings to integer representations for compact storage
Computing similarity matrices between collections of texts
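
As a rough illustration of the last item, the sketch below builds a pairwise similarity matrix by L2-normalizing each embedding once and then taking dot products. The helper name similarity_matrix and the sample vectors are hypothetical, chosen only for this example; they are not taken from the llama.cpp sources.

```python
import math

def similarity_matrix(embeddings):
    """Pairwise cosine similarities for a list of equal-length vectors."""
    unit = []
    for v in embeddings:
        n = math.sqrt(sum(x * x for x in v))
        # Assumption for this sketch: a zero vector is kept as-is.
        unit.append([x / n for x in v] if n > 0 else list(v))
    # After L2 normalization, each entry is just a dot product.
    return [[sum(x * y for x, y in zip(u, w)) for w in unit] for u in unit]

m = similarity_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
```

Normalizing once up front means each of the n * n entries costs a single dot product rather than a dot product plus two norm computations.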

Theoretical Basis

Vector normalization transforms a non-zero vector v into a vector v_hat with unit p-norm by dividing by that norm:

v_hat = v / ||v||_p

where ||v||_p is the p-norm of the vector. The choice of p determines the normalization behavior:

p = 2 (Euclidean/L2 norm): ||v||_2 = sqrt(sum(v_i^2)). This is the most common choice for embedding normalization. After L2 normalization, the dot product between two vectors equals their cosine similarity, simplifying downstream computations.

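p = 0 (max-absolute): the divisor is max(|v_i|) / 32760, so the largest-magnitude component maps to +/-32760, just inside the signed 16-bit range of -32768 to 32767. Here 0 is a mode selector rather than a true norm exponent, since the general p-norm formula is undefined at p = 0. This variant is designed for quantized storage where embeddings are converted to 16-bit integers.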

p = -1 (no normalization): The identity operation, returning the raw embedding values. Here -1 is a mode selector, not a norm exponent. Useful when the downstream consumer handles normalization itself or when magnitude information is meaningful.

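General p-norm: ||v||_p = (sum(|v_i|^p))^(1/p), defined for p >= 1. Setting p = 1 gives the L1 (taxicab) norm, the sum of absolute component values. Higher p values emphasize larger components, and as p grows the norm approaches max(|v_i|); lower p values spread influence more evenly across dimensions.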
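
The four cases above can be sketched as a single function that dispatches on a mode flag. This is a minimal illustration, not the llama.cpp implementation: the function name normalize, the choice to map a zero vector to zeros, and the final rounding step used to produce integers are all assumptions made for the example.

```python
def normalize(v, mode=2):
    """Normalize v according to a mode flag (hypothetical helper).

    mode == -1: return the values unchanged
    mode ==  0: scale so the largest |component| becomes 32760
    mode >=  1: divide by the p-norm with p = mode
    """
    if mode == -1:
        return list(v)
    if mode < 0:
        raise ValueError("unsupported mode")
    if mode == 0:
        divisor = max((abs(x) for x in v), default=0.0) / 32760.0
    else:
        divisor = sum(abs(x) ** mode for x in v) ** (1.0 / mode)
    if divisor == 0.0:
        # Assumption: a zero vector stays zero instead of dividing by zero.
        return [0.0 for _ in v]
    return [x / divisor for x in v]

v = [3.0, -4.0]
raw = normalize(v, -1)                   # untouched
l1 = normalize(v, 1)                     # absolute values sum to 1
l2 = normalize(v, 2)                     # Euclidean length is 1
q = [round(x) for x in normalize(v, 0)]  # rounding to ints is an assumption
```

With v = [3.0, -4.0], the L2 result is approximately [0.6, -0.8], and the max-absolute result rounds to [24570, -32760], which fits in a signed 16-bit integer.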

Cosine similarity measures the angle between two vectors:

cos_sim(a, b) = (a . b) / (||a||_2 * ||b||_2)

where a . b is the dot product. The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). For L2-normalized vectors both norms in the denominator equal 1, so cosine similarity reduces to the dot product and the square roots and division can be skipped.
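
The following sketch checks two properties of the formula numerically: the full formula agrees with the dot product of L2-normalized copies, and rescaling an input does not change the result. The function names cosine_similarity and unit are illustrative, and the code assumes neither input is a zero vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; assumes neither is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def unit(v):
    """L2-normalized copy of a non-zero vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
full = cosine_similarity(a, b)
shortcut = sum(x * y for x, y in zip(unit(a), unit(b)))  # dot product only
scaled = cosine_similarity([10 * x for x in a], b)       # magnitude ignored
```

Here full, shortcut, and scaled agree up to floating-point rounding, which is why normalized embeddings can be compared with a plain dot product.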

Edge case handling is important for robustness:

Zero vectors: When one or both vectors are all zeros (which can occur with padding or degenerate inputs), the similarity function must handle the division-by-zero gracefully. The convention adopted is that two zero vectors have similarity 1.0 (identical), while a zero vector and a non-zero vector have similarity 0.0.
Numerical precision: The computation uses double-precision accumulation for the dot products and norms to minimize floating-point error, even though the input and output vectors are single-precision.
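
A minimal sketch of the zero-vector convention described above follows. The name cosine_similarity_safe is hypothetical. Python floats are already double precision, so the accumulation here is in doubles by default; a C or C++ version would need explicit double accumulators to match.

```python
import math

def cosine_similarity_safe(a, b):
    """Cosine similarity with the zero-vector convention from the text."""
    dot = sum(x * y for x, y in zip(a, b))
    sq_a = sum(x * x for x in a)
    sq_b = sum(y * y for y in b)
    if sq_a == 0.0 or sq_b == 0.0:
        # Two zero vectors count as identical; zero vs non-zero as unrelated.
        return 1.0 if (sq_a == 0.0 and sq_b == 0.0) else 0.0
    return dot / (math.sqrt(sq_a) * math.sqrt(sq_b))

both_zero = cosine_similarity_safe([0.0, 0.0], [0.0, 0.0])
one_zero = cosine_similarity_safe([0.0, 0.0], [1.0, 2.0])
regular = cosine_similarity_safe([1.0, 0.0], [1.0, 0.0])
```

Checking the squared norms before taking square roots keeps the guard exact: a sum of squares is zero only when every component is zero.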

Related Pages

Implementation:Ggml_org_Llama_cpp_Common_Embd_Normalize
