Principle: OpenAI Python Embedding Generation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Embeddings, Semantic_Search |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A vector representation technique that maps text into dense numerical vectors for semantic similarity, search, clustering, and classification tasks.
Description
Embedding generation transforms text into fixed-dimensional float vectors that capture semantic meaning. Similar texts produce vectors with high cosine similarity. Modern embedding models (text-embedding-3-*) support configurable output dimensions, enabling tradeoffs between vector quality and storage/compute costs. Embeddings are the foundation for semantic search, RAG (retrieval-augmented generation), clustering, and text classification.
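As a concrete sketch of the generation step, the helper below wraps an OpenAI-style client call. It assumes the official `openai` Python SDK (where a client exposes `embeddings.create(...)` and the vector comes back in `response.data[0].embedding`); the model name and dimension count are illustrative defaults, not prescribed values.

```python
# Minimal sketch of embedding generation via an OpenAI-style client.
# Assumes the official `openai` SDK (client = openai.OpenAI()); the
# model name and dimension count are illustrative defaults.
def embed(client, text, model="text-embedding-3-small", dimensions=256):
    """Return a dense float vector for `text`.

    `client` is expected to expose `embeddings.create(...)` as the
    openai SDK does; the vector is carried in
    `response.data[0].embedding`.
    """
    response = client.embeddings.create(
        model=model,
        input=text,
        dimensions=dimensions,  # supported by text-embedding-3-* models
    )
    return response.data[0].embedding
```

With a real client this would be called as `embed(OpenAI(), "hello world")`, yielding a `list[float]` of length 256 under these defaults.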
Usage
Use this principle when you need to convert text into numerical representations for downstream tasks: semantic search, similarity comparison, clustering, or as input features for ML models. Choose model and dimensions based on quality vs. cost tradeoffs.
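The semantic-search use case above can be illustrated end to end with a toy ranking function: documents are ordered by cosine similarity to a query vector. The 3-d vectors here are made up for the example; in practice they would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors: docs 0 and 1 point in nearly the same direction as the
# query; doc 2 is orthogonal to it.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]
order = rank(query, docs)  # doc 2 ranks last
```

The same pattern scales to real search: embed the corpus once, embed each query at request time, and rank by similarity.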
Theoretical Basis
Embedding generation applies a Transformer encoder to map text to a dense vector:
# Text to vector mapping
vector = embed(text, model=model, dimensions=d)
# vector: list[float] of length d
# Semantic similarity via cosine similarity
similarity = cosine_similarity(vector_a, vector_b)
# Range: -1 to 1 (higher = more similar)
# Cosine similarity formula
cos_sim = dot(a, b) / (norm(a) * norm(b))
The text-embedding-3-* models use Matryoshka Representation Learning, which allows the output vector to be truncated to fewer dimensions while preserving most of the semantic information.
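This truncation can be sketched directly: keep the first d components and L2-renormalize so cosine similarity remains well behaved. The 8-d vector below is a toy stand-in for a real embedding.

```python
import math

def truncate(vector, d):
    """Matryoshka-style reduction: keep the first d dimensions,
    then rescale to unit length."""
    head = vector[:d]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-d "embedding"; a real one would come from the model.
full = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
short = truncate(full, 4)  # 4-d unit vector
```

Requesting fewer dimensions via the API's `dimensions` parameter achieves the same effect server-side, trading a small amount of quality for lower storage and compute cost.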