Principle: Microsoft Semantic Kernel Embedding Generation
Overview
The Embedding Generation principle describes how text (or other unstructured data) is converted into dense vector representations (embeddings) that enable semantic similarity operations. In the Semantic Kernel vector store pipeline, embedding generation is the critical transformation step that bridges human-readable content and machine-comparable vectors.
Embeddings are fixed-length arrays of floating-point numbers produced by a neural network model. Two pieces of text that are semantically similar will produce embedding vectors that are geometrically close in the vector space, enabling similarity search without keyword matching.
Motivation
Traditional text search relies on exact or fuzzy keyword matching. This approach fails when:
- The query uses different words than the stored content (synonym problem)
- The meaning is expressed in a structurally different way (paraphrase problem)
- The relationship is conceptual rather than lexical (semantic gap problem)
Embedding generation solves these problems by mapping text into a continuous vector space where semantic meaning is encoded as geometric position. In this space, "automobile" and "car" produce nearly identical vectors, even though they share no characters.
Core Concepts
Embedding Models
An embedding model is a neural network that accepts text input and produces a fixed-dimensional vector output. Key properties of embedding models:
- Dimensionality: The number of elements in the output vector (e.g., 1536 for OpenAI's text-embedding-ada-002)
- Context window: The maximum number of tokens the model can process in a single input
- Semantic fidelity: How well the vector captures the meaning of the input text
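The first two properties are structural and can be sketched with a toy model. The class below is purely illustrative (a hash-based stand-in, not a neural network, so its vectors carry no semantic meaning): it shows fixed output dimensionality and truncation at the context window.

```python
import hashlib
import math

# Toy stand-in for an embedding model (hypothetical; real models such as
# text-embedding-ada-002 are neural networks). It illustrates fixed
# dimensionality and a bounded context window only.
class ToyEmbeddingModel:
    def __init__(self, dimensions: int = 8, context_window: int = 32):
        self.dimensions = dimensions          # length of every output vector
        self.context_window = context_window  # max tokens processed per input

    def embed(self, text: str) -> list[float]:
        tokens = text.split()[: self.context_window]  # truncate to the window
        digest = hashlib.sha256(" ".join(tokens).encode()).digest()
        vec = [b / 255.0 for b in digest[: self.dimensions]]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]        # unit-length, fixed dimension

model = ToyEmbeddingModel(dimensions=8)
v = model.embed("hello world, this is a longer input than the window allows")
print(len(v))  # always 8, regardless of input length
```

Whatever the input length, the output always has exactly `dimensions` elements; semantic fidelity is the one property a toy like this cannot demonstrate.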
The IEmbeddingGenerator Abstraction
Semantic Kernel defines the IEmbeddingGenerator<TInput, TEmbedding> interface as a provider-agnostic abstraction over embedding models. This interface:
- Decouples application code from specific embedding providers (OpenAI, Azure, Hugging Face, etc.)
- Provides a consistent GenerateAsync method regardless of the underlying model
- Supports dependency injection for testability and configuration
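The shape of this abstraction can be sketched in Python (this is an analogue, not the actual C# `IEmbeddingGenerator<TInput, TEmbedding>` interface): callers depend only on a protocol with one async generation method, so a stub provider can be injected for tests exactly as a real provider would be at runtime.

```python
import asyncio
from typing import Protocol

# Python analogue (hypothetical names) of a provider-agnostic embedding
# generator: one generate_async method, no provider-specific surface.
class EmbeddingGenerator(Protocol):
    async def generate_async(self, inputs: list[str]) -> list[list[float]]: ...

class FakeGenerator:
    """A stub provider, injected in tests instead of OpenAI/Azure/etc."""
    async def generate_async(self, inputs: list[str]) -> list[list[float]]:
        return [[float(len(text)), 0.0, 0.0] for text in inputs]

async def index_documents(gen: EmbeddingGenerator, docs: list[str]):
    # Application code sees only the abstraction, never the provider.
    return await gen.generate_async(docs)

vectors = asyncio.run(index_documents(FakeGenerator(), ["a", "bb"]))
print(vectors)  # [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
```

Swapping `FakeGenerator` for a real provider changes no line of `index_documents`, which is the point of the decoupling.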
Vector as a Semantic Fingerprint
Each embedding vector can be thought of as a semantic fingerprint of the input text. Two important properties:
- Consistency: The same input always produces the same (or very similar) output from a given model
- Locality: Semantically similar inputs produce vectors that are close together in the vector space, as measured by cosine similarity or Euclidean distance
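Cosine similarity, the measure named above, reduces locality to a single number: vectors pointing in the same direction score near 1.0, unrelated directions near 0.0. The sketch below uses small hand-made vectors as stand-ins for real embeddings.

```python
import math

# Cosine similarity: dot product of the vectors divided by the product
# of their lengths; 1.0 means identical direction, 0.0 means orthogonal.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings": "automobile" and "car" point
# in nearly the same direction; "banana" points elsewhere.
automobile = [0.9, 0.1, 0.0]
car        = [0.8, 0.2, 0.1]
banana     = [0.0, 0.1, 0.9]

print(round(cosine_similarity(automobile, car), 3))     # 0.984
print(round(cosine_similarity(automobile, banana), 3))  # 0.012
```

Real embeddings behave the same way, only in hundreds or thousands of dimensions rather than three.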
When Embeddings Are Generated
In the RAG pipeline, embeddings are generated at two distinct points:
- Ingestion time: Each record's text content is embedded and stored alongside the record in the vector store. This is a one-time cost per record.
- Query time: The user's search query is embedded using the same model to produce a query vector. This vector is then compared against stored vectors.
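The two points can be sketched end to end (helper names here are hypothetical, and the "model" is a trivial vowel-counting stand-in): each record is embedded once at ingestion, and the query is embedded with the same function at search time so both vectors live in one space.

```python
# Minimal sketch of the two embedding points in a RAG pipeline.
def embed(text: str) -> list[float]:
    # stand-in for a real model call; deterministic for the demo
    return [float(text.count(c)) for c in "aeiou"]

store: list[tuple[str, list[float]]] = []

def ingest(record_text: str) -> None:          # ingestion time: one-time cost
    store.append((record_text, embed(record_text)))

def search(query: str) -> str:                 # query time: SAME model
    qvec = embed(query)
    def dist(v): return sum((x - y) ** 2 for x, y in zip(qvec, v))
    return min(store, key=lambda rec: dist(rec[1]))[0]

ingest("a cat sat on a mat")
ingest("queueing theory")
print(search("black cat"))  # nearest stored record by vector distance
```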
It is critical that the same embedding model is used for both ingestion and query. Using different models produces incompatible vector spaces and will yield meaningless search results.
Design Principles
Model Consistency
The embedding model used at query time must be the same model used at ingestion time. If the model is changed, all previously stored embeddings must be regenerated. This is a fundamental constraint of vector similarity search.
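One defensive pattern (an assumption on my part, not a built-in Semantic Kernel feature) is to tag the collection with the identifier of the model that produced its vectors and fail fast on a mismatch, rather than silently returning meaningless results:

```python
# Sketch: enforce model consistency by recording which model a
# collection's vectors came from and rejecting any other model.
class VectorCollection:
    def __init__(self, model_id: str):
        self.model_id = model_id   # the one model this collection accepts
        self.records: list[list[float]] = []

    def upsert(self, vec: list[float], model_id: str) -> None:
        if model_id != self.model_id:
            raise ValueError(
                f"collection embedded with {self.model_id}, got {model_id}"
            )
        self.records.append(vec)

col = VectorCollection("text-embedding-ada-002")
col.upsert([0.1, 0.2], "text-embedding-ada-002")        # accepted
try:
    col.upsert([0.1, 0.2], "text-embedding-3-large")    # incompatible space
except ValueError as e:
    print(e)
```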
Asynchronous Generation
Embedding generation involves a network call to an AI service (or a local model inference), so the API is fully asynchronous. The GenerateAsync method returns a Task, allowing efficient use of I/O resources and integration with async/await patterns.
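The benefit of the asynchronous shape is easiest to see with simulated latency (a Python sketch; `generate_async` here is a stand-in, not the C# `GenerateAsync` signature): two independent requests overlap instead of running back to back.

```python
import asyncio

# Simulated embedding call: the await yields during the "network" delay,
# freeing the event loop for other work.
async def generate_async(texts: list[str]) -> list[list[float]]:
    await asyncio.sleep(0.01)              # simulated service latency
    return [[float(len(t))] for t in texts]

async def main():
    # Both calls are in flight at once instead of serialized.
    a, b = await asyncio.gather(
        generate_async(["first batch"]),
        generate_async(["second batch"]),
    )
    return a + b

print(asyncio.run(main()))  # [[11.0], [12.0]]
```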
Batch Processing
The GenerateAsync method accepts collections of inputs, enabling batch embedding generation for efficiency. When ingesting many records, batching reduces the number of network round-trips to the embedding service.
Dimensional Alignment
The dimensionality of generated embeddings must match the Dimensions parameter specified in the data model's [VectorStoreVector] attribute:
| Data Model Declaration | Required Embedding Model Output |
|---|---|
| [VectorStoreVector(Dimensions: 1536)] | 1536-dimensional vector |
| [VectorStoreVector(Dimensions: 3072)] | 3072-dimensional vector |
A mismatch between the declared dimensions and the actual embedding size will cause runtime errors during upsert operations.
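A cheap guard (a sketch of one way to surface the problem early, not an SK mechanism) is to validate the embedding length against the declared dimensions before upserting, turning a confusing store-side failure into an immediate, descriptive error:

```python
# Mirrors a hypothetical model declared with [VectorStoreVector(Dimensions: 1536)].
DECLARED_DIMENSIONS = 1536

def validate_dimensions(embedding: list[float],
                        declared: int = DECLARED_DIMENSIONS) -> list[float]:
    if len(embedding) != declared:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"data model declares {declared}"
        )
    return embedding

validate_dimensions([0.0] * 1536)        # matches the declaration: accepted
try:
    validate_dimensions([0.0] * 3072)    # output of a mismatched model
except ValueError as e:
    print(e)
```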
Relationship to Other Principles
- Vector Store Data Model declares the vector property that receives the generated embedding
- Data Ingestion stores the generated embeddings into the vector store
- Vector Similarity Search uses query-time embeddings to find similar records
- RAG Chat Augmentation uses the full embed-search-augment pipeline
Implementation:Microsoft_Semantic_kernel_IEmbeddingGenerator_GenerateAsync