Principle: Microsoft Semantic Kernel Embedding Generation
Overview
The Embedding Generation principle describes how text (or other unstructured data) is converted into dense vector representations (embeddings) that enable semantic similarity operations. In the Semantic Kernel vector store pipeline, embedding generation is the critical transformation step that bridges human-readable content and machine-comparable vectors.
Embeddings are fixed-length arrays of floating-point numbers produced by a neural network model. Two pieces of text that are semantically similar will produce embedding vectors that are geometrically close in the vector space, enabling similarity search without keyword matching.
Motivation
Traditional text search relies on exact or fuzzy keyword matching. This approach fails when:
- The query uses different words than the stored content (synonym problem)
- The meaning is expressed in a structurally different way (paraphrase problem)
- The relationship is conceptual rather than lexical (semantic gap problem)
Embedding generation solves these problems by mapping text into a continuous vector space where semantic meaning is encoded as geometric position. In this space, "automobile" and "car" produce nearly identical vectors, even though they share no characters.
Core Concepts
Embedding Models
An embedding model is a neural network that accepts text input and produces a fixed-dimensional vector output. Key properties of embedding models:
- Dimensionality: The number of elements in the output vector (e.g., 1536 for OpenAI's text-embedding-ada-002)
- Context window: The maximum number of tokens the model can process in a single input
- Semantic fidelity: How well the vector captures the meaning of the input text
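The first two properties are structural and can be sketched with a toy model. The class below is purely illustrative (a hash-based stand-in, not a neural network, so its vectors carry no semantic meaning): it shows fixed output dimensionality and truncation at the context window.

```python
import hashlib
import math

# Toy stand-in for an embedding model (hypothetical; real models such as
# text-embedding-ada-002 are neural networks). It illustrates fixed
# dimensionality and a bounded context window only.
class ToyEmbeddingModel:
    def __init__(self, dimensions: int = 8, context_window: int = 32):
        self.dimensions = dimensions          # length of every output vector
        self.context_window = context_window  # max tokens processed per input

    def embed(self, text: str) -> list[float]:
        tokens = text.split()[: self.context_window]  # truncate to the window
        digest = hashlib.sha256(" ".join(tokens).encode()).digest()
        vec = [b / 255.0 for b in digest[: self.dimensions]]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]        # unit-length, fixed dimension

model = ToyEmbeddingModel(dimensions=8)
v = model.embed("hello world, this is a longer input than the window allows")
print(len(v))  # always 8, regardless of input length
```

Whatever the input length, the output always has exactly `dimensions` elements; semantic fidelity is the one property a toy like this cannot demonstrate.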
The IEmbeddingGenerator Abstraction
Semantic Kernel defines the IEmbeddingGenerator<TInput, TEmbedding> interface as a provider-agnostic abstraction over embedding models. This interface:
- Decouples application code from specific embedding providers (OpenAI, Azure, Hugging Face, etc.)
- Provides a consistent GenerateAsync method regardless of the underlying model
- Supports dependency injection for testability and configuration
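The shape of this abstraction can be sketched in Python (this is an analogue, not the actual C# `IEmbeddingGenerator<TInput, TEmbedding>` interface): callers depend only on a protocol with one async generation method, so a stub provider can be injected for tests exactly as a real provider would be at runtime.

```python
import asyncio
from typing import Protocol

# Python analogue (hypothetical names) of a provider-agnostic embedding
# generator: one generate_async method, no provider-specific surface.
class EmbeddingGenerator(Protocol):
    async def generate_async(self, inputs: list[str]) -> list[list[float]]: ...

class FakeGenerator:
    """A stub provider, injected in tests instead of OpenAI/Azure/etc."""
    async def generate_async(self, inputs: list[str]) -> list[list[float]]:
        return [[float(len(text)), 0.0, 0.0] for text in inputs]

async def index_documents(gen: EmbeddingGenerator, docs: list[str]):
    # Application code sees only the abstraction, never the provider.
    return await gen.generate_async(docs)

vectors = asyncio.run(index_documents(FakeGenerator(), ["a", "bb"]))
print(vectors)  # [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
```

Swapping `FakeGenerator` for a real provider changes no line of `index_documents`, which is the point of the decoupling.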
Vector as a Semantic Fingerprint
Each embedding vector can be thought of as a semantic fingerprint of the input text. Two important properties:
- Consistency: The same input always produces the same (or very similar) output from a given model
- Locality: Semantically similar inputs produce vectors that are close together in the vector space, as measured by cosine similarity or Euclidean distance
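Cosine similarity, the measure named above, reduces locality to a single number: vectors pointing in the same direction score near 1.0, unrelated directions near 0.0. The sketch below uses small hand-made vectors as stand-ins for real embeddings.

```python
import math

# Cosine similarity: dot product of the vectors divided by the product
# of their lengths; 1.0 means identical direction, 0.0 means orthogonal.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings": "automobile" and "car" point
# in nearly the same direction; "banana" points elsewhere.
automobile = [0.9, 0.1, 0.0]
car        = [0.8, 0.2, 0.1]
banana     = [0.0, 0.1, 0.9]

print(round(cosine_similarity(automobile, car), 3))     # 0.984
print(round(cosine_similarity(automobile, banana), 3))  # 0.012
```

Real embeddings behave the same way, only in hundreds or thousands of dimensions rather than three.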
When Embeddings Are Generated
In the RAG pipeline, embeddings are generated at two distinct points:
- Ingestion time: Each record's text content is embedded and stored alongside the record in the vector store. This is a one-time cost per record.
- Query time: The user's search query is embedded using the same model to produce a query vector. This vector is then compared against stored vectors.
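The two points can be sketched end to end (helper names here are hypothetical, and the "model" is a trivial vowel-counting stand-in): each record is embedded once at ingestion, and the query is embedded with the same function at search time so both vectors live in one space.

```python
# Minimal sketch of the two embedding points in a RAG pipeline.
def embed(text: str) -> list[float]:
    # stand-in for a real model call; deterministic for the demo
    return [float(text.count(c)) for c in "aeiou"]

store: list[tuple[str, list[float]]] = []

def ingest(record_text: str) -> None:          # ingestion time: one-time cost
    store.append((record_text, embed(record_text)))

def search(query: str) -> str:                 # query time: SAME model
    qvec = embed(query)
    def dist(v): return sum((x - y) ** 2 for x, y in zip(qvec, v))
    return min(store, key=lambda rec: dist(rec[1]))[0]

ingest("a cat sat on a mat")
ingest("queueing theory")
print(search("black cat"))  # nearest stored record by vector distance
```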
It is critical that the same embedding model is used for both ingestion and query. Using different models produces incompatible vector spaces and will yield meaningless search results.
Design Principles
Model Consistency
The embedding model used at query time must be the same model used at ingestion time. If the model is changed, all previously stored embeddings must be regenerated. This is a fundamental constraint of vector similarity search.
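One defensive pattern (an assumption on my part, not a built-in Semantic Kernel feature) is to tag the collection with the identifier of the model that produced its vectors and fail fast on a mismatch, rather than silently returning meaningless results:

```python
# Sketch: enforce model consistency by recording which model a
# collection's vectors came from and rejecting any other model.
class VectorCollection:
    def __init__(self, model_id: str):
        self.model_id = model_id   # the one model this collection accepts
        self.records: list[list[float]] = []

    def upsert(self, vec: list[float], model_id: str) -> None:
        if model_id != self.model_id:
            raise ValueError(
                f"collection embedded with {self.model_id}, got {model_id}"
            )
        self.records.append(vec)

col = VectorCollection("text-embedding-ada-002")
col.upsert([0.1, 0.2], "text-embedding-ada-002")        # accepted
try:
    col.upsert([0.1, 0.2], "text-embedding-3-large")    # incompatible space
except ValueError as e:
    print(e)
```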
Asynchronous Generation
Embedding generation involves a network call to an AI service (or a local model inference), so the API is fully asynchronous. The GenerateAsync method returns a Task, allowing efficient use of I/O resources and integration with async/await patterns.
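The benefit of the asynchronous shape is easiest to see with simulated latency (a Python sketch; `generate_async` here is a stand-in, not the C# `GenerateAsync` signature): two independent requests overlap instead of running back to back.

```python
import asyncio

# Simulated embedding call: the await yields during the "network" delay,
# freeing the event loop for other work.
async def generate_async(texts: list[str]) -> list[list[float]]:
    await asyncio.sleep(0.01)              # simulated service latency
    return [[float(len(t))] for t in texts]

async def main():
    # Both calls are in flight at once instead of serialized.
    a, b = await asyncio.gather(
        generate_async(["first batch"]),
        generate_async(["second batch"]),
    )
    return a + b

print(asyncio.run(main()))  # [[11.0], [12.0]]
```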
Batch Processing
The GenerateAsync method accepts collections of inputs, enabling batch embedding generation for efficiency. When ingesting many records, batching reduces the number of network round-trips to the embedding service.
Dimensional Alignment
The dimensionality of generated embeddings must match the Dimensions parameter specified in the data model's [VectorStoreVector] attribute:
| Data Model Declaration | Required Embedding Model Output |
|---|---|
| [VectorStoreVector(Dimensions: 1536)] | 1536-dimensional vector |
| [VectorStoreVector(Dimensions: 3072)] | 3072-dimensional vector |
A mismatch between the declared dimensions and the actual embedding size will cause runtime errors during upsert operations.
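A cheap guard (a sketch of one way to surface the problem early, not an SK mechanism) is to validate the embedding length against the declared dimensions before upserting, turning a confusing store-side failure into an immediate, descriptive error:

```python
# Mirrors a hypothetical model declared with [VectorStoreVector(Dimensions: 1536)].
DECLARED_DIMENSIONS = 1536

def validate_dimensions(embedding: list[float],
                        declared: int = DECLARED_DIMENSIONS) -> list[float]:
    if len(embedding) != declared:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"data model declares {declared}"
        )
    return embedding

validate_dimensions([0.0] * 1536)        # matches the declaration: accepted
try:
    validate_dimensions([0.0] * 3072)    # output of a mismatched model
except ValueError as e:
    print(e)
```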
Relationship to Other Principles
- Vector Store Data Model declares the vector property that receives the generated embedding
- Data Ingestion stores the generated embeddings into the vector store
- Vector Similarity Search uses query-time embeddings to find similar records
- RAG Chat Augmentation uses the full embed-search-augment pipeline
Implementation:Microsoft_Semantic_kernel_IEmbeddingGenerator_GenerateAsync