Workflow: Microsoft Semantic Kernel Vector Store RAG Pipeline
| Knowledge Sources | |
|---|---|
| Domains | RAG, Vector_Stores, Embeddings, Information_Retrieval |
| Last Updated | 2026-02-11 18:00 GMT |
Overview
End-to-end process for implementing Retrieval-Augmented Generation (RAG) using Semantic Kernel's vector store abstractions, covering data ingestion, embedding generation, vector search with filtering, and augmented chat completion.
Description
This workflow demonstrates how to build a RAG pipeline using Semantic Kernel's vector data layer. RAG enhances AI responses by retrieving relevant information from a knowledge base before generating a response. The workflow covers the complete pipeline: creating a vector store and collection, generating text embeddings using an embedding service, ingesting data (text documents, PDFs, or structured records) into the vector store, performing similarity searches with optional metadata filtering, and augmenting chat completion prompts with retrieved context. The vector store abstraction supports multiple backends including InMemory, Azure AI Search, Cosmos DB, Qdrant, Redis, Weaviate, Pinecone, Elasticsearch, and SQL Server.
Usage
Execute this workflow when you need the AI model to answer questions using information that is not in its training data, such as proprietary documents, product catalogs, knowledge bases, or frequently updated content. This is the standard approach for grounding AI responses in factual, domain-specific information while reducing hallucination. Use it whenever you have a corpus of documents and need accurate, sourced AI responses.
Execution Steps
Step 1: Define the Data Model
Create a C# record or class that represents the data to be stored in the vector collection. The model must include a key field, a vector embedding field, and any metadata fields used for filtering or display.
Key considerations:
- The key field uniquely identifies each record (typically a string or GUID)
- The embedding field stores the vector representation (ReadOnlyMemory<float>)
- Metadata fields (category, title, source) enable pre-filtering during search
- Attributes or fluent configuration map fields to vector store schema
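As a sketch, a glossary record covering these points might be annotated with the Microsoft.Extensions.VectorData attributes (attribute names have changed across package releases, and the field names `Term`, `Definition`, and `DefinitionEmbedding` are illustrative assumptions):

```csharp
using System;
using Microsoft.Extensions.VectorData;

// Illustrative data model for a "glossary" collection (a sketch, not the
// only valid shape); verify attribute names against your package version.
public sealed class Glossary
{
    [VectorStoreKey]                      // unique record identifier
    public string Key { get; set; } = string.Empty;

    [VectorStoreData(IsIndexed = true)]   // metadata field, filterable in search
    public string Category { get; set; } = string.Empty;

    [VectorStoreData]
    public string Term { get; set; } = string.Empty;

    [VectorStoreData]
    public string Definition { get; set; } = string.Empty;

    [VectorStoreVector(1536)]             // dimension must match the embedding model
    public ReadOnlyMemory<float> DefinitionEmbedding { get; set; }
}
```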
Step 2: Create a Vector Store and Collection
Instantiate a vector store implementation (e.g., InMemoryVectorStore for development, or a production store like Qdrant or Azure AI Search). Then get or create a named collection within the store, specifying the key type and record type.
Pseudocode:
vectorStore = new InMemoryVectorStore()
collection = vectorStore.GetCollection<string, Glossary>("glossary")
await collection.EnsureCollectionExistsAsync()
Key considerations:
- The IVectorStore abstraction enables swapping backends without code changes
- GetCollection<TKey, TRecord>() returns a typed collection handle
- EnsureCollectionExistsAsync() creates the collection if it does not exist
- Supported backends: InMemory, AzureAISearch, CosmosDB, Qdrant, Redis, Weaviate, Pinecone, Elasticsearch, SQL Server
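The pseudocode above maps closely onto the .NET API. A minimal sketch using the in-memory connector (exact namespace and type names vary between package versions):

```csharp
using Microsoft.SemanticKernel.Connectors.InMemory;

// Development-time store; swap for Qdrant, Azure AI Search, etc. in
// production without changing the calling code.
var vectorStore = new InMemoryVectorStore();

// Typed handle: TKey = string, TRecord = Glossary (the Step 1 data model).
var collection = vectorStore.GetCollection<string, Glossary>("glossary");

// Creates the collection if it does not already exist; no-op otherwise.
await collection.EnsureCollectionExistsAsync();
```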
Step 3: Generate Embeddings
Use an embedding generation service (such as OpenAI text-embedding-ada-002 or Azure OpenAI) to convert text content into vector representations. Each text field that should be searchable must be embedded.
Key considerations:
- IEmbeddingGenerator<string, Embedding<float>> is the service interface
- GenerateAsync() produces a vector from input text
- Embedding dimension must match the vector store collection configuration
- Batch embedding generation improves throughput for large datasets
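A sketch of single-input embedding via the `IEmbeddingGenerator` abstraction, assuming a generator obtained from an OpenAI or Azure OpenAI connector (the wiring is not shown):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

static async Task<ReadOnlyMemory<float>> EmbedAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator, string text)
{
    // GenerateAsync produces an Embedding<float>; its Vector property holds
    // the ReadOnlyMemory<float> stored in the record's embedding field.
    Embedding<float> embedding = await generator.GenerateAsync(text);
    return embedding.Vector;
}
```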
Step 4: Ingest Data Into the Collection
Generate embeddings for each record's text content and store the records in the vector collection using the upsert operation. Upsert creates new records or updates existing ones based on the key.
Pseudocode:
foreach record in records:
    record.Embedding = await embeddingGenerator.GenerateAsync(record.Text)
await collection.UpsertAsync(records)
Key considerations:
- Embeddings should be generated before upserting
- Parallel embedding generation improves ingestion speed
- UpsertAsync handles both insert and update operations
- Records can be verified by retrieving them with GetAsync()
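The ingestion loop might look as follows in C# (a sketch assuming the Step 1 model; the collection handle type name varies across package versions):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;

static async Task IngestAsync(
    VectorStoreCollection<string, Glossary> collection,
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    IReadOnlyList<Glossary> records)
{
    // Embed each record's searchable text before upserting.
    foreach (var record in records)
    {
        var embedding = await embeddingGenerator.GenerateAsync(record.Definition);
        record.DefinitionEmbedding = embedding.Vector;
    }

    // Upsert inserts new records or overwrites existing ones with the same key.
    await collection.UpsertAsync(records);
}
```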
Step 5: Perform Vector Similarity Search
Convert a user query into an embedding and search the collection for the most similar records. The search returns records ranked by similarity score, optionally filtered by metadata predicates.
Pseudocode:
queryVector = await embeddingGenerator.GenerateAsync(userQuery)
results = await collection.SearchAsync(queryVector, top: 3)
Key considerations:
- SearchAsync returns VectorSearchResult<T> with Record and Score properties
- The top parameter controls how many results to return
- Score interpretation depends on the configured distance function: for cosine similarity, higher scores indicate greater similarity, while for raw distance metrics lower values mean closer matches
- Results are ordered by relevance
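As a sketch, the search step in C# (assuming the Step 1 model; `SearchAsync` streams results as an async enumerable in recent package versions):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;

static async Task SearchGlossaryAsync(
    VectorStoreCollection<string, Glossary> collection,
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    string userQuery)
{
    // Embed the query with the same model used at ingestion time.
    var queryEmbedding = await embeddingGenerator.GenerateAsync(userQuery);

    // Each result carries the matched Record and its similarity Score.
    await foreach (var result in collection.SearchAsync(queryEmbedding.Vector, top: 3))
    {
        Console.WriteLine($"{result.Score:F4}  {result.Record.Term}");
    }
}
```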
Step 6: Apply Metadata Filters (Optional)
Add pre-filtering to narrow the search space before vector similarity comparison. Filters use LINQ-style lambda expressions on metadata fields.
Pseudocode:
results = await collection.SearchAsync(queryVector, top: 3,
new VectorSearchOptions { Filter = record => record.Category == "AI" })
Key considerations:
- Filters are applied before similarity ranking, reducing the candidate set
- Filter expressions reference metadata fields on the record type
- Multiple filter conditions can be combined
- Not all vector store backends support all filter operations
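A sketch of the filtered variant; in recent package versions the options type is generic over the record type, and multiple conditions combine with standard C# operators:

```csharp
// Pre-filter on metadata before similarity ranking (sketch; the options
// type name and genericity vary across package versions).
var options = new VectorSearchOptions<Glossary>
{
    Filter = record => record.Category == "AI" && record.Term != ""
};

await foreach (var result in collection.SearchAsync(queryVector, top: 3, options))
{
    Console.WriteLine($"{result.Score:F4}  {result.Record.Term}");
}
```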
Step 7: Augment Chat Completion With Retrieved Context
Combine the retrieved documents with the user's question in a prompt template. The template injects the relevant context into the system or user message, enabling the AI model to generate a grounded, accurate response.
Key considerations:
- Retrieved text is injected into the prompt as context
- The prompt instructs the model to answer based on the provided context
- Source attribution can be included by passing record metadata
- The augmented prompt is then sent to the kernel's chat completion service
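A sketch of the augmentation step using Semantic Kernel's chat completion abstractions (the prompt wording and the `results`, `userQuery`, and `chatService` variables are illustrative assumptions carried over from the earlier steps):

```csharp
using System;
using System.Linq;
using Microsoft.SemanticKernel.ChatCompletion;

// 'results' holds the retrieved VectorSearchResult<Glossary> records.
var context = string.Join("\n", results.Select(r =>
    $"- {r.Record.Term}: {r.Record.Definition} (source: {r.Record.Key})"));

var history = new ChatHistory();
history.AddSystemMessage(
    "Answer the question using only the provided context. " +
    "If the context is insufficient, say so.\n\nContext:\n" + context);
history.AddUserMessage(userQuery);

// chatService is an IChatCompletionService resolved from the kernel.
var reply = await chatService.GetChatMessageContentAsync(history);
Console.WriteLine(reply.Content);
```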