Workflow: Microsoft Semantic Kernel Vector Store RAG Pipeline
| Knowledge Sources | |
|---|---|
| Domains | RAG, Vector_Stores, Embeddings, Information_Retrieval |
| Last Updated | 2026-02-11 18:00 GMT |
Overview
End-to-end process for implementing Retrieval-Augmented Generation (RAG) using Semantic Kernel's vector store abstractions, covering data ingestion, embedding generation, vector search with filtering, and augmented chat completion.
Description
This workflow demonstrates how to build a RAG pipeline using Semantic Kernel's vector data layer. RAG enhances AI responses by retrieving relevant information from a knowledge base before generating a response. The workflow covers the complete pipeline: creating a vector store and collection, generating text embeddings using an embedding service, ingesting data (text documents, PDFs, or structured records) into the vector store, performing similarity searches with optional metadata filtering, and augmenting chat completion prompts with retrieved context. The vector store abstraction supports multiple backends including InMemory, Azure AI Search, Cosmos DB, Qdrant, Redis, Weaviate, Pinecone, Elasticsearch, and SQL Server.
Usage
Execute this workflow when you need the AI model to answer questions using information that is not in its training data, such as proprietary documents, product catalogs, knowledge bases, or frequently updated content. This is the standard approach for grounding AI responses in factual, domain-specific information while reducing hallucination. Use it whenever you have a corpus of documents and need accurate, sourced AI responses.
Execution Steps
Step 1: Define the Data Model
Create a C# record or class that represents the data to be stored in the vector collection. The model must include a key field, a vector embedding field, and any metadata fields used for filtering or display.
Key considerations:
- The key field uniquely identifies each record (typically a string or GUID)
- The embedding field stores the vector representation (ReadOnlyMemory<float>)
- Metadata fields (category, title, source) enable pre-filtering during search
- Attributes or fluent configuration map fields to vector store schema
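As a sketch, a glossary record covering these points might be annotated with the Microsoft.Extensions.VectorData attributes (attribute names have changed across package releases, and the field names `Term`, `Definition`, and `DefinitionEmbedding` are illustrative assumptions):

```csharp
using System;
using Microsoft.Extensions.VectorData;

// Illustrative data model for a "glossary" collection (a sketch, not the
// only valid shape); verify attribute names against your package version.
public sealed class Glossary
{
    [VectorStoreKey]                      // unique record identifier
    public string Key { get; set; } = string.Empty;

    [VectorStoreData(IsIndexed = true)]   // metadata field, filterable in search
    public string Category { get; set; } = string.Empty;

    [VectorStoreData]
    public string Term { get; set; } = string.Empty;

    [VectorStoreData]
    public string Definition { get; set; } = string.Empty;

    [VectorStoreVector(1536)]             // dimension must match the embedding model
    public ReadOnlyMemory<float> DefinitionEmbedding { get; set; }
}
```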
Step 2: Create a Vector Store and Collection
Instantiate a vector store implementation (e.g., InMemoryVectorStore for development, or a production store like Qdrant or Azure AI Search). Then get or create a named collection within the store, specifying the key type and record type.
Pseudocode:
vectorStore = new InMemoryVectorStore()
collection = vectorStore.GetCollection<string, Glossary>("glossary")
await collection.EnsureCollectionExistsAsync()
Key considerations:
- The IVectorStore abstraction enables swapping backends without code changes
- GetCollection<TKey, TRecord>() returns a typed collection handle
- EnsureCollectionExistsAsync() creates the collection if it does not exist
- Supported backends: InMemory, AzureAISearch, CosmosDB, Qdrant, Redis, Weaviate, Pinecone, Elasticsearch, SQL Server
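The pseudocode above maps closely onto the .NET API. A minimal sketch using the in-memory connector (exact namespace and type names vary between package versions):

```csharp
using Microsoft.SemanticKernel.Connectors.InMemory;

// Development-time store; swap for Qdrant, Azure AI Search, etc. in
// production without changing the calling code.
var vectorStore = new InMemoryVectorStore();

// Typed handle: TKey = string, TRecord = Glossary (the Step 1 data model).
var collection = vectorStore.GetCollection<string, Glossary>("glossary");

// Creates the collection if it does not already exist; no-op otherwise.
await collection.EnsureCollectionExistsAsync();
```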
Step 3: Generate Embeddings
Use an embedding generation service (such as OpenAI text-embedding-ada-002 or Azure OpenAI) to convert text content into vector representations. Each text field that should be searchable must be embedded.
Key considerations:
- IEmbeddingGenerator<string, Embedding<float>> is the service interface
- GenerateAsync() produces a vector from input text
- Embedding dimension must match the vector store collection configuration
- Batch embedding generation improves throughput for large datasets
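A sketch of single-input embedding via the `IEmbeddingGenerator` abstraction, assuming a generator obtained from an OpenAI or Azure OpenAI connector (the wiring is not shown):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

static async Task<ReadOnlyMemory<float>> EmbedAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator, string text)
{
    // GenerateAsync produces an Embedding<float>; its Vector property holds
    // the ReadOnlyMemory<float> stored in the record's embedding field.
    Embedding<float> embedding = await generator.GenerateAsync(text);
    return embedding.Vector;
}
```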
Step 4: Ingest Data Into the Collection
Generate embeddings for each record's text content and store the records in the vector collection using the upsert operation. Upsert creates new records or updates existing ones based on the key.
Pseudocode:
foreach record in records:
    record.Embedding = await embeddingGenerator.GenerateAsync(record.Text)
await collection.UpsertAsync(records)
Key considerations:
- Embeddings should be generated before upserting
- Parallel embedding generation improves ingestion speed
- UpsertAsync handles both insert and update operations
- Records can be verified by retrieving them with GetAsync()
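The ingestion loop might look as follows in C# (a sketch assuming the Step 1 model; the collection handle type name varies across package versions):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;

static async Task IngestAsync(
    VectorStoreCollection<string, Glossary> collection,
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    IReadOnlyList<Glossary> records)
{
    // Embed each record's searchable text before upserting.
    foreach (var record in records)
    {
        var embedding = await embeddingGenerator.GenerateAsync(record.Definition);
        record.DefinitionEmbedding = embedding.Vector;
    }

    // Upsert inserts new records or overwrites existing ones with the same key.
    await collection.UpsertAsync(records);
}
```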
Step 5: Perform Vector Similarity Search
Convert a user query into an embedding and search the collection for the most similar records. The search returns records ranked by similarity score, optionally filtered by metadata predicates.
Pseudocode:
queryVector = await embeddingGenerator.GenerateAsync(userQuery)
results = await collection.SearchAsync(queryVector, top: 3)
Key considerations:
- SearchAsync returns VectorSearchResult<T> with Record and Score properties
- The top parameter controls how many results to return
- Score interpretation depends on the configured distance function: for cosine similarity, higher scores indicate greater similarity, while for raw distance metrics lower values mean closer matches
- Results are ordered by relevance
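As a sketch, the search step in C# (assuming the Step 1 model; `SearchAsync` streams results as an async enumerable in recent package versions):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;

static async Task SearchGlossaryAsync(
    VectorStoreCollection<string, Glossary> collection,
    IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator,
    string userQuery)
{
    // Embed the query with the same model used at ingestion time.
    var queryEmbedding = await embeddingGenerator.GenerateAsync(userQuery);

    // Each result carries the matched Record and its similarity Score.
    await foreach (var result in collection.SearchAsync(queryEmbedding.Vector, top: 3))
    {
        Console.WriteLine($"{result.Score:F4}  {result.Record.Term}");
    }
}
```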
Step 6: Apply Metadata Filters (Optional)
Add pre-filtering to narrow the search space before vector similarity comparison. Filters use LINQ-style lambda expressions on metadata fields.
Pseudocode:
results = await collection.SearchAsync(queryVector, top: 3,
new VectorSearchOptions { Filter = record => record.Category == "AI" })
Key considerations:
- Filters are applied before similarity ranking, reducing the candidate set
- Filter expressions reference metadata fields on the record type
- Multiple filter conditions can be combined
- Not all vector store backends support all filter operations
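A sketch of the filtered variant; in recent package versions the options type is generic over the record type, and multiple conditions combine with standard C# operators:

```csharp
// Pre-filter on metadata before similarity ranking (sketch; the options
// type name and genericity vary across package versions).
var options = new VectorSearchOptions<Glossary>
{
    Filter = record => record.Category == "AI" && record.Term != ""
};

await foreach (var result in collection.SearchAsync(queryVector, top: 3, options))
{
    Console.WriteLine($"{result.Score:F4}  {result.Record.Term}");
}
```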
Step 7: Augment Chat Completion With Retrieved Context
Combine the retrieved documents with the user's question in a prompt template. The template injects the relevant context into the system or user message, enabling the AI model to generate a grounded, accurate response.
Key considerations:
- Retrieved text is injected into the prompt as context
- The prompt instructs the model to answer based on the provided context
- Source attribution can be included by passing record metadata
- The augmented prompt is then sent to the kernel's chat completion service
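A sketch of the augmentation step using Semantic Kernel's chat completion abstractions (the prompt wording and the `results`, `userQuery`, and `chatService` variables are illustrative assumptions carried over from the earlier steps):

```csharp
using System;
using System.Linq;
using Microsoft.SemanticKernel.ChatCompletion;

// 'results' holds the retrieved VectorSearchResult<Glossary> records.
var context = string.Join("\n", results.Select(r =>
    $"- {r.Record.Term}: {r.Record.Definition} (source: {r.Record.Key})"));

var history = new ChatHistory();
history.AddSystemMessage(
    "Answer the question using only the provided context. " +
    "If the context is insufficient, say so.\n\nContext:\n" + context);
history.AddUserMessage(userQuery);

// chatService is an IChatCompletionService resolved from the kernel.
var reply = await chatService.GetChatMessageContentAsync(history);
Console.WriteLine(reply.Content);
```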