Principle:Microsoft Semantic kernel Vector Store Collection Setup
Overview
The Vector Store Collection Setup principle describes how Semantic Kernel organizes vector data into typed collections accessed through a provider-agnostic abstraction layer. A vector store instance acts as a factory for collections, and each collection is parameterized with a key type and a record type, ensuring type safety from creation through querying.
This two-layer architecture (store, then collection) mirrors how relational databases separate the database connection from individual tables. The difference is that each collection also carries generic type parameters, so mismatched key and record types are caught at compile time.
Motivation
Applications that use vector stores need to manage multiple concerns:
- Backend selection: Choosing between in-memory stores for development, Azure AI Search for production, Qdrant for self-hosted deployments, and so on
- Collection lifecycle: Creating collections (tables/indexes) if they do not exist, or connecting to existing ones
- Type safety: Ensuring that records written to a collection match the expected schema and that query results are correctly typed
- Swappability: Allowing the backend to change without modifying application logic
The Vector Store Collection Setup principle addresses all of these concerns through a layered abstraction.
Core Concepts
The Vector Store as a Factory
A vector store instance (such as InMemoryVectorStore, AzureAISearchVectorStore, or QdrantVectorStore) does not directly hold data. Instead, it serves as a factory that produces typed collection instances. This factory pattern means:
- The store encapsulates backend-specific connection details (endpoints, credentials, client options)
- Collections are created through a uniform GetCollection method regardless of backend
- Multiple collections with different record types can coexist within a single store
Typed Collection Access
The GetCollection<TKey, TRecord>(collectionName) method returns a VectorStoreCollection<TKey, TRecord> that is fully typed. This means:
- The key type (TKey) determines what kind of identifiers the collection accepts (e.g., string, Guid)
- The record type (TRecord) must be a class decorated with vector store attributes ([VectorStoreKey], [VectorStoreData], [VectorStoreVector])
- All operations on the collection (upsert, get, search, delete) are type-safe
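As a minimal sketch of the factory and typed-access ideas above, the following gets two differently typed collections from a single store. It assumes recent versions of the Microsoft.Extensions.VectorData.Abstractions and Microsoft.SemanticKernel.Connectors.InMemory packages. The Hotel and Review record types, the collection names, and the 4-dimensional vectors are hypothetical and chosen only to keep the example small.

```csharp
using System;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

// One store instance acts as the factory for every collection below.
var store = new InMemoryVectorStore();

// Each call fixes the key type and record type of the returned collection.
// "hotels" and "reviews" are illustrative names, not required by the library.
VectorStoreCollection<string, Hotel> hotels =
    store.GetCollection<string, Hotel>("hotels");
VectorStoreCollection<string, Review> reviews =
    store.GetCollection<string, Review>("reviews");

Console.WriteLine($"{hotels.Name}, {reviews.Name}");

// Hypothetical record types, defined only for this sketch.
public sealed class Hotel
{
    [VectorStoreKey]
    public string HotelId { get; set; } = string.Empty;

    [VectorStoreData]
    public string Description { get; set; } = string.Empty;

    // The dimension count must match the embedding model in real use.
    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> DescriptionEmbedding { get; set; }
}

public sealed class Review
{
    [VectorStoreKey]
    public string ReviewId { get; set; } = string.Empty;

    [VectorStoreData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> TextEmbedding { get; set; }
}
```

Neither GetCollection call contacts a backend here; both only produce typed client-side objects.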
Collection Lifecycle Management
Before data can be ingested, the collection must exist in the underlying store. The EnsureCollectionExistsAsync() method provides idempotent collection creation:
- If the collection does not exist, it creates the collection (and any necessary indexes) based on the record type's attributes
- If the collection already exists, it does nothing
- This makes the method safe to call on every application startup without risking data loss
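The idempotent behavior described above can be sketched as follows, assuming the in-memory connector and a hypothetical Note record type. The second EnsureCollectionExistsAsync call is expected to leave the previously upserted record in place.

```csharp
using System;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

var store = new InMemoryVectorStore();
var notes = store.GetCollection<string, Note>("notes");

// First call: the collection does not exist yet, so it is created.
await notes.EnsureCollectionExistsAsync();

await notes.UpsertAsync(new Note
{
    Id = "n1",
    Text = "hello",
    Embedding = new float[] { 0.1f, 0.2f, 0.3f, 0.4f },
});

// Second call: the collection already exists, so nothing is recreated.
await notes.EnsureCollectionExistsAsync();

// The record written between the two calls should still be retrievable.
var stored = await notes.GetAsync("n1");
Console.WriteLine(stored?.Text ?? "(missing)");

// Hypothetical record type for this sketch.
public sealed class Note
{
    [VectorStoreKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}
```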
Design Principles
Provider Abstraction
The collection setup API is the same across supported backends. Application code that creates a collection and ingests data can switch from InMemoryVectorStore to AzureAISearchVectorStore by changing only the store instantiation, provided the target backend supports the chosen key type and record schema. The collection operations (GetCollection, EnsureCollectionExistsAsync, UpsertAsync, SearchAsync) remain the same.
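One way to keep the backend swappable is to write setup code against the abstract VectorStore type, so that only the construction site names a concrete connector. The sketch below assumes the in-memory connector. SetupAsync and the Doc record type are hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

// Only this line names a concrete backend. Another connector's store,
// constructed with its own client and options, could be assigned instead.
VectorStore store = new InMemoryVectorStore();

var docs = await SetupAsync(store);
Console.WriteLine(docs.Name);

// Hypothetical helper: depends only on the abstraction, not on a connector.
static async Task<VectorStoreCollection<string, Doc>> SetupAsync(VectorStore store)
{
    var collection = store.GetCollection<string, Doc>("docs");
    await collection.EnsureCollectionExistsAsync();
    return collection;
}

// Hypothetical record type for this sketch.
public sealed class Doc
{
    [VectorStoreKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreData]
    public string Body { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> BodyEmbedding { get; set; }
}
```

Whether a given record type works unchanged on another backend still depends on that connector's supported key and vector types.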
Generic Type Safety
By parameterizing collections with <TKey, TRecord>, the compiler prevents common errors:
- Upserting a record of the wrong type into a collection
- Using a key of the wrong type for retrieval
- Accessing properties that do not exist on the record type in search results
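To illustrate these compile-time checks, the sketch below performs a typed upsert and get, then lists (as comments) calls the C# compiler would be expected to reject. It assumes the in-memory connector and a hypothetical Hotel record type.

```csharp
using System;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

var store = new InMemoryVectorStore();
var hotels = store.GetCollection<string, Hotel>("hotels");
await hotels.EnsureCollectionExistsAsync();

await hotels.UpsertAsync(new Hotel
{
    HotelId = "h1",
    Name = "Seaside Inn",
    NameEmbedding = new float[] { 0.5f, 0.5f, 0f, 0f },
});

// The result is statically typed as Hotel, so its properties are checked.
var hotel = await hotels.GetAsync("h1");
Console.WriteLine(hotel?.Name);

// Each commented line below should fail to compile if uncommented:
// await hotels.UpsertAsync("not a hotel"); // argument is not a Hotel
// await hotels.GetAsync(42);               // key is not a string
// Console.WriteLine(hotel?.Rating);        // Hotel declares no Rating property

// Hypothetical record type for this sketch.
public sealed class Hotel
{
    [VectorStoreKey]
    public string HotelId { get; set; } = string.Empty;

    [VectorStoreData]
    public string Name { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> NameEmbedding { get; set; }
}
```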
Lazy Initialization
GetCollection does not create the collection in the backend — it only creates the client-side collection object. Actual backend resources are created when EnsureCollectionExistsAsync() is called. This lazy approach allows the application to configure multiple collections before incurring any network or storage costs.
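The difference between obtaining the client-side object and creating the backend resource can be observed with CollectionExistsAsync. This is a sketch against the in-memory connector with a hypothetical Item record type. Other backends would perform a network call at the same points.

```csharp
using System;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

var store = new InMemoryVectorStore();

// Client-side object only; nothing has been created in the store yet.
var items = store.GetCollection<string, Item>("items");
bool existsBefore = await items.CollectionExistsAsync();

// The backend resource is created here.
await items.EnsureCollectionExistsAsync();
bool existsAfter = await items.CollectionExistsAsync();

Console.WriteLine($"exists before: {existsBefore}, after: {existsAfter}");

// Hypothetical record type for this sketch.
public sealed class Item
{
    [VectorStoreKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}
```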
Typical Setup Flow
The standard setup flow follows these steps:
- Instantiate the vector store with backend-specific configuration
- Get a typed collection by specifying the key type, record type, and collection name
- Ensure the collection exists in the backend
- Proceed with data operations (upsert, search)
This flow is consistent across all backends and is the recommended pattern for both development and production scenarios.
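The four steps can be put together in a short end-to-end sketch. It assumes the in-memory connector, a hypothetical GlossaryEntry record type, and hand-written 4-dimensional vectors standing in for real embeddings.

```csharp
using System;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;

// Step 1: instantiate the store (no configuration needed for in-memory).
var store = new InMemoryVectorStore();

// Step 2: get a typed collection.
var glossary = store.GetCollection<string, GlossaryEntry>("glossary");

// Step 3: ensure the collection exists in the backend.
await glossary.EnsureCollectionExistsAsync();

// Step 4: data operations. The vectors are placeholders, not model output.
await glossary.UpsertAsync(new GlossaryEntry
{
    Key = "1",
    Term = "API",
    Embedding = new float[] { 1f, 0f, 0f, 0f },
});
await glossary.UpsertAsync(new GlossaryEntry
{
    Key = "2",
    Term = "SDK",
    Embedding = new float[] { 0f, 1f, 0f, 0f },
});

ReadOnlyMemory<float> query = new float[] { 0.9f, 0.1f, 0f, 0f };
await foreach (var result in glossary.SearchAsync(query, top: 1))
{
    Console.WriteLine($"{result.Record.Term} (score {result.Score})");
}

// Hypothetical record type for this sketch.
public sealed class GlossaryEntry
{
    [VectorStoreKey]
    public string Key { get; set; } = string.Empty;

    [VectorStoreData]
    public string Term { get; set; } = string.Empty;

    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}
```

In a real application the query vector and the stored vectors would come from the same embedding model, and the declared dimension would match that model's output size.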
Relationship to Other Principles
- Vector Store Data Model defines the record type used as the TRecord type parameter
- Data Ingestion uses the collection returned by setup to upsert records
- Vector Similarity Search uses the same collection to perform searches
- Metadata Filtering leverages indexes created during collection setup
Implementation:Microsoft_Semantic_kernel_InMemoryVectorStore