Principle:Microsoft Semantic kernel Vector Store Data Model

From Leeroopedia

Overview

The Vector Store Data Model principle defines how application data structures are mapped to vector store records in Microsoft Semantic Kernel. Rather than writing imperative schema definitions or configuration files, Semantic Kernel uses a declarative, attribute-based approach where .NET class properties are annotated with metadata attributes that describe their role in the vector store: key, data, or vector.

This approach follows the well-established pattern of data annotations in the .NET ecosystem (similar to Entity Framework attributes or JSON serialization attributes), allowing developers to define their vector store schema directly alongside their domain model.

Motivation

Vector databases require structured records that contain at minimum three categories of information:

  • A unique key that identifies each record
  • Data fields that hold the actual content (text, metadata, categories)
  • Vector fields that store the dense embedding representations used for similarity search

Without a declarative schema mechanism, developers would need to manually map between their application objects and the vector store's wire format. This creates several problems:

  • Tight coupling between application logic and storage layer
  • Repetitive boilerplate for serialization and deserialization
  • Error-prone manual mapping where field names or types can drift out of sync
  • No compile-time validation of schema correctness

The attribute-based data model solves these problems by making the class definition itself the single source of truth for the vector store schema.

Core Concepts

Three Attribute Categories

Every vector store record class uses three categories of attributes:

  1. [VectorStoreKey] — Marks exactly one property as the unique identifier for the record. This is analogous to a primary key in a relational database.
  2. [VectorStoreData] — Marks properties that contain stored content. These fields are persisted and returned in query results. An optional IsIndexed = true parameter indicates that the field should be indexed for filtering operations.
  3. [VectorStoreVector(Dimensions)] — Marks properties that contain embedding vectors. The Dimensions parameter specifies the vector size, which must match the output dimensions of the embedding model being used.
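The three categories above can be sketched on a single record class. This is a minimal illustration, not a definitive model: the `Hotel` type and its property names are hypothetical, the attributes are assumed to come from the `Microsoft.Extensions.VectorData` namespace, and 1536 assumes an embedding model with that output size.

```csharp
using System;
using Microsoft.Extensions.VectorData;

// Hypothetical record type, used only for illustration.
public sealed class Hotel
{
    // Key: the unique identifier for the record (exactly one per class).
    [VectorStoreKey]
    public string HotelId { get; set; } = string.Empty;

    // Data: stored content, indexed so it can be used in filters.
    [VectorStoreData(IsIndexed = true)]
    public string HotelName { get; set; } = string.Empty;

    // Data: stored and returned in results, but not indexed for filtering.
    [VectorStoreData]
    public string Description { get; set; } = string.Empty;

    // Vector: embedding of Description. The argument is the Dimensions value
    // and is assumed to match the embedding model's output size.
    [VectorStoreVector(1536)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}
```

The class remains an ordinary .NET object: it can be constructed, populated, and passed around without any reference to a particular vector store.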

Schema as Code

The data model class serves multiple purposes simultaneously:

  • Storage schema: The attributes tell the vector store connector how to create and manage the underlying collection
  • Serialization contract: The property types and names define how data is serialized to and from the store
  • Query contract: Indexed data fields become available for metadata filtering in search operations
  • Documentation: The class itself is human-readable documentation of the record structure
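To make the "single source of truth" idea concrete, the sketch below reads the role of each property back out of the attributes using reflection. It only illustrates that the schema is discoverable from the class itself; it is not how any particular connector is implemented. The `Article` type and the `SchemaPrinter` helper are hypothetical, and the attribute class names (`VectorStoreKeyAttribute`, `VectorStoreDataAttribute`, `VectorStoreVectorAttribute`) are assumed to be those exposed by `Microsoft.Extensions.VectorData`.

```csharp
using System;
using System.Reflection;
using Microsoft.Extensions.VectorData;

// Hypothetical record type, used only for this illustration.
public sealed class Article
{
    [VectorStoreKey]
    public Guid Id { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string Category { get; set; } = string.Empty;

    [VectorStoreData]
    public string Body { get; set; } = string.Empty;

    [VectorStoreVector(1536)]
    public ReadOnlyMemory<float> BodyEmbedding { get; set; }
}

// Hypothetical helper that derives a property's role from its attributes.
public static class SchemaPrinter
{
    // Returns "key", "data", "vector", or "unmapped" for the given property.
    public static string RoleOf(PropertyInfo property)
    {
        if (property.IsDefined(typeof(VectorStoreKeyAttribute), inherit: false)) return "key";
        if (property.IsDefined(typeof(VectorStoreDataAttribute), inherit: false)) return "data";
        if (property.IsDefined(typeof(VectorStoreVectorAttribute), inherit: false)) return "vector";
        return "unmapped";
    }

    // Writes one "Name: role" line per public property of the record type.
    public static void Print(Type recordType)
    {
        foreach (PropertyInfo property in recordType.GetProperties())
        {
            Console.WriteLine($"{property.Name}: {RoleOf(property)}");
        }
    }
}
```

Calling `SchemaPrinter.Print(typeof(Article))` lists each property with its role, which is the same information a connector needs in order to create a collection.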

Dimensional Consistency

The Dimensions parameter on [VectorStoreVector] must match the embedding model's output size. For example, OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors, so the attribute must specify Dimensions: 1536. A mismatch will cause runtime errors during upsert or search operations.
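One way to surface a mismatch earlier is to check the embedding length before a record is written. The sketch below is an assumption-laden illustration rather than part of Semantic Kernel: `EmbeddingGuard` and `ExpectedDimensions` are hypothetical names, and the constant is assumed to be kept in step with the value passed to [VectorStoreVector] on the model.

```csharp
using System;

// Hypothetical helper; not part of Semantic Kernel.
public static class EmbeddingGuard
{
    // Assumed to equal the Dimensions value declared on the model's vector property.
    public const int ExpectedDimensions = 1536;

    // Returns the embedding unchanged, or throws if its length differs
    // from the declared dimensions.
    public static ReadOnlyMemory<float> EnsureDimensions(ReadOnlyMemory<float> embedding)
    {
        if (embedding.Length != ExpectedDimensions)
        {
            throw new ArgumentException(
                $"Expected {ExpectedDimensions} dimensions but got {embedding.Length}.",
                nameof(embedding));
        }

        return embedding;
    }
}
```

Wrapping the embedding generator's output in `EmbeddingGuard.EnsureDimensions(...)` before assigning it to the record turns a store-side failure into a descriptive exception at the point where the vector is produced.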

Design Principles

Separation of Concerns

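The data model separates what the record looks like from how it is stored. The same model class can generally be reused across vector store backends (in-memory, Azure AI Search, Qdrant, Pinecone) with little or no modification, because each connector interprets the attributes according to its own storage semantics. The main caveat is that connectors differ in the key and property types they support, so a model intended for several backends must stay within the types those backends have in common.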
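A sketch of what this separation can look like in application code: the helper below depends only on the abstract store type, so the choice of backend is made by the caller. The `Note` type, the `NoteCollections` helper, and the collection name are hypothetical. The `VectorStore`, `VectorStoreCollection<TKey, TRecord>`, `GetCollection`, and `EnsureCollectionExistsAsync` names are assumed from the `Microsoft.Extensions.VectorData` abstractions and may differ between package versions.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;

// Hypothetical record type, used only for this illustration.
public sealed class Note
{
    [VectorStoreKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreVector(1536)]
    public ReadOnlyMemory<float> TextEmbedding { get; set; }
}

// Hypothetical helper that works with whichever connector is passed in.
public static class NoteCollections
{
    public static async Task<VectorStoreCollection<string, Note>> OpenAsync(VectorStore store)
    {
        // The connector derives the collection schema from the attributes on Note.
        VectorStoreCollection<string, Note> collection =
            store.GetCollection<string, Note>("notes");

        // Creates the underlying collection if it does not exist yet.
        await collection.EnsureCollectionExistsAsync();
        return collection;
    }
}
```

Under these assumptions, switching backends means passing a different store instance to `OpenAsync`; the `Note` class is unchanged as long as its string key is supported by the target connector.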

Convention Over Configuration

Property names are used as field names by default. The attributes add only the metadata that cannot be inferred from the type system alone — specifically, which property is the key, which properties should be indexed, and the dimensionality of vector fields.

Compile-Time Safety

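Because the schema is defined in C# code, the compiler catches type mismatches, misspelled or missing properties, and other structural errors in code that uses the model before it runs. This is a significant advantage over schema-as-configuration approaches, where such errors only surface at deployment time. Some constraints still fall outside the type system: a Dimensions value that does not match the embedding model, for example, is only detected at runtime.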

Relationship to Other Principles

The Vector Store Data Model is the foundation upon which the rest of the Vector Store RAG Pipeline is built:

Implementation:Microsoft_Semantic_kernel_VectorStore_Attributes
