Principle:Microsoft Semantic kernel Metadata Filtering

From Leeroopedia

Overview

The Metadata Filtering principle describes how vector similarity search results can be pre-filtered using predicates on indexed data fields. While vector search finds semantically similar records based on embedding proximity, metadata filtering adds a structured query layer that narrows results based on exact field values, ranges, or other criteria.

Combining semantic search (vector similarity) with structured filtering (metadata predicates) yields results that are both semantically relevant and restricted to exactly the records the application cares about.

Motivation

Pure vector similarity search has an inherent limitation: it ranks results solely by geometric proximity in embedding space, with no regard for structured attributes. Consider a glossary with entries from multiple domains:

  • A search for "neural network" returns all semantically related entries — from AI, neuroscience, telecommunications, and other categories
  • A user working in the AI domain only wants AI-related results
  • Without metadata filtering, irrelevant results from other domains clutter the output

Metadata filtering solves this by applying a structured predicate before or during the vector search, so only records matching the filter are considered for similarity ranking.

Core Concepts

Indexed vs Non-Indexed Fields

Not all data fields can be used for filtering. The distinction is critical:

  • Indexed fields ([VectorStoreData(IsIndexed = true)]): These fields have backend indexes that support efficient filtering. Only indexed fields can appear in filter expressions.
  • Non-indexed fields ([VectorStoreData]): These fields are stored and returned in results but cannot be used in filter predicates. They are "display only."

The decision to index a field should be made at data model design time, because adding indexes later may require collection recreation.
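As a rough illustration of the distinction above, a record type might be annotated as follows. This is a sketch that assumes the Microsoft.Extensions.VectorData.Abstractions package is referenced. The GlossaryEntry type, its property names, and the embedding dimension of 4 are hypothetical choices made for this example.

```csharp
using System;
using Microsoft.Extensions.VectorData;

// Hypothetical glossary record; all names here are illustrative.
public sealed class GlossaryEntry
{
    [VectorStoreKey]
    public string Key { get; set; } = string.Empty;

    // Indexed: this field may appear in filter expressions.
    [VectorStoreData(IsIndexed = true)]
    public string Category { get; set; } = string.Empty;

    // Not indexed: stored and returned with results, but not filterable.
    [VectorStoreData]
    public string Definition { get; set; } = string.Empty;

    // Embedding used for similarity search; the dimension (4) is arbitrary.
    [VectorStoreVector(4)]
    public ReadOnlyMemory<float> DefinitionEmbedding { get; set; }
}
```

With a model like this, a filter on Category would be valid, while a filter on Definition would not be expected to work.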

Filter as a Lambda Expression

Semantic Kernel uses C# lambda expressions (specifically Expression<Func<TRecord, bool>>) to define filter predicates. This approach:

  • Leverages the C# type system for compile-time validation
  • Uses familiar LINQ-style syntax that .NET developers already know
  • Enables the framework to inspect and translate the expression tree into backend-specific filter syntax
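The last point is what separates an expression tree from an ordinary delegate: the filter is data that can be examined. The sketch below uses only System.Linq.Expressions from the base class library. GlossaryEntry and FilterInspection are hypothetical names for this example and are not Semantic Kernel types.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical record type used only for this illustration.
public sealed class GlossaryEntry
{
    public string Category { get; set; } = string.Empty;
    public int Year { get; set; }
}

public static class FilterInspection
{
    // Describes a simple "property <operator> literal" filter by reading its tree.
    // Filters of any other shape (for example, ones that compare against a
    // captured variable) are reported as unsupported by this toy inspector.
    public static string Describe(Expression<Func<GlossaryEntry, bool>> filter)
    {
        if (filter.Body is BinaryExpression
            {
                Left: MemberExpression member,
                Right: ConstantExpression constant
            } binary)
        {
            return $"{member.Member.Name} {binary.NodeType} {constant.Value}";
        }

        return $"unsupported shape: {filter.Body.NodeType}";
    }
}
```

Calling FilterInspection.Describe(g => g.Category == "AI") returns "Category Equal AI". A connector can walk the same structure to build a query in its backend's own syntax.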

Pre-Filter vs Post-Filter

Metadata filtering in Semantic Kernel operates as a pre-filter:

  • The filter is applied before or during the vector similarity calculation
  • Only records that pass the filter are considered as candidates for similarity ranking
  • The top parameter applies to the filtered result set

This is important because it means:

  • A search with top: 5 and a category filter returns the 5 most similar records within that category
  • The filter does not reduce a larger result set after the fact — it constrains the search space itself
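The difference can be seen in a toy in-memory model. The sketch below is not how any connector is implemented. Entry, FilterOrderDemo, and the sample vectors are hypothetical, and a plain dot product stands in for the similarity score.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: a term, its category, and a tiny 2-dimensional embedding.
public sealed record Entry(string Term, string Category, float[] Vector);

public static class FilterOrderDemo
{
    public static readonly Entry[] Sample =
    {
        new Entry("axon", "Neuroscience", new[] { 1.0f, 0.0f }),
        new Entry("synapse", "Neuroscience", new[] { 0.9f, 0.1f }),
        new Entry("perceptron", "AI", new[] { 0.8f, 0.2f }),
        new Entry("backpropagation", "AI", new[] { 0.7f, 0.3f }),
        new Entry("router", "Telecommunications", new[] { 0.1f, 0.9f }),
    };

    // Dot product used as a stand-in similarity score.
    private static float Dot(float[] a, float[] b) => a.Zip(b, (x, y) => x * y).Sum();

    // Pre-filter: restrict the candidates first, then rank and take the top results.
    public static List<string> PreFilter(
        IEnumerable<Entry> entries, float[] query, Func<Entry, bool> filter, int top) =>
        entries.Where(filter)
               .OrderByDescending(e => Dot(e.Vector, query))
               .Take(top)
               .Select(e => e.Term)
               .ToList();

    // Post-filter: rank everything, take the top results, then discard non-matches.
    public static List<string> PostFilter(
        IEnumerable<Entry> entries, float[] query, Func<Entry, bool> filter, int top) =>
        entries.OrderByDescending(e => Dot(e.Vector, query))
               .Take(top)
               .Where(filter)
               .Select(e => e.Term)
               .ToList();
}
```

With the query vector { 1, 0 }, a filter of Category == "AI", and top set to 2, PreFilter returns perceptron and backpropagation. PostFilter returns nothing, because both of the two nearest records overall belong to the Neuroscience category.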

Supported Filter Operations

The lambda expression syntax supports common comparison operations on indexed fields:

Operation     Example                                          Description
Equality      g => g.Category == "AI"                          Exact match on a string field
Inequality    g => g.Category != "Legacy"                      Exclude a specific value
Comparison    g => g.Year >= 2020                              Numeric or date range filtering
Logical AND   g => g.Category == "AI" && g.Year >= 2020        Multiple conditions (all must be true)
Logical OR    g => g.Category == "AI" || g.Category == "ML"    Alternative conditions (any can be true)

The exact operations supported depend on the vector store backend. The in-memory store supports all standard C# comparison operators. Remote backends may have limitations on complex expressions.
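For experimentation, the operations in the table can be written as expression values and evaluated locally with LINQ. This sketch uses only the base class library and says nothing about what a particular remote backend accepts. GlossaryEntry and FilterOperations are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical record type used only for this illustration.
public sealed record GlossaryEntry(string Term, string Category, int Year);

public static class FilterOperations
{
    public static readonly Expression<Func<GlossaryEntry, bool>> AiOnly =
        g => g.Category == "AI";

    public static readonly Expression<Func<GlossaryEntry, bool>> NotLegacy =
        g => g.Category != "Legacy";

    public static readonly Expression<Func<GlossaryEntry, bool>> RecentAi =
        g => g.Category == "AI" && g.Year >= 2020;

    public static readonly Expression<Func<GlossaryEntry, bool>> AiOrMl =
        g => g.Category == "AI" || g.Category == "ML";

    // Evaluates a filter against local data and returns the matching terms in order.
    public static string[] Apply(
        Expression<Func<GlossaryEntry, bool>> filter, params GlossaryEntry[] entries) =>
        entries.AsQueryable().Where(filter).Select(g => g.Term).ToArray();
}
```

Evaluating filters locally like this can be a convenient way to unit-test filter logic before running it against a real collection.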

Design Principles

Declarative Indexing

The decision about which fields support filtering is made declaratively in the data model, not imperatively at query time. By marking IsIndexed = true at the attribute level, the developer signals both:

  • To the vector store connector: "create an index for this field"
  • To future query authors: "this field is available for filtering"

Type-Safe Filters

Using Expression<Func<TRecord, bool>> means the compiler validates:

  • That the referenced properties exist on the record type
  • That the comparison types are compatible (no comparing strings to integers)
  • That the expression is syntactically valid

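This catches many filter errors at compile time rather than at runtime. Problems the compiler cannot see, such as filtering on a non-indexed field or using an operation that a particular backend does not support, still surface only at runtime.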

Backend Translation

The lambda expression is an expression tree that Semantic Kernel translates into the appropriate filter syntax for each backend:

  • In-memory: Direct C# predicate evaluation
  • Azure AI Search: OData filter string
  • Qdrant: Qdrant filter JSON
  • Pinecone: Pinecone metadata filter

The developer writes one expression, and each connector translates it appropriately.
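To make the idea of translation concrete, here is a deliberately small translator that turns a filter into a string loosely modeled on OData filter syntax. It is an illustration only, not the code used by any Semantic Kernel connector. ToyFilterTranslator and GlossaryEntry are hypothetical names, and only property-to-literal comparisons joined by && or || are handled.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical record type used only for this illustration.
public sealed class GlossaryEntry
{
    public string Category { get; set; } = string.Empty;
    public int Year { get; set; }
}

public static class ToyFilterTranslator
{
    public static string Translate<T>(Expression<Func<T, bool>> filter) => Visit(filter.Body);

    // Recursively converts the supported node types into filter text.
    private static string Visit(Expression node) => node switch
    {
        BinaryExpression { NodeType: ExpressionType.AndAlso } b =>
            $"({Visit(b.Left)} and {Visit(b.Right)})",
        BinaryExpression { NodeType: ExpressionType.OrElse } b =>
            $"({Visit(b.Left)} or {Visit(b.Right)})",
        BinaryExpression b =>
            $"{Visit(b.Left)} {Operator(b.NodeType)} {Visit(b.Right)}",
        MemberExpression { Expression: ParameterExpression } m => m.Member.Name,
        ConstantExpression { Value: string s } => Quote(s),
        ConstantExpression { Value: null } => "null",
        ConstantExpression c => Convert.ToString(c.Value, CultureInfo.InvariantCulture)!,
        _ => throw new NotSupportedException($"Unsupported node: {node.NodeType}")
    };

    // Maps C# comparison node types to OData-style comparison keywords.
    private static string Operator(ExpressionType type) => type switch
    {
        ExpressionType.Equal => "eq",
        ExpressionType.NotEqual => "ne",
        ExpressionType.GreaterThan => "gt",
        ExpressionType.GreaterThanOrEqual => "ge",
        ExpressionType.LessThan => "lt",
        ExpressionType.LessThanOrEqual => "le",
        _ => throw new NotSupportedException($"Unsupported operator: {type}")
    };

    // Wraps a string literal in single quotes, doubling any embedded quote.
    private static string Quote(string value)
    {
        string escaped = value.Replace("'", "''");
        return "'" + escaped + "'";
    }
}
```

Translating g => g.Category == "AI" && g.Year >= 2020 for GlossaryEntry yields "(Category eq 'AI' and Year ge 2020)". Anything outside the handled shapes throws NotSupportedException, which mirrors how a backend with limited filter support might reject a complex expression.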

When to Use Metadata Filtering

Metadata filtering is most valuable when:

  • Records belong to distinct categories or domains and the user's query is domain-specific
  • Records have temporal attributes and only recent entries are relevant
  • Records have access control attributes and filtering enforces visibility rules
  • The application needs to scope search to a specific subset of the collection

Relationship to Other Principles

Implementation:Microsoft_Semantic_kernel_VectorSearchOptions_Filter
