Implementation:Infiniflow Ragflow Metadata Utils
| Property | Value |
|---|---|
| Knowledge Sources | |
| Domains | Data_Processing, Search |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
A set of utilities in the RAGFlow common library for filtering and transforming document metadata and for generating JSON schemas from it, used during document retrieval.
Description
The metadata_utils module provides functions for filtering a document's metadata against conditions (contains, in, starts with, equality, and comparison operators) combined with AND/OR logic; applying metadata filters in auto, semi_auto, or manual resolution mode; deduplicating lists; updating metadata dictionaries; and generating JSON schemas from metadata structures.
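The condition matching described above can be illustrated with a small, self-contained sketch. It assumes each filter is a dict with `field`, `operator`, and `value` keys (the shape used in the usage example on this page) and uses hypothetical operator spellings such as `"starts with"` and `">="`. The real `meta_filter` may spell operators differently, support more of them, and treat missing fields or type mismatches differently; `sketch_meta_filter` and `OPERATORS` are illustrative names, not part of RAGFlow.

```python
from typing import Any, Callable

# Hypothetical operator table; RAGFlow's actual operator names may differ.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "contains": lambda actual, expected: expected in actual,
    "in": lambda actual, expected: actual in expected,
    "starts with": lambda actual, expected: str(actual).startswith(str(expected)),
    "=": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    ">": lambda actual, expected: actual > expected,
    "<": lambda actual, expected: actual < expected,
    ">=": lambda actual, expected: actual >= expected,
    "<=": lambda actual, expected: actual <= expected,
}


def sketch_meta_filter(metas: dict, filters: list, logic: str = "and") -> bool:
    """Evaluate each {field, operator, value} condition, then combine with and/or."""

    def check(cond: dict) -> bool:
        if cond["field"] not in metas:
            return False  # assumption: a missing field never matches
        op = OPERATORS[cond["operator"]]  # unknown operators raise KeyError
        try:
            return op(metas[cond["field"]], cond["value"])
        except TypeError:
            return False  # e.g. "contains" on an int, or str > int

    results = (check(c) for c in filters)
    if logic == "and":
        return all(results)
    if logic == "or":
        return any(results)
    raise ValueError(f"logic must be 'and' or 'or', got {logic!r}")
```

Using `all()` and `any()` gives the AND/OR behavior with short-circuiting. With these built-ins an empty filter list matches under "and" and fails under "or", which may or may not mirror the library.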
Usage
Import these utilities when implementing metadata-based document filtering in retrieval pipelines, when building search queries that incorporate metadata constraints, or when converting metadata definitions to JSON schema for API validation.
Code Reference
Source Location
- Repository: Infiniflow_Ragflow
- File: common/metadata_utils.py
- Lines: 1-344
Signature
```python
def meta_filter(metas: dict, filters: list, logic: str = "and") -> bool:
    """Filter metadata dict against a list of conditions with AND/OR logic."""


async def apply_meta_data_filter(
    meta_data_filter: dict,
    metas: dict,
    question: str,
    chat_mdl=None,
    base_doc_ids: list = None,
    manual_value_resolver=None,
) -> list:
    """Apply metadata filters with auto/semi_auto/manual resolution modes."""


def metadata_schema(metadata: dict) -> dict:
    """Generate JSON schema from metadata structure."""


def turn2jsonschema(obj) -> dict:
    """Convert metadata object to JSON schema format."""
```
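As a rough illustration of what `metadata_schema` might produce for the usage example's input (`{"title": "string", "pages": "integer"}`), the sketch below wraps each field's type name in a JSON Schema `properties` entry inside an `object` schema. This is an assumption: the real function's output shape (for example, whether it emits `required`, or how it handles nested values through `turn2jsonschema`) is not documented here, and `sketch_metadata_schema` is a hypothetical name.

```python
# JSON Schema's primitive type names.
JSON_SCHEMA_TYPES = {"string", "integer", "number", "boolean", "array", "object", "null"}


def sketch_metadata_schema(metadata: dict) -> dict:
    """Hypothetical: build an object schema whose properties mirror the metadata fields."""
    properties = {}
    for field, type_name in metadata.items():
        # Reject anything that is not a JSON Schema primitive type name.
        if type_name not in JSON_SCHEMA_TYPES:
            raise ValueError(f"unsupported type for {field!r}: {type_name!r}")
        properties[field] = {"type": type_name}
    return {"type": "object", "properties": properties}


schema = sketch_metadata_schema({"title": "string", "pages": "integer"})
```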
Import
```python
from common.metadata_utils import meta_filter, apply_meta_data_filter, metadata_schema
```
I/O Contract
Inputs
| Name | Type | Required | Used by | Description |
|---|---|---|---|---|
| metas | dict | Yes | meta_filter, apply_meta_data_filter | Document metadata dictionary to filter |
| filters | list | Yes | meta_filter | List of filter conditions, each with a field, operator, and value |
| logic | str | No | meta_filter | Logical combinator: "and" or "or" (default: "and") |
| question | str | Yes | apply_meta_data_filter | User query, used in auto-resolution mode |
| chat_mdl | object | No | apply_meta_data_filter | Chat model used to auto-resolve filter values (default: None) |
Outputs
| Name | Type | Description |
|---|---|---|
| meta_filter() returns | bool | Whether the metadata satisfies the filter conditions under the chosen logic (all of them for "and", at least one for "or") |
| apply_meta_data_filter() returns | list | List of matching document IDs |
| metadata_schema() returns | dict | JSON schema representation |
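`apply_meta_data_filter` returns a list of matching document IDs. One plausible way to produce such a list, sketched here under stated assumptions, is to test each document's metadata with a predicate, optionally restrict the candidates to `base_doc_ids`, and deduplicate while preserving order. The `{doc_id: metadata}` input layout and the names `dedupe` and `sketch_select_doc_ids` are hypothetical; the library's actual `metas` structure and its handling of `base_doc_ids` may differ.

```python
from typing import Callable, Iterable, Optional


def dedupe(items: Iterable) -> list:
    """Drop repeats, keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(items))


def sketch_select_doc_ids(
    docs: dict,
    predicate: Callable[[dict], bool],
    base_doc_ids: Optional[list] = None,
) -> list:
    """Hypothetical: return IDs of documents whose metadata satisfies `predicate`.

    Assumption: when base_doc_ids is given, only those IDs are considered
    (an intersection); when it is None, every document is a candidate.
    """
    candidates = dedupe(base_doc_ids) if base_doc_ids is not None else list(docs)
    return [doc_id for doc_id in candidates if doc_id in docs and predicate(docs[doc_id])]
```

`dict.fromkeys` is a common order-preserving deduplication idiom in Python 3.7+; it requires the items to be hashable.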
Usage Examples
```python
from common.metadata_utils import meta_filter, metadata_schema

# Filter documents by metadata
doc_meta = {"author": "John", "year": 2024, "tags": ["AI", "RAG"]}
filters = [
    {"field": "author", "operator": "=", "value": "John"},
    {"field": "year", "operator": ">", "value": 2020},
]
matches = meta_filter(doc_meta, filters, logic="and")
# Returns True

# Generate JSON schema
schema = metadata_schema({"title": "string", "pages": "integer"})
```
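The module also updates metadata dictionaries. The sketch below shows one hypothetical merge policy: list values are concatenated and deduplicated, and any other value is overwritten, without mutating the input. `sketch_update_metadata` and this policy are assumptions for illustration, not the documented behavior of RAGFlow's update helper.

```python
import copy


def sketch_update_metadata(meta: dict, updates: dict) -> dict:
    """Hypothetical: return a copy of `meta` with `updates` merged in.

    List values are combined and deduplicated (first occurrence wins, items
    must be hashable); any other value simply replaces the old one.
    """
    merged = copy.deepcopy(meta)  # leave the caller's dict untouched
    for key, new_value in updates.items():
        old_value = merged.get(key)
        if isinstance(old_value, list) and isinstance(new_value, list):
            merged[key] = list(dict.fromkeys(old_value + new_value))
        else:
            merged[key] = copy.deepcopy(new_value)
    return merged


updated = sketch_update_metadata(
    {"author": "John", "tags": ["AI", "RAG"]},
    {"tags": ["RAG", "LLM"], "year": 2024},
)
```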