Principle:Langgenius Dify Segment Management

From Leeroopedia
Knowledge Sources: Dify
Domains: RAG, Knowledge_Management, Frontend
Last Updated: 2026-02-12 00:00 GMT

Overview

Description

Segment and Metadata Management encompasses the operations available for inspecting, modifying, and governing individual chunks (segments) and their associated document metadata after a document has been indexed in Dify. While chunking and indexing are automated during document upload, the resulting segments and metadata often require manual curation to optimize retrieval quality.

This principle covers two complementary areas:

Segment Operations -- Individual chunks can be listed, created, updated, deleted, enabled, or disabled. These operations allow knowledge base administrators to:

  • Fix content errors within a specific chunk without re-indexing the entire document.
  • Add supplemental segments that fill knowledge gaps.
  • Disable low-quality or irrelevant segments to exclude them from retrieval without permanently deleting them.
  • Manage child chunks within hierarchical (parent-child) segmentation.

Document Metadata Management -- Each document carries structured metadata (document type and key-value metadata pairs) that aids in organization, filtering, and contextual retrieval. The metadata schema supports standard document types (book, web_page, paper, social_media_post, personal_document, business_document, im_chat_log) as well as system-assigned types (synced_from_github, synced_from_notion, wikipedia_entry).
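The metadata shape described above can be sketched as TypeScript types. This is a minimal illustration, not Dify's actual type definitions: the doc_type values and the doc_type / doc_metadata field names come from this page, while the type names, the string-valued record, and the isSystemAssigned helper are assumptions introduced for the example.

```typescript
// Document types listed on this page; splitting them into two constant
// arrays is an illustrative choice, not taken from the Dify codebase.
const STANDARD_DOC_TYPES = [
  "book",
  "web_page",
  "paper",
  "social_media_post",
  "personal_document",
  "business_document",
  "im_chat_log",
] as const;

const SYSTEM_DOC_TYPES = [
  "synced_from_github",
  "synced_from_notion",
  "wikipedia_entry",
] as const;

type StandardDocType = (typeof STANDARD_DOC_TYPES)[number];
type SystemDocType = (typeof SYSTEM_DOC_TYPES)[number];
type DocType = StandardDocType | SystemDocType;

// Hypothetical shape: a fixed category plus free-form key-value pairs.
// Values are assumed to be strings here; the real schema may differ.
interface DocumentMetadata {
  doc_type: DocType;
  doc_metadata: Record<string, string>;
}

// Hypothetical helper: true when the type is one of the system-assigned ones.
function isSystemAssigned(docType: DocType): docType is SystemDocType {
  return (SYSTEM_DOC_TYPES as readonly string[]).indexOf(docType) !== -1;
}
```

Typing doc_type as a closed union while leaving doc_metadata open mirrors the split between consistent categorization and flexible, domain-specific attributes.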

Usage

  • Segment browsing -- List all segments for a document with pagination, keyword filtering, and enabled/disabled filtering using useSegmentList.
  • Content correction -- Update a segment's content, answer (for Q&A mode), summary, or keywords using useUpdateSegment.
  • Knowledge augmentation -- Add new segments to fill gaps with useAddSegment.
  • Quality control -- Enable or disable segments in bulk using useEnableSegment and useDisableSegment.
  • Segment removal -- Permanently delete segments with useDeleteSegment.
  • Child chunk management -- In hierarchical mode, manage sub-chunks via useChildSegmentList, useAddChildSegment, useUpdateChildSegment, and useDeleteChildSegment.
  • Document classification -- Set or update a document's type and metadata fields using modifyDocMetadata.

Theoretical Basis

  • Post-Indexing Curation -- Automated chunking produces segments of variable quality. Segment management provides a human-in-the-loop feedback mechanism, allowing domain experts to refine the knowledge base after initial ingestion. This is essential for production RAG systems where retrieval precision directly impacts answer quality.
  • Soft Deletion via Enable/Disable -- Rather than requiring permanent deletion, the enable/disable toggle implements a soft exclusion pattern. Disabled segments remain in the system for audit and potential re-enablement, reducing the risk of accidental data loss.
  • React Query Mutation Pattern -- All segment write operations are implemented as TanStack Query (React Query) mutations (useMutation). A mutation exposes pending and error state and provides lifecycle callbacks such as onMutate, onSuccess, and onError, through which cache invalidation and optimistic updates can be implemented. Read operations use useQuery with structured query keys for cache management.
  • Structured Metadata Schema -- Document metadata follows a semi-structured pattern: a fixed doc_type field selects the document category, while doc_metadata is a flexible key-value record. This balances the need for consistent categorization with the flexibility to store domain-specific attributes.
  • Hierarchical Segment Model -- The segment management API supports a two-level hierarchy (parent segments and child chunks), mirroring the parent-child chunking strategy. This allows granular management at both levels of the hierarchy.
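To make the idea of structured query keys concrete, here is a library-free sketch of prefix-based invalidation, the matching style TanStack Query uses by default when invalidating queries by key. It does not use the real library and does not reproduce Dify's actual keys: the key layout, the segmentAllKey and segmentListKey helpers, and the TinyQueryCache class are all hypothetical.

```typescript
type QueryKey = ReadonlyArray<string | number | boolean>;

// Hypothetical key layout: every segment query for a document shares a prefix.
function segmentAllKey(datasetId: string, documentId: string): QueryKey {
  return ["segment", datasetId, documentId];
}

function segmentListKey(
  datasetId: string,
  documentId: string,
  page: number,
  keyword: string
): QueryKey {
  return segmentAllKey(datasetId, documentId).concat(["list", page, keyword]);
}

// True when every element of `prefix` equals the element at the same index in `key`.
function isPrefix(prefix: QueryKey, key: QueryKey): boolean {
  if (prefix.length > key.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (prefix[i] !== key[i]) return false;
  }
  return true;
}

interface CacheEntry {
  key: QueryKey;
  data: unknown;
  stale: boolean;
}

// Toy stand-in for a query cache; it only tracks staleness.
class TinyQueryCache {
  private entries: CacheEntry[] = [];

  // Store fresh data under an exact key.
  set(key: QueryKey, data: unknown): void {
    const existing = this.find(key);
    if (existing) {
      existing.data = data;
      existing.stale = false;
    } else {
      this.entries.push({ key, data, stale: false });
    }
  }

  // Mark every entry whose key starts with `prefix` as stale; returns the match count.
  invalidate(prefix: QueryKey): number {
    let count = 0;
    for (const entry of this.entries) {
      if (isPrefix(prefix, entry.key)) {
        entry.stale = true;
        count++;
      }
    }
    return count;
  }

  // Unknown keys are treated as stale because they would need fetching.
  isStale(key: QueryKey): boolean {
    const entry = this.find(key);
    return entry ? entry.stale : true;
  }

  private find(key: QueryKey): CacheEntry | undefined {
    for (const entry of this.entries) {
      if (entry.key.length === key.length && isPrefix(key, entry.key)) return entry;
    }
    return undefined;
  }
}
```

With keys structured this way, a successful write to one document's segments can invalidate all of that document's cached lists by passing the shared prefix, without touching cached data for other documents.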
