Principle:Infiniflow Ragflow Parser Options Configuration

From Leeroopedia
Revision as of 17:52, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Infiniflow_Ragflow_Parser_Options_Configuration.md)
Knowledge Sources
Domains: RAG, NLP, Document_Processing
Last Updated: 2026-02-12 06:00 GMT

Overview

A fine-grained configuration pattern that tunes parser-specific parameters such as chunk size, delimiters, and layout recognition mode.

Description

Parser Options Configuration allows detailed control over how documents are parsed within a chosen chunking method. Key parameters include:

  • chunk_token_num: target chunk size in tokens
  • delimiter: custom text-splitting characters
  • layout_recognize: DeepDOC vs Plain Text mode for PDFs
  • table_context_size and image_context_size: surrounding context for tables and images
  • pages: page ranges to process
  • task_page_size: pages per worker task

These options are deep-merged with the existing configuration using a recursive update (dfs_update).
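The recursive update mentioned above can be sketched as follows. This is a minimal illustration of a deep merge, not RAGFlow's actual dfs_update: the function name deep_update, the nested "advanced" key, and every value shown are hypothetical and chosen only to show how nested options survive a partial override.

```python
from copy import deepcopy
from typing import Any


def deep_update(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical helper for illustration; it does not mutate its inputs.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_update(merged[key], value)
        else:
            # Scalars, lists, or type mismatches: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Illustrative values only; these are not RAGFlow's defaults.
existing = {
    "chunk_token_num": 512,
    "delimiter": "\n",
    "layout_recognize": "DeepDOC",
    "advanced": {"option_a": 1, "option_b": 2},  # hypothetical nested section
}
overrides = {
    "chunk_token_num": 256,
    "advanced": {"option_b": 5},
}

merged = deep_update(existing, overrides)
```

With a shallow update, the override for "advanced" would replace the whole nested section and drop option_a. The recursive version changes only the keys that the override names.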

Usage

Configure these options after selecting the chunking method and before processing documents. Adjust them when the default chunking produces poor results for your document type, for example chunks that cut tables apart or split sections mid-sentence.

Theoretical Basis

Chunk quality directly impacts retrieval quality. Key trade-offs:

  • Chunk size: Larger chunks retain more context but reduce retrieval precision; smaller chunks improve precision but may lose context.
  • Layout recognition: DeepDOC uses YOLO-based layout analysis for PDFs, which handles complex layouts better; Plain Text is faster and sufficient for simple documents.
  • Delimiters: Custom delimiters allow splitting on domain-specific markers (e.g., section headers, legal article numbers).
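The interaction between delimiters and chunk size can be illustrated with a small sketch. It is not RAGFlow's chunker: split_on_delimiters and pack_chunks are hypothetical helpers, and token counts are approximated by whitespace-separated words rather than a real tokenizer. The text is first split on the delimiters, then the pieces are packed greedily until the token budget would be exceeded.

```python
import re


def split_on_delimiters(text: str, delimiters: list[str]) -> list[str]:
    """Split `text` on any of the given delimiter strings (hypothetical helper)."""
    if not delimiters:
        return [text.strip()] if text.strip() else []
    # Longest delimiters first so multi-character markers match before shorter ones.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def pack_chunks(pieces: list[str], chunk_token_num: int) -> list[str]:
    """Greedily pack pieces into chunks of at most `chunk_token_num` tokens.

    Tokens are approximated as whitespace-separated words (an assumption).
    A single piece larger than the budget is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for piece in pieces:
        n_tokens = len(piece.split())
        if current and current_tokens + n_tokens > chunk_token_num:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(piece)
        current_tokens += n_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "Article 1. Parties agree to terms.\n"
    "Article 2. Payment is due monthly.\n"
    "Article 3. Either party may terminate."
)
pieces = split_on_delimiters(text, ["\n"])

small_chunks = pack_chunks(pieces, chunk_token_num=6)   # one article per chunk
large_chunks = pack_chunks(pieces, chunk_token_num=12)  # articles 1 and 2 share a chunk
```

With the smaller budget, each article becomes its own chunk, so a query about payment matches a tightly focused passage. With the larger budget, the first two articles are merged, which keeps more surrounding context but dilutes the match.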

Related Pages

Implemented By
