Principle:Infiniflow Ragflow Parser Options Configuration
| Knowledge Sources | |
|---|---|
| Domains | RAG, NLP, Document_Processing |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
A fine-grained configuration pattern that tunes parser-specific parameters such as chunk size, delimiters, and layout recognition mode.
Description
Parser Options Configuration allows detailed control over how documents are parsed within a chosen chunking method. Key parameters include:
- chunk_token_num: target chunk size in tokens
- delimiter: custom text splitting characters
- layout_recognize: DeepDOC vs Plain Text mode for PDFs
- table_context_size and image_context_size: surrounding context for tables and images
- pages: page ranges to process
- task_page_size: pages per worker task

These options are deep-merged with the existing configuration using a recursive update (dfs_update).
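The recursive update described above can be sketched as follows. This is a minimal illustration of a deep merge, not RAGFlow's actual dfs_update implementation: the function body, the default values, and the nested key "extra" are assumptions made for the example.

```python
from copy import deepcopy
from typing import Any


def dfs_update(old: dict[str, Any], new: dict[str, Any]) -> None:
    """Recursively merge `new` into `old` in place (illustrative sketch)."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(old.get(key), dict):
            # Both sides are dicts: descend instead of replacing the subtree.
            dfs_update(old[key], value)
        else:
            # Scalars, lists, and new keys overwrite or extend the old config.
            old[key] = value


# Hypothetical existing configuration; values are for illustration only.
existing = {
    "chunk_token_num": 128,
    "delimiter": "\n",
    "layout_recognize": "DeepDOC",
    "extra": {"enabled": False, "limit": 256},  # hypothetical nested option
}

# A partial override: only the keys that should change are supplied.
override = {"chunk_token_num": 512, "extra": {"enabled": True}}

merged = deepcopy(existing)  # keep the original untouched for comparison
dfs_update(merged, override)
print(merged)
```

With a deep merge, a partial override of a nested option changes only the supplied keys; a shallow dict.update would instead replace the whole nested dictionary and drop "limit".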
Usage
Configure these options after selecting the chunking method and before processing documents. Adjust them when the default chunking produces suboptimal results for your document type.
Theoretical Basis
Chunk quality directly impacts retrieval quality. Key trade-offs:
- Chunk size: Larger chunks retain more context but reduce retrieval precision; smaller chunks improve precision but may lose context
- Layout recognition: DeepDOC uses YOLO-based layout analysis for PDFs, which handles complex layouts better; Plain Text is faster and sufficient for simple documents
- Delimiters: Custom delimiters allow splitting on domain-specific markers (e.g., section headers, legal article numbers)
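
The chunk-size and delimiter trade-offs above can be made concrete with a small sketch. It is not RAGFlow's chunker: the helper names split_on_delimiters and pack_chunks are hypothetical, each character of the delimiter string is assumed to be a split point, and tokens are approximated by whitespace-separated words.

```python
import re


def split_on_delimiters(text: str, delimiters: str) -> list[str]:
    """Split text wherever any character in `delimiters` occurs (assumed semantics)."""
    if not delimiters:
        return [text]
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def pack_chunks(pieces: list[str], chunk_token_num: int) -> list[str]:
    """Greedily pack pieces into chunks of at most chunk_token_num tokens.

    A single piece larger than the budget is kept whole rather than cut.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for piece in pieces:
        n_tokens = len(piece.split())  # crude stand-in for a real tokenizer
        if current and current_tokens + n_tokens > chunk_token_num:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(piece)
        current_tokens += n_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "Article 1. Scope of the law.\n"
    "Article 2. Definitions used here.\n"
    "Article 3. Penalties."
)
pieces = split_on_delimiters(text, "\n")

small = pack_chunks(pieces, chunk_token_num=6)   # one article per chunk
large = pack_chunks(pieces, chunk_token_num=50)  # all articles in one chunk
print(len(small), len(large))
```

With the small budget each article becomes its own chunk, so a query about penalties can match a chunk containing only Article 3. With the large budget all three articles share one chunk, which keeps more context together but makes each match less specific.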