Principle:Langgenius Dify Indexing Progress Monitoring

Knowledge Sources	Domains	Last Updated
Dify	RAG, Knowledge_Management, Frontend	2026-02-12 00:00 GMT

Overview

Description

Indexing Progress Monitoring provides near-real-time visibility into the asynchronous document processing pipeline within Dify. After a document is uploaded, it passes through a multi-stage pipeline before becoming available for retrieval. The status values, in pipeline order, are:

waiting -- The document is queued for processing.
parsing -- Raw content is extracted from the source file (PDF, DOCX, etc.).
cleaning -- Pre-processing rules are applied (removing extra whitespace, URLs, etc.).
splitting -- The cleaned text is segmented into chunks according to the configured process rules.
indexing -- Chunks are embedded and written to the vector store.
completed -- All chunks are indexed and the document is available for retrieval.

Two exceptional states can also occur:

error -- An unrecoverable failure occurred during any pipeline stage.
paused -- The user or system explicitly paused processing.
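The status values above can be captured as a small type, which keeps UI logic from comparing against misspelled strings. This is a minimal sketch: the status names come from the lists above, while PIPELINE_STAGES, stageIndex, and isTerminal are hypothetical helper names chosen for illustration and are not taken from the Dify codebase.

```typescript
// Status names are taken from the stage list; all identifiers are illustrative.
const PIPELINE_STAGES = [
  "waiting",
  "parsing",
  "cleaning",
  "splitting",
  "indexing",
  "completed",
] as const;

type PipelineStage = (typeof PIPELINE_STAGES)[number];
type IndexingStatus = PipelineStage | "error" | "paused";

// Position of a status in the linear pipeline, or -1 for the exceptional states.
function stageIndex(status: IndexingStatus): number {
  return (PIPELINE_STAGES as readonly string[]).indexOf(status);
}

// True once no further automatic transition is expected.
// "paused" is deliberately excluded because a resume puts the document back in the pipeline.
function isTerminal(status: IndexingStatus): boolean {
  return status === "completed" || status === "error";
}
```

A stage indicator could, for example, compare stageIndex(current) against the index of each stage to decide which steps to render as done.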

Monitoring is available at two granularities: per-document (tracking a single document through its pipeline) and per-batch (tracking all documents submitted in a single upload operation).

Usage

Progress indicators -- The UI polls the indexing status endpoint to display a progress bar showing completed segments vs. total segments.
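Stage-level timestamps -- The parsing, cleaning, and splitting stages each record a completion timestamp (parsing_completed_at, cleaning_completed_at, splitting_completed_at), and completed_at records when the whole pipeline finished. Comparing consecutive timestamps enables performance analysis and bottleneck detection.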
Error handling -- When the status transitions to error, the error field contains diagnostic information that can be surfaced to the user.
Pause/resume control -- Users can pause and resume indexing using companion endpoints (pauseDocIndexing, resumeDocIndexing); the status endpoint reports the resulting state on the next poll.
Batch monitoring -- When multiple documents are uploaded together, fetchIndexingStatusBatch retrieves the status of all documents in the batch with a single request.
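The progress, timestamp, and error items above can be sketched as three small functions over one status record. The field names completed_segments, total_segments, the four *_at timestamps, and error come from the text; the record shape itself, the indexing_status field name, and the assumption that timestamps are nullable Unix seconds are illustrative guesses rather than the confirmed API contract.

```typescript
// Assumed shape of one status record (types are assumptions, see the note above).
interface IndexingStatusRecord {
  indexing_status: string; // assumed field name for the current status
  completed_segments: number;
  total_segments: number;
  parsing_completed_at: number | null;
  cleaning_completed_at: number | null;
  splitting_completed_at: number | null;
  completed_at: number | null;
  error: string | null;
}

// Whole-number percentage of segments indexed so far, clamped to 0..100.
function progressPercent(r: IndexingStatusRecord): number {
  if (r.total_segments <= 0) return 0; // nothing split yet, avoid dividing by zero
  const pct = Math.floor((r.completed_segments / r.total_segments) * 100);
  return Math.min(100, Math.max(0, pct));
}

// Seconds between consecutive stage completions; null when either end is missing.
function stageDurations(r: IndexingStatusRecord): Record<string, number | null> {
  const diff = (start: number | null, end: number | null): number | null =>
    start !== null && end !== null ? end - start : null;
  return {
    cleaning: diff(r.parsing_completed_at, r.cleaning_completed_at),
    splitting: diff(r.cleaning_completed_at, r.splitting_completed_at),
    indexing: diff(r.splitting_completed_at, r.completed_at),
  };
}

// Diagnostic text to surface to the user, or null when the document has not failed.
function userFacingError(r: IndexingStatusRecord): string | null {
  if (r.indexing_status !== "error") return null;
  return r.error ?? "Indexing failed without a diagnostic message.";
}
```

The duration of the parsing stage itself is left out because the text names no timestamp for when parsing starts.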

Theoretical Basis

Finite State Machine -- The indexing pipeline is modeled as a deterministic state machine with a linear progression from waiting through completed, plus two deviation states (error and paused). This makes status transitions predictable and easy to reason about in UI logic.
Polling-Based Observability -- Because document processing is asynchronous (executed via Celery workers with Redis as the broker), the upload response cannot carry the final result. Instead, the frontend adopts a polling pattern, periodically fetching the status until a terminal state (completed or error) is reached. The paused state is not terminal, since a resume returns the document to the pipeline.
Segment-Level Granularity -- The completed_segments and total_segments fields provide fractional progress information within the indexing stage itself, enabling smooth progress bar updates rather than coarse stage-level jumps.
Batch Abstraction -- Grouping documents by batch ID decouples the upload action from individual document tracking, allowing the system to efficiently report on bulk operations without requiring per-document polling loops.
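The polling and batch ideas above can be combined into one loop that fetches a whole batch per tick and stops once every document is terminal. This is a sketch under stated assumptions: the fetcher is injected so that no real Dify call or signature is implied (in practice it might wrap fetchIndexingStatusBatch), and pollBatchUntilDone, StatusLike, the indexing_status field name, the 2500 ms interval, and the attempt cap are all hypothetical choices.

```typescript
// Minimal record needed for polling; field names are assumptions.
interface StatusLike {
  id: string;
  indexing_status: string;
}

// Injected fetcher: one call returns the status of every document in the batch.
type BatchFetcher = () => Promise<StatusLike[]>;

const TERMINAL_STATES = new Set(["completed", "error"]);

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until every document in the batch is terminal or maxAttempts is reached.
// Returns the last snapshot either way, so callers can inspect stragglers.
async function pollBatchUntilDone(
  fetchBatch: BatchFetcher,
  intervalMs: number = 2500,
  maxAttempts: number = 120,
  onUpdate: (docs: StatusLike[]) => void = () => {},
): Promise<StatusLike[]> {
  let latest: StatusLike[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    latest = await fetchBatch();
    onUpdate(latest); // e.g. refresh progress bars
    const allDone =
      latest.length > 0 &&
      latest.every((doc) => TERMINAL_STATES.has(doc.indexing_status));
    if (allDone) return latest;
    if (attempt < maxAttempts - 1) await sleep(intervalMs);
  }
  return latest;
}
```

The attempt cap matters because a paused document never reaches a terminal state on its own; without the cap the loop would run until the user resumes or leaves the page.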

Related Pages

Implementation:Langgenius_Dify_FetchIndexingStatus
