Implementation:Langgenius Dify FetchIndexingEstimate

Knowledge Sources	Dify
Domains	RAG Embeddings Vector Indexing Async Processing
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tools, provided by the Dify frontend service layer, for estimating indexing costs and tracking indexing progress.

Description

This implementation documents two closely related service functions that support the embedding and indexing execution phase of the knowledge base creation workflow:

fetchIndexingEstimate -- retrieves a pre-processing estimate of the token count, segment count, and cost for indexing a document. This is called before indexing begins, giving the user a chance to review the expected resource consumption.
fetchIndexingStatus -- polls the current indexing progress for a document, returning the status, completed segment count, and total segment count. This is called repeatedly while indexing is in progress to update the UI's progress indicator.

Together, these two functions bracket the indexing execution: the estimate function provides a preview of what will happen, and the status function provides real-time feedback on what is happening.

Usage

Import and call these functions in the following situations:

Before indexing -- call fetchIndexingEstimate to display a cost/token estimate dialog so the user can confirm before committing resources.
During indexing -- poll fetchIndexingStatus at regular intervals to drive a progress bar or status indicator.
After configuration changes -- call fetchIndexingEstimate again when the user modifies processing rules to show the updated estimate.

Code Reference

Source Location

Repository: Dify
File: web/service/datasets.ts

Signatures

fetchIndexingEstimate (lines 141--143)

export const fetchIndexingEstimate = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingEstimateResponse> => {
  return get<IndexingEstimateResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-estimate`,
    {},
  )
}

fetchIndexingStatus (lines 148--150)

export const fetchIndexingStatus = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingStatusResponse> => {
  return get<IndexingStatusResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-status`,
    {},
  )
}

Import

import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'

Dependencies

Dependency	Purpose
`@/service/base` (get)	Provides the typed HTTP GET helper that handles authentication headers, URL interpolation, and response deserialization

I/O Contract

Inputs (shared by both functions)

Name	Type	Required	Description
datasetId	`string`	Yes	The unique identifier of the dataset (knowledge base) containing the document
documentId	`string`	Yes	The unique identifier of the document to estimate or track

Outputs: fetchIndexingEstimate

Name	Type	Description
tokens	`number`	The estimated total number of tokens across all segments of the document
total_price	`number`	The estimated cost of embedding all tokens using the configured embedding model (in the billing currency)
total_segments	`number`	The estimated number of segments the document will be split into based on the current processing rule
currency	`string`	The currency unit for total_price (e.g., `'USD'`)
qa_preview	`object`	(Optional) Preview of QA-mode extraction if QA segmentation is enabled

Outputs: fetchIndexingStatus

Name	Type	Description
indexing_status	`'parsing' | 'cleaning' | 'splitting' | 'indexing' | 'completed' | 'error' | 'paused'`	The current phase of the indexing pipeline
completed_segments	`number`	The number of segments that have been successfully embedded and indexed so far
total_segments	`number`	The total number of segments to be processed
processing_started_at	`number`	Unix timestamp when indexing began
error	`string | null`	Error message if indexing_status is `'error'`; `null` otherwise
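The two output tables can be read as TypeScript shapes. The sketch below is transcribed from the tables rather than copied from the Dify source, so the interface names (`IndexingEstimateSketch`, `IndexingStatusSketch`) and the helpers (`formatEstimate`, `progressFraction`, `isTerminal`) are hypothetical, and `error` is assumed to be `string | null` based on its description. The real `IndexingEstimateResponse` and `IndexingStatusResponse` types may contain additional or differently typed fields.

```typescript
// Assumed shapes, transcribed from the I/O tables; not the actual Dify type definitions.
type IndexingPhase =
  | 'parsing' | 'cleaning' | 'splitting' | 'indexing'
  | 'completed' | 'error' | 'paused'

interface IndexingEstimateSketch {
  tokens: number
  total_price: number
  total_segments: number
  currency: string
  qa_preview?: object
}

interface IndexingStatusSketch {
  indexing_status: IndexingPhase
  completed_segments: number
  total_segments: number
  processing_started_at: number
  error: string | null
}

// Hypothetical helper: one-line summary suitable for a confirmation dialog.
const formatEstimate = (e: IndexingEstimateSketch): string =>
  `${e.total_segments} segments, ${e.tokens} tokens, ${e.total_price} ${e.currency}`

// Hypothetical helper: fraction of segments indexed, clamped to the range [0, 1].
const progressFraction = (s: IndexingStatusSketch): number => {
  if (s.total_segments <= 0)
    return 0
  return Math.min(1, Math.max(0, s.completed_segments / s.total_segments))
}

// Hypothetical helper: true once the pipeline has finished or failed, so polling can stop.
const isTerminal = (s: IndexingStatusSketch): boolean =>
  s.indexing_status === 'completed' || s.indexing_status === 'error'
```

Note that `'paused'` is deliberately not treated as terminal here, since a paused document may resume; whether to keep polling in that state is an application decision.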

Usage Examples

Displaying a Cost Estimate

import { fetchIndexingEstimate } from '@/service/datasets'

const showEstimate = async (datasetId: string, documentId: string) => {
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })

  console.log(`Segments: ${estimate.total_segments}`)
  console.log(`Tokens: ${estimate.tokens}`)
  console.log(`Cost: ${estimate.total_price} ${estimate.currency}`)

  // Display confirmation dialog.
  // confirmDialog is a placeholder for the application's own dialog helper.
  const confirmed = await confirmDialog({
    title: 'Indexing Estimate',
    message: `This document will be split into ${estimate.total_segments} segments `
      + `(${estimate.tokens} tokens) at an estimated cost of `
      + `${estimate.total_price} ${estimate.currency}.`,
  })

  if (confirmed) {
    // Proceed with indexing.
    // startIndexing is a placeholder for whichever call triggers indexing in the application.
    await startIndexing({ datasetId, documentId })
  }
}

Polling Indexing Progress

import { fetchIndexingStatus } from '@/service/datasets'

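// updateProgressBar, showSuccess, and showError are placeholders for application UI helpers.
const pollIndexingProgress = (datasetId: string, documentId: string) => {
  const intervalId = setInterval(async () => {
    try {
      const status = await fetchIndexingStatus({ datasetId, documentId })

      // Guard against division by zero before segmentation has produced a total
      const progress = status.total_segments > 0
        ? status.completed_segments / status.total_segments
        : 0

      updateProgressBar(progress)

      if (status.indexing_status === 'completed') {
        clearInterval(intervalId)
        showSuccess('Indexing complete!')
      }
      else if (status.indexing_status === 'error') {
        clearInterval(intervalId)
        showError(`Indexing failed: ${status.error}`)
      }
    }
    catch {
      // Without this, a failed request becomes an unhandled promise rejection
      clearInterval(intervalId)
      showError('Could not fetch indexing status')
    }
  }, 2000) // Poll every 2 seconds

  return () => clearInterval(intervalId) // Cleanup function
}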
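A timer that fires every 2 seconds can start a new request before the previous one has resolved when the network is slow. The following is a minimal sketch of an alternative that waits for each response before scheduling the next poll and gives up after a bounded number of attempts. `pollUntilDone`, `StatusSnapshot`, and the `fetchStatus` parameter are hypothetical names introduced for illustration; `fetchStatus` stands in for a call such as `() => fetchIndexingStatus({ datasetId, documentId })`.

```typescript
// Minimal subset of the status response that this sketch relies on (assumed shape).
type StatusSnapshot = {
  indexing_status: string
  completed_segments: number
  total_segments: number
  error: string | null
}

// Polls sequentially: the next request starts only after the previous one has
// resolved and the interval has elapsed, so requests never overlap.
async function pollUntilDone(
  fetchStatus: () => Promise<StatusSnapshot>,
  onUpdate: (status: StatusSnapshot) => void,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<StatusSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus()
    onUpdate(status)

    // Stop as soon as the pipeline reports a terminal state.
    if (status.indexing_status === 'completed' || status.indexing_status === 'error')
      return status

    await new Promise<void>(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Indexing did not finish after ${maxAttempts} polls`)
}
```

Because the function returns a promise, the caller decides how to handle a rejected request (for example with `try`/`catch`) instead of the error being lost inside a timer callback.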

Combined Estimate and Progress Workflow

import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'

const indexDocument = async (datasetId: string, documentId: string) => {
  // Phase 1: Show estimate
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })
  setEstimate(estimate)

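  // Phase 2: User confirms, indexing starts via separate API call.
  // createDocument and documentConfig are placeholders for the application's
  // own creation call and its processing-rule payload.
  await createDocument({ datasetId, ...documentConfig })

  // Phase 3: Poll for progress
  const poll = async () => {
    try {
      const status = await fetchIndexingStatus({ datasetId, documentId })
      setProgress(status)

      // Schedule the next poll only after this one has finished
      if (status.indexing_status !== 'completed' && status.indexing_status !== 'error') {
        setTimeout(poll, 2000)
      }
    }
    catch (err) {
      // Stop polling on a failed request instead of leaving the rejection unhandled
      console.error('Failed to fetch indexing status', err)
    }
  }
  poll()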
}

Error Handling

HTTP Status	Cause	Handling
401	Authentication expired	Redirect to login
404	Dataset or document does not exist	Display error; navigate back to dataset list
409	Document is in an incompatible state for estimation	Retry after a short delay
500	Server error during estimation or status check	Display generic error; retry with exponential back-off
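The retry guidance in the table can be sketched as a small wrapper. This is an illustration under stated assumptions, not the behaviour of `@/service/base`: it assumes a failed request rejects with a value that exposes a numeric `status` property, and the names `statusOf`, `backoffDelay`, and `withRetry` are hypothetical. Only 409 and 500 are retried; 401 and 404 are rethrown immediately so the caller can redirect or show an error.

```typescript
// Statuses the table above marks as worth retrying.
const RETRYABLE_STATUSES = new Set<number>([409, 500])

// Extracts a numeric `status` from an unknown rejection value, if present.
// Assumption: rejected requests carry the HTTP status in a `status` property.
const statusOf = (err: unknown): number | undefined => {
  if (typeof err === 'object' && err !== null && 'status' in err) {
    const status = (err as { status: unknown }).status
    return typeof status === 'number' ? status : undefined
  }
  return undefined
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to capMs.
const backoffDelay = (attempt: number, baseMs = 500, capMs = 8000): number =>
  Math.min(capMs, baseMs * Math.pow(2, attempt))

// Runs `call`, retrying retryable failures with exponential back-off.
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    }
    catch (err) {
      const status = statusOf(err)
      const retryable = status !== undefined && RETRYABLE_STATUSES.has(status)
      // Non-retryable errors (for example 401 or 404) and exhausted retries propagate.
      if (!retryable || attempt >= maxRetries)
        throw err
      await new Promise<void>(resolve => setTimeout(resolve, backoffDelay(attempt, baseMs)))
    }
  }
}
```

A call such as `withRetry(() => fetchIndexingEstimate({ datasetId, documentId }))` would then make at most four attempts in total before surfacing the error.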

Related Pages

Implements Principle

Principle:Langgenius_Dify_Embedding_and_Indexing

Requires Environment

Environment:Langgenius_Dify_Vector_Database_Environment
