Implementation:Langgenius Dify FetchIndexingEstimate

From Leeroopedia


Knowledge Sources
Domains RAG Embeddings Vector Indexing Async Processing
Last Updated 2026-02-08 00:00 GMT

Overview

Two functions in the Dify frontend service layer: one estimates the cost of indexing a document, the other tracks indexing progress.

Description

This implementation documents two closely related service functions that support the embedding and indexing execution phase of the knowledge base creation workflow:

  1. fetchIndexingEstimate -- retrieves a pre-processing estimate of the token count, segment count, and cost for indexing a document. This is called before indexing begins, giving the user a chance to review the expected resource consumption.
  2. fetchIndexingStatus -- polls the current indexing progress for a document, returning the status, completed segment count, and total segment count. This is called repeatedly while indexing is in progress to update the UI's progress indicator.

Together, these two functions bracket the indexing execution: the estimate function provides a preview of what will happen, and the status function provides real-time feedback on what is happening.

Usage

Import and call these functions when:

  • Before indexing -- call fetchIndexingEstimate to display a cost/token estimate dialog so the user can confirm before committing resources.
  • During indexing -- poll fetchIndexingStatus at regular intervals to drive a progress bar or status indicator.
  • After configuration changes -- call fetchIndexingEstimate again when the user modifies processing rules to show the updated estimate.
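When the estimate is re-requested after every rule change, an older response can arrive after a newer one and overwrite it. The sketch below shows one possible guard, assuming a hypothetical `latestOnly` helper that is not part of the Dify codebase: it wraps any async function and only forwards the result of the most recent call.

```typescript
// Hypothetical helper (not part of Dify): wraps an async function so that only
// the result of the most recent invocation is passed to onResult.
function latestOnly<A, R>(
  fn: (arg: A) => Promise<R>,
  onResult: (result: R) => void,
): (arg: A) => Promise<void> {
  let latestToken = 0
  return async (arg: A) => {
    const myToken = ++latestToken
    const result = await fn(arg)
    // Drop the result if a newer call was started while this one was pending.
    if (myToken === latestToken)
      onResult(result)
  }
}
```

In a component, `fn` could be a closure around fetchIndexingEstimate and `onResult` a state setter, so rapid rule edits never display an outdated estimate.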

Code Reference

Source Location

  • Repository: Dify
  • File: web/service/datasets.ts

Signatures

fetchIndexingEstimate (lines 141--143)

export const fetchIndexingEstimate = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingEstimateResponse> => {
  return get<IndexingEstimateResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-estimate`,
    {},
  )
}

fetchIndexingStatus (lines 148--150)

export const fetchIndexingStatus = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingStatusResponse> => {
  return get<IndexingStatusResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-status`,
    {},
  )
}

Import

import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'

Dependencies

Dependency Purpose
@/service/base (get) Provides the typed HTTP GET helper that handles authentication headers, URL interpolation, and response deserialization

I/O Contract

Inputs (shared by both functions)

Name Type Required Description
datasetId string Yes The unique identifier of the dataset (knowledge base) containing the document
documentId string Yes The unique identifier of the document to estimate or track

Outputs: fetchIndexingEstimate

Name Type Description
tokens number The estimated total number of tokens across all segments of the document
total_price number The estimated cost of embedding all tokens using the configured embedding model (in the billing currency)
total_segments number The estimated number of segments the document will be split into based on the current processing rule
currency string The currency unit for total_price (e.g., 'USD')
qa_preview object (Optional) Preview of QA-mode extraction if QA segmentation is enabled
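The output table above can be written as a type, which makes the optional field explicit. This is a sketch derived only from the table: the name `IndexingEstimateSketch` and the helper `formatEstimate` are illustrative assumptions, and the real IndexingEstimateResponse in the Dify source may differ.

```typescript
// Illustrative shape based on the table above; not the actual Dify type.
interface IndexingEstimateSketch {
  tokens: number
  total_price: number
  total_segments: number
  currency: string
  qa_preview?: unknown // Structure not documented here, so left untyped
}

// Hypothetical helper that builds a one-line summary for a confirmation dialog.
function formatEstimate(e: IndexingEstimateSketch): string {
  return `${e.total_segments} segments, ${e.tokens} tokens, est. ${e.total_price} ${e.currency}`
}
```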

Outputs: fetchIndexingStatus

Name Type Description
indexing_status 'parsing' | 'cleaning' | 'splitting' | 'indexing' | 'completed' | 'error' | 'paused' The current phase of the indexing pipeline
completed_segments number The number of segments that have been successfully embedded and indexed so far
total_segments number The total number of segments to be processed
processing_started_at number Unix timestamp when indexing began
error string | null Error message if indexing_status is 'error'; null otherwise
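A progress indicator usually needs two derived values from this payload: a percentage and a flag for whether the pipeline has finished. The sketch below is based only on the table above. The names `IndexingStatusSketch`, `progressPercent`, and `isFinished` are assumptions for illustration, and the `error` field is typed as `string | null` to match its description.

```typescript
type IndexingPhase =
  | 'parsing' | 'cleaning' | 'splitting' | 'indexing'
  | 'completed' | 'error' | 'paused'

// Illustrative shape based on the table above; not the actual Dify type.
interface IndexingStatusSketch {
  indexing_status: IndexingPhase
  completed_segments: number
  total_segments: number
  processing_started_at: number
  error: string | null
}

type ProgressFields = Pick<
  IndexingStatusSketch,
  'indexing_status' | 'completed_segments' | 'total_segments'
>

// Returns progress as an integer percentage in [0, 100].
function progressPercent(s: ProgressFields): number {
  if (s.indexing_status === 'completed')
    return 100
  if (s.total_segments <= 0)
    return 0 // Avoid division by zero before the segment count is known
  const ratio = s.completed_segments / s.total_segments
  return Math.round(Math.min(Math.max(ratio, 0), 1) * 100)
}

// True once the pipeline has either succeeded or failed.
function isFinished(s: Pick<IndexingStatusSketch, 'indexing_status'>): boolean {
  return s.indexing_status === 'completed' || s.indexing_status === 'error'
}
```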

Usage Examples

Displaying a Cost Estimate

import { fetchIndexingEstimate } from '@/service/datasets'

const showEstimate = async (datasetId: string, documentId: string) => {
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })

  console.log(`Segments: ${estimate.total_segments}`)
  console.log(`Tokens: ${estimate.tokens}`)
  console.log(`Cost: ${estimate.total_price} ${estimate.currency}`)

  // Display confirmation dialog
  // (confirmDialog and startIndexing are placeholders that are not defined in this snippet)
  const confirmed = await confirmDialog({
    title: 'Indexing Estimate',
    message: `This document will be split into ${estimate.total_segments} segments `
      + `(${estimate.tokens} tokens) at an estimated cost of `
      + `${estimate.total_price} ${estimate.currency}.`,
  })

  if (confirmed) {
    // Proceed with indexing
    await startIndexing({ datasetId, documentId })
  }
}

Polling Indexing Progress

import { fetchIndexingStatus } from '@/service/datasets'

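// updateProgressBar, showSuccess, and showError are app-specific UI helpers
// that are not defined in this snippet.
const pollIndexingProgress = (datasetId: string, documentId: string) => {
  // Guards against overlapping requests when a response takes longer than the interval
  let inFlight = false

  const intervalId = setInterval(async () => {
    if (inFlight)
      return
    inFlight = true

    try {
      const status = await fetchIndexingStatus({ datasetId, documentId })

      // Fraction of segments processed so far, in the range 0..1
      const progress = status.total_segments > 0
        ? status.completed_segments / status.total_segments
        : 0

      updateProgressBar(progress)

      if (status.indexing_status === 'completed') {
        clearInterval(intervalId)
        showSuccess('Indexing complete!')
      }
      else if (status.indexing_status === 'error') {
        clearInterval(intervalId)
        showError(`Indexing failed: ${status.error}`)
      }
    }
    catch (e) {
      // A failed request would otherwise surface as an unhandled rejection
      clearInterval(intervalId)
      showError('Failed to fetch indexing status')
    }
    finally {
      inFlight = false
    }
  }, 2000) // Poll every 2 seconds

  return () => clearInterval(intervalId) // Cleanup function
}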
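An alternative to a fixed interval timer is a sequential loop that waits for each response before scheduling the next request, so requests can never overlap. The sketch below is one way to write that, with the fetch function injected so it can be exercised without a network. The names `pollUntilSettled`, `StatusLike`, and `STOP_STATUSES` are hypothetical, and treating 'paused' as a stop condition is an assumption rather than documented Dify behavior.

```typescript
// Minimal shape needed for polling; the real response carries more fields.
type StatusLike = { indexing_status: string }

// Statuses after which polling stops. Including 'paused' is an assumption.
const STOP_STATUSES = new Set(['completed', 'error', 'paused'])

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Polls sequentially: each request starts only after the previous one resolved.
// Returns the last status seen, or undefined if maxAttempts is 0.
async function pollUntilSettled<T extends StatusLike>(
  fetchStatus: () => Promise<T>,
  onUpdate: (status: T) => void,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<T | undefined> {
  let last: T | undefined
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await fetchStatus()
    onUpdate(last)
    if (STOP_STATUSES.has(last.indexing_status))
      return last
    await sleep(intervalMs)
  }
  return last // Gave up after maxAttempts without reaching a stop status
}
```

With the service function from this page, a call might look like `pollUntilSettled(() => fetchIndexingStatus({ datasetId, documentId }), setProgress)`, where `setProgress` is an app-level state setter. A rejected request propagates to the caller, which can then decide whether to retry.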

Combined Estimate and Progress Workflow

import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'

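// setEstimate, setProgress, createDocument, and documentConfig are app-level
// placeholders that are not defined in this snippet.
const indexDocument = async (datasetId: string, documentId: string) => {
  // Phase 1: Show estimate
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })
  setEstimate(estimate)

  // Phase 2: After the user confirms, indexing starts via a separate API call
  await createDocument({ datasetId, ...documentConfig })

  // Phase 3: Poll for progress until indexing completes or fails
  const poll = async () => {
    try {
      const status = await fetchIndexingStatus({ datasetId, documentId })
      setProgress(status)

      if (status.indexing_status !== 'completed' && status.indexing_status !== 'error') {
        setTimeout(poll, 2000)
      }
    }
    catch (e) {
      // Stop polling on a failed request instead of leaving an unhandled rejection
      console.error('Failed to fetch indexing status', e)
    }
  }
  void poll()
}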

Error Handling

HTTP Status Cause Handling
401 Authentication expired Redirect to login
404 Dataset or document does not exist Display error; navigate back to dataset list
409 Document is in an incompatible state for estimation Retry after a short delay
500 Server error during estimation or status check Display generic error; retry with exponential back-off
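The table above can be expressed as a small lookup plus a delay calculation for the back-off case. This is a sketch under stated assumptions: the action names, the `actionForStatus` and `backoffDelayMs` helpers, and the 500 ms base and 30 s cap are illustrative choices, not values taken from the Dify source.

```typescript
// Hypothetical action names mirroring the "Handling" column above.
type ErrorAction =
  | 'redirect-to-login'
  | 'show-not-found'
  | 'retry-after-delay'
  | 'retry-with-backoff'
  | 'show-generic-error'

// Maps an HTTP status code to the handling strategy listed in the table.
function actionForStatus(httpStatus: number): ErrorAction {
  switch (httpStatus) {
    case 401: return 'redirect-to-login'
    case 404: return 'show-not-found'
    case 409: return 'retry-after-delay'
    case 500: return 'retry-with-backoff'
    default: return 'show-generic-error' // Statuses not covered by the table
  }
}

// Exponential back-off: baseMs * 2^attempt, capped at capMs.
// attempt is zero-based; the defaults are assumptions for illustration.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}
```

Capping the delay keeps a long outage from pushing retries minutes apart; adding random jitter is a common refinement that is omitted here for brevity.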
