Implementation:Langgenius Dify FetchIndexingEstimate
| Knowledge Sources | |
|---|---|
| Domains | RAG, Embeddings, Vector Indexing, Async Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tools, provided by the Dify frontend service layer, for estimating indexing costs and tracking indexing progress.
Description
This implementation documents two closely related service functions that support the embedding and indexing execution phase of the knowledge base creation workflow:
- fetchIndexingEstimate -- retrieves a pre-processing estimate of the token count, segment count, and cost for indexing a document. It is called before indexing begins, giving the user a chance to review the expected resource consumption.
- fetchIndexingStatus -- polls the current indexing progress for a document, returning the status, completed segment count, and total segment count. It is called repeatedly while indexing is in progress to update the UI's progress indicator.
Together, these two functions bracket the indexing execution: the estimate function provides a preview of what will happen, and the status function provides real-time feedback on what is happening.
Usage
Import and call these functions in the following situations:
- Before indexing -- call fetchIndexingEstimate to display a cost/token estimate dialog so the user can confirm before committing resources.
- During indexing -- poll fetchIndexingStatus at regular intervals to drive a progress bar or status indicator.
- After configuration changes -- call fetchIndexingEstimate again when the user modifies processing rules, to show the updated estimate.
Code Reference
Source Location
- Repository: Dify
- File: web/service/datasets.ts
Signatures
fetchIndexingEstimate (lines 141--143)
export const fetchIndexingEstimate = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingEstimateResponse> => {
  return get<IndexingEstimateResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-estimate`,
    {},
  )
}
fetchIndexingStatus (lines 148--150)
export const fetchIndexingStatus = (
  { datasetId, documentId }: CommonDocReq,
): Promise<IndexingStatusResponse> => {
  return get<IndexingStatusResponse>(
    `/datasets/${datasetId}/documents/${documentId}/indexing-status`,
    {},
  )
}
Import
import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'
Dependencies
| Dependency | Purpose |
|---|---|
| @/service/base (get) | Provides the typed HTTP GET helper that handles authentication headers, URL interpolation, and response deserialization |
I/O Contract
Inputs (both functions)
| Name | Type | Required | Description |
|---|---|---|---|
| datasetId | string | Yes | The unique identifier of the dataset (knowledge base) containing the document |
| documentId | string | Yes | The unique identifier of the document to estimate or track |
Outputs: fetchIndexingEstimate
| Name | Type | Description |
|---|---|---|
| tokens | number | The estimated total number of tokens across all segments of the document |
| total_price | number | The estimated cost of embedding all tokens using the configured embedding model (in the billing currency) |
| total_segments | number | The estimated number of segments the document will be split into based on the current processing rule |
| currency | string | The currency unit for total_price (e.g., 'USD') |
| qa_preview | object | (Optional) Preview of QA-mode extraction if QA segmentation is enabled |
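The fields in the table above can be modelled as a TypeScript interface. The following is a minimal sketch rather than the type shipped in the Dify codebase: `IndexingEstimateSketch` and `formatEstimateSummary` are hypothetical names introduced here, and the field list simply mirrors the table.

```typescript
// Hypothetical shape mirroring the output table above; the real
// IndexingEstimateResponse type in the Dify codebase may differ.
interface IndexingEstimateSketch {
  tokens: number
  total_price: number
  total_segments: number
  currency: string
  qa_preview?: object
}

// Build a one-line summary suitable for a confirmation dialog.
function formatEstimateSummary(e: IndexingEstimateSketch): string {
  return `${e.total_segments} segments, ${e.tokens} tokens, `
    + `estimated cost ${e.total_price} ${e.currency}`
}

// Illustrative values only, not real pricing data.
const sample: IndexingEstimateSketch = {
  tokens: 1200,
  total_segments: 4,
  total_price: 0.0024,
  currency: 'USD',
}
console.log(formatEstimateSummary(sample))
```

Keeping the formatting in a pure function makes it straightforward to reuse the same summary in a dialog, a tooltip, or a log line.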
Outputs: fetchIndexingStatus
| Name | Type | Description |
|---|---|---|
| indexing_status | 'parsing' \| 'cleaning' \| 'splitting' \| 'indexing' \| 'completed' \| 'error' \| 'paused' | The current phase of the indexing pipeline |
| completed_segments | number | The number of segments that have been successfully embedded and indexed so far |
| total_segments | number | The total number of segments to be processed |
| processing_started_at | number | Unix timestamp when indexing began |
| error | string \| null | Error message if indexing_status is 'error'; null otherwise |
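The status fields above are typically reduced to two UI decisions: how far along the progress bar should be, and whether polling can stop. The sketch below shows one way to derive both. `IndexingProgressSketch`, `progressFraction`, and `shouldStopPolling` are hypothetical names, and treating only 'completed' and 'error' as terminal is an assumption taken from the usage examples on this page; how to treat 'paused' is left to the caller.

```typescript
// Status values listed in the table above.
type IndexingStatusValue =
  | 'parsing' | 'cleaning' | 'splitting' | 'indexing'
  | 'completed' | 'error' | 'paused'

// Hypothetical subset of the status response used for progress display.
interface IndexingProgressSketch {
  indexing_status: IndexingStatusValue
  completed_segments: number
  total_segments: number
}

// Fraction of segments done, clamped to [0, 1]; 0 while the total is unknown.
function progressFraction(s: IndexingProgressSketch): number {
  if (s.total_segments <= 0)
    return 0
  return Math.min(1, Math.max(0, s.completed_segments / s.total_segments))
}

// Assumption: 'completed' and 'error' end the polling loop.
// 'paused' is not treated as terminal here.
function shouldStopPolling(status: IndexingStatusValue): boolean {
  return status === 'completed' || status === 'error'
}
```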
Usage Examples
Displaying a Cost Estimate
import { fetchIndexingEstimate } from '@/service/datasets'

// confirmDialog and startIndexing are placeholders for the caller's own
// dialog helper and indexing trigger; they are not exported by the service module.
const showEstimate = async (datasetId: string, documentId: string) => {
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })
  console.log(`Segments: ${estimate.total_segments}`)
  console.log(`Tokens: ${estimate.tokens}`)
  console.log(`Cost: ${estimate.total_price} ${estimate.currency}`)
  // Display confirmation dialog
  const confirmed = await confirmDialog({
    title: 'Indexing Estimate',
    message: `This document will be split into ${estimate.total_segments} segments `
      + `(${estimate.tokens} tokens) at an estimated cost of `
      + `${estimate.total_price} ${estimate.currency}.`,
  })
  if (confirmed) {
    // Proceed with indexing
    await startIndexing({ datasetId, documentId })
  }
}
Polling Indexing Progress
import { fetchIndexingStatus } from '@/service/datasets'

// updateProgressBar, showSuccess, and showError are placeholders for the caller's UI helpers.
const pollIndexingProgress = (datasetId: string, documentId: string) => {
  const intervalId = setInterval(async () => {
    try {
      const status = await fetchIndexingStatus({ datasetId, documentId })
      const progress = status.total_segments > 0
        ? status.completed_segments / status.total_segments
        : 0
      updateProgressBar(progress)
      if (status.indexing_status === 'completed') {
        clearInterval(intervalId)
        showSuccess('Indexing complete!')
      }
      else if (status.indexing_status === 'error') {
        clearInterval(intervalId)
        showError(`Indexing failed: ${status.error}`)
      }
    }
    catch (e) {
      // A failed request would otherwise surface as an unhandled rejection
      // while the interval keeps firing, so stop polling and report it.
      clearInterval(intervalId)
      showError('Could not fetch indexing status')
    }
  }, 2000) // Poll every 2 seconds
  return () => clearInterval(intervalId) // Cleanup function
}
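Because setInterval fires on a fixed schedule, a slow response can overlap with the next request. A sequential loop that waits for each response before scheduling the next avoids this. The sketch below is one possible shape, not code from the Dify repository: `pollUntilDone`, `StatusSketch`, and the `maxAttempts` cap are hypothetical, and the fetcher is injected so the loop can be exercised without a network. In the application, `fetchStatus` would wrap `fetchIndexingStatus`.

```typescript
// Minimal status shape for this sketch (a subset of the documented fields).
type StatusSketch = {
  indexing_status: string
  completed_segments: number
  total_segments: number
}

// Poll sequentially: each request starts only after the previous one settles.
// Resolves with the final status, or rejects once maxAttempts is exhausted.
async function pollUntilDone(
  fetchStatus: () => Promise<StatusSketch>,
  onUpdate: (s: StatusSketch) => void,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<StatusSketch> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus()
    onUpdate(status)
    if (status.indexing_status === 'completed' || status.indexing_status === 'error')
      return status
    // Wait before the next request.
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error('Indexing status polling timed out')
}
```

A caller might use it as `pollUntilDone(() => fetchIndexingStatus({ datasetId, documentId }), setProgress)`, where `setProgress` is the caller's own state setter.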
Combined Estimate and Progress Workflow
import { fetchIndexingEstimate, fetchIndexingStatus } from '@/service/datasets'

// setEstimate, setProgress, createDocument, and documentConfig are placeholders
// for the caller's state setters and document-creation call.
const indexDocument = async (datasetId: string, documentId: string) => {
  // Phase 1: Show estimate
  const estimate = await fetchIndexingEstimate({ datasetId, documentId })
  setEstimate(estimate)
  // Phase 2: User confirms, indexing starts via separate API call
  await createDocument({ datasetId, ...documentConfig })
  // Phase 3: Poll for progress until a terminal status is reached
  const poll = async () => {
    const status = await fetchIndexingStatus({ datasetId, documentId })
    setProgress(status)
    if (status.indexing_status !== 'completed' && status.indexing_status !== 'error') {
      setTimeout(poll, 2000)
    }
  }
  // Await the first poll so an immediate failure propagates to the caller
  await poll()
}
Error Handling
| HTTP Status | Cause | Handling |
|---|---|---|
| 401 | Authentication expired | Redirect to login |
| 404 | Dataset or document does not exist | Display error; navigate back to dataset list |
| 409 | Document is in an incompatible state for estimation | Retry after a short delay |
| 500 | Server error during estimation or status check | Display generic error; retry with exponential back-off |
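The table above can be turned into a small dispatch function plus a back-off calculator. This is a sketch under stated assumptions: the action names, `actionForStatus`, `backoffDelayMs`, and the default base and cap values are all hypothetical and chosen for illustration; they do not come from the Dify codebase.

```typescript
// Hypothetical action labels corresponding to the "Handling" column above.
type ErrorAction =
  | 'redirect-to-login'
  | 'show-not-found'
  | 'retry-soon'
  | 'retry-with-backoff'
  | 'show-generic-error'

// Map an HTTP status code to the handling suggested in the table.
function actionForStatus(httpStatus: number): ErrorAction {
  switch (httpStatus) {
    case 401: return 'redirect-to-login'
    case 404: return 'show-not-found'
    case 409: return 'retry-soon'
    case 500: return 'retry-with-backoff'
    default: return 'show-generic-error'
  }
}

// Exponential back-off: baseMs * 2^attempt, capped at capMs.
// The default values are illustrative only.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}
```

With these defaults, retries wait 1 s, 2 s, 4 s, and so on, and never more than 30 s between attempts.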
Related Pages
Implements Principle
Requires Environment