Principle: Langgenius Dify Embedding and Indexing
| Knowledge Sources | |
|---|---|
| Domains | RAG, Embeddings, Vector Indexing, Async Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Embedding and indexing execution is the process of transforming text chunks into vector representations and constructing searchable index structures, including cost estimation and progress tracking for long-running batch operations.
Description
After documents have been chunked and an indexing method has been selected, the platform must execute the actual embedding and indexing work. This is a computationally expensive, potentially long-running operation that involves:
- Token estimation -- before committing resources, the system estimates the total number of tokens across all segments, the cost (if using a paid embedding model), and the expected number of segments. This gives the user a chance to review and confirm before processing begins.
- Embedding execution -- each segment is passed through the configured embedding model to produce a dense vector. For large datasets, this is done in batches to manage memory and API rate limits.
- Index construction -- the resulting vectors are inserted into a vector database index (or, for economy mode, tokens are inserted into an inverted index).
- Progress tracking -- because embedding and indexing can take minutes to hours for large document sets, the system exposes a status endpoint that reports completed segments, total segments, and overall progress.
This entire pipeline typically runs asynchronously (via Celery task queues in Dify's architecture), allowing the user to navigate away and return to check progress.
Usage
Embedding and indexing execution occurs when:
- A new knowledge base is created -- after the user confirms processing configuration and indexing method, the platform begins embedding all initial documents.
- New documents are added -- appending documents to an existing dataset triggers incremental embedding and indexing for the new segments only.
- Documents are re-processed -- if processing configuration changes (e.g., different chunk size), affected documents are re-embedded and re-indexed.
- The user requests a cost estimate -- before committing, the estimation endpoint provides token counts and pricing so the user can make an informed decision.
Theoretical Basis
Embedding Pipeline
The embedding pipeline converts text into fixed-dimensional vectors:
Segments[]
→ Batch grouping (e.g., 32 segments per batch)
→ For each batch:
→ Tokenize segments
→ Call embedding model API
→ Receive vector[] (d dimensions each)
→ Store (segment_id, vector) in vector database
→ Update indexing progress
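The loop above can be sketched in Python. The embed_batch, upsert, and report_progress callables are hypothetical stand-ins for the embedding API client, the vector database client, and the progress writer, not Dify's actual interfaces:

```python
from typing import Callable, List, Sequence, Tuple

BATCH_SIZE = 32  # segments per embedding API call

def embed_and_index(
    segments: Sequence[Tuple[str, str]],                    # (segment_id, text) pairs
    embed_batch: Callable[[List[str]], List[List[float]]],  # one API call per batch
    upsert: Callable[[str, List[float]], None],             # idempotent by segment ID
    report_progress: Callable[[int, int], None],            # (completed, total)
) -> None:
    """Embed segments in fixed-size batches and upsert each resulting vector."""
    total = len(segments)
    done = 0
    for start in range(0, total, BATCH_SIZE):
        batch = list(segments[start:start + BATCH_SIZE])
        vectors = embed_batch([text for _, text in batch])
        for (segment_id, _), vector in zip(batch, vectors):
            upsert(segment_id, vector)
        done += len(batch)
        report_progress(done, total)
```

Batching keeps memory bounded and maps cleanly onto per-request API limits; progress is reported once per batch rather than per segment to keep status writes cheap.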
Key parameters:
| Parameter | Description |
|---|---|
| Embedding model | The model used to generate vectors (e.g., text-embedding-ada-002, bge-large-en) |
| Vector dimensions | The dimensionality of the output vectors (e.g., 1536 for Ada-002, 1024 for BGE) |
| Batch size | Number of segments processed per API call |
| Total tokens | Sum of tokens across all segments -- determines cost |
Cost Estimation
Before indexing begins, the estimation endpoint calculates:
total_tokens = sum(token_count(segment) for segment in segments)
total_segments = len(segments)
total_price = total_tokens * price_per_token(embedding_model)
This allows the user to:
- Understand the financial impact before committing
- Adjust processing configuration (e.g., increase chunk size to reduce segment count) if the estimate is too high
- Compare costs across different embedding models
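The estimate above is a straightforward reduction over the segment list. A minimal sketch, assuming a token_count function supplied by the caller (a real deployment would use the embedding model's own tokenizer) and an illustrative per-token price:

```python
def estimate_cost(segments, token_count, price_per_token):
    """Return (total_tokens, total_segments, total_price) for a segment list."""
    total_tokens = sum(token_count(segment) for segment in segments)
    return total_tokens, len(segments), total_tokens * price_per_token

# Rough example using a whitespace tokenizer as a stand-in for the real one:
tokens, count, price = estimate_cost(
    ["first segment here", "second one"],
    token_count=lambda s: len(s.split()),
    price_per_token=0.0000001,
)
```

Because the estimate only reads segment metadata, it is cheap to recompute after each configuration change, which is what makes the review-then-confirm workflow practical.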
Progress Tracking
Indexing is an asynchronous operation. The status endpoint reports the current state, where indexing_status is one of "indexing", "completed", "error", or "paused". For example:
{
  "indexing_status": "indexing",
  "completed_segments": 147,
  "total_segments": 500,
  "progress": 0.294
}
The frontend polls this endpoint at regular intervals to update a progress bar or status indicator. The polling cadence should balance responsiveness against API load -- a common pattern is to start with short intervals (1--2 seconds) and back off as processing progresses.
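That back-off pattern can be sketched as a generator. Here fetch_status is a stand-in for an HTTP call to the status endpoint; the interval parameters are illustrative defaults, not Dify's values:

```python
import time

def poll_status(fetch_status, initial_interval=1.0, max_interval=10.0, factor=1.5):
    """Yield status payloads until indexing finishes, lengthening the wait each round."""
    interval = initial_interval
    while True:
        status = fetch_status()  # e.g. an HTTP GET against the status endpoint
        yield status
        if status["indexing_status"] in ("completed", "error"):
            return
        time.sleep(interval)
        interval = min(interval * factor, max_interval)  # geometric back-off, capped
```

Yielding each payload lets the caller drive the progress bar while the generator owns the timing policy.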
Batch Processing Considerations
- Rate limiting -- embedding model APIs impose rate limits (tokens per minute, requests per minute). The batch processor must respect these limits and implement exponential back-off on 429 responses.
- Partial failure -- if a batch fails, only that batch needs to be retried; previously completed batches are already persisted.
- Idempotency -- re-submitting the same segment for embedding should not create duplicate index entries. The system should upsert by segment ID.
- Memory management -- for very large datasets, the batch processor should stream segments from the database rather than loading all into memory.
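The rate-limiting point above is commonly handled with exponential back-off plus jitter. A generic sketch (RateLimitError stands in for whatever exception the embedding client raises on a 429; this is not Dify's actual retry code):

```python
import random
import time

class RateLimitError(Exception):
    """Raised by the embedding client on an HTTP 429 response."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() on rate-limit errors, roughly doubling the delay with jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

The jitter term spreads retries from concurrent workers so they do not hammer the API in lockstep after a shared rate-limit window resets.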
Async Architecture
In Dify, the embedding and indexing pipeline runs as a Celery task:
API Request (create/update documents)
→ Celery task enqueued (Redis broker)
→ Worker picks up task
→ Worker iterates segments, embeds, indexes
→ Worker updates status in database
→ Frontend polls status endpoint
This decouples the user-facing API from the long-running computation, ensuring that HTTP requests return quickly while processing continues in the background.
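The broker/worker decoupling can be illustrated in miniature with a thread and an in-process queue standing in for the Celery worker and the Redis broker; the dataset IDs, status fields, and helper names are illustrative, not Dify's code:

```python
import queue
import threading

# In-memory stand-ins for the Redis broker and the status table.
task_queue = queue.Queue()
status = {}

def enqueue_indexing(dataset_id):
    """API side: record an initial status row, enqueue the task, return at once."""
    status[dataset_id] = {"indexing_status": "indexing", "completed_segments": 0,
                          "total_segments": 0, "progress": 0.0}
    task_queue.put(dataset_id)

def worker(segments_by_dataset):
    """Worker side: drain the queue, 'embed' each segment, update status."""
    while True:
        dataset_id = task_queue.get()
        if dataset_id is None:  # shutdown sentinel
            break
        segments = segments_by_dataset[dataset_id]
        total = len(segments)
        for done, _segment in enumerate(segments, start=1):
            # real code would embed the segment and upsert the vector here
            status[dataset_id].update(completed_segments=done,
                                      total_segments=total,
                                      progress=round(done / total, 3))
        status[dataset_id]["indexing_status"] = "completed"

# The "API call" returns immediately; the worker finishes in the background.
enqueue_indexing("ds1")
t = threading.Thread(target=worker, args=({"ds1": ["a", "b", "c"]},))
t.start()
task_queue.put(None)
t.join()
```

The same shape holds in production: the HTTP handler only writes a status row and enqueues, while all embedding cost is paid by the worker, which is what lets requests return quickly regardless of corpus size.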