Implementation:Vespa engine Vespa IndexingProcessor Process
| Knowledge Sources | |
|---|---|
| Domains | Document_Processing, Indexing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for receiving and dispatching document operations in the indexing pipeline, provided by Vespa's document processing framework.
Description
IndexingProcessor is a DocumentProcessor subclass that serves as the primary entry point for document indexing in Vespa. It receives a Processing object containing a batch of document operations, iterates over them, and dispatches each operation to the appropriate handler based on its type (put, update, or remove).
The class is annotated with lifecycle markers:
- @Provides("indexedDocument") declares that this processor produces indexed documents.
- @Before("indexingEnd") indicates it runs before the indexing end phase.
- @After("indexingStart") indicates it runs after the indexing start phase.
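As a rough illustration of how such markers can be read, the sketch below declares stand-in Provides, Before, and After annotations and inspects them reflectively. The annotation types, SketchProcessor, and the describe helper are hypothetical local definitions, not Vespa's own; Vespa ships its own annotation types and resolves the ordering inside its chain framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

class LifecycleMarkersSketch {

    // Renders the ordering constraints declared on a class as one line of text.
    static String describe(Class<?> type) {
        Provides provides = type.getAnnotation(Provides.class);
        Before before = type.getAnnotation(Before.class);
        After after = type.getAnnotation(After.class);
        return type.getSimpleName()
                + " provides " + Arrays.toString(provides.value())
                + ", runs after " + Arrays.toString(after.value())
                + " and before " + Arrays.toString(before.value());
    }

    public static void main(String[] args) {
        System.out.println(describe(SketchProcessor.class));
    }
}

// Hypothetical stand-ins for the ordering annotations; not Vespa's own types.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Provides { String[] value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Before { String[] value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface After { String[] value(); }

// Hypothetical class carrying the same three marker names as IndexingProcessor.
@Provides("indexedDocument")
@Before("indexingEnd")
@After("indexingStart")
class SketchProcessor { }
```

The point of the sketch is only that the three names are plain string labels attached to the class: a chain builder can order processors by matching labels, without the processors referring to each other directly.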
The processor maintains references to a DocumentTypeManager for type resolution, a ScriptManager for looking up indexing scripts, and a FieldValuesFactory for creating field value containers during expression execution.
The constructor accepts dependency-injected components including linguistics processors, chunkers, embedders, and field generators, which are used to configure the ScriptManager that drives the indexing expressions.
Usage
This processor is instantiated automatically by Vespa's dependency injection framework and inserted into the document processing chain. It should not typically be instantiated manually. It is invoked for every document operation that passes through the indexing pipeline.
Use this implementation when:
- You need to understand how Vespa routes document operations to their indexing scripts.
- You are debugging why a particular document type is not being processed correctly.
- You want to understand the error handling semantics of the indexing pipeline.
- You are extending the document processing chain and need to understand the lifecycle annotations.
Code Reference
Source Location
- Repository: Vespa
- File: docprocs/src/main/java/com/yahoo/docprocs/indexing/IndexingProcessor.java
- Lines: 92-139
Signature
@Provides(IndexingProcessor.PROVIDED_NAME)
@Before(IndexingProcessor.INDEXING_END)
@After(IndexingProcessor.INDEXING_START)
public class IndexingProcessor extends DocumentProcessor {
    public static final String PROVIDED_NAME = "indexedDocument";
    public static final String INDEXING_START = "indexingStart";
    public static final String INDEXING_END = "indexingEnd";

    @Override
    public Progress process(Processing proc);
}
Import
import com.yahoo.docprocs.indexing.IndexingProcessor;
Constructor
@Inject
public IndexingProcessor(DocumentTypeManager documentTypeManager,
                         IlscriptsConfig ilscriptsConfig,
                         Linguistics linguistics,
                         ComponentRegistry<Chunker> chunkers,
                         ComponentRegistry<Embedder> embedders,
                         ComponentRegistry<FieldGenerator> generators)
Full Method Body
@Override
public Progress process(Processing proc) {
    if (proc.getDocumentOperations().isEmpty()) return Progress.DONE;

    Instant deadline = null;
    var timeLeft = proc.timeLeft();
    if (timeLeft != Processing.NO_TIMEOUT) {
        deadline = Instant.now().plus(timeLeft);
    }

    List<DocumentOperation> out = new ArrayList<>(proc.getDocumentOperations().size());
    for (var op : proc.getDocumentOperations()) {
        try {
            if (op instanceof DocumentPut dp) {
                processDocument(dp, out, deadline);
            } else if (op instanceof DocumentUpdate du) {
                processUpdate(du, out, deadline);
            } else if (op instanceof DocumentRemove dr) {
                processRemove(dr, out);
            } else if (op != null) {
                throw new IllegalArgumentException(
                        "Document class " + op.getClass().getName() + " not supported.");
            } else {
                throw new IllegalArgumentException("Expected document, got null.");
            }
        } catch (InvalidInputException e) {
            return Progress.INVALID_INPUT.withReason(
                    "Document '" + op.getId() + "': " + e.getMessage());
        } catch (OverloadException e) {
            return Progress.OVERLOAD.withReason(
                    "Document '" + op.getId() + "': " + e.getMessage());
        } catch (TimeoutException e) {
            return Progress.TIMEOUT.withReason(
                    "Document '" + op.getId() + "': " + e.getMessage());
        }
    }
    proc.getDocumentOperations().clear();
    proc.getDocumentOperations().addAll(out);
    return Progress.DONE;
}
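The control flow of this method can be modelled without Vespa on the classpath. The sketch below is a simplified stand-in, assuming hypothetical Op records, an Outcome enum, and an InvalidInput exception in place of DocumentOperation, Progress, and InvalidInputException. It only illustrates three properties of the method: operations are dispatched by type, the first failure ends the batch, and the input list is replaced only when every operation succeeds.

```java
import java.util.ArrayList;
import java.util.List;

class BatchDispatchSketch {

    // Hypothetical stand-ins for DocumentPut, DocumentUpdate, and DocumentRemove.
    interface Op { String id(); }
    record Put(String id) implements Op { }
    record Update(String id) implements Op { }
    record Remove(String id) implements Op { }

    // Hypothetical stand-in for the Progress values returned by process().
    enum Outcome { DONE, INVALID_INPUT }

    // Hypothetical stand-in for InvalidInputException.
    static class InvalidInput extends RuntimeException {
        InvalidInput(String message) { super(message); }
    }

    // Dispatches each operation by type and collects results in a separate list.
    static Outcome process(List<Op> batch) {
        if (batch.isEmpty()) return Outcome.DONE;
        List<Op> out = new ArrayList<>(batch.size());
        for (Op op : batch) {
            try {
                if (op instanceof Put put) {
                    out.add(new Put(index(put.id())));
                } else if (op instanceof Update update) {
                    out.add(new Update(index(update.id())));
                } else if (op instanceof Remove remove) {
                    out.add(remove); // passed through unchanged in this sketch
                } else {
                    throw new IllegalArgumentException("Unsupported operation: " + op);
                }
            } catch (InvalidInput e) {
                return Outcome.INVALID_INPUT; // first failure ends the batch; list not replaced
            }
        }
        batch.clear();
        batch.addAll(out);
        return Outcome.DONE;
    }

    // Placeholder for running an indexing script: rejects blank ids, tags the rest.
    static String index(String id) {
        if (id.isBlank()) throw new InvalidInput("empty document id");
        return id + "#indexed";
    }

    public static void main(String[] args) {
        List<Op> ok = new ArrayList<>(List.of(new Put("a"), new Remove("b")));
        System.out.println(process(ok) + " " + ok);

        List<Op> bad = new ArrayList<>(List.of(new Put("a"), new Update("")));
        System.out.println(process(bad) + " " + bad);
    }
}
```

Collecting results in a separate list and swapping it in at the end is what keeps a failed batch from being left half-replaced: the second call in main returns INVALID_INPUT and the original two operations are still in the list.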
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| proc | Processing | Yes | The processing context containing a list of document operations to be indexed. Also provides timeout information via proc.timeLeft(). |
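The timeout handling can be sketched in isolation. The snippet below assumes a hypothetical NO_TIMEOUT sentinel (its value is arbitrary here, since the actual value of Processing.NO_TIMEOUT is not shown in this page) and hypothetical helper names deadlineFor and isExpired. It shows how a remaining-time budget might be turned into an absolute deadline once per batch, with null meaning "no deadline", so that later steps only compare the clock against a fixed instant.

```java
import java.time.Duration;
import java.time.Instant;

class DeadlineSketch {

    // Hypothetical sentinel standing in for Processing.NO_TIMEOUT; the value is arbitrary.
    static final Duration NO_TIMEOUT = Duration.ofMillis(-1);

    // Converts a relative time budget into an absolute deadline, or null if there is none.
    static Instant deadlineFor(Duration timeLeft, Instant now) {
        if (timeLeft == NO_TIMEOUT) return null; // identity check against the sentinel
        return now.plus(timeLeft);
    }

    // A null deadline never expires; otherwise compare the clock against the deadline.
    static boolean isExpired(Instant deadline, Instant now) {
        return deadline != null && now.isAfter(deadline);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2026-01-01T00:00:00Z");
        Instant deadline = deadlineFor(Duration.ofSeconds(5), start);

        System.out.println(deadline);
        System.out.println(isExpired(deadline, start.plusSeconds(3)));
        System.out.println(isExpired(deadline, start.plusSeconds(6)));
        System.out.println(isExpired(deadlineFor(NO_TIMEOUT, start), start.plusSeconds(6)));
    }
}
```

Computing the deadline once, before the loop, means every operation in the batch shares the same cut-off regardless of how long earlier operations took.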
Outputs
| Name | Type | Description |
|---|---|---|
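| return value | Progress | One of Progress.DONE, Progress.INVALID_INPUT, Progress.OVERLOAD, or Progress.TIMEOUT, indicating the outcome of the processing attempt. Error results carry a reason string that names the failing document. Unsupported or null operations raise IllegalArgumentException instead of returning a Progress. |
| proc (mutated) | Processing | On success, the document operations list within proc is replaced with the processed output operations. On an error return, the method exits before the list is replaced. |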
Key Fields
| Field | Type | Description |
|---|---|---|
| documentTypeManager | DocumentTypeManager | Manages document type definitions and provides type resolution for incoming documents. |
| scriptManager | ScriptManager | Resolves document types to their corresponding indexing scripts. |
| fieldValuesFactory | FieldValuesFactory | Factory for creating field value containers used during indexing expression execution. |
Usage Examples
// The IndexingProcessor is normally instantiated via dependency injection and invoked
// automatically as part of Vespa's document processing chain.
// The direct construction below only illustrates how the pieces fit together; document,
// log, and the constructor arguments are assumed to be in scope.
Processing processing = new Processing();
processing.getDocumentOperations().add(new DocumentPut(document));

IndexingProcessor processor = new IndexingProcessor(
        documentTypeManager,
        ilscriptsConfig,
        linguistics,
        chunkers,
        embedders,
        generators
);

Progress result = processor.process(processing);
if (result == Progress.DONE) {
    // Document operations in processing have been replaced with indexed versions
    List<DocumentOperation> indexedOps = processing.getDocumentOperations();
} else {
    // INVALID_INPUT, OVERLOAD, or TIMEOUT: the operations list was not replaced.
    // getReason() returns an Optional<String>, so a null check would always pass.
    log.warning("Processing failed (" + result + "): "
            + result.getReason().orElse("no reason given"));
}