Implementation:Vespa engine Vespa IndexingProcessor Process

Knowledge Sources	Vespa
Domains	Document_Processing, Indexing
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool for receiving and dispatching document operations in the indexing pipeline, provided by Vespa's document processing framework.

Description

IndexingProcessor is a DocumentProcessor subclass that serves as the primary entry point for document indexing in Vespa. It receives a Processing object containing a batch of document operations, iterates over them, and dispatches each operation to the appropriate handler based on its type (put, update, or remove).
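
The type-based dispatch described above can be pictured with a minimal, self-contained sketch. The names DispatchSketch, Op, Put, Update, and Remove are stand-ins invented for illustration and are not Vespa classes; the real handlers run indexing scripts rather than returning a label.

```java
import java.util.ArrayList;
import java.util.List;

final class DispatchSketch {

    // Hypothetical stand-ins for document operations; NOT Vespa types.
    interface Op { String id(); }
    record Put(String id) implements Op {}
    record Update(String id) implements Op {}
    record Remove(String id) implements Op {}

    /** Routes one operation to a handler chosen by its runtime type. */
    static String dispatch(Op op) {
        if (op instanceof Put p) {
            return "put:" + p.id();
        } else if (op instanceof Update u) {
            return "update:" + u.id();
        } else if (op instanceof Remove r) {
            return "remove:" + r.id();
        } else if (op != null) {
            // Unknown subtype: reject rather than silently skip.
            throw new IllegalArgumentException(
                "Operation class " + op.getClass().getName() + " not supported.");
        } else {
            throw new IllegalArgumentException("Expected operation, got null.");
        }
    }

    /** Dispatches a whole batch, collecting one result per operation. */
    static List<String> dispatchAll(List<Op> batch) {
        List<String> out = new ArrayList<>(batch.size());
        for (Op op : batch) {
            out.add(dispatch(op));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> batch = List.of(new Put("doc:1"), new Update("doc:2"), new Remove("doc:3"));
        System.out.println(dispatchAll(batch));
    }
}
```

The explicit null and unknown-type branches mirror the two IllegalArgumentException cases in the method body listed under Full Method Body.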

The class carries three annotations that fix its position in the document processing chain:

@Provides("indexedDocument") declares that this processor produces indexed documents.
@Before("indexingEnd") indicates it runs before the indexing end phase.
@After("indexingStart") indicates it runs after the indexing start phase.
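
To make the role of these annotations concrete, the following sketch declares the same three names as metadata on a class and reads them back reflectively. The annotation types and the classes OrderingSketch and Indexer are stand-ins defined locally for illustration; they are not Vespa's own annotations, and how Vespa's chain framework evaluates the real ones is not shown here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class OrderingSketch {

    // Stand-in annotations (NOT Vespa's), kept at runtime so they can be read via reflection.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Provides { String[] value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Before { String[] value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface After { String[] value(); }

    // Hypothetical processor that names its constraints through its own constants,
    // in the same style as the signature shown in this page.
    @Provides(Indexer.PROVIDED_NAME)
    @Before(Indexer.INDEXING_END)
    @After(Indexer.INDEXING_START)
    static final class Indexer {
        static final String PROVIDED_NAME = "indexedDocument";
        static final String INDEXING_START = "indexingStart";
        static final String INDEXING_END = "indexingEnd";
    }

    /** Summarizes the ordering metadata declared on a type. */
    static String describe(Class<?> type) {
        return "provides=" + String.join(",", type.getAnnotation(Provides.class).value())
             + " after=" + String.join(",", type.getAnnotation(After.class).value())
             + " before=" + String.join(",", type.getAnnotation(Before.class).value());
    }

    public static void main(String[] args) {
        System.out.println(describe(Indexer.class));
    }
}
```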

The processor maintains references to a DocumentTypeManager for type resolution, a ScriptManager for looking up indexing scripts, and a FieldValuesFactory for creating field value containers during expression execution.

The constructor accepts dependency-injected components including linguistics processors, chunkers, embedders, and field generators, which are used to configure the ScriptManager that drives the indexing expressions.

Usage

This processor is instantiated automatically by Vespa's dependency injection framework and inserted into the document processing chain. It is not normally instantiated by hand. Its process method is invoked for each Processing that passes through the indexing pipeline and handles every document operation that Processing contains.

Use this implementation when:

You need to understand how Vespa routes document operations to their indexing scripts.
You are debugging why a particular document type is not being processed correctly.
You want to understand the error handling semantics of the indexing pipeline.
You are extending the document processing chain and need to understand the ordering annotations.

Code Reference

Source Location

Repository: Vespa
File: docprocs/src/main/java/com/yahoo/docprocs/indexing/IndexingProcessor.java
Lines: 92-139

Signature

@Provides(IndexingProcessor.PROVIDED_NAME)
@Before(IndexingProcessor.INDEXING_END)
@After(IndexingProcessor.INDEXING_START)
public class IndexingProcessor extends DocumentProcessor {

    public static final String PROVIDED_NAME = "indexedDocument";
    public static final String INDEXING_START = "indexingStart";
    public static final String INDEXING_END = "indexingEnd";

    @Override
    public Progress process(Processing proc);
}

Import

import com.yahoo.docprocs.indexing.IndexingProcessor;

Constructor

@Inject
public IndexingProcessor(DocumentTypeManager documentTypeManager,
                         IlscriptsConfig ilscriptsConfig,
                         Linguistics linguistics,
                         ComponentRegistry<Chunker> chunkers,
                         ComponentRegistry<Embedder> embedders,
                         ComponentRegistry<FieldGenerator> generators)

Full Method Body

@Override
public Progress process(Processing proc) {
    if (proc.getDocumentOperations().isEmpty()) return Progress.DONE;
    Instant deadline = null;
    var timeLeft = proc.timeLeft();
    if (timeLeft != Processing.NO_TIMEOUT) {
        deadline = Instant.now().plus(timeLeft);
    }
    List<DocumentOperation> out = new ArrayList<>(proc.getDocumentOperations().size());
    for (var op : proc.getDocumentOperations()) {
        try {
            if (op instanceof DocumentPut dp) {
                processDocument(dp, out, deadline);
            } else if (op instanceof DocumentUpdate du) {
                processUpdate(du, out, deadline);
            } else if (op instanceof DocumentRemove dr) {
                processRemove(dr, out);
            } else if (op != null) {
                throw new IllegalArgumentException(
                    "Document class " + op.getClass().getName() + " not supported.");
            } else {
                throw new IllegalArgumentException("Expected document, got null.");
            }
        } catch (InvalidInputException e) {
            return Progress.INVALID_INPUT.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        } catch (OverloadException e) {
            return Progress.OVERLOAD.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        } catch (TimeoutException e) {
            return Progress.TIMEOUT.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        }
    }
    proc.getDocumentOperations().clear();
    proc.getDocumentOperations().addAll(out);
    return Progress.DONE;
}
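
The deadline computation at the top of the method can be isolated in a small sketch. DeadlineSketch, deadlineFrom, and expired are hypothetical names, and the NO_TIMEOUT value here is an arbitrary sentinel rather than Vespa's constant. How processDocument and processUpdate actually consume the deadline is not shown in this excerpt, so the expiry check is an assumption about one plausible use.

```java
import java.time.Duration;
import java.time.Instant;

final class DeadlineSketch {

    // Hypothetical sentinel playing the role of Processing.NO_TIMEOUT; the value is arbitrary.
    static final Duration NO_TIMEOUT = Duration.ofDays(36_500);

    /** Converts a relative time budget into an absolute deadline, or null when there is none. */
    static Instant deadlineFrom(Duration timeLeft, Instant now) {
        if (timeLeft == NO_TIMEOUT) {
            return null; // identity comparison against the sentinel, as in the method above
        }
        return now.plus(timeLeft);
    }

    /** Assumed handler-side check: a null deadline never expires. */
    static boolean expired(Instant deadline, Instant now) {
        return deadline != null && now.isAfter(deadline);
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        Instant deadline = deadlineFrom(Duration.ofSeconds(5), start);
        System.out.println("expired after 1s: " + expired(deadline, start.plusSeconds(1)));
        System.out.println("expired after 6s: " + expired(deadline, start.plusSeconds(6)));
        System.out.println("no timeout: " + expired(deadlineFrom(NO_TIMEOUT, start), start.plusSeconds(6)));
    }
}
```

Computing the deadline once, before the loop, means every operation in the batch shares the same absolute cutoff instead of each one restarting the clock.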

I/O Contract

Inputs

Name	Type	Required	Description
proc	`Processing`	Yes	The processing context containing a list of document operations to be indexed. Also provides timeout information via `proc.timeLeft()`.

Outputs

Name	Type	Description
return value	`Progress`	One of `Progress.DONE`, `Progress.INVALID_INPUT`, `Progress.OVERLOAD`, or `Progress.TIMEOUT`, indicating the outcome of the processing attempt.
proc (mutated)	`Processing`	On success (`Progress.DONE`), the document operations list within `proc` is cleared and refilled with the processed output operations. On any other result the method returns before the list is modified.
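
The mutation contract can be illustrated with stand-in types: results are collected in a separate list and swapped in only after every operation has succeeded, so an early failure return leaves the caller's list as it was. ReplaceSketch, Outcome, and the rule that operations starting with "bad" fail are all invented for this sketch; it says nothing about whether individual Vespa operation objects are modified in place by their handlers.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplaceSketch {

    // Hypothetical result type standing in for Progress.
    enum Outcome { DONE, INVALID_INPUT }

    /** Replaces the contents of ops with processed entries, but only if all of them succeed. */
    static Outcome process(List<String> ops) {
        if (ops.isEmpty()) {
            return Outcome.DONE;
        }
        List<String> out = new ArrayList<>(ops.size());
        for (String op : ops) {
            if (op.startsWith("bad")) {
                // Early return: ops has not been touched yet.
                return Outcome.INVALID_INPUT;
            }
            out.add("indexed(" + op + ")");
        }
        // Every operation succeeded, so swap the results in.
        ops.clear();
        ops.addAll(out);
        return Outcome.DONE;
    }

    public static void main(String[] args) {
        List<String> ok = new ArrayList<>(List.of("a", "b"));
        System.out.println(process(ok) + " " + ok);

        List<String> mixed = new ArrayList<>(List.of("a", "bad"));
        System.out.println(process(mixed) + " " + mixed);
    }
}
```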

Key Fields

Field	Type	Description
documentTypeManager	`DocumentTypeManager`	Manages document type definitions and provides type resolution for incoming documents.
scriptManager	`ScriptManager`	Resolves document types to their corresponding indexing scripts.
fieldValuesFactory	`FieldValuesFactory`	Factory for creating field value containers used during indexing expression execution.

Usage Examples

// The IndexingProcessor normally runs inside Vespa's document processing chain,
// where it is instantiated via dependency injection and invoked automatically.

// Manual construction, as below, is mainly useful in tests. The variables document,
// documentTypeManager, ilscriptsConfig, linguistics, chunkers, embedders, generators
// and log are assumed to exist already.
Processing processing = new Processing();
processing.getDocumentOperations().add(new DocumentPut(document));

IndexingProcessor processor = new IndexingProcessor(
    documentTypeManager,
    ilscriptsConfig,
    linguistics,
    chunkers,
    embedders,
    generators
);

Progress result = processor.process(processing);

if (result == Progress.DONE) {
    // Document operations in processing have been replaced with indexed versions
    List<DocumentOperation> indexedOps = processing.getDocumentOperations();
} else {
    // Any other result is a failure; process() attaches the reason via withReason(...)
    log.warning("Processing failed: " + result.getReason());
}

Related Pages

Implements Principle
