Implementation:Vespa engine Vespa IndexingProcessor Process

From Leeroopedia


Knowledge Sources
Domains: Document_Processing, Indexing
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool for receiving and dispatching document operations in the indexing pipeline, provided by Vespa's document processing framework.

Description

IndexingProcessor is a DocumentProcessor subclass that serves as the primary entry point for document indexing in Vespa. Its process method receives a Processing object containing a batch of document operations, iterates over them, and dispatches each one to a type-specific handler: processDocument for puts, processUpdate for updates, and processRemove for removes. Any other operation type, or a null entry, raises an IllegalArgumentException.

The class is annotated with lifecycle markers:

  • @Provides("indexedDocument") declares that this processor produces indexed documents.
  • @Before("indexingEnd") indicates it runs before the indexing end phase.
  • @After("indexingStart") indicates it runs after the indexing start phase.
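The ordering constraints expressed by these annotations can be explored with a small self-contained sketch. The three annotation types below are local stand-ins declared only so the example compiles without Vespa on the classpath, and PostIndexingAuditProcessor and OrderingConstraints are hypothetical names introduced for illustration. The sketch assumes that an @After name may refer to a name another component provides. It only reads the declared constraints and does not reproduce how Vespa actually orders a chain.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

// Local stand-ins for the chain-ordering annotations (not the Vespa classes).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Provides { String[] value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Before { String[] value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface After { String[] value(); }

// Mirrors the three annotations listed above for IndexingProcessor.
@Provides("indexedDocument")
@Before("indexingEnd")
@After("indexingStart")
class IndexingProcessorStandIn { }

// Hypothetical custom processor that asks to run after whatever provides "indexedDocument".
@After("indexedDocument")
class PostIndexingAuditProcessor { }

class OrderingConstraints {

    /** Names the class declares via @Provides, or an empty list. */
    static List<String> provided(Class<?> c) {
        Provides p = c.getAnnotation(Provides.class);
        return p == null ? List.of() : Arrays.asList(p.value());
    }

    /** Names the class declares via @After, or an empty list. */
    static List<String> runsAfter(Class<?> c) {
        After a = c.getAnnotation(After.class);
        return a == null ? List.of() : Arrays.asList(a.value());
    }

    /** True if 'later' declares @After on a name that 'earlier' provides. */
    static boolean mustFollow(Class<?> later, Class<?> earlier) {
        for (String name : runsAfter(later)) {
            if (provided(earlier).contains(name)) return true;
        }
        return false;
    }
}
```

Under these assumptions, OrderingConstraints.mustFollow(PostIndexingAuditProcessor.class, IndexingProcessorStandIn.class) returns true, while the reverse call returns false.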

The processor maintains references to a DocumentTypeManager for type resolution, a ScriptManager for looking up indexing scripts, and a FieldValuesFactory for creating field value containers during expression execution.

The constructor accepts dependency-injected components including linguistics processors, chunkers, embedders, and field generators, which are used to configure the ScriptManager that drives the indexing expressions.

Usage

This processor is instantiated by Vespa's dependency injection framework and inserted into the document processing chain, so it is not normally constructed by hand. The chain calls process once per Processing object, which may carry several document operations.

Consult this implementation when:

  • You need to understand how Vespa routes document operations to their indexing scripts.
  • You are debugging why a particular document type is not being processed correctly.
  • You want to understand the error handling semantics of the indexing pipeline.
  • You are extending the document processing chain and need to understand the lifecycle annotations.

Code Reference

Source Location

  • Repository: Vespa
  • File: docprocs/src/main/java/com/yahoo/docprocs/indexing/IndexingProcessor.java
  • Lines: 92-139

Signature

@Provides(IndexingProcessor.PROVIDED_NAME)
@Before(IndexingProcessor.INDEXING_END)
@After(IndexingProcessor.INDEXING_START)
public class IndexingProcessor extends DocumentProcessor {

    public static final String PROVIDED_NAME = "indexedDocument";
    public static final String INDEXING_START = "indexingStart";
    public static final String INDEXING_END = "indexingEnd";

    @Override
    public Progress process(Processing proc);
}

Import

import com.yahoo.docprocs.indexing.IndexingProcessor;

Constructor

@Inject
public IndexingProcessor(DocumentTypeManager documentTypeManager,
                         IlscriptsConfig ilscriptsConfig,
                         Linguistics linguistics,
                         ComponentRegistry<Chunker> chunkers,
                         ComponentRegistry<Embedder> embedders,
                         ComponentRegistry<FieldGenerator> generators)

Full Method Body

@Override
public Progress process(Processing proc) {
    if (proc.getDocumentOperations().isEmpty()) return Progress.DONE;
    Instant deadline = null;
    var timeLeft = proc.timeLeft();
    if (timeLeft != Processing.NO_TIMEOUT) {
        deadline = Instant.now().plus(timeLeft);
    }
    List<DocumentOperation> out = new ArrayList<>(proc.getDocumentOperations().size());
    for (var op : proc.getDocumentOperations()) {
        try {
            if (op instanceof DocumentPut dp) {
                processDocument(dp, out, deadline);
            } else if (op instanceof DocumentUpdate du) {
                processUpdate(du, out, deadline);
            } else if (op instanceof DocumentRemove dr) {
                processRemove(dr, out);
            } else if (op != null) {
                throw new IllegalArgumentException(
                    "Document class " + op.getClass().getName() + " not supported.");
            } else {
                throw new IllegalArgumentException("Expected document, got null.");
            }
        } catch (InvalidInputException e) {
            return Progress.INVALID_INPUT.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        } catch (OverloadException e) {
            return Progress.OVERLOAD.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        } catch (TimeoutException e) {
            return Progress.TIMEOUT.withReason(
                "Document '" + op.getId() + "': " + e.getMessage());
        }
    }
    proc.getDocumentOperations().clear();
    proc.getDocumentOperations().addAll(out);
    return Progress.DONE;
}
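One property of this method is easy to miss: the output list replaces the input list only after every operation has been handled, so the first failure aborts the batch and leaves the operations list as it was. The following self-contained sketch models that behavior. All types in it (Op, Put, Update, Remove, InvalidInput, Outcome, BatchSketch) are simplified stand-ins introduced for illustration, not Vespa classes, and the handle method is a hypothetical handler that rejects any ID starting with "bad".

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in operation types; the real ones are DocumentPut, DocumentUpdate and DocumentRemove.
interface Op { String id(); }
record Put(String id) implements Op { }
record Update(String id) implements Op { }
record Remove(String id) implements Op { }

// Stand-in for an input-validation failure raised while handling one operation.
class InvalidInput extends RuntimeException {
    InvalidInput(String message) { super(message); }
}

// Stand-in for Progress: an outcome name plus an optional reason.
record Outcome(String name, String reason) {
    static final Outcome DONE = new Outcome("done", null);
}

class BatchSketch {

    /** Hypothetical per-operation handler: rejects IDs starting with "bad". */
    static void handle(Op op, List<Op> out) {
        if (op.id().startsWith("bad")) throw new InvalidInput("rejected by sketch handler");
        out.add(op);
    }

    static Outcome process(List<Op> ops) {
        if (ops.isEmpty()) return Outcome.DONE;
        List<Op> out = new ArrayList<>(ops.size());
        for (Op op : ops) {
            try {
                if (op instanceof Put || op instanceof Update || op instanceof Remove) {
                    handle(op, out);
                } else {
                    throw new IllegalArgumentException("Operation not supported: " + op);
                }
            } catch (InvalidInput e) {
                // The first failure aborts the batch; 'ops' has not been modified yet.
                return new Outcome("invalid-input",
                                   "Document '" + op.id() + "': " + e.getMessage());
            }
        }
        // Only reached when every operation was handled successfully.
        ops.clear();
        ops.addAll(out);
        return Outcome.DONE;
    }
}
```

In this sketch, a batch of three operations whose second ID starts with "bad" yields an "invalid-input" outcome naming that ID, and the caller's list still holds all three original operations.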

I/O Contract

Inputs

Name | Type | Required | Description
proc | Processing | Yes | The processing context containing a list of document operations to be indexed. Also provides timeout information via proc.timeLeft().
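The timeout handling can be sketched in isolation: the remaining time reported by proc.timeLeft() is turned into an absolute deadline once, and no deadline is set when the value is the "no timeout" sentinel. In the sketch below, DeadlineSketch is a hypothetical helper, NO_TIMEOUT is a stand-in with an arbitrary value (not the value of Processing.NO_TIMEOUT), and the expired check is an assumption about how a handler might consume the deadline, since those handlers are not shown on this page.

```java
import java.time.Duration;
import java.time.Instant;

class DeadlineSketch {

    // Stand-in sentinel; its value is arbitrary and only its identity matters here.
    static final Duration NO_TIMEOUT = Duration.ofSeconds(Long.MAX_VALUE);

    /** Converts the remaining time into an absolute deadline, or null when there is no timeout. */
    static Instant deadlineFor(Duration timeLeft, Instant now) {
        if (timeLeft == NO_TIMEOUT) return null; // identity comparison against the sentinel
        return now.plus(timeLeft);
    }

    /** Assumed consumer-side check: a null deadline never expires. */
    static boolean expired(Instant deadline, Instant now) {
        return deadline != null && now.isAfter(deadline);
    }
}
```

Computing the deadline once per batch means every operation in the batch is measured against the same point in time, rather than each operation receiving a fresh allowance.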

Outputs

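Name | Type | Description
return value | Progress | Progress.DONE on success. On failure, Progress.INVALID_INPUT, Progress.OVERLOAD, or Progress.TIMEOUT, each carrying a reason string that names the ID of the failing document.
proc (mutated) | Processing | On success, the document operations list within proc is replaced with the processed output operations. On failure, the list itself is left as it was, because the replacement happens only after every operation has been handled.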

Key Fields

Field | Type | Description
documentTypeManager | DocumentTypeManager | Manages document type definitions and provides type resolution for incoming documents.
scriptManager | ScriptManager | Resolves document types to their corresponding indexing scripts.
fieldValuesFactory | FieldValuesFactory | Factory for creating field value containers used during indexing expression execution.

Usage Examples

// IndexingProcessor is normally constructed by dependency injection and invoked
// by the document processing chain. Manual construction is shown here only for
// illustration; the constructor arguments and `document` are assumed to be set
// up elsewhere.
IndexingProcessor processor = new IndexingProcessor(
    documentTypeManager,
    ilscriptsConfig,
    linguistics,
    chunkers,
    embedders,
    generators
);

// Wrap a single put operation in a Processing.
Processing processing = new Processing();
processing.getDocumentOperations().add(new DocumentPut(document));

Progress result = processor.process(processing);

if (Progress.DONE.equals(result)) {
    // The operations in processing have been replaced with their indexed versions.
    List<DocumentOperation> indexedOps = processing.getDocumentOperations();
} else {
    // Failure results are built with withReason(...), so they carry a reason string.
    // Assumes Progress.getReason() returns an Optional<String>.
    log.warning("Processing failed (" + result + "): "
                + result.getReason().orElse("no reason given"));
}
