Principle:Vespa engine Vespa Indexing Error Handling
| Knowledge Sources | |
|---|---|
| Domains | Document_Processing, Indexing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Indexing error handling is the structured classification of processing failures into distinct categories that enable upstream systems to take appropriate corrective action, such as rejecting, retrying, or backing off.
Description
When a document indexing pipeline processes operations, failures can occur for a variety of reasons. Rather than treating all failures uniformly, a well-designed indexing system classifies errors into categories that carry semantic meaning about the nature of the failure and the appropriate response.
The key insight is that different failure modes require fundamentally different responses from the caller:
Invalid Input
An invalid input error indicates that the document operation itself is malformed or violates the schema constraints. Examples include:
- A document contains fields that are not declared in its type definition.
- A field value does not match the expected data type.
- A required field is missing.
- The document ID is malformed.
Invalid input errors are permanent failures: retrying the same operation will produce the same error. The correct response is to reject the operation and report the error to the document producer so they can fix the input.
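The checks listed above can be sketched as a small validation step. This is a minimal illustration only: `InvalidInputError`, `SCHEMA`, `validate`, and the assumed `id:` prefix rule are hypothetical names and conventions chosen for the example, not Vespa's actual API.

```python
class InvalidInputError(Exception):
    """Hypothetical exception for permanent, non-retryable input failures."""


# Hypothetical type definition: field name -> (expected Python type, required?)
SCHEMA = {
    "title": (str, True),
    "year": (int, False),
}


def validate(doc_id: str, fields: dict) -> None:
    """Raise InvalidInputError if the document violates the sketch schema."""
    # Assumed ID rule, for illustration only: the ID must start with "id:".
    if not doc_id.startswith("id:"):
        raise InvalidInputError(f"malformed document ID: {doc_id!r}")
    for name, value in fields.items():
        if name not in SCHEMA:
            raise InvalidInputError(f"undeclared field: {name}")
        expected, _ = SCHEMA[name]
        if not isinstance(value, expected):
            raise InvalidInputError(
                f"field {name} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    for name, (_, required) in SCHEMA.items():
        if required and name not in fields:
            raise InvalidInputError(f"missing required field: {name}")
```

Because the outcome depends only on the document itself, calling `validate` again with the same arguments raises the same error, which is why retrying is pointless for this category.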
Overload
An overload error indicates that the system is temporarily unable to process the operation due to resource constraints. Examples include:
- The embedding generation service is at capacity.
- Memory pressure is preventing allocation of processing buffers.
- A downstream dependency is throttling requests.
Overload errors are transient failures: the operation is valid and would succeed under normal conditions. The correct response is to retry with exponential backoff, reducing the feeding rate to allow the system to recover.
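One way a feeder might compute its retry delays is sketched below. The function name `backoff_delays` and all parameter defaults (base delay, growth factor, cap, attempt count) are assumptions made for illustration; suitable values depend on the deployment.

```python
import random


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 5,
                   jitter: float = 0.0, rng=random.random):
    """Yield one delay in seconds per retry attempt, never exceeding cap."""
    for attempt in range(attempts):
        # Exponential growth: base, base*factor, base*factor^2, ...
        delay = base * factor ** attempt
        # Optional jitter spreads retries from many feeders over time.
        delay *= 1.0 + jitter * rng()
        yield min(cap, delay)
```

A caller would sleep for each yielded delay before resending the operation, and stop retrying once the generator is exhausted.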
Timeout
A timeout error indicates that the processing could not complete within the allocated time window. Examples include:
- A complex document with many fields exceeds the per-batch processing deadline.
- An external service call (such as embedding generation) takes longer than expected.
- System load causes processing to be slower than anticipated.
Timeout errors are transient failures: a retry may or may not succeed, depending on the cause. The correct response is to retry the operation, possibly with a longer timeout or a smaller batch size.
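The "smaller batch size" option can be sketched as a recursive split. The names `ProcessingTimeout` and `process_with_split` are hypothetical, and the sketch assumes that a batch which times out has had no partial effect, so reprocessing its halves is safe.

```python
class ProcessingTimeout(Exception):
    """Hypothetical exception raised when a batch misses its deadline."""


def process_with_split(batch, process, min_size=1):
    """Process a batch; on timeout, split it in half and retry each half."""
    try:
        process(batch)
    except ProcessingTimeout:
        if len(batch) <= min_size:
            raise  # cannot shrink further; let the caller decide
        mid = len(batch) // 2
        process_with_split(batch[:mid], process, min_size)
        process_with_split(batch[mid:], process, min_size)
```

If even a batch of `min_size` times out, the exception propagates, since shrinking the batch cannot help when the cause is a slow external service or general system load.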
Unhandled Errors
Any error that does not fall into the above categories represents an unexpected failure, such as a bug in the processing logic, a corrupted document, or an infrastructure failure. These errors should be logged with full context for debugging and may be surfaced as internal server errors to the caller.
Usage
Indexing error handling should be applied at the boundary of the document processing pipeline, wrapping the core processing logic in a try-catch structure that maps exceptions to structured progress results.
Use this pattern when:
- You are building a document processing pipeline that needs to communicate failure semantics to callers.
- You need to distinguish between permanent and transient failures to enable intelligent retry policies.
- You want to maintain system stability under load by providing backpressure signals through error classification.
- You need to include diagnostic information (the document ID and error message) in failure responses for debugging.
Theoretical Basis
Indexing error handling implements the error classification pattern, which is a specialization of the broader structured error handling principle. The key theoretical foundation is the mapping between exception types and retry semantics.
The classification can be modeled as a function from exception types to progress results:
    function classifyError(exception, operation):
        reason = "Document '" + operation.id + "': " + exception.message
        if exception is InvalidInputException:
            return INVALID_INPUT(reason)   // permanent, do not retry
        if exception is OverloadException:
            return OVERLOAD(reason)        // transient, retry with backoff
        if exception is TimeoutException:
            return TIMEOUT(reason)         // transient, retry (possibly with a longer timeout)
        // Unhandled exception: propagate as runtime error
        raise exception
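For readers who prefer something executable, the same mapping can be written in Python. This is a sketch under stated assumptions: the exception classes, the `Kind` enum, and the `Progress` dataclass are hypothetical stand-ins defined here for the example, not types taken from Vespa.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    INVALID_INPUT = "invalid_input"
    OVERLOAD = "overload"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class Progress:
    """Hypothetical structured result: a category plus a reason string."""
    kind: Kind
    reason: str


class InvalidInputError(Exception):
    """Permanent failure: the operation itself is wrong."""


class OverloadError(Exception):
    """Transient failure: the system is out of capacity."""


class ProcessingTimeout(Exception):
    """Transient failure: the deadline was exceeded."""


_KINDS = {
    InvalidInputError: Kind.INVALID_INPUT,
    OverloadError: Kind.OVERLOAD,
    ProcessingTimeout: Kind.TIMEOUT,
}


def classify_error(exc: Exception, doc_id: str) -> Progress:
    """Map a known exception to a Progress; re-raise anything else."""
    reason = f"Document '{doc_id}': {exc}"
    for exc_type, kind in _KINDS.items():
        if isinstance(exc, exc_type):
            return Progress(kind, reason)
    raise exc  # unhandled: propagate to the caller unchanged
```

Keeping the mapping in a table makes it easy to see that every classified exception type has exactly one category, and that everything outside the table propagates.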
The error classification forms a decision tree for the caller:
    Progress result received
    |
    +-- DONE: Success, proceed to next batch
    |
    +-- INVALID_INPUT: Permanent failure
    |   +-- Log error with document ID
    |   +-- Remove document from feed
    |   +-- Alert document producer
    |
    +-- OVERLOAD: Transient failure (capacity)
    |   +-- Reduce feeding rate
    |   +-- Retry with exponential backoff
    |   +-- Monitor system metrics
    |
    +-- TIMEOUT: Transient failure (time)
        +-- Retry with same or longer timeout
        +-- Consider reducing batch size
        +-- Monitor processing latency
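On the caller side, the decision tree might translate into a small dispatch routine. The sketch below is illustrative: `FeedState`, `handle_result`, the plain-string result names, and the choice to halve the feeding rate on overload are all assumptions, and logging, alerting, and metrics are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FeedState:
    """Hypothetical feeder bookkeeping."""
    rate: float = 100.0                      # operations per second (illustrative)
    rejected: list = field(default_factory=list)
    retry_queue: list = field(default_factory=list)


def handle_result(state: FeedState, doc_id: str, result: str) -> None:
    """Update feeder state according to the result category."""
    if result == "DONE":
        return                               # success, nothing to do
    if result == "INVALID_INPUT":
        state.rejected.append(doc_id)        # permanent: drop from the feed
    elif result == "OVERLOAD":
        state.rate = max(1.0, state.rate / 2)  # back off the feeding rate
        state.retry_queue.append(doc_id)
    elif result == "TIMEOUT":
        state.retry_queue.append(doc_id)     # retry without changing the rate
    else:
        raise ValueError(f"unknown result: {result}")
```

Only the overload branch changes the feeding rate, which reflects the distinction between a capacity problem and a time problem in the tree.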
Reason strings: Each error result includes a human-readable reason string that contains the document ID and the exception message. This provides essential context for debugging without requiring the caller to parse stack traces or internal error codes. The document ID is critical for correlating errors with specific documents in the feed.
Fail-fast semantics: When an error is encountered for any operation in a batch, processing of the entire batch is aborted and the error is returned immediately. This fail-fast behavior prevents wasted work on subsequent operations that may also fail for the same reason (such as overload), and ensures that error signals reach the caller as quickly as possible.
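A minimal sketch of fail-fast batch processing follows. It assumes a hypothetical `process_one` callable that returns a result name such as "DONE" or "OVERLOAD" for each operation; the returned index is an addition for the example so the caller can see where the batch stopped.

```python
def process_batch(operations, process_one):
    """Process operations in order and stop at the first non-DONE result.

    Returns (result, index): the first failing result and its position,
    or ("DONE", len(operations)) if every operation succeeded.
    """
    for index, op in enumerate(operations):
        result = process_one(op)
        if result != "DONE":
            return result, index  # abort: later operations are not attempted
    return "DONE", len(operations)
```

Operations after the failing index are never passed to `process_one`, so the caller must treat them as unprocessed when it retries the batch.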