Implementation:Vespa engine Vespa DocumentScript RemoveSpanTree
| Knowledge Sources | |
|---|---|
| Domains | Document_Processing, Indexing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for recursively removing linguistics span trees from all field value types within a document, provided by Vespa's document processing framework.
Description
The removeAnyLinguisticsSpanTree method is a private recursive method in DocumentScript that walks a field value, including any values nested inside it, and removes every attached linguistics span tree. It handles five field value types:
- StringFieldValue: Directly removes the span tree named SpanTrees.LINGUISTICS.
- Array: Iterates over all elements and recurses into each one.
- WeightedSet: Iterates over the key set and recurses into each key.
- MapFieldValue: Iterates over all entries and recurses into both keys and values.
- StructuredFieldValue: Iterates over all field-value pairs using an iterator and recurses into each value.
The method dispatches on the runtime type with instanceof checks: a plain check followed by a cast for StringFieldValue, and instanceof pattern matching for the four container types. For any other type (such as numeric fields), no branch matches and the method returns without action, as these types cannot carry span trees.
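The dispatch described above can be illustrated with a minimal, self-contained sketch. The classes Value, Text, Num, Arr, and MapVal, the constant LINGUISTICS, and the method removeLinguistics are hypothetical stand-ins invented for this example; they are not the Vespa classes. Only the shape of the recursion is meant to match. The weighted set and struct cases are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SpanTreeCleanupSketch {

    // Hypothetical stand-ins for the FieldValue hierarchy (not the real Vespa classes).
    interface Value {}

    static final class Text implements Value {
        final Set<String> spanTrees = new HashSet<>(); // names of attached span trees
        Text(String... trees) { spanTrees.addAll(Arrays.asList(trees)); }
    }

    static final class Num implements Value {} // cannot carry span trees

    static final class Arr implements Value {
        final List<Value> values = new ArrayList<>();
    }

    static final class MapVal implements Value {
        final Map<Value, Value> entries = new LinkedHashMap<>();
    }

    // Assumed stand-in for the SpanTrees.LINGUISTICS name.
    static final String LINGUISTICS = "linguistics";

    static void removeLinguistics(Value value) {
        if (value instanceof Text text) {
            text.spanTrees.remove(LINGUISTICS); // leaf: nothing to recurse into
        } else if (value instanceof Arr arr) {
            for (Value element : arr.values) removeLinguistics(element);
        } else if (value instanceof MapVal map) {
            for (Map.Entry<Value, Value> entry : map.entries.entrySet()) {
                removeLinguistics(entry.getKey());   // keys may be strings too
                removeLinguistics(entry.getValue());
            }
        }
        // Any other type (here: Num) matches no branch and is left untouched.
    }

    public static void main(String[] args) {
        Text key = new Text(LINGUISTICS);
        Text nested = new Text(LINGUISTICS, "custom");
        Arr arr = new Arr();
        arr.values.add(nested);
        arr.values.add(new Num());
        MapVal map = new MapVal();
        map.entries.put(key, arr);

        removeLinguistics(map);

        System.out.println(key.spanTrees);    // prints []
        System.out.println(nested.spanTrees); // prints [custom]
    }
}
```

Note that only the tree named LINGUISTICS is removed; in this sketch the span tree named "custom" on the nested string survives the cleanup.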
Usage
This method is called internally by DocumentScript.execute() as a pre-processing step before running the indexing expression. It is not intended to be called directly by external code.
Use this implementation reference when:
- You need to understand the span tree cleanup logic during re-indexing.
- You are debugging issues where stale linguistic annotations persist after re-indexing.
- You are working with custom field value types and need to understand whether they are covered by the cleanup traversal.
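When debugging the last two cases, it can help to mirror the cleanup traversal with a read-only check that reports where a linguistics span tree is still attached. The following is a rough sketch under stated assumptions: Value, Text, Arr, Struct, Opaque, LINGUISTICS, and findStale are all hypothetical names made up for this example and do not exist in Vespa. Opaque models a custom type that the traversal has no branch for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StaleSpanTreeFinder {

    // Hypothetical stand-ins for field value types (not the real Vespa classes).
    interface Value {}
    record Text(String content, Set<String> spanTrees) implements Value {}
    record Arr(List<Value> values) implements Value {}
    record Struct(Map<String, Value> fields) implements Value {}
    // A custom wrapper type that the traversal below has no branch for.
    record Opaque(Value inner) implements Value {}

    // Assumed stand-in for the SpanTrees.LINGUISTICS name.
    static final String LINGUISTICS = "linguistics";

    /** Appends to out the path of every reachable Text that still has a linguistics span tree. */
    static List<String> findStale(String path, Value value, List<String> out) {
        if (value instanceof Text text) {
            if (text.spanTrees().contains(LINGUISTICS)) out.add(path);
        } else if (value instanceof Arr arr) {
            for (int i = 0; i < arr.values().size(); i++) {
                findStale(path + "[" + i + "]", arr.values().get(i), out);
            }
        } else if (value instanceof Struct struct) {
            for (Map.Entry<String, Value> field : struct.fields().entrySet()) {
                findStale(path + "." + field.getKey(), field.getValue(), out);
            }
        }
        // Opaque matches no branch, so anything wrapped inside it is never visited.
        return out;
    }

    public static void main(String[] args) {
        Value doc = new Struct(Map.of(
                "title", new Text("Vespa", Set.of(LINGUISTICS)),
                "tags", new Arr(List.of(
                        new Text("search", Set.of()),
                        new Text("engine", Set.of(LINGUISTICS)))),
                "wrapped", new Opaque(new Text("hidden", Set.of(LINGUISTICS)))));

        System.out.println(findStale("doc", doc, new ArrayList<>()));
    }
}
```

The run reports doc.title and doc.tags[1] (in no guaranteed order, since Map.of has no defined iteration order) but not the string inside Opaque. The same reasoning applies to the real method: a custom field value type is covered only if it is an instance of one of the five handled classes.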
Code Reference
Source Location
- Repository: Vespa
- File: docprocs/src/main/java/com/yahoo/docprocs/indexing/DocumentScript.java
- Lines: 87-108
Signature
private void removeAnyLinguisticsSpanTree(FieldValue value)
Import
import com.yahoo.docprocs.indexing.DocumentScript;
Full Method Body
    private void removeAnyLinguisticsSpanTree(FieldValue value) {
        if (value instanceof StringFieldValue) {
            ((StringFieldValue)value).removeSpanTree(SpanTrees.LINGUISTICS);
        } else if (value instanceof Array<?> arr) {
            for (FieldValue fieldValue : arr.getValues()) {
                removeAnyLinguisticsSpanTree(fieldValue);
            }
        } else if (value instanceof WeightedSet<?> wset) {
            for (FieldValue fieldValue : wset.keySet()) {
                removeAnyLinguisticsSpanTree(fieldValue);
            }
        } else if (value instanceof MapFieldValue<?, ?> map) {
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                removeAnyLinguisticsSpanTree((FieldValue)entry.getKey());
                removeAnyLinguisticsSpanTree((FieldValue)entry.getValue());
            }
        } else if (value instanceof StructuredFieldValue struct) {
            for (Iterator<Map.Entry<Field, FieldValue>> it = struct.iterator(); it.hasNext();) {
                removeAnyLinguisticsSpanTree(it.next().getValue());
            }
        }
    }
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| value | FieldValue | Yes | The field value from which to remove linguistics span trees. May be any subtype of FieldValue including StringFieldValue, Array, WeightedSet, MapFieldValue, or StructuredFieldValue. |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | void | The method modifies the input FieldValue in place. Any StringFieldValue encountered during traversal has its linguistics span tree removed. No value is returned. |
Type Dispatch Table
| Field Value Type | Action | Recursion Target |
|---|---|---|
| StringFieldValue | Remove SpanTrees.LINGUISTICS span tree | None (leaf) |
| Array<?> | Iterate elements | Each element value |
| WeightedSet<?> | Iterate key set | Each key value |
| MapFieldValue<?, ?> | Iterate entries | Both key and value of each entry |
| StructuredFieldValue | Iterate field-value pairs | Each field value |
| Other types | No action (silent return) | None |
Usage Examples
// This method is private and called internally by DocumentScript.execute().
// The following illustrates the conceptual usage pattern:
// Given a document with a string field that has a stale linguistics span tree:
StringFieldValue title = new StringFieldValue("Vespa Search Engine");
// An empty span tree is enough for illustration; real trees are produced by linguistics processing.
title.setSpanTree(new SpanTree(SpanTrees.LINGUISTICS));
// When DocumentScript.execute() is called, it iterates over all fields
// and calls removeAnyLinguisticsSpanTree on each value:
// removeAnyLinguisticsSpanTree(title);
// Result: title no longer has a linguistics span tree
// For nested types, the recursion handles deep structures:
Array<StringFieldValue> tags = new Array<>(DataType.getArray(DataType.STRING));
tags.add(new StringFieldValue("search"));
tags.add(new StringFieldValue("engine"));
// Each element's linguistics span tree would be removed recursively