Principle:Vespa engine Vespa Log Message Formatting
Log Message Formatting
Overview
Log message formatting converts structured log records into a standardized text format for aggregation and analysis. Vespa uses a tab-delimited format with fields for timestamp, hostname, PID/TID, service name, component, level, and escaped message. This uniform format enables centralized log processing across all C++ and Java processes in a Vespa deployment.
Motivation
A Vespa deployment consists of many different process types -- config servers, distributors, search nodes, container nodes, and more -- running across multiple hosts. All of these processes produce log output that must be:
- Aggregatable: Logs from different processes and hosts must be combinable into a single stream for analysis.
- Parseable: Each field must be unambiguously extractable by log processing tools (
vespa-logfmt). - Sortable: Logs must be sortable by timestamp for chronological reconstruction of events.
- Consistent: The same format must be used by both Java and C++ processes.
- Safe: Special characters (tabs, newlines, backslashes) in messages must be escaped to prevent format corruption.
A standardized, tab-delimited format satisfies all of these requirements while being both human-readable and machine-parseable.
Log Line Format
Each Vespa log line contains exactly 7 tab-separated fields followed by a newline:
<time>\t<hostname>\t<pid/tid>\t<service>\t<component>\t<level>\t<message>\n
| Field | Format | Example | Description |
|---|---|---|---|
| time | seconds.microseconds |
1696000000.123456 |
Unix epoch timestamp with microsecond precision |
| hostname | String | node-01.example.com |
The hostname of the machine producing the log |
| pid/tid | pid/tid |
12345/67 |
Process ID slash thread ID |
| service | String | searchnode |
The Vespa service name from VESPA_SERVICE_NAME
|
| component | Dotted path | container.handler.search |
The component hierarchy, built from application prefix and logger name |
| level | Lowercase string | info |
The log level (fatal, error, warning, config, info, event, debug, spam) |
| message | Escaped text | Query completed in 42ms |
The log message with special characters escaped |
Timestamp Format
Vespa uses a Unix epoch timestamp with microsecond precision, formatted as:
<seconds>.<microseconds>
For example, 1696000000.123456 represents September 29, 2023, at some specific time with 123456 microseconds. This format is:
- Timezone-independent: Unix epoch is always UTC.
- Sortable: Lexicographic sorting of the timestamp string produces chronological order.
- Precise: Microsecond resolution is sufficient for correlating events across processes.
Component Name Construction
The component field is constructed from the application prefix and the JUL logger name:
| Application Prefix | Logger Name | Result |
|---|---|---|
container |
com.yahoo.search.handler |
container.com.yahoo.search.handler
|
container |
(null) | container
|
| (null) | com.yahoo.search.handler |
.com.yahoo.search.handler
|
| (null) | (null) | -
|
The prefix provides a top-level grouping by program, while the logger name provides fine-grained component identification. This two-part structure allows operators to filter logs at different granularity levels.
Message Escaping
The message field undergoes escape processing to preserve the tab-delimited format:
| Character | Escape Sequence | Reason |
|---|---|---|
\ (backslash) |
\\ |
Escape character itself must be escaped |
\n (newline) |
\n |
Newlines would break the one-line-per-record format |
\r (carriage return) |
\r |
Same as newline |
\t (tab) |
\t |
Tabs would corrupt field boundaries |
The VespaFormat.escape() method handles this transformation. It is applied to both regular messages and event payloads.
Event Messages
Log records at the EVENT level receive special treatment. Instead of using the formatted message text, the formatter extracts a structured Event object from the log record's parameters:
if (r.getLevel() == LogLevel.EVENT) {
Event event = (Event) r.getParameters()[0];
sbuf.append(VespaFormat.escape(event.toString()));
}
Event objects represent structured occurrences like "service started", "service stopped", or "state change". They have their own serialization format distinct from regular log messages.
Exception Handling
When a log record includes a thrown exception (via r.getThrown()), the stack trace is appended to the message after escaping:
appendException(r.getThrown(), sbuf);
The stack trace newlines are escaped so that the entire log record, including the exception, remains on a single line.
Cross-Language Consistency
The same tab-delimited format is used by both the Java formatter (VespaFormatter) and the C++ logging library (vespalog). This ensures that log lines from all process types can be:
- Interleaved chronologically in a single log file
- Processed by the same log analysis tools (
vespa-logfmt) - Collected by the same log aggregation infrastructure
Design Constraints
- Fixed number of fields: The format always has exactly 7 tab-separated fields. Missing values use
"-"as a placeholder. - Single-line records: Each log record occupies exactly one line. Embedded newlines in messages or exceptions are escaped.
- No structured data in message field: The message field is free-form text. Structured data is conveyed through the EVENT level with dedicated
Eventobjects. - Hostname resolved at initialization: The hostname is resolved once during formatter construction and reused for all log records. It does not change if the hostname is updated while the process is running.
Interaction with Other Components
- Logger Handler Installation: The
VespaFormatteris created during handler initialization and set on theVespaLogHandler. - Log Rotation and Special Handling: The formatted string is what actually gets written to the log target during
publish().