Principle:Vespa engine Vespa Log Message Formatting

Log Message Formatting

Overview

Log message formatting converts structured log records into a standardized text format for aggregation and analysis. Vespa uses a tab-delimited format with fields for timestamp, hostname, PID/TID, service name, component, level, and escaped message. This uniform format enables centralized log processing across all C++ and Java processes in a Vespa deployment.

Motivation

A Vespa deployment consists of many different process types -- config servers, distributors, search nodes, container nodes, and more -- running across multiple hosts. All of these processes produce log output that must be:

Aggregatable: Logs from different processes and hosts must be combinable into a single stream for analysis.
Parseable: Each field must be unambiguously extractable by log processing tools (vespa-logfmt).
Sortable: Logs must be sortable by timestamp for chronological reconstruction of events.
Consistent: The same format must be used by both Java and C++ processes.
Safe: Special characters (tabs, newlines, backslashes) in messages must be escaped to prevent format corruption.

A standardized, tab-delimited format satisfies all of these requirements while being both human-readable and machine-parseable.

Log Line Format

Each Vespa log line contains exactly 7 tab-separated fields followed by a newline:

<time>\t<hostname>\t<pid/tid>\t<service>\t<component>\t<level>\t<message>\n

Field	Format	Example	Description
time	`seconds.microseconds`	`1696000000.123456`	Unix epoch timestamp with microsecond precision
hostname	String	`node-01.example.com`	The hostname of the machine producing the log
pid/tid	`pid/tid`	`12345/67`	Process ID slash thread ID
service	String	`searchnode`	The Vespa service name from `VESPA_SERVICE_NAME`
component	Dotted path	`container.handler.search`	The component hierarchy, built from application prefix and logger name
level	Lowercase string	`info`	The log level (fatal, error, warning, config, info, event, debug, spam)
message	Escaped text	`Query completed in 42ms`	The log message with special characters escaped

Timestamp Format

Vespa uses a Unix epoch timestamp with microsecond precision, formatted as:

<seconds>.<microseconds>

For example, 1696000000.123456 represents September 29, 2023, at some specific time with 123456 microseconds. This format is:

Timezone-independent: Unix epoch is always UTC.
Sortable: Lexicographic sorting of the timestamp string produces chronological order.
Precise: Microsecond resolution is sufficient for correlating events across processes.

Component Name Construction

The component field is constructed from the application prefix and the JUL logger name:

Application Prefix	Logger Name	Result
`container`	`com.yahoo.search.handler`	`container.com.yahoo.search.handler`
`container`	(null)	`container`
(null)	`com.yahoo.search.handler`	`.com.yahoo.search.handler`
(null)	(null)	`-`

The prefix provides a top-level grouping by program, while the logger name provides fine-grained component identification. This two-part structure allows operators to filter logs at different granularity levels.

Message Escaping

The message field undergoes escape processing to preserve the tab-delimited format:

Character	Escape Sequence	Reason
`\` (backslash)	`\\`	Escape character itself must be escaped
`\n` (newline)	`\n`	Newlines would break the one-line-per-record format
`\r` (carriage return)	`\r`	Same as newline
`\t` (tab)	`\t`	Tabs would corrupt field boundaries

The VespaFormat.escape() method handles this transformation. It is applied to both regular messages and event payloads.

Event Messages

Log records at the EVENT level receive special treatment. Instead of using the formatted message text, the formatter extracts a structured Event object from the log record's parameters:

if (r.getLevel() == LogLevel.EVENT) {
    Event event = (Event) r.getParameters()[0];
    sbuf.append(VespaFormat.escape(event.toString()));
}

Event objects represent structured occurrences like "service started", "service stopped", or "state change". They have their own serialization format distinct from regular log messages.

Exception Handling

When a log record includes a thrown exception (via r.getThrown()), the stack trace is appended to the message after escaping:

appendException(r.getThrown(), sbuf);

The stack trace newlines are escaped so that the entire log record, including the exception, remains on a single line.

Cross-Language Consistency

The same tab-delimited format is used by both the Java formatter (VespaFormatter) and the C++ logging library (vespalog). This ensures that log lines from all process types can be:

Interleaved chronologically in a single log file
Processed by the same log analysis tools (vespa-logfmt)
Collected by the same log aggregation infrastructure

Design Constraints

Fixed number of fields: The format always has exactly 7 tab-separated fields. Missing values use "-" as a placeholder.
Single-line records: Each log record occupies exactly one line. Embedded newlines in messages or exceptions are escaped.
No structured data in message field: The message field is free-form text. Structured data is conveyed through the EVENT level with dedicated Event objects.
Hostname resolved at initialization: The hostname is resolved once during formatter construction and reused for all log records. It does not change if the hostname is updated while the process is running.

Interaction with Other Components

Logger Handler Installation: The VespaFormatter is created during handler initialization and set on the VespaLogHandler.
Log Rotation and Special Handling: The formatted string is what actually gets written to the log target during publish().

References

Related Pages

Implemented By

Implementation:Vespa_engine_Vespa_VespaFormatter_Format

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment