Principle:Duckdb Duckdb String Formatting
| Knowledge Sources | |
|---|---|
| Domains | Text_Processing, Software_Engineering |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A type-safe string formatting system using replacement fields in format strings, providing compile-time format string validation and extensible formatting for user-defined types.
Description
String formatting is the process of constructing strings by inserting formatted values into a template string. The {fmt} library provides a modern C++ formatting system that replaces traditional printf-style formatting with a type-safe, extensible, and performant alternative. A subset of its design was later standardized in C++20 as `std::format`.
The core design uses replacement fields enclosed in curly braces (`{}`) within format strings. Each field can contain an optional argument index, an optional format specification, or both. For example, `"{0:.2f}"` refers to argument 0, formatted as a floating-point number with 2 decimal places. When the format string is a compile-time constant and checking is enabled, it is parsed at compile time to detect type mismatches and invalid format specifiers; otherwise the same checks run when the string is formatted and failures are reported as exceptions. Either way the error is caught, whereas a mismatched printf call is undefined behavior that may go unnoticed.
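To make the structure of a replacement field concrete, the following is a minimal sketch of a parser that splits one field into its argument id and the parts of its format specification. It assumes the `[[fill]align][sign]['#']['0'][width]['.' precision][type]` layout given under Theoretical Basis. The names `Field`, `parse_field`, `read_number`, and `is_align` are hypothetical and are not part of {fmt}; nested fields for dynamic width or precision are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch; not part of {fmt}.
struct Field {
    std::string arg_id;      // empty when the index is automatic
    char fill = ' ';
    char align = '\0';       // '<', '>', '^', or '\0' when absent
    char sign = '\0';        // '+', '-', ' ', or '\0' when absent
    bool alternate = false;  // '#'
    bool zero_pad = false;   // '0'
    int width = -1;          // -1 when absent
    int precision = -1;      // -1 when absent
    char type = '\0';        // presentation type, '\0' when absent
};

inline bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Reads a run of decimal digits at s[i]; returns -1 when there is none.
inline int read_number(const std::string& s, std::size_t& i) {
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) return -1;
    int value = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        value = value * 10 + (s[i++] - '0');
    }
    return value;
}

// Parses one replacement field such as "{0:.2f}" (braces included).
inline Field parse_field(const std::string& text) {
    if (text.size() < 2 || text.front() != '{' || text.back() != '}') {
        throw std::invalid_argument("not a replacement field");
    }
    const std::string body = text.substr(1, text.size() - 2);
    const std::size_t colon = body.find(':');
    Field f;
    f.arg_id = body.substr(0, colon);  // the whole body when there is no ':'
    if (colon == std::string::npos) return f;

    const std::string s = body.substr(colon + 1);
    std::size_t i = 0;
    // A fill character is present only when an align character follows it.
    if (s.size() >= 2 && is_align(s[1])) { f.fill = s[0]; f.align = s[1]; i = 2; }
    else if (!s.empty() && is_align(s[0])) { f.align = s[0]; i = 1; }
    if (i < s.size() && (s[i] == '+' || s[i] == '-' || s[i] == ' ')) f.sign = s[i++];
    if (i < s.size() && s[i] == '#') { f.alternate = true; ++i; }
    if (i < s.size() && s[i] == '0') { f.zero_pad = true; ++i; }
    f.width = read_number(s, i);
    if (i < s.size() && s[i] == '.') {
        ++i;
        f.precision = read_number(s, i);
        if (f.precision < 0) throw std::invalid_argument("missing precision");
    }
    if (i < s.size()) f.type = s[i++];
    if (i != s.size()) throw std::invalid_argument("unexpected trailing characters");
    return f;
}

// Usage: the example field from the text above.
inline void parse_field_demo() {
    const Field f = parse_field("{0:.2f}");
    assert(f.arg_id == "0" && f.precision == 2 && f.type == 'f');
}
```

The sketch only recognizes the shape of a field. Whether a given presentation type is valid for a given argument is a separate check that needs the argument's type.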
The library achieves type safety through template-based argument handling. Unlike printf, which uses C variadic arguments (losing type information), {fmt} uses variadic templates that preserve the exact type of each argument. This enables automatic type-specific formatting without manual format specifier selection: `format("{}", x)` correctly formats `x` whether it is an integer, a float, a string, or any user-defined type with a formatter specialization.
Performance is competitive with printf for most types and faster for some, notably integer and floating-point conversion, where the library uses optimized integer-to-string and float-to-string algorithms. It also avoids heap allocation for small outputs and formats locale-independently by default, which avoids the overhead of locale lookup.
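The idea of allocation-free, locale-independent number conversion can be illustrated with the C++17 standard function `std::to_chars`. This is a sketch of the general technique, not {fmt}'s own implementation; the helpers `append_int` and `labeled` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <charconv>      // std::to_chars (C++17)
#include <cstddef>
#include <string>
#include <system_error>  // std::errc

// Hypothetical helper: appends the decimal form of `value` to `out`.
// The digits are produced in a fixed-size stack buffer, so the conversion
// itself performs no heap allocation and never consults the global locale.
inline void append_int(std::string& out, long long value) {
    char buf[24];  // a 64-bit value needs at most 20 characters including the sign
    const std::to_chars_result res = std::to_chars(buf, buf + sizeof(buf), value);
    assert(res.ec == std::errc{});  // cannot fail: the buffer is large enough
    out.append(buf, static_cast<std::size_t>(res.ptr - buf));
}

// Hypothetical helper built on append_int: produces "<label>=<value>".
inline std::string labeled(const std::string& label, long long value) {
    std::string out = label;
    out += '=';
    append_int(out, value);
    return out;
}
```

Appending to the `std::string` may still allocate once the result outgrows its small-string capacity; only the digit generation is guaranteed to stay on the stack.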
Usage
String formatting is used throughout DuckDB's codebase for constructing error messages, debug output, SQL string representation, and formatted output. It provides a safe and readable alternative to sprintf/snprintf for all internal string construction needs, eliminating format string vulnerabilities and type mismatch bugs that are common with printf-family functions.
Theoretical Basis
Format String Grammar:
format_string     = (literal_text | replacement_field)*
replacement_field = '{' [arg_id] [':' format_spec] '}'
arg_id            = integer | identifier
format_spec       = [[fill]align][sign]['#']['0'][width]['.' precision][type]
// Examples:
"{}"      -> default formatting, auto-index
"{0}"     -> argument 0, default format
"{:>10}"  -> right-align in 10-char field
"{:.3f}"  -> float with 3 decimal places
"{:#x}"   -> hexadecimal with 0x prefix
"{:06d}"  -> integer with leading zeros, width 6
Type-Safe Argument Passing: Variadic template mechanism:
// Printf (unsafe):
printf("%d", 3.14); // undefined behavior (double as int)
printf("%s", 42); // undefined behavior (int as string)
// {fmt} (safe):
format("{}", 3.14); // OK: auto-detects double
format("{}", 42); // OK: auto-detects int
format("{:d}", "hi"); // rejected: compile-time error when format-string checking is enabled, otherwise a format_error at runtime
// Template mechanism (simplified):
template<typename... Args>
string format(format_string<Args...> fmt, Args&&... args) {
    // Each arg's type is known at compile time, and format_string<Args...>
    // carries those types so the format string can be validated against them
    auto store = make_format_args(args...);
    return vformat(fmt, store);
}
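The mechanism above can be sketched without the library. The following hypothetical `mini_format` uses a variadic template so that each argument is converted by code selected for its exact type, which is the property that printf's C variadic arguments lose. It is an illustration under simplifying assumptions: it supports only `{}`, `{N}`, and the `{{` / `}}` escapes, renders every argument eagerly through `operator<<`, and does not reflect how {fmt} stores or formats arguments internally.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Converts one argument to text using the operator<< overload selected for
// its static type; no format specifier is needed to pick the conversion.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

// Hypothetical mini formatter: "{}" (auto-index), "{N}" (explicit index),
// and the escapes "{{" and "}}". Format specs are not handled.
template <typename... Args>
std::string mini_format(const std::string& fmt, const Args&... args) {
    // Each argument is rendered by the to_text instantiation for its exact type.
    const std::vector<std::string> rendered = {to_text(args)...};
    std::string out;
    std::size_t next = 0;  // next automatic argument index
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const char c = fmt[i];
        if (c == '{' && i + 1 < fmt.size() && fmt[i + 1] == '{') { out += '{'; ++i; continue; }
        if (c == '}' && i + 1 < fmt.size() && fmt[i + 1] == '}') { out += '}'; ++i; continue; }
        if (c != '{') { out += c; continue; }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string::npos) throw std::invalid_argument("unterminated field");
        const std::string id = fmt.substr(i + 1, close - i - 1);
        const std::size_t index = id.empty() ? next++ : std::stoul(id);
        if (index >= rendered.size()) throw std::out_of_range("argument index out of range");
        out += rendered[index];
        i = close;  // continue after the closing brace
    }
    return out;
}

// Usage: the same "{}" works for an int, a double, and a string literal.
inline void mini_format_demo() {
    assert(mini_format("{} {} {}", 42, 3.14, "hi") == "42 3.14 hi");
}
```

Because the argument types are template parameters, passing a `double` where an integer was expected cannot reinterpret memory the way `printf("%d", 3.14)` does; the worst outcome here is an exception for a bad index or malformed field.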
Compile-Time Format String Validation:
// Pseudocode: the format string is walked at compile time
constexpr validate(string_view fmt, type_list types):
    pos = 0
    arg_index = 0
    while pos < fmt.size():
        if fmt[pos] == '{':
            if pos + 1 < fmt.size() and fmt[pos+1] == '{':
                pos += 2; continue  // escaped "{{"
            spec = parse_replacement_field(fmt, pos)  // advances pos past the closing '}'
            check_type_compatible(spec, types[arg_index])
            arg_index++
        else:
            pos++
    check_arg_count(arg_index, types.size())
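A reduced version of this idea can be written in standard C++14 with a `constexpr` function and `static_assert`. The sketch below only counts replacement fields and compares the count with the number of argument types; it does not check type compatibility, does not handle nested fields such as `{:{}}`, and is not how {fmt} implements its checks. `count_fields` and `matches_args` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Counts the replacement fields in a format string, treating "{{" and "}}"
// as escaped braces. Returns -1 for a malformed string (unbalanced braces).
constexpr int count_fields(const char* fmt) {
    int fields = 0;
    for (std::size_t i = 0; fmt[i] != '\0'; ++i) {
        if (fmt[i] == '{') {
            if (fmt[i + 1] == '{') { ++i; continue; }  // escaped "{{"
            while (fmt[i] != '}') {                    // skip to the end of the field
                if (fmt[i] == '\0') return -1;         // unterminated field
                ++i;
            }
            ++fields;
        } else if (fmt[i] == '}') {
            if (fmt[i + 1] != '}') return -1;          // stray '}'
            ++i;                                       // escaped "}}"
        }
    }
    return fields;
}

// Hypothetical check: the number of fields must equal the number of
// argument types the call site intends to pass.
template <typename... Args>
constexpr bool matches_args(const char* fmt) {
    return count_fields(fmt) == static_cast<int>(sizeof...(Args));
}

// Evaluated by the compiler; a failing condition stops the build.
static_assert(count_fields("{} and {:>10}") == 2, "two replacement fields");
static_assert(count_fields("{{literal}}") == 0, "escaped braces are not fields");
static_assert(count_fields("{oops") == -1, "unterminated field is rejected");
static_assert(matches_args<int, double>("{} {:.2f}"), "field count matches argument count");

// The same function is also callable at run time.
inline void count_fields_demo() {
    assert(count_fields("{}") == 1);
}
```

Changing the last `static_assert` to `matches_args<int>("{} {:.2f}")` makes compilation fail, which is the behavior the pseudocode describes: the mistake is reported before the program ever runs.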
Custom Type Formatting: Extending for user-defined types:
// Specializing formatter for a custom type
struct Point { double x, y; };
template<>
struct formatter<Point> {
    constexpr auto parse(format_parse_context& ctx) {
        return ctx.begin(); // no custom format spec
    }
    auto format(const Point& p, format_context& ctx) const {
        return format_to(ctx.out(), "({}, {})", p.x, p.y);
    }
};
// Usage: format("Point: {}", Point{1.0, 2.0})
// Output: "Point: (1, 2)" (older {fmt} releases print "Point: (1.0, 2.0)")
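The extension point relies on ordinary template specialization, which can be demonstrated in a few self-contained lines. The following sketch defines its own `sketch::formatter` trait and `sketch::to_string` function; both are hypothetical stand-ins that mimic the shape of the mechanism and are not the {fmt} or `std::format` API. The primary template falls back to `operator<<`, and a specialization teaches it how to print `Point`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Primary template: falls back to operator<< for types that support it.
template <typename T>
struct formatter {
    static void format(const T& value, std::ostream& out) { out << value; }
};

// Formats one value by dispatching on its static type.
template <typename T>
std::string to_string(const T& value) {
    std::ostringstream out;
    formatter<T>::format(value, out);
    return out.str();
}

}  // namespace sketch

struct Point { double x, y; };

namespace sketch {

// User-provided specialization: selected automatically whenever T is Point.
template <>
struct formatter<Point> {
    static void format(const Point& p, std::ostream& out) {
        out << '(' << p.x << ", " << p.y << ')';
    }
};

}  // namespace sketch

// Usage: built-in and user-defined types go through the same entry point.
inline void formatter_demo() {
    assert(sketch::to_string(7) == "7");
    assert(sketch::to_string(Point{1.0, 2.0}) == "(1, 2)");
}
```

The library code in `sketch` never mentions `Point`; the specialization is found at instantiation time, which is why user-defined types can be added without modifying the formatting library itself.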