Implementation:Apache Paimon StructuredOptionsSplitter
| Knowledge Sources | |
|---|---|
| Domains | String Parsing, Configuration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
StructuredOptionsSplitter is a utility class for parsing delimited strings with support for quoting and escaping.
Description
StructuredOptionsSplitter splits strings on a delimiter while respecting quoting conventions, making it suitable for parsing configuration values that may contain delimiter characters within quoted sections. The class supports both single-quoted and double-quoted sections; a quote character inside a section is escaped by doubling it ('' or "").
The splitting algorithm tokenizes the input string into four token types: single-quoted content, double-quoted content, unquoted content, and delimiters. During tokenization, a doubled quote inside a quoted section is read as one literal quote (e.g., 'it''s' becomes it's). The tokenizer treats an unclosed quote as an error, so every quoted section must be properly closed.
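The doubling rule can be illustrated with a minimal sketch. This is not Paimon's code: the class name QuoteConsumeSketch and the method readQuoted are hypothetical, and the sketch covers only a single quoted section, not full tokenization.

```java
final class QuoteConsumeSketch {

    /**
     * Reads one quoted section whose opening quote is at string.charAt(start).
     * A doubled quote inside the section yields one literal quote character.
     * Hypothetical helper for illustration only.
     */
    static String readQuoted(String string, int start) {
        char quote = string.charAt(start);
        StringBuilder builder = new StringBuilder();
        int cursor = start + 1; // skip the opening quote
        while (cursor < string.length()) {
            char c = string.charAt(cursor);
            if (c == quote) {
                boolean doubled =
                        cursor + 1 < string.length() && string.charAt(cursor + 1) == quote;
                if (!doubled) {
                    return builder.toString(); // closing quote reached
                }
                builder.append(quote); // escaped quote: keep one, skip both
                cursor += 2;
            } else {
                builder.append(c);
                cursor++;
            }
        }
        // Ran off the end of the input without seeing a closing quote.
        throw new IllegalArgumentException("Quoting was not closed properly.");
    }
}
```

Under these assumptions, readQuoted("'it''s';ok", 0) yields it's, and an input such as 'open fails because the section is never closed.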
After tokenization, the processor validates the token sequence, ensuring that quoted sections are followed by delimiters or end of string. This prevents malformed inputs like 'quoted'unquoted from being accepted. Empty values between consecutive delimiters are preserved as empty strings.
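A simplified sketch of such a validation pass is shown below. The names TokenValidationSketch, Kind, Tok, and process are hypothetical, the two quote types are merged into one QUOTED kind for brevity, and the reported position is simply that of the offending token; the real class may compute it differently.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenValidationSketch {

    enum Kind { QUOTED, UNQUOTED, DELIMITER }

    /** Hypothetical token: its kind, its text, and where it starts in the input. */
    static final class Tok {
        final Kind kind;
        final String text;
        final int position;

        Tok(Kind kind, String text, int position) {
            this.kind = kind;
            this.text = text;
            this.position = position;
        }
    }

    /** Turns a token sequence into split values, rejecting text right after a quoted section. */
    static List<String> process(List<Tok> tokens) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Tok token = tokens.get(i);
            Tok next = i + 1 < tokens.size() ? tokens.get(i + 1) : null;
            switch (token.kind) {
                case QUOTED:
                    // A quoted section must be followed by a delimiter or the end of input.
                    if (next != null && next.kind != Kind.DELIMITER) {
                        throw new IllegalArgumentException(
                                "Illegal quoting at position: " + next.position);
                    }
                    splits.add(token.text);
                    break;
                case UNQUOTED:
                    splits.add(token.text);
                    break;
                case DELIMITER:
                    // Two delimiters in a row enclose an empty value.
                    if (next != null && next.kind == Kind.DELIMITER) {
                        splits.add("");
                    }
                    break;
            }
        }
        return splits;
    }
}
```

With the tokens for a;;b (unquoted, delimiter, delimiter, unquoted) this sketch returns a, an empty string, and b; with the tokens for 'quoted'bad it throws.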
The class also provides an escaping function that takes an unquoted string and adds single quotes around it if it contains any of the specified characters to escape, double quotes, or single quotes themselves. Any single quotes in the input are doubled for escaping. This bidirectional support (splitting and escaping) enables round-trip conversion of configuration values.
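The escaping rule described above can be sketched in a few lines. The class name EscapeSketch and the method escape are hypothetical stand-ins; this mirrors the described behavior rather than reproducing the actual source.

```java
import java.util.Arrays;

final class EscapeSketch {

    /**
     * Wraps value in single quotes if it contains any of charsToEscape,
     * a double quote, or a single quote; single quotes are doubled.
     * Hypothetical helper for illustration only.
     */
    static String escape(String value, String... charsToEscape) {
        boolean needsQuoting =
                Arrays.stream(charsToEscape).anyMatch(value::contains)
                        || value.contains("\"")
                        || value.contains("'");
        if (!needsQuoting) {
            return value; // nothing to protect, return unchanged
        }
        // String.replace performs a literal (non-regex) replacement.
        return "'" + value.replace("'", "''") + "'";
    }
}
```

Because the output always uses single quotes and doubles any embedded single quote, a splitter that follows the doubling rule can recover the original value.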
The implementation uses StringBuilder for string construction and single-pass algorithms where possible. The error message for illegal quoting includes position information to help users locate the problem in a malformed configuration string.
Usage
Use StructuredOptionsSplitter when parsing configuration values that may contain delimiter characters but need to preserve them in quoted sections. This is common for comma-separated lists where individual values might themselves contain commas, or for any structured configuration format that needs quoting support. The escaping method is useful when generating configuration strings that will later be parsed.
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-api/src/main/java/org/apache/paimon/options/StructuredOptionsSplitter.java
Signature
class StructuredOptionsSplitter {
    static List<String> splitEscaped(String string, char delimiter)
    static String escapeWithSingleQuote(String string, String... charsToEscape)
    private static List<String> processTokens(List<Token> tokens)
    private static List<Token> tokenize(String string, char delimiter)
    private static int consumeInQuotes(String string, char quote, int cursor, StringBuilder builder)
    private static int consumeUnquoted(String string, char delimiter, int cursor, StringBuilder builder)
    private enum TokenType {
        DOUBLE_QUOTED, SINGLE_QUOTED, UNQUOTED, DELIMITER
    }
    private static class Token {
        private final TokenType tokenType;
        private final String string;
        private final int position;
    }
}
Import
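// Package-private class: accessible only from code in the org.apache.paimon.options package.
// Code in that package needs no import; the fully qualified name is shown for reference.
import org.apache.paimon.options.StructuredOptionsSplitter;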
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| string | String | Yes | String to split or escape |
| delimiter | char | Yes | Delimiter character for splitting |
| charsToEscape | String... | No | Characters that trigger quoting when escaping |
Outputs
| Name | Type | Description |
|---|---|---|
| splits | List<String> | List of split strings with quotes removed and escapes processed |
| escaped | String | Input string with single quotes added if needed |
Usage Examples
// Simple splitting without quotes
List<String> result1 = StructuredOptionsSplitter.splitEscaped("a;b;c", ';');
// result1: ["a", "b", "c"]
// Splitting with single-quoted section containing delimiter
List<String> result2 = StructuredOptionsSplitter.splitEscaped("'a;b';c", ';');
// result2: ["a;b", "c"]
// Double-quoted section with single quote inside
List<String> result3 = StructuredOptionsSplitter.splitEscaped("\"AB'D\";B;C", ';');
// result3: ["AB'D", "B", "C"]
// Escaped quotes (doubling)
List<String> result4 = StructuredOptionsSplitter.splitEscaped("'it''s';ok", ';');
// result4: ["it's", "ok"]
// Complex escaping with mixed quotes
List<String> result5 = StructuredOptionsSplitter.splitEscaped("\"AB'\"\"D;B\";C", ';');
// result5: ["AB'\"D;B", "C"]
// Empty values between delimiters
List<String> result6 = StructuredOptionsSplitter.splitEscaped("a;;b", ';');
// result6: ["a", "", "b"]
// Escaping strings for safe parsing
String escaped1 = StructuredOptionsSplitter.escapeWithSingleQuote("simple", ";");
// escaped1: "simple" (no escaping needed)
String escaped2 = StructuredOptionsSplitter.escapeWithSingleQuote("has;delimiter", ";");
// escaped2: "'has;delimiter'"
String escaped3 = StructuredOptionsSplitter.escapeWithSingleQuote("has'quote", ";");
// escaped3: "'has''quote'"
String escaped4 = StructuredOptionsSplitter.escapeWithSingleQuote("has\"double", ";");
// escaped4: "'has\"double'"
String escaped5 = StructuredOptionsSplitter.escapeWithSingleQuote("normal", ";", ",");
// escaped5: "normal" (no special chars)
String escaped6 = StructuredOptionsSplitter.escapeWithSingleQuote("a,b", ";", ",");
// escaped6: "'a,b'" (contains comma which is in charsToEscape)
// Parsing configuration lists
String configValue = "'path/with;semicolon';'another;path';normal-path";
List<String> paths = StructuredOptionsSplitter.splitEscaped(configValue, ';');
// paths: ["path/with;semicolon", "another;path", "normal-path"]
// Round-trip conversion
String original = "value;with;delimiters";
String escaped = StructuredOptionsSplitter.escapeWithSingleQuote(original, ";");
// escaped: "'value;with;delimiters'"
List<String> parsed = StructuredOptionsSplitter.splitEscaped(escaped, ';');
// parsed: ["value;with;delimiters"]
assert parsed.get(0).equals(original);
// Error handling - unclosed quote
try {
    StructuredOptionsSplitter.splitEscaped("'unclosed;value", ';');
} catch (IllegalArgumentException e) {
    System.err.println(e.getMessage());
    // "Could not split string. Quoting was not closed properly."
}
// Error handling - illegal quote position
try {
    StructuredOptionsSplitter.splitEscaped("'quoted'bad", ';');
} catch (IllegalArgumentException e) {
    System.err.println(e.getMessage());
    // "Could not split string. Illegal quoting at position: ..."
}
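// Using in option parsing (these helpers must live in org.apache.paimon.options,
// because StructuredOptionsSplitter is package-private)
// Requires: import java.util.List; import java.util.stream.Collectors;
public List<String> parseListOption(String value) {
    return StructuredOptionsSplitter.splitEscaped(value, ',');
}
public String formatListOption(List<String> values) {
    // Quote each value that contains a comma or a quote, then join with commas
    return values.stream()
            .map(v -> StructuredOptionsSplitter.escapeWithSingleQuote(v, ","))
            .collect(Collectors.joining(","));
}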