Implementation:Apache Paimon DataTypeJsonParser
| Knowledge Sources | |
|---|---|
| Domains | Type System, Serialization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
DataTypeJsonParser parses DataType and DataField instances from JSON representations and SQL-style type strings.
Description
DataTypeJsonParser is the deserialization side of Paimon's type serialization: it converts string representations back into DataType objects, while the opposite direction is handled by DataType.serializeJson. It supports two input formats: JSON nodes (as produced by DataType.serializeJson) and SQL-style type strings (e.g., "VARCHAR(100)", "ROW<id INT, name STRING>"). The parser is used when reading table schemas from metadata files, processing DDL statements, and serving REST APIs.
The parser handles complex nested types recursively, including ARRAY, MAP, ROW, MULTISET, and VECTOR types. For JSON input, it distinguishes between simple textual type names and complex object structures with nested elements. The SQL string parser uses a tokenizer that recognizes keywords, identifiers, parameters, and structural characters, then builds a parse tree using recursive descent.
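The tokenizer and recursive-descent approach described above can be illustrated with a minimal, self-contained sketch. This is not Paimon's code: the class name TypeStringSketch, the tiny grammar (INT, BIGINT, STRING, VARCHAR(n), ARRAY<T>, ROW<name T, ...>), and the string rendering of the parse tree are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch, not Paimon code: tokenizer plus recursive descent over a tiny grammar. */
final class TypeStringSketch {
    private final List<String> tokens;
    private int pos = 0;

    private TypeStringSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Splits input into words (keywords, identifiers, integers) and structural characters. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if ("<>(),".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
                out.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return out;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Parses a complete type string and returns a canonical rendering of the parse tree. */
    static String parse(String input) {
        TypeStringSketch parser = new TypeStringSketch(tokenize(input));
        String result = parser.parseType();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String token) {
        String actual = next();
        if (!actual.equals(token)) {
            throw new IllegalArgumentException("Expected '" + token + "' but found '" + actual + "'");
        }
    }

    // One method per grammar rule; nested types recurse back into parseType().
    private String parseType() {
        String keyword = next().toUpperCase(Locale.ROOT);
        switch (keyword) {
            case "INT":
            case "BIGINT":
            case "STRING":
                return keyword;
            case "VARCHAR":
                expect("(");
                int length = Integer.parseInt(next());
                expect(")");
                return "VARCHAR(" + length + ")";
            case "ARRAY":
                expect("<");
                String element = parseType();
                expect(">");
                return "ARRAY<" + element + ">";
            case "ROW":
                expect("<");
                List<String> fields = new ArrayList<>();
                while (true) {
                    String name = next();
                    fields.add(name + " " + parseType());
                    if (pos < tokens.size() && tokens.get(pos).equals(",")) {
                        pos++; // consume the separator and read another field
                    } else {
                        break;
                    }
                }
                expect(">");
                return "ROW<" + String.join(", ", fields) + ">";
            default:
                throw new IllegalArgumentException("Unsupported type: " + keyword);
        }
    }
}
```

Calling TypeStringSketch.parse on a nested string such as "row<id bigint, tags array<varchar(10)>>" walks the token list once, and each nested type is handled by a recursive call to parseType(). An unknown keyword is rejected with an IllegalArgumentException.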
The parser can also assign field IDs automatically when the JSON input lacks them. It accepts an AtomicInteger fieldId parameter that is incremented for every field without an explicit ID, and it rejects input that mixes explicit IDs with auto-assigned ones. This lets the same code path read both legacy schemas (without field IDs) and current schemas (with explicit IDs).
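The ID rule can be sketched in isolation. The example below is a simplified model, not Paimon's implementation: fields are plain maps, the class FieldIdSketch and method resolveIds are hypothetical, and two conventions are assumptions: a non-null counter means "auto-assign mode", and the counter is advanced with incrementAndGet().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not Paimon code: explicit versus auto-assigned field IDs. */
final class FieldIdSketch {

    /**
     * Returns one ID per field. Assumed convention: a null counter means every field
     * must carry an explicit "id"; a non-null counter means no field may carry one.
     */
    static List<Integer> resolveIds(List<? extends Map<String, ?>> fields, AtomicInteger counter) {
        List<Integer> ids = new ArrayList<>();
        for (Map<String, ?> field : fields) {
            Object explicit = field.get("id");
            if (explicit != null) {
                if (counter != null) {
                    // Explicit IDs and auto-assignment are mutually exclusive.
                    throw new IllegalStateException("Partial field id is not allowed");
                }
                ids.add((Integer) explicit);
            } else {
                if (counter == null) {
                    throw new IllegalStateException("Missing id for field " + field.get("name"));
                }
                // Assumption: pre-increment, so a counter starting at -1 yields 0, 1, 2, ...
                ids.add(counter.incrementAndGet());
            }
        }
        return ids;
    }
}
```

Under these assumptions, a caller that wants IDs to start at 0 seeds the counter with -1, and a caller reading a schema with explicit IDs passes no counter at all.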
Usage
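Use DataTypeJsonParser when deserializing table schemas from storage, parsing DDL type specifications, or implementing REST API endpoints that accept type definitions. All entry points are static and the parser keeps no state of its own, so it is safe to call from multiple threads. The only mutable state is the caller-supplied AtomicInteger, which should not be shared between unrelated schemas.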
Code Reference
Source Location
- Repository: Apache_Paimon
- File: paimon-api/src/main/java/org/apache/paimon/types/DataTypeJsonParser.java
- Lines: 39-687
Signature
public final class DataTypeJsonParser {
    public static DataField parseDataField(JsonNode json);
    public static DataType parseDataType(JsonNode json);
    public static DataType parseDataType(
        JsonNode json,
        AtomicInteger fieldId
    );
    public static DataType parseAtomicTypeSQLString(String string);
}
Import
import org.apache.paimon.types.DataTypeJsonParser;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
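| json | JsonNode | yes (for parseDataField and parseDataType) | JSON node containing the type or field definition |
| string | String | yes (for parseAtomicTypeSQLString) | SQL-style type string (e.g., "VARCHAR(100)") |
| fieldId | AtomicInteger | no | Counter used to auto-assign field IDs when the JSON omits them |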
Outputs
| Name | Type | Description |
|---|---|---|
| dataType | DataType | Parsed data type instance |
| dataField | DataField | Parsed field with ID, name, type, and metadata |
Usage Examples
Parsing SQL Type Strings
// Parse simple atomic types
DataType intType = DataTypeJsonParser.parseAtomicTypeSQLString("INT");
DataType varcharType = DataTypeJsonParser.parseAtomicTypeSQLString("VARCHAR(100)");
DataType decimalType = DataTypeJsonParser.parseAtomicTypeSQLString("DECIMAL(18,2)");
// Parse timestamp with local time zone
DataType timestampType = DataTypeJsonParser.parseAtomicTypeSQLString(
    "TIMESTAMP(3) WITH LOCAL TIME ZONE"
);
// Parse with NOT NULL constraint
DataType notNullType = DataTypeJsonParser.parseAtomicTypeSQLString(
    "VARCHAR(50) NOT NULL"
);
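To make the NOT NULL handling concrete, here is a small sketch of how a trailing constraint can be separated from the base type. It is a simplification and not the parser's own logic: the class NullabilitySketch is hypothetical, it works on the raw string instead of tokens, it expects exactly one space between NOT and NULL, and it assumes (as in standard SQL type strings) that a type is nullable unless marked NOT NULL.

```java
import java.util.Locale;

/** Hypothetical sketch, not Paimon code: splitting "VARCHAR(50) NOT NULL" into type and nullability. */
final class NullabilitySketch {
    private static final String SUFFIX = "NOT NULL";

    /** A type string is treated as nullable unless it ends with NOT NULL (case-insensitive). */
    static boolean isNullable(String typeString) {
        return !typeString.trim().toUpperCase(Locale.ROOT).endsWith(SUFFIX);
    }

    /** Returns the type string without a trailing NOT NULL constraint. */
    static String baseType(String typeString) {
        String trimmed = typeString.trim();
        if (isNullable(trimmed)) {
            return trimmed;
        }
        return trimmed.substring(0, trimmed.length() - SUFFIX.length()).trim();
    }
}
```

For the input "VARCHAR(50) NOT NULL", baseType yields "VARCHAR(50)" and isNullable yields false; an input without the suffix is returned unchanged and reported as nullable.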
Parsing JSON Type Definitions
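// Parse from JSON node (ObjectMapper and JsonNode are Jackson types; Paimon may use a shaded copy)
ObjectMapper mapper = new ObjectMapper();
// A simple type is a plain JSON string, not an object
JsonNode typeNode = mapper.readTree("\"VARCHAR(100)\"");
DataType parsedType = DataTypeJsonParser.parseDataType(typeNode);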
// Parse complex nested type
String complexJson = """
    {
      "type": "ROW",
      "fields": [
        {"id": 0, "name": "id", "type": "BIGINT"},
        {"id": 1, "name": "name", "type": "STRING"},
        {"id": 2, "name": "tags", "type": {"type": "ARRAY", "element": "STRING"}}
      ]
    }
    """;
JsonNode complexNode = mapper.readTree(complexJson);
DataType rowType = DataTypeJsonParser.parseDataType(complexNode);
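The JSON layout used in this example (plain strings for simple types, objects with nested elements for complex ones) suggests a recursive dispatch on the node shape. The sketch below models that dispatch without a JSON library: a node is either a String or a Map. The class JsonShapeSketch is hypothetical, the output is a rendered string, not a DataType, and the "key" and "value" entry names for MAP are assumptions; only "element" and "fields" appear in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Paimon code: dispatching on textual versus object-shaped type nodes. */
final class JsonShapeSketch {

    /** node is either a String (simple type) or a Map with a "type" entry (complex type). */
    static String describe(Object node) {
        if (node instanceof String) {
            return (String) node; // simple types are stored as their SQL-style string
        }
        if (node instanceof Map) {
            Map<?, ?> obj = (Map<?, ?>) node;
            String type = String.valueOf(obj.get("type"));
            if (type.startsWith("ARRAY")) {
                return "ARRAY<" + describe(obj.get("element")) + ">";
            }
            if (type.startsWith("MAP")) {
                // Assumption: MAP objects carry "key" and "value" entries.
                return "MAP<" + describe(obj.get("key")) + ", " + describe(obj.get("value")) + ">";
            }
            if (type.startsWith("ROW")) {
                List<String> parts = new ArrayList<>();
                for (Object f : (List<?>) obj.get("fields")) {
                    Map<?, ?> field = (Map<?, ?>) f;
                    parts.add(field.get("name") + " " + describe(field.get("type")));
                }
                return "ROW<" + String.join(", ", parts) + ">";
            }
        }
        throw new IllegalArgumentException("Cannot parse: " + node);
    }
}
```

Each complex branch recurses into its child nodes, so arbitrarily deep nesting is handled by the same method, and any node that is neither a string nor a recognized object is rejected.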
Parsing DataFields with Auto ID Assignment
// Parse field with explicit ID
String fieldJson = """
    {
      "id": 5,
      "name": "user_id",
      "type": "BIGINT",
      "description": "User identifier"
    }
    """;
JsonNode fieldNode = mapper.readTree(fieldJson);
DataField field = DataTypeJsonParser.parseDataField(fieldNode);
// Parse a ROW whose fields carry no IDs; the counter supplies them
AtomicInteger idCounter = new AtomicInteger(0);
String rowWithoutIds = """
    {
      "type": "ROW",
      "fields": [
        {"name": "email", "type": "STRING", "description": "Email address"}
      ]
    }
    """;
JsonNode rowNode = mapper.readTree(rowWithoutIds);
DataType rowWithIds = DataTypeJsonParser.parseDataType(rowNode, idCounter);
// Assuming the counter is advanced with incrementAndGet(), "email" receives ID 1
Parsing Nested Collection Types
// Note: the method name suggests atomic types only. These calls assume the parser
// also accepts nested SQL strings; if your version rejects them, use the JSON object form.
// Parse ARRAY type
DataType arrayType = DataTypeJsonParser.parseAtomicTypeSQLString(
    "ARRAY<STRING>"
);
// Parse MAP type
DataType mapType = DataTypeJsonParser.parseAtomicTypeSQLString(
    "MAP<STRING, INT>"
);
// Parse VECTOR type
DataType vectorType = DataTypeJsonParser.parseAtomicTypeSQLString(
    "VECTOR<FLOAT, 128>"
);
// Parse nested ROW type
String nestedRowSql = "ROW<id BIGINT, address ROW<street STRING, city STRING>>";
DataType nestedType = DataTypeJsonParser.parseAtomicTypeSQLString(nestedRowSql);
Error Handling
try {
    // Invalid type string
    DataType invalid = DataTypeJsonParser.parseAtomicTypeSQLString(
        "INVALID_TYPE(100)"
    );
} catch (IllegalArgumentException e) {
    // Handle parsing error
    System.err.println("Parse error: " + e.getMessage());
}
// Mixing explicit and auto-assigned field IDs is rejected
try {
    String mixedIdJson = """
        {
          "type": "ROW",
          "fields": [
            {"id": 0, "name": "id", "type": "INT"},
            {"name": "name", "type": "STRING"}
          ]
        }
        """;
    AtomicInteger counter = new AtomicInteger(-1);
    JsonNode node = mapper.readTree(mixedIdJson);
    // Expected to throw with a message like "Partial field id is not allowed"
    DataType type = DataTypeJsonParser.parseDataType(node, counter);
} catch (RuntimeException e) {
    // RuntimeException also covers IllegalStateException, whichever the parser throws
    System.err.println("Mixed ID error: " + e.getMessage());
}