
Implementation:ClickHouse ClickHouse Pdjson Tokenizer

From Leeroopedia
Revision as of 14:37, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/ClickHouse_ClickHouse_Pdjson_Tokenizer.md)


Knowledge Sources
Domains JSON, Parsing
Last Updated 2026-02-08 00:00 GMT

Overview

A streaming JSON tokenizer that parses JSON incrementally without loading the entire document into memory.

Description

pdjson (Public Domain JSON parser) is a streaming JSON parser written in C and vendored into Poco. It reads JSON from a memory buffer, a NUL-terminated string, a `FILE *` stream, or user-supplied I/O callbacks, and emits one token per JSON element as it is encountered. A stack tracks the nesting of objects and arrays, so the JSON structure is validated during parsing.
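
The stack-based nesting check can be illustrated with a small self-contained sketch. This is not pdjson's code: `nest_stack`, `nest_push`, `nest_pop`, `brackets_balanced`, and the fixed `MAX_DEPTH` limit are all hypothetical names chosen for illustration, and the sketch only checks bracket order, not full JSON grammar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64  /* assumption: fixed limit, for the sketch only */

/* Hypothetical nesting stack: one entry per open '{' or '['. */
typedef struct {
    char open[MAX_DEPTH];
    size_t depth;
} nest_stack;

static bool nest_push(nest_stack *s, char opener)
{
    if (s->depth == MAX_DEPTH)
        return false;                 /* nested too deeply */
    s->open[s->depth++] = opener;
    return true;
}

/* Pop succeeds only if the closer matches the most recent opener. */
static bool nest_pop(nest_stack *s, char closer)
{
    char expected = (closer == '}') ? '{' : '[';
    if (s->depth == 0 || s->open[s->depth - 1] != expected)
        return false;
    s->depth--;
    return true;
}

/* True if every '{' and '[' outside string literals is closed in order. */
static bool brackets_balanced(const char *text)
{
    assert(text != NULL);
    nest_stack s = { .depth = 0 };
    bool in_string = false;
    for (const char *p = text; *p != '\0'; p++) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0')
                p++;                  /* skip the escaped character */
            else if (*p == '"')
                in_string = false;
        } else if (*p == '"') {
            in_string = true;
        } else if (*p == '{' || *p == '[') {
            if (!nest_push(&s, *p))
                return false;
        } else if (*p == '}' || *p == ']') {
            if (!nest_pop(&s, *p))
                return false;
        }
    }
    return s.depth == 0 && !in_string;
}
```

A real tokenizer does the same bookkeeping while emitting tokens, which is what lets it report a mismatched `}` or `]` at the point where it occurs rather than at the end of the input.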

Key features include support for UTF-8 encoding, Unicode escape sequences (including surrogate pairs), streaming mode for multiple JSON documents, and customizable memory allocators. The parser is designed to be lightweight and portable.
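
To make the surrogate-pair handling concrete, here is a minimal sketch of how two `\uXXXX` escapes can be combined into one code point and then encoded as UTF-8. It follows the standard UTF-16 and UTF-8 rules rather than pdjson's internals; `combine_surrogates` and `encode_utf8` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair (as written in "\uD83D\uDE00") into a
   single code point. Returns 0 if the values are not a high/low pair. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Encode a code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                     /* lone surrogate is not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                         /* beyond U+10FFFF */
}
```

For example, the escape sequence `\uD83D\uDE00` combines to U+1F600, which encodes to the four bytes `F0 9F 98 80`.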

Usage

ClickHouse uses this streaming JSON parser to process large JSON documents or streams that would be impractical to load into memory in full. Typical cases are parsing JSON logs, processing API responses, and handling JSON data in ingestion pipelines.

Code Reference

Source Location

Signature

enum json_type {
    JSON_ERROR,
    JSON_DONE,
    JSON_OBJECT,
    JSON_OBJECT_END,
    JSON_ARRAY,
    JSON_ARRAY_END,
    JSON_STRING,
    JSON_NUMBER,
    JSON_TRUE,
    JSON_FALSE,
    JSON_NULL
};

typedef struct json_stream json_stream;
typedef int (*json_user_io)(void *user);

void json_open_buffer(json_stream *json, const void *buffer, size_t size);
void json_open_string(json_stream *json, const char *string);
void json_open_stream(json_stream *json, FILE *stream);
void json_open_user(json_stream *json, json_user_io get, json_user_io peek, void *user);

enum json_type json_next(json_stream *json);
enum json_type json_peek(json_stream *json);
void json_reset(json_stream *json);
void json_close(json_stream *json);

const char *json_get_string(json_stream *json, size_t *length);
double json_get_number(json_stream *json);
const char *json_get_error(json_stream *json);
size_t json_get_lineno(json_stream *json);
size_t json_get_position(json_stream *json);
size_t json_get_depth(json_stream *json);

void json_set_allocator(json_stream *json, json_allocator *a);
void json_set_streaming(json_stream *json, bool streaming);
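
As an illustration of the custom I/O path, the sketch below defines a pair of callbacks with the same shape as `json_user_io`. It assumes the callbacks return the next byte as an `int`, or `EOF` once the input is exhausted (mirroring `fgetc`); that contract is an assumption here, not something stated on this page. `mem_source`, `mem_get`, `mem_peek`, and the locally declared `user_io_fn` are hypothetical names so that the sketch compiles without `pdjson.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Same shape as the json_user_io typedef; redeclared locally so the
   sketch builds without pdjson.h. */
typedef int (*user_io_fn)(void *user);

/* Hypothetical in-memory source: a byte range plus a read cursor. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
} mem_source;

/* "peek": return the next byte without consuming it, or EOF at the end. */
static int mem_peek(void *user)
{
    mem_source *src = user;
    assert(src != NULL);
    return src->pos < src->size ? src->data[src->pos] : EOF;
}

/* "get": return the next byte and advance the cursor, or EOF at the end. */
static int mem_get(void *user)
{
    mem_source *src = user;
    assert(src != NULL);
    return src->pos < src->size ? src->data[src->pos++] : EOF;
}
```

With the real header available, these would be passed as `json_open_user(&json, mem_get, mem_peek, &src);`, with `&src` handed back to each callback as its `user` argument.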

Import

#include "pdjson.h"

I/O Contract

| Input | Output |
|---|---|
| JSON text from buffer, string, stream, or custom I/O | Stream of JSON tokens (`json_type` values) |
| Token query via `json_next` or `json_peek` | Current token type and associated data |
| Error state query | Error message with line number |

JSON Token Types

| Token Type | Description |
|---|---|
| `JSON_OBJECT` | Start of JSON object (`{`) |
| `JSON_OBJECT_END` | End of JSON object (`}`) |
| `JSON_ARRAY` | Start of JSON array (`[`) |
| `JSON_ARRAY_END` | End of JSON array (`]`) |
| `JSON_STRING` | String value (retrieve with `json_get_string`) |
| `JSON_NUMBER` | Numeric value (retrieve with `json_get_number`) |
| `JSON_TRUE` | Boolean true literal |
| `JSON_FALSE` | Boolean false literal |
| `JSON_NULL` | Null literal |
| `JSON_DONE` | Parsing complete |
| `JSON_ERROR` | Parse error occurred |
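
When debugging a token stream it can help to print token names rather than raw enum values. The following is a small sketch of such helpers; `json_type_name` and `json_type_closer` are hypothetical and not part of pdjson, and the enum is copied from the Signature section only so the sketch compiles on its own (remove the local copy when including the real header).

```c
#include <assert.h>

/* Local copy of the enum from the Signature section, for a standalone build. */
enum json_type {
    JSON_ERROR, JSON_DONE,
    JSON_OBJECT, JSON_OBJECT_END,
    JSON_ARRAY, JSON_ARRAY_END,
    JSON_STRING, JSON_NUMBER,
    JSON_TRUE, JSON_FALSE, JSON_NULL
};

/* Hypothetical helper: printable name for a token type. */
static const char *json_type_name(enum json_type type)
{
    switch (type) {
    case JSON_ERROR:      return "JSON_ERROR";
    case JSON_DONE:       return "JSON_DONE";
    case JSON_OBJECT:     return "JSON_OBJECT";
    case JSON_OBJECT_END: return "JSON_OBJECT_END";
    case JSON_ARRAY:      return "JSON_ARRAY";
    case JSON_ARRAY_END:  return "JSON_ARRAY_END";
    case JSON_STRING:     return "JSON_STRING";
    case JSON_NUMBER:     return "JSON_NUMBER";
    case JSON_TRUE:       return "JSON_TRUE";
    case JSON_FALSE:      return "JSON_FALSE";
    case JSON_NULL:       return "JSON_NULL";
    }
    return "UNKNOWN";
}

/* Hypothetical helper: the end token that matches an opening token. */
static enum json_type json_type_closer(enum json_type open)
{
    assert(open == JSON_OBJECT || open == JSON_ARRAY);
    return open == JSON_OBJECT ? JSON_OBJECT_END : JSON_ARRAY_END;
}
```

Using a `switch` over the named constants keeps the helper independent of the numeric values the header assigns to each token.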

Usage Examples

#include <stdio.h>
#include "pdjson.h"

int main(void)
{
    // Parse JSON from a NUL-terminated string
    json_stream json;
    const char *input = "{\"name\":\"test\",\"value\":42}";
    json_open_string(&json, input);

    enum json_type type;
    while ((type = json_next(&json)) != JSON_DONE) {
        if (type == JSON_ERROR) {
            fprintf(stderr, "Error: %s\n", json_get_error(&json));
            break;
        }

        switch (type) {
            case JSON_OBJECT:
                printf("Start object\n");
                break;
            case JSON_STRING: {
                // Object keys and string values both arrive as JSON_STRING
                size_t len;
                const char *str = json_get_string(&json, &len);
                printf("String: %.*s\n", (int)len, str);
                break;
            }
            case JSON_NUMBER:
                printf("Number: %g\n", json_get_number(&json));
                break;
            default:
                // Handle other types...
                break;
        }
    }

    json_close(&json);

    // Parse from file stream
    FILE *fp = fopen("data.json", "r");
    if (fp == NULL) {
        perror("data.json");
        return 1;
    }
    json_open_stream(&json, fp);
    // ... process tokens ...
    json_close(&json);  // the FILE remains the caller's to close
    fclose(fp);
    return 0;
}

Related Pages
