

Implementation:Duckdb Duckdb Libpg Query Parser

From Leeroopedia


Knowledge Sources
Domains: SQL_Parsing, Third_Party
Last Updated: 2026-02-07 12:00 GMT

Overview

DuckDB embeds a fork of libpg_query, the PostgreSQL parser extracted as a standalone library, and compiles it as C++. The fork performs lexical and syntactic analysis of SQL statements and produces raw parse trees.

Description

The libpg_query integration gives DuckDB a mature SQL parser derived from the PostgreSQL codebase. The library lives in the duckdb_libpgquery namespace and is driven through a C++ wrapper class, duckdb::PostgresParser, which manages the parser lifecycle, memory allocation, and error reporting. Internally, a Bison-generated grammar and a Flex-based scanner convert SQL text into a linked list (PGList) of PGRawStmt parse tree nodes. Supporting infrastructure includes list manipulation functions (append, prepend, concatenate, copy, truncate) and node constructor functions (makeAExpr, makeRangeVar, makeFuncCall, and others) that build typed AST nodes. Memory comes from a custom arena allocator (palloc) backed by thread-local parser state: it allocates in 10 KB chunks and releases all of them together when the parser is cleaned up.
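The chunked arena strategy described above can be illustrated with a small standalone sketch. This is not DuckDB's palloc implementation: the class name ChunkArena, its methods, the 8-byte alignment, and the zero-initialization are all assumptions made for illustration. Only the 10 KB chunk size and the "free everything at once" behavior come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration of a chunked arena; not DuckDB's actual palloc.
class ChunkArena {
public:
    static constexpr std::size_t kChunkSize = 10240; // 10 KB, as described above

    // Returns zero-initialized memory that stays valid until Reset() or destruction.
    void *Allocate(std::size_t n) {
        // Round up so successive allocations stay 8-byte aligned (an assumption).
        std::size_t aligned = (n + 7) & ~static_cast<std::size_t>(7);
        if (chunks_.empty() || used_ + aligned > capacity_) {
            // Start a new chunk; oversized requests get a chunk of their own size.
            capacity_ = kChunkSize;
            if (aligned > capacity_) {
                capacity_ = aligned;
            }
            chunks_.emplace_back(new unsigned char[capacity_]());
            used_ = 0;
        }
        assert(used_ + aligned <= capacity_);
        void *result = chunks_.back().get() + used_;
        used_ += aligned;
        return result;
    }

    // Frees every chunk at once; individual allocations are never freed.
    void Reset() {
        chunks_.clear();
        used_ = 0;
        capacity_ = 0;
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t used_ = 0;     // bytes consumed in the newest chunk
    std::size_t capacity_ = 0; // size of the newest chunk
};
```

The appeal of this design for a parser is that AST nodes are created in large numbers and all die together, so a single bulk release avoids tracking each node individually.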

Usage

DuckDB uses this library whenever a SQL query string needs to be parsed. PostgresParser::Parse() is invoked during query preparation to convert raw SQL text into an internal parse tree. The static PostgresParser::Tokenize() method supports syntax highlighting and keyword detection, and the static PostgresParser::IsKeyword() and PostgresParser::KeywordList() methods support keyword introspection for auto-completion and SQL formatting.
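To make the tokenization use case concrete, here is a toy standalone tokenizer that emits (type, start offset) pairs suitable for a syntax highlighter. It is only a sketch: SketchToken, SketchTokenType, SketchTokenize, and IsSketchKeyword are hypothetical names, the four-word keyword set is a placeholder, and the real scanner is the Flex-based one inside the library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for illustration; not the library's token types.
enum class SketchTokenType { Identifier, Keyword, NumericConstant, Operator };

struct SketchToken {
    SketchTokenType type;
    std::size_t start; // byte offset of the token's first character
};

// Case-insensitive lookup against a tiny placeholder keyword set.
inline bool IsSketchKeyword(std::string word) {
    static const std::unordered_set<std::string> keywords = {"select", "from", "where", "as"};
    for (char &c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return keywords.count(word) > 0;
}

inline std::vector<SketchToken> SketchTokenize(const std::string &query) {
    std::vector<SketchToken> tokens;
    std::size_t i = 0;
    while (i < query.size()) {
        unsigned char c = static_cast<unsigned char>(query[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Consume a word, then decide whether it is a keyword or an identifier.
            while (i < query.size() &&
                   (std::isalnum(static_cast<unsigned char>(query[i])) || query[i] == '_')) {
                ++i;
            }
            bool keyword = IsSketchKeyword(query.substr(start, i - start));
            tokens.push_back({keyword ? SketchTokenType::Keyword : SketchTokenType::Identifier, start});
        } else if (std::isdigit(c)) {
            // Consume an unsigned integer literal.
            while (i < query.size() && std::isdigit(static_cast<unsigned char>(query[i]))) {
                ++i;
            }
            tokens.push_back({SketchTokenType::NumericConstant, start});
        } else {
            // Treat any other single character as an operator.
            ++i;
            tokens.push_back({SketchTokenType::Operator, start});
        }
    }
    return tokens;
}
```

A highlighter would color the span from each token's start up to the next token's start according to its type.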

Code Reference

Source Location

Signature

// C++ wrapper class (postgres_parser.hpp)
namespace duckdb {
class PostgresParser {
public:
    PostgresParser();
    ~PostgresParser();

    bool success;
    duckdb_libpgquery::PGList *parse_tree;
    std::string error_message;
    int error_location;

    void Parse(const std::string &query);
    static vector<duckdb_libpgquery::PGSimplifiedToken> Tokenize(const std::string &query);
    static duckdb_libpgquery::PGKeywordCategory IsKeyword(const std::string &text);
    static vector<duckdb_libpgquery::PGKeyword> KeywordList();
    static void SetPreserveIdentifierCase(bool preserve);
};
}

// Core parser function (src_backend_parser_parser.cpp)
namespace duckdb_libpgquery {
PGList *raw_parser(const char *str);
PGKeywordCategory is_keyword(const char *text);
std::vector<PGKeyword> keyword_list();
}

// List operations (src_backend_nodes_list.cpp)
namespace duckdb_libpgquery {
PGList *lappend(PGList *list, void *datum);
PGList *lcons(void *datum, PGList *list);
PGList *list_concat(PGList *list1, PGList *list2);
void   *list_nth(const PGList *list, int n);
PGList *list_copy(const PGList *oldlist);
PGList *list_copy_tail(const PGList *oldlist, int nskip);
PGList *list_truncate(PGList *list, int new_size);
void    list_free(PGList *list);
}

// Node constructors (src_backend_nodes_makefuncs.cpp)
namespace duckdb_libpgquery {
PGAExpr      *makeAExpr(PGAExpr_Kind kind, PGList *name, PGNode *lexpr, PGNode *rexpr, int location);
PGAExpr      *makeSimpleAExpr(PGAExpr_Kind kind, const char *name, PGNode *lexpr, PGNode *rexpr, int location);
PGExpr       *makeBoolExpr(PGBoolExprType boolop, PGList *args, int location);
PGAlias      *makeAlias(const char *aliasname, PGList *colnames);
PGRangeVar   *makeRangeVar(char *schemaname, char *relname, int location);
PGTypeName   *makeTypeName(char *typnam);
PGTypeName   *makeTypeNameFromNameList(PGList *names);
PGDefElem    *makeDefElem(const char *name, PGNode *arg, int location);
PGFuncCall   *makeFuncCall(PGList *name, PGList *args, int location);
PGGroupingSet *makeGroupingSet(GroupingSetKind kind, PGList *content, int location);
}

// Memory management (pg_functions.cpp)
namespace duckdb_libpgquery {
void *palloc(size_t n);
void  pg_parser_init();
void  pg_parser_cleanup();
}
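The list functions above follow the PostgreSQL convention that a null pointer represents the empty list and that each mutating call returns the (possibly newly created) list. The following standalone sketch mimics that convention for append, prepend, and indexed access. SketchList, SketchCell, and the Sketch* functions are hypothetical; the real PGList layout is defined in nodes/pg_list.hpp, and the real functions allocate through palloc rather than new.

```cpp
#include <cassert>

// Hypothetical simplified cell and list types, for illustration only.
struct SketchCell {
    void *datum;
    SketchCell *next;
};

struct SketchList {
    int length = 0;
    SketchCell *head = nullptr;
    SketchCell *tail = nullptr;
};

// Appends in O(1) via the tail pointer; a null list is treated as empty.
inline SketchList *SketchAppend(SketchList *list, void *datum) {
    if (list == nullptr) {
        list = new SketchList();
    }
    SketchCell *cell = new SketchCell{datum, nullptr};
    if (list->tail != nullptr) {
        list->tail->next = cell;
    } else {
        list->head = cell;
    }
    list->tail = cell;
    ++list->length;
    return list;
}

// Prepends in O(1); mirrors the (datum, list) argument order of lcons.
inline SketchList *SketchCons(void *datum, SketchList *list) {
    if (list == nullptr) {
        list = new SketchList();
    }
    SketchCell *cell = new SketchCell{datum, list->head};
    list->head = cell;
    if (list->tail == nullptr) {
        list->tail = cell;
    }
    ++list->length;
    return list;
}

// Returns the n-th datum (zero-based) by walking from the head.
inline void *SketchNth(const SketchList *list, int n) {
    assert(list != nullptr && n >= 0 && n < list->length);
    const SketchCell *cell = list->head;
    while (n-- > 0) {
        cell = cell->next;
    }
    return cell->datum;
}

// Frees the cells and the list header, but not the data they point to.
inline void SketchFree(SketchList *list) {
    if (list == nullptr) {
        return;
    }
    SketchCell *cell = list->head;
    while (cell != nullptr) {
        SketchCell *next = cell->next;
        delete cell;
        cell = next;
    }
    delete list;
}
```

Callers always reassign the result, as in `list = SketchAppend(list, item);`, because the first append to an empty list creates the header.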

Import

#include "postgres_parser.hpp"

// Internal headers used by the library itself:
#include "pg_functions.hpp"
#include "parser/parser.hpp"
#include "nodes/pg_list.hpp"
#include "nodes/makefuncs.hpp"

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| query | const std::string & | Yes | The SQL query string to parse or tokenize |
| text | const std::string & | No | A text token to check against the keyword list (for IsKeyword) |
| preserve | bool | No | Whether to preserve identifier case during parsing (for SetPreserveIdentifierCase) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| parse_tree | duckdb_libpgquery::PGList * | Linked list of PGRawStmt nodes representing the parsed SQL; nullptr on failure |
| success | bool | Indicates whether parsing completed without errors |
| error_message | std::string | Human-readable error description if parsing failed |
| error_location | int | Character offset in the query string where the error was detected |
| tokens | vector<PGSimplifiedToken> | Sequence of token descriptors returned by Tokenize() |
| keyword_category | PGKeywordCategory | Category of the keyword (PG_KEYWORD_NONE if not a keyword) |
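One way a caller might consume the success, error_message, and error_location outputs is to render the failing query with a caret under the reported position. The sketch below is standalone and hypothetical: SketchParseResult and DescribeResult are invented names, and it assumes error_location is a zero-based character offset, which should be checked against the actual parser before relying on it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of the error-related output fields; not a DuckDB type.
struct SketchParseResult {
    bool success;
    std::string error_message;
    int error_location; // assumed to be a zero-based character offset
};

// Builds a three-line report: message, query text, and a caret marker.
inline std::string DescribeResult(const std::string &query, const SketchParseResult &result) {
    if (result.success) {
        return "ok";
    }
    std::string out = "Parse error: " + result.error_message + "\n" + query + "\n";
    // Clamp the offset so a negative or out-of-range location cannot misplace the caret.
    std::size_t offset = 0;
    if (result.error_location > 0) {
        offset = static_cast<std::size_t>(result.error_location);
        if (offset > query.size()) {
            offset = query.size();
        }
    }
    out += std::string(offset, ' ') + "^";
    return out;
}
```

Clamping matters because a parser may report no usable location at all, and the report should still be printable in that case.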

Usage Examples

// Parsing a SQL query
#include <iostream>

#include "postgres_parser.hpp"

int main() {
    duckdb::PostgresParser parser;
    parser.Parse("SELECT 1 + 2 AS result;");

    if (parser.success) {
        // Walk parser.parse_tree (a PGList of PGRawStmt nodes)
        duckdb_libpgquery::PGList *stmts = parser.parse_tree;
        // ... process each statement node
        (void)stmts;
    } else {
        // Report the failure together with the position the parser flagged
        std::cerr << "Parse error at position " << parser.error_location
                  << ": " << parser.error_message << '\n';
    }

    // Tokenizing a query for syntax highlighting (static method, no instance needed)
    auto tokens = duckdb::PostgresParser::Tokenize("SELECT * FROM tbl WHERE id = 42");
    for (const auto &tok : tokens) {
        // tok.start, tok.type describe each token
        (void)tok;
    }

    // Checking if a word is a SQL keyword
    auto category = duckdb::PostgresParser::IsKeyword("SELECT");
    // category == PG_KEYWORD_RESERVED
    (void)category;
    return 0;
}
