Implementation:Duckdb Duckdb Libpg Query Parser
| Knowledge Sources | |
|---|---|
| Domains | SQL_Parsing, Third_Party |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
DuckDB embeds a forked version of libpg_query, the PostgreSQL parser extracted as a standalone C/C++ library, to perform lexical and grammatical analysis of SQL statements and produce raw parse trees.
Description
The libpg_query integration provides DuckDB with a battle-tested SQL parser derived from the PostgreSQL codebase. The library operates within the duckdb_libpgquery namespace and exposes a C++ wrapper class (duckdb::PostgresParser) that manages parser lifecycle, memory allocation, and error reporting. Internally, the parser uses a Bison-generated grammar and Flex-based scanner to convert SQL text into a linked list of PGRawStmt parse tree nodes. Supporting infrastructure includes list manipulation functions (append, concatenate, copy, truncate) and node constructor functions (makeAExpr, makeRangeVar, makeFuncCall, etc.) that build typed AST nodes. Memory is managed through a custom arena allocator (palloc) backed by thread-local parser state, which allocates in 10 KB chunks and frees all memory when the parser is cleaned up.
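The chunked arena strategy described above can be illustrated with a minimal, self-contained sketch. It is not the code in pg_functions.cpp: the class name SketchArena, its methods, and the 8-byte alignment are assumptions made for illustration, and only the 10 KB chunk size and the "free everything at cleanup" behaviour are taken from the description.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for a parser arena; all names here are illustrative.
class SketchArena {
public:
    static constexpr std::size_t kChunkSize = 10240; // 10 KB, as stated in the description

    SketchArena() = default;
    SketchArena(const SketchArena &) = delete;
    SketchArena &operator=(const SketchArena &) = delete;
    ~SketchArena() { Reset(); }

    // Bump-allocate n bytes, rounded up to an 8-byte boundary (an assumption of this sketch).
    void *Allocate(std::size_t n) {
        std::size_t aligned = (n + 7) & ~static_cast<std::size_t>(7);
        if (chunks_.empty() || used_ + aligned > capacity_) {
            // Start a new chunk; an oversized request gets a chunk of its own size.
            std::size_t size = kChunkSize;
            if (aligned > size) {
                size = aligned;
            }
            void *chunk = std::malloc(size);
            if (!chunk) {
                throw std::bad_alloc();
            }
            chunks_.push_back(static_cast<char *>(chunk));
            capacity_ = size;
            used_ = 0;
        }
        void *result = chunks_.back() + used_;
        used_ += aligned;
        return result;
    }

    // Release every chunk at once; individual allocations are never freed.
    void Reset() {
        for (char *chunk : chunks_) {
            std::free(chunk);
        }
        chunks_.clear();
        used_ = 0;
        capacity_ = 0;
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    std::vector<char *> chunks_; // every chunk handed out so far
    std::size_t used_ = 0;       // bytes consumed in the newest chunk
    std::size_t capacity_ = 0;   // size of the newest chunk
};
```

The design trade-off is the usual one for arenas: allocation is a pointer bump and cleanup is a single pass over the chunks, at the cost of holding all parse-tree memory until the parser is torn down.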
Usage
DuckDB uses this library whenever a SQL query string needs to be parsed. The PostgresParser::Parse() method is invoked during query preparation to convert raw SQL text into an internal parse tree. The PostgresParser::Tokenize() method is used for syntax highlighting and keyword detection. The PostgresParser::IsKeyword() and PostgresParser::KeywordList() static methods support keyword introspection for auto-completion and SQL formatting.
Code Reference
Source Location
- Repository: Duckdb_Duckdb
- Files:
- third_party/libpg_query/postgres_parser.cpp -- C++ PostgresParser entry point (51 lines)
- third_party/libpg_query/src_backend_parser_parser.cpp -- main raw_parser function (289 lines)
- third_party/libpg_query/pg_functions.cpp -- parser utility functions and arena allocator (270 lines)
- third_party/libpg_query/src_backend_nodes_list.cpp -- linked list node operations (540 lines)
- third_party/libpg_query/src_backend_nodes_makefuncs.cpp -- AST node constructor functions (305 lines)
Signature
// C++ wrapper class (postgres_parser.hpp)
namespace duckdb {
class PostgresParser {
public:
PostgresParser();
~PostgresParser();
bool success;
duckdb_libpgquery::PGList *parse_tree;
std::string error_message;
int error_location;
void Parse(const std::string &query);
static vector<duckdb_libpgquery::PGSimplifiedToken> Tokenize(const std::string &query);
static duckdb_libpgquery::PGKeywordCategory IsKeyword(const std::string &text);
static vector<duckdb_libpgquery::PGKeyword> KeywordList();
static void SetPreserveIdentifierCase(bool preserve);
};
}
// Core parser function (src_backend_parser_parser.cpp)
namespace duckdb_libpgquery {
PGList *raw_parser(const char *str);
PGKeywordCategory is_keyword(const char *text);
std::vector<PGKeyword> keyword_list();
}
// List operations (src_backend_nodes_list.cpp)
namespace duckdb_libpgquery {
PGList *lappend(PGList *list, void *datum);
PGList *lcons(void *datum, PGList *list);
PGList *list_concat(PGList *list1, PGList *list2);
void *list_nth(const PGList *list, int n);
PGList *list_copy(const PGList *oldlist);
PGList *list_copy_tail(const PGList *oldlist, int nskip);
PGList *list_truncate(PGList *list, int new_size);
void list_free(PGList *list);
}
// Node constructors (src_backend_nodes_makefuncs.cpp)
namespace duckdb_libpgquery {
PGAExpr *makeAExpr(PGAExpr_Kind kind, PGList *name, PGNode *lexpr, PGNode *rexpr, int location);
PGAExpr *makeSimpleAExpr(PGAExpr_Kind kind, const char *name, PGNode *lexpr, PGNode *rexpr, int location);
PGExpr *makeBoolExpr(PGBoolExprType boolop, PGList *args, int location);
PGAlias *makeAlias(const char *aliasname, PGList *colnames);
PGRangeVar *makeRangeVar(char *schemaname, char *relname, int location);
PGTypeName *makeTypeName(char *typnam);
PGTypeName *makeTypeNameFromNameList(PGList *names);
PGDefElem *makeDefElem(const char *name, PGNode *arg, int location);
PGFuncCall *makeFuncCall(PGList *name, PGList *args, int location);
PGGroupingSet *makeGroupingSet(GroupingSetKind kind, PGList *content, int location);
}
// Memory management (pg_functions.cpp)
namespace duckdb_libpgquery {
void *palloc(size_t n);
void pg_parser_init();
void pg_parser_cleanup();
}
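To make the calling convention of the list operations listed above more concrete, here is a simplified sketch of an append/prepend/index trio over a singly linked list. The types SketchCell and SketchList and the sketch_ functions are hypothetical; they mirror only the shape of the signatures above (the list pointer is passed in and returned, and a null pointer stands for the empty list) and use new/delete instead of the arena.

```cpp
#include <cstddef>

// Hypothetical, simplified cons-cell list; not the library's PGList definition.
struct SketchCell {
    void *value;
    SketchCell *next;
};

struct SketchList {
    int length = 0;
    SketchCell *head = nullptr;
    SketchCell *tail = nullptr;
};

// Append datum at the tail. A null list is created on first use.
SketchList *sketch_lappend(SketchList *list, void *datum) {
    if (!list) {
        list = new SketchList();
    }
    SketchCell *cell = new SketchCell{datum, nullptr};
    if (list->tail) {
        list->tail->next = cell;
    } else {
        list->head = cell;
    }
    list->tail = cell;
    list->length++;
    return list;
}

// Prepend datum at the head. A null list is created on first use.
SketchList *sketch_lcons(void *datum, SketchList *list) {
    if (!list) {
        list = new SketchList();
    }
    SketchCell *cell = new SketchCell{datum, list->head};
    list->head = cell;
    if (!list->tail) {
        list->tail = cell;
    }
    list->length++;
    return list;
}

// Return the n-th element (0-based); this sketch returns nullptr when out of range.
void *sketch_list_nth(const SketchList *list, int n) {
    if (!list || n < 0 || n >= list->length) {
        return nullptr;
    }
    const SketchCell *cell = list->head;
    while (n-- > 0) {
        cell = cell->next;
    }
    return cell->value;
}

// Free the cells and the list header, but not the pointed-to values.
void sketch_list_free(SketchList *list) {
    if (!list) {
        return;
    }
    SketchCell *cell = list->head;
    while (cell) {
        SketchCell *next = cell->next;
        delete cell;
        cell = next;
    }
    delete list;
}
```

Because every mutating function returns the list pointer, callers are expected to write `list = sketch_lappend(list, x);` rather than ignore the return value; the same habit applies to the lappend and lcons signatures shown above.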
Import
#include "postgres_parser.hpp"
// Internal headers used by the library itself:
#include "pg_functions.hpp"
#include "parser/parser.hpp"
#include "nodes/pg_list.hpp"
#include "nodes/makefuncs.hpp"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query | const std::string & | Yes | The SQL query string to parse or tokenize |
| text | const std::string & | No | A text token to check against the keyword list (for IsKeyword) |
| preserve | bool | No | Whether to preserve identifier case during parsing (for SetPreserveIdentifierCase) |
Outputs
| Name | Type | Description |
|---|---|---|
| parse_tree | duckdb_libpgquery::PGList * | Linked list of PGRawStmt nodes representing the parsed SQL; nullptr on failure |
| success | bool | Indicates whether parsing completed without errors |
| error_message | std::string | Human-readable error description if parsing failed |
| error_location | int | Character offset in the query string where the error was detected |
| tokens | vector<PGSimplifiedToken> | Sequence of token descriptors returned by Tokenize() |
| keyword_category | PGKeywordCategory | Category of the keyword (PG_KEYWORD_NONE if not a keyword) |
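Since error_location is a character offset, callers that want to show a line and column usually have to translate it themselves. The helper below is a hypothetical sketch of that translation, not part of the library. It assumes the offset is 0-based and counts bytes; if the value turns out to be 1-based in a given DuckDB version, subtract one before calling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: 1-based line and column for display purposes.
struct SketchPosition {
    int line;
    int column;
};

// Translate a 0-based byte offset (an assumption of this sketch) into line/column.
// Offsets past the end of the query are clamped to the end.
SketchPosition LocateOffset(const std::string &query, int offset) {
    SketchPosition pos{1, 1};
    if (offset < 0) {
        return pos;
    }
    std::size_t limit = static_cast<std::size_t>(offset);
    if (limit > query.size()) {
        limit = query.size();
    }
    for (std::size_t i = 0; i < limit; ++i) {
        if (query[i] == '\n') {
            pos.line++;
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    return pos;
}
```

A caller could pass the original query text together with parser.error_location and print the resulting line and column next to error_message.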
Usage Examples
// Parsing a SQL query
#include <iostream> // for std::cerr in the error branch
#include "postgres_parser.hpp"
duckdb::PostgresParser parser;
parser.Parse("SELECT 1 + 2 AS result;");
if (parser.success) {
// Walk parser.parse_tree (a PGList of PGRawStmt nodes)
duckdb_libpgquery::PGList *stmts = parser.parse_tree;
// ... process each statement node
} else {
// Handle error
std::cerr << "Parse error at position " << parser.error_location
<< ": " << parser.error_message << std::endl;
}
// Tokenizing a query for syntax highlighting
auto tokens = duckdb::PostgresParser::Tokenize("SELECT * FROM tbl WHERE id = 42");
for (auto &tok : tokens) {
// tok.start, tok.type describe each token
}
// Checking if a word is a SQL keyword
auto category = duckdb::PostgresParser::IsKeyword("SELECT");
// category == PG_KEYWORD_RESERVED