Implementation:Neuml Txtai DuckDB Database
| Knowledge Sources | |
|---|---|
| Domains | Database, SQL |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete txtai database backend that stores content in an embedded DuckDB database.
Description
DuckDB is a concrete database backend built on the DuckDB embedded analytical database engine. It extends Embedded (which extends RDBMS). It implements DuckDB-specific connection management, JSON attribute extraction via json_extract_string(), and batched row fetching.

DuckDB does not support named SQL parameters. The formatargs() method therefore rewrites named parameters (e.g., :param) into positional ? placeholders and turns the parameter dictionary into a list. The class also works around DuckDB's lack of INSERT OR REPLACE support: before inserting a document or object, it explicitly deletes any existing row with the same id.

When the database is saved to a new path, the copy() method exports the tables to Parquet and creates a fresh schema in the target database. It then imports the data via COPY ... FROM ... (FORMAT parquet). Row iteration calls fetchmany(256) in a loop, so results are streamed in batches rather than loaded all at once. Each connection runs inside an explicit transaction started with connection.begin().
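The JSON attribute extraction described above can be illustrated with a small string builder. json_column() is a hypothetical helper, not txtai's jsoncolumn(). The sketch assumes that document attributes live in a column named data and are addressed with a $.name JSON path. The identifier check is an extra guard added only for this sketch.

```python
# Hypothetical helper (not txtai's jsoncolumn()): build a DuckDB expression
# that extracts a top-level JSON attribute as text. The "data" column name
# and the "$.name" path syntax are assumptions made for this sketch.
def json_column(name: str, column: str = "data") -> str:
    # Guard added for the sketch: reject names that could break the quoted path
    if not name.isidentifier():
        raise ValueError(f"unsupported attribute name: {name!r}")
    return f"json_extract_string({column}, '$.{name}')"


select = f"SELECT id, {json_column('text')} AS text FROM documents"
print(select)  # SELECT id, json_extract_string(data, '$.text') AS text FROM documents
```

Because json_extract_string() returns text, the resulting expression can be used in SELECT lists and WHERE clauses like any other string column.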
Usage
Use DuckDB as the database backend when you need fast analytical queries over embedded content. Enable it by setting content: "duckdb" in the database configuration. The backend requires the duckdb Python package, which is installed with the "database" extra (pip install txtai[database]).
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/database/duckdb.py (lines 1-150)
Signature
```python
class DuckDB(Embedded):
    # Class constants
    DELETE_DOCUMENT = "DELETE FROM documents WHERE id = ?"
    DELETE_OBJECT = "DELETE FROM objects WHERE id = ?"

    def __init__(self, config): ...
    def execute(self, function, *args): ...
    def insertdocument(self, uid, data, tags, entry): ...
    def insertobject(self, uid, data, tags, entry): ...
    def connect(self, path=":memory:"): ...
    def getcursor(self): ...
    def jsonprefix(self): ...
    def jsoncolumn(self, name): ...
    def rows(self): ...
    def addfunctions(self): ...
    def copy(self, path): ...
    def formatargs(self, args): ...
```
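The DELETE_DOCUMENT and DELETE_OBJECT constants support a delete-then-insert pattern that replaces INSERT OR REPLACE. The sketch below runs that pattern with Python's built-in sqlite3 module as a stand-in engine, so it works without DuckDB installed. SQLite itself supports INSERT OR REPLACE; it is used here only to execute the two statements. The two-column documents table and the upsert_document() helper are hypothetical simplifications.

```python
import json
import sqlite3

# sqlite3 stands in for DuckDB so the sketch runs with the standard library.
# The two-column table below is a simplified, hypothetical schema.
DELETE_DOCUMENT = "DELETE FROM documents WHERE id = ?"
INSERT_DOCUMENT = "INSERT INTO documents (id, data) VALUES (?, ?)"


def upsert_document(cursor, uid, data):
    # Emulate INSERT OR REPLACE: remove any existing row, then insert the new one
    cursor.execute(DELETE_DOCUMENT, [uid])
    cursor.execute(INSERT_DOCUMENT, [uid, json.dumps(data)])


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
# No primary key: without the delete step, the second insert would add a duplicate
cursor.execute("CREATE TABLE documents (id TEXT, data TEXT)")

upsert_document(cursor, "doc1", {"text": "first version"})
upsert_document(cursor, "doc1", {"text": "second version"})
connection.commit()

rows = cursor.execute("SELECT id, data FROM documents").fetchall()
print(rows)  # [('doc1', '{"text": "second version"}')]
```

Both statements run in the same open transaction and are committed together. This mirrors how the DuckDB backend keeps its work inside the transaction started by connection.begin().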
Import
from txtai.database.duckdb import DuckDB
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Database configuration parameters. Must include the content key. DuckDB-specific settings are optional. |
| path | str | No | File system path for the DuckDB database file. Defaults to ":memory:" for an in-memory database. |
| documents | list[tuple] | Yes (for insert) | List of (uid, document, tags) tuples to insert. |
| args | tuple | Yes (for formatargs) | Tuple of (query_string, parameters_dict); named parameters are converted to positional placeholders. |
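The args input and the formatted-args output can be illustrated with a regex-based rewrite. format_named_args() is a hypothetical stand-in for formatargs(). It assumes that named parameters match the pattern :name and that the positional list must follow placeholder order. Under that assumption, a repeated name contributes its value once per occurrence.

```python
import re

# Hypothetical stand-in for formatargs(): ":name" placeholders become "?",
# and the parameter dict becomes a list ordered by placeholder appearance.
NAMED = re.compile(r":(\w+)\b")


def format_named_args(args):
    # Only rewrite when a (query, parameters) pair with parameters is given
    if len(args) == 2 and args[1]:
        query, parameters = args
        names = NAMED.findall(query)
        query = NAMED.sub("?", query)
        args = (query, [parameters[name] for name in names])
    return args


query, params = format_named_args(
    ("SELECT id FROM documents WHERE id = :id OR tags = :tag", {"tag": "x", "id": "doc1"})
)
print(query)   # SELECT id FROM documents WHERE id = ? OR tags = ?
print(params)  # ['doc1', 'x']
```

A pattern this simple would also match text such as DuckDB's ::VARCHAR cast syntax or colons inside string literals. Queries passed through this sketch should therefore avoid those constructs.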
Outputs
| Name | Type | Description |
|---|---|---|
| connection | duckdb.DuckDBPyConnection | DuckDB connection object returned by connect(), with an active transaction. |
| cursor | duckdb.DuckDBPyConnection | The connection itself acts as the cursor (getcursor() returns the same object). |
| rows | generator | Generator yielding individual result rows, fetched from the cursor in batches of 256 via fetchmany(). |
| formatted args | tuple | Rewritten (query, [params]) with named parameters replaced by positional ? placeholders. |
| new connection | duckdb.DuckDBPyConnection | Connection returned by copy(), pointing at the database at the target path and populated with the copied data. |
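The batched iteration behind the rows output can be sketched as a generator over any DB-API style cursor that provides fetchmany(). The standalone rows() function and the sections table are illustrative assumptions. sqlite3 again stands in for DuckDB, whose connection object also exposes fetchmany().

```python
import sqlite3


def rows(cursor, batch=256):
    # Pull results in fixed-size batches and yield them one row at a time,
    # so this generator buffers at most `batch` rows
    results = cursor.fetchmany(batch)
    while results:
        yield from results
        results = cursor.fetchmany(batch)


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE sections (indexid INTEGER, text TEXT)")
cursor.executemany(
    "INSERT INTO sections VALUES (?, ?)",
    [(i, f"section {i}") for i in range(1000)],
)

cursor.execute("SELECT indexid, text FROM sections ORDER BY indexid")
total = sum(1 for _ in rows(cursor))
print(total)  # 1000
```

Callers see a flat stream of rows, while the engine is only asked for 256 rows at a time.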
Usage Examples
```python
from txtai.database.duckdb import DuckDB

# Create a DuckDB-backed database
config = {"content": "duckdb"}
db = DuckDB(config)

# Initialize the database (in-memory until saved)
db.initialize()

# Insert documents as (uid, document, tags) tuples
documents = [
    ("doc1", {"text": "Analytical processing with DuckDB"}, None),
    ("doc2", {"text": "Columnar storage engines"}, None),
]
db.insert(documents)

# Save to disk; a new path triggers the Parquet export/import in copy()
db.save("/tmp/my_duckdb")

# Count stored sections
total = db.count()

# Close connection
db.close()
```