Implementation:Neuml Txtai DuckDB Database

From Leeroopedia


Knowledge Sources
Domains: Database, SQL
Last Updated: 2026-02-10 01:00 GMT

Overview

Concrete tool for DuckDB-backed embedded database operations provided by txtai.

Description

The DuckDB class is a concrete database backend built on the DuckDB embedded analytical database engine. It extends Embedded (which extends RDBMS) and implements DuckDB-specific connection management, JSON column extraction via json_extract_string(), and batch row fetching.

A key feature is the formatargs() method, which rewrites named SQL parameters (e.g., :param) into positional ? placeholders, because the :param style is not supported by DuckDB. The class also handles DuckDB's lack of INSERT OR REPLACE support by explicitly deleting any existing document or object with the same id before insertion.

When saving to a new path, the copy() method exports the existing tables to Parquet, creates a fresh schema in the new database, and imports the data via COPY ... FROM ... (FORMAT parquet). Row iteration uses batched fetchmany(256) calls, so large result sets are streamed rather than loaded all at once. DuckDB connections operate within explicit transactions, started with connection.begin().
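The parameter rewriting described above can be illustrated with a minimal sketch. The helper name to_positional and its regular expression are assumptions made for this example; this is not txtai's formatargs() implementation, and it does not parse SQL string literals.

```python
import re


def to_positional(query, parameters):
    """Rewrite :name placeholders to ? and return (query, ordered values).

    Hypothetical helper for illustration only; not txtai's formatargs().
    """
    values = []

    def replace(match):
        name = match.group(1)
        # Leave anything that is not a known parameter untouched
        if name not in parameters:
            return match.group(0)
        # Record values in the order their placeholders appear in the query
        values.append(parameters[name])
        return "?"

    # (?<!:) skips "::" cast syntax; (\w+) captures the parameter name
    rewritten = re.sub(r"(?<!:):(\w+)", replace, query)
    return rewritten, values


query = "SELECT id FROM sections WHERE id = :id AND tags = :tags"
sql, params = to_positional(query, {"tags": "news", "id": "doc1"})
print(sql)
print(params)
```

The values are ordered by placeholder position rather than by dictionary order, which is what a positional ? binding requires.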

Usage

Use DuckDB as the database backend when you need high-performance analytical queries on embedded data. Configure txtai with content: "duckdb" in the database configuration. The backend requires the duckdb Python package, which is installed with the "database" extra.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/database/duckdb.py
  • Lines: 1-150

Signature

class DuckDB(Embedded):
    # Class constants: statements used to delete a single document or object
    DELETE_DOCUMENT = "DELETE FROM documents WHERE id = ?"
    DELETE_OBJECT = "DELETE FROM objects WHERE id = ?"

    # Method signatures only; bodies are elided
    def __init__(self, config): ...
    def execute(self, function, *args): ...
    def insertdocument(self, uid, data, tags, entry): ...
    def insertobject(self, uid, data, tags, entry): ...
    def connect(self, path=":memory:"): ...
    def getcursor(self): ...
    def jsonprefix(self): ...
    def jsoncolumn(self, name): ...
    def rows(self): ...
    def addfunctions(self): ...
    def copy(self, path): ...
    def formatargs(self, args): ...
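
The DELETE_DOCUMENT constant and insertdocument() listed above point to a delete-then-insert pattern. A minimal sketch of that pattern follows. It uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra packages; the upsert_document helper, the INSERT_DOCUMENT statement, and the two-column schema are assumptions for illustration, not txtai's actual schema or code.

```python
import sqlite3

DELETE_DOCUMENT = "DELETE FROM documents WHERE id = ?"
# Hypothetical, simplified insert statement for this example
INSERT_DOCUMENT = "INSERT INTO documents VALUES (?, ?)"


def upsert_document(cursor, uid, data):
    # Remove any existing row with this id, then insert the new row
    cursor.execute(DELETE_DOCUMENT, [uid])
    cursor.execute(INSERT_DOCUMENT, [uid, data])


# sqlite3 stands in for a DuckDB connection here
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, data TEXT)")

upsert_document(cursor, "doc1", '{"text": "first version"}')
upsert_document(cursor, "doc1", '{"text": "second version"}')

stored = cursor.execute("SELECT id, data FROM documents").fetchall()
print(stored)
connection.close()
```

Inserting the same id twice leaves a single row holding the latest data, which is the effect INSERT OR REPLACE would otherwise provide.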

Import

from txtai.database.duckdb import DuckDB

I/O Contract

Inputs

Name | Type | Required | Description
config | dict | Yes | Database configuration parameters. Must include content key. DuckDB-specific settings are optional.
path | str | No | File system path for the DuckDB database file. Defaults to ":memory:" for in-memory databases.
documents | list[tuple] | Yes (for insert) | List of (uid, document, tags) tuples for insertion.
args | tuple | Yes (for formatargs) | Tuple of (query_string, parameters_dict) where named parameters are converted to positional.

Outputs

Name | Type | Description
connection | duckdb.DuckDBPyConnection | DuckDB connection object from connect(), with an active transaction.
cursor | duckdb.DuckDBPyConnection | The connection itself acts as the cursor (DuckDB uses the same object).
rows | generator | Generator yielding result rows, fetched in batches of 256 via fetchmany().
formatted args | tuple | Rewritten (query, [params]) with named parameters replaced by positional ? placeholders.
new connection | duckdb.DuckDBPyConnection | New connection with copied data from copy().
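
The batched row generator in the table above can be sketched as follows. The example again uses the standard-library sqlite3 module as a stand-in for DuckDB, since both expose fetchmany() on the cursor; the table and its contents are made up for the example, and the function is a sketch rather than txtai's rows() method.

```python
import sqlite3


def rows(cursor, batch=256):
    # Fetch up to `batch` rows at a time and yield them one by one,
    # so the full result set is never held in memory at once
    results = cursor.fetchmany(batch)
    while results:
        yield from results
        results = cursor.fetchmany(batch)


# sqlite3 stands in for a DuckDB connection here
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE sections (indexid INTEGER, text TEXT)")
cursor.executemany(
    "INSERT INTO sections VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000)],
)

cursor.execute("SELECT indexid FROM sections ORDER BY indexid")
ids = [row[0] for row in rows(cursor)]
print(len(ids))
connection.close()
```

Callers iterate over individual rows; the batch size only controls how many rows are pulled from the cursor per round trip.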

Usage Examples

from txtai.database.duckdb import DuckDB

# Create a DuckDB-backed database
config = {"content": "duckdb"}
db = DuckDB(config)

# Initialize the database (creates in-memory)
db.initialize()

# Insert documents
documents = [
    ("doc1", {"text": "Analytical processing with DuckDB"}, None),
    ("doc2", {"text": "Columnar storage engines"}, None),
]
db.insert(documents)

# Save to disk using Parquet export/import
db.save("/tmp/my_duckdb")

# Count stored sections
total = db.count()

# Close connection
db.close()
