Implementation:Neuml Txtai SQL Aggregate

Knowledge Sources	Neuml_Txtai
Domains	SQL_Processing, Distributed_Query
Last Updated	2026-02-09 17:00 GMT

Overview

Aggregate is a SQL-aware result aggregator that combines and sorts partial query results from sharded indexes, handling aggregate functions, GROUP BY, and ORDER BY clauses.

Description

The Aggregate class inherits from SQL and is designed to merge partial result sets that arrive from distributed query execution across multiple index shards. It parses SQL queries to detect aggregate functions (COUNT, SUM, TOTAL, MAX, MIN, AVG), groups results using GROUP BY clauses, applies the appropriate aggregate computations across partial results, and sorts results according to ORDER BY clauses. When no SQL-specific aggregation is needed, it falls back to sorting by the score column in descending order.
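The score-based fallback can be illustrated without txtai. The sketch below is a stand-alone approximation, not the library's code: `default_sort` is a hypothetical helper, and returning rows unchanged when there is no `score` key is an assumption.

```python
def default_sort(results):
    """Sort rows by 'score' descending; return rows unchanged if no score column (assumption)."""
    if results and "score" in results[0]:
        return sorted(results, key=lambda row: row["score"], reverse=True)
    return results


# Partial results as they might arrive from two shards
shard_rows = [
    {"id": "doc1", "score": 0.95},
    {"id": "doc3", "score": 0.88},
    {"id": "doc2", "score": 0.92},
    {"id": "doc4", "score": 0.85},
]

ranked_ids = [row["id"] for row in default_sort(shard_rows)]
print(ranked_ids)
```

Because each shard returns its own ranking, the combined list is generally out of order until this final sort runs.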

Usage

Use Aggregate when merging search results from multiple txtai shards that may include SQL aggregate queries. It is used internally by the Cluster class to combine results from distributed search but can also be used directly when building custom distributed query orchestration.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/database/sql/aggregate.py
Lines: 1-178

Signature

class Aggregate(SQL):
    def __init__(self, database=None):
        """
        Creates a new Aggregate instance.

        Args:
            database: optional database reference
        """

    def __call__(self, query, results):
        """
        Analyzes query results, combines aggregate function results and applies ordering.

        Args:
            query: input query
            results: query results

        Returns:
            aggregated query results
        """

Import

from txtai.database.sql import Aggregate

Key Methods

Method	Description
`__call__(query, results)`	Main entry point. Parses the SQL query, detects aggregate columns and ordering, applies aggregation and sorting, and returns merged results.
`aggcolumns(columns)`	Inspects column names for SQL aggregate function prefixes (`count(`, `sum(`, `total(`, `max(`, `min(`, `avg(`) and maps each to the appropriate Python function.
`aggregate(query, results, columns, aggcolumns)`	Groups results (if GROUP BY is present), then computes aggregate values for aggregate columns while preserving the first value for non-aggregate columns.
`groupby(query, results, columns)`	Groups results by the columns specified in the query's GROUP BY clause using `itertools.groupby`.
`orderby(query, results)`	Sorts results according to the ORDER BY clause, supporting both ASC and DESC directions with multi-column sorting.
`defaultsort(results)`	Default sorting when no ORDER BY is specified. Sorts by the `score` column descending, if present.
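The grouping and merging steps described in the table can be sketched as follows. This is an illustrative stand-alone version, not the txtai implementation: `merge_groups` and its parameters are hypothetical names, and the GROUP BY columns are passed in directly rather than parsed from a query.

```python
from itertools import groupby


def merge_groups(rows, group_cols, agg_funcs):
    """
    Merge partial rows that share the same group key.

    rows: list of dict rows collected from all shards
    group_cols: column names from the GROUP BY clause
    agg_funcs: maps aggregate column name -> function over a list of partial values
    """
    def key(row):
        return tuple(row[col] for col in group_cols)

    merged = []
    # itertools.groupby only groups adjacent items, so sort by the key first
    for _, group in groupby(sorted(rows, key=key), key=key):
        group = list(group)
        out = {}
        for col in group[0]:
            if col in agg_funcs:
                # Combine the partial values for this aggregate column
                out[col] = agg_funcs[col]([row[col] for row in group])
            else:
                # Non-aggregate columns keep the first value seen in the group
                out[col] = group[0][col]
        merged.append(out)
    return merged


rows = [
    {"category": "ml", "count(*)": 10, "max(score)": 0.85},
    {"category": "nlp", "count(*)": 5, "max(score)": 0.90},
    {"category": "ml", "count(*)": 8, "max(score)": 0.82},
    {"category": "nlp", "count(*)": 7, "max(score)": 0.88},
]

merged = merge_groups(rows, ["category"], {"count(*)": sum, "max(score)": max})
print(merged)
```

Sorting before calling `itertools.groupby` matters: without it, rows for the same category that arrive from different shards would land in separate groups.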

I/O Contract

Inputs

Name	Type	Required	Description
database	object	No	Optional database reference passed to the parent `SQL` class. Defaults to None.
query	str	Yes	The SQL query string or plain text query. Parsed to detect SELECT, GROUP BY, and ORDER BY clauses.
results	list of dict	Yes	Partial query results from multiple shards. Each dict represents a row with column name keys and their values.

Outputs

Name	Type	Description
results	list of dict	Aggregated and sorted results. Aggregate columns contain computed values (sum, count, max, min, avg). Rows are sorted per ORDER BY or by score descending.
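Multi-column ORDER BY sorting with mixed ASC and DESC directions can be built from repeated stable sorts. The sketch below shows one common way to do this in Python; `order_by` is a hypothetical helper, and it takes pre-parsed `(column, direction)` pairs instead of a SQL string, which is an assumption made to keep the example self-contained.

```python
def order_by(rows, clauses):
    """
    Sort rows by several (column, direction) pairs.

    clauses: list of (column, "asc" | "desc") in ORDER BY priority order
    """
    # Python's sort is stable, so sorting by the lowest-priority clause first
    # and the highest-priority clause last yields a correct multi-column order
    for column, direction in reversed(clauses):
        rows = sorted(rows, key=lambda row: row[column], reverse=direction.lower() == "desc")
    return rows


rows = [
    {"category": "nlp", "count(*)": 12},
    {"category": "ml", "count(*)": 18},
    {"category": "cv", "count(*)": 12},
]

# Equivalent in spirit to: ORDER BY count(*) DESC, category ASC
ordered = order_by(rows, [("count(*)", "desc"), ("category", "asc")])
print([row["category"] for row in ordered])
```

Rows tied on the first clause (`cv` and `nlp`, both 12) fall back to the second clause, which is why the clauses are applied in reverse order.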

Supported Aggregate Functions

SQL Function	Python Implementation	Description
`COUNT(...)`	`sum`	Sums partial counts from each shard.
`SUM(...)`	`sum`	Sums partial sums from each shard.
`TOTAL(...)`	`sum`	Sums partial totals from each shard.
`MAX(...)`	`max`	Takes the maximum across all shard results.
`MIN(...)`	`min`	Takes the minimum across all shard results.
`AVG(...)`	`sum(x) / len(x)`	Computes the unweighted mean of the partial averages. This matches the true average only when each partial result covers the same number of rows.
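The mapping in the table can be sketched as a small lookup keyed on the column name. This is an approximation for illustration only: `agg_columns` is a hypothetical stand-alone function, and matching with `startswith` on the lower-cased name is an assumption about how the prefix check works. The second half shows why averaging per-shard averages can differ from a row-weighted average.

```python
def agg_columns(columns):
    """Map result column names to merge functions based on their aggregate prefix."""
    funcs = {}
    for column in columns:
        name = column.lower()
        if name.startswith(("count(", "sum(", "total(")):
            funcs[column] = sum
        elif name.startswith("max("):
            funcs[column] = max
        elif name.startswith("min("):
            funcs[column] = min
        elif name.startswith("avg("):
            funcs[column] = lambda values: sum(values) / len(values)
    return funcs


funcs = agg_columns(["category", "count(*)", "avg(score)"])

# Two shards report partial results for the same group
partial_counts = [10, 8]
partial_avgs = [0.85, 0.82]

merged_count = funcs["count(*)"](partial_counts)
merged_avg = funcs["avg(score)"](partial_avgs)

# Row-weighted average for comparison (needs the per-shard counts)
weighted_avg = sum(c * a for c, a in zip(partial_counts, partial_avgs)) / sum(partial_counts)

print(merged_count, merged_avg, weighted_avg)
```

With unequal shard sizes the two averages differ slightly, so a caller that needs an exact mean may prefer to request `SUM` and `COUNT` and divide after merging.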

Usage Examples

Basic Usage

from txtai.database.sql import Aggregate

# Create aggregator
agg = Aggregate()

# Simulate partial results from two shards for a standard search
query = "machine learning"
results = [
    {"id": "doc1", "score": 0.95, "text": "Machine learning basics"},
    {"id": "doc3", "score": 0.88, "text": "Deep learning fundamentals"},
    {"id": "doc2", "score": 0.92, "text": "ML algorithms overview"},
    {"id": "doc4", "score": 0.85, "text": "Neural networks intro"},
]

# A plain-text query needs no SQL aggregation, so results are sorted by score descending
sorted_results = agg(query, results)
for r in sorted_results:
    print(f"ID: {r['id']}, Score: {r['score']:.4f}")

SQL Aggregate Query

from txtai.database.sql import Aggregate

agg = Aggregate()

# Simulate a SQL aggregate query with GROUP BY
query = "SELECT category, count(*) count, avg(score) avg_score FROM txtai GROUP BY category ORDER BY count DESC"
results = [
    {"category": "ml", "count(*)": 10, "avg(score)": 0.85},
    {"category": "nlp", "count(*)": 5, "avg(score)": 0.90},
    {"category": "ml", "count(*)": 8, "avg(score)": 0.82},
    {"category": "nlp", "count(*)": 7, "avg(score)": 0.88},
]

# Rows that share a category are merged: partial counts are summed and partial averages are averaged
merged = agg(query, results)
for r in merged:
    print(r)

Related Pages

Principle:Neuml_Txtai_SQL_Query_Processing
