Implementation:Neuml Txtai SQL Aggregate

From Leeroopedia


Knowledge Sources
Domains: SQL_Processing, Distributed_Query
Last Updated: 2026-02-09 17:00 GMT

Overview

Aggregate is a SQL-aware result aggregator that combines and sorts partial query results from sharded indexes, handling aggregate functions, GROUP BY, and ORDER BY clauses.

Description

The Aggregate class inherits from SQL and is designed to merge partial result sets that arrive from distributed query execution across multiple index shards. It parses SQL queries to detect aggregate functions (COUNT, SUM, TOTAL, MAX, MIN, AVG), groups results using GROUP BY clauses, applies the appropriate aggregate computations across partial results, and sorts results according to ORDER BY clauses. When no SQL-specific aggregation is needed, it falls back to sorting by the score column in descending order.
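The ordering step described above can be sketched in plain Python. This is a minimal illustration, not txtai's implementation: the function name `order_rows` and the pre-split `orderby` list are assumptions, and clause parsing is simplified to "column [asc|desc]".

```python
from operator import itemgetter


def order_rows(rows, orderby=None):
    """Sort merged rows. `orderby` is a hypothetical pre-parsed list such as
    ["category asc", "total desc"]; None means no ORDER BY clause."""
    if not orderby:
        # Fallback: highest score first, but only if rows carry a score column
        if rows and "score" in rows[0]:
            return sorted(rows, key=itemgetter("score"), reverse=True)
        return rows

    # Apply clauses from last to first. Python's sort is stable, so the
    # first clause ends up as the primary sort key.
    for clause in reversed(orderby):
        column, _, direction = clause.partition(" ")
        descending = direction.strip().lower() == "desc"
        rows = sorted(rows, key=itemgetter(column), reverse=descending)
    return rows


# Partial results from two shards for a plain (non-SQL) query
shard_a = [{"id": "doc1", "score": 0.95}, {"id": "doc3", "score": 0.88}]
shard_b = [{"id": "doc2", "score": 0.92}, {"id": "doc4", "score": 0.85}]

merged = order_rows(shard_a + shard_b)
ids = [row["id"] for row in merged]

# Multi-column ordering: category ascending, then n descending within a category
rows = [{"c": "a", "n": 1}, {"c": "b", "n": 2}, {"c": "a", "n": 3}]
ordered = order_rows(rows, ["c asc", "n desc"])
```

Sorting once per clause in reverse order avoids building a composite key, which matters when clauses mix ascending and descending directions over non-numeric columns.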

Usage

Use Aggregate to merge search results from multiple txtai shards, including results of SQL aggregate queries. The Cluster class uses it internally to combine results from distributed search, but it can also be used directly when building custom distributed query orchestration.

Code Reference

Source Location

Signature

class Aggregate(SQL):
    def __init__(self, database=None):
        """
        Creates a new Aggregate instance.

        Args:
            database: optional database reference
        """

    def __call__(self, query, results):
        """
        Analyzes query results, combines aggregate function results and applies ordering.

        Args:
            query: input query
            results: query results

        Returns:
            aggregated query results
        """

Import

from txtai.database.sql import Aggregate

Key Methods

Method | Description
__call__(query, results) | Main entry point. Parses the SQL query, detects aggregate columns and ordering, applies aggregation and sorting, and returns merged results.
aggcolumns(columns) | Inspects column names for SQL aggregate function prefixes ("count(", "sum(", "total(", "max(", "min(", "avg(") and maps each to the appropriate Python function.
aggregate(query, results, columns, aggcolumns) | Groups results (if GROUP BY is present), then computes aggregate values for aggregate columns while preserving the first value for non-aggregate columns.
groupby(query, results, columns) | Groups results by the columns specified in the query's GROUP BY clause using itertools.groupby.
orderby(query, results) | Sorts results according to the ORDER BY clause, supporting both ASC and DESC directions with multi-column sorting.
defaultsort(results) | Default sorting when no ORDER BY is specified. Sorts by the score column descending, if present.
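The grouping and aggregation steps listed above can be sketched as follows. This is a standalone approximation under stated assumptions, not the library's code: `merge_groups` is a hypothetical name, and the GROUP BY columns and aggregate mapping are passed in directly instead of being parsed from a query.

```python
from itertools import groupby
from operator import itemgetter


def merge_groups(rows, group_columns, aggregates):
    """
    rows: partial rows collected from all shards
    group_columns: column names from GROUP BY (assumed already parsed)
    aggregates: maps column name -> function over a list of partial values
    """
    if not rows:
        return []

    if group_columns:
        key = itemgetter(*group_columns)
        # itertools.groupby only groups adjacent rows, so sort by the key first
        rows = sorted(rows, key=key)
        groups = [list(group) for _, group in groupby(rows, key=key)]
    else:
        # Without GROUP BY, all partial rows collapse into a single row
        groups = [rows]

    merged = []
    for group in groups:
        row = {}
        for column in group[0]:
            if column in aggregates:
                # Combine the partial values reported by each shard
                row[column] = aggregates[column]([r[column] for r in group])
            else:
                # Non-aggregate columns repeat within a group; keep the first value
                row[column] = group[0][column]
        merged.append(row)
    return merged


partials = [
    {"category": "ml", "count(*)": 10, "max(score)": 0.91},
    {"category": "nlp", "count(*)": 5, "max(score)": 0.97},
    {"category": "ml", "count(*)": 8, "max(score)": 0.94},
    {"category": "nlp", "count(*)": 7, "max(score)": 0.89},
]

merged = merge_groups(partials, ["category"], {"count(*)": sum, "max(score)": max})
```

The sort before `groupby` is the step most easily forgotten: without it, rows for the same category arriving from different shards would form separate groups.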

I/O Contract

Inputs

Name | Type | Required | Description
database | object | No | Optional database reference passed to the parent SQL class. Defaults to None.
query | str | Yes | The SQL query string or plain text query. Parsed to detect SELECT, GROUP BY, and ORDER BY clauses.
results | list of dict | Yes | Partial query results from multiple shards. Each dict represents a row with column name keys and their values.

Outputs

Name | Type | Description
results | list of dict | Aggregated and sorted results. Aggregate columns contain computed values (sum, count, max, min, avg). Rows are sorted per ORDER BY or by score descending.

Supported Aggregate Functions

SQL Function | Python Implementation | Description
COUNT(...) | sum | Sums partial counts from each shard.
SUM(...) | sum | Sums partial sums from each shard.
TOTAL(...) | sum | Sums partial totals from each shard.
MAX(...) | max | Takes the maximum across all shard results.
MIN(...) | min | Takes the minimum across all shard results.
AVG(...) | sum(x) / len(x) | Computes the unweighted mean of the partial averages. This equals the true average only when each partial result covers the same number of rows.
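The mapping in the table above can be illustrated with a short sketch. The helper name `aggregate_function` is an assumption for illustration, and the per-shard row counts are invented inputs that the aggregator itself does not receive; they are included only to show how a mean of partial averages can differ from a row-weighted average.

```python
def aggregate_function(column):
    """Return the merge function for a result column, or None if the
    column name does not start with a known aggregate prefix."""
    name = column.lower()
    if name.startswith(("count(", "sum(", "total(")):
        return sum
    if name.startswith("max("):
        return max
    if name.startswith("min("):
        return min
    if name.startswith("avg("):
        # Unweighted mean of the partial averages
        return lambda values: sum(values) / len(values)
    return None


# Two shards report avg(score) computed over different numbers of rows
shard_avgs = [1.0, 4.0]   # per-shard averages
shard_counts = [3, 1]     # hypothetical row counts behind each average

# Merge as the table describes: sum(x) / len(x) over the partial averages
merged_avg = aggregate_function("avg(score)")(shard_avgs)

# Row-weighted average, for comparison only
weighted_avg = sum(a * n for a, n in zip(shard_avgs, shard_counts)) / sum(shard_counts)

print(merged_avg, weighted_avg)
```

When shards hold similar numbers of matching rows the two values are close; with skewed shards, selecting sum and count separately and dividing after the merge gives the exact average.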

Usage Examples

Basic Usage

from txtai.database.sql import Aggregate

# Create aggregator
agg = Aggregate()

# Simulate partial results from two shards for a standard search
query = "machine learning"
results = [
    {"id": "doc1", "score": 0.95, "text": "Machine learning basics"},
    {"id": "doc3", "score": 0.88, "text": "Deep learning fundamentals"},
    {"id": "doc2", "score": 0.92, "text": "ML algorithms overview"},
    {"id": "doc4", "score": 0.85, "text": "Neural networks intro"},
]

# No SQL aggregation applies to a plain-text query, so results are sorted by score (descending)
sorted_results = agg(query, results)
for r in sorted_results:
    print(f"ID: {r['id']}, Score: {r['score']:.4f}")

SQL Aggregate Query

from txtai.database.sql import Aggregate

agg = Aggregate()

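# Simulate a SQL aggregate query with GROUP BY.
# Columns are left unaliased so that the result keys ("count(*)", "avg(score)")
# match the aggregate function prefixes and the ORDER BY column.
query = "SELECT category, count(*), avg(score) FROM txtai GROUP BY category ORDER BY count(*) DESC"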
results = [
    {"category": "ml", "count(*)": 10, "avg(score)": 0.85},
    {"category": "nlp", "count(*)": 5, "avg(score)": 0.90},
    {"category": "ml", "count(*)": 8, "avg(score)": 0.82},
    {"category": "nlp", "count(*)": 7, "avg(score)": 0.88},
]

# Aggregate merges counts and averages, groups by category
merged = agg(query, results)
for r in merged:
    print(r)
