Overview
Aggregate is a SQL-aware result aggregator that combines and sorts partial query results from sharded indexes, handling aggregate functions, GROUP BY, and ORDER BY clauses.
Description
The Aggregate class inherits from SQL and is designed to merge partial result sets that arrive from distributed query execution across multiple index shards. It parses SQL queries to detect aggregate functions (COUNT, SUM, TOTAL, MAX, MIN, AVG), groups results using GROUP BY clauses, applies the appropriate aggregate computations across partial results, and sorts results according to ORDER BY clauses. When no SQL-specific aggregation is needed, it falls back to sorting by the score column in descending order.
Usage
Use Aggregate to merge partial results from multiple txtai shards, including results produced by SQL aggregate queries. The Cluster class uses it internally to combine results from distributed search, and it can also be used directly when building custom distributed query orchestration.
Code Reference
Source Location
Signature
class Aggregate(SQL):
    def __init__(self, database=None):
        """
        Creates a new Aggregate instance.

        Args:
            database: optional database reference
        """

    def __call__(self, query, results):
        """
        Analyzes query results, combines aggregate function results and applies ordering.

        Args:
            query: input query
            results: query results

        Returns:
            aggregated query results
        """
Import
from txtai.database.sql import Aggregate
Key Methods
| Method | Description |
|---|---|
| __call__(query, results) | Main entry point. Parses the SQL query, detects aggregate columns and ordering, applies aggregation and sorting, and returns merged results. |
| aggcolumns(columns) | Inspects column names for SQL aggregate function prefixes ("count(", "sum(", "total(", "max(", "min(", "avg(") and maps each matching column to the appropriate Python function. |
| aggregate(query, results, columns, aggcolumns) | Groups results (if GROUP BY is present), then computes aggregate values for aggregate columns while preserving the first value for non-aggregate columns. |
| groupby(query, results, columns) | Groups results by the columns specified in the query's GROUP BY clause using itertools.groupby. |
| orderby(query, results) | Sorts results according to the ORDER BY clause, supporting both ASC and DESC directions with multi-column sorting. |
| defaultsort(results) | Default sorting when no ORDER BY is specified. Sorts by the score column descending, if present. |
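The grouping and merging steps described for groupby and aggregate can be sketched in plain Python. This is a minimal illustration under stated assumptions, not txtai's implementation: merge_groups, its parameters, and the sample rows are hypothetical names chosen for this example.

```python
from itertools import groupby


def merge_groups(rows, group_keys, aggregates):
    """
    Hypothetical helper: merges partial rows that share the same GROUP BY key.

    rows: list of dict, partial rows collected from all shards
    group_keys: column names taken from the GROUP BY clause
    aggregates: dict of column name -> function applied to a list of partial values
    """

    def keyfn(row):
        return tuple(row[key] for key in group_keys)

    merged = []

    # itertools.groupby only groups adjacent items, so sort by the same key first
    for _, group in groupby(sorted(rows, key=keyfn), key=keyfn):
        group = list(group)
        row = {}
        for column in group[0]:
            if column in aggregates:
                # Aggregate column: combine the partial values from each shard
                row[column] = aggregates[column]([r[column] for r in group])
            else:
                # Non-aggregate column: the value repeats, keep the first one
                row[column] = group[0][column]
        merged.append(row)

    return merged


# Partial rows from two shards (made-up data)
partials = [
    {"category": "ml", "count(*)": 10, "max(score)": 0.91},
    {"category": "nlp", "count(*)": 5, "max(score)": 0.97},
    {"category": "ml", "count(*)": 8, "max(score)": 0.95},
    {"category": "nlp", "count(*)": 7, "max(score)": 0.88},
]

merged = merge_groups(partials, ["category"], {"count(*)": sum, "max(score)": max})
print(merged)
```

Sorting before grouping matters here: without it, rows for the same category that arrive from different shards would end up in separate groups.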
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| database | object | No | Optional database reference passed to the parent SQL class. Defaults to None. |
| query | str | Yes | The SQL query string or plain text query. Parsed to detect SELECT, GROUP BY, and ORDER BY clauses. |
| results | list of dict | Yes | Partial query results from multiple shards. Each dict represents a row with column name keys and their values. |
Outputs
| Name | Type | Description |
|---|---|---|
| results | list of dict | Aggregated and sorted results. Aggregate columns contain computed values (sum, count, max, min, avg). Rows are sorted per ORDER BY or by score descending. |
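The two ordering modes in the output contract can be sketched as follows. This is an illustrative sketch only: order_rows and default_sort are hypothetical helpers, and passing ORDER BY clauses as (column, descending) pairs is an assumption made for the example rather than the format txtai uses internally.

```python
def order_rows(rows, clauses):
    """
    Hypothetical helper: sorts rows by a list of (column, descending) pairs,
    listed from most significant to least significant.
    """

    # Python's sort is stable, so sorting by the least significant column first
    # and the most significant column last yields a multi-column ordering
    for column, descending in reversed(clauses):
        rows = sorted(rows, key=lambda row: row[column], reverse=descending)

    return rows


def default_sort(rows):
    """
    Hypothetical helper: sorts by score descending when a score column is present.
    """

    if rows and "score" in rows[0]:
        return sorted(rows, key=lambda row: row["score"], reverse=True)

    return rows


# ORDER BY count(*) DESC, category ASC (made-up rows)
rows = [
    {"category": "nlp", "count(*)": 12},
    {"category": "ml", "count(*)": 18},
    {"category": "cv", "count(*)": 12},
]
ordered = order_rows(rows, [("count(*)", True), ("category", False)])
print([row["category"] for row in ordered])

# No ORDER BY: fall back to score descending
scored = [{"id": "doc1", "score": 0.95}, {"id": "doc3", "score": 0.88}, {"id": "doc2", "score": 0.92}]
print([row["id"] for row in default_sort(scored)])
```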
Supported Aggregate Functions
| SQL Function | Python Implementation | Description |
|---|---|---|
| COUNT(...) | sum | Sums partial counts from each shard. |
| SUM(...) | sum | Sums partial sums from each shard. |
| TOTAL(...) | sum | Sums partial totals from each shard. |
| MAX(...) | max | Takes the maximum across all shard results. |
| MIN(...) | min | Takes the minimum across all shard results. |
| AVG(...) | sum(x) / len(x) | Averages the partial averages from each shard. This is an unweighted mean of the per-shard values. |
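The mapping in the table above can be sketched as a small lookup on column name prefixes. This is a minimal sketch that mirrors the table, not a copy of txtai's code: aggregate_functions and the sample values are hypothetical. It also shows one consequence of the AVG rule: because sum(x) / len(x) is applied to the per-shard averages, the merged value is unweighted and can differ from the average over all rows when shards contribute different row counts.

```python
def aggregate_functions(columns):
    """
    Hypothetical helper: maps aggregate-looking column names to merge functions.
    """

    functions = {}
    for column in columns:
        name = column.lower()
        if name.startswith(("count(", "sum(", "total(")):
            functions[column] = sum
        elif name.startswith("max("):
            functions[column] = max
        elif name.startswith("min("):
            functions[column] = min
        elif name.startswith("avg("):
            functions[column] = lambda values: sum(values) / len(values)

    return functions


functions = aggregate_functions(["category", "count(*)", "avg(score)"])

# Partial values for one group from two shards (made-up data)
counts = [10, 8]
averages = [0.85, 0.82]

merged_count = functions["count(*)"](counts)
merged_avg = functions["avg(score)"](averages)

# For comparison only: a count-weighted average over the same partial values
weighted_avg = sum(c * a for c, a in zip(counts, averages)) / sum(counts)

print(merged_count, merged_avg, weighted_avg)
```

If an exact average is needed, one option is to query SUM and COUNT from each shard and divide the merged values, since both of those merge exactly.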
Usage Examples
Basic Usage
from txtai.database.sql import Aggregate

# Create aggregator
agg = Aggregate()

# Simulate partial results from two shards for a standard search
query = "machine learning"
results = [
    {"id": "doc1", "score": 0.95, "text": "Machine learning basics"},
    {"id": "doc3", "score": 0.88, "text": "Deep learning fundamentals"},
    {"id": "doc2", "score": 0.92, "text": "ML algorithms overview"},
    {"id": "doc4", "score": 0.85, "text": "Neural networks intro"},
]

# Aggregate and sort by score
sorted_results = agg(query, results)
for r in sorted_results:
    print(f"ID: {r['id']}, Score: {r['score']:.4f}")
SQL Aggregate Query
from txtai.database.sql import Aggregate

agg = Aggregate()

# Simulate a SQL aggregate query with GROUP BY
query = "SELECT category, count(*) count, avg(score) avg_score FROM txtai GROUP BY category ORDER BY count DESC"
results = [
    {"category": "ml", "count(*)": 10, "avg(score)": 0.85},
    {"category": "nlp", "count(*)": 5, "avg(score)": 0.90},
    {"category": "ml", "count(*)": 8, "avg(score)": 0.82},
    {"category": "nlp", "count(*)": 7, "avg(score)": 0.88},
]

# Aggregate merges counts and averages, groups by category
merged = agg(query, results)
for r in merged:
    print(r)