Overview
A concrete tool, provided by NautilusTrader, for querying typed trading data from a Parquet-backed data catalog.
Description
The ParquetDataCatalog query system provides a generic query method plus a set of convenience methods (instruments, bars, trade_ticks, quote_ticks, and others) for retrieving data from the catalog. The generic query method accepts a data class, optional identifiers (instrument IDs or bar types), optional start and end bounds, an optional SQL WHERE clause, and an optional explicit list of files. It automatically selects between two backends. The high-performance Rust backend handles built-in NautilusTrader data types stored on the local filesystem. The PyArrow backend handles Instrument subtypes, custom data, cloud filesystems, and explicit file lists. The convenience methods, defined in BaseDataCatalog, pre-select the data class and delegate to query.
Usage
Use the catalog query methods when you need to:
- Retrieve historical market data for backtesting (trade ticks, quote ticks, bars, order book data).
- Load instrument definitions from the catalog.
- Filter data by instrument, time range, or custom predicates.
- Feed data into a backtest engine via the catalog interface.
Code Reference
Source Location
| Item | Value |
|------|-------|
| File (query method) | nautilus_trader/persistence/catalog/parquet.py |
| Lines (query) | L1551-1649 |
| File (convenience methods) | nautilus_trader/persistence/catalog/base.py |
| Lines (convenience) | L67-177 |
| Class | ParquetDataCatalog (extends BaseDataCatalog) |
Signature (Generic Query)
```python
def query(
    self,
    data_cls: type,
    identifiers: list[str] | None = None,
    start: TimestampLike | None = None,
    end: TimestampLike | None = None,
    where: str | None = None,
    files: list[str] | None = None,
    **kwargs: Any,
) -> list[Data | CustomData]: ...
```
Signatures (Convenience Methods)
```python
def instruments(
    self,
    instrument_type: type | None = None,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[Instrument]: ...

def trade_ticks(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[TradeTick]: ...

def quote_ticks(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[QuoteTick]: ...

def bars(
    self,
    bar_types: list[str] | None = None,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[Bar]: ...

def order_book_deltas(
    self,
    instrument_ids: list[str] | None = None,
    batched: bool = False,
    **kwargs: Any,
) -> list[OrderBookDelta] | list[OrderBookDeltas]: ...

def order_book_depth10(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[OrderBookDepth10]: ...

def instrument_status(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[InstrumentStatus]: ...

def funding_rates(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[FundingRateUpdate]: ...
```
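Each convenience method is a thin wrapper that pre-selects a data class and forwards everything else to query. The following is a minimal sketch of that delegation pattern. MiniCatalog, TradeTick and QuoteTick here are hypothetical stand-ins, not the real NautilusTrader classes. The exact keyword mapping inside BaseDataCatalog may also differ.

```python
from __future__ import annotations

from typing import Any


class TradeTick:
    """Hypothetical stand-in for nautilus_trader.model.data.TradeTick."""


class QuoteTick:
    """Hypothetical stand-in for nautilus_trader.model.data.QuoteTick."""


class MiniCatalog:
    """Hypothetical catalog that records calls to query() instead of reading Parquet."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def query(
        self,
        data_cls: type,
        identifiers: list[str] | None = None,
        **kwargs: Any,
    ) -> list[Any]:
        # A real catalog would read Parquet files here; the sketch only logs the request.
        self.calls.append({"data_cls": data_cls, "identifiers": identifiers, **kwargs})
        return []

    def trade_ticks(self, instrument_ids: list[str] | None = None, **kwargs: Any) -> list[Any]:
        # Pre-select the data class and forward the remaining arguments unchanged.
        return self.query(data_cls=TradeTick, identifiers=instrument_ids, **kwargs)

    def quote_ticks(self, instrument_ids: list[str] | None = None, **kwargs: Any) -> list[Any]:
        return self.query(data_cls=QuoteTick, identifiers=instrument_ids, **kwargs)


catalog = MiniCatalog()
catalog.trade_ticks(instrument_ids=["BTCUSDT.BINANCE"], start="2024-01-01")
print(catalog.calls[0]["data_cls"].__name__)  # TradeTick
```

Because the wrappers pass `**kwargs` through, time bounds and other query options work identically on every convenience method.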
Import
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
```
I/O Contract
Inputs (query method)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| data_cls | type | (required) | The data class to query for (e.g., TradeTick, QuoteTick, Bar, CurrencyPair). |
| identifiers | list[str] \| None | None | Filter by instrument IDs or bar types. If None, no identifier filter is applied and data for all instruments is returned. |
| start | int \| str \| float \| None | None | Start of the query time range. Accepts nanosecond timestamps, ISO 8601 strings, or float timestamps. |
| end | int \| str \| float \| None | None | End of the query time range. Accepts the same formats as start. |
| where | str \| None | None | Additional SQL WHERE clause for row-level filtering (used by the Rust backend). |
| files | list[str] \| None | None | Explicit list of Parquet files to query. Bypasses file discovery and forces the PyArrow backend. |
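The start and end parameters accept several timestamp forms. The sketch below shows how an ISO 8601 string maps to the UNIX-nanosecond integers used elsewhere on this page. For example, 1704067200000000000 is 2024-01-01T00:00:00 UTC. to_unix_nanos is a hypothetical helper, not a NautilusTrader function. It assumes naive strings are UTC and, for simplicity, handles only int and str inputs.

```python
from __future__ import annotations

from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_nanos(value: int | str) -> int:
    """Hypothetical helper: normalize an int (already ns) or ISO 8601 string to UNIX ns."""
    if isinstance(value, int):
        # Integers are assumed to already be nanoseconds since the UNIX epoch.
        return value
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - _EPOCH
    # Pure integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


print(to_unix_nanos("2024-01-01"))  # 1704067200000000000
print(to_unix_nanos("2024-01-02"))  # 1704153600000000000
```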
Outputs
| Method | Return type | Description |
|--------|-------------|-------------|
| query(data_cls) | list[Data \| CustomData] | Typed data objects matching the query filters. Non-Nautilus types are wrapped in CustomData. |
| instruments() | list[Instrument] | Instrument definitions. The method queries across all Instrument subclasses unless instrument_type is given. |
| trade_ticks() | list[TradeTick] | Trade tick data for the specified instruments and time range. |
| quote_ticks() | list[QuoteTick] | Quote tick data for the specified instruments and time range. |
| bars() | list[Bar] | OHLCV bar data for the specified bar types and time range. |
| order_book_deltas() | list[OrderBookDelta] \| list[OrderBookDeltas] | Order book deltas. With batched=True, they are grouped into OrderBookDeltas objects. |
| order_book_depth10() | list[OrderBookDepth10] | Depth-10 order book snapshots. |
| instrument_status() | list[InstrumentStatus] | Instrument status updates for the specified instruments. |
| funding_rates() | list[FundingRateUpdate] | Funding rate updates for perpetual contracts. |
Backend Selection Logic
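| Condition | Backend used | Reason |
|-----------|--------------|--------|
| data_cls is OrderBookDelta, OrderBookDeltas, OrderBookDepth10, QuoteTick, TradeTick, Bar, or MarkPriceUpdate; the catalog is on the local filesystem; and files is None | Rust (_query_rust) | Maximum performance via zero-copy deserialization |
| data_cls is an Instrument subclass or a custom data type | PyArrow (_query_pyarrow) | Broader type support |
| The catalog is on a cloud (non-local) filesystem | PyArrow (_query_pyarrow) | The Rust path is used only for local catalogs |
| files is provided | PyArrow (_query_pyarrow) | The Rust backend does not support custom file lists |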
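The selection rules in the backend table can be expressed as a small decision function. The following hypothetical sketch mirrors the table. It is not the actual _query_rust/_query_pyarrow dispatch code. It takes the data class name as a string and a filesystem protocol string such as "file". Both of these inputs are assumptions, chosen so the sketch runs without NautilusTrader installed.

```python
from __future__ import annotations

# Built-in types listed in the table as eligible for the Rust backend.
RUST_TYPES = frozenset(
    {
        "OrderBookDelta",
        "OrderBookDeltas",
        "OrderBookDepth10",
        "QuoteTick",
        "TradeTick",
        "Bar",
        "MarkPriceUpdate",
    }
)


def choose_backend(
    data_cls_name: str,
    fs_protocol: str = "file",
    files: list[str] | None = None,
) -> str:
    """Hypothetical mirror of the backend-selection table; returns "rust" or "pyarrow"."""
    if files is not None:
        # An explicit file list always forces the PyArrow path.
        return "pyarrow"
    if fs_protocol != "file":
        # Assumption: any non-local filesystem protocol (e.g. "s3") uses PyArrow.
        return "pyarrow"
    if data_cls_name in RUST_TYPES:
        return "rust"
    # Instrument subclasses and custom data types fall back to PyArrow.
    return "pyarrow"


print(choose_backend("TradeTick"))  # rust
print(choose_backend("CurrencyPair"))  # pyarrow
print(choose_backend("TradeTick", files=["part-0.parquet"]))  # pyarrow
```

The file-list and filesystem checks come first because they override the data-type rule: even a Rust-eligible type goes through PyArrow when either condition holds.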
Usage Examples
Querying Trade Ticks
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get all trade ticks for a specific instrument
trades = catalog.trade_ticks(
    instrument_ids=["BTCUSDT.BINANCE"],
)
print(f"Total trades: {len(trades)}")
```
Querying with Time Range
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get quote ticks for January 2024. A bare date means midnight, so the end
# bound is set to the last nanosecond of January 31 to cover the whole day.
quotes = catalog.quote_ticks(
    instrument_ids=["ETHUSDT.BINANCE"],
    start="2024-01-01",
    end="2024-01-31T23:59:59.999999999",
)
print(f"Quotes in January 2024: {len(quotes)}")
```
Querying Instruments
```python
from nautilus_trader.model.instruments import CurrencyPair
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get all instruments in the catalog (across all subclasses)
instruments = catalog.instruments()
for inst in instruments:
    print(f"{inst.id} -> {type(inst).__name__}")

# Filter by a specific instrument type
currency_pairs = catalog.instruments(instrument_type=CurrencyPair)
```
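When no instrument_type is passed, instruments() covers every Instrument subclass. One way to enumerate subclasses in Python is a recursive walk over `__subclasses__()`. The sketch below uses hypothetical stand-in classes. An in-memory dict keyed by class name replaces the catalog's Parquet storage. The sketch therefore illustrates the idea rather than NautilusTrader's actual implementation.

```python
from __future__ import annotations


class Instrument:
    """Hypothetical stand-in base class."""

    def __init__(self, instrument_id: str) -> None:
        self.id = instrument_id


class CurrencyPair(Instrument):
    pass


class CryptoPerpetual(Instrument):
    pass


def all_subclasses(cls: type) -> list[type]:
    """Return every direct and indirect subclass of cls (depth-first)."""
    result: list[type] = []
    for sub in cls.__subclasses__():
        result.append(sub)
        result.extend(all_subclasses(sub))
    return result


def query_instruments(
    store: dict[str, list[Instrument]],
    instrument_type: type | None = None,
) -> list[Instrument]:
    # With no type given, query every subclass; otherwise only the requested one.
    types = [instrument_type] if instrument_type is not None else all_subclasses(Instrument)
    found: list[Instrument] = []
    for t in types:
        found.extend(store.get(t.__name__, []))
    return found


store = {
    "CurrencyPair": [CurrencyPair("EUR/USD.SIM")],
    "CryptoPerpetual": [CryptoPerpetual("BTCUSDT-PERP.BINANCE")],
}
print([inst.id for inst in query_instruments(store)])
print([inst.id for inst in query_instruments(store, CurrencyPair)])
```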
Querying Bars
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Query bars by bar type identifier
bars = catalog.bars(
    bar_types=["BTCUSDT.BINANCE-1-MINUTE-LAST-EXTERNAL"],
    start="2024-01-01",
    end="2024-02-01",
)
print(f"Loaded {len(bars)} 1-minute bars")
```
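The bar type string above packs five fields into one identifier:

- the instrument ID
- the step
- the aggregation
- the price type
- the aggregation source

The sketch below splits the string from the right so that hyphens inside the instrument ID survive. parse_bar_type and BarSpec are hypothetical names; NautilusTrader has its own BarType parser. Composite bar types are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BarSpec:
    """Hypothetical container for the parts of a standard bar type string."""

    instrument_id: str
    step: int
    aggregation: str
    price_type: str
    source: str


def parse_bar_type(value: str) -> BarSpec:
    # Split from the right: the last four fields never contain hyphens,
    # but the instrument ID might (e.g. "BTC-USDT.OKX").
    instrument_id, step, aggregation, price_type, source = value.rsplit("-", 4)
    return BarSpec(instrument_id, int(step), aggregation, price_type, source)


spec = parse_bar_type("BTCUSDT.BINANCE-1-MINUTE-LAST-EXTERNAL")
print(spec)
```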
Generic Query with WHERE Clause
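```python
from nautilus_trader.model.data import TradeTick
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Use the generic query with an additional SQL filter.
# start/end are UNIX nanoseconds: 2024-01-01T00:00:00Z and 2024-01-02T00:00:00Z.
# The WHERE clause is evaluated against the stored Parquet columns, so check
# how your catalog version encodes size before comparing it to 1.0.
trades = catalog.query(
    data_cls=TradeTick,
    identifiers=["BTCUSDT.BINANCE"],
    start=1704067200000000000,
    end=1704153600000000000,
    where="size > 1.0",
)
print(f"Large trades: {len(trades)}")
```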
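The start, end and where arguments all narrow the rows returned. The sketch below shows one plausible way to combine them into a single SQL predicate. build_where is hypothetical. The column name ts_init and the inclusive bounds are assumptions, not confirmed behavior of the Rust backend.

```python
from __future__ import annotations


def build_where(
    start_ns: int | None = None,
    end_ns: int | None = None,
    where: str | None = None,
) -> str | None:
    """Hypothetical helper: AND together time bounds and a user predicate."""
    conditions: list[str] = []
    if start_ns is not None:
        # Assumption: time filtering uses the ts_init column with inclusive bounds.
        conditions.append(f"ts_init >= {start_ns}")
    if end_ns is not None:
        conditions.append(f"ts_init <= {end_ns}")
    if where:
        # Parenthesize the user clause so an OR inside it cannot escape the time bounds.
        conditions.append(f"({where})")
    return " AND ".join(conditions) if conditions else None


print(build_where(1704067200000000000, 1704153600000000000, "size > 1.0"))
```

Wrapping the caller's clause in parentheses is the important design choice. Without it, `where="a OR b"` would bind as `... AND a OR b` and leak rows from outside the time range.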
Querying Order Book Deltas (Batched)
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get order book deltas grouped into OrderBookDeltas batches
deltas_batched = catalog.order_book_deltas(
    instrument_ids=["BTCUSDT.BINANCE"],
    batched=True,
    start="2024-01-01",
    end="2024-01-02",
)
print(f"Batched delta groups: {len(deltas_batched)}")
```
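With batched=True, individual deltas are grouped into OrderBookDeltas objects. This sketch assumes a common grouping rule: a batch closes at each delta carrying the F_LAST record flag, taken here as bit value 128, which marks the final delta of an event. Delta and batch_deltas are hypothetical stand-ins, not NautilusTrader's classes.

```python
from __future__ import annotations

from dataclasses import dataclass

F_LAST = 1 << 7  # Assumption: bit value of the "last delta in event" flag.


@dataclass(frozen=True)
class Delta:
    """Hypothetical minimal order book delta."""

    ts_event: int
    flags: int = 0


def batch_deltas(deltas: list[Delta]) -> list[list[Delta]]:
    """Group consecutive deltas, closing a batch at each delta flagged F_LAST."""
    batches: list[list[Delta]] = []
    current: list[Delta] = []
    for delta in deltas:
        current.append(delta)
        if delta.flags & F_LAST:
            batches.append(current)
            current = []
    if current:
        # Trailing deltas without a terminating F_LAST still form a final batch.
        batches.append(current)
    return batches


stream = [Delta(1), Delta(1, F_LAST), Delta(2), Delta(2), Delta(2, F_LAST), Delta(3)]
print([len(b) for b in batch_deltas(stream)])  # [2, 3, 1]
```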
Related Pages