Implementation:Nautechsystems Nautilus trader ParquetDataCatalog Query


| Field | Value |
| --- | --- |
| Sources | GitHub: persistence/catalog/parquet.py, GitHub: persistence/catalog/base.py, NautilusTrader Documentation |
| Domains | Data Retrieval, Time-Series Queries, Rust Backend, PyArrow, Catalog API |
| Last Updated | 2026-02-10 12:00 GMT |

Overview

A concrete NautilusTrader API for querying typed trading data from a Parquet-backed data catalog.

Description

The ParquetDataCatalog query system provides a generic query method and a set of convenience methods (instruments, bars, trade_ticks, quote_ticks, etc.) for retrieving data from the catalog. The generic query method accepts a data class, optional instrument identifiers, optional time-range bounds, and an optional SQL WHERE clause. It automatically selects between a high-performance Rust backend (for built-in NautilusTrader data types on a local filesystem) and a PyArrow backend (for Instrument subtypes, custom data, cloud filesystems, or explicit file lists). The convenience methods, defined in BaseDataCatalog, pre-select the data class and delegate to query.
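The delegation pattern described above can be illustrated with a minimal, self-contained sketch. This is not the NautilusTrader implementation: ToyCatalog and the two dataclasses are hypothetical stand-ins, the data lives in memory rather than in Parquet files, and the inclusive start/end comparison is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TradeTick:  # hypothetical stand-in, not the NautilusTrader class
    instrument_id: str
    ts_init: int


@dataclass
class QuoteTick:  # hypothetical stand-in, not the NautilusTrader class
    instrument_id: str
    ts_init: int


class ToyCatalog:
    """In-memory toy that mimics 'convenience method delegates to query'."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def query(
        self,
        data_cls: type,
        identifiers: Optional[list] = None,
        start: Optional[int] = None,
        end: Optional[int] = None,
    ) -> list:
        out: list = []
        for row in self._rows:
            if not isinstance(row, data_cls):
                continue  # wrong data class
            if identifiers is not None and row.instrument_id not in identifiers:
                continue  # instrument filtered out
            if start is not None and row.ts_init < start:
                continue  # before the window (inclusive bound is an assumption)
            if end is not None and row.ts_init > end:
                continue  # after the window (inclusive bound is an assumption)
            out.append(row)
        return out

    # Convenience methods only pre-select the data class, then delegate.
    def trade_ticks(self, instrument_ids: Optional[list] = None, **kwargs: Any) -> list:
        return self.query(TradeTick, identifiers=instrument_ids, **kwargs)

    def quote_ticks(self, instrument_ids: Optional[list] = None, **kwargs: Any) -> list:
        return self.query(QuoteTick, identifiers=instrument_ids, **kwargs)


catalog = ToyCatalog(
    [
        TradeTick("BTCUSDT.BINANCE", 1),
        TradeTick("ETHUSDT.BINANCE", 2),
        QuoteTick("BTCUSDT.BINANCE", 3),
    ]
)
btc_trades = catalog.trade_ticks(instrument_ids=["BTCUSDT.BINANCE"])
late_quotes = catalog.quote_ticks(start=3)
```

Because every convenience method forwards its keyword arguments, time-range bounds passed to trade_ticks or quote_ticks reach query unchanged.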

Usage

Use the catalog query methods when you need to:

  • Retrieve historical market data for backtesting (trade ticks, quote ticks, bars, order book data).
  • Load instrument definitions from the catalog.
  • Filter data by instrument, time range, or custom predicates.
  • Feed data into a backtest engine via the catalog interface.

Code Reference

Source Location

| Item | Value |
| --- | --- |
| File (query method) | nautilus_trader/persistence/catalog/parquet.py |
| Lines (query) | L1551-1649 |
| File (convenience methods) | nautilus_trader/persistence/catalog/base.py |
| Lines (convenience) | L67-177 |
| Class | ParquetDataCatalog (extends BaseDataCatalog) |

Signature (Generic Query)

def query(
    self,
    data_cls: type,
    identifiers: list[str] | None = None,
    start: TimestampLike | None = None,
    end: TimestampLike | None = None,
    where: str | None = None,
    files: list[str] | None = None,
    **kwargs: Any,
) -> list[Data | CustomData]: ...

Signatures (Convenience Methods)

def instruments(
    self,
    instrument_type: type | None = None,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[Instrument]: ...

def trade_ticks(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[TradeTick]: ...

def quote_ticks(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[QuoteTick]: ...

def bars(
    self,
    bar_types: list[str] | None = None,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[Bar]: ...

def order_book_deltas(
    self,
    instrument_ids: list[str] | None = None,
    batched: bool = False,
    **kwargs: Any,
) -> list[OrderBookDelta] | list[OrderBookDeltas]: ...

def order_book_depth10(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[OrderBookDepth10]: ...

def instrument_status(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[InstrumentStatus]: ...

def funding_rates(
    self,
    instrument_ids: list[str] | None = None,
    **kwargs: Any,
) -> list[FundingRateUpdate]: ...

Import

from nautilus_trader.persistence.catalog import ParquetDataCatalog

I/O Contract

Inputs (query method)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data_cls | type | (required) | The data class to query for (e.g., TradeTick, QuoteTick, Bar, CurrencyPair). |
| identifiers | list[str] ¦ None | None | Filter by instrument IDs or bar types. If None, returns data for all instruments. |
| start | int ¦ str ¦ float ¦ None | None | Start of the query time range. Accepts nanosecond timestamps, ISO 8601 strings, or float timestamps. |
| end | int ¦ str ¦ float ¦ None | None | End of the query time range. Same format as start. |
| where | str ¦ None | None | Additional SQL WHERE clause for row-level filtering (used in Rust backend). |
| files | list[str] ¦ None | None | Explicit list of Parquet files to query from. Bypasses file discovery. Forces PyArrow backend. |
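Since start and end accept either nanosecond timestamps or ISO 8601 strings, it can help to see how the two forms relate. The sketch below uses only the Python standard library; iso_to_unix_nanos is a hypothetical helper, not a NautilusTrader function, and treating a timezone-naive string as UTC is an assumption of this example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_unix_nanos(value: str) -> int:
    """Convert an ISO 8601 date or datetime string to UNIX nanoseconds."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption for this sketch: naive inputs are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Integer arithmetic avoids the float rounding that total_seconds() can introduce.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


start_ns = iso_to_unix_nanos("2024-01-01")
end_ns = iso_to_unix_nanos("2024-01-02")
print(start_ns, end_ns)
```

A date-only string resolves to midnight at the start of that day, which matters when choosing an end bound that should cover a whole calendar day.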

Outputs

| Method | Return Type | Description |
| --- | --- | --- |
| query(data_cls) | list[Data ¦ CustomData] | Typed data objects matching the query filters. Non-Nautilus types are wrapped in CustomData. |
| instruments() | list[Instrument] | All instrument definitions (queries across all Instrument subclasses). |
| trade_ticks() | list[TradeTick] | Trade tick data for specified instruments and time range. |
| quote_ticks() | list[QuoteTick] | Quote tick data for specified instruments and time range. |
| bars() | list[Bar] | OHLCV bar data for specified bar types and time range. |
| order_book_deltas() | list[OrderBookDelta] ¦ list[OrderBookDeltas] | Order book deltas, optionally batched into grouped objects. |
| order_book_depth10() | list[OrderBookDepth10] | Depth-10 order book snapshots. |
| instrument_status() | list[InstrumentStatus] | Instrument status updates for specified instruments and time range. |
| funding_rates() | list[FundingRateUpdate] | Funding rate updates for perpetual contracts. |

Backend Selection Logic

| Condition | Backend Used | Reason |
| --- | --- | --- |
| data_cls is OrderBookDelta, QuoteTick, TradeTick, Bar, OrderBookDepth10, OrderBookDeltas, or MarkPriceUpdate and files is None | Rust (_query_rust) | Maximum performance via zero-copy deserialization |
| data_cls is Instrument subclass or custom type | PyArrow (_query_pyarrow) | Broader type support |
| files is provided | PyArrow (_query_pyarrow) | Rust backend does not support custom file lists |
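The selection rules in the table can be written out as a small decision function. This is a sketch of the documented rules only, not the library's code: choose_backend and RUST_BACKED_TYPES are hypothetical names, and the data class is identified by its name as a plain string so the example runs without NautilusTrader installed.

```python
from typing import Optional

# Data classes the table lists as served by the Rust backend.
RUST_BACKED_TYPES = frozenset(
    {
        "OrderBookDelta",
        "OrderBookDeltas",
        "OrderBookDepth10",
        "QuoteTick",
        "TradeTick",
        "Bar",
        "MarkPriceUpdate",
    }
)


def choose_backend(data_cls_name: str, files: Optional[list] = None) -> str:
    """Return 'rust' or 'pyarrow' following the documented selection table."""
    if files is not None:
        # An explicit file list always routes to PyArrow.
        return "pyarrow"
    if data_cls_name in RUST_BACKED_TYPES:
        return "rust"
    # Instrument subclasses and custom types fall back to PyArrow.
    return "pyarrow"


print(choose_backend("TradeTick"))
print(choose_backend("TradeTick", files=["part-0.parquet"]))
print(choose_backend("CurrencyPair"))
```

Checking files first mirrors the table: even a built-in type such as TradeTick goes through PyArrow once an explicit file list is supplied.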

Usage Examples

Querying Trade Ticks

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get all trade ticks for a specific instrument
trades = catalog.trade_ticks(
    instrument_ids=["BTCUSDT.BINANCE"],
)
print(f"Total trades: {len(trades)}")

Querying with Time Range

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

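# Get quote ticks for a specific time window.
# A date-only string resolves to midnight at the start of that day,
# so the window ends on 1 February to cover all of 31 January.
quotes = catalog.quote_ticks(
    instrument_ids=["ETHUSDT.BINANCE"],
    start="2024-01-01",
    end="2024-02-01",
)
print(f"Quotes in January 2024: {len(quotes)}")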

Querying Instruments

from nautilus_trader.model.instruments import CurrencyPair
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get all instruments in the catalog (across all subclasses)
instruments = catalog.instruments()
for inst in instruments:
    print(f"{inst.id} -> {type(inst).__name__}")

# Filter by specific instrument type
crypto_pairs = catalog.instruments(instrument_type=CurrencyPair)

Querying Bars

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Query bars by bar type identifier
bars = catalog.bars(
    bar_types=["BTCUSDT.BINANCE-1-MINUTE-LAST-EXTERNAL"],
    start="2024-01-01",
    end="2024-02-01",
)
print(f"Loaded {len(bars)} 1-minute bars")

Generic Query with WHERE Clause

from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.model.data import TradeTick

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Use the generic query with an additional SQL filter.
# start and end are UNIX nanosecond timestamps:
# 2024-01-01T00:00:00Z and 2024-01-02T00:00:00Z.
trades = catalog.query(
    data_cls=TradeTick,
    identifiers=["BTCUSDT.BINANCE"],
    start=1704067200000000000,
    end=1704153600000000000,
    where="size > 1.0",
)
print(f"Large trades: {len(trades)}")

Querying Order Book Deltas (Batched)

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Get order book deltas grouped into OrderBookDeltas batches
deltas_batched = catalog.order_book_deltas(
    instrument_ids=["BTCUSDT.BINANCE"],
    batched=True,
    start="2024-01-01",
    end="2024-01-02",
)
print(f"Batched delta groups: {len(deltas_batched)}")
