
Implementation:Nautechsystems Nautilus trader ParquetDataCatalog Write Data

From Leeroopedia


Field Value
Sources GitHub: persistence/catalog/parquet.py, NautilusTrader Documentation
Domains Data Persistence, Apache Parquet, Arrow Serialization, Catalog Management
Last Updated 2026-02-10 12:00 GMT

Overview

Concrete tool for writing typed trading data to a Parquet-backed data catalog provided by NautilusTrader.

Description

The write_data method of ParquetDataCatalog accepts a list of NautilusTrader Data or Event objects and persists them to the catalog's Parquet file store. The method automatically classifies objects by their type and instrument identity, sorts and groups them, validates temporal monotonicity and interval disjointness, serializes each group to an Arrow table, and writes the result as a Parquet file. The internal _write_chunk method handles individual type-identifier groups, while _objects_to_table performs the actual Arrow serialization with ordering validation. File naming uses nanosecond Unix timestamps to encode each file's time coverage, enabling efficient file-level filtering during queries.
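The classification, sorting, and file-naming behavior described above can be sketched in plain Python. This is an illustrative model only: the `Tick` stand-in and `group_and_name` helper are hypothetical, not NautilusTrader types, but the grouping key (type plus instrument identity) and the `{start_ns}-{end_ns}.parquet` naming follow the description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tick:
    """Stand-in for a NautilusTrader Data object (hypothetical)."""
    instrument_id: str
    ts_init: int  # nanosecond Unix timestamp


def group_and_name(objs: list[Tick]) -> dict[tuple[str, str], str]:
    """Group objects by (type, instrument) and derive each chunk's file name.

    Mirrors the described write_data behavior: each type-identifier group
    is ordered by ts_init, and the file name encodes its time coverage.
    """
    groups: dict[tuple[str, str], list[Tick]] = defaultdict(list)
    for obj in objs:
        groups[(type(obj).__name__, obj.instrument_id)].append(obj)

    names = {}
    for key, chunk in groups.items():
        chunk.sort(key=lambda o: o.ts_init)
        names[key] = f"{chunk[0].ts_init}-{chunk[-1].ts_init}.parquet"
    return names
```

Because the file name records the covered range, a query can discard whole files whose range falls outside the requested window before reading any Parquet data.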

Usage

Call write_data when you need to:

  • Persist market data loaded from external sources (Databento, Tardis, etc.) to a catalog.
  • Write instrument definitions for future retrieval during backtesting.
  • Incrementally add new time-range chunks of data to an existing catalog.
  • Build a complete historical data store for backtesting.

Code Reference

Source Location

Item Value
File nautilus_trader/persistence/catalog/parquet.py
Lines L241-385
Method ParquetDataCatalog.write_data
Internal Methods _write_chunk (L322-361), _objects_to_table (L363-385)

Signature

def write_data(
    self,
    data: list[Data | Event] | list[NautilusRustDataType],
    start: int | None = None,
    end: int | None = None,
    skip_disjoint_check: bool = False,
) -> None: ...

Import

from nautilus_trader.persistence.catalog import ParquetDataCatalog

I/O Contract

Inputs

Parameter Type Default Description
data list[Data | Event] (required) List of typed data or event objects to persist. Must be non-empty. All objects of the same type must have monotonically non-decreasing ts_init values.
start int | None None Override start timestamp (nanoseconds) for the file name. Defaults to the first object's ts_init.
end int | None None Override end timestamp (nanoseconds) for the file name. Defaults to the last object's ts_init.
skip_disjoint_check bool False If True, skip validation that new file intervals do not overlap existing files.
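The monotonicity requirement on ts_init amounts to a simple predicate, sketched below. The helper name is ours; the real validation happens inside _objects_to_table during serialization.

```python
def is_monotonic_non_decreasing(ts_inits: list[int]) -> bool:
    """True if every ts_init is >= its predecessor, as write_data requires.

    An empty or single-element sequence is trivially valid.
    """
    return all(a <= b for a, b in zip(ts_inits, ts_inits[1:]))
```

Equal consecutive timestamps are allowed (non-decreasing, not strictly increasing), which matters for data feeds that emit multiple events in the same nanosecond.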

Outputs

Output Type Description
Return value None Data is written to disk as Parquet files. No return value.

Side Effects

Effect Description
Directory creation Creates subdirectories under the catalog root for each data type and identifier (e.g., {root}/QuoteTick/BTCUSDT.BINANCE/).
Parquet file creation Writes one Parquet file per type-identifier group, named {start_ns}-{end_ns}.parquet.
Existing file skip If a file with the same name already exists, the write is skipped with a console message.
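Putting the directory layout and file naming together, the target path for one chunk can be modeled as below. The helper is illustrative (not the catalog's internal API) and assumes a POSIX-style catalog root.

```python
from pathlib import PurePosixPath


def target_file(
    root: str,
    type_name: str,
    instrument_id: str,
    start_ns: int,
    end_ns: int,
) -> str:
    """Build the catalog path for one type-identifier chunk.

    Layout: {root}/{type}/{instrument_id}/{start_ns}-{end_ns}.parquet
    """
    return str(
        PurePosixPath(root)
        / type_name
        / instrument_id
        / f"{start_ns}-{end_ns}.parquet"
    )
```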

Errors

Exception Condition
ValueError Data of the same type is not monotonically non-decreasing by ts_init.
ValueError Writing would create overlapping time intervals with existing files (unless skip_disjoint_check is True).
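The disjointness rule can be understood as an interval-overlap test against the ranges encoded in existing file names. This is a conceptual sketch with a hypothetical helper; the actual check reads intervals from the catalog's files.

```python
def overlaps_existing(
    start: int,
    end: int,
    existing: list[tuple[int, int]],
) -> bool:
    """True if [start, end] intersects any existing file interval.

    Intervals are inclusive nanosecond ranges, as encoded in
    {start_ns}-{end_ns}.parquet file names.
    """
    return any(start <= e and s <= end for s, e in existing)
```

With skip_disjoint_check=True this validation is bypassed, so the caller becomes responsible for guaranteeing non-overlapping writes.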

Usage Examples

Writing Trade Ticks to the Catalog

from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader

# Initialize catalog
catalog = ParquetDataCatalog(path="/data/my_catalog")

# Load data from external source
loader = DatabentoDataLoader()
trades = loader.from_dbn_file(
    path="data/AAPL_trades.dbn.zst",
    as_legacy_cython=True,
)

# Write to catalog
catalog.write_data(trades)
# Creates: /data/my_catalog/TradeTick/AAPL.GLBX/{start}-{end}.parquet

Writing Instrument Definitions

from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.test_kit.providers import TestInstrumentProvider

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Create instrument definitions
instruments = [
    TestInstrumentProvider.btcusdt_binance(),
    TestInstrumentProvider.ethusdt_binance(),
]

# Write instruments to catalog
catalog.write_data(instruments)
# Creates: /data/my_catalog/CurrencyPair/BTCUSDT.BINANCE/0-0.parquet
#          /data/my_catalog/CurrencyPair/ETHUSDT.BINANCE/0-0.parquet

Writing Mixed Data Types

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Assume quotes and trades are previously loaded lists of QuoteTick and
# TradeTick objects; write_data sorts and groups them automatically
mixed_data = quotes + trades

catalog.write_data(mixed_data)
# Automatically creates separate directories:
#   /data/my_catalog/QuoteTick/{instrument_id}/...
#   /data/my_catalog/TradeTick/{instrument_id}/...

Writing with Explicit Time Range

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Specify explicit start/end for the file name
catalog.write_data(
    data=bars,
    start=1704067200000000000,   # 2024-01-01 00:00:00 UTC in nanoseconds
    end=1704153600000000000,     # 2024-01-02 00:00:00 UTC in nanoseconds
)
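Nanosecond timestamps like those above can be derived from an aware UTC datetime using only the standard library. The helper below is a small stdlib sketch (whole seconds only, sub-second precision is truncated); it is not claimed to be part of the NautilusTrader API.

```python
from datetime import datetime, timezone


def to_unix_nanos(dt: datetime) -> int:
    """Convert an aware UTC datetime to nanoseconds since the Unix epoch.

    Truncates to whole seconds before scaling, avoiding float
    precision loss at nanosecond magnitudes.
    """
    return int(dt.timestamp()) * 1_000_000_000


start_ns = to_unix_nanos(datetime(2024, 1, 1, tzinfo=timezone.utc))
end_ns = to_unix_nanos(datetime(2024, 1, 2, tzinfo=timezone.utc))
```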

Skipping the Disjoint Check

from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Skip interval validation when you guarantee non-overlapping data
catalog.write_data(
    data=trades,
    skip_disjoint_check=True,
)
