Overview
Reference for `write_data`, the `ParquetDataCatalog` method that persists typed trading data to the Parquet-backed data catalog provided by NautilusTrader.
Description
The `write_data` method of `ParquetDataCatalog` accepts a list of NautilusTrader `Data` or `Event` objects and persists them to the catalog's Parquet file store. The method automatically classifies objects by their type and instrument identity, sorts and groups them, validates temporal monotonicity and interval disjointness, serializes each group to an Arrow table, and writes the result as a Parquet file. The internal `_write_chunk` method handles individual type-identifier groups, while `_objects_to_table` performs the actual Arrow serialization with ordering validation. File naming uses nanosecond Unix timestamps to encode each file's time coverage, enabling efficient file-level filtering during queries.
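The grouping and file-naming behavior described above can be modeled in plain Python. This is a hedged sketch, not the library's implementation: the `Obj` tuple and the `group_for_write` and `file_name` helpers are illustrative stand-ins for NautilusTrader's internal logic, shown only to make the "sort, group by type and instrument, name by time coverage" flow concrete.

```python
# Illustrative sketch of write_data's grouping and file-naming logic.
# Obj, group_for_write, and file_name are NOT NautilusTrader APIs.
from collections import defaultdict
from typing import NamedTuple


class Obj(NamedTuple):
    type_name: str      # e.g. "QuoteTick"
    instrument_id: str  # e.g. "BTCUSDT.BINANCE"
    ts_init: int        # nanosecond Unix timestamp


def group_for_write(objs: list[Obj]) -> dict[str, list[Obj]]:
    """Sort by ts_init, then group by type and instrument identity."""
    groups: dict[str, list[Obj]] = defaultdict(list)
    for obj in sorted(objs, key=lambda o: o.ts_init):
        groups[f"{obj.type_name}/{obj.instrument_id}"].append(obj)
    return dict(groups)


def file_name(group: list[Obj]) -> str:
    """File name encodes the group's time coverage in nanoseconds."""
    return f"{group[0].ts_init}-{group[-1].ts_init}.parquet"


data = [
    Obj("TradeTick", "AAPL.XNAS", 200),
    Obj("QuoteTick", "AAPL.XNAS", 100),
    Obj("QuoteTick", "AAPL.XNAS", 300),
]
groups = group_for_write(data)
print(sorted(groups))                            # ['QuoteTick/AAPL.XNAS', 'TradeTick/AAPL.XNAS']
print(file_name(groups["QuoteTick/AAPL.XNAS"]))  # 100-300.parquet
```

A file named `100-300.parquet` tells the query engine the file covers nanoseconds 100 through 300, so files outside a requested range can be skipped without being opened.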
Usage
Call `write_data` when you need to:
- Persist market data loaded from external sources (Databento, Tardis, etc.) to a catalog.
- Write instrument definitions for future retrieval during backtesting.
- Incrementally add new time-range chunks of data to an existing catalog.
- Build a complete historical data store for backtesting.
Code Reference
Source Location
| Item | Value |
|------|-------|
| File | `nautilus_trader/persistence/catalog/parquet.py` |
| Lines | L241-385 |
| Method | `ParquetDataCatalog.write_data` |
| Internal Methods | `_write_chunk` (L322-361), `_objects_to_table` (L363-385) |
Signature
```python
def write_data(
    self,
    data: list[Data | Event] | list[NautilusRustDataType],
    start: int | None = None,
    end: int | None = None,
    skip_disjoint_check: bool = False,
) -> None: ...
```
Import
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
```
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `data` | `list[Data \| Event]` | (required) | List of typed data or event objects to persist. Must be non-empty. All objects of the same type must have monotonically non-decreasing `ts_init` values. |
| `start` | `int \| None` | `None` | Override start timestamp (nanoseconds) for the file name. Defaults to the first object's `ts_init`. |
| `end` | `int \| None` | `None` | Override end timestamp (nanoseconds) for the file name. Defaults to the last object's `ts_init`. |
| `skip_disjoint_check` | `bool` | `False` | If `True`, skip validation that new file intervals do not overlap existing files. |
Outputs
| Output | Type | Description |
|--------|------|-------------|
| Return value | `None` | Data is written to disk as Parquet files; no value is returned. |
Side Effects
| Effect | Description |
|--------|-------------|
| Directory creation | Creates subdirectories under the catalog root for each data type and identifier (e.g., `{root}/QuoteTick/BTCUSDT.BINANCE/`). |
| Parquet file creation | Writes one Parquet file per type-identifier group, named `{start_ns}-{end_ns}.parquet`. |
| Existing file skip | If a file with the same name already exists, the write is skipped with a console message. |
Errors
| Exception | Condition |
|-----------|-----------|
| `ValueError` | Data of the same type is not monotonically non-decreasing by `ts_init`. |
| `ValueError` | Writing would create overlapping time intervals with existing files (unless `skip_disjoint_check` is `True`). |
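The two validations behind these errors can be modeled in plain Python. This is a hedged sketch of the checks as described above, not NautilusTrader's code: `check_monotonic` and `check_disjoint` are illustrative helper names, and the overlap test treats file intervals as closed `[start, end]` ranges.

```python
# Illustrative sketches of the ordering and interval checks;
# these helper names are NOT NautilusTrader APIs.
def check_monotonic(ts_inits: list[int]) -> None:
    """Raise ValueError if ts_init values are not non-decreasing."""
    for prev, cur in zip(ts_inits, ts_inits[1:]):
        if cur < prev:
            raise ValueError("Data must be monotonically non-decreasing by ts_init")


def check_disjoint(new: tuple[int, int], existing: list[tuple[int, int]]) -> None:
    """Raise ValueError if the new [start, end] interval overlaps any existing file."""
    for start, end in existing:
        if new[0] <= end and start <= new[1]:
            raise ValueError(f"Interval {new} overlaps existing file {start}-{end}")


check_monotonic([100, 100, 300])          # OK: equal timestamps are allowed
check_disjoint((400, 500), [(100, 300)])  # OK: disjoint from the existing file

try:
    check_disjoint((250, 500), [(100, 300)])
except ValueError as e:
    print(e)  # Interval (250, 500) overlaps existing file 100-300
```

Note that equal `ts_init` values pass the monotonicity check, which matters for data sources that emit multiple events with identical timestamps.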
Usage Examples
Writing Trade Ticks to the Catalog
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader

# Initialize catalog
catalog = ParquetDataCatalog(path="/data/my_catalog")

# Load data from external source
loader = DatabentoDataLoader()
trades = loader.from_dbn_file(
    path="data/AAPL_trades.dbn.zst",
    as_legacy_cython=True,
)

# Write to catalog
catalog.write_data(trades)
# Creates: /data/my_catalog/TradeTick/AAPL.GLBX/{start}-{end}.parquet
```
Writing Instrument Definitions
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.test_kit.providers import TestInstrumentProvider

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Create instrument definitions
instruments = [
    TestInstrumentProvider.btcusdt_binance(),
    TestInstrumentProvider.ethusdt_binance(),
]

# Write instruments to catalog
catalog.write_data(instruments)
# Creates: /data/my_catalog/CurrencyPair/BTCUSDT.BINANCE/0-0.parquet
#          /data/my_catalog/CurrencyPair/ETHUSDT.BINANCE/0-0.parquet
```
Writing Mixed Data Types
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# A mixed list of quotes and trades (automatically sorted and grouped);
# assumes `quotes` and `trades` were loaded earlier
mixed_data = quotes + trades  # QuoteTick and TradeTick objects
catalog.write_data(mixed_data)
# Automatically creates separate directories:
# /data/my_catalog/QuoteTick/{instrument_id}/...
# /data/my_catalog/TradeTick/{instrument_id}/...
```
Writing with Explicit Time Range
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Specify explicit start/end for the file name
catalog.write_data(
    data=bars,
    start=1704067200000000000,  # 2024-01-01 00:00:00 UTC in nanoseconds
    end=1704153600000000000,    # 2024-01-02 00:00:00 UTC in nanoseconds
)
```
Skipping the Disjoint Check
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Skip interval validation when you guarantee non-overlapping data
catalog.write_data(
    data=trades,
    skip_disjoint_check=True,
)
```
Related Pages