Overview
Reference for `write_data`, the `ParquetDataCatalog` method that persists typed trading data to the Parquet-backed data catalog provided by NautilusTrader.
Description
The `write_data` method of `ParquetDataCatalog` accepts a list of NautilusTrader `Data` or `Event` objects and persists them to the catalog's Parquet file store. The method automatically classifies objects by their type and instrument identity, sorts and groups them, validates temporal monotonicity and interval disjointness, serializes each group to an Arrow table, and writes the result as a Parquet file. The internal `_write_chunk` method handles individual type-identifier groups, while `_objects_to_table` performs the actual Arrow serialization with ordering validation. File naming uses nanosecond Unix timestamps to encode each file's time coverage, enabling efficient file-level filtering during queries.
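The grouping and file-naming behavior described above can be modeled in plain Python. This is a hedged sketch, not the library's implementation: the `Obj` tuple and the `group_for_write` and `file_name` helpers are illustrative stand-ins for NautilusTrader's internal logic, shown only to make the "sort, group by type and instrument, name by time coverage" flow concrete.

```python
# Illustrative sketch of write_data's grouping and file-naming logic.
# Obj, group_for_write, and file_name are NOT NautilusTrader APIs.
from collections import defaultdict
from typing import NamedTuple


class Obj(NamedTuple):
    type_name: str      # e.g. "QuoteTick"
    instrument_id: str  # e.g. "BTCUSDT.BINANCE"
    ts_init: int        # nanosecond Unix timestamp


def group_for_write(objs: list[Obj]) -> dict[str, list[Obj]]:
    """Sort by ts_init, then group by type and instrument identity."""
    groups: dict[str, list[Obj]] = defaultdict(list)
    for obj in sorted(objs, key=lambda o: o.ts_init):
        groups[f"{obj.type_name}/{obj.instrument_id}"].append(obj)
    return dict(groups)


def file_name(group: list[Obj]) -> str:
    """File name encodes the group's time coverage in nanoseconds."""
    return f"{group[0].ts_init}-{group[-1].ts_init}.parquet"


data = [
    Obj("TradeTick", "AAPL.XNAS", 200),
    Obj("QuoteTick", "AAPL.XNAS", 100),
    Obj("QuoteTick", "AAPL.XNAS", 300),
]
groups = group_for_write(data)
print(sorted(groups))                            # ['QuoteTick/AAPL.XNAS', 'TradeTick/AAPL.XNAS']
print(file_name(groups["QuoteTick/AAPL.XNAS"]))  # 100-300.parquet
```

A file named `100-300.parquet` tells the query engine the file covers nanoseconds 100 through 300, so files outside a requested range can be skipped without being opened.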
Usage
Call `write_data` when you need to:
- Persist market data loaded from external sources (Databento, Tardis, etc.) to a catalog.
- Write instrument definitions for future retrieval during backtesting.
- Incrementally add new time-range chunks of data to an existing catalog.
- Build a complete historical data store for backtesting.
Code Reference
Source Location
| Item | Value |
|------|-------|
| File | `nautilus_trader/persistence/catalog/parquet.py` |
| Lines | L241-385 |
| Method | `ParquetDataCatalog.write_data` |
| Internal Methods | `_write_chunk` (L322-361), `_objects_to_table` (L363-385) |
Signature
```python
def write_data(
    self,
    data: list[Data | Event] | list[NautilusRustDataType],
    start: int | None = None,
    end: int | None = None,
    skip_disjoint_check: bool = False,
) -> None: ...
```
Import
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
```
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `data` | `list[Data \| Event]` | (required) | List of typed data or event objects to persist. Must be non-empty. All objects of the same type must have monotonically non-decreasing `ts_init` values. |
| `start` | `int \| None` | `None` | Override start timestamp (nanoseconds) for the file name. Defaults to the first object's `ts_init`. |
| `end` | `int \| None` | `None` | Override end timestamp (nanoseconds) for the file name. Defaults to the last object's `ts_init`. |
| `skip_disjoint_check` | `bool` | `False` | If `True`, skip validation that new file intervals do not overlap existing files. |
Outputs
| Output | Type | Description |
|--------|------|-------------|
| Return value | `None` | Data is written to disk as Parquet files; no value is returned. |
Side Effects
| Effect | Description |
|--------|-------------|
| Directory creation | Creates subdirectories under the catalog root for each data type and identifier (e.g., `{root}/QuoteTick/BTCUSDT.BINANCE/`). |
| Parquet file creation | Writes one Parquet file per type-identifier group, named `{start_ns}-{end_ns}.parquet`. |
| Existing file skip | If a file with the same name already exists, the write is skipped with a console message. |
Errors
| Exception | Condition |
|-----------|-----------|
| `ValueError` | Data of the same type is not monotonically non-decreasing by `ts_init`. |
| `ValueError` | Writing would create overlapping time intervals with existing files (unless `skip_disjoint_check` is `True`). |
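The two validations behind these errors can be modeled in plain Python. This is a hedged sketch of the checks as described above, not NautilusTrader's code: `check_monotonic` and `check_disjoint` are illustrative helper names, and the overlap test treats file intervals as closed `[start, end]` ranges.

```python
# Illustrative sketches of the ordering and interval checks;
# these helper names are NOT NautilusTrader APIs.
def check_monotonic(ts_inits: list[int]) -> None:
    """Raise ValueError if ts_init values are not non-decreasing."""
    for prev, cur in zip(ts_inits, ts_inits[1:]):
        if cur < prev:
            raise ValueError("Data must be monotonically non-decreasing by ts_init")


def check_disjoint(new: tuple[int, int], existing: list[tuple[int, int]]) -> None:
    """Raise ValueError if the new [start, end] interval overlaps any existing file."""
    for start, end in existing:
        if new[0] <= end and start <= new[1]:
            raise ValueError(f"Interval {new} overlaps existing file {start}-{end}")


check_monotonic([100, 100, 300])          # OK: equal timestamps are allowed
check_disjoint((400, 500), [(100, 300)])  # OK: disjoint from the existing file

try:
    check_disjoint((250, 500), [(100, 300)])
except ValueError as e:
    print(e)  # Interval (250, 500) overlaps existing file 100-300
```

Note that equal `ts_init` values pass the monotonicity check, which matters for data sources that emit multiple events with identical timestamps.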
Usage Examples
Writing Trade Ticks to the Catalog
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader

# Initialize catalog
catalog = ParquetDataCatalog(path="/data/my_catalog")

# Load data from external source
loader = DatabentoDataLoader()
trades = loader.from_dbn_file(
    path="data/AAPL_trades.dbn.zst",
    as_legacy_cython=True,
)

# Write to catalog
catalog.write_data(trades)
# Creates: /data/my_catalog/TradeTick/AAPL.GLBX/{start}-{end}.parquet
```
Writing Instrument Definitions
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.test_kit.providers import TestInstrumentProvider

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Create instrument definitions
instruments = [
    TestInstrumentProvider.btcusdt_binance(),
    TestInstrumentProvider.ethusdt_binance(),
]

# Write instruments to catalog
catalog.write_data(instruments)
# Creates: /data/my_catalog/CurrencyPair/BTCUSDT.BINANCE/0-0.parquet
#          /data/my_catalog/CurrencyPair/ETHUSDT.BINANCE/0-0.parquet
```
Writing Mixed Data Types
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# A mixed list of quotes and trades (automatically sorted and grouped);
# assumes `quotes` and `trades` were loaded earlier
mixed_data = quotes + trades  # QuoteTick and TradeTick objects
catalog.write_data(mixed_data)
# Automatically creates separate directories:
# /data/my_catalog/QuoteTick/{instrument_id}/...
# /data/my_catalog/TradeTick/{instrument_id}/...
```
Writing with Explicit Time Range
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Specify explicit start/end for the file name
catalog.write_data(
    data=bars,
    start=1704067200000000000,  # 2024-01-01 00:00:00 UTC in nanoseconds
    end=1704153600000000000,    # 2024-01-02 00:00:00 UTC in nanoseconds
)
```
Skipping the Disjoint Check
```python
from nautilus_trader.persistence.catalog import ParquetDataCatalog

catalog = ParquetDataCatalog(path="/data/my_catalog")

# Skip interval validation when you guarantee non-overlapping data
catalog.write_data(
    data=trades,
    skip_disjoint_check=True,
)
```
Related Pages