Implementation:Nautechsystems Nautilus trader ParquetDataCatalog Init

From Leeroopedia


Field Value
Sources GitHub: persistence/catalog/parquet.py, NautilusTrader Documentation
Domains Data Storage, Apache Parquet, Catalog Management, Filesystem Abstraction
Last Updated 2026-02-10 12:00 GMT

Overview

Concrete tool, provided by NautilusTrader, for initializing a Parquet-backed data catalog.

Description

The ParquetDataCatalog class provides a queryable data catalog that persists trading data as Apache Parquet files, using Apache Arrow for serialization. Initialization configures the root storage path, the filesystem protocol (local, S3, GCS, Azure, or in-memory), the Arrow serializer, and the maximum Parquet row group size. The class extends BaseDataCatalog (a singleton abstract base class) and uses fsspec for filesystem abstraction, so the same API works across storage backends. The catalog also supports a Rust-backed query engine for high-performance retrieval of built-in NautilusTrader data types.

Usage

Import and instantiate ParquetDataCatalog when you need to:

  • Create a new data catalog directory for persisting market data.
  • Open an existing catalog for querying or appending data.
  • Configure cloud storage backends (for example S3 or GCS) for distributed data access.
  • Set up an in-memory catalog for unit testing.

Code Reference

Source Location

Item Value
File nautilus_trader/persistence/catalog/parquet.py
Lines L92-164
Class ParquetDataCatalog
Parent Class BaseDataCatalog (from nautilus_trader/persistence/catalog/base.py)

Signature

class ParquetDataCatalog(BaseDataCatalog):
    def __init__(
        self,
        path: PathLike[str] | str,
        fs_protocol: str | None = "file",
        fs_storage_options: dict | None = None,
        fs_rust_storage_options: dict | None = None,
        max_rows_per_group: int = 5_000,
        show_query_paths: bool = False,
    ) -> None: ...

    @classmethod
    def from_env(cls) -> ParquetDataCatalog: ...

    @classmethod
    def from_uri(
        cls,
        uri: str,
        fs_storage_options: dict[str, str] | None = None,
        fs_rust_storage_options: dict[str, str] | None = None,
    ) -> ParquetDataCatalog: ...

Import

from nautilus_trader.persistence.catalog import ParquetDataCatalog

I/O Contract

Inputs

Parameter Type Default Description
path PathLike[str] | str (required) Root path for the catalog. Must be an absolute path for local filesystem.
fs_protocol str | None "file" Filesystem protocol for fsspec: "file" (local), "s3" (AWS S3), "gcs" (Google Cloud), "memory" (in-memory).
fs_storage_options dict | None None Provider-specific storage options (credentials, endpoint URLs, etc.).
fs_rust_storage_options dict | None None Storage options specifically for the Rust backend. Defaults to fs_storage_options if not specified.
max_rows_per_group int 5000 Maximum number of rows per Parquet row group. Controls write batching and query granularity.
show_query_paths bool False If True, print globbed query file paths to stdout for debugging.
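
The fallback described for fs_rust_storage_options can be pictured with a small standalone sketch. The helper name resolve_storage_options is hypothetical and is not part of NautilusTrader; it only mirrors the behavior stated in the table above (the Rust options fall back to fs_storage_options when not given) and should not be read as the library's actual code.

```python
from __future__ import annotations


def resolve_storage_options(
    fs_storage_options: dict | None = None,
    fs_rust_storage_options: dict | None = None,
) -> tuple[dict, dict]:
    """Hypothetical helper: return (fsspec options, Rust backend options)."""
    # A missing fsspec options dict is treated as empty.
    py_options = dict(fs_storage_options or {})
    # Assumption taken from the table: the Rust backend reuses the fsspec
    # options unless its own options are passed explicitly.
    if fs_rust_storage_options is None:
        rust_options = dict(py_options)
    else:
        rust_options = dict(fs_rust_storage_options)
    return py_options, rust_options


py_opts, rust_opts = resolve_storage_options({"endpoint_url": "http://localhost:9000"})
print(py_opts == rust_opts)
```

Returning copies keeps later changes to one dict from leaking into the other, which is a design choice of this sketch rather than a documented guarantee.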

Outputs

Output Type Description
Return value ParquetDataCatalog Initialized catalog instance with configured filesystem, serializer, and path.

Key Instance Attributes

Attribute Type Description
path str Normalized absolute path to the catalog root directory.
fs_protocol str The resolved filesystem protocol string.
fs fsspec.AbstractFileSystem The initialized fsspec filesystem instance.
serializer ArrowSerializer Serializer for converting NautilusTrader objects to/from Arrow tables.
max_rows_per_group int Configured Parquet row group size limit.

Usage Examples

Basic Local Catalog

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Initialize a local catalog
catalog = ParquetDataCatalog(path="/data/nautilus_catalog")

print(catalog.path)          # /data/nautilus_catalog
print(catalog.fs_protocol)   # file

Catalog from URI

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Create from a local URI
catalog = ParquetDataCatalog.from_uri("file:///data/nautilus_catalog")

# Create from an S3 URI with credentials
# (the key and secret values below are placeholders; substitute real credentials)
catalog = ParquetDataCatalog.from_uri(
    uri="s3://my-bucket/nautilus_catalog",
    fs_storage_options={
        "key": "AWS_ACCESS_KEY_ID",
        "secret": "AWS_SECRET_ACCESS_KEY",
        "endpoint_url": "https://s3.amazonaws.com",
    },
)
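
To make the relationship between a URI and the constructor arguments concrete, here is a minimal sketch of splitting a URI into a protocol and a path. The function split_catalog_uri is a hypothetical illustration built on the standard library; it is an assumption about the general idea, not the parsing that from_uri actually performs.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_catalog_uri(uri: str) -> tuple[str, str]:
    """Hypothetical helper: split a catalog URI into (fs_protocol, path)."""
    # Assumption of this sketch: a bare path without a scheme is local.
    if "://" not in uri:
        return "file", uri
    parts = urlsplit(uri)
    if parts.scheme == "file":
        # file:///data/x has an empty netloc, so the path is already complete.
        return "file", parts.path
    # For object stores, the bucket is the netloc and the key prefix is the path.
    return parts.scheme, f"{parts.netloc}{parts.path}"


print(split_catalog_uri("file:///data/nautilus_catalog"))
print(split_catalog_uri("s3://my-bucket/nautilus_catalog"))
```

Under these assumptions, the two URIs from the example above map to ("file", "/data/nautilus_catalog") and ("s3", "my-bucket/nautilus_catalog"), which correspond to the fs_protocol and path arguments of the constructor.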

Catalog from Environment Variable

import os
from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Set the NAUTILUS_PATH environment variable (normally done in the shell; set here for illustration)
os.environ["NAUTILUS_PATH"] = "/home/user/.nautilus"

# Catalog will be created at /home/user/.nautilus/catalog
catalog = ParquetDataCatalog.from_env()
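
The path resolution in the example above can be sketched without the library. The helper catalog_path_from_env is hypothetical; it assumes only what the comment above states, namely that the catalog lives in a "catalog" subdirectory of NAUTILUS_PATH. The error raised for a missing variable is a choice of this sketch, not documented behavior.

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath
from typing import Mapping


def catalog_path_from_env(env: Mapping[str, str] | None = None) -> str:
    """Hypothetical helper: derive the catalog path from NAUTILUS_PATH."""
    # Accept an explicit mapping so the function can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    root = env.get("NAUTILUS_PATH")
    if not root:
        raise KeyError("NAUTILUS_PATH is not set")
    # Assumption from the example: the catalog is the "catalog" subdirectory.
    return str(PurePosixPath(root) / "catalog")


print(catalog_path_from_env({"NAUTILUS_PATH": "/home/user/.nautilus"}))
```

PurePosixPath is used so the result is the same on every platform; a real local catalog on Windows would use native path handling instead.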

In-Memory Catalog for Testing

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Use an in-memory filesystem for unit tests
catalog = ParquetDataCatalog(
    path="/test_catalog",
    fs_protocol="memory",
)

Catalog with Custom Row Group Size

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Use larger row groups for bulk historical data
catalog = ParquetDataCatalog(
    path="/data/bulk_catalog",
    max_rows_per_group=50_000,
)
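
As a rough guide to what max_rows_per_group means in practice, the arithmetic below estimates how many row groups a single write would produce. The function estimate_row_groups is a hypothetical helper, and it assumes the writer fills every group up to the cap before starting the next one, which may not hold for every write path.

```python
def estimate_row_groups(total_rows: int, max_rows_per_group: int = 5_000) -> int:
    """Hypothetical helper: ceil(total_rows / max_rows_per_group)."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if max_rows_per_group <= 0:
        raise ValueError("max_rows_per_group must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + max_rows_per_group - 1) // max_rows_per_group


# One million rows with the default cap versus the larger cap used above.
print(estimate_row_groups(1_000_000))          # 200
print(estimate_row_groups(1_000_000, 50_000))  # 20
```

Fewer, larger row groups reduce per-group metadata overhead, while smaller groups let a reader skip more data when only a narrow range is queried; the right value depends on the workload.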
