Implementation:Nautechsystems Nautilus trader ParquetDataCatalog Init

Field	Value
Sources	GitHub: persistence/catalog/parquet.py, NautilusTrader Documentation
Domains	Data Storage, Apache Parquet, Catalog Management, Filesystem Abstraction
Last Updated	2026-02-10 12:00 GMT

Overview

ParquetDataCatalog initialization is the concrete NautilusTrader API for creating or opening a Parquet-backed data catalog.

Description

The ParquetDataCatalog class provides a queryable data catalog that persists trading data as Apache Parquet files, using Apache Arrow for serialization. Initialization configures the root storage path, the filesystem protocol (local, S3, GCS, Azure, or in-memory), the storage options passed to that filesystem, the Arrow serializer, and the Parquet row group size. The class extends BaseDataCatalog (a singleton abstract base class) and uses fsspec for filesystem abstraction, so the same API works across storage backends. For built-in NautilusTrader data types, the catalog can also query through a Rust-backed engine for higher performance.

Usage

Import and instantiate ParquetDataCatalog when you need to:

Create a new data catalog directory for persisting market data.
Open an existing catalog for querying or appending data.
Configure cloud storage backends (S3, GCS) for distributed data access.
Set up an in-memory catalog for unit testing.

Code Reference

Source Location

Item	Value
File	`nautilus_trader/persistence/catalog/parquet.py`
Lines	L92-164
Class	`ParquetDataCatalog`
Parent Class	`BaseDataCatalog` (from `nautilus_trader/persistence/catalog/base.py`)

Signature

class ParquetDataCatalog(BaseDataCatalog):
    def __init__(
        self,
        path: PathLike[str] | str,
        fs_protocol: str | None = "file",
        fs_storage_options: dict | None = None,
        fs_rust_storage_options: dict | None = None,
        max_rows_per_group: int = 5_000,
        show_query_paths: bool = False,
    ) -> None: ...

    @classmethod
    def from_env(cls) -> ParquetDataCatalog: ...

    @classmethod
    def from_uri(
        cls,
        uri: str,
        fs_storage_options: dict[str, str] | None = None,
        fs_rust_storage_options: dict[str, str] | None = None,
    ) -> ParquetDataCatalog: ...

Import

from nautilus_trader.persistence.catalog import ParquetDataCatalog

I/O Contract

Inputs

Parameter	Type	Default	Description
`path`	`PathLike[str] | str`	(required)	Root path for the catalog. Must be an absolute path for local filesystem.
`fs_protocol`	`str | None`	`"file"`	Filesystem protocol for fsspec: "file" (local), "s3" (AWS S3), "gcs" (Google Cloud), "memory" (in-memory).
`fs_storage_options`	`dict | None`	`None`	Provider-specific storage options (credentials, endpoint URLs, etc.).
`fs_rust_storage_options`	`dict | None`	`None`	Storage options specifically for the Rust backend. Defaults to fs_storage_options if not specified.
`max_rows_per_group`	`int`	`5000`	Maximum number of rows per Parquet row group. Controls write batching and query granularity.
`show_query_paths`	`bool`	`False`	If True, print globbed query file paths to stdout for debugging.
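
To make the effect of max_rows_per_group concrete, the sketch below computes the smallest number of row groups a given number of rows would occupy. It uses only the standard library and does not call NautilusTrader. The helper name min_row_groups is hypothetical, and an actual Parquet writer may split data into more groups than this lower bound.

```python
def min_row_groups(total_rows: int, max_rows_per_group: int = 5_000) -> int:
    """Return the smallest number of row groups that can hold total_rows.

    Hypothetical helper for illustration only; not part of NautilusTrader.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if max_rows_per_group <= 0:
        raise ValueError("max_rows_per_group must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + max_rows_per_group - 1) // max_rows_per_group


# One million rows with the default limit, then with a larger limit.
default_groups = min_row_groups(1_000_000)
bulk_groups = min_row_groups(1_000_000, max_rows_per_group=50_000)
print(default_groups, bulk_groups)
```

As a general property of Parquet, fewer and larger row groups tend to reduce per-group overhead on bulk scans, while smaller groups give a reader finer units to skip when a query touches a narrow range.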

Outputs

Output	Type	Description
Return value	`ParquetDataCatalog`	Initialized catalog instance with configured filesystem, serializer, and path.

Key Instance Attributes

Attribute	Type	Description
`path`	`str`	Normalized absolute path to the catalog root directory.
`fs_protocol`	`str`	The resolved filesystem protocol string.
`fs`	`fsspec.AbstractFileSystem`	The initialized fsspec filesystem instance.
`serializer`	`ArrowSerializer`	Serializer for converting NautilusTrader objects to/from Arrow tables.
`max_rows_per_group`	`int`	Configured Parquet row group size limit.

Usage Examples

Basic Local Catalog

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Initialize a local catalog
catalog = ParquetDataCatalog(path="/data/nautilus_catalog")

print(catalog.path)          # /data/nautilus_catalog
print(catalog.fs_protocol)   # file

Catalog from URI

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Create from a local URI
catalog = ParquetDataCatalog.from_uri("file:///data/nautilus_catalog")

# Create from an S3 URI with credentials.
# The key and secret values below are placeholders; substitute real credentials.
catalog = ParquetDataCatalog.from_uri(
    uri="s3://my-bucket/nautilus_catalog",
    fs_storage_options={
        "key": "AWS_ACCESS_KEY_ID",
        "secret": "AWS_SECRET_ACCESS_KEY",
        "endpoint_url": "https://s3.amazonaws.com",
    },
)
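
from_uri takes a single URI instead of a separate path and protocol. As a rough illustration of how a URI carries both pieces of information, the sketch below splits one using the standard library. The function split_catalog_uri is hypothetical and is not how NautilusTrader parses URIs internally; it assumes that a URI without a scheme refers to the local filesystem.

```python
from urllib.parse import urlparse


def split_catalog_uri(uri: str) -> tuple[str, str]:
    """Split a catalog URI into a (protocol, path) pair.

    Hypothetical helper for illustration only; not part of NautilusTrader.
    """
    parsed = urlparse(uri)
    # Assumption: no scheme means a local path.
    protocol = parsed.scheme or "file"
    if protocol == "file":
        return protocol, parsed.path
    # For object stores the bucket sits in netloc and the key prefix in path.
    return protocol, f"{parsed.netloc}{parsed.path}"


local = split_catalog_uri("file:///data/nautilus_catalog")
remote = split_catalog_uri("s3://my-bucket/nautilus_catalog")
print(local, remote)
```

The resulting pair corresponds to the path and fs_protocol arguments of the constructor, which is why the two construction styles can describe the same catalog.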

Catalog from Environment Variable

import os
from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Set NAUTILUS_PATH environment variable
os.environ["NAUTILUS_PATH"] = "/home/user/.nautilus"

# Catalog will be created at /home/user/.nautilus/catalog
catalog = ParquetDataCatalog.from_env()
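
The comment above indicates that from_env appends a catalog subdirectory to the value of NAUTILUS_PATH. The sketch below reproduces only that path arithmetic with the standard library. The helper catalog_path_from_env is hypothetical, and raising KeyError when the variable is unset is an assumption made here, not documented NautilusTrader behaviour.

```python
import posixpath
from collections.abc import Mapping


def catalog_path_from_env(environ: Mapping[str, str]) -> str:
    """Return the catalog directory implied by NAUTILUS_PATH.

    Hypothetical helper for illustration only; not part of NautilusTrader.
    """
    root = environ.get("NAUTILUS_PATH")
    if not root:
        # Assumption: treat a missing or empty variable as an error.
        raise KeyError("NAUTILUS_PATH is not set")
    return posixpath.join(root, "catalog")


# Pass a plain dict instead of os.environ so the process environment is untouched.
resolved = catalog_path_from_env({"NAUTILUS_PATH": "/home/user/.nautilus"})
print(resolved)
```

Taking the environment as a mapping argument keeps the helper easy to exercise in tests without mutating os.environ.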

In-Memory Catalog for Testing

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Use an in-memory filesystem for unit tests
catalog = ParquetDataCatalog(
    path="/test_catalog",
    fs_protocol="memory",
)

Catalog with Custom Row Group Size

from nautilus_trader.persistence.catalog import ParquetDataCatalog

# Use larger row groups for bulk historical data
catalog = ParquetDataCatalog(
    path="/data/bulk_catalog",
    max_rows_per_group=50_000,
)
