Environment:Run llama Llama index Fsspec Remote Storage

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, Storage
Last Updated: 2026-02-11 19:00 GMT

Overview

Fsspec-based remote filesystem environment for persisting LlamaIndex storage contexts and ingestion pipeline state to cloud storage (S3, GCS, Azure Blob).

Description

LlamaIndex uses fsspec (filesystem specification) as an abstraction layer for file I/O. Because of this, `StorageContext.persist()` and `IngestionPipeline.persist()` can write to remote filesystems (S3, GCS, Azure Blob Storage, etc.) with the same call used for local storage. fsspec itself is installed as a dependency of the core package, but protocol-specific implementations (such as `s3fs` for S3) must be installed separately.
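Both `persist()` methods accept an fsspec filesystem through an `fs` argument, so a single helper can cover either object. The following is a minimal sketch, not the library's own code: the helper names and the bucket path `my-bucket/llama-index/storage` are hypothetical, and the S3 round trip assumes `llama-index-core` and `s3fs` are installed and AWS credentials are configured.

```python
def persist_remote(obj, persist_dir, fs=None):
    """Persist a StorageContext or IngestionPipeline through an fsspec filesystem.

    With fs=None the object keeps its default local-filesystem behaviour.
    """
    obj.persist(persist_dir=persist_dir, fs=fs)
    return persist_dir


def load_storage_context(persist_dir, fs=None):
    """Rebuild a StorageContext from a (possibly remote) persist directory."""
    # Imported lazily so this module loads even without llama-index installed.
    from llama_index.core import StorageContext

    return StorageContext.from_defaults(persist_dir=persist_dir, fs=fs)


def example_s3_roundtrip(storage_context):
    """Hypothetical round trip; needs s3fs, llama-index-core and AWS credentials."""
    import s3fs  # protocol-specific package, installed separately

    fs = s3fs.S3FileSystem()  # reads credentials from the environment
    target = "my-bucket/llama-index/storage"  # hypothetical bucket and prefix
    persist_remote(storage_context, target, fs=fs)
    return load_storage_context(target, fs=fs)
```

Passing the filesystem object explicitly keeps the choice of backend in one place; swapping `s3fs.S3FileSystem()` for another fsspec filesystem should not require changes to the persist or load calls.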

Usage

Use this environment when you need to persist index data, storage contexts, or pipeline state to remote cloud storage, or load them from it, instead of using the local filesystem. It is required when deploying LlamaIndex in cloud environments or when sharing indexes across machines.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Network | Access to target cloud storage | S3, GCS, or Azure endpoints |
| Account | Cloud provider credentials | Provider-specific auth |

Dependencies

Python Packages

  • `fsspec` >= 2023.5.0 (installed as a dependency of llama-index-core)
  • `s3fs` (for Amazon S3)
  • `gcsfs` (for Google Cloud Storage)
  • `adlfs` (for Azure Blob/Data Lake)
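Before calling `persist()` against a remote target, it can help to confirm that the matching protocol package is importable. The sketch below uses only the standard library; the backend labels (`"s3"`, `"gcs"`, `"azure"`) and the function name are illustrative assumptions, not LlamaIndex or fsspec identifiers.

```python
from importlib.util import find_spec

# Hypothetical backend labels mapped to the packages listed above.
BACKEND_PACKAGES = {"s3": "s3fs", "gcs": "gcsfs", "azure": "adlfs"}


def missing_packages(backends, packages=BACKEND_PACKAGES):
    """Return the protocol packages that are not importable for the given backends."""
    missing = []
    for backend in backends:
        module = packages[backend]
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(module) is None:
            missing.append(module)
    return missing
```

For example, `missing_packages(["s3", "gcs"])` returns the subset of `s3fs` and `gcsfs` that still needs a `pip install`.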

Credentials

Credentials depend on the target storage backend:

  • `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For S3 storage via s3fs
  • `GOOGLE_APPLICATION_CREDENTIALS`: For GCS storage via gcsfs
  • `AZURE_STORAGE_CONNECTION_STRING`: For Azure Blob via adlfs
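A pre-flight check for these variables might look like the sketch below. The backend labels and function name are hypothetical, and the check is deliberately conservative: each backend also supports other credential mechanisms (for example shared config files or instance roles), so a missing variable is a hint rather than proof that authentication will fail.

```python
import os

# Hypothetical backend labels mapped to the environment variables listed above.
REQUIRED_ENV = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "gcs": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "azure": ("AZURE_STORAGE_CONNECTION_STRING",),
}


def missing_credentials(backend, env=None):
    """Return the names of unset (or empty) credential variables for a backend."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[backend] if not env.get(name)]
```

Accepting `env` as a parameter keeps the function easy to exercise without modifying the real process environment.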

Quick Install

# For S3 support
pip install s3fs

# For Google Cloud Storage
pip install gcsfs

# For Azure Blob Storage
pip install adlfs

Code Evidence

Fsspec dependency from `pyproject.toml:60`:

"fsspec>=2023.5.0",

StorageContext uses fsspec for persist/load operations in `storage/storage_context.py`, enabling transparent remote filesystem support through the `fs` parameter.

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 's3fs'` | S3 filesystem not installed | `pip install s3fs` |
| `NoCredentialsError` | AWS credentials not configured | Set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` |
| `FileNotFoundError` on remote path | Bucket/container does not exist | Create the target bucket/container first |
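The table can also be expressed as a small lookup that turns a caught exception into the suggested remedy. This is an illustrative sketch with a hypothetical function name. It matches `NoCredentialsError` (raised by botocore) by class name so that botocore does not need to be importable, and it only suggests `pip install` for the three protocol packages named on this page.

```python
PROTOCOL_PACKAGES = {"s3fs", "gcsfs", "adlfs"}


def suggest_fix(exc):
    """Map an exception to the remedy from the Common Errors table, or None."""
    # ModuleNotFoundError.name holds the module that could not be imported.
    if isinstance(exc, ModuleNotFoundError) and exc.name in PROTOCOL_PACKAGES:
        return f"pip install {exc.name}"
    # Compared by class name to avoid importing botocore here.
    if type(exc).__name__ == "NoCredentialsError":
        return "Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY"
    if isinstance(exc, FileNotFoundError):
        return "Create the target bucket/container first"
    return None
```

A caller would typically wrap the `persist()` call in `try`/`except Exception as exc`, log `suggest_fix(exc)` when it is not `None`, and re-raise.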

Compatibility Notes

  • Local Fallback: All persist/load operations default to the local filesystem when no fsspec filesystem is specified.
  • Protocol Detection: fsspec infers the protocol from the URL scheme (e.g., `s3://`, `gs://`, `abfs://`).
  • Windows: A comment in the ingestion pipeline source (pipeline.py:326) notes "doesn't support Windows here" for certain filesystem path operations.
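The local-fallback and protocol-detection behaviour can be illustrated with a standard-library sketch. This is not fsspec's implementation (fsspec provides `fsspec.core.url_to_fs` for the real resolution); the function name is hypothetical, and treating a one-letter scheme as a Windows drive letter is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def split_protocol(url):
    """Return (protocol, path) for a storage URL, defaulting to the local filesystem."""
    parts = urlsplit(url)
    # No scheme, or a single letter such as "C:" on Windows, means a local path.
    if len(parts.scheme) <= 1:
        return "file", url
    # For remote URLs the bucket/container is parsed as the network location.
    return parts.scheme, parts.netloc + parts.path
```

With this helper, `split_protocol("s3://my-bucket/index")` yields the protocol `s3` and the path `my-bucket/index`, while a plain relative path such as `./storage` is reported as `file`.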
