Environment:Pola rs Polars Cloud Storage Environment


Domains: Infrastructure, Cloud_Storage
Last Updated: 2026-02-09 10:00 GMT

Overview

Cloud storage credential and configuration environment for reading from and writing to AWS S3, Azure Blob Storage, and Google Cloud Storage with Polars.

Description

This environment defines the credentials and configuration required for Polars to access cloud storage services. Polars uses the `object_store` Rust crate under the hood, with configurable retry behavior, authentication chains, and protocol-specific options. Each cloud provider (AWS, Azure, GCP) has its own credential format and authentication flow. Polars also supports Hugging Face Hub via the `hf://` protocol and HTTP/HTTPS access.

Usage

Use this environment when reading from or writing to cloud storage with Polars scan/read/write operations. It is required by the Data I/O and Format Conversion workflow when accessing remote files, and by the Streaming Large Dataset Processing workflow when scanning cloud-hosted datasets.
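As a minimal sketch of the scan case described above, the helper below wraps `pl.scan_parquet` and forwards an optional `storage_options` dictionary. The helper name `scan_cloud_parquet` and the bucket path in the usage comment are hypothetical; only `pl.scan_parquet` and its `storage_options` parameter come from Polars itself.

```python
from typing import Any, Optional


def scan_cloud_parquet(uri: str, storage_options: Optional[dict[str, Any]] = None):
    """Return a LazyFrame over a remote Parquet dataset without downloading it."""
    # Imported lazily so this module can be loaded where Polars is not installed.
    import polars as pl

    # storage_options carries provider-specific credentials; when it is None,
    # Polars is left to discover credentials on its own.
    return pl.scan_parquet(uri, storage_options=storage_options)


# Usage (hypothetical bucket; needs network access and valid credentials):
#   lf = scan_cloud_parquet("s3://example-bucket/events/*.parquet")
#   print(lf.head(5).collect())
```

Returning the LazyFrame rather than collecting it keeps the decision about how much data to fetch with the caller, which matters for the streaming workflow.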

System Requirements

Category | Requirement | Notes
Network | Internet access | Required for cloud storage operations
Python | >= 3.10 | With Polars installed

Dependencies

Python Packages

  • `polars` >= 1.38.1
  • `fsspec` (optional, for fsspec-based access)
  • `boto3` (optional, for AWS credential chain)
  • `azure-identity` (optional, for Azure AD auth)

Rust Feature Flags

  • `aws` - AWS S3 support (uses `object_store/aws`)
  • `azure` - Azure Blob Storage support
  • `gcp` - Google Cloud Storage support
  • `http` - HTTP/HTTPS access
  • `cloud` - Base cloud support (required by all above)

Credentials

The following environment variables or configuration options may be needed:

AWS S3:

  • `AWS_ACCESS_KEY_ID`: AWS access key
  • `AWS_SECRET_ACCESS_KEY`: AWS secret key
  • `AWS_SESSION_TOKEN`: Session token (for temporary credentials)
  • `AWS_DEFAULT_REGION`: AWS region
  • Or: IAM role, instance profile, `~/.aws/credentials`

Azure Blob Storage:

  • `AZURE_STORAGE_ACCOUNT_NAME`: Storage account name
  • `AZURE_STORAGE_ACCOUNT_KEY`: Storage account key
  • `AZURE_STORAGE_SAS_TOKEN`: SAS token
  • `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY`: Set to `1` to auto-retrieve keys from Azure CLI
  • Or: Azure AD authentication, managed identity

Google Cloud Storage:

  • `GOOGLE_APPLICATION_CREDENTIALS`: Path to service account JSON
  • Or: Application Default Credentials, workload identity

Hugging Face:

  • `HF_TOKEN`: Hugging Face API token (for private repos)

Polars Cloud Retry Configuration:

  • `POLARS_CLOUD_MAX_RETRIES`: Maximum retries (default: 2)
  • `POLARS_CLOUD_RETRY_INIT_BACKOFF_MS`: Initial backoff (default: 100ms)
  • `POLARS_CLOUD_RETRY_MAX_BACKOFF_MS`: Max backoff (default: 15000ms)
  • `POLARS_CLOUD_RETRY_BASE_MULTIPLIER`: Backoff multiplier (default: 2.0)
  • `POLARS_CLOUD_RETRY_TIMEOUT_MS`: Retry timeout (default: 10000ms)
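To see how these variables interact, the sketch below reads them with the defaults listed above and computes an idealized exponential backoff schedule. It is an illustration only: the function names are hypothetical, invalid values simply raise `ValueError` here (how Polars treats them is not covered on this page), and any jitter applied by the underlying crate is ignored.

```python
import os
from typing import Mapping, Optional

# Defaults as listed above.
DEFAULTS = {
    "POLARS_CLOUD_MAX_RETRIES": 2,
    "POLARS_CLOUD_RETRY_INIT_BACKOFF_MS": 100,
    "POLARS_CLOUD_RETRY_MAX_BACKOFF_MS": 15_000,
    "POLARS_CLOUD_RETRY_BASE_MULTIPLIER": 2.0,
    "POLARS_CLOUD_RETRY_TIMEOUT_MS": 10_000,
}


def effective_retry_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return each retry setting, using the default when the variable is unset."""
    env = os.environ if env is None else env
    config = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Parse with the same type as the default (int or float).
        config[name] = default if raw is None else type(default)(raw)
    return config


def backoff_schedule_ms(config: dict) -> list:
    """Jitter-free delay before each retry, capped at the maximum backoff."""
    delay = float(config["POLARS_CLOUD_RETRY_INIT_BACKOFF_MS"])
    schedule = []
    for _ in range(config["POLARS_CLOUD_MAX_RETRIES"]):
        schedule.append(min(delay, config["POLARS_CLOUD_RETRY_MAX_BACKOFF_MS"]))
        delay *= config["POLARS_CLOUD_RETRY_BASE_MULTIPLIER"]
    return schedule
```

With the defaults, only two retries are attempted, so the maximum backoff is reached only if the retry count or the initial backoff is raised substantially.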

Quick Install

# Install Polars with the optional fsspec extra (for fsspec-based access)
pip install 'polars[fsspec]'

# For AWS
pip install boto3

# For Azure
pip install azure-identity

# For full cloud support in development
pip install 'polars[all]'

Code Evidence

Cloud retry configuration from `crates/polars-io/src/cloud/options.rs:155-173`:

static DEFAULTS: LazyLock<object_store::RetryConfig> =
    LazyLock::new(|| object_store::RetryConfig {
        backoff: object_store::BackoffConfig {
            init_backoff: Duration::from_millis(parse_env_var(
                100, "POLARS_CLOUD_RETRY_INIT_BACKOFF_MS",
            )),
            max_backoff: Duration::from_millis(parse_env_var(
                15 * 1000, "POLARS_CLOUD_RETRY_MAX_BACKOFF_MS",
            )),
            base: parse_env_var(2., "POLARS_CLOUD_RETRY_BASE_MULTIPLIER"),
        },
        max_retries: parse_env_var(2, "POLARS_CLOUD_MAX_RETRIES"),
        retry_timeout: Duration::from_millis(parse_env_var(
            10 * 1000, "POLARS_CLOUD_RETRY_TIMEOUT_MS",
        )),
    });

Azure auto-authentication from `crates/polars-io/src/cloud/polars_object_store.rs`:

if std::env::var("POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY").as_deref()
    != Ok("1")
{
    // Error: set POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY=1 if you would
    // like polars to try to retrieve and use the storage account keys
    // from Azure CLI to authenticate
}
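The check above compares the variable against the exact string `1`, so values such as `true` would not count as opting in. A Python mirror of that comparison, useful as a pre-flight check before an Azure scan, could look like the following; the function name is hypothetical.

```python
import os
from typing import Mapping, Optional

AZURE_OPT_IN_VAR = "POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY"


def azure_cli_key_lookup_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the opt-in variable is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get(AZURE_OPT_IN_VAR) == "1"
```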

Supported cloud schemes from `crates/polars-io/src/cloud/options.rs:236-249`:

pub enum CloudType {
    Aws,
    Azure,
    File,
    Gcp,
    Http,
    Hf,  // HuggingFace
}

Common Errors

Error Message | Cause | Solution
`NoCredentialProvider` | No valid credentials found for cloud provider | Configure provider-specific credentials (see Credentials section)
`set POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY=1` | Azure auth needs explicit opt-in for CLI key retrieval | Set the environment variable to `1`
`Connection timed out` | Network or firewall issue | Check network access; adjust `POLARS_CLOUD_RETRY_TIMEOUT_MS`
`AccessDenied` | Insufficient permissions | Verify IAM/SAS/OAuth credentials have read/write access
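The table above can be turned into a small lookup that attaches a hint to a caught exception message. This is a sketch that assumes the error text contains the fragments shown in the table; the names `ERROR_HINTS` and `hint_for_error` are hypothetical.

```python
from typing import Optional

# (fragment expected in the error text, suggested action), taken from the table.
ERROR_HINTS = (
    ("NoCredentialProvider", "Configure provider-specific credentials."),
    (
        "POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY",
        "Set POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY to 1.",
    ),
    (
        "Connection timed out",
        "Check network access; adjust POLARS_CLOUD_RETRY_TIMEOUT_MS.",
    ),
    (
        "AccessDenied",
        "Verify that the IAM/SAS/OAuth credentials have read/write access.",
    ),
)


def hint_for_error(message: str) -> Optional[str]:
    """Return the first matching hint, or None when nothing matches."""
    for fragment, hint in ERROR_HINTS:
        if fragment in message:
            return hint
    return None
```

A caller would typically wrap the Polars call in `try`/`except Exception as exc` and log `hint_for_error(str(exc))` alongside the original error.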

Compatibility Notes

  • Protocol Prefixes: `s3://`, `az://`, `abfs://`, `gs://`, `gcs://`, `hf://`, `http://`, `https://`, `file://`
  • Credential Providers: Polars supports custom Python credential provider functions for dynamic credential refresh
  • Retry Behavior: All cloud retry parameters are configurable via the `POLARS_CLOUD_*` environment variables listed under Credentials; unset variables fall back to the defaults shown there
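The protocol prefixes listed above line up with the `CloudType` variants shown under Code Evidence. The sketch below makes one plausible mapping explicit; the pairing of each prefix with a variant is inferred from the names rather than quoted from the Polars source, and paths without a scheme return `None` here because this page does not say how Polars classifies them.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed pairing of the documented prefixes with CloudType variant names.
SCHEME_TO_CLOUD_TYPE = {
    "s3": "Aws",
    "az": "Azure",
    "abfs": "Azure",
    "gs": "Gcp",
    "gcs": "Gcp",
    "hf": "Hf",
    "http": "Http",
    "https": "Http",
    "file": "File",
}


def cloud_type_for(uri: str) -> Optional[str]:
    """Return the assumed CloudType name for a URI, or None if unrecognized."""
    scheme = urlparse(uri).scheme.lower()
    return SCHEME_TO_CLOUD_TYPE.get(scheme)
```

A check like this can decide which credential set to load before a scan is started.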
