Environment: Pola-rs Polars Cloud Storage Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Cloud_Storage |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Cloud storage credential and configuration environment for reading from and writing to AWS S3, Azure Blob Storage, and Google Cloud Storage with Polars.
Description
This environment defines the credentials and configuration required for Polars to access cloud storage services. Polars uses the `object_store` Rust crate under the hood, with configurable retry behavior, authentication chains, and protocol-specific options. Each cloud provider (AWS, Azure, GCP) has its own credential format and authentication flow. Polars also supports Hugging Face Hub via the `hf://` protocol and HTTP/HTTPS access.
Usage
Use this environment when reading from or writing to cloud storage with Polars scan, read, or write operations. It is required by the Data I/O and Format Conversion workflow when that workflow accesses remote files, and by the Streaming Large Dataset Processing workflow when it scans cloud-hosted datasets.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for cloud storage operations |
| Python | >= 3.10 | With Polars installed |
Dependencies
Python Packages
- `polars` >= 1.38.1
- `fsspec` (optional, for fsspec-based access)
- `boto3` (optional, for AWS credential chain)
- `azure-identity` (optional, for Azure AD auth)
Rust Feature Flags
- `aws` - AWS S3 support (uses `object_store/aws`)
- `azure` - Azure Blob Storage support
- `gcp` - Google Cloud Storage support
- `http` - HTTP/HTTPS access
- `cloud` - Base cloud support (required by all above)
Credentials
The following environment variables or configuration options may be needed:
AWS S3:
- `AWS_ACCESS_KEY_ID`: AWS access key
- `AWS_SECRET_ACCESS_KEY`: AWS secret key
- `AWS_SESSION_TOKEN`: Session token (for temporary credentials)
- `AWS_DEFAULT_REGION`: AWS region
- Or: IAM role, instance profile, `~/.aws/credentials`
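The same AWS settings can also be passed to Polars explicitly rather than picked up implicitly. The following is a minimal sketch, assuming the `storage_options` keys `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, and `aws_region`; the helper names and the bucket path in the docstring are hypothetical.

```python
import os


def aws_storage_options(env=None):
    """Collect AWS settings from environment variables into a storage_options dict.

    Variables that are unset or empty are skipped, so the result can be
    empty when another mechanism (IAM role, instance profile) is in use.
    """
    env = os.environ if env is None else env
    # Environment variable -> storage_options key (key names are an assumption).
    mapping = {
        "AWS_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
        "AWS_SESSION_TOKEN": "aws_session_token",
        "AWS_DEFAULT_REGION": "aws_region",
    }
    return {key: env[var] for var, key in mapping.items() if env.get(var)}


def scan_bucket(path):
    """Lazily scan Parquet files at `path`, e.g. a hypothetical
    "s3://my-bucket/data/*.parquet", using the collected options."""
    import polars as pl  # imported here so the helper above works without Polars

    return pl.scan_parquet(path, storage_options=aws_storage_options())
```

Passing the options explicitly makes it visible which credentials a given scan uses, which can help when several AWS profiles are present on one machine.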
Azure Blob Storage:
- `AZURE_STORAGE_ACCOUNT_NAME`: Storage account name
- `AZURE_STORAGE_ACCOUNT_KEY`: Storage account key
- `AZURE_STORAGE_SAS_TOKEN`: SAS token
- `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY`: Set to `1` to auto-retrieve keys from Azure CLI
- Or: Azure AD authentication, managed identity
Google Cloud Storage:
- `GOOGLE_APPLICATION_CREDENTIALS`: Path to service account JSON
- Or: Application Default Credentials, workload identity
Hugging Face:
- `HF_TOKEN`: Hugging Face API token (for private repos)
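As an illustration of the `hf://` protocol, the sketch below builds a path string. It assumes the layout `hf://<bucket>/<repository>[@<revision>]/<path>` with `datasets` or `spaces` as the bucket; the function name and the repository shown in the usage note are hypothetical.

```python
def hf_path(repo_id, file_path, bucket="datasets", revision=None):
    """Build an hf:// path for a file in a Hugging Face repository.

    The path layout is an assumption; verify it against the Polars
    Hugging Face documentation for the installed version.
    """
    if bucket not in ("datasets", "spaces"):
        raise ValueError(f"unexpected bucket: {bucket!r}")
    # An optional revision (branch, tag, or commit) is appended with '@'.
    repo = f"{repo_id}@{revision}" if revision else repo_id
    return f"hf://{bucket}/{repo}/{file_path.lstrip('/')}"
```

For example, `hf_path("some-user/some-dataset", "train/*.parquet")` yields a string that could be handed to a Polars scan function; for private repositories, `HF_TOKEN` would need to be set in the environment first.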
Cloud retry configuration (Polars):
- `POLARS_CLOUD_MAX_RETRIES`: Maximum retries (default: 2)
- `POLARS_CLOUD_RETRY_INIT_BACKOFF_MS`: Initial backoff (default: 100ms)
- `POLARS_CLOUD_RETRY_MAX_BACKOFF_MS`: Max backoff (default: 15000ms)
- `POLARS_CLOUD_RETRY_BASE_MULTIPLIER`: Backoff multiplier (default: 2.0)
- `POLARS_CLOUD_RETRY_TIMEOUT_MS`: Retry timeout (default: 10000ms)
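To see how the retry variables interact, the sketch below computes the upper bound of each retry delay from the same variables and defaults. It is a deterministic simplification: the underlying `object_store` crate may apply random jitter, and the retry timeout is not modeled. The function name is hypothetical.

```python
import os


def retry_backoff_upper_bounds(env=None):
    """Return the capped exponential backoff delays (in ms) for each retry.

    Simplified model: delay_n = min(init * base**n, max). Jitter and the
    overall retry timeout are deliberately left out.
    """
    env = os.environ if env is None else env
    max_retries = int(env.get("POLARS_CLOUD_MAX_RETRIES", 2))
    init_ms = float(env.get("POLARS_CLOUD_RETRY_INIT_BACKOFF_MS", 100))
    max_ms = float(env.get("POLARS_CLOUD_RETRY_MAX_BACKOFF_MS", 15_000))
    base = float(env.get("POLARS_CLOUD_RETRY_BASE_MULTIPLIER", 2.0))

    delays, delay = [], init_ms
    for _ in range(max_retries):
        delays.append(min(delay, max_ms))  # cap each delay at the maximum
        delay *= base                      # grow exponentially for the next retry
    return delays
```

With the defaults this gives two delays of at most 100 ms and 200 ms, so raising `POLARS_CLOUD_MAX_RETRIES` is usually the first knob to try on flaky networks.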
Quick Install
# Install Polars with the optional fsspec extra
pip install 'polars[fsspec]'
# For AWS
pip install boto3
# For Azure
pip install azure-identity
# For full cloud support in development
pip install 'polars[all]'
Code Evidence
Cloud retry configuration from `crates/polars-io/src/cloud/options.rs:155-173`:
static DEFAULTS: LazyLock<object_store::RetryConfig> =
    LazyLock::new(|| object_store::RetryConfig {
        backoff: object_store::BackoffConfig {
            init_backoff: Duration::from_millis(parse_env_var(
                100, "POLARS_CLOUD_RETRY_INIT_BACKOFF_MS",
            )),
            max_backoff: Duration::from_millis(parse_env_var(
                15 * 1000, "POLARS_CLOUD_RETRY_MAX_BACKOFF_MS",
            )),
            base: parse_env_var(2., "POLARS_CLOUD_RETRY_BASE_MULTIPLIER"),
        },
        max_retries: parse_env_var(2, "POLARS_CLOUD_MAX_RETRIES"),
        retry_timeout: Duration::from_millis(parse_env_var(
            10 * 1000, "POLARS_CLOUD_RETRY_TIMEOUT_MS",
        )),
    });
Azure auto-authentication from `crates/polars-io/src/cloud/polars_object_store.rs`:
if std::env::var("POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY").as_deref()
    != Ok("1")
{
    // Error: set POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY=1 if you would
    // like polars to try to retrieve and use the storage account keys
    // from Azure CLI to authenticate
}
Supported cloud schemes from `crates/polars-io/src/cloud/options.rs:236-249`:
pub enum CloudType {
    Aws,
    Azure,
    File,
    Gcp,
    Http,
    Hf, // HuggingFace
}
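To make the relationship between URL schemes and these variants concrete, here is a Python sketch of the dispatch. The mapping is inferred from the protocol prefixes listed under Compatibility Notes and is an assumption, not a transcription of the Rust implementation; the names `SCHEME_TO_CLOUD_TYPE` and `cloud_type_for` are hypothetical.

```python
from urllib.parse import urlparse

# Scheme -> CloudType variant name (inferred mapping, see lead-in).
SCHEME_TO_CLOUD_TYPE = {
    "s3": "Aws",
    "az": "Azure",
    "abfs": "Azure",
    "gs": "Gcp",
    "gcs": "Gcp",
    "http": "Http",
    "https": "Http",
    "hf": "Hf",
    "file": "File",
}


def cloud_type_for(url):
    """Return the CloudType variant name that a URL would map to."""
    scheme = urlparse(url).scheme.lower()
    if not scheme:
        # Assumption: a path without a scheme is treated as a local file.
        return "File"
    try:
        return SCHEME_TO_CLOUD_TYPE[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None
```

A check like this can be useful in pipeline code to decide which set of credentials to load before handing a path to Polars.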
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `NoCredentialProvider` | No valid credentials found for cloud provider | Configure provider-specific credentials (see Credentials section) |
| `set POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY=1` | Azure auth needs explicit opt-in for CLI key retrieval | Set the environment variable to `1` |
| `Connection timed out` | Network or firewall issue | Check network access; adjust `POLARS_CLOUD_RETRY_TIMEOUT_MS` |
| `AccessDenied` | Insufficient permissions | Verify IAM/SAS/OAuth credentials have read/write access |
Compatibility Notes
- Protocol Prefixes: `s3://`, `az://`, `abfs://`, `gs://`, `gcs://`, `hf://`, `http://`, `https://`, `file://`
- Credential Providers: Polars supports custom Python credential provider functions for dynamic credential refresh
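- Retry Behavior: All cloud retry parameters are configurable via the `POLARS_CLOUD_*` environment variables; by default Polars retries up to 2 times, with exponential backoff starting at 100 ms (multiplier 2.0, capped at 15 s) and a 10 s retry timeout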
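A custom credential provider can be sketched as a plain Python callable. The example assumes the provider takes no arguments and returns a tuple of a credentials dict and an optional expiry given as a Unix timestamp in seconds, and that scan functions accept it through a `credential_provider` argument; the function names and the 15-minute lifetime are hypothetical.

```python
import os
import time


def env_credential_provider():
    """Return (credentials, expiry) built from the current environment.

    Polars is assumed to call this again once the expiry has passed,
    which is what makes dynamic credential refresh possible.
    """
    credentials = {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    }
    token = os.environ.get("AWS_SESSION_TOKEN")
    if token:
        credentials["aws_session_token"] = token
    # Hypothetical lifetime: ask for a refresh after 15 minutes.
    expiry = int(time.time()) + 15 * 60
    return credentials, expiry


def scan_with_provider(path):
    """Lazily scan Parquet files at `path` using the provider above."""
    import polars as pl  # imported here so the provider is usable without Polars

    return pl.scan_parquet(path, credential_provider=env_credential_provider)
```

In practice the provider body would call a secrets manager or token service instead of reading environment variables; the environment is used here only to keep the sketch self-contained.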