Environment:Eventual Inc Daft Cloud Storage Credentials
| Knowledge Sources | |
|---|---|
| Domains | Cloud Storage, Authentication, AWS S3, Azure Blob, GCS, HuggingFace |
| Last Updated | 2026-02-08 15:30 GMT |
Overview
The Cloud_Storage_Credentials environment defines the credential chains, environment variables, and optional dependencies required for Daft to authenticate with cloud storage providers (AWS S3, Azure Blob Storage, Google Cloud Storage) and data platforms (HuggingFace, AI providers).
Description
Daft integrates with multiple cloud storage backends through a combination of Rust-native I/O clients (for S3 and Azure) and Python filesystem abstractions via fsspec and PyArrow. Each provider has its own credential chain with specific environment variables, configuration objects, and fallback behaviors.
The credential resolution follows a layered approach for each provider (a configuration sketch follows this list):
- AWS S3 -- Uses the standard boto3/botocore credential chain. If no credentials are found, Daft falls back to anonymous access and logs a warning. This allows seamless access to public S3 buckets without configuration.
- Azure Blob Storage -- Resolves credentials through explicit configuration, then environment variables, then `DefaultAzureCredential`, and finally falls back to anonymous access.
- Google Cloud Storage -- Leverages PyArrow's `GcsFileSystem` and the standard `GOOGLE_APPLICATION_CREDENTIALS` service account mechanism.
- HuggingFace -- Uses the `HF_TOKEN` environment variable for authenticated access to private datasets and models.
- AI Providers -- API keys for OpenAI (`OPENAI_API_KEY`) and OpenRouter (`OPENROUTER_API_KEY`) are resolved from environment variables.
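The explicit-configuration layer is exposed through Daft's `IOConfig` object, which carries per-provider settings (`s3`, `azure`, `gcs`). Below is a minimal sketch for S3, assuming the `S3Config` field names from recent Daft releases; all credential values and the bucket path are placeholders:

```python
import daft
from daft.io import IOConfig, S3Config

# Explicit credentials take precedence over environment variables and
# the boto3/botocore default chain.
io_config = IOConfig(
    s3=S3Config(
        key_id="AKIA...",        # placeholder access key ID
        access_key="...",        # placeholder secret access key
        region_name="us-west-2",
    )
)

df = daft.read_parquet("s3://my-bucket/data/*.parquet", io_config=io_config)
```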
Additionally, Daft's Rust-based S3 client supports custom retry error message patterns via `DAFT_S3_RETRY_ERROR_MSGS`, allowing users to add application-specific retry logic for transient errors.
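For example, to extend retry behavior to a provider-specific throttling message, the variable can be set before the S3 client is created. A minimal sketch (the pattern strings and path are hypothetical):

```python
import os

# Comma-separated substrings; a match against an S3 error message
# triggers a retry (see the s3_like.rs evidence below).
os.environ["DAFT_S3_RETRY_ERROR_MSGS"] = "SlowDown,connection reset by peer"

import daft  # set the variable before Daft initializes its S3 client

df = daft.read_parquet("s3://my-bucket/data/*.parquet")
```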
Usage
Configure this environment when:
- Reading or writing data from/to cloud storage (S3, Azure, GCS)
- Accessing private HuggingFace datasets or models
- Using AI integrations that require API keys (OpenAI, OpenRouter)
- Customizing S3 retry behavior for specific error patterns
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | >= 3.10 | Inherited from the core environment |
| PyArrow | >= 9.0 (for GCS) | GcsFileSystem requires PyArrow 9.0 or later |
| Network | Internet access | Required for all cloud storage operations |
| Operating System | Linux, macOS, Windows | All platforms supported |
Dependencies
System Packages
- All core environment system packages
Python Packages
- All core environment Python packages (pyarrow >= 8.0.0, fsspec, etc.)
- boto3 < 1.43.0 -- AWS SDK for Python; install via `pip install daft[aws]`
- huggingface-hub < 1.2.0 -- HuggingFace Hub client; install via `pip install daft[huggingface]`
- datasets < 4.5.0 -- HuggingFace Datasets library; install via `pip install daft[huggingface]`
- adlfs -- Azure Data Lake Storage filesystem for fsspec (dev/optional)
- gcsfs -- Google Cloud Storage filesystem for fsspec (dev/optional)
Credentials
AWS S3
| Variable | Description | Required |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | AWS access key ID | No (falls back to boto3 credential chain) |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key | No (falls back to boto3 credential chain) |
| `AWS_SESSION_TOKEN` | AWS session token for temporary credentials | No (optional, for STS) |
| `AWS_DEFAULT_REGION` | Default AWS region | No (defaults to boto3 configuration) |
| `DAFT_S3_RETRY_ERROR_MSGS` | Comma-separated list of custom error message patterns to trigger retries | No (default: empty) |
Fallback behavior: If no AWS credentials are found via botocore, Daft enables anonymous access (`anon=True`) and logs a warning. This allows reading from public S3 buckets without any credential configuration.
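Anonymous access can also be forced explicitly rather than relying on the fallback, e.g. when local AWS credentials exist but should not be used for a public bucket. A minimal sketch, assuming `S3Config`'s `anonymous` field (the bucket path is illustrative):

```python
import daft
from daft.io import IOConfig, S3Config

# Skip credential resolution entirely and send unsigned requests.
io_config = IOConfig(s3=S3Config(anonymous=True))

df = daft.read_parquet(
    "s3://some-public-bucket/data/*.parquet",  # illustrative public bucket
    io_config=io_config,
)
```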
Azure Blob Storage
| Variable | Description | Required |
|---|---|---|
| `AZURE_STORAGE_ACCOUNT` | Azure Storage account name | Yes (if not set in AzureConfig) |
| `AZURE_STORAGE_KEY` | Azure Storage account access key | No (one of key/SAS/token required for private data) |
| `AZURE_STORAGE_SAS_TOKEN` | Azure Shared Access Signature token | No (alternative to access key) |
| `AZURE_STORAGE_TOKEN` | Azure bearer token (OAuth) | No (alternative to access key) |
| `AZURE_ENDPOINT_URL` | Custom Azure Blob endpoint URL | No (for custom endpoints or emulators) |
Fallback behavior: If no explicit credentials are provided, Daft attempts `DefaultAzureCredential` (which tries managed identity, Azure CLI, environment variables, etc.). If that also fails, it falls back to anonymous access.
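A hedged sketch of the explicit-configuration layer for Azure, assuming the `AzureConfig` field names from recent Daft releases (the account name, key, and path are placeholders):

```python
import daft
from daft.io import AzureConfig, IOConfig

# storage_account is mandatory; one of access key / SAS token /
# bearer token is needed for private containers.
io_config = IOConfig(
    azure=AzureConfig(
        storage_account="myaccount",  # hypothetical account name
        access_key="...",             # placeholder account key
    )
)

df = daft.read_parquet("az://container/path/*.parquet", io_config=io_config)
```

If neither is supplied, the resolution order described above (environment variables, then `DefaultAzureCredential`, then anonymous) applies.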
Google Cloud Storage
| Variable | Description | Required |
|---|---|---|
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to a service account JSON key file | No (falls back to default application credentials) |
Fallback behavior: Uses the standard Google Cloud credential chain (application default credentials, Compute Engine metadata, etc.).
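Since Daft defers to the standard Google chain, pointing `GOOGLE_APPLICATION_CREDENTIALS` at a key file is usually all that is needed. A minimal sketch (the key path and bucket are hypothetical):

```python
import os
import daft

# Standard Google mechanism: the JSON service account key is picked up
# by application default credentials.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

df = daft.read_parquet("gs://my-gcs-bucket/data/*.parquet")
```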
HuggingFace
| Variable | Description | Required |
|---|---|---|
| `HF_TOKEN` | HuggingFace authentication token | No (required only for private repos) |
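A minimal sketch of authenticated access to a private dataset, assuming the `hf://datasets/...` path scheme (the token and repository name are placeholders):

```python
import os
import daft

# Only needed for private or gated repositories; public datasets work
# without a token.
os.environ["HF_TOKEN"] = "hf_..."  # placeholder token

df = daft.read_parquet("hf://datasets/some-user/some-private-dataset")
```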
AI Providers
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for LLM/embedding endpoints | Yes (when using OpenAI provider) |
| `OPENROUTER_API_KEY` | OpenRouter API key for multi-provider LLM access | Yes (when using OpenRouter provider) |
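Both keys are read from the environment by the corresponding provider integrations; a minimal sketch (the key values are placeholders):

```python
import os

# Set before invoking any Daft AI integration that uses these providers.
os.environ["OPENAI_API_KEY"] = "sk-..."          # placeholder
os.environ["OPENROUTER_API_KEY"] = "sk-or-..."   # placeholder
```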
Quick Install
```bash
# AWS S3 support
pip install "daft[aws]"

# Azure Blob Storage support (no extra Python deps, uses Rust native client)
pip install daft

# Google Cloud Storage support (no extra Python deps, uses PyArrow GcsFileSystem)
pip install daft

# HuggingFace support
pip install "daft[huggingface]"

# All cloud providers
pip install "daft[aws,huggingface]"
```
Code Evidence
S3 anonymous fallback from daft/filesystem.py lines 52-69:
```python
def get_filesystem(protocol: str, **kwargs: Any) -> fsspec.AbstractFileSystem:
    if protocol == "s3" or protocol == "s3a":
        try:
            import botocore.session
        except ImportError:
            logger.error(
                "Error when importing botocore. install daft[aws] for the required "
                "3rd party dependencies to interact with AWS S3"
            )
            raise
        s3fs_kwargs = {}
        credentials_available = botocore.session.get_session().get_credentials() is not None
        if not credentials_available:
            logger.warning(
                "AWS credentials not found - using anonymous access to S3 which will "
                "fail if the bucket you are accessing is not a public bucket."
            )
            s3fs_kwargs["anon"] = True
```
Azure credential resolution from src/daft-io/src/azure_blob.rs lines 177-242:
```rust
// Storage account from config or environment
} else if let Ok(storage_account) = std::env::var("AZURE_STORAGE_ACCOUNT") {
    storage_account
}

// Access key from config or environment
let access_key = config.access_key.clone().or_else(|| {
    std::env::var("AZURE_STORAGE_KEY").ok().map(std::convert::Into::into)
});

// SAS token from config or environment
.or_else(|| std::env::var("AZURE_STORAGE_SAS_TOKEN").ok());

// Bearer token from config or environment
.or_else(|| std::env::var("AZURE_STORAGE_TOKEN").ok());

// Endpoint URL from config or environment
std::env::var("AZURE_ENDPOINT_URL").ok()
```
S3 custom retry error messages from src/daft-io/src/s3_like.rs line 412:
```rust
let retry_error_msgs = std::env::var("DAFT_S3_RETRY_ERROR_MSGS")
    .map(|s| s.split(',').map(|s| s.to_string()).collect::<Vec<_>>())
    .unwrap_or_default();
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Error when importing botocore. install daft[aws]` | boto3/botocore not installed. | Run `pip install "daft[aws]"`. |
| `AWS credentials not found - using anonymous access` | No AWS credentials detected via botocore. | Configure AWS credentials using `aws configure`, environment variables, or IAM roles. This is a warning; public buckets will still work. |
| `Azure Storage Account not set and is required` | Neither `AzureConfig.storage_account` nor `AZURE_STORAGE_ACCOUNT` is set. | Set the `AZURE_STORAGE_ACCOUNT` environment variable or pass the account name via configuration. |
| 403 Forbidden (S3/Azure/GCS) | Credentials are present but lack the required permissions. | Verify that the credential has read (and optionally write) access to the target bucket/container. |
| `HF_TOKEN is required for private repositories` | Attempting to access a private HuggingFace dataset without a token. | Set the `HF_TOKEN` environment variable with a valid HuggingFace access token. |
Compatibility Notes
- AWS S3: The `daft[aws]` extra installs `boto3 < 1.43.0`. Daft's Rust-native S3 client handles the actual I/O, while boto3 is used for credential resolution.
- Azure Blob Storage: The `daft[azure]` extra currently has no additional Python dependencies; Azure I/O is handled entirely by the Rust-native client. For fsspec-based access, `adlfs` is available as a dev dependency.
- GCS: Requires `pyarrow >= 9.0` for the `GcsFileSystem` integration. The `daft[gcp]` extra currently has no additional Python dependencies.
- HuggingFace: The `daft[huggingface]` extra installs both the `huggingface-hub` and `datasets` libraries.
- S3 retry customization: The `DAFT_S3_RETRY_ERROR_MSGS` environment variable accepts a comma-separated list of error message substrings. When an S3 error message matches any of these patterns, Daft will retry the request.
Related Pages
- Environment:Eventual_Inc_Daft_Python_PyArrow_Core
- Implementation:Eventual_Inc_Daft_Read_Parquet
- Implementation:Eventual_Inc_Daft_Read_Huggingface
- Implementation:Eventual_Inc_Daft_DataFrame_Write_Deltalake
- Implementation:Eventual_Inc_Daft_AI_Prompt
- Environment:Eventual_Inc_Daft_Ray_Distributed_Runner
- Environment:Eventual_Inc_Daft_AI_Provider_Dependencies