

Environment: Lance Cloud Storage Credentials

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Cloud_Storage
Last updated: 2026-02-08 19:00 GMT

Overview

Cloud storage credential and configuration environment for Lance datasets on AWS S3, Google Cloud Storage, Azure Blob Storage, and other object stores.

Description

Lance supports reading and writing datasets on multiple cloud storage backends via the `object_store` crate (v0.12.3) and `opendal` (v0.55). Each backend requires specific environment variables or storage options for authentication and configuration. Lance also supports S3-compatible services (MinIO, LocalStack), the Hugging Face Hub, Tencent Cloud COS, and Alibaba Cloud OSS.

Usage

This environment is required whenever a Lance dataset URI points to a cloud storage location (e.g., `s3://bucket/path`, `gs://bucket/path`, `az://container/path`). All read, write, scan, index, and optimization operations on cloud-hosted datasets require these credentials. Local filesystem datasets (`file:///path` or plain paths) do not require this environment.
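
The rule above can be illustrated with a small, hypothetical stdlib-only helper that classifies a dataset URI by its scheme (the helper name and scheme set are ours, not part of the Lance API):

```python
from urllib.parse import urlparse

# Cloud schemes mentioned above; local paths and file:// URIs need no credentials.
CLOUD_SCHEMES = {"s3", "gs", "az"}

def needs_cloud_credentials(uri: str) -> bool:
    """True when a Lance dataset URI points at a cloud object store."""
    scheme = urlparse(uri).scheme
    return scheme in CLOUD_SCHEMES
```

A plain path such as `/data/my_dataset` parses with an empty scheme, so it falls through to the local case.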

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux, macOS, or Windows | All platforms supported |
| Network | Internet access to cloud provider | Required for cloud storage operations |
| Disk | Minimal | Cloud storage is remote; local cache optional |

Dependencies

Rust Feature Flags

Lance cloud storage backends are controlled by Cargo feature flags (all except `dynamodb` are enabled by default):

  • `aws` — Amazon S3 support
  • `gcp` — Google Cloud Storage support
  • `azure` — Azure Blob Storage support
  • `oss` — Alibaba Cloud OSS support
  • `tencent` — Tencent Cloud COS support
  • `huggingface` — Hugging Face Hub support
  • `dynamodb` — DynamoDB-based commit locking for S3
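
When only one backend is needed, the defaults can be trimmed in a downstream `Cargo.toml`. This is a sketch; the `"*"` version is a placeholder, not a recommendation:

```toml
[dependencies]
# Enable only the S3 backend plus DynamoDB-based commit locking.
lance = { version = "*", default-features = false, features = ["aws", "dynamodb"] }
```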

System Packages

  • `libssl-dev` — Required for TLS connections to cloud providers

Credentials

IMPORTANT: Never store actual secret values in code or documentation.

AWS S3

  • `AWS_ACCESS_KEY_ID` — AWS access key for authentication
  • `AWS_SECRET_ACCESS_KEY` — AWS secret key for authentication
  • `AWS_SESSION_TOKEN` — Optional session token for temporary credentials
  • `AWS_PROFILE` — AWS profile name for SSO or named profiles
  • `AWS_DEFAULT_REGION` — AWS region (e.g., `us-east-1`)
  • `AWS_ENDPOINT` — Custom S3-compatible endpoint URL
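
Besides environment variables, the same values can usually be passed programmatically as storage options. A stdlib-only sketch that mirrors the variables above into a `storage_options`-style dict; the assumption that `object_store` accepts the lowercased variable names as option keys should be checked against its documentation:

```python
import os

# The AWS variables listed above; only those actually set are collected.
_AWS_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
    "AWS_ENDPOINT",
]

def aws_storage_options() -> dict:
    """Collect whichever AWS variables are set, keyed in lowercase."""
    return {name.lower(): os.environ[name] for name in _AWS_VARS if name in os.environ}
```

The resulting dict can then be handed to Lance API calls that take a `storage_options` argument.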

Google Cloud Storage

  • `GOOGLE_SERVICE_ACCOUNT` — Path to service account JSON file
  • `GOOGLE_APPLICATION_CREDENTIALS` — Path to application credentials JSON
  • `HTTP1_ONLY` — Set to `false` to enable HTTP/2 (default is HTTP/1)
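
A quick sanity check on the file that `GOOGLE_APPLICATION_CREDENTIALS` points to can catch misconfiguration before the first request fails. This hypothetical helper only checks for the fields a service-account key file normally carries:

```python
import json

# Fields present in a standard GCP service-account key file.
REQUIRED_FIELDS = {"type", "client_email", "private_key"}

def check_service_account(path: str) -> bool:
    """Return True if the JSON file at `path` looks like a service-account key."""
    with open(path) as f:
        data = json.load(f)
    return data.get("type") == "service_account" and REQUIRED_FIELDS <= data.keys()
```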

Azure Blob Storage

  • `AZURE_STORAGE_ACCOUNT_NAME` — Storage account name
  • `AZURE_STORAGE_ACCOUNT_KEY` — Storage account key
  • `AZURE_STORAGE_ALLOW_HTTP` — Allow non-TLS HTTP connections
  • `AZURE_STORAGE_USE_HTTP` — Use HTTP instead of HTTPS

Storage Options (Programmatic)

These can be passed as key-value pairs in Lance API calls:

  • `allow_http` — Allow non-TLS connections
  • `download_retry_count` — Retry count (default: 3)
  • `allow_invalid_certificates` — Skip certificate validation
  • `connect_timeout` — Connection timeout (default: 5s)
  • `request_timeout` — Request timeout (default: 30s)
  • `client_max_retries` — S3 client retry count (default: 10)
  • `client_retry_timeout` — S3 client retry timeout (default: 180s)
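
The defaults above can be captured in one place. This is an illustrative helper, not part of the Lance API; values are kept as strings since storage options are passed as string key-value pairs:

```python
def default_storage_options(**overrides) -> dict:
    """Return the documented storage-option defaults, with caller overrides applied."""
    opts = {
        "download_retry_count": "3",
        "connect_timeout": "5s",
        "request_timeout": "30s",
        "client_max_retries": "10",
        "client_retry_timeout": "180s",
    }
    opts.update({k: str(v) for k, v in overrides.items()})
    return opts
```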

Quick Install

# AWS S3 - set credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

# Google Cloud Storage - set service account
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Azure Blob Storage - set account credentials
export AZURE_STORAGE_ACCOUNT_NAME="your-account"
export AZURE_STORAGE_ACCOUNT_KEY="your-key"

# For local development with LocalStack (S3-compatible)
cd test_data && docker compose up -d
export AWS_ENDPOINT="http://localhost:4566"
export AWS_ACCESS_KEY_ID="ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="SECRET_KEY"

Code Evidence

Default feature flags enabling cloud storage from `rust/lance/Cargo.toml`:

[features]
default = ["aws", "azure", "gcp", "oss", "huggingface", "tencent"]

S3 retry defaults from storage options configuration:

// client_max_retries default: 10
// client_retry_timeout default: 180s
// connect_timeout default: 5s
// request_timeout default: 30s

Docker Compose test services from `docker-compose.yml:1-17`:

services:
  localstack:
    image: localstack/localstack:4.0
    environment:
      - SERVICES=s3,dynamodb,kms
      - AWS_ACCESS_KEY_ID=ACCESS_KEY
      - AWS_SECRET_ACCESS_KEY=SECRET_KEY
    ports:
      - "4566:4566"

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `NoCredentialProviders: no valid providers in chain` | AWS credentials not configured | Set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` |
| `Connection refused` on S3 operations | LocalStack not running or wrong endpoint | Run `docker compose up -d` in `test_data/` and set `AWS_ENDPOINT` |
| `InvalidSignature` | Credentials mismatch or clock skew | Verify credentials and system clock synchronization |
| `SSL certificate problem` | Self-signed cert in development | Set the `allow_invalid_certificates` storage option |

Compatibility Notes

  • DynamoDB Commit Lock: For concurrent S3 writers, enable the `dynamodb` feature and configure a DynamoDB table for commit coordination.
  • S3-Compatible Services: MinIO, LocalStack, and other S3-compatible services work via the `AWS_ENDPOINT` environment variable.
  • HTTP/2 for GCS: Google Cloud Storage defaults to HTTP/1.1. Set `HTTP1_ONLY=false` to enable HTTP/2.
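
For the DynamoDB commit lock, Lance's documentation describes dataset URIs of the form `s3+ddb://bucket/path?ddbTableName=<table>`. A small stdlib sketch extracting the table name from such a URI; the URI shape is taken from Lance's docs, while `ddb_table_name` itself is a hypothetical helper:

```python
from urllib.parse import urlparse, parse_qs

def ddb_table_name(uri: str):
    """Extract the DynamoDB commit-table name from an s3+ddb:// Lance URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3+ddb":
        return None
    return parse_qs(parsed.query).get("ddbTableName", [None])[0]
```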
