Environment:Spotify Luigi AWS S3 Storage
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Cloud_Storage |
| Last Updated | 2026-02-10 07:00 GMT |
Overview
An AWS S3 storage environment that uses boto3 to read, write, and manage data in Amazon S3 from Luigi tasks.
Description
This environment provides the AWS S3 connectivity required by Luigi's `s3` contrib module. It uses boto3 and botocore for S3 operations, supporting direct credential injection, IAM role assumption via STS, and the standard boto3 credential resolution chain (environment variables, config files, instance profiles). The module provides `S3Client` for filesystem operations and `S3Target` for declaring S3 paths as task inputs and outputs.
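The filesystem side can be sketched as follows. `upload_if_missing` is a hypothetical helper, not part of Luigi; it assumes it is handed an object that behaves like `S3Client` and uses only that client's `exists` and `put` methods.

```python
def upload_if_missing(client, local_path, s3_path):
    """Upload local_path to s3_path unless the key already exists.

    `client` is assumed to behave like luigi.contrib.s3.S3Client;
    only its exists() and put() methods are used here.
    Returns True if an upload happened, False otherwise.
    """
    if client.exists(s3_path):
        return False
    client.put(local_path, s3_path)
    return True
```

With Luigi installed, this would typically be called as `upload_if_missing(S3Client(), 'report.tsv', 's3://example-bucket/reports/report.tsv')`, where the bucket and key are placeholders.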
Usage
Use this environment for any pipeline that reads from or writes to Amazon S3. It is required for the Spark_Processing_Pipeline workflow when using S3 as a data store, and for any task using `S3Target` or `S3Client`.
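As an illustration of declaring an S3 path as a task output, here is a minimal sketch. The task name `DailyReport`, the bucket `example-bucket`, and the key layout are assumptions made up for this example.

```python
import datetime


def report_path(bucket, day):
    """Build the S3 URI for one day's report (hypothetical key layout)."""
    return "s3://{}/reports/{}.tsv".format(bucket, day.isoformat())


def make_report_task():
    """Return a Luigi task class whose output is an S3Target.

    luigi is imported here rather than at module level so that
    report_path() stays usable on machines without Luigi installed.
    """
    import luigi
    from luigi.contrib.s3 import S3Target

    class DailyReport(luigi.Task):
        day = luigi.DateParameter()

        def output(self):
            # Luigi treats the task as complete once this key exists.
            return S3Target(report_path("example-bucket", self.day))

        def run(self):
            # The target is written through a file-like object.
            with self.output().open("w") as out:
                out.write("status\tok\n")

    return DailyReport
```

Keeping the path logic in a plain function makes it easy to check the key layout without touching S3.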
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Cross-platform |
| Network | HTTPS access to AWS S3 endpoints | Outbound port 443 |
Dependencies
Python Packages
- `boto3` >= 1.11.0
- `botocore` (transitive via boto3)
- `s3transfer` >= 0.3, < 4.0
- `luigi` (core)
Credentials
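AWS credentials can be provided in several ways:
- Direct parameters: `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token` passed to the `S3Client` constructor. These override the same keys read from `luigi.cfg`.
- IAM role assumption: `aws_role_arn` and `aws_role_session_name` trigger an STS AssumeRole call. As the STS excerpt under Code Evidence shows, the returned temporary credentials are assigned to the same variables, so they replace any directly supplied keys.
- boto3 default chain: used when no explicit credentials are available. It covers environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`), AWS config files, and EC2 instance profiles.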
Configuration in `luigi.cfg`:
- `[s3] aws_access_key_id`: AWS access key
- `[s3] aws_secret_access_key`: AWS secret key
- `[s3] aws_session_token`: Session token (for temporary credentials)
- `[s3] aws_role_arn`: IAM role ARN for role assumption
- `[s3] aws_role_session_name`: Session name for assumed role
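The `[s3]` section can also be generated programmatically. The sketch below uses only the standard library to render the role-assumption keys listed above; `render_s3_section` is a hypothetical helper, and the ARN and session name are placeholders.

```python
import configparser
import io


def render_s3_section(role_arn, session_name):
    """Return luigi.cfg text containing an [s3] section for role assumption."""
    parser = configparser.ConfigParser()
    parser["s3"] = {
        "aws_role_arn": role_arn,
        "aws_role_session_name": session_name,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(render_s3_section(
    "arn:aws:iam::123456789012:role/example-luigi-role",  # placeholder ARN
    "luigi-session",                                      # placeholder name
))
```

Role-based settings are shown here because they avoid writing long-lived keys into the config file.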
Quick Install
pip install luigi boto3 "s3transfer>=0.3,<4.0"
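After installing, a quick check that the packages are importable can catch the "Loading S3 module without the python package boto3" situation before a pipeline runs. This is a sketch using only the standard library; `missing_s3_packages` is a hypothetical helper.

```python
import importlib.util


def missing_s3_packages(required=("boto3", "botocore", "s3transfer")):
    """Return the names from `required` that cannot be imported here."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]


missing = missing_s3_packages()
if missing:
    print("Install before running S3 tasks:", ", ".join(missing))
```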
Code Evidence
boto3 import with warning from `luigi/contrib/s3.py:43-48`:
try:
    from boto3.s3.transfer import TransferConfig
    import botocore
except ImportError:
    logger.warning("Loading S3 module without the python package boto3. "
                   "Will crash at runtime if S3 functionality is used.")
Credential resolution from `luigi/contrib/s3.py:77-132`:
def __init__(self, aws_access_key_id=None, aws_secret_access_key=None,
             aws_session_token=None, **kwargs):
    options = self._get_s3_config()
    options.update(kwargs)
    if aws_access_key_id:
        options['aws_access_key_id'] = aws_access_key_id
    if aws_secret_access_key:
        options['aws_secret_access_key'] = aws_secret_access_key
    if aws_session_token:
        options['aws_session_token'] = aws_session_token
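The layering in this constructor can be reproduced on plain dictionaries to see which value wins. `merge_s3_options` is a hypothetical stand-in written for illustration; it is not a Luigi function.

```python
def merge_s3_options(config_options, kwargs, aws_access_key_id=None,
                     aws_secret_access_key=None, aws_session_token=None):
    """Mirror the constructor's layering on plain dicts."""
    options = dict(config_options)  # lowest priority: the [s3] config section
    options.update(kwargs)          # then any extra keyword arguments
    explicit = {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_session_token": aws_session_token,
    }
    for key, value in explicit.items():
        if value:                   # only truthy arguments override
            options[key] = value
    return options
```

Note that an empty string or `None` passed as a named credential argument leaves the configured value in place, because the checks test truthiness.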
STS role assumption from `luigi/contrib/s3.py:111-117`:
sts_client = boto3.client('sts')
# ...
aws_secret_access_key = assumed_role['Credentials'].get(...)
aws_access_key_id = assumed_role['Credentials'].get('AccessKeyId')
aws_session_token = assumed_role['Credentials'].get('SessionToken')
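For reference, the same exchange can be written as a small standalone function. This is a sketch, not Luigi's code: `assume_role_credentials` is a hypothetical helper that takes an STS client as an argument and relies on the documented shape of the boto3 `assume_role` response.

```python
def assume_role_credentials(sts_client, role_arn, session_name):
    """Call STS AssumeRole and return keyword arguments for a boto3 client.

    `sts_client` is assumed to be the result of boto3.client('sts').
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    # Temporary credentials always come as a key, a secret, and a token.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be unpacked into `boto3.resource('s3', **credentials)`. The credentials are temporary, so long-running processes need to request them again after they expire.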
Fallback to boto3 default chain from `luigi/contrib/s3.py:131-132`:
logger.debug('no credentials provided, delegating credentials resolution to boto3')
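When resolution is delegated to boto3, environment variables are one link in the chain it consults. The sketch below reports which of the three variables named earlier are set; `env_credential_status` is a hypothetical helper and does not validate the values.

```python
import os

ENV_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def env_credential_status(environ=None):
    """Report which AWS credential variables are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {key: bool(environ.get(key)) for key in ENV_KEYS}


print(env_credential_status())
```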
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Loading S3 module without the python package boto3` | boto3 not installed | `pip install boto3` |
| `NoCredentialsError` | No AWS credentials found | Configure credentials via env vars, config, or IAM role |
| `ClientError: Access Denied` | Insufficient S3 permissions | Check IAM policy for the bucket/key |
| `S3EmrTarget is deprecated. Please use S3FlagTarget` | Using the deprecated `S3EmrTarget` class | Migrate to `S3FlagTarget` |
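The two botocore errors in the table can be turned into hints at the point of failure. The sketch below is an assumption-laden example: `s3_error_hint` is a hypothetical helper that inspects the exception by duck typing, relying on botocore's `ClientError` carrying a `response` dictionary with an `Error`/`Code` entry.

```python
def s3_error_hint(exc):
    """Map an exception raised during an S3 call to a short remediation hint.

    Uses duck typing so no botocore import is needed: ClientError exposes
    a `response` dict, while NoCredentialsError is matched by class name.
    Returns None for exceptions it does not recognise.
    """
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    if code == "AccessDenied":
        return "Check the IAM policy for the bucket and key."
    if type(exc).__name__ == "NoCredentialsError":
        return ("Configure credentials via environment variables, "
                "config files, or an IAM role.")
    return None
```

A caller would wrap the S3 operation in `try`/`except Exception as exc`, log `s3_error_hint(exc)` if it is not `None`, and re-raise.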
Compatibility Notes
- S3EmrTarget: Deprecated in favor of `S3FlagTarget`. A warning is emitted at `luigi/contrib/s3.py:731`.
- Credential chain: When no explicit credentials are provided, boto3's default credential resolution is used. It checks, in order, environment variables, AWS config files, ECS container credentials, and EC2 instance profiles.
- STS role assumption: Supports cross-account access via IAM role assumption using `aws_role_arn`.