Environment:Spotify Luigi AWS S3 Storage

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Cloud_Storage
Last Updated: 2026-02-10 07:00 GMT

Overview

An AWS S3 storage environment that uses the boto3 client to read, write, and manage data in Amazon S3 from Luigi tasks.

Description

This environment provides the AWS S3 connectivity required by Luigi's S3 contrib module (`luigi/contrib/s3.py`). The module uses boto3 and botocore for S3 operations and supports three ways of obtaining credentials: direct credential injection, IAM role assumption via STS, and the standard boto3 credential resolution chain (environment variables, config files, instance profiles). It provides `S3Client` for filesystem operations and `S3Target` for declaring S3 paths as task inputs and outputs.

Usage

Use this environment for any pipeline that reads from or writes to Amazon S3. It is required for the Spark_Processing_Pipeline workflow when using S3 as a data store, and for any task using `S3Target` or `S3Client`.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Cross-platform |
| Network | HTTPS access to AWS S3 endpoints | Outbound port 443 |

Dependencies

Python Packages

  • `boto3` >= 1.11.0
  • `botocore` (transitive via boto3)
  • `s3transfer` >= 0.3, < 4.0
  • `luigi` (core)

Credentials

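AWS credentials can be provided via multiple methods. Explicit keys are used when present, credentials obtained through role assumption replace them when both role settings are configured, and the boto3 default chain applies only when no key pair is available: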

  • Direct parameters: `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token` passed to the `S3Client` constructor
  • IAM Role Assumption: `aws_role_arn` and `aws_role_session_name` for STS AssumeRole
  • boto3 default chain: Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`), AWS config files, EC2 instance profiles

Configuration in `luigi.cfg`:

  • `[s3] aws_access_key_id`: AWS access key
  • `[s3] aws_secret_access_key`: AWS secret key
  • `[s3] aws_session_token`: Session token (for temporary credentials)
  • `[s3] aws_role_arn`: IAM role ARN for role assumption
  • `[s3] aws_role_session_name`: Session name for assumed role
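
The `[s3]` keys listed above can be illustrated with a small sketch. It parses a sample `luigi.cfg`-style string with Python's standard `configparser` rather than Luigi's own configuration loader. The helper `read_s3_section`, the constant `SAMPLE_LUIGI_CFG`, and every value in it are hypothetical placeholders, not real credentials or Luigi APIs.

```python
import configparser

# Placeholder values only (assumptions for illustration); never commit real keys.
SAMPLE_LUIGI_CFG = """
[s3]
aws_access_key_id = EXAMPLE_ACCESS_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
aws_role_arn = arn:aws:iam::123456789012:role/example-role
aws_role_session_name = example-session
"""


def read_s3_section(cfg_text):
    """Return the [s3] section of an INI-style string as a plain dict.

    Hypothetical helper: it only mimics how the keys are laid out in
    luigi.cfg and does not call into Luigi itself.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("s3"):
        return {}
    return dict(parser.items("s3"))


s3_options = read_s3_section(SAMPLE_LUIGI_CFG)
print(sorted(s3_options))
```

In a real deployment the section would live in the `luigi.cfg` file itself, and it is usually safer to leave the key entries out and rely on role assumption or the boto3 default chain.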

Quick Install

pip install luigi boto3 "s3transfer>=0.3,<4.0"

Code Evidence

boto3 import with warning from `luigi/contrib/s3.py:43-48`:

try:
    from boto3.s3.transfer import TransferConfig
    import botocore
except ImportError:
    logger.warning("Loading S3 module without the python package boto3. "
                   "Will crash at runtime if S3 functionality is used.")

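Because the import guard above only logs a warning, a missing boto3 surfaces later, when S3 functionality is first used. A minimal sketch of an up-front check, using only the standard library, might look like the following. The helper name `missing_packages` is hypothetical and not part of Luigi.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two packages the S3 module tries to import.
missing = missing_packages(["boto3", "botocore"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("boto3 and botocore are importable")
```

Running a check like this at pipeline start-up fails fast instead of partway through a task.
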
Credential resolution from `luigi/contrib/s3.py:77-132`:

def __init__(self, aws_access_key_id=None, aws_secret_access_key=None,
             aws_session_token=None, **kwargs):
    options = self._get_s3_config()
    options.update(kwargs)
    if aws_access_key_id:
        options['aws_access_key_id'] = aws_access_key_id
    if aws_secret_access_key:
        options['aws_secret_access_key'] = aws_secret_access_key
    if aws_session_token:
        options['aws_session_token'] = aws_session_token
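
The merge order in the excerpt above (config values first, then extra keyword arguments, then the explicit credential parameters) can be reproduced in isolation. The following is a standalone sketch, not Luigi's code: `merge_s3_options` is a hypothetical function, and the config dict stands in for whatever `_get_s3_config()` returns.

```python
def merge_s3_options(config_options, aws_access_key_id=None,
                     aws_secret_access_key=None, aws_session_token=None,
                     **kwargs):
    """Mimic the precedence shown in the excerpt: config < kwargs < explicit."""
    options = dict(config_options)  # start from the [s3] config values
    options.update(kwargs)          # extra keyword arguments override config
    explicit = {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_session_token": aws_session_token,
    }
    for key, value in explicit.items():
        if value:                   # only truthy explicit values win
            options[key] = value
    return options


# Placeholder values, assumed for illustration only.
from_config = {"aws_access_key_id": "FROM_CFG", "aws_secret_access_key": "CFG_SECRET"}
merged = merge_s3_options(from_config, aws_access_key_id="FROM_ARG",
                          region_name="eu-west-1")
print(merged)
```

Here the explicit access key replaces the configured one, the secret key still comes from the config, and `region_name` is carried through as an extra option.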

STS role assumption from `luigi/contrib/s3.py:111-117`:

sts_client = boto3.client('sts')
# ...
aws_secret_access_key = assumed_role['Credentials'].get(...)
aws_access_key_id = assumed_role['Credentials'].get('AccessKeyId')
aws_session_token = assumed_role['Credentials'].get('SessionToken')
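
As a rough illustration of the role-assumption step, the sketch below takes the STS client as an argument so that it can run without boto3 or network access. `credentials_from_assumed_role` and `FakeStsClient` are hypothetical names. The `Credentials` field names follow the documented STS AssumeRole response, and all values are placeholders.

```python
def credentials_from_assumed_role(sts_client, role_arn, role_session_name):
    """Call AssumeRole and map the response onto boto3-style keyword names."""
    assumed_role = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=role_session_name)
    credentials = assumed_role["Credentials"]
    return {
        "aws_access_key_id": credentials.get("AccessKeyId"),
        "aws_secret_access_key": credentials.get("SecretAccessKey"),
        "aws_session_token": credentials.get("SessionToken"),
    }


class FakeStsClient:
    """Hypothetical stand-in that returns a canned AssumeRole-shaped response."""

    def assume_role(self, RoleArn, RoleSessionName):
        return {"Credentials": {
            "AccessKeyId": "EXAMPLE_KEY_ID",
            "SecretAccessKey": "EXAMPLE_SECRET",
            "SessionToken": "EXAMPLE_TOKEN",
        }}


creds = credentials_from_assumed_role(
    FakeStsClient(),
    "arn:aws:iam::123456789012:role/example-role",
    "example-session",
)
print(sorted(creds))
```

With boto3 installed and base credentials available, passing `boto3.client('sts')` in place of the fake client would perform the real call.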

Fallback to boto3 default chain from `luigi/contrib/s3.py:131-132`:

logger.debug('no credentials provided, delegating credentials resolution to boto3')

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `Loading S3 module without the python package boto3` | boto3 not installed | `pip install boto3` |
| `NoCredentialsError` | No AWS credentials found | Configure credentials via env vars, config, or IAM role |
| `ClientError: Access Denied` | Insufficient S3 permissions | Check IAM policy for the bucket/key |
| `S3EmrTarget deprecated` | Using deprecated S3EmrTarget class | Migrate to S3Target |
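
One way to turn the credential and permission rows above into code is sketched below. It is deliberately duck-typed so that it runs without botocore: the two exception classes are hypothetical stand-ins that share names with `botocore.exceptions.NoCredentialsError` and `botocore.exceptions.ClientError`, and `suggest_fix` is an illustrative helper, not a Luigi or boto3 API. The error codes checked (`AccessDenied`, `403`) are assumptions about what S3 typically returns for permission failures.

```python
class NoCredentialsError(Exception):
    """Stand-in named after botocore.exceptions.NoCredentialsError."""


class ClientError(Exception):
    """Stand-in that carries a botocore-style `response` dict."""

    def __init__(self, response):
        super().__init__(response)
        self.response = response


def suggest_fix(exc):
    """Map an exception to the matching advice from the error table."""
    name = type(exc).__name__
    if name == "NoCredentialsError":
        return "Configure credentials via env vars, config, or IAM role"
    if name == "ClientError":
        code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
        if code in ("AccessDenied", "403"):
            return "Check IAM policy for the bucket/key"
        return "Unhandled S3 error code: " + str(code)
    return "Not a credential or permission error"


denied = ClientError({"Error": {"Code": "AccessDenied", "Message": "Access Denied"}})
print(suggest_fix(denied))
print(suggest_fix(NoCredentialsError()))
```

In a real pipeline the stand-in classes would be dropped and the botocore exceptions caught directly in an `except` clause.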

Compatibility Notes

  • S3EmrTarget: Deprecated in favor of `S3Target`. A warning is emitted at `luigi/contrib/s3.py:731`.
  • Credential chain: When no explicit credentials are provided, boto3's default credential resolution is used, which checks environment variables, AWS config files, and EC2/ECS instance profiles in order.
  • STS role assumption: Supports cross-account access via IAM role assumption using `aws_role_arn`.
