Implementation: DataTalksClub Data Engineering Zoomcamp TOML Credentials Loader
| Page Metadata | |
|---|---|
| Knowledge Sources | repo: DataTalksClub/data-engineering-zoomcamp, dlt docs: dlt Documentation |
| Domains | Data_Engineering, Data_Ingestion |
| Last Updated | 2026-02-09 14:00 GMT |
Overview
A concrete tool for loading GCP service account credentials from a TOML configuration file and injecting them into the process environment as variables that the dlt framework can automatically discover.
Description
This implementation uses the toml Python library to parse a secrets file located at .dlt/secrets.toml. The file follows the dlt convention for storing credentials in a project-local configuration directory. After parsing, three specific credential fields -- project_id, private_key, and client_email -- are extracted from the [credentials] section and assigned to environment variables using the dlt double-underscore naming convention (CREDENTIALS__PROJECT_ID, CREDENTIALS__PRIVATE_KEY, CREDENTIALS__CLIENT_EMAIL).
This is a Pattern Doc implementation. The pattern bridges file-based secret storage with the dlt framework's environment-variable-based credential resolution. The dlt framework automatically looks for environment variables matching the CREDENTIALS__* pattern when authenticating with GCP BigQuery.
The expected TOML file structure is:
[credentials]
project_id = "your-gcp-project-id"
private_key = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
client_email = "service-account@project.iam.gserviceaccount.com"
Usage
Use this implementation when:
- Setting up a dlt pipeline that targets BigQuery as its destination
- GCP service account credentials must be loaded from a local TOML file rather than Application Default Credentials
- The dlt framework's built-in credential resolution via environment variables should be leveraged
- The .dlt/ directory convention is used for project-local configuration
Code Reference
Source Location: cohorts/2025/workshops/dynamic_load_dlt.py, lines 11-22
Signature:
config = toml.load("./.dlt/secrets.toml")
os.environ["CREDENTIALS__PROJECT_ID"] = config["credentials"]["project_id"]
os.environ["CREDENTIALS__PRIVATE_KEY"] = config["credentials"]["private_key"]
os.environ["CREDENTIALS__CLIENT_EMAIL"] = config["credentials"]["client_email"]
Import:
import os
import toml
I/O Contract
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| .dlt/secrets.toml | File (TOML) | Yes | TOML file containing a [credentials] section with GCP service account fields |
| credentials.project_id | str | Yes | GCP project identifier |
| credentials.private_key | str | Yes | GCP service account private key in PEM format |
| credentials.client_email | str | Yes | GCP service account email address |
Outputs:
| Output | Type | Description |
|---|---|---|
| CREDENTIALS__PROJECT_ID | Environment variable (str) | GCP project ID available to dlt and other libraries via os.environ |
| CREDENTIALS__PRIVATE_KEY | Environment variable (str) | GCP private key available to dlt and other libraries via os.environ |
| CREDENTIALS__CLIENT_EMAIL | Environment variable (str) | GCP client email available to dlt and other libraries via os.environ |
Usage Examples
Basic credential loading:
import os
import toml
# Load the TOML secrets file
config = toml.load("./.dlt/secrets.toml")
# Inject credentials into environment variables
os.environ["CREDENTIALS__PROJECT_ID"] = config["credentials"]["project_id"]
os.environ["CREDENTIALS__PRIVATE_KEY"] = config["credentials"]["private_key"]
os.environ["CREDENTIALS__CLIENT_EMAIL"] = config["credentials"]["client_email"]
# Verify credentials are set
print(os.environ.get("CREDENTIALS__PROJECT_ID"))
Using with dlt pipeline (credentials are auto-discovered):
import os
import toml
import dlt
# Load credentials first
config = toml.load("./.dlt/secrets.toml")
os.environ["CREDENTIALS__PROJECT_ID"] = config["credentials"]["project_id"]
os.environ["CREDENTIALS__PRIVATE_KEY"] = config["credentials"]["private_key"]
os.environ["CREDENTIALS__CLIENT_EMAIL"] = config["credentials"]["client_email"]
# dlt automatically resolves BigQuery credentials from environment
pipeline = dlt.pipeline(
pipeline_name="my_pipeline",
destination="bigquery",
dataset_name="my_dataset"
)