Environment:DataExpert io Data engineer handbook Python Development Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Development |
| Last Updated | 2026-02-09 06:00 GMT |
Overview
Python 3.11+ development environment with Docker, required across all bootcamp modules.
Description
This environment defines the base software prerequisites shared across all Data Engineer Handbook bootcamp modules. It requires Python 3.11 or higher for local development and Docker for running containerized infrastructure. The intermediate bootcamp additionally requires a SQL editor such as DataGrip. Individual modules layer their own specific dependencies (PySpark, Flink, Flask) on top of this base environment.
Usage
Use this environment as the base prerequisite for all bootcamp workflows. Every module assumes Docker and Python are installed locally. The PySpark Testing workflow specifically requires local Python with pytest and chispa for running tests outside Docker.
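As a minimal sketch of how the PySpark Testing prerequisite might be checked, the snippet below tests whether `pytest` and `chispa` are importable in the local interpreter. The helper name `missing_packages` and the suggested `pip install` line are illustrative assumptions, not part of the handbook repository.

```python
from importlib.util import find_spec

# Packages the Usage section names for the PySpark Testing workflow.
REQUIRED_PACKAGES = ("pytest", "chispa")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be found by this interpreter.

    Hypothetical helper for illustration; it only checks top-level packages.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        # Assumption: packages are installed with pip into the active environment.
        print("Install with: python3 -m pip install " + " ".join(missing))
    else:
        print("pytest and chispa are importable")
```

Run it with the same interpreter that will run the tests, since a virtual environment and the system Python can have different packages installed.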
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | Docker Desktop required on macOS/Windows |
| Language | Python 3.11 or higher | As specified in bootcamp prerequisites |
| Software | Docker (Docker Desktop or Docker Engine) | Required for all infrastructure modules |
| Software | DataGrip or any SQL editor | Required for intermediate bootcamp SQL exercises |
| Software | Git | For cloning the repository |
Dependencies
System Packages
- Python >= 3.11
- Docker (Docker Desktop on macOS/Windows, Docker Engine on Linux)
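- pip (Python package manager; included with the python.org installers, but packaged separately on some Linux distributions, e.g. `python3-pip` on Debian/Ubuntu)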
- Git (for repository access)
SQL Tooling
- DataGrip (recommended) or any SQL editor capable of connecting to PostgreSQL
Credentials
No credentials are required for the base environment. Individual modules require their own credentials:
- See Environment:DataExpert_io_Data_engineer_handbook_PostgreSQL_Docker_Environment for database credentials
- See Environment:DataExpert_io_Data_engineer_handbook_Flink_Kafka_Docker_Environment for Kafka and API credentials
- See Environment:DataExpert_io_Data_engineer_handbook_Statsig_API_Environment for A/B testing credentials
Quick Install
# Verify Python version (must be 3.11+)
python3 --version
# Verify Docker is installed
docker --version
docker compose version
# Clone the repository
git clone https://github.com/DataExpert-io/data-engineer-handbook.git
cd data-engineer-handbook
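The manual checks above can also be scripted. The following is a rough sketch, assuming the only base requirements are Python 3.11+ plus `docker` and `git` on `PATH`; the function names `python_ok` and `missing_tools` are hypothetical and do not come from the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the bootcamp prerequisites
REQUIRED_TOOLS = ("docker", "git")  # executables the Quick Install commands call


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.11+ required, found", sys.version.split()[0])
    for tool in missing_tools():
        print(tool, "not found on PATH")
```

The `version_info` and `which` parameters exist only so the checks can be exercised without changing the machine, for example `python_ok((3, 10, 9))` evaluates to `False`.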
Code Evidence
Beginner bootcamp prerequisites from `beginner-bootcamp/software.md:1-4`:
Make sure your computer can run:
- Docker (install guide here)
- Python 3.11 (or higher)
Intermediate bootcamp prerequisites from `intermediate-bootcamp/software.md:1-7`:
Make sure your computer can run:
- Docker (install guide here)
- Python 3.11 (or higher)
- DataGrip (install here) (or any other SQL editor)
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `python3: command not found` | Python not installed or not in PATH | Install Python 3.11+ from python.org or via a package manager; on Windows the interpreter may be available as `python` or `py` rather than `python3` |
| `docker: command not found` | Docker not installed | Install Docker Desktop (macOS/Windows) or Docker Engine (Linux) |
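| `docker compose` not recognized | Compose v2 plugin missing; only the legacy `docker-compose` (v1) binary, or no Compose at all, is installed | Upgrade to Docker Compose v2 (bundled with Docker Desktop; on Linux with Docker Engine, install the Compose plugin) |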
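To tell the Docker rows of this table apart programmatically, one possible approach is sketched below. It assumes that `docker compose version` exits with a non-zero status when the Compose v2 plugin is absent; the function name `compose_status` and its return labels are illustrative only.

```python
import shutil
import subprocess


def compose_status(which=shutil.which, run=subprocess.run):
    """Classify Docker Compose availability.

    Returns one of: 'no-docker', 'v2', 'standalone-only', 'no-compose'.
    Hypothetical helper; the labels are not Docker terminology.
    """
    if which("docker") is None:
        return "no-docker"
    # Assumption: this subcommand succeeds only when the v2 plugin is present.
    result = run(["docker", "compose", "version"], capture_output=True, text=True)
    if result.returncode == 0:
        return "v2"
    # A standalone `docker-compose` binary is how Compose v1 was distributed
    # (a standalone v2 build is also possible, so the label stays neutral).
    if which("docker-compose") is not None:
        return "standalone-only"
    return "no-compose"


if __name__ == "__main__":
    print(compose_status())
```

A result of `standalone-only` corresponds to the last row of the table, and `no-docker` to the row above it.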
Compatibility Notes
- Python Version Conflict: The host requirement is Python 3.11+, while the Flink module uses Python 3.7.9 inside its Docker container. The two do not conflict because Flink jobs run with the container's interpreter, not the host's.
- macOS: Docker Desktop for Mac is required. Allocate at least 4GB RAM in Docker settings for Spark workloads.
- Windows: Docker Desktop is expected to run with the WSL 2 backend. Native Windows Python works for local development.