Implementation:DataExpert io Data engineer handbook Pg restore Init Script
Overview
This page documents the init-db.sh shell script used to seed a PostgreSQL database within the Dimensional Data Modeling workflow. The script restores a base dataset from a pg_dump archive and then iteratively applies supplementary SQL homework files.
Type
API Doc (Shell Script)
Source
- Repository: https://github.com/DataExpert-io/data-engineer-handbook
- File: intermediate-bootcamp/materials/1-dimensional-data-modeling/scripts/init-db.sh (Lines 1-29)
Source Code
#!/bin/bash
set -e
pg_restore -v --no-owner --no-privileges -U $POSTGRES_USER -d $POSTGRES_DB /docker-entrypoint-initdb.d/data.dump
if [ -d /docker-entrypoint-initdb.d/homework ]; then
    for f in /docker-entrypoint-initdb.d/homework/*.sql; do
        if [ -f "$f" ]; then
            psql -U $POSTGRES_USER -d $POSTGRES_DB -f $f
        fi
    done
fi
Detailed Breakdown
Line-by-Line Analysis
| Line(s) | Code | Purpose |
|---|---|---|
| 1 | #!/bin/bash | Shebang line specifying the Bash interpreter. |
| 2 | set -e | Enables exit-on-error mode. If a command returns a non-zero exit code, the script terminates immediately, so later steps never run against a partially restored database. Objects already restored before the failure are not rolled back. |
| 3 | pg_restore -v --no-owner --no-privileges -U $POSTGRES_USER -d $POSTGRES_DB /docker-entrypoint-initdb.d/data.dump | Restores the base dataset from the custom-format dump file. Flags explained below. |
| 4 | if [ -d /docker-entrypoint-initdb.d/homework ]; then | Checks whether the homework directory exists before attempting to iterate over its contents. |
| 5-9 | for f in ... done | Iterates over all .sql files in the homework directory and executes each one against the database using psql. The inner [ -f "$f" ] test skips the unexpanded glob pattern when the directory contains no .sql files. |
| 10 | fi | Closes the outer conditional block. |
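The two guards around the loop (the -d test on the directory and the -f test on each match) can be tried in isolation. The following is a minimal sketch, not part of init-db.sh: the function name list_sql_files and the file names are hypothetical, and it only prints the files the loop would hand to psql.

```shell
#!/bin/bash
# Print every regular *.sql file in a directory, one per line, in glob order.
# Mirrors the two guards in init-db.sh: -d on the directory, -f on each match.
# list_sql_files is a hypothetical helper used only for this illustration.
list_sql_files() {
    local dir="$1"
    [ -d "$dir" ] || return 0          # missing directory: nothing to do
    local f
    for f in "$dir"/*.sql; do
        # With no matches, bash leaves the pattern itself in $f
        # (".../*.sql"), which is not a file, so the -f test filters it out.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Demonstration in a throwaway directory (file names are made up).
demo_dir="$(mktemp -d)"
list_sql_files "$demo_dir"             # prints nothing: no .sql files yet
touch "$demo_dir/02_views.sql" "$demo_dir/01_tables.sql" "$demo_dir/notes.txt"
list_sql_files "$demo_dir"             # prints 01_tables.sql, then 02_views.sql
rm -rf "$demo_dir"
```

Because the glob expands in sorted order, prefixing homework files with numbers is one way to control the order in which they are applied.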
pg_restore Flags
- -v (verbose) -- Outputs detailed progress information during the restore process, useful for debugging.
- --no-owner -- Skips restoration of object ownership. This is essential in containerized environments where the dump's original owner may not exist.
- --no-privileges -- Skips restoration of access privileges (GRANT/REVOKE statements). Prevents permission errors in environments with different role configurations.
- -U $POSTGRES_USER -- Specifies the PostgreSQL user to connect as, injected via environment variable.
- -d $POSTGRES_DB -- Specifies the target database name, injected via environment variable.
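As an illustration of how these flags fit together, the sketch below assembles the same pg_restore invocation as a Bash array and prints it instead of running it. This is not how init-db.sh is written: the function name build_restore_cmd, the RESTORE_CMD variable, and the fallback values used when the environment variables are unset are all assumptions made for the example.

```shell
#!/bin/bash
# Build the pg_restore argument list as an array so each value stays one word,
# even if a user or database name contains spaces.
# build_restore_cmd and RESTORE_CMD are hypothetical names; the "postgres"
# fallbacks are placeholders used only when the variables are unset.
build_restore_cmd() {
    local user="${POSTGRES_USER:-postgres}"
    local db="${POSTGRES_DB:-postgres}"
    local dump="${1:-/docker-entrypoint-initdb.d/data.dump}"
    RESTORE_CMD=(
        pg_restore
        -v                # verbose progress output
        --no-owner        # do not restore object ownership
        --no-privileges   # do not restore GRANT/REVOKE statements
        -U "$user"        # role to connect as
        -d "$db"          # target database
        "$dump"           # archive to restore
    )
}

build_restore_cmd
# Dry run: print the command, shell-quoted, instead of executing it.
printf '%q ' "${RESTORE_CMD[@]}"
printf '\n'
# To execute it inside the container instead: "${RESTORE_CMD[@]}"
```

The dry-run line is a convenient way to check what the environment variables expand to before running a restore.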
Inputs and Outputs
Inputs
| Input | Type | Description |
|---|---|---|
| data.dump | File (pg_dump custom format) | The base dataset archive containing the actor_films table and related schema objects. Mounted into the container at /docker-entrypoint-initdb.d/data.dump. |
| homework/*.sql | File(s) (SQL scripts) | Optional supplementary SQL files that create additional tables, insert seed data, or define views for homework exercises. |
| $POSTGRES_USER | Environment Variable | The PostgreSQL superuser name used for authentication during restore and script execution. |
| $POSTGRES_DB | Environment Variable | The name of the target PostgreSQL database to seed. |
Outputs
| Output | Description |
|---|---|
| Populated PostgreSQL database | A fully initialized database containing the restored schema and data from data.dump, plus any additional objects created by the homework SQL scripts. |
Execution Context
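This script is designed to run inside a PostgreSQL Docker container as part of the container's initialization sequence. The official PostgreSQL Docker image automatically executes scripts placed in /docker-entrypoint-initdb.d/ when the container starts with an empty data directory, which in practice means the first start. If the data directory already contains a database (for example, from a persisted volume), the script is not run again.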
The execution flow is:
Docker container starts
--> PostgreSQL server initializes
--> /docker-entrypoint-initdb.d/init-db.sh is executed
--> pg_restore loads data.dump
--> psql executes each homework/*.sql file
--> Database is ready for connections
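A client that starts alongside the container usually has to wait for the last step of this flow. The sketch below shows one way to do that with a generic retry helper; wait_until is a hypothetical function, not part of the handbook, and the commented pg_isready call assumes that utility is installed on the client and that the server is published on localhost:5432.

```shell
#!/bin/bash
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# wait_until is a hypothetical helper written for this illustration.
wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0                   # command succeeded: stop waiting
        fi
        sleep "$delay"
    done
    return 1                           # budget exhausted without success
}

# Example call (host, port, and retry budget are assumptions):
# wait_until 30 1 pg_isready -h localhost -p 5432 -U "$POSTGRES_USER" -d "$POSTGRES_DB"
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can gate a follow-up step such as running verification queries.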
Error Handling
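The set -e directive ensures that:
- If pg_restore fails (e.g., corrupt dump file, missing database), the script exits immediately.
- If a psql invocation exits with a non-zero status (e.g., it cannot connect or cannot read the file), the script stops processing further files. SQL errors inside a file, such as a syntax error, do not trigger this on their own: psql continues past them and still exits with status 0 unless ON_ERROR_STOP is set.
- The container's entrypoint script detects the non-zero exit code and reports the initialization failure.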
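The exit-on-error behavior can be observed without a database. The sketch below runs a three-step script in a child Bash process, with false standing in for a failing pg_restore or psql call. It also defines psql_strict, a hypothetical wrapper that is not part of init-db.sh, showing how ON_ERROR_STOP could be passed so that SQL errors inside a file produce a non-zero exit status that set -e can act on.

```shell
#!/bin/bash
# Run a short script under `set -e` in a child bash process.
# `false` stands in for a failing pg_restore or psql call.
demo_exit_on_error() {
    bash -c '
        set -e
        echo "restore"
        false
        echo "homework"   # never reached: set -e stops at the failing command
    '
}

status=0
output="$(demo_exit_on_error)" || status=$?
echo "output=$output status=$status"   # output=restore status=1

# Hypothetical stricter wrapper (an assumption, not in the original script):
# with ON_ERROR_STOP set, psql stops at the first SQL error in the file and
# exits with a non-zero status instead of continuing.
psql_strict() {
    psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" -d "$POSTGRES_DB" -f "$1"
}
```

Only the definition of psql_strict is evaluated here; calling it requires psql and a reachable server.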
Related Pages
- Principle:DataExpert_io_Data_engineer_handbook_Database_Seeding -- The theoretical foundation for database initialization from dump files.
- Implementation:DataExpert_io_Data_engineer_handbook_Docker_Compose_PostgreSQL_Stack -- The Docker Compose configuration that mounts this script into the container.
- Implementation:DataExpert_io_Data_engineer_handbook_SQL_Select_Query_Pattern -- Verification queries executed after seeding to confirm data integrity.
- Environment:DataExpert_io_Data_engineer_handbook_PostgreSQL_Docker_Environment
- Heuristic:DataExpert_io_Data_engineer_handbook_Docker_Volume_Persistence_Management
Metadata
- Knowledge Sources: Data Engineer Handbook
- Domains: Data_Engineering, SQL, Infrastructure
- Last Updated: 2026-02-09 06:00 GMT