Environment:Spotify Luigi PostgreSQL Server

From Leeroopedia
Revision as of 18:39, 16 February 2026 by Admin (talk | contribs) (Auto-imported from environments/Spotify_Luigi_PostgreSQL_Server.md)


Knowledge Sources
Domains: Infrastructure, Database
Last Updated: 2026-02-10 07:00 GMT

Overview

A PostgreSQL server environment for Luigi database ingestion tasks, accessed through either the psycopg2 or the pg8000 Python driver.

Description

This environment provides the PostgreSQL connectivity required by Luigi's `postgres` contrib module (`luigi.contrib.postgres`). The module supports two Python database drivers: psycopg2 (the default, a C extension) and pg8000 (pure Python, no C compiler needed). The driver is selected by the `LUIGI_PGSQL_DRIVER` environment variable, which is read when the module is imported; if psycopg2 cannot be imported, the module falls back to pg8000. The module uses a marker table pattern to track task completion, which keeps data loading idempotent.
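
The marker table pattern can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server. The `load_once` helper, the `update_id` value, and the two-column marker layout are hypothetical and are not Luigi's actual schema or API; only the table name `table_updates` comes from this page.

```python
import sqlite3

MARKER_TABLE = "table_updates"  # same default name as Luigi's marker table


def load_once(conn, update_id, target_table, rows):
    """Insert rows only if update_id has no marker yet; return True if loaded."""
    cur = conn.cursor()
    # Hypothetical marker layout: one row per completed load.
    cur.execute(
        f"CREATE TABLE IF NOT EXISTS {MARKER_TABLE} "
        "(update_id TEXT PRIMARY KEY, target_table TEXT)"
    )
    cur.execute(f"SELECT 1 FROM {MARKER_TABLE} WHERE update_id = ?", (update_id,))
    if cur.fetchone() is not None:
        return False  # already complete: skip the load
    cur.executemany(f"INSERT INTO {target_table} VALUES (?, ?)", rows)
    cur.execute(
        f"INSERT INTO {MARKER_TABLE} VALUES (?, ?)", (update_id, target_table)
    )
    conn.commit()  # data and marker are committed together
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
rows = [(1, "ada"), (2, "grace")]
first = load_once(conn, "load_users_2026_02_10", "users", rows)
second = load_once(conn, "load_users_2026_02_10", "users", rows)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(first, second, row_count)
```

Running the same load twice inserts the rows only once, because the second call finds the marker and returns early.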

Usage

Use this environment for any pipeline that loads data into PostgreSQL using `CopyToTable` or `PostgresQuery`, or that tracks task completion with `PostgresTarget`. It is required for the Database_Ingestion_Pipeline workflow when targeting PostgreSQL.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux, macOS, Windows | Cross-platform |
| PostgreSQL | PostgreSQL server (any supported version) | Default port: 5432 |
| Network | TCP access to PostgreSQL server | Configurable host and port |

Dependencies

System Packages

  • PostgreSQL client libraries (for psycopg2; not needed for pg8000)

Python Packages

  • `psycopg2` < 3.0 (default driver, recommended)
  • `pg8000` >= 1.23.0 (alternative pure-Python driver)
  • `luigi` (core)

Credentials

The following must be provided to task constructors or via configuration:

  • `host`: PostgreSQL server hostname
  • `database`: Database name
  • `user`: Database username
  • `password`: Database password
  • `port`: Server port (default: 5432)

Environment variables:

  • `LUIGI_PGSQL_DRIVER`: Selects the Python database driver (default: `psycopg2`, alternative: `pg8000`)

Configuration in `luigi.cfg`:

  • `[postgres] marker-table`: Name of the marker table for completion tracking (default: `table_updates`)
  • `[postgres] local-tmp-dir`: Local temporary directory for data staging
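
As an illustration of the two `[postgres]` options, the sketch below parses a sample `luigi.cfg` with Python's standard `configparser`. Luigi reads the file through its own configuration layer, so this only shows the file layout and the default fallback, not how Luigi loads it. The values `my_marker_table` and `/tmp/luigi-staging` are made-up examples.

```python
import configparser

SAMPLE_LUIGI_CFG = """
[postgres]
marker-table = my_marker_table
local-tmp-dir = /tmp/luigi-staging
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_LUIGI_CFG)

# Fall back to the documented default when the option is absent.
marker_table = parser.get("postgres", "marker-table", fallback="table_updates")
local_tmp_dir = parser.get("postgres", "local-tmp-dir", fallback=None)
print(marker_table, local_tmp_dir)
```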

Quick Install

# Install Luigi with PostgreSQL support (psycopg2)
pip install luigi "psycopg2<3.0"

# Or with pure-Python pg8000 driver
pip install luigi "pg8000>=1.23.0"

Code Evidence

Driver selection via environment variable from `luigi/contrib/postgres.py:33`:

DB_DRIVER = os.environ.get('LUIGI_PGSQL_DRIVER', 'psycopg2')

psycopg2 import with fallback from `luigi/contrib/postgres.py:41-69`:

if DB_DRIVER == 'psycopg2':
    try:
        import psycopg2 as dbapi
        def update_error_codes():
            import psycopg2.errorcodes
            DB_ERROR_CODES.update({
                psycopg2.errorcodes.DUPLICATE_TABLE: ERROR_DUPLICATE_TABLE,
                psycopg2.errorcodes.UNDEFINED_TABLE: ERROR_UNDEFINED_TABLE,
            })
        update_error_codes()
    except ImportError:
        pass

if dbapi is None or DB_DRIVER == 'pg8000':
    try:
        import pg8000.dbapi as dbapi
        import pg8000.core
        DB_ERROR_CODES.update({
            '42P07': ERROR_DUPLICATE_TABLE,
            '42P01': ERROR_UNDEFINED_TABLE
        })
    except ImportError:
        pass

if dbapi is None:
    logger.warning("Loading postgres module without psycopg2 nor pg8000 installed. "
                   "Will crash at runtime if postgres functionality is used.")
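
The two strings in the pg8000 branch are PostgreSQL's standard SQLSTATE codes for `duplicate_table` (`42P07`) and `undefined_table` (`42P01`). The sketch below shows one way such a code could be pulled from either driver's exception and classified. The helper names are hypothetical and this is not Luigi's implementation; it assumes that psycopg2 exceptions expose a `pgcode` attribute and that pg8000 reports server errors as a dict in `args[0]` with the code under key `'C'`.

```python
SQLSTATE_LABELS = {
    "42P07": "duplicate_table",
    "42P01": "undefined_table",
}


def sqlstate_of(exc):
    """Best-effort SQLSTATE lookup for a DB-API exception (hypothetical helper)."""
    code = getattr(exc, "pgcode", None)  # psycopg2-style attribute
    if code is not None:
        return code
    # Assumed pg8000-style payload: a dict of error fields in args[0].
    if exc.args and isinstance(exc.args[0], dict):
        return exc.args[0].get("C")
    return None


def classify(exc):
    """Map an exception to a short label, or 'other' if unrecognized."""
    return SQLSTATE_LABELS.get(sqlstate_of(exc), "other")


class FakePsycopg2Error(Exception):
    pgcode = "42P01"  # stand-in for a real psycopg2 error


pg8000_like = Exception({"C": "42P07", "M": 'relation "t" already exists'})
print(classify(FakePsycopg2Error()), classify(pg8000_like), classify(ValueError("x")))
```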

Default port and marker table from `luigi/contrib/postgres.py:171-174`:

marker_table = luigi.configuration.get_config().get(
    'postgres', 'marker-table', 'table_updates')
# ...
DEFAULT_DB_PORT = 5432

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `Loading postgres module without psycopg2 nor pg8000 installed` | Neither PostgreSQL driver installed | `pip install psycopg2` or `pip install pg8000` |
| `connection refused` | PostgreSQL server not running or wrong host/port | Verify server is running and connection parameters |
| `ERROR_UNDEFINED_TABLE` on marker table | Marker table not yet created | Luigi auto-creates it; ensure DB user has CREATE TABLE privilege |
| `ERROR_DUPLICATE_TABLE` | Table already exists during creation | Expected during retry; Luigi handles this gracefully |
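
When diagnosing `connection refused`, it can help to test plain TCP reachability before looking at drivers or credentials. The sketch below uses only the standard library; `can_reach` is a hypothetical helper and `localhost` is an example host.

```python
import socket

DEFAULT_DB_PORT = 5432


def can_reach(host, port=DEFAULT_DB_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("PostgreSQL reachable:", can_reach("localhost"))
```

A `True` result only shows that something is listening on that port; it does not confirm that the listener is PostgreSQL or that the credentials are valid.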

Compatibility Notes

  • psycopg2 vs pg8000: psycopg2 is faster (C extension) but requires PostgreSQL client libraries to compile. pg8000 is pure Python and easier to install but slower.
  • Driver selection: Set `LUIGI_PGSQL_DRIVER=pg8000` to use the pure-Python driver without needing C compilation.
  • Connection reset: psycopg2 supports `connection.reset()` while pg8000 uses a different reset mechanism. Luigi handles both transparently.
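
To see which driver the fallback order in the Code Evidence section would likely pick on a given machine, the order can be mirrored in a small helper. This is a sketch under stated assumptions, not part of Luigi: `predict_driver` and `_installed` are hypothetical names, and the check only tests whether a package can be found, so a psycopg2 installation that is present but fails to import (for example, because of missing client libraries) would still be reported as available.

```python
import importlib.util
import os


def _installed(module_name):
    """True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def predict_driver(requested=None, is_installed=_installed):
    """Return 'psycopg2', 'pg8000', or None, following the fallback order."""
    if requested is None:
        requested = os.environ.get("LUIGI_PGSQL_DRIVER", "psycopg2")
    chosen = None
    if requested == "psycopg2" and is_installed("psycopg2"):
        chosen = "psycopg2"
    # pg8000 is tried when explicitly requested or when nothing loaded so far.
    if (chosen is None or requested == "pg8000") and is_installed("pg8000"):
        chosen = "pg8000"
    return chosen


print("Driver Luigi would likely load:", predict_driver())
```

Because `LUIGI_PGSQL_DRIVER` is read at module level, it needs to be set before `luigi.contrib.postgres` is imported for the choice to take effect.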
