Heuristic:TobikoData Sqlmesh Fork Worker Tuning



Knowledge Sources
Domains Performance, Configuration
Last Updated 2026-02-07 21:00 GMT

Overview

SQLMesh sizes its pool of fork workers for parallel model loading from the process's CPU affinity rather than the machine's total CPU count, which helps avoid oversubscription and OOM kills.

Description

When loading a project, SQLMesh uses fork-based multiprocessing to parallelize model loading and validation. Instead of using the total CPU count, it reads the process's CPU affinity mask with os.sched_getaffinity(0), which reflects restrictions such as taskset pinning or cpuset-based cgroup and container limits. Quota-based limits (for example a CFS CPU quota) do not change the affinity mask, so they may still call for an explicit MAX_FORK_WORKERS setting. Sizing the pool this way avoids spawning more workers than the process has CPUs for, which could otherwise cause oversubscription, memory exhaustion, or kill signals from the OS.

Usage

Apply this heuristic when:

  • Running SQLMesh in containerized environments with CPU limits
  • Experiencing OOM errors or process kills during project loading
  • Debugging parallel loading issues (set MAX_FORK_WORKERS=1 to disable forking)
  • Optimizing load time for large projects with many models
  • Running in environments with taskset or cgroup CPU restrictions

The Insight (Rule of Thumb)

  • Action: Use os.sched_getaffinity(0) instead of os.cpu_count() to determine worker count
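  • Value: Defaults to len(os.sched_getaffinity(0)) workers; None (use the default process pool) where os.sched_getaffinity is unavailable; 1 (no forking) if os.fork is unavailable or the process is a daemon. When forking is possible, the MAX_FORK_WORKERS environment variable overrides the default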
  • Trade-off: More workers = faster loading, but oversubscription can cause OOM/kill signals

Reasoning

The key insight is that os.cpu_count() returns the number of logical CPUs in the system, which may exceed the number actually available to the process. In containerized environments (Docker, Kubernetes) or under CPU pinning (taskset, cpusets), the process may be allowed to run on only a subset of them. os.sched_getaffinity(0) returns the set of CPUs the process is allowed to run on, so sizing the worker pool from it avoids creating workers that would compete for the same limited cores.
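The difference between the two calls can be checked directly. The following is a minimal sketch using only the standard library; the helper name cpu_report is hypothetical and is not part of SQLMesh.

```python
import os
from typing import Dict, Optional


def cpu_report() -> Dict[str, Optional[int]]:
    """Compare the machine-wide CPU count with the CPUs this process may run on."""
    # Logical CPUs in the system; may be None if it cannot be determined
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Size of the affinity mask of the current process (pid 0 means "this process")
        usable: Optional[int] = len(os.sched_getaffinity(0))
    else:
        # sched_getaffinity is not available on every platform (e.g. macOS, Windows)
        usable = None
    return {"total": total, "usable": usable}


if __name__ == "__main__":
    print(cpu_report())
```

On a Linux host, running this script under taskset (for example pinned to two CPUs) should show "usable" dropping to the pinned count while "total" stays at the machine-wide value.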

Additionally, the system disables forking (MAX_FORK_WORKERS=1) when:

  1. os.fork() is not available (e.g., Windows)
  2. Running as a daemon process (forking from daemon processes is problematic)

This conservative approach prioritizes stability over performance in edge cases.

Code Evidence

# sqlmesh/core/constants.py:36-48

# The maximum number of fork processes, used for loading projects
# None means default to process pool, 1 means don't fork, :N is number of processes
# Factors in the number of available CPUs even if the process is bound to a subset of them
# (e.g. via taskset) to avoid oversubscribing the system and causing kill signals
if hasattr(os, "fork") and not mp.current_process().daemon:
    try:
        MAX_FORK_WORKERS: t.Optional[int] = int(os.getenv("MAX_FORK_WORKERS"))  # type: ignore
    except TypeError:
        MAX_FORK_WORKERS = (
            len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None  # type: ignore
        )
else:
    MAX_FORK_WORKERS = 1
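
The decision order in the excerpt above can be restated as a pure function, which makes each branch easy to test in isolation. This is an illustrative sketch only: resolve_max_fork_workers and its parameters are hypothetical names, not SQLMesh API. It assumes the same precedence as the excerpt: the fork and daemon checks come first, then the environment variable, then the affinity count.

```python
from typing import Optional


def resolve_max_fork_workers(
    env_value: Optional[str],
    can_fork: bool,
    is_daemon: bool,
    affinity_count: Optional[int],
) -> Optional[int]:
    """Mirror the branch order of the excerpt (hypothetical helper).

    env_value      -- raw MAX_FORK_WORKERS string, or None if unset
    can_fork       -- whether os.fork exists on this platform
    is_daemon      -- whether the current process is a daemon
    affinity_count -- len(os.sched_getaffinity(0)), or None if unsupported
    """
    if not can_fork or is_daemon:
        # Forking is unavailable or unsafe: the environment variable is ignored
        return 1
    if env_value is not None:
        # Like the excerpt, a non-numeric value raises ValueError here
        return int(env_value)
    # None means "use the default process pool"
    return affinity_count
```

For example, resolve_max_fork_workers(None, True, False, 4) gives 4, while any call with can_fork=False gives 1 regardless of the environment value. Note that the excerpt only catches TypeError (the unset case), so a non-numeric MAX_FORK_WORKERS value would raise ValueError rather than fall back to the affinity count.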

Environment variable override:

# Disable forking entirely for debugging
export MAX_FORK_WORKERS=1

# Set explicit worker count
export MAX_FORK_WORKERS=4
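
The same override can be applied from Python when SQLMesh is launched as a child process. The sketch below builds an environment mapping with MAX_FORK_WORKERS set; env_with_fork_workers is a hypothetical helper, and rejecting values below 1 is an assumption of this sketch, not documented SQLMesh behavior. Because the excerpt above reads the variable at module level, the value has to be in place before SQLMesh's constants module is first imported.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_fork_workers(
    workers: int, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of an environment with MAX_FORK_WORKERS set (hypothetical helper)."""
    if workers < 1:
        # Assumption of this sketch: only positive worker counts are meaningful
        raise ValueError("workers must be >= 1")
    # Copy so the caller's environment (or os.environ) is never mutated
    env = dict(os.environ if base is None else base)
    env["MAX_FORK_WORKERS"] = str(workers)
    return env


# Possible usage (assumes the sqlmesh CLI is installed and on PATH):
#   import subprocess
#   subprocess.run(["sqlmesh", "plan"], env=env_with_fork_workers(1))
```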
