

Implementation:Datahub project Datahub Pre Commit Config

From Leeroopedia
Revision as of 14:43, 16 February 2026 by Admin (auto-imported from implementations/Datahub_project_Datahub_Pre_Commit_Config.md)


Knowledge Sources
Domains: DevOps, CodeQuality, CI_CD
Last Updated: 2026-02-10 00:00 GMT

Overview

Auto-generated pre-commit hooks configuration that enforces code formatting and linting standards across the entire DataHub monorepo before commits are accepted.

Description

The .pre-commit-config.yaml file is generated automatically by the script .github/scripts/generate_pre_commit.py and defines a comprehensive set of local pre-commit hooks. Do not edit this file directly: place overrides in .github/scripts/pre-commit-override.yaml and re-run the generator script.
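
The override workflow can be pictured as a merge keyed by hook id. The sketch below is hypothetical: it assumes that both the generated hooks and the overrides are lists of dicts with an id field, that an override's fields replace those of the generated hook with the same id, and that overrides with unknown ids are appended as new hooks. The real schema of pre-commit-override.yaml and the merge logic in generate_pre_commit.py may differ; the hook values are illustrative only.

```python
from copy import deepcopy
from typing import Dict, List


def merge_overrides(generated: List[Dict], overrides: List[Dict]) -> List[Dict]:
    """Hypothetical merge: override fields win for a matching id; new ids are appended."""
    merged = deepcopy(generated)  # never mutate the generated list in place
    by_id = {hook["id"]: hook for hook in merged}
    for override in overrides:
        hook_id = override["id"]
        if hook_id in by_id:
            by_id[hook_id].update(override)  # replace only the fields the override sets
        else:
            new_hook = deepcopy(override)
            merged.append(new_hook)
            by_id[hook_id] = new_hook
    return merged


# Illustrative values, not copied from the real generated file.
generated = [{
    "id": "metadata-io-spotless",
    "name": "metadata-io Spotless Apply",
    "entry": "./gradlew :metadata-io:spotlessApply -x generateGitPropertiesGlobal",
    "language": "system",
    "files": r"^metadata-io/.*\.java$",
    "pass_filenames": False,
}]
overrides = [
    # Narrow the scope of an existing hook.
    {"id": "metadata-io-spotless", "files": r"^metadata-io/src/.*\.java$"},
    # Add a hypothetical new hook.
    {"id": "hypothetical-extra-hook", "name": "Extra check", "entry": "true",
     "language": "system", "pass_filenames": False},
]
result = merge_overrides(generated, overrides)
```

Copying before merging keeps the generator's own output untouched, so the same generated list can be re-merged with a different override file.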

The configuration covers three main categories of hooks. Spotless Apply hooks format Java code across approximately 40 Gradle subprojects (such as metadata-service, metadata-io, datahub-graphql-core, and metadata-models). Lint Fix hooks format Python code with ruff in modules such as metadata-ingestion, datahub-actions, smoke-test, and various ingestion plugin modules. Prettier hooks format Markdown and GitHub Actions YAML files. Each hook is scoped to specific file paths by a regex in its files field, so only changes to relevant files trigger the corresponding formatting task.
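
pre-commit matches a hook's files pattern against each repository-relative path with Python's re.search, so a pattern without a leading ^ can match anywhere in the path. The sketch below imitates that scoping step. The patterns and the markdown-prettier id are hypothetical stand-ins for the generated ones.

```python
import re
from typing import List

# Hypothetical hook patterns; the real regexes are generated per module.
HOOK_PATTERNS = {
    "metadata-io-spotless": r"^metadata-io/.*\.java$",
    "metadata-ingestion-lint-fix": r"^metadata-ingestion/.*\.py$",
    "markdown-prettier": r"\.md$",
}


def hooks_triggered(staged_paths: List[str]) -> List[str]:
    """Return ids of hooks whose files regex matches at least one staged path.

    Like pre-commit, this uses re.search rather than re.fullmatch.
    """
    triggered = []
    for hook_id, pattern in HOOK_PATTERNS.items():
        regex = re.compile(pattern)
        if any(regex.search(path) for path in staged_paths):
            triggered.append(hook_id)
    return triggered


print(hooks_triggered(["metadata-io/src/main/java/Foo.java", "docs/how-to.md"]))
# ['metadata-io-spotless', 'markdown-prettier']
```

Anchoring each module pattern with ^ keeps a change in one subproject from waking up the Gradle task of a similarly named one.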

Additionally, the configuration includes utility hooks for updating lineage files, checking Gradle lockfile consistency, and validating quickstart Docker Compose configurations. All Gradle-based hooks include the -x generateGitPropertiesGlobal flag to exclude git property generation, which prevents failures when using git worktrees. Every hook uses pass_filenames: false because the Gradle tasks handle file discovery internally.
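
The effect of pass_filenames can be seen by approximating how pre-commit assembles a system hook's command line: the entry is split shell-style, and the matched filenames are appended only when pass_filenames is true. This is a simplified sketch (pre-commit also appends any args and may split long file lists across several invocations). Gradle's -x (long form --exclude-task) skips the named task; the spotlessApply task name in the example is illustrative.

```python
import shlex
from typing import List


def hook_argv(entry: str, matched_files: List[str], pass_filenames: bool) -> List[str]:
    """Simplified model of the argv pre-commit builds for a system hook."""
    argv = shlex.split(entry)  # split the entry like a shell would
    if pass_filenames:
        argv.extend(matched_files)  # only appended when pass_filenames is true
    return argv


entry = "./gradlew :metadata-io:spotlessApply -x generateGitPropertiesGlobal"
print(hook_argv(entry, ["metadata-io/src/main/java/Foo.java"], pass_filenames=False))
# ['./gradlew', ':metadata-io:spotlessApply', '-x', 'generateGitPropertiesGlobal']
```

With pass_filenames: false the Gradle task runs once with a fixed command line, regardless of how many matching files are staged.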

Usage

This file is consumed by the pre-commit framework (https://pre-commit.com). Developers install the hooks via pre-commit install in their local repository clone. Once installed, relevant hooks run automatically on each git commit, formatting changed files and preventing commits that do not meet project standards. It is also used in CI pipelines to enforce consistent formatting across all contributions.
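
In CI, the same hooks can be run over the whole tree so that any unformatted file fails the build. The wrapper below is a minimal sketch, not DataHub's actual CI job: it runs pre-commit run with the real --all-files and --show-diff-on-failure flags, optionally limited to one hook id, and returns the exit code. The function names are hypothetical, and the runner is injectable so the logic can be exercised without pre-commit installed.

```python
import subprocess
from typing import Callable, List, Optional


def build_precommit_command(hook_id: Optional[str] = None) -> List[str]:
    """Command for a whole-tree check; hook_id limits the run to a single hook."""
    cmd = ["pre-commit", "run"]
    if hook_id is not None:
        cmd.append(hook_id)  # pre-commit accepts a hook id as a positional argument
    # --all-files ignores staging; --show-diff-on-failure prints what the hooks changed.
    cmd += ["--all-files", "--show-diff-on-failure"]
    return cmd


def run_precommit_in_ci(
    hook_id: Optional[str] = None,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run pre-commit and return its exit code (non-zero should fail the CI job)."""
    return runner(build_precommit_command(hook_id)).returncode
```

A CI step could end with sys.exit(run_precommit_in_ci()) so that formatting changes or failed checks fail the job.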

Code Reference

Source Location

Signature

# Auto-generated by .github/scripts/generate_pre_commit.py
repos:
  - repo: local
    hooks:
      - id: <hook-id>
        name: <hook-name>
        entry: ./gradlew :<module>:<task> -x generateGitPropertiesGlobal
        language: system
        files: <regex-pattern>
        pass_filenames: false
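
A generator that emits entries in this shape might build each Gradle hook from a hook id, a display name, a module, a task, and a path regex. The function below is a hypothetical sketch of that step (make_gradle_hook and its parameters are assumptions, not taken from generate_pre_commit.py). It fills the same fields as the template above.

```python
from typing import Dict, Union


def make_gradle_hook(hook_id: str, name: str, module: str, task: str,
                     files_regex: str) -> Dict[str, Union[str, bool]]:
    """Hypothetical helper producing one local Gradle hook entry."""
    return {
        "id": hook_id,
        "name": name,
        # Skip git property generation so the hook also works inside git worktrees.
        "entry": f"./gradlew :{module}:{task} -x generateGitPropertiesGlobal",
        "language": "system",
        "files": files_regex,
        # Gradle discovers the files itself, so pre-commit should not append them.
        "pass_filenames": False,
    }


hook = make_gradle_hook(
    hook_id="metadata-io-spotless",
    name="metadata-io Spotless Apply",
    module="metadata-io",
    task="spotlessApply",
    files_regex=r"^metadata-io/.*\.java$",
)
print(hook["entry"])
# ./gradlew :metadata-io:spotlessApply -x generateGitPropertiesGlobal
```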

Import

# Install pre-commit hooks in your local repository
pre-commit install

# Run all hooks against all files manually
pre-commit run --all-files

# Run a specific hook by ID
pre-commit run metadata-ingestion-lint-fix --all-files

I/O Contract

Inputs

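Name | Type | Required | Description
Staged files | File paths | Yes | Git-staged files; a hook runs only if at least one of them matches its files regex. Because pass_filenames is false, the paths are not passed to the hook.
.github/scripts/pre-commit-override.yaml | YAML | No | Override configuration for additional or modified hooks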

Outputs

Name | Type | Description
Formatted files | Modified files | Files reformatted in place by Spotless, ruff, or Prettier
Exit code | Integer | 0 if all hooks pass; non-zero if any hook modified files or a check failed
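
pre-commit reports a hook as failed when it exits non-zero or when it modified files, which is why a run that only reformats code still blocks the commit until the changes are staged. The sketch below imitates that rule using content hashes over a directory. The toy formatter is a stand-in for Spotless, ruff, or Prettier, and pre-commit itself detects modifications by comparing git diff output before and after the hook, not by hashing.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root to a SHA-256 digest of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def run_hook(root: Path, formatter: Callable[[Path], int]) -> int:
    """Return 0 only if the formatter succeeds AND leaves every file unchanged."""
    before = snapshot(root)
    returncode = formatter(root)
    files_modified = snapshot(root) != before
    return 1 if (returncode != 0 or files_modified) else 0


def strip_trailing_spaces(root: Path) -> int:
    """Toy formatter: remove trailing whitespace from every .txt file."""
    for p in root.rglob("*.txt"):
        lines = p.read_text().splitlines()
        p.write_text("\n".join(line.rstrip() for line in lines) + "\n")
    return 0


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("hello   \n")
    first = run_hook(root, strip_trailing_spaces)   # 1: the formatter changed a file
    second = run_hook(root, strip_trailing_spaces)  # 0: nothing left to change
print(first, second)
```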

Usage Examples

# Install the pre-commit hooks after cloning the repository
cd datahub
pre-commit install

# Manually run all hooks on all files (useful for initial setup)
pre-commit run --all-files

# Regenerate this configuration file after modifying overrides
python .github/scripts/generate_pre_commit.py

# Run only the Java Spotless hook for metadata-io (against staged files only)
pre-commit run metadata-io-spotless

# Run only the Python lint fix hook for metadata-ingestion (against staged files only)
pre-commit run metadata-ingestion-lint-fix
