Implementation:Datahub project Datahub Pre Commit Config
| Knowledge Sources | |
|---|---|
| Domains | DevOps, CodeQuality, CI_CD |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Auto-generated pre-commit hooks configuration that enforces code formatting and linting standards across the entire DataHub monorepo before commits are accepted.
Description
The .pre-commit-config.yaml file defines a comprehensive set of local pre-commit hooks that are automatically generated by the script .github/scripts/generate_pre_commit.py. This file should not be edited directly; instead, overrides should be placed in .github/scripts/pre-commit-override.yaml and the generator script should be re-run.
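One way to picture the override step is as a merge keyed by hook id. The sketch below is illustrative only: the function name `apply_overrides`, the merge rule, and the hook `example-extra-hook` are assumptions, and the actual behavior of generate_pre_commit.py and the layout of pre-commit-override.yaml may differ.

```python
from typing import Any, Dict, List

Hook = Dict[str, Any]


def apply_overrides(generated: List[Hook], overrides: List[Hook]) -> List[Hook]:
    """Merge override hooks into generated hooks, keyed by hook id.

    Assumption: an override whose id matches a generated hook updates that
    hook's fields, and an override with a new id is appended at the end.
    """
    # Copy each generated hook so the inputs are left untouched.
    merged = {hook["id"]: dict(hook) for hook in generated}
    for override in overrides:
        merged.setdefault(override["id"], {}).update(override)
    return list(merged.values())


generated = [
    {"id": "metadata-io-spotless", "language": "system", "pass_filenames": False},
]
overrides = [
    # Modifies an existing hook (hypothetical field value).
    {"id": "metadata-io-spotless", "name": "metadata-io Spotless Apply"},
    # Adds a new hook (hypothetical id).
    {"id": "example-extra-hook", "language": "system"},
]

for hook in apply_overrides(generated, overrides):
    print(hook)
```

Whatever the real merge rule is, the workflow stays the same: edit the override file, re-run the generator, and commit the regenerated .pre-commit-config.yaml.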
The configuration covers three main categories of hooks: Spotless Apply hooks for Java code formatting across approximately 40 different Gradle subprojects (such as metadata-service, metadata-io, datahub-graphql-core, and metadata-models); Lint Fix hooks for Python code formatting via ruff across modules like metadata-ingestion, datahub-actions, smoke-test, and various ingestion plugin modules; and Prettier hooks for Markdown and GitHub Actions YAML formatting. Each hook is scoped to specific file path patterns using regex, ensuring that only relevant files trigger the corresponding formatting task.
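The path scoping described above can be illustrated with a short sketch. pre-commit applies a hook's `files` regex to each repo-relative path using search semantics, so patterns are typically anchored with `^`. The pattern and paths below are hypothetical examples, not values copied from the generated file.

```python
import re
from typing import List

# Hypothetical pattern; the real patterns live in .pre-commit-config.yaml.
SPOTLESS_FILES = r"^metadata-io/.*\.java$"


def hook_is_triggered(files_pattern: str, staged_paths: List[str]) -> bool:
    """Return True if any staged path matches the hook's `files` regex."""
    pattern = re.compile(files_pattern)
    # search() mirrors how pre-commit tests each path against the pattern.
    return any(pattern.search(path) for path in staged_paths)


print(hook_is_triggered(SPOTLESS_FILES, ["metadata-io/src/main/java/Foo.java"]))  # True
print(hook_is_triggered(SPOTLESS_FILES, ["docs/README.md"]))  # False
```

Because each hook only fires when a staged path matches its pattern, a commit that touches a single module does not pay for formatting tasks in the other modules.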
Additionally, the configuration includes utility hooks for updating lineage files, checking Gradle lockfile consistency, and validating quickstart Docker Compose configurations. All Gradle-based hooks include the -x generateGitPropertiesGlobal flag to exclude git property generation, which prevents failures when using git worktrees. Every hook uses pass_filenames: false because the Gradle tasks handle file discovery internally.
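As a rough sketch of how a Gradle-based hook entry could be assembled, the helper below builds a hook mapping with the worktree-safe flag and `pass_filenames` disabled. The function `gradle_hook`, its parameters, the `name` format, and the task name `spotlessApply` are assumptions made for illustration; they are not taken from the generator script.

```python
from typing import Any, Dict


def gradle_hook(module: str, task: str, id_suffix: str, files: str) -> Dict[str, Any]:
    """Build one local pre-commit hook that delegates to a Gradle task."""
    return {
        "id": f"{module}-{id_suffix}",
        "name": f"{module} {task}",  # assumed display-name format
        # -x generateGitPropertiesGlobal skips git property generation,
        # which the page notes avoids failures in git worktrees.
        "entry": f"./gradlew :{module}:{task} -x generateGitPropertiesGlobal",
        "language": "system",
        "files": files,
        # Gradle discovers files itself, so pre-commit passes no filenames.
        "pass_filenames": False,
    }


# Hypothetical module/task/pattern combination.
hook = gradle_hook("metadata-io", "spotlessApply", "spotless", r"^metadata-io/.*\.java$")
print(hook["entry"])
```

With `pass_filenames` set to false, the `files` pattern only decides whether the hook runs; the Gradle task still processes the whole module.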
Usage
This file is consumed by the pre-commit framework (https://pre-commit.com). Developers install the hooks by running pre-commit install in their local repository clone. Once installed, the hooks whose file patterns match the staged files run automatically on each git commit, reformatting those files and blocking commits that do not meet project standards. The same configuration is also used in CI pipelines to enforce consistent formatting across all contributions.
Code Reference
Source Location
- Repository: Datahub_project_Datahub
- File: .pre-commit-config.yaml
Signature
# Auto-generated by .github/scripts/generate_pre_commit.py
repos:
  - repo: local
    hooks:
      - id: <hook-id>
        name: <hook-name>
        entry: ./gradlew :<module>:<task> -x generateGitPropertiesGlobal
        language: system
        files: <regex-pattern>
        pass_filenames: false
Import
# Install pre-commit hooks in your local repository
pre-commit install
# Run all hooks against all files manually
pre-commit run --all-files
# Run a specific hook by ID
pre-commit run metadata-ingestion-lint-fix --all-files
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Staged files | File paths | Yes | Git staged files that match the hook's files regex pattern |
| .github/scripts/pre-commit-override.yaml | YAML | No | Override configuration for additional or modified hooks |
Outputs
| Name | Type | Description |
|---|---|---|
| Formatted files | Modified files | Files reformatted in-place by Spotless, ruff, or Prettier |
| Exit code | Integer | 0 if all hooks pass, non-zero if formatting changes were made or checks failed |
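The exit-code contract above can be consumed from a script, for example in a CI step. The following is a minimal sketch, assuming pre-commit is installed and on the PATH; the helper names `pre_commit_command`, `describe`, and `run_hooks` are hypothetical, while `--all-files` and `--show-diff-on-failure` are standard `pre-commit run` options.

```python
import subprocess
from typing import Callable, List, Optional


def pre_commit_command(hook_id: Optional[str] = None) -> List[str]:
    """Build a `pre-commit run` command line, optionally for one hook id."""
    cmd = ["pre-commit", "run"]
    if hook_id:
        cmd.append(hook_id)
    # Check every file and print the diff when a hook rewrites something.
    cmd += ["--all-files", "--show-diff-on-failure"]
    return cmd


def describe(exit_code: int) -> str:
    """Translate the exit code into the meaning given in the Outputs table."""
    if exit_code == 0:
        return "all hooks passed"
    return "hooks modified files or a check failed"


def run_hooks(hook_id: Optional[str] = None,
              runner: Callable = subprocess.run) -> int:
    """Run the hooks and return pre-commit's exit code unchanged."""
    result = runner(pre_commit_command(hook_id))
    print(describe(result.returncode))
    return result.returncode


# Example call (requires pre-commit in the current repository):
#   raise SystemExit(run_hooks("metadata-ingestion-lint-fix"))
```

Passing the exit code through unchanged lets the calling CI job fail whenever a hook reformats a file, which is the signal that the contributor needs to commit the formatted result.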
Usage Examples
# Install the pre-commit hooks after cloning the repository
cd datahub
pre-commit install
# Manually run all hooks on all files (useful for initial setup)
pre-commit run --all-files
# Regenerate this configuration file after modifying overrides
python .github/scripts/generate_pre_commit.py
# Run only the Java Spotless hook for metadata-io (against currently staged files)
pre-commit run metadata-io-spotless
# Run only the Python lint fix hook for metadata-ingestion (against currently staged files)
pre-commit run metadata-ingestion-lint-fix