Implementation:Apache Hudi Azure Pipelines CI Configuration

Knowledge Sources	Apache_Hudi Azure Pipelines YAML JaCoCo
Domains	CI_CD, Testing, Code_Coverage
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete CI pipeline definition for running Apache Hudi's unit and functional test suites across 10 parallel Azure DevOps jobs with aggregated JaCoCo code coverage reporting.

Description

The azure-pipelines-20230430.yml file defines the primary continuous integration pipeline for the Apache Hudi project on Azure DevOps. It orchestrates 10 parallel test jobs (UT_FT_1 through UT_FT_10) that split the project's test suite across Hudi's core modules, including hadoop-common, spark-client, spark-datasource (Java and Scala tests), Hudi Streamer/utilities, and other common modules. The pipeline builds with Spark 3.5 and Flink 1.18 profiles using Scala 2.12. A final MergeAndPublishCoverage job aggregates JaCoCo execution data files from all 10 test jobs into a unified code coverage report.

Two jobs (UT_FT_7 and UT_FT_10) run inside Docker containers using the apachehudi/hudi-ci-bundle-validation-base image, while the remaining jobs run directly on the Azure-hosted Ubuntu 22.04 agent through the Azure DevOps Maven@4 task (version 4 of the pipeline task, not Apache Maven 4).

Usage

This pipeline triggers automatically on all branch pushes. It is the primary quality gate for pull requests and commits to the Apache Hudi repository. Contributors should understand this configuration when:

Debugging CI failures on specific test jobs
Adding new modules that need test coverage
Modifying test profiles or Maven build arguments
Understanding which tests run in which parallel job
Investigating code coverage gaps in the aggregated JaCoCo report

Code Reference

Source Location

Repository: Apache_Hudi
File: azure-pipelines-20230430.yml
Lines: 1-599

Configuration Structure

# Top-level pipeline structure
trigger:
  branches:
    include:
      - '*'

pool:
  vmImage: 'ubuntu-22.04'

parameters:
  - name: job3456UTModules      # Spark datasource modules for jobs 3-6
  - name: job10UTModules         # Exclusion list for job 10 unit tests
  - name: job10FTModules         # Exclusion list for job 10 functional tests
  - name: job6HudiSparkDdlOthersWildcardSuites  # Scala test suites for job 6
  - name: jacocoModules          # Modules excluded from coverage aggregation

variables:
  BUILD_PROFILES: '-Dscala-2.12 -Dspark3.5 -Dflink1.18'
  PLUGIN_OPTS: '-Dcheckstyle.skip=true -Drat.skip=true ...'
  MVN_OPTS_INSTALL: '-T 3 -Phudi-platform-service -DskipTests ...'
  MVN_OPTS_TEST: '-fae -Pwarn-log ...'

stages:
  - stage: test
    jobs:
      - job: UT_FT_1                     # First of 10 parallel test jobs
      # ... UT_FT_2 through UT_FT_10 are declared alongside it
      - job: MergeAndPublishCoverage     # Aggregation job (depends on all 10)

Import

# No import needed: Azure DevOps consumes this file automatically
# once it sits at the repository root and is configured as a pipeline.
# Reference in Azure DevOps project settings:
#   Pipeline source: azure-pipelines-20230430.yml

I/O Contract

Inputs

Name	Type	Required	Description
trigger	Branch filter	Yes	Triggers on all branches via wildcard *
BUILD_PROFILES	Maven profiles	Yes	-Dscala-2.12 -Dspark3.5 -Dflink1.18 — selects Scala, Spark, and Flink versions
job3456UTModules	List of module paths	Yes	Spark datasource modules tested in jobs 3-6
job10UTModules	List of exclusion patterns	Yes	Modules excluded from job 10 unit tests (tested elsewhere)
job10FTModules	List of exclusion patterns	Yes	Modules excluded from job 10 functional tests
jacocoModules	List of exclusion patterns	Yes	Packaging/example modules excluded from coverage aggregation
Docker registry	Container registry	Yes	apachehudi-docker-hub for jobs 7 and 10

Outputs

Name	Type	Description
JUnit test results	XML files	Published per job, either by the PublishTestResults task or by the Maven task's built-in JUnit publishing
JaCoCo execution data	.exec files	Per-job merged JaCoCo execution data published as build artifacts
Aggregated coverage report	XML + HTML	Final jacoco-report.xml and jacoco-html-report published by MergeAndPublishCoverage
Top 100 long-running tests	Console output	Sorted list of slowest test cases displayed per job
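
The "top 100 long-running tests" output can be approximated locally from JUnit XML reports. The following is a minimal sketch, not the pipeline's actual step: it assumes Surefire-style TEST-*.xml files in which each testcase element sits on one line and carries name, classname, and time attributes. The function name list_slowest_tests is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: lists the slowest test cases found in JUnit XML reports.
# Assumes one <testcase .../> element per line (typical Surefire output).
# $1: directory to search for TEST-*.xml files
# $2: number of entries to print (defaults to 100)
list_slowest_tests() {
  local report_dir=$1 limit=${2:-100}
  find "$report_dir" -type f -name 'TEST-*.xml' -exec cat {} + |
    awk '
      # Return the value of attribute `key` on this line, or "" if absent.
      function attr(line, key,    re, val) {
        re = " " key "=\"[^\"]*\""
        if (match(line, re) == 0) return ""
        val = substr(line, RSTART, RLENGTH)
        sub("^ " key "=\"", "", val)
        sub("\"$", "", val)
        return val
      }
      /<testcase / {
        printf "%s\t%s.%s\n", attr($0, "time"), attr($0, "classname"), attr($0, "name")
      }' |
    sort -t $'\t' -k1,1nr |   # numeric sort on the time column, slowest first
    head -n "$limit"
}
```

Running list_slowest_tests . 20 from a module directory after a test run would print the 20 slowest cases as tab-separated time and fully qualified test name.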

Usage Examples

Job Distribution Overview

# Job 1: hadoop-common unit tests + spark-client unit/functional tests
# Job 2: hudi-spark functional tests (FTA)
# Job 3: spark-datasource Java unit tests (functional package)
# Job 4: spark-datasource Java unit tests (non-functional package)
# Job 5: spark-datasource Scala DML tests
# Job 6: spark-datasource Scala DDL & Others tests
# Job 7: Hudi Streamer unit tests + utilities functional tests (Docker)
# Job 8: spark-datasource Scala SQL features + DML insert + FTC tests
# Job 9: spark FTB functional tests
# Job 10: Common modules + remaining utilities (Docker)
# MergeAndPublishCoverage: Aggregates all .exec files into final report
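
When debugging a failure in one of these jobs, it can help to approximate the job's Maven invocation locally. The sketch below only prints a command instead of running it. It reuses the BUILD_PROFILES value from the pipeline variables; the profile name unit-tests and the helper print_test_command are assumptions for illustration, and the real jobs pass additional options (PLUGIN_OPTS, MVN_OPTS_TEST) that are elided in this page.

```shell
#!/usr/bin/env bash
# Value copied from the pipeline's variables block.
BUILD_PROFILES='-Dscala-2.12 -Dspark3.5 -Dflink1.18'

# Sketch only: print (do not run) a Maven test command for one module set.
# $1: Maven test profile (assumed name, e.g. unit-tests)
# $2: comma-separated module list passed to Maven's -pl option
print_test_command() {
  local profile=$1 modules=$2
  printf 'mvn test -P%s -pl %s %s\n' "$profile" "$modules" "$BUILD_PROFILES"
}

# Example: approximate the hadoop-common unit tests from job 1.
# print_test_command unit-tests hudi-hadoop-common
```

The pipeline installs the project with tests skipped before running any test job, so a local reproduction would likely need an equivalent install step first.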

Adding a New Module to CI

# To add a new module to the test pipeline:
# 1. If it should run in an existing job, add to appropriate parameter list
# 2. If it should be excluded from job 10, add to job10UTModules/job10FTModules

parameters:
  - name: job10UTModules
    type: object
    default:
      - '!hudi-hadoop-common'
      - '!hudi-client/hudi-spark-client'
      # Add exclusion for your new module if tested elsewhere:
      - '!hudi-new-module'
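
The leading ! on each entry matches Maven's syntax for excluding a module from a -pl project list. As a rough illustration, assuming the parameter list ends up as a single comma-separated -pl argument (how the pipeline performs that join is not shown here), the helper below, with the hypothetical name join_modules, builds such a string:

```shell
#!/usr/bin/env bash
# Sketch only: join module patterns into one comma-separated string
# suitable for Maven's -pl option.
join_modules() {
  local IFS=,          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Example (single quotes keep the shell from interpreting "!"):
# mvn test -pl "$(join_modules '!hudi-hadoop-common' '!hudi-new-module')"
```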

JaCoCo Coverage Pipeline

# Each test job runs these steps after tests complete:
# 1. Download JaCoCo CLI
./scripts/jacoco/download_jacoco.sh

# 2. Merge per-module .exec files into merged-jacoco.exec
./scripts/jacoco/merge_jacoco_exec_files.sh \
  jacoco-lib/lib/jacococli.jar $(Build.SourcesDirectory)

# 3. Publish as build artifact: merged-jacoco-{BuildId}-{JobNumber}

# Final aggregation job merges all per-job files:
./scripts/jacoco/merge_jacoco_job_files.sh \
  jacoco-lib/lib/jacococli.jar $(System.ArtifactsDirectory) $(Build.SourcesDirectory)

# Generate HTML+XML report:
./scripts/jacoco/generate_jacoco_coverage_report.sh \
  jacoco-lib/lib/jacococli.jar $(Build.SourcesDirectory)
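
The contents of the merge scripts are not reproduced on this page. As a hedged sketch of what a per-job merge could look like, the function below collects every .exec file under a directory and prints the corresponding JaCoCo CLI merge command (the merge subcommand and its --destfile option are part of the JaCoCo CLI). The function name and the dry-run behavior are illustrative assumptions, not the repository's implementation.

```shell
#!/usr/bin/env bash
# Sketch only: print the jacococli "merge" command for all .exec files
# found under a directory. Nothing is executed.
# $1: path to jacococli.jar
# $2: directory to search for .exec files
# $3: destination file for the merged execution data
print_jacoco_merge_command() {
  local cli_jar=$1 search_dir=$2 dest_file=$3
  local -a exec_files=()
  local f
  # Sort for a stable, reproducible argument order.
  while IFS= read -r f; do
    exec_files+=("$f")
  done < <(find "$search_dir" -type f -name '*.exec' | sort)

  if [ "${#exec_files[@]}" -eq 0 ]; then
    echo "no .exec files found under $search_dir" >&2
    return 1
  fi
  printf 'java -jar %s merge %s --destfile %s\n' \
    "$cli_jar" "${exec_files[*]}" "$dest_file"
}
```

Printing the command first makes it easy to check which per-module files would be merged before producing merged-jacoco.exec.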

Related Pages

Environment:Apache_Hudi_Java_Maven_Build_Environment
