Implementation:Ray project Ray Buildkite Data Pipeline
| Knowledge Sources | |
|---|---|
| Domains | CI, Testing, Data |
| Last Updated | 2026-02-13 16:00 GMT |
Overview
This file defines the Buildkite "data tests" pipeline group that builds and tests Ray Data, covering data processing, streaming, MongoDB integration, and various data source connectors across multiple Apache Arrow versions.
Description
The .buildkite/data.rayci.yml pipeline creates specialized Docker build images for different test configurations: data9build (Arrow v9), datalbuild (Arrow v23), databuild (general data), datanbuild (Arrow nightly), datamongobuild (MongoDB integration), datatfxbslbuild (TFX-BSL TFRecords), and datatfdsbuild (TensorFlow Datasets). It then runs test suites using test_in_docker with parallelism across multiple configurations, including doc tests, integration tests, dashboard tests, flaky test reruns, authenticated tests (Snowflake), and GPU-accelerated tests. The pipeline depends on forge, oss-ci-base_ml-multipy, ray-core-build, and ray-dashboard-build.
Usage
Developers modify this file when adding new data source connectors, updating Arrow version test matrices, changing test parallelism, adding new integration tests, or adjusting which data tests run on premerge versus postmerge builds.
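The premerge versus postmerge split can be pictured as a filter on step tags. The sketch below is only an illustration: the two labels and their tags come from the pipeline excerpts on this page, while `STEPS` and `select_steps` are hypothetical names, and the real selection logic in RayCI is not described here and may differ.

```python
# Hypothetical step records; only the labels and tags come from the pipeline excerpts.
STEPS = [
    {"label": ":database: data: arrow v9 tests", "tags": ["data"]},
    {
        "label": ":database: data: postmerge authenticated tests",
        "tags": ["python", "data", "oss", "skip-on-premerge"],
    },
]


def select_steps(steps, premerge):
    """Return the steps to run, dropping skip-on-premerge steps on premerge builds.

    Assumption: this mirrors the intent of the tag, not RayCI's actual code.
    """
    if not premerge:
        return list(steps)
    return [s for s in steps if "skip-on-premerge" not in s["tags"]]


premerge_labels = [s["label"] for s in select_steps(STEPS, premerge=True)]
postmerge_labels = [s["label"] for s in select_steps(STEPS, premerge=False)]
print(premerge_labels)
print(postmerge_labels)
```

Under this assumption, moving a test between premerge and postmerge amounts to adding or removing the skip-on-premerge tag on its step.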
Code Reference
Source Location
- Repository: Ray
- File: .buildkite/data.rayci.yml
- Lines: 1-374
Signature
group: data tests
depends_on:
  - forge
  - oss-ci-base_ml-multipy
  - ray-core-build
  - ray-dashboard-build
steps:
  # builds
  - name: data9build-multipy
    label: "wanda: data9build-py{{matrix}}"
    wanda: ci/docker/data9.build.wanda.yaml
    matrix:
      - "3.10"
    env:
      PYTHON: "{{matrix}}"
    tags: cibase
  - name: datalbuild-multipy
    label: "wanda: datalbuild-py{{matrix}}"
    wanda: ci/docker/datal.build.wanda.yaml
    matrix: ["3.10", "3.12"]
Import
This is a configuration file, so there is nothing to import. The Buildkite CI system loads it as part of the RayCI pipeline group mechanism, and the group depends on the forge, oss-ci-base_ml-multipy, ray-core-build, and ray-dashboard-build steps.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| forge | dependency | yes | The forge build environment providing Bazel and tooling |
| oss-ci-base_ml-multipy | dependency | yes | ML base CI Docker images for multiple Python versions |
| ray-core-build | dependency | yes | Core Ray binary build artifacts |
| ray-dashboard-build | dependency | yes | Dashboard build artifacts |
| Snowflake credentials | env vars | conditional | Required for authenticated postmerge tests (SNOWFLAKE_USER, SNOWFLAKE_ACCOUNT, etc.) |
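Because the Snowflake credentials are a conditional input, a preflight check can make a missing variable fail fast instead of surfacing as an obscure test error. The following is a minimal sketch under stated assumptions: `missing_credentials` is a hypothetical helper, not part of ci/env/setup_credentials.py, and it checks only the two variable names shown on this page, although the pipeline requires others as well.

```python
import os

# Only these two names appear in the pipeline excerpt; the real list is longer.
REQUIRED_ENV = ("SNOWFLAKE_USER", "SNOWFLAKE_ACCOUNT")


def missing_credentials(env=None, required=REQUIRED_ENV):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
example_env = {"SNOWFLAKE_USER": "ci-user"}
print(missing_credentials(example_env))
```

Passing an explicit mapping keeps the check testable without touching the real environment.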
Outputs
| Name | Type | Description |
|---|---|---|
| data9build-multipy | Docker image | Arrow v9 data test images (Python 3.10) |
| datalbuild-multipy | Docker image | Arrow v23 data test images (Python 3.10, 3.12) |
| databuild-multipy | Docker image | General data test images (Python 3.10, 3.12) |
| datanbuild-multipy | Docker image | Arrow nightly data test images (Python 3.10) |
| datamongobuild-multipy | Docker image | MongoDB integration test images (Python 3.10) |
| datatfxbslbuild-multipy | Docker image | TFX-BSL TFRecords test images (Python 3.10) |
| datatfdsbuild-multipy | Docker image | TensorFlow Datasets test images (Python 3.12) |
| Test results | CI artifacts | Test results from all data test suites |
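Each build step is expanded over its Python matrix, and test steps then refer to a concrete image through a name such as data9build-py3.10 (the value passed to --build-name in the usage example). The sketch below enumerates those names from the table above. It assumes every build follows the `<prefix>-py<version>` pattern visible in the data9build and datalbuild labels; `BUILD_MATRIX` and `expand_build_names` are illustrative names, not part of RayCI.

```python
# Build name prefixes and Python versions as listed in the Outputs table.
BUILD_MATRIX = {
    "data9build": ["3.10"],
    "datalbuild": ["3.10", "3.12"],
    "databuild": ["3.10", "3.12"],
    "datanbuild": ["3.10"],
    "datamongobuild": ["3.10"],
    "datatfxbslbuild": ["3.10"],
    "datatfdsbuild": ["3.12"],
}


def expand_build_names(matrix):
    """Expand each build prefix into '<prefix>-py<version>' names."""
    return [
        f"{prefix}-py{version}"
        for prefix, versions in matrix.items()
        for version in versions
    ]


names = expand_build_names(BUILD_MATRIX)
for name in names:
    print(name)
```

A listing like this is a quick way to confirm that a --build-name value used by a test step matches an image that some build step produces.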
Usage Examples
The pipeline tests Ray Data across multiple Arrow versions and configurations:
# Arrow v9 tests with 8-way parallelism
- label: ":database: data: arrow v9 tests"
  tags: [data]
  instance_type: medium
  parallelism: 8
  commands:
    - bazel run //ci/ray_ci:test_in_docker -- //python/ray/data/... data
      --workers "$${BUILDKITE_PARALLEL_JOB_COUNT}"
      --worker-id "$${BUILDKITE_PARALLEL_JOB}"
      --parallelism-per-worker 3
      --build-name data9build-py3.10 --python-version 3.10
      --except-tags data_integration,doctest,data_non_parallel,dask

# Authenticated Snowflake tests (postmerge only)
- label: ":database: data: postmerge authenticated tests"
  tags: [python, data, oss, skip-on-premerge]
  commands:
    - $(python ci/env/setup_credentials.py)
    - bazel run //ci/ray_ci:test_in_docker -- //python/ray/data/... data
      --only-tags needs_credentials
      --test-env=SNOWFLAKE_USER --test-env=SNOWFLAKE_ACCOUNT
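In the Arrow v9 step, the doubled `$$` defers variable expansion from pipeline upload to job runtime, so each of the 8 parallel jobs sees its own BUILDKITE_PARALLEL_JOB index (zero-based) and the shared BUILDKITE_PARALLEL_JOB_COUNT. The sketch below shows how one job's command line could be assembled from those variables. The flags and values are copied from the step above; `build_test_command` and its fallback defaults of one worker and worker id 0 are assumptions made for running outside Buildkite.

```python
import os
import shlex


def build_test_command(env=None):
    """Assemble the Arrow v9 test command for one parallel Buildkite job."""
    env = os.environ if env is None else env
    # Assumed fallbacks for local runs: a single worker with id 0.
    workers = env.get("BUILDKITE_PARALLEL_JOB_COUNT", "1")
    worker_id = env.get("BUILDKITE_PARALLEL_JOB", "0")
    return [
        "bazel", "run", "//ci/ray_ci:test_in_docker", "--",
        "//python/ray/data/...", "data",
        "--workers", workers,
        "--worker-id", worker_id,
        "--parallelism-per-worker", "3",
        "--build-name", "data9build-py3.10",
        "--python-version", "3.10",
        "--except-tags", "data_integration,doctest,data_non_parallel,dask",
    ]


# Simulate the third of eight parallel jobs.
argv = build_test_command(
    {"BUILDKITE_PARALLEL_JOB_COUNT": "8", "BUILDKITE_PARALLEL_JOB": "2"}
)
print(shlex.join(argv))
```

If --parallelism-per-worker is the number of test shards run concurrently inside each job (an interpretation, not something this page confirms), then 8 jobs with 3 shards each give up to 24 concurrent shards for this step.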