Implementation:Ray project Ray Buildkite Data Pipeline

From Leeroopedia
Knowledge Sources
Domains CI, Testing, Data
Last Updated 2026-02-13 16:00 GMT

Overview

This file defines the Buildkite "data tests" pipeline group that builds and tests Ray Data, covering data processing, streaming, MongoDB integration, and various data source connectors across multiple Apache Arrow versions.

Description

The .buildkite/data.rayci.yml pipeline first builds one Docker image per test configuration: data9build (Arrow v9), datalbuild (Arrow v23), databuild (general data), datanbuild (Arrow nightly), datamongobuild (MongoDB integration), datatfxbslbuild (TFX-BSL TFRecords), and datatfdsbuild (TensorFlow Datasets). It then runs the test suites through test_in_docker, splitting them across parallel Buildkite jobs. The suites include doc tests, integration tests, dashboard tests, flaky test reruns, authenticated tests (Snowflake), and GPU-accelerated tests. The group depends on forge, oss-ci-base_ml-multipy, ray-core-build, and ray-dashboard-build.

Usage

Developers modify this file when adding new data source connectors, updating Arrow version test matrices, changing test parallelism, adding new integration tests, or adjusting which data tests run on premerge versus postmerge builds.
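The premerge versus postmerge split is expressed through step tags such as skip-on-premerge. The sketch below is an illustration only: the tag name comes from this file, but the filtering rule is an assumption inferred from the tag's name, and the helper names normalize_tags and runs_on_premerge are hypothetical. It also handles both tag spellings that appear in the file, a bare string (tags: cibase) and a list (tags: [data]).

```python
from typing import Iterable, List, Optional, Union

# Tag name copied from the postmerge step in this file. The rule
# "a step with this tag is skipped on premerge" is an assumption.
SKIP_ON_PREMERGE = "skip-on-premerge"


def normalize_tags(tags: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Return tags as a list, accepting the scalar and list forms used in the file."""
    if tags is None:
        return []
    if isinstance(tags, str):
        return [tags]
    return list(tags)


def runs_on_premerge(tags: Optional[Union[str, Iterable[str]]]) -> bool:
    """Hypothetical filter: True unless the step carries the skip tag."""
    return SKIP_ON_PREMERGE not in normalize_tags(tags)


if __name__ == "__main__":
    print(runs_on_premerge(["data"]))
    print(runs_on_premerge(["python", "data", "oss", "skip-on-premerge"]))
```

Under that assumption, the Arrow v9 step (tags: [data]) would run on premerge and the authenticated Snowflake step would not.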

Code Reference

Source Location

  • Repository: Ray
  • File: .buildkite/data.rayci.yml
  • Lines: 1-374

Signature

group: data tests
depends_on:
  - forge
  - oss-ci-base_ml-multipy
  - ray-core-build
  - ray-dashboard-build
steps:
  # builds
  - name: data9build-multipy
    label: "wanda: data9build-py{{matrix}}"
    wanda: ci/docker/data9.build.wanda.yaml
    matrix:
      - "3.10"
    env:
      PYTHON: "{{matrix}}"
    tags: cibase

  - name: datalbuild-multipy
    label: "wanda: datalbuild-py{{matrix}}"
    wanda: ci/docker/datal.build.wanda.yaml
    matrix: ["3.10", "3.12"]
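To make the matrix entries in the excerpt above concrete, the following sketch models how a {{matrix}} placeholder expands into one step per listed value. It is a simplified model of Buildkite's matrix interpolation, not RayCI's actual implementation, and the function name expand_matrix is hypothetical.

```python
from typing import Any, Dict, List

PLACEHOLDER = "{{matrix}}"


def _substitute(value: Any, choice: str) -> Any:
    """Replace the placeholder in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, choice)
    if isinstance(value, dict):
        return {key: _substitute(item, choice) for key, item in value.items()}
    if isinstance(value, list):
        return [_substitute(item, choice) for item in value]
    return value


def expand_matrix(step: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one concrete step per matrix value (hypothetical helper)."""
    template = {key: value for key, value in step.items() if key != "matrix"}
    choices = step.get("matrix")
    if not choices:
        return [template]
    return [_substitute(template, choice) for choice in choices]


if __name__ == "__main__":
    # Fields copied from the data9build-multipy step shown above.
    step = {
        "name": "data9build-multipy",
        "label": "wanda: data9build-py{{matrix}}",
        "wanda": "ci/docker/data9.build.wanda.yaml",
        "matrix": ["3.10"],
        "env": {"PYTHON": "{{matrix}}"},
        "tags": "cibase",
    }
    for concrete in expand_matrix(step):
        print(concrete["label"], concrete["env"])
```

With the datalbuild matrix of 3.10 and 3.12, the same helper would yield two steps, one per Python version.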

Import

This is a configuration file, not an importable module. The Buildkite CI system loads it through the RayCI pipeline group mechanism, and it depends on the forge, ML base image (oss-ci-base_ml-multipy), ray-core-build, and ray-dashboard-build steps.

I/O Contract

Inputs

Name | Type | Required | Description
forge | dependency | yes | The forge build environment providing Bazel and tooling
oss-ci-base_ml-multipy | dependency | yes | ML base CI Docker images for multiple Python versions
ray-core-build | dependency | yes | Core Ray binary build artifacts
ray-dashboard-build | dependency | yes | Dashboard build artifacts
Snowflake credentials | env vars | conditional | Required for authenticated postmerge tests (SNOWFLAKE_USER, SNOWFLAKE_ACCOUNT, etc.)
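Because the Snowflake credentials are a conditional input, a pre-flight check can help when running the authenticated tests locally. The sketch below is a hypothetical helper, not part of the pipeline. It checks only the two variable names spelled out on this page; the source's "etc." means the real list is longer and is deliberately not guessed here.

```python
import os
from typing import List, Mapping, Optional

# Only these two names are given on this page. The full list is elided
# in the source ("etc.") and is not reproduced here.
NAMED_SNOWFLAKE_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_ACCOUNT")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the named variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in NAMED_SNOWFLAKE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing Snowflake variables:", ", ".join(missing))
    else:
        print("Named Snowflake variables are set.")
```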

Outputs

Name | Type | Description
data9build-multipy | Docker image | Arrow v9 data test images (Python 3.10)
datalbuild-multipy | Docker image | Arrow v23 data test images (Python 3.10, 3.12)
databuild-multipy | Docker image | General data test images (Python 3.10, 3.12)
datanbuild-multipy | Docker image | Arrow nightly data test images (Python 3.10)
datamongobuild-multipy | Docker image | MongoDB integration test images (Python 3.10)
datatfxbslbuild-multipy | Docker image | TFX-BSL TFRecords test images (Python 3.10)
datatfdsbuild-multipy | Docker image | TensorFlow Datasets test images (Python 3.12)
Test results | CI artifacts | Test results from all data test suites
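The image list above can be captured as a small lookup that validates an image and Python version pair before composing a build name. This is a sketch: the prefixes and versions are transcribed from the Outputs table, the <prefix>-py<version> naming is inferred from the data9build-py3.10 value used elsewhere in the file, and the helper build_name is hypothetical.

```python
from typing import Dict, Tuple

# Image prefixes and Python versions transcribed from the Outputs table.
BUILD_IMAGES: Dict[str, Tuple[str, ...]] = {
    "data9build": ("3.10",),
    "datalbuild": ("3.10", "3.12"),
    "databuild": ("3.10", "3.12"),
    "datanbuild": ("3.10",),
    "datamongobuild": ("3.10",),
    "datatfxbslbuild": ("3.10",),
    "datatfdsbuild": ("3.12",),
}


def build_name(prefix: str, python_version: str) -> str:
    """Compose a build name, rejecting combinations the table does not list."""
    if prefix not in BUILD_IMAGES:
        raise KeyError(f"unknown build image: {prefix}")
    if python_version not in BUILD_IMAGES[prefix]:
        raise ValueError(f"{prefix} is not listed for Python {python_version}")
    # Naming pattern inferred from "--build-name data9build-py3.10".
    return f"{prefix}-py{python_version}"


if __name__ == "__main__":
    print(build_name("data9build", "3.10"))
```

A check like this catches a mismatch, such as requesting datatfdsbuild for Python 3.10, before a CI job fails on a missing image.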

Usage Examples

The pipeline tests Ray Data across multiple Arrow versions and configurations:

# Arrow v9 tests with 8-way parallelism
- label: ":database: data: arrow v9 tests"
  tags: [data]
  instance_type: medium
  parallelism: 8
  commands:
    - bazel run //ci/ray_ci:test_in_docker -- //python/ray/data/... data
      --workers "$${BUILDKITE_PARALLEL_JOB_COUNT}"
      --worker-id "$${BUILDKITE_PARALLEL_JOB}"
      --parallelism-per-worker 3
      --build-name data9build-py3.10 --python-version 3.10
      --except-tags data_integration,doctest,data_non_parallel,dask

# Authenticated Snowflake tests (postmerge only)
- label: ":database: data: postmerge authenticated tests"
  tags: [python, data, oss, skip-on-premerge]
  commands:
    - $(python ci/env/setup_credentials.py)
    - bazel run //ci/ray_ci:test_in_docker -- //python/ray/data/... data
      --only-tags needs_credentials
      --test-env=SNOWFLAKE_USER --test-env=SNOWFLAKE_ACCOUNT
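The Arrow v9 step above passes the Buildkite job count and zero-based job index to test_in_docker as --workers and --worker-id. How test_in_docker actually divides the targets is not described on this page, so the sketch below assumes a simple round-robin split purely for illustration; the function names and target labels are hypothetical. It also assumes that --parallelism-per-worker means concurrent test runs per worker, in which case 8 workers at 3 each gives an upper bound of 24.

```python
from typing import List, Sequence


def shard(targets: Sequence[str], workers: int, worker_id: int) -> List[str]:
    """Round-robin slice for one worker; worker_id is zero-based.

    Assumed strategy for illustration, not confirmed for test_in_docker.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 <= worker_id < workers:
        raise ValueError("worker_id must be in [0, workers)")
    return list(targets[worker_id::workers])


def max_concurrent_tests(workers: int, parallelism_per_worker: int) -> int:
    """Upper bound on simultaneous test runs under the stated assumption."""
    return workers * parallelism_per_worker


if __name__ == "__main__":
    # Hypothetical target labels, used only to exercise the split.
    targets = [f"//python/ray/data:test_{i}" for i in range(20)]
    for worker_id in range(8):
        print(worker_id, shard(targets, 8, worker_id))
    print(max_concurrent_tests(8, 3))
```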
