Implementation:Ray project Ray Release Data Tests Config

Knowledge Sources	Ray
Domains	Release, Testing, Data, Benchmarks
Last Updated	2026-02-13 16:00 GMT

Overview

This file defines the release test configurations for Ray Data. It specifies benchmarks for reading, writing, aggregation, groupby, join, shuffle, sort, batch inference, training, iteration, TPC-H queries, cross-AZ fault tolerance, and autoscaling workloads.

Description

The release/release_data_tests.yaml file is a YAML list of test definitions. A DEFAULTS entry sets the working directory (nightly_tests/dataset), frequency (nightly), team (data), and base cluster configuration (a GPU BYOD image type with fixed_size_cpu_compute.yaml as the cluster compute). Individual tests override these defaults and use matrix expansion to generate variants, for example fixed_size vs. autoscaling clusters, different data formats, or shuffle strategies such as sort_shuffle_pull_based and hash_shuffle. Each test specifies its cluster compute configuration, a timeout, and the Python script to run. Some tests add chaos variations that terminate EC2 instances during execution.

Test groups cover reading (parquet, images, TFRecords, URIs), writing (parquet), aggregation (count), groupby (aggregate and map_groups), joins (inner, left_outer, right_outer, full_outer), sorting and shuffling, batch inference (image classification, image/text embeddings, multi-stage pipelines), distributed training, iteration (batches, TF, Torch), TPC-H queries (Q1, Q6), and cross-AZ RPC fault tolerance.
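
The matrix expansion described above can be illustrated with a short Python sketch. This is not the release framework's actual implementation. It assumes that every combination of values under matrix.setup yields one concrete test and that each {{key}} placeholder is replaced by plain string substitution. The helper names expand_matrix and _substitute are hypothetical.

```python
from itertools import product


def _substitute(node, bindings):
    """Recursively replace every {{key}} placeholder found in string values."""
    if isinstance(node, dict):
        return {k: _substitute(v, bindings) for k, v in node.items()}
    if isinstance(node, list):
        return [_substitute(v, bindings) for v in node]
    if isinstance(node, str):
        for key, value in bindings.items():
            node = node.replace("{{" + key + "}}", str(value))
    return node


def expand_matrix(test: dict) -> list[dict]:
    """Return one concrete test per combination of matrix.setup values.

    Assumption: the real framework may order, name, or filter variants
    differently; this only shows the Cartesian-product idea.
    """
    setup = test.get("matrix", {}).get("setup", {})
    template = {k: v for k, v in test.items() if k != "matrix"}
    if not setup:
        return [template]
    keys = list(setup)
    variants = []
    for values in product(*(setup[k] for k in keys)):
        variants.append(_substitute(template, dict(zip(keys, values))))
    return variants


# Mirrors the read_parquet entry from the YAML file.
read_parquet = {
    "name": "read_parquet_{{scaling}}",
    "cluster": {"cluster_compute": "{{scaling}}_cpu_compute.yaml"},
    "matrix": {"setup": {"scaling": ["fixed_size", "autoscaling"]}},
}

expanded = expand_matrix(read_parquet)
names = [t["name"] for t in expanded]
print(names)  # ['read_parquet_fixed_size', 'read_parquet_autoscaling']
```

Under the same assumption, a test with three matrix keys of two values each, such as the aggregate_groups benchmark, would expand to 2 x 2 x 2 = 8 concrete tests.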

Usage

Release engineers and data team members modify this file when adding new benchmark tests, adjusting test timeouts, updating scale factors, adding support for new data formats, changing cluster configurations, or creating new chaos test variations. Tests run at nightly, weekly, or manual frequency, depending on their resource requirements and stability.

Code Reference

Source Location

Repository: Ray
File: release/release_data_tests.yaml
Lines: 1-843

Signature

- name: DEFAULTS
  group: data-base
  working_dir: nightly_tests/dataset
  frequency: nightly
  team: data
  cluster:
    byod:
      runtime_env:
        - RAY_DATA_DEBUG_RESOURCE_MANAGER=1
      type: gpu
    cluster_compute: fixed_size_cpu_compute.yaml

###############
# Reading tests
###############
- name: "read_parquet_{{scaling}}"
  python: "3.10"
  cluster:
    cluster_compute: "{{scaling}}_cpu_compute.yaml"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
  run:
    timeout: 3600
    script: >
      python read_and_consume_benchmark.py ...
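
How a test entry inherits from DEFAULTS can be sketched as a recursive dictionary merge. The sketch below assumes that nested keys are merged key by key, so a test that overrides only cluster.cluster_compute keeps the BYOD settings from DEFAULTS. That merge behavior and the deep_update helper shown here are assumptions made for illustration, not code taken from the release framework.

```python
import copy


def deep_update(base: dict, override: dict) -> dict:
    """Merge override into base in place; nested dicts are merged recursively."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


# Subset of the DEFAULTS entry shown in the signature.
defaults = {
    "working_dir": "nightly_tests/dataset",
    "frequency": "nightly",
    "team": "data",
    "cluster": {
        "byod": {"type": "gpu"},
        "cluster_compute": "fixed_size_cpu_compute.yaml",
    },
}

# A concrete test that overrides only the cluster compute file.
test = {
    "name": "read_parquet_autoscaling",
    "cluster": {"cluster_compute": "autoscaling_cpu_compute.yaml"},
    "run": {"timeout": 3600},
}

# Copy the defaults first so they stay untouched for the next test.
resolved = deep_update(copy.deepcopy(defaults), test)
print(resolved["cluster"])
```

Copying the defaults before merging matters here: without the deep copy, one test's overrides would leak into every test resolved after it.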

Import

This is a configuration file, not an importable module. The Ray release test framework consumes it: the release test runner infrastructure uses it to define and schedule nightly, weekly, and manual benchmark tests for the data team.

I/O Contract

Inputs

Name	Type	Required	Description
S3 benchmark data	S3 paths	yes	Test data stored in `s3://ray-benchmark-data` and `s3://ray-benchmark-data-internal-us-west-2` buckets
Cluster compute configs	YAML files	yes	Cluster sizing definitions (e.g., `fixed_size_cpu_compute.yaml`, `autoscaling_gpu_compute.yaml`)
Benchmark scripts	Python files	yes	Test scripts in `nightly_tests/dataset/` (e.g., `read_and_consume_benchmark.py`, `sort_benchmark.py`)
BYOD scripts	shell scripts	conditional	Post-build scripts like `byod_install_mosaicml.sh` for specialized dependencies
Python dependency lockfiles	lockfiles	conditional	Pinned dependencies like `image_classification_py3.10.lock`

Outputs

Name	Type	Description
Benchmark results	metrics	Performance metrics (throughput, latency) for each test
Test pass/fail status	boolean	Whether each benchmark completed within timeout
Release readiness signal	aggregate	Overall data team release test status for go/no-go decisions

Usage Examples

The file defines tests using matrix expansion and variations:

# Reading benchmark with fixed_size and autoscaling variants
- name: "read_parquet_{{scaling}}"
  python: "3.10"
  cluster:
    cluster_compute: "{{scaling}}_cpu_compute.yaml"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
  run:
    timeout: 3600
    script: >
      python read_and_consume_benchmark.py
      s3://ray-benchmark-data-internal-us-west-2/imagenet/parquet
      --format parquet --iter-bundles

# Groupby benchmark with multiple shuffle strategies and column sets
- name: "aggregate_groups_{{scaling}}_{{shuffle_strategy}}_{{columns}}"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
      shuffle_strategy: [sort_shuffle_pull_based, hash_shuffle]
      columns:
        - "column08 column13 column14"   # 84 groups
        - "column02 column14"            # 7M groups
  run:
    timeout: 3600
    script: >
      python groupby_benchmark.py --sf 100 --aggregate
      --group-by {{columns}} --shuffle-strategy {{shuffle_strategy}}

# Chaos test with EC2 instance termination during shuffle
- name: random_shuffle_chaos
  working_dir: nightly_tests
  cluster:
    cluster_compute: dataset/autoscaling_all_to_all_compute.yaml
  run:
    timeout: 10800
    prepare: >
      python setup_chaos.py --chaos TerminateEC2Instance
      --kill-interval 600 --max-to-kill 2
    script: >
      python dataset/sort_benchmark.py
      --num-partitions=1000 --partition-size=1e9 --shuffle

# Distributed training with chaos variation
- name: distributed_training
  cluster:
    cluster_compute: dataset/multi_node_train_16_workers.yaml
  run:
    script: >
      python dataset/multi_node_train_benchmark.py
      --num-workers 16 --file-type parquet --use-gpu
  variations:
    - __suffix__: regular
    - __suffix__: chaos
      run:
        prepare: >
          python setup_chaos.py --kill-interval 200 --max-to-kill 1
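
The variations block in the last example can be read as a list of overrides applied to the base test. The sketch below assumes that each variation is merged recursively into the base definition and that its __suffix__ is appended to the test name with a dot. Both the merge semantics and the dot separator are assumptions, and the merge and expand_variations helpers are hypothetical.

```python
import copy


def merge(base: dict, override: dict) -> dict:
    """Return a new dict: base with override merged in recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def expand_variations(test: dict) -> list[dict]:
    """Produce one test per variation; a test without variations is returned as is."""
    base = {k: v for k, v in test.items() if k != "variations"}
    variations = test.get("variations")
    if not variations:
        return [base]
    expanded = []
    for variation in variations:
        overrides = {k: v for k, v in variation.items() if k != "__suffix__"}
        variant = merge(base, overrides)
        # Assumed naming scheme: "<test name>.<suffix>".
        variant["name"] = f"{base['name']}.{variation['__suffix__']}"
        expanded.append(variant)
    return expanded


# Mirrors the distributed_training entry from the YAML file.
script = (
    "python dataset/multi_node_train_benchmark.py "
    "--num-workers 16 --file-type parquet --use-gpu"
)
distributed_training = {
    "name": "distributed_training",
    "run": {"script": script},
    "variations": [
        {"__suffix__": "regular"},
        {
            "__suffix__": "chaos",
            "run": {"prepare": "python setup_chaos.py --kill-interval 200 --max-to-kill 1"},
        },
    ],
}

variants = expand_variations(distributed_training)
for v in variants:
    print(v["name"], sorted(v["run"]))
```

In this reading, the chaos variant keeps the base script and only gains a prepare step, while the regular variant runs the base definition unchanged.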
