Implementation:Ray project Ray Release Data Tests Config
| Knowledge Sources | |
|---|---|
| Domains | Release, Testing, Data, Benchmarks |
| Last Updated | 2026-02-13 16:00 GMT |
Overview
This file defines the release test configurations for Ray Data, specifying benchmarks for reading, writing, aggregation, groupby, join, shuffle, sort, batch inference, training, iteration, TPCH queries, cross-AZ fault tolerance, and autoscaling workloads.
Description
The release/release_data_tests.yaml file defines tests as a YAML list. A DEFAULTS entry sets the working directory (nightly_tests/dataset), the frequency (nightly), the team (data), and the base cluster configuration. That base configuration is a GPU image type whose runtime environment sets RAY_DATA_DEBUG_RESOURCE_MANAGER=1, running on fixed-size CPU compute. Individual tests override these defaults and use matrix expansion to generate variants: fixed_size vs. autoscaling clusters, different data formats, and shuffle strategies such as sort_shuffle_pull_based and hash_shuffle. Each test specifies its cluster compute configuration, timeout, and the Python script to run. Some tests also define chaos variations that inject failures during execution, for example by terminating EC2 instances.

Test groups include:
- Reading: parquet, images, TFRecords, URIs
- Writing: parquet
- Aggregation: count
- Groupby: aggregate and map_groups
- Joins: inner, left_outer, right_outer, full_outer
- Sorting and shuffling
- Batch inference: image classification, image/text embeddings, multi-stage pipelines
- Distributed training
- Iteration: batches, TF, Torch
- TPCH queries: Q1, Q6
- Cross-AZ RPC fault tolerance
Usage
Release engineers and data team members modify this file to add new benchmark tests, adjust test timeouts, update scale factors, add support for new data formats, change cluster configurations, or create new chaos test variations. Each test runs at a nightly, weekly, or manual frequency, depending on its resource requirements and stability.
Code Reference
Source Location
- Repository: Ray
- File: release/release_data_tests.yaml
- Lines: 1-843
Signature
```yaml
- name: DEFAULTS
  group: data-base
  working_dir: nightly_tests/dataset
  frequency: nightly
  team: data
  cluster:
    byod:
      runtime_env:
        - RAY_DATA_DEBUG_RESOURCE_MANAGER=1
      type: gpu
    cluster_compute: fixed_size_cpu_compute.yaml

###############
# Reading tests
###############

- name: "read_parquet_{{scaling}}"
  python: "3.10"
  cluster:
    cluster_compute: "{{scaling}}_cpu_compute.yaml"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
  run:
    timeout: 3600
    script: >
      python read_and_consume_benchmark.py ...
```
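The release test runner combines each test entry with the DEFAULTS entry. The following is a minimal sketch of that step, not the framework's code. It assumes a recursive (deep) merge in which keys set by the test win and nested mappings such as `cluster` are merged rather than replaced. `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: override's keys win; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# DEFAULTS entry from the Signature, as a dict (its "name" key omitted).
DEFAULTS = {
    "group": "data-base",
    "working_dir": "nightly_tests/dataset",
    "frequency": "nightly",
    "team": "data",
    "cluster": {
        "byod": {
            "runtime_env": ["RAY_DATA_DEBUG_RESOURCE_MANAGER=1"],
            "type": "gpu",
        },
        "cluster_compute": "fixed_size_cpu_compute.yaml",
    },
}

# read_parquet entry, abridged: it only sets what differs from DEFAULTS.
read_parquet = {
    "name": "read_parquet_{{scaling}}",
    "python": "3.10",
    "cluster": {"cluster_compute": "{{scaling}}_cpu_compute.yaml"},
    "run": {"timeout": 3600},
}

# Assumption: the test is deep-merged onto DEFAULTS.
effective = deep_merge(DEFAULTS, read_parquet)
print(effective["cluster"]["cluster_compute"])  # {{scaling}}_cpu_compute.yaml
print(effective["cluster"]["byod"]["type"])     # gpu (inherited)
print(effective["frequency"])                   # nightly (inherited)
```

A shallow merge would behave differently here. The test's `cluster` mapping would replace the default one and silently drop the `byod` settings. The deep merge keeps them and overrides only `cluster_compute`.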
Import
This is a configuration file, not an importable module. The Ray release test framework reads it to define and schedule the data team's nightly, weekly, and manual benchmark tests.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| S3 benchmark data | S3 paths | yes | Test data stored in the s3://ray-benchmark-data and s3://ray-benchmark-data-internal-us-west-2 buckets |
| Cluster compute configs | YAML files | yes | Cluster sizing definitions (e.g., fixed_size_cpu_compute.yaml, autoscaling_gpu_compute.yaml) |
| Benchmark scripts | Python files | yes | Test scripts in nightly_tests/dataset/ (e.g., read_and_consume_benchmark.py, sort_benchmark.py) |
| BYOD scripts | Shell scripts | conditional | Post-build scripts, such as byod_install_mosaicml.sh, that install specialized dependencies |
| Python dependency lockfiles | Lockfiles | conditional | Pinned dependencies, such as image_classification_py3.10.lock |
Outputs
| Name | Type | Description |
|---|---|---|
| Benchmark results | metrics | Performance metrics (throughput, latency) for each test |
| Test pass/fail status | boolean | Whether each benchmark completed within timeout |
| Release readiness signal | aggregate | Overall data team release test status for go/no-go decisions |
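In the Outputs table, a per-test pass/fail status is rolled up into a go/no-go signal. The following is a minimal sketch of that roll-up. It assumes two criteria: a test passes only if its script succeeded and finished within its timeout, and the release is ready only if every test passed. `BenchmarkResult` and `release_ready` are hypothetical names, not part of the release framework, and the runtimes below are made-up inputs, not measured results.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    succeeded: bool   # script exited successfully (assumed criterion)
    runtime_s: float  # wall-clock runtime in seconds
    timeout_s: float  # the test's run.timeout

    @property
    def passed(self) -> bool:
        return self.succeeded and self.runtime_s <= self.timeout_s


def release_ready(results: list) -> tuple:
    """Hypothetical roll-up: go only if every test passed; also list failures."""
    failing = [r.name for r in results if not r.passed]
    return (len(failing) == 0, failing)


# Illustrative inputs only.
results = [
    BenchmarkResult("read_parquet_fixed_size", True, 1200.0, 3600),
    BenchmarkResult("read_parquet_autoscaling", True, 1500.0, 3600),
    BenchmarkResult("random_shuffle_chaos", True, 11000.0, 10800),  # over timeout
]
ready, failing = release_ready(results)
print(ready, failing)  # False ['random_shuffle_chaos']
```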
Usage Examples
The file defines tests using matrix expansion and variations:
```yaml
# Reading benchmark with fixed_size and autoscaling variants
- name: "read_parquet_{{scaling}}"
  python: "3.10"
  cluster:
    cluster_compute: "{{scaling}}_cpu_compute.yaml"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
  run:
    timeout: 3600
    script: >
      python read_and_consume_benchmark.py
      s3://ray-benchmark-data-internal-us-west-2/imagenet/parquet
      --format parquet --iter-bundles
```
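Matrix expansion turns one entry into one test per combination of `matrix.setup` values and fills in each `{{name}}` placeholder. The following is a minimal sketch of that step. It assumes a Cartesian product over the setup keys and plain string substitution; the framework's real templating may differ. `expand_matrix` and `render` are hypothetical helpers.

```python
import itertools
import re


def render(value, params: dict):
    """Replace {{key}} placeholders in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                      lambda m: str(params[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: render(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, params) for v in value]
    return value


def expand_matrix(test: dict) -> list:
    """Return one concrete test per combination of matrix.setup values."""
    setup = test.get("matrix", {}).get("setup", {})
    template = {k: v for k, v in test.items() if k != "matrix"}
    if not setup:
        return [template]
    keys = list(setup)
    # Assumption: the Cartesian product over all setup keys.
    return [
        render(template, dict(zip(keys, combo)))
        for combo in itertools.product(*(setup[k] for k in keys))
    ]


read_parquet = {
    "name": "read_parquet_{{scaling}}",
    "cluster": {"cluster_compute": "{{scaling}}_cpu_compute.yaml"},
    "matrix": {"setup": {"scaling": ["fixed_size", "autoscaling"]}},
}

for t in expand_matrix(read_parquet):
    print(t["name"], t["cluster"]["cluster_compute"])
# read_parquet_fixed_size fixed_size_cpu_compute.yaml
# read_parquet_autoscaling autoscaling_cpu_compute.yaml
```

The same placeholder drives both the test name and the compute config file. That is why each scaling mode needs a matching `<scaling>_cpu_compute.yaml` in the working directory.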
```yaml
# Groupby benchmark with multiple shuffle strategies and column sets
- name: "aggregate_groups_{{scaling}}_{{shuffle_strategy}}_{{columns}}"
  matrix:
    setup:
      scaling: [fixed_size, autoscaling]
      shuffle_strategy: [sort_shuffle_pull_based, hash_shuffle]
      columns:
        - "column08 column13 column14"  # 84 groups
        - "column02 column14"  # 7M groups
  run:
    timeout: 3600
    script: >
      python groupby_benchmark.py --sf 100 --aggregate
      --group-by {{columns}} --shuffle-strategy {{shuffle_strategy}}
```
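The groupby entry has three setup keys with two values each. Under the Cartesian-product assumption it therefore expands to 2 × 2 × 2 = 8 tests. Its `script: >` value is a YAML folded block scalar: the lines are joined with single spaces into one command line. The following sketch shows both steps. It assumes `{{columns}}` is substituted verbatim and that the command is run through a shell, so a value such as `column02 column14` becomes two separate arguments after `--group-by`. `fold` is a hypothetical, simplified helper that handles only plain lines with no blank lines or extra indentation.

```python
import itertools
import shlex

setup = {
    "scaling": ["fixed_size", "autoscaling"],
    "shuffle_strategy": ["sort_shuffle_pull_based", "hash_shuffle"],
    "columns": ["column08 column13 column14", "column02 column14"],
}

# Lines of the folded `script: >` scalar, as written in the YAML.
script_lines = [
    "python groupby_benchmark.py --sf 100 --aggregate",
    "--group-by {{columns}} --shuffle-strategy {{shuffle_strategy}}",
]


def fold(lines: list) -> str:
    """Simplified YAML folding: plain lines joined by spaces, one trailing newline."""
    return " ".join(lines) + "\n"


combos = list(itertools.product(*setup.values()))
print(len(combos))  # 8

# Render the last combination: (autoscaling, hash_shuffle, "column02 column14").
_scaling, strategy, columns = combos[-1]
command = fold(script_lines)
for key, value in {"columns": columns, "shuffle_strategy": strategy}.items():
    command = command.replace("{{" + key + "}}", value)

print(command.strip())
# python groupby_benchmark.py --sf 100 --aggregate --group-by column02 column14 --shuffle-strategy hash_shuffle
print(shlex.split(command)[5:])
# ['--group-by', 'column02', 'column14', '--shuffle-strategy', 'hash_shuffle']
```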
```yaml
# Chaos test with EC2 instance termination during shuffle
- name: random_shuffle_chaos
  working_dir: nightly_tests
  cluster:
    cluster_compute: dataset/autoscaling_all_to_all_compute.yaml
  run:
    timeout: 10800
    prepare: >
      python setup_chaos.py --chaos TerminateEC2Instance
      --kill-interval 600 --max-to-kill 2
    script: >
      python dataset/sort_benchmark.py
      --num-partitions=1000 --partition-size=1e9 --shuffle
```
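The chaos test overrides `working_dir` to `nightly_tests`. As a result, its compute config and benchmark script paths carry a `dataset/` prefix that tests using the DEFAULTS working directory omit, while `setup_chaos.py` needs no prefix. The following is a minimal sketch, assuming relative paths in a test resolve against that test's `working_dir`, with DEFAULTS as the fallback. `resolve` is a hypothetical helper.

```python
import posixpath

DEFAULT_WORKING_DIR = "nightly_tests/dataset"  # from the DEFAULTS entry


def resolve(test: dict, relative_path: str) -> str:
    """Assumption: join a test-relative path onto its working_dir (DEFAULTS if unset)."""
    working_dir = test.get("working_dir", DEFAULT_WORKING_DIR)
    return posixpath.normpath(posixpath.join(working_dir, relative_path))


read_parquet_fixed_size = {
    "cluster": {"cluster_compute": "fixed_size_cpu_compute.yaml"},
}
random_shuffle_chaos = {
    "working_dir": "nightly_tests",
    "cluster": {"cluster_compute": "dataset/autoscaling_all_to_all_compute.yaml"},
}

print(resolve(read_parquet_fixed_size, "fixed_size_cpu_compute.yaml"))
# nightly_tests/dataset/fixed_size_cpu_compute.yaml
print(resolve(random_shuffle_chaos, "dataset/autoscaling_all_to_all_compute.yaml"))
# nightly_tests/dataset/autoscaling_all_to_all_compute.yaml
print(resolve(random_shuffle_chaos, "setup_chaos.py"))
# nightly_tests/setup_chaos.py
```

Under this assumption, both compute configs end up in the same nightly_tests/dataset directory. The override exists so that the chaos setup script in nightly_tests is reachable without a `../` path.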
```yaml
# Distributed training with chaos variation
- name: distributed_training
  cluster:
    cluster_compute: dataset/multi_node_train_16_workers.yaml
  run:
    script: >
      python dataset/multi_node_train_benchmark.py
      --num-workers 16 --file-type parquet --use-gpu
  variations:
    - __suffix__: regular
    - __suffix__: chaos
      run:
        prepare: >
          python setup_chaos.py --kill-interval 200 --max-to-kill 1
```
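The `variations` key produces one test per entry, tagged by `__suffix__`. The following is a minimal sketch, assuming two behaviors: each variation (minus `__suffix__`) is deep-merged onto the base test, and the suffix is appended to the name after a dot. Both the merge rule and the naming scheme are assumptions here, and `expand_variations` is a hypothetical helper. Under a deep merge, the `chaos` variant adds `run.prepare` while keeping the base `run.script`.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: override's keys win; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_variations(test: dict) -> list:
    """Assumed scheme: one test per variation, named '<name>.<__suffix__>'."""
    base = {k: v for k, v in test.items() if k != "variations"}
    expanded = []
    for variation in test.get("variations", []):
        overrides = {k: v for k, v in variation.items() if k != "__suffix__"}
        variant = deep_merge(base, overrides)
        variant["name"] = f"{base['name']}.{variation['__suffix__']}"
        expanded.append(variant)
    return expanded or [base]


distributed_training = {
    "name": "distributed_training",
    "run": {
        "script": "python dataset/multi_node_train_benchmark.py "
                  "--num-workers 16 --file-type parquet --use-gpu",
    },
    "variations": [
        {"__suffix__": "regular"},
        {"__suffix__": "chaos",
         "run": {"prepare": "python setup_chaos.py --kill-interval 200 --max-to-kill 1"}},
    ],
}

for t in expand_variations(distributed_training):
    print(t["name"], sorted(t["run"]))
# distributed_training.regular ['script']
# distributed_training.chaos ['prepare', 'script']
```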