Implementation:Datajuicer Data juicer Pip Install Ray Extras
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, DevOps |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
External tool documentation for installing Data-Juicer's distributed-processing (Ray) dependencies through pip extras.
Description
Data-Juicer's pyproject.toml defines optional dependency groups (extras) that pull in Ray and related packages. The ray extra installs ray[data] and pydantic for distributed data processing. The ray_video extra installs the same packages plus the dependencies needed for video deduplication.
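To illustrate how extras show up once a package is installed, the sketch below reads the `Provides-Extra` metadata of a distribution using the standard-library `importlib.metadata`. The helper name `declared_extras` is hypothetical, and the distribution name `data-juicer` is assumed from the pip commands on this page; if the distribution is registered under a different name in your environment, the function simply returns an empty list.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List


def declared_extras(dist_name: str) -> List[str]:
    """Return the extras an installed distribution declares.

    Returns an empty list when the distribution is not installed
    or declares no extras.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # get_all() returns None when no Provides-Extra header is present.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # "data-juicer" is an assumed distribution name; adjust it if needed.
    print(declared_extras("data-juicer"))
```

If the output includes `ray` and `ray_video`, the installed release exposes the extras described above.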
Usage
Run the appropriate pip install command before using any Ray-based executor. This is a one-time setup step per Python environment.
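A script that depends on a Ray-based executor could guard against a missing setup step with a small preflight check. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Data-Juicer.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError covers a parent package that is absent or broken.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(["ray", "ray.data"])
    if absent:
        print("Missing modules:", ", ".join(absent))
        print('Try: pip install "data-juicer[ray]"')
    else:
        print("Ray modules found")
```

Because the check only locates modules, it stays cheap for top-level names; checking `ray.data` does import the `ray` package as a side effect.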
Code Reference
Source Location
- Repository: data-juicer
- File: pyproject.toml
- Lines: L1-234 (extras definitions)
Commands
# Base distributed processing
pip install "data-juicer[ray]"
# With video deduplication support
pip install "data-juicer[ray_video]"
# Full install with all extras
pip install "data-juicer[all]"
Import
# Verify installation
import ray
import ray.data
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| extras group | str | Yes | Package extras name: 'ray', 'ray_video', or 'all' |
Outputs
| Name | Type | Description |
|---|---|---|
| installed packages | Python packages | ray[data], pydantic, and related dependencies in the environment |
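One way to confirm the outputs listed in the table is to query the installed version of each distribution. This is a sketch under the assumption that the packages are registered under the names `ray` and `pydantic`; the helper `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in dist_names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(["ray", "pydantic"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

A `None` entry means the corresponding extra has not been installed in the current environment.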
Usage Examples
Install and Verify
# Install ray extras
pip install "data-juicer[ray]"
# Verify installation
python -c "import ray; print(ray.__version__)"
python -c "from data_juicer.core.executor import RayExecutor; print('OK')"
Related Pages
Implements Principle
Requires Environment