Principle: Data-Juicer LLM Backend Configuration
| Knowledge Sources | |
|---|---|
| Domains | NLP, Distributed_Computing, LLM |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
A model backend abstraction pattern that configures LLM inference engines (API-based or local vLLM) for use in data generation pipelines.
Description
LLM Backend Configuration sets up the model inference engine used by LLM-based data generation operators. It supports two modes: API-based (OpenAI-compatible endpoints for hosted models like GPT-4) and vLLM-based (local GPU inference using the vLLM engine, optionally distributed via Ray). The configuration includes model selection, inference parameters (temperature, top_p), engine parameters (tensor parallelism, max model length), and GPU allocation. For Ray pipelines, it uses vLLMEngineProcessorConfig from Ray Data's LLM module.
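The two-mode dispatch described above can be sketched as a small factory. This is a hypothetical illustration of the pattern, not Data-Juicer's actual API; the names `Backend` and `build_backend` are invented, while the config keys (`model_type`, `model`, `endpoint`, `engine_kwargs`, `sampling_params`) follow the examples in the Theoretical Basis below.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Resolved backend description (hypothetical, for illustration)."""
    mode: str                                  # 'api' or 'vllm'
    model: str                                 # hosted model name or HF repo id
    sampling_params: dict = field(default_factory=dict)
    extra: dict = field(default_factory=dict)  # endpoint/key or engine_kwargs


def build_backend(config: dict) -> Backend:
    """Dispatch on 'model_type' to build either an API or a vLLM backend."""
    mode = config.get('model_type')
    if mode == 'api':
        # Hosted, OpenAI-compatible endpoint: needs a URL and credentials.
        extra = {'endpoint': config['endpoint'],
                 'api_key': config.get('api_key')}
    elif mode == 'vllm':
        # Local GPU inference: engine_kwargs carry parallelism and length limits.
        extra = {'engine_kwargs': config.get('engine_kwargs', {})}
    else:
        raise ValueError(f'unknown model_type: {mode!r}')
    return Backend(mode, config['model'],
                   config.get('sampling_params', {}), extra)
```

Generation operators would then talk to the returned `Backend` object without caring which mode produced it.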
Usage
Use this principle when setting up LLM-powered data generation pipelines. Choose API mode for hosted models or vLLM mode for local GPU inference. Configure before running generation operators.
Theoretical Basis
# Abstract pattern (NOT real implementation)
# API mode:
config = {
    'model_type': 'api',
    'model': 'gpt-4o',
    'endpoint': 'https://api.openai.com/v1/chat/completions',
    'api_key': '...',
    'sampling_params': {'temperature': 0.7, 'max_tokens': 512}
}
# vLLM mode (local GPU):
config = {
    'model_type': 'vllm',
    'model': 'Qwen/Qwen2.5-7B-Instruct',
    'engine_kwargs': {'tensor_parallel_size': 2, 'max_model_len': 4096},
    'sampling_params': {'temperature': 0.7, 'top_p': 0.9}
}
# Ray vLLM pipeline:
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source='Qwen/Qwen2.5-7B-Instruct',
    engine_kwargs={'tensor_parallel_size': 2}
)