Principle: Data-Juicer LLM Backend Configuration
| Knowledge Sources | |
|---|---|
| Domains | NLP, Distributed_Computing, LLM |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
A model backend abstraction pattern that configures LLM inference engines (API-based or local vLLM) for use in data generation pipelines.
Description
LLM Backend Configuration sets up the model inference engine used by LLM-based data generation operators. It supports two modes: API-based (OpenAI-compatible endpoints for hosted models like GPT-4) and vLLM-based (local GPU inference using the vLLM engine, optionally distributed via Ray). The configuration includes model selection, inference parameters (temperature, top_p), engine parameters (tensor parallelism, max model length), and GPU allocation. For Ray pipelines, it uses vLLMEngineProcessorConfig from Ray Data's LLM module.
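The two-mode dispatch described above can be sketched as a small factory. This is a hypothetical illustration of the pattern, not Data-Juicer's actual API; the names `Backend` and `build_backend` are invented, while the config keys (`model_type`, `model`, `endpoint`, `engine_kwargs`, `sampling_params`) follow the examples in the Theoretical Basis below.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Resolved backend description (hypothetical, for illustration)."""
    mode: str                                  # 'api' or 'vllm'
    model: str                                 # hosted model name or HF repo id
    sampling_params: dict = field(default_factory=dict)
    extra: dict = field(default_factory=dict)  # endpoint/key or engine_kwargs


def build_backend(config: dict) -> Backend:
    """Dispatch on 'model_type' to build either an API or a vLLM backend."""
    mode = config.get('model_type')
    if mode == 'api':
        # Hosted, OpenAI-compatible endpoint: needs a URL and credentials.
        extra = {'endpoint': config['endpoint'],
                 'api_key': config.get('api_key')}
    elif mode == 'vllm':
        # Local GPU inference: engine_kwargs carry parallelism and length limits.
        extra = {'engine_kwargs': config.get('engine_kwargs', {})}
    else:
        raise ValueError(f'unknown model_type: {mode!r}')
    return Backend(mode, config['model'],
                   config.get('sampling_params', {}), extra)
```

Generation operators would then talk to the returned `Backend` object without caring which mode produced it.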
Usage
Use this principle when setting up LLM-powered data generation pipelines. Choose API mode for hosted models or vLLM mode for local GPU inference. Configure before running generation operators.
Theoretical Basis
# Abstract pattern (NOT real implementation)
# API mode:
config = {
    'model_type': 'api',
    'model': 'gpt-4o',
    'endpoint': 'https://api.openai.com/v1/chat/completions',
    'api_key': '...',
    'sampling_params': {'temperature': 0.7, 'max_tokens': 512}
}
# vLLM mode (local GPU):
config = {
    'model_type': 'vllm',
    'model': 'Qwen/Qwen2.5-7B-Instruct',
    'engine_kwargs': {'tensor_parallel_size': 2, 'max_model_len': 4096},
    'sampling_params': {'temperature': 0.7, 'top_p': 0.9}
}
# Ray vLLM pipeline:
from ray.data.llm import vLLMEngineProcessorConfig

config = vLLMEngineProcessorConfig(
    model_source='Qwen/Qwen2.5-7B-Instruct',
    engine_kwargs={'tensor_parallel_size': 2}
)