
Implementation:FlagOpen FlagEmbedding Reinforced IR Multi GPU

From Leeroopedia


Knowledge Sources
Domains: Distributed Computing, LLM Inference, Data Generation
Last Updated: 2026-02-09 00:00 GMT

Overview

A multi-GPU parallel text-generation system for scaling LLM-based data generation in the Reinforced IR pipeline.

Description

This module implements a multi-GPU parallel processing system for generating synthetic data at scale using large language models. It uses Python multiprocessing to spawn separate processes for each GPU, where each process handles a split of the input data independently. The system reads JSON files containing prompts, divides them across available GPUs, runs generation in parallel, and merges results back together.

The implementation uses CUDA_VISIBLE_DEVICES to assign specific GPUs to each process, preventing memory conflicts. Each worker loads its own instance of the LLM with configurable memory utilization. After all workers complete, the main process merges output files from temporary split directories and optionally cleans up intermediate files. This approach enables efficient scaling of data generation tasks that would be impractical on a single GPU due to time constraints.

Usage

Use this script to parallelize LLM-based data generation across multiple GPUs when processing large datasets for the Reinforced IR pipeline, particularly for query generation or augmentation at scale.

Code Reference

Source Location

Signature

def worker_function(device):
    """Worker function that runs on each GPU"""

def merge(args: Args):
    """Merge results from all workers"""

if __name__ == "__main__":
    """Main entry point for multi-GPU processing"""

Import

import os
import json
import shutil
import multiprocessing
from dataclasses import dataclass, field
from transformers import HfArgumentParser
from agent import LLMInstructAgent, LLMAgent

I/O Contract

Inputs

Name | Type | Required | Description
generate_model_path | str | Yes | Path to the LLM used for generation
input_dir | str | Yes | Directory containing JSON files with prompts
output_dir | str | Yes | Directory to save generated outputs
num_gpus | int | Yes | Number of GPUs to use for parallel processing
temperature | float | No | LLM generation temperature (default: 0.8)
gpu_memory_utilization | float | No | GPU memory fraction per worker (default: 0.8)
max_tokens | int | No | Maximum tokens per generation (default: 300)
model_type | str | No | LLM type: "llm" or "llm_instruct" (default: "llm_instruct")
rm_tmp | bool | No | Remove temporary split directories after merging (default: True)
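The I/O contract above maps naturally onto a dataclass, matching the script's import of HfArgumentParser (which turns each field into a command-line flag of the same name). The exact field definitions below are an assumption reconstructed from the table, not the module's actual Args class:

```python
from dataclasses import dataclass


@dataclass
class Args:
    # Required inputs (no defaults).
    generate_model_path: str
    input_dir: str
    output_dir: str
    num_gpus: int
    # Optional inputs, with the defaults listed in the table above.
    temperature: float = 0.8
    gpu_memory_utilization: float = 0.8
    max_tokens: int = 300
    model_type: str = "llm_instruct"  # "llm" or "llm_instruct"
    rm_tmp: bool = True


# In the real script this class would be passed to
# transformers.HfArgumentParser, e.g.:
#   parser = HfArgumentParser(Args)
#   (args,) = parser.parse_args_into_dataclasses()
```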

Outputs

Name | Type | Description
Output JSON files | JSON | Generated text for each input file, merged across GPUs
tmp_split_N directories | Directory | Temporary per-worker outputs (removed if rm_tmp=True)

Usage Examples

# Command line usage
python multi.py \
    --generate_model_path Meta-Llama-3-8B-Instruct \
    --model_type llm_instruct \
    --input_dir ./prompts \
    --output_dir ./outputs \
    --num_gpus 8 \
    --temperature 0.7 \
    --gpu_memory_utilization 0.9 \
    --max_tokens 300 \
    --rm_tmp True

# Input directory structure:
# ./prompts/
#   dataset1.json  # List of prompt strings
#   dataset2.json

# Output directory structure (during processing):
# ./outputs/
#   tmp_split_0/
#     dataset1.json  # Partial results from GPU 0
#   tmp_split_1/
#     dataset1.json  # Partial results from GPU 1
#   ...

# Final output (after merge):
# ./outputs/
#   dataset1.json  # Merged results from all GPUs
#   dataset2.json

# Input JSON format:
[
    "Prompt 1 text",
    "Prompt 2 text",
    "Prompt 3 text"
]

# Output JSON format:
[
    "Generated response 1",
    "Generated response 2",
    "Generated response 3"
]

# The script automatically:
# 1. Splits data across num_gpus workers
# 2. Each worker processes its split independently
# 3. Results are merged in original order
# 4. Temporary files are cleaned up
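The four steps above can be sketched end to end. This assumes, as the tmp_split_N layout suggests, that data is split into contiguous chunks so that concatenating tmp_split_0..N-1 restores the original order; the function name and the stand-in generation step are illustrative:

```python
import json
import os
import shutil


def split_merge_demo(prompts, num_gpus, output_dir, rm_tmp=True):
    """Split prompts into contiguous chunks, write per-chunk results
    to tmp_split_N directories, then merge them back in original order."""
    os.makedirs(output_dir, exist_ok=True)
    chunk = (len(prompts) + num_gpus - 1) // num_gpus  # ceil division
    # Step 1-2: each "worker" handles one contiguous split.
    for gpu in range(num_gpus):
        tmp_dir = os.path.join(output_dir, f"tmp_split_{gpu}")
        os.makedirs(tmp_dir, exist_ok=True)
        part = prompts[gpu * chunk:(gpu + 1) * chunk]
        # Stand-in for the worker's LLM generation step.
        results = [f"generated:{p}" for p in part]
        with open(os.path.join(tmp_dir, "dataset1.json"), "w") as f:
            json.dump(results, f)
    # Step 3: concatenating splits 0..N-1 restores the original order.
    merged = []
    for gpu in range(num_gpus):
        tmp_dir = os.path.join(output_dir, f"tmp_split_{gpu}")
        with open(os.path.join(tmp_dir, "dataset1.json")) as f:
            merged.extend(json.load(f))
        # Step 4: optionally clean up intermediate files.
        if rm_tmp:
            shutil.rmtree(tmp_dir)
    with open(os.path.join(output_dir, "dataset1.json"), "w") as f:
        json.dump(merged, f)
    return merged
```

Note that order preservation falls out of the contiguous split: no per-item indices are needed, only a fixed iteration over worker IDs at merge time.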
