Workflow:FMInference FlexLLMGen Data Wrangling Batch Inference

Knowledge Sources	FlexLLMGen; FlexGen: High-Throughput Generative Inference; Can Foundation Models Wrangle Your Data?
Domains	LLM_Inference, Data_Wrangling, Batch_Processing
Last Updated	2026-02-09 12:00 GMT

Overview

End-to-end process for running high-throughput LLM-based data wrangling tasks (entity matching, data imputation, and error detection) over structured datasets using FlexLLMGen's offloaded inference engine.

Description

This workflow demonstrates how to apply large OPT models to structured data wrangling tasks following the HazyResearch fm_data_tasks benchmark. It covers loading structured CSV datasets, constructing few-shot prompts from training examples, running batched inference through FlexLLMGen's offloading engine, and evaluating predictions with precision, recall, and F1 metrics. The workflow supports three task types: entity matching (determining if two records refer to the same entity), data imputation (predicting missing attribute values), and error detection (identifying incorrect values in records). It supports both single-query mode (for correctness verification) and batched mode (for throughput measurement).

Usage

Execute this workflow when you need to apply a large language model to structured data quality tasks such as deduplication, missing value prediction, or data cleaning, and you want to maximize throughput on hardware with limited GPU memory. The workflow is designed for batch processing of datasets with long input sequences (123-1274 tokens) and short output sequences (3-10 tokens).

Execution Steps

Step 1: Install Dependencies and Download Datasets

Install additional Python libraries required for data processing (pandas, sentence-transformers, rich, pyarrow) and download the fm_data_tasks benchmark datasets from HazyResearch. The datasets include entity matching pairs, data imputation records, and error detection examples across 10 benchmark tasks.

Key considerations:

Run the install.sh script to install dependencies and download datasets
Datasets cover 7 entity matching tasks (Fodors-Zagats, Beer, iTunes-Amazon, etc.), 2 data imputation tasks (Restaurant, Buy), and 1 error detection task (Hospital)
Each dataset includes train/test splits in CSV format

Step 2: Configure Task and Model Parameters

Select the data wrangling task (entity matching, data imputation, or error detection), the specific dataset, the OPT model size, and the FlexLLMGen offloading policy. Also configure prompt construction parameters such as the number of few-shot examples and the prompting strategy (manual, random, or embedding-based selection).

Key considerations:

Task type determines the prompt format and expected output format
Choose between single-query mode (--run_single_query) for correctness verification and batch mode for throughput
The --num_run parameter controls how many test examples to evaluate
Offloading policy must account for the long input sequences typical of data wrangling tasks
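
The configuration surface described above can be sketched as a small argument parser. This is a hypothetical illustration, not the benchmark script's actual parser: only the flag names --run_single_query, --num_run, --gpu-batch-size, --num-gpu-batches, and --pad-to-seq-len come from this page, while --task, --num-shots, --prompt-strategy, and every default value are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options named in this workflow."""
    parser = argparse.ArgumentParser(description="Data wrangling batch inference (sketch)")
    # Assumed flags: names and choices are illustrative only.
    parser.add_argument("--task", default="entity_matching",
                        choices=["entity_matching", "data_imputation", "error_detection"])
    parser.add_argument("--num-shots", type=int, default=3)
    parser.add_argument("--prompt-strategy", default="manual",
                        choices=["manual", "random", "embedding"])
    # Flags named on this page; the defaults are assumptions.
    parser.add_argument("--run_single_query", action="store_true")
    parser.add_argument("--num_run", type=int, default=100)
    parser.add_argument("--gpu-batch-size", type=int, default=4)
    parser.add_argument("--num-gpu-batches", type=int, default=1)
    parser.add_argument("--pad-to-seq-len", type=int, default=512)
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--task", "data_imputation", "--num_run", "20", "--gpu-batch-size", "8"]
)
# argparse turns hyphens in flag names into underscores on the namespace.
print(args.task, args.num_run, args.gpu_batch_size, args.run_single_query)
```

Omitting --run_single_query leaves the sketch in batch mode, which matches the page's description of batch mode as the throughput-oriented path.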

Step 3: Load and Serialize Dataset

Load the structured CSV dataset and serialize each record or record pair into text format suitable for LLM inference. The serialization converts tabular data into natural language descriptions, handling different column schemas and task-specific formatting rules for each dataset.

Key considerations:

Entity matching serializes two records side-by-side for comparison
Data imputation masks the target column and asks the model to predict it
Error detection presents a record and asks the model to identify incorrect values
Dataset-specific constants define column schemas, instructions, and output formats
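
A minimal sketch of the serialization step, using only the standard library. The "column: value" layout, the question wording, and the function names are assumptions for illustration; the benchmark's dataset-specific constants define the real formats.

```python
import csv
import io
from typing import Dict, Optional

# Tiny in-memory CSV standing in for a benchmark file (made-up rows).
SAMPLE_CSV = "name,city,phone\nBlue Cafe,Boston,555-0100\nBlue Cafe Inc,Boston,555-0100\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def serialize_row(row: Dict[str, str], mask_column: Optional[str] = None) -> str:
    """Render a record as 'col: value. col: value', skipping mask_column if given."""
    parts = [f"{col}: {val}" for col, val in row.items() if col != mask_column]
    return ". ".join(parts)


def entity_matching_query(left: Dict[str, str], right: Dict[str, str]) -> str:
    """Place two serialized records side by side and ask whether they match."""
    return (f"Record A is {serialize_row(left)}. Record B is {serialize_row(right)}. "
            "Are Record A and Record B the same? ")


def imputation_query(row: Dict[str, str], target: str) -> str:
    """Hide the target column and ask the model to fill it in."""
    return f"{serialize_row(row, mask_column=target)}. {target}? "


def error_detection_query(row: Dict[str, str], column: str) -> str:
    """Show the full record and ask about one possibly erroneous value."""
    return f"{serialize_row(row)}. Is there an error in {column}: {row[column]}? "


print(entity_matching_query(rows[0], rows[1]))
print(imputation_query(rows[0], "city"))
print(error_detection_query(rows[0], "phone"))
```

Ending each query with a trailing space and an open question keeps the expected completion short, in line with the 3-10 token outputs this workflow targets.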

Step 4: Construct Few-shot Prompts

Build few-shot prompts by prepending training examples to each test query. Three prompting strategies are available: manually crafted examples, randomly sampled training examples, or embedding-based selection of the most relevant training examples using sentence-transformers for similarity matching.

Key considerations:

Few-shot examples provide in-context learning signal for the model
The number of examples is configurable (typically 3-5)
Embedding-based selection finds the hardest or most relevant examples for each query
Prompts include task-specific instructions and output format specifications
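
The random and similarity-based strategies can be sketched as follows. To stay self-contained, this example swaps sentence-transformers embeddings for a plain bag-of-words cosine similarity; that substitution, the function names, and the prompt layout are all assumptions, not the benchmark's implementation.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (serialized query text, gold label)


def bow_vector(text: str) -> Counter:
    """Bag-of-words counts; a stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_random(train: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Sample k training examples with a fixed seed for reproducibility."""
    return random.Random(seed).sample(train, min(k, len(train)))


def select_similar(train: List[Example], query: str, k: int) -> List[Example]:
    """Pick the k training examples most similar to the query."""
    q = bow_vector(query)
    ranked = sorted(train, key=lambda ex: cosine(bow_vector(ex[0]), q), reverse=True)
    return ranked[:k]


def build_prompt(instruction: str, shots: List[Example], query: str) -> str:
    """Prepend the instruction and labeled examples to the unlabeled test query."""
    blocks = [f"{text}{label}" for text, label in shots]
    return instruction + "\n\n" + "\n\n".join(blocks + [query])


train = [
    ("name: blue cafe. city: boston. Match? ", "Yes"),
    ("name: red bar. city: austin. Match? ", "No"),
    ("name: green deli. city: denver. Match? ", "No"),
]
query = "name: blue cafe. city: boston ma. Match? "
shots = select_similar(train, query, k=1)
prompt = build_prompt("Decide whether the two records refer to the same entity.", shots, query)
print(prompt)
```

Every added example lengthens the input, so the shot count trades in-context signal against the prefill cost that dominates this workload.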

Step 5: Run Batched Inference

Process all evaluation queries through FlexLLMGen's generation pipeline. In batch mode, queries are padded to uniform length and processed in batches through the offloaded inference engine. In single-query mode, each query is processed individually for correctness verification.

Key considerations:

Batch mode groups queries by --gpu-batch-size and --num-gpu-batches
Queries are padded to --pad-to-seq-len for uniform batching
Output length is short (3-10 tokens) compared to input length (123-1274 tokens, i.e. roughly 100-1300)
The model is reinitialized per batch to handle varying padded sequence lengths
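
The grouping and padding described above can be sketched on plain token-id lists. This sketch assumes that one batch holds gpu_batch_size * num_gpu_batches queries, that padding is applied on the left, and that the pad token id is 1; none of these details is stated on this page, and the function names are hypothetical.

```python
from typing import Iterator, List


def pad_left(token_ids: List[int], seq_len: int, pad_id: int) -> List[int]:
    """Left-pad one tokenized query to exactly seq_len tokens."""
    if len(token_ids) > seq_len:
        raise ValueError(f"query has {len(token_ids)} tokens, more than seq_len={seq_len}")
    return [pad_id] * (seq_len - len(token_ids)) + token_ids


def make_batches(queries: List[List[int]], gpu_batch_size: int, num_gpu_batches: int,
                 pad_to_seq_len: int, pad_id: int = 1) -> Iterator[List[List[int]]]:
    """Yield uniformly padded batches of gpu_batch_size * num_gpu_batches queries."""
    block = gpu_batch_size * num_gpu_batches  # assumed effective batch size
    for start in range(0, len(queries), block):
        chunk = queries[start:start + block]
        # Note: the final chunk may hold fewer than `block` queries.
        yield [pad_left(q, pad_to_seq_len, pad_id) for q in chunk]


# Five made-up tokenized queries of different lengths.
queries = [[5, 6, 7], [8, 9], [10], [11, 12, 13, 14], [15]]
batches = list(make_batches(queries, gpu_batch_size=2, num_gpu_batches=2, pad_to_seq_len=4))
for batch in batches:
    print(batch)
```

A real run would also need to decide how to handle the short final batch, for example by padding it with dummy queries or processing it separately.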

Step 6: Evaluate Predictions

Parse the model's generated outputs, extract predictions, and compute evaluation metrics (precision, recall, accuracy, F1 score) by comparing against ground truth labels. Results are logged with throughput measurements including both input and output tokens per second.

Key considerations:

Entity matching outputs are parsed as Yes/No responses
Data imputation outputs are compared against the masked attribute value
Error detection outputs identify the erroneous column
Throughput is measured as (input_tokens + output_tokens) / total_time; input tokens are counted because prefill over the long prompts dominates runtime
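
For the Yes/No tasks, the metrics and the throughput figure can be computed as in the sketch below. The output normalization (strip, lowercase, prefix match on "yes") and the function names are assumptions for illustration; the formulas themselves are the standard definitions.

```python
from typing import Dict, List


def is_positive(text: str, positive: str = "yes") -> bool:
    """Normalize a generated or gold answer and test for the positive label."""
    return text.strip().lower().startswith(positive)


def binary_metrics(preds: List[str], golds: List[str]) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy for Yes/No predictions."""
    p = [is_positive(x) for x in preds]
    g = [is_positive(x) for x in golds]
    tp = sum(a and b for a, b in zip(p, g))
    fp = sum(a and not b for a, b in zip(p, g))
    fn = sum(b and not a for a, b in zip(p, g))
    correct = sum(a == b for a, b in zip(p, g))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(g) if g else 0.0}


def throughput(input_tokens: int, output_tokens: int, total_time: float) -> float:
    """Tokens per second, counting both prompt and generated tokens."""
    return (input_tokens + output_tokens) / total_time


# Made-up predictions and labels, only to exercise the functions.
preds = ["Yes", " yes\n", "No", "No", "Yes"]
golds = ["Yes", "No", "No", "Yes", "Yes"]
metrics = binary_metrics(preds, golds)
print(metrics)
print(throughput(input_tokens=1000, output_tokens=24, total_time=2.0))
```

Data imputation would instead compare the normalized output string against the masked value, so only accuracy applies directly in that case.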
