Principle:FMInference FlexLLMGen Data Wrangling Setup
| Field | Value |
|---|---|
| Sources | FlexLLMGen, fm_data_tasks |
| Domains | Environment_Setup, Data_Wrangling |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An environment preparation step that installs Python dependencies and downloads benchmark datasets required for LLM-based data wrangling evaluation.
Description
The data wrangling pipeline requires additional Python packages (pandas, sentence-transformers, rich, pyarrow) and the benchmark datasets from HazyResearch's fm_data_tasks repository. The install.sh script automates both steps: it installs the packages with pip and clones the dataset repository with git.
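As a quick sanity check after installation, the packages listed above can be probed for importability. The following is a minimal sketch, not part of install.sh: the helper name `missing_modules` and the constant `REQUIRED_MODULES` are hypothetical, and the only assumption about the packages is their usual import names (sentence-transformers is imported as `sentence_transformers`).

```python
from importlib.util import find_spec

# Import names for the packages named in this section (assumed mapping:
# the sentence-transformers distribution is imported as sentence_transformers).
REQUIRED_MODULES = ["pandas", "sentence_transformers", "rich", "pyarrow"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    # find_spec returns None when a top-level module is not installed,
    # without actually importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All data wrangling dependencies are importable.")
```

An empty result suggests the pip step completed; it does not verify package versions or that the datasets were downloaded.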
Usage
Run install.sh once, before the first data wrangling evaluation.
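The "run once" requirement can be enforced with a small guard that skips the script when the datasets already appear to be present. This is a sketch under stated assumptions: the function names are hypothetical, and the directory name `fm_data_tasks` is assumed to be what the git clone step leaves behind (the default for a repository of that name); adjust it if install.sh places the data elsewhere.

```python
import subprocess
from pathlib import Path


def needs_setup(workdir, dataset_dir="fm_data_tasks"):
    """Return True when the (assumed) dataset directory is absent under workdir."""
    return not (Path(workdir) / dataset_dir).is_dir()


def run_setup_once(workdir, script="install.sh"):
    """Run the install script only if the datasets are not already present.

    Returns True if the script was run, False if setup was skipped.
    """
    if not needs_setup(workdir):
        return False  # dataset directory exists; nothing to do
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", script], cwd=workdir, check=True)
    return True
```

For example, `run_setup_once(".")` from the directory containing install.sh would run the script on the first call and return `False` on later calls, provided the clone directory is still in place.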
Theoretical Basis
Keeping dataset acquisition separate from the main package lets users install only the evaluation datasets they need.