
Principle:FMInference FlexLLMGen Data Wrangling Setup

From Leeroopedia
Revision as of 17:10, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/FMInference_FlexLLMGen_Data_Wrangling_Setup.md)


Sources: FlexLLMGen, fm_data_tasks
Domains: Environment_Setup, Data_Wrangling
Last Updated: 2026-02-09 00:00 GMT

Overview

An environment preparation step that installs Python dependencies and downloads benchmark datasets required for LLM-based data wrangling evaluation.

Description

The data wrangling pipeline requires additional Python packages (pandas, sentence-transformers, rich, pyarrow) and the benchmark datasets from HazyResearch's fm_data_tasks repository. The install.sh script automates both steps: it installs the packages with pip and clones the dataset repository with git.
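The two steps can be illustrated with a minimal dry-run sketch. This is not the content of install.sh: the helper names (missing_packages, plan_setup) and the repo_url and dest parameters are hypothetical, and the sketch only builds the commands such a setup would plausibly run, without executing them.

```python
import importlib.util
import sys
from pathlib import Path

# pip distribution name -> import name. The two differ for
# sentence-transformers, which is imported as sentence_transformers.
REQUIRED = {
    "pandas": "pandas",
    "sentence-transformers": "sentence_transformers",
    "rich": "rich",
    "pyarrow": "pyarrow",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be found."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def plan_setup(repo_url, dest, required=REQUIRED):
    """Build (but do not run) the commands for both setup steps.

    repo_url and dest are assumptions supplied by the caller; the real
    script may use different arguments.
    """
    commands = []
    missing = missing_packages(required)
    if missing:
        # Step 1: install only the packages that are not importable yet.
        commands.append([sys.executable, "-m", "pip", "install", *missing])
    if not Path(dest).exists():
        # Step 2: clone the dataset repository if it is not present.
        commands.append(["git", "clone", repo_url, str(dest)])
    return commands
```

Calling plan_setup with the dataset repository URL and a target directory returns a list of argument lists that could be passed to subprocess.run one by one, which makes it easy to inspect what would happen before anything is installed.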

Usage

Run install.sh once, before the first data wrangling evaluation.
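One way to enforce the run-once rule from a driver script is to skip the call when the dataset directory is already populated. The sketch below is an assumption-laden illustration: the function names, the default dataset directory name fm_data_tasks, and the choice of invoking the script through bash are all hypothetical and not taken from the project.

```python
import subprocess
from pathlib import Path


def setup_needed(dataset_dir):
    """True if the dataset directory is absent or empty."""
    path = Path(dataset_dir)
    return not path.is_dir() or not any(path.iterdir())


def run_install_once(script="install.sh", dataset_dir="fm_data_tasks",
                     runner=subprocess.run):
    """Run the setup script only when the datasets are not present.

    Returns True if the script was invoked, False if it was skipped.
    The runner parameter exists so the call can be replaced in tests.
    """
    if not setup_needed(dataset_dir):
        return False
    # check=True raises CalledProcessError if the script fails.
    runner(["bash", str(script)], check=True)
    return True
```

Checking for a non-empty directory is a coarse heuristic: it does not detect a partially completed clone or missing Python packages, so a failed first run may still require deleting the directory and running the script again.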

Theoretical Basis

Separating dataset acquisition from the main package allows users to selectively install only the evaluation datasets they need.
