Main Page
Welcome to Leeroopedia
Your ML & Data Knowledge Wiki. Best practices and expert-level knowledge for Machine Learning and Data Engineering, covering 1000+ frameworks and libraries from training to deployment.
Browse implementation patterns, configuration guides, debugging heuristics, and battle-tested defaults for frameworks like vLLM, DeepSpeed, Megatron-LM, FlashAttention, Triton, Unsloth, LangChain, and many more. Every page is structured so both humans and AI agents can find what they need fast.
Connect your AI coding agent. Plug Leeroopedia into your favorite coding agent and let it build robust AI/ML systems autonomously:
- SuperML plugin: turns your AI coding agent into an expert ML engineer with agentic memory
- Leeroopedia MCP: search ML/AI best practices and skills
- Kapso: an experimentation platform for autonomous AI/ML software building
Browse by Category
| Category | Description | Browse |
|---|---|---|
| Workflows | Step-by-step processes and procedures | Browse All |
| Principles | Core ideas and foundational knowledge | Browse All |
| Implementations | Code-level details and modules | Browse All |
| Heuristics | Best practices and guidelines | Browse All |
| Environments | Setup and configuration guides | Browse All |
Explore Pages
Workflows
- Workflow:Snorkel team Snorkel Multitask Classification
- Workflow:Openai CLIP Zero shot image classification
- Workflow:Princeton nlp SimPO On Policy Data Generation
- Workflow:Sdv dev SDV Data quality evaluation
- Workflow:CrewAIInc CrewAI Sequential Crew Execution
- Workflow:Predibase Lorax Structured JSON Output
- Workflow:Astronomer Astronomer cosmos TaskGroup dbt integration
- Workflow:Datahub project Datahub Spark Lineage Capture
- Workflow:ChenghaoMou Text dedup Suffix Array Deduplication
- Workflow:Scikit learn contrib Imbalanced learn Balanced Deep Learning Training
Principles
- Principle:Iamhankai Forest of Thought Chain of Thought Reasoning
- Principle:BerriAI Litellm Logging Payload Construction
- Principle:Microsoft BIPIA Tokenizer And Model Preparation
- Principle:Microsoft Agent framework Durable Agent State Persistence
- Principle:Eventual Inc Daft Error Hierarchy
- Principle:Turboderp org Exllamav2 Calibration Tokenization
- Principle:Lucidrains X transformers Sequence to Sequence Data Preparation
- Principle:Openclaw Openclaw Routing Verification
- Principle:Scikit learn Scikit learn Cross Decomposition
- Principle:Openai Whisper Single Segment Decoding
Implementations
- Implementation:Haosulab ManiSkill PDJointVelController
- Implementation:PacktPublishing LLM Engineers Handbook DatasetGenerator Get Prompts
- Implementation:Apache Paimon RestClient
- Implementation:EvolvingLMMs Lab Lmms eval VideoMathQA CoT Step Evaluation
- Implementation:Infiniflow Ragflow Canvas Util
- Implementation:PacktPublishing LLM Engineers Handbook SelfQuery Generate
- Implementation:Lance format Lance BatchDecodeStream
- Implementation:Openai Whisper Decode
- Implementation:Google deepmind Mujoco Engine Collision Driver
- Implementation:Datajuicer Data juicer FrequencySpecifiedFieldSelector
Heuristics
- Heuristic:Haotian liu LLaVA Gradient Checkpointing Memory Optimization
- Heuristic:DataExpert io Data engineer handbook Docker Volume Persistence Management
- Heuristic:MaterializeInc Materialize CI Agent Prioritization
- Heuristic:Fastai Fastbook Discriminative Learning Rates
- Heuristic:Puppeteer Puppeteer Headless Linux Requirements
- Heuristic:Unstructured IO Unstructured Hi Res Model Configuration
- Heuristic:TA Lib Ta lib python Thread Safety With Abstract API
- Heuristic:Webdriverio Webdriverio Exponential Backoff Retry
- Heuristic:Iterative Dvc Path Performance Optimization
- Heuristic:Huggingface Diffusers Dtype Precision Selection
Environments
- Environment:Astronomer Astronomer cosmos Cloud Provider Dependencies
- Environment:Allenai Open instruct Docker Container
- Environment:PacktPublishing LLM Engineers Handbook AWS SageMaker GPU Environment
- Environment:Openai Evals OpenAI API Configuration
- Environment:ThreeSR Awesome Inference Time Scaling Python Runtime Environment
- Environment:Kubeflow Pipelines Python SDK
- Environment:Bentoml BentoML Triton Inference Server
- Environment:Openai Whisper Numba
- Environment:Snorkel team Snorkel PySpark
- Environment:Apache Shardingsphere ZooKeeper Cluster Coordination