Main Page
Welcome to Leeroopedia
Your ML & Data Knowledge Wiki. Best practices and expert-level knowledge for Machine Learning and Data Engineering, covering 1000+ frameworks and libraries from training to deployment.
Browse implementation patterns, configuration guides, debugging heuristics, and battle-tested defaults for frameworks like vLLM, DeepSpeed, Megatron-LM, FlashAttention, Triton, Unsloth, LangChain, and many more. Every page is structured so both humans and AI agents can find what they need fast.
Connect your AI coding agent. Plug Leeroopedia into your favorite coding agent, and let it build robust AI/ML systems autonomously:
- SuperML plugin — converts your AI coding agent into an expert ML engineer with agentic memory
- Leeroopedia MCP — search ML/AI best practices and skills
- Kapso — experimentation platform for autonomous AI/ML software building
Browse by Category
| Category | Description | Browse |
|---|---|---|
| Workflows | Step-by-step processes and procedures | Browse All |
| Principles | Core ideas and foundational knowledge | Browse All |
| Implementations | Code-level details and modules | Browse All |
| Heuristics | Best practices and guidelines | Browse All |
| Environments | Setup and configuration guides | Browse All |
Explore Pages
Workflows
- Workflow:Allenai Open instruct Tulu3 Full Post Training
- Workflow:Huggingface Datatrove Common Crawl Processing
- Workflow:Dagster io Dagster Dbt Integration
- Workflow:Ggml org Llama cpp Embedding Extraction
- Workflow:SeleniumHQ Selenium Selenium Grid Deployment
- Workflow:Haosulab ManiSkill Sim2Real Deployment
- Workflow:Heibaiying BigData Notes Flink Kafka Streaming Pipeline
- Workflow:Lakeraai Pint benchmark Custom Dataset Benchmarking
- Workflow:Duckdb Duckdb Extension Development And Distribution
- Workflow:Risingwavelabs Risingwave CDC Data Replication
Principles
- Principle:DataTalksClub Data engineering zoomcamp Data Deduplication
- Principle:Huggingface Datatrove Multilingual Word Tokenization
- Principle:Tensorflow Tfjs Tensor Merging
- Principle:Duckdb Duckdb Serialization Code Generation
- Principle:Sdv dev SDV Schema Simplification
- Principle:Helicone Helicone Analytics Storage
- Principle:Apache Paimon Global Index Scan Building
- Principle:Isaac sim IsaacGymEnvs Environment Setup
- Principle:Huggingface Trl GRPO Training Loop
- Principle:Recommenders team Recommenders SAR Recommendation Generation
Implementations
- Implementation:Haosulab ManiSkill PDJointPosController
- Implementation:Alibaba MNN Protobuf Wire Format H
- Implementation:Risingwavelabs Risingwave Grafana Dashboard Generation
- Implementation:Microsoft Playwright Client Waiter
- Implementation:Deepspeedai DeepSpeed AIO Bench Perf Sweep
- Implementation:Openai Openai python Response File Search Call In Progress
- Implementation:VainF Torch Pruning Group Prune
- Implementation:TobikoData Sqlmesh Context Invalidate Environment
- Implementation:AnswerDotAI RAGatouille Export To Huggingface Hub
- Implementation:Marker Inc Korea AutoRAG QA To Parquet
Heuristics
- Heuristic:Apache Airflow Task Dependency Isolation
- Heuristic:ARISE Initiative Robomimic HDF5 Cache Mode Selection
- Heuristic:Microsoft Agent framework PowerFx Python Version Limit
- Heuristic:Princeton nlp Tree of thought llm Duplicate Candidate Zeroing
- Heuristic:Nautechsystems Nautilus trader Order Rate Limiting Configuration
- Heuristic:Kubeflow Kubeflow Sequential Infrastructure Deployment
- Heuristic:EvolvingLMMs Lab Lmms eval Request Caching Strategy
- Heuristic:Kserve Kserve Server Side Apply For CRDs
- Heuristic:FlowiseAI Flowise Edge Connection Type Matching
- Heuristic:Huggingface Peft DoRA Inference Caching
Environments
- Environment:Mlc ai Mlc llm TVM Runtime Environment
- Environment:Triton inference server Server Docker Container Build
- Environment:Cleanlab Cleanlab Datalab Dependencies
- Environment:Heibaiying BigData Notes Flink 1 9 Environment
- Environment:Pyro ppl Pyro CUDA GPU Acceleration
- Environment:Anthropics Anthropic sdk python Azure Foundry Environment
- Environment:Fede1024 Rust rdkafka CI Test Runner
- Environment:OWASP Www project top 10 for large language model applications Pre Commit Hooks Environment
- Environment:Treeverse LakeFS Go Runtime Environment
- Environment:Avdvg InjectGuard CUDA GPU