Heuristic: Treeverse lakeFS Batch Delay Tuning
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Performance |
| Last Updated | 2026-02-08 10:00 GMT |
Overview
The lakeFS server batches KV store operations with a configurable maximum delay (default 3ms), trading a small amount of added latency on critical-path queries for higher effective throughput; the useful 1-5ms range corresponds to roughly 200-1000 requests/second to the data store.
Description
lakeFS uses a batching mechanism for metadata store (KV) operations to reduce the number of expensive queries. The MaxBatchDelay parameter controls the maximum time the server will wait to accumulate operations into a single batch before executing. The default of 3 milliseconds represents a careful trade-off: it enables effective batching under concurrent load (reducing database pressure) while keeping added latency imperceptible for typical interactive use cases.
Usage
Use this heuristic when tuning lakeFS server performance, diagnosing unexpectedly slow metadata operations, or configuring lakeFS for high-throughput workloads. Adjusting this value affects the trade-off between per-request latency and overall throughput.
The Insight (Rule of Thumb)
- Action: Configure `graveler.max_batch_delay` based on your concurrency profile.
- Value: The default is 3ms, sized for roughly 300 requests/second per resource; the broader 1-5ms range corresponds to 200-1000 req/s.
- Trade-off: Lower values (1ms) reduce latency but decrease batching effectiveness. Higher values (5ms) improve batching under heavy load but add noticeable latency to every metadata operation.
- Guideline: Keep the value between 1ms and 5ms. Below 1ms the window is too short for requests to coalesce at realistic rates; above 5ms the added latency becomes perceptible on every metadata operation.
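Assuming the standard lakeFS YAML configuration layout, where viper's dotted keys map to nested sections (the key path here is taken from the default shown below in Code Evidence), the setting would look roughly like this in a config file:

```yaml
# Hypothetical lakeFS config excerpt; the nesting of
# graveler.max_batch_delay is assumed from viper conventions.
graveler:
  max_batch_delay: 3ms   # default; 1ms favors latency, 5ms favors batching
```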
Reasoning
The codebase comment explains the rationale explicitly: "Since reducing # of expensive operations is only beneficial when there are a lot of concurrent requests, the sweet spot is probably between 1-5 milliseconds (representing 200-1000 requests/second to the data store). 3ms of delay with ~300 requests/second per resource sounds like a reasonable tradeoff." This is a classic latency-vs-throughput optimization. At low concurrency, the batch delay adds unnecessary latency. At high concurrency, it dramatically reduces database load by combining multiple operations into single batch queries.
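The quoted sweet-spot numbers follow from simple arithmetic: a delay of d milliseconds pays for itself once at least one additional request is likely to arrive inside the window, i.e. when the request rate exceeds 1000/d requests per second. A quick check of the ranges in the comment:

```go
package main

import "fmt"

func main() {
	// Break-even rate: batching a d-ms window helps once more than
	// one request tends to fall inside it, i.e. rate >= 1000/d req/s.
	for _, delayMs := range []float64{1, 3, 5} {
		breakeven := 1000 / delayMs
		fmt.Printf("%.0fms delay -> batching pays off above ~%.0f req/s\n",
			delayMs, breakeven)
	}
	// 1ms -> ~1000 req/s, 3ms -> ~333 req/s, 5ms -> ~200 req/s,
	// matching the 200-1000 req/s range in the source comment.
}
```

This is why the default of 3ms lines up with the comment's "~300 requests/second per resource": 1000ms / 3ms ≈ 333 req/s.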
Code Evidence
Batch delay configuration from `pkg/config/defaults.go:162-169`:
```go
// MaxBatchDelay - 3ms was chosen as a max delay time for critical path queries.
// It trades off amount of queries per second (and thus effectiveness of the batching mechanism) with added latency.
// Since reducing # of expensive operations is only beneficial when there are a lot of concurrent requests,
//
// the sweet spot is probably between 1-5 milliseconds (representing 200-1000 requests/second to the data store).
//
// 3ms of delay with ~300 requests/second per resource sounds like a reasonable tradeoff.
viper.SetDefault("graveler.max_batch_delay", 3*time.Millisecond)
```
Related cache configuration from `pkg/config/defaults.go:155-160`:
```go
viper.SetDefault("graveler.repository_cache.size", 1000)
viper.SetDefault("graveler.repository_cache.expiry", 5*time.Second)
viper.SetDefault("graveler.repository_cache.jitter", 2*time.Second)
viper.SetDefault("graveler.commit_cache.size", 50_000)
viper.SetDefault("graveler.commit_cache.expiry", 10*time.Minute)
viper.SetDefault("graveler.commit_cache.jitter", 2*time.Second)
```