Heuristic:Ollama Ollama Download Retry Strategy
| Knowledge Sources | |
|---|---|
| Domains | Networking, Optimization, Infrastructure |
| Last Updated | 2026-02-14 22:00 GMT |
Overview
Resilient multi-part blob download strategy using 16 parallel parts (100MB-1000MB each), quadratic backoff with jitter (n^2 * 10ms, capped at `maxBackoff` and randomized 0.5-1.5x), 30-second stall detection, and a maximum of 6 retries per part.
Description
Ollama downloads model blobs, which can be multiple gigabytes, using a parallel chunked transfer strategy. Each blob is split into parts sized between 100MB and 1000MB, with a target of 16 parts. The size bounds mean the actual count can differ from 16: a 1GB blob cannot be cut into more than ten 100MB parts, and a 40GB blob needs at least forty 1000MB parts. Each part is downloaded independently with its own retry logic. Failed parts back off quadratically with jitter, which avoids thundering-herd problems when many clients retry simultaneously. A stall detector monitors each part and triggers a retry if no progress is made within 30 seconds.
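The part sizing described above can be sketched as follows. This is a minimal illustration, not the code in `server/download.go`: the helpers `partSize` and `partCount` are hypothetical, and `megaByte` is assumed to be a decimal megabyte standing in for `format.MegaByte`.

```go
package main

import "fmt"

// Values mirror the constants quoted from server/download.go.
// megaByte is an assumption: a decimal megabyte in place of format.MegaByte.
const (
	megaByte            int64 = 1000 * 1000
	numDownloadParts    int64 = 16
	minDownloadPartSize int64 = 100 * megaByte
	maxDownloadPartSize int64 = 1000 * megaByte
)

// partSize (hypothetical helper) aims for numDownloadParts equal parts,
// then clamps the size into [minDownloadPartSize, maxDownloadPartSize].
func partSize(total int64) int64 {
	size := total / numDownloadParts
	switch {
	case size < minDownloadPartSize:
		return minDownloadPartSize
	case size > maxDownloadPartSize:
		return maxDownloadPartSize
	}
	return size
}

// partCount (hypothetical helper) returns how many parts of partSize(total)
// are needed to cover the blob; the last part may be shorter.
func partCount(total int64) int64 {
	size := partSize(total)
	return (total + size - 1) / size
}

func main() {
	for _, gb := range []int64{1, 4, 40} {
		total := gb * 1000 * megaByte
		fmt.Printf("%2d GB blob: %d parts of up to %d MB\n",
			gb, partCount(total), partSize(total)/megaByte)
	}
}
```

Under these assumptions, only blobs between 1.6GB and 16GB are split into exactly 16 parts; smaller blobs get fewer parts and larger ones get more.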
Usage
This heuristic applies during model pulling (`PullModel` and `DownloadBlob` implementations). Understanding the retry strategy is important for diagnosing slow or failed model downloads, especially on unreliable networks or behind corporate proxies.
The Insight (Rule of Thumb)
- Action: Split large blobs into 16 parallel parts (100MB-1000MB each).
- Value: Max 6 retries per part. Backoff formula: `min(n^2 * 10ms, maxBackoff)` randomized by 0.5-1.5x.
- Trade-off: 16 parallel connections maximize throughput on high-bandwidth links but may overwhelm slow connections or restrictive proxies. The 30-second stall timeout balances tolerance for network hiccups against prompt failure detection.
- Redirect limit: Maximum 10 HTTP redirects when resolving direct download URLs.
- Progress polling: Download progress is checked every 1 second.
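The retry limit in the list above can be sketched as a bounded loop around a single part download. This is an illustrative sketch only: `downloadWithRetries`, its `fetch` and `wait` callbacks, and the error name `errMaxRetriesExceeded` are hypothetical, and it assumes the limit of 6 counts total attempts.

```go
package main

import (
	"errors"
	"fmt"
)

const maxRetries = 6

var (
	errPartStalled        = errors.New("part stalled")
	errMaxRetriesExceeded = errors.New("max retries exceeded") // hypothetical name
)

// downloadWithRetries calls fetch up to maxRetries times. After failed
// attempt n (1-based) it calls wait(n), where a real caller would sleep for
// the jittered n^2 * 10ms backoff. No wait follows the final attempt.
func downloadWithRetries(fetch func() error, wait func(n int)) error {
	for n := 1; n <= maxRetries; n++ {
		if err := fetch(); err == nil {
			return nil
		}
		if n < maxRetries {
			wait(n)
		}
	}
	return errMaxRetriesExceeded
}

func main() {
	attempts := 0
	err := downloadWithRetries(
		func() error {
			attempts++
			if attempts < 3 {
				return errPartStalled // simulate two stalls, then success
			}
			return nil
		},
		func(n int) { fmt.Printf("attempt %d failed, backing off\n", n) },
	)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Passing the wait step in as a callback keeps the loop testable without real sleeps.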
Reasoning
Large model files (a 7B-parameter model is ~4GB in Q4 format, a 70B model ~40GB) require robust download strategies. Downloading 16 parts in parallel maximizes bandwidth utilization by overlapping I/O operations. The backoff formula `n^2 * 10ms` grows quadratically rather than exponentially: it gives fast initial retries (10ms, 40ms, 90ms before jitter) and ramps up for persistent failures, reaching 360ms at n = 6 unless `maxBackoff` caps it sooner. Jitter (0.5-1.5x randomization) prevents synchronized retry storms when multiple Ollama instances pull the same model.
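To make the schedule concrete, the sketch below computes the base and jittered delays for the first six retries. The helper names `baseDelay` and `jittered` are hypothetical, and the 5-second cap is a placeholder, since the value of `maxBackoff` is not given here.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// baseDelay returns min(n^2 * 10ms, maxBackoff), the delay before jitter.
func baseDelay(n int, maxBackoff time.Duration) time.Duration {
	d := time.Duration(n*n) * 10 * time.Millisecond
	if d > maxBackoff {
		d = maxBackoff
	}
	return d
}

// jittered scales d by a factor in [0.5, 1.5); r is expected to be in [0, 1),
// for example the result of rand.Float64().
func jittered(d time.Duration, r float64) time.Duration {
	return time.Duration(float64(d) * (r + 0.5))
}

func main() {
	const maxBackoff = 5 * time.Second // hypothetical cap, not taken from the source
	for n := 1; n <= 6; n++ {
		d := baseDelay(n, maxBackoff)
		fmt.Printf("retry %d: base %v, jittered %v\n", n, d, jittered(d, rand.Float64()))
	}
}
```

The base delays are 10ms, 40ms, 90ms, 160ms, 250ms, and 360ms; the jittered values differ on every run.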
The 30-second stall timeout is calibrated for real-world network conditions: long enough to tolerate brief network interruptions but short enough to recover from stuck TCP connections.
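The stall check can be sketched with an injected clock, which makes the 30-second rule easy to exercise without waiting. The `partProgress` type and the `at` helper are hypothetical; the quoted source compares against `time.Since(lastUpdated)` instead.

```go
package main

import (
	"fmt"
	"time"
)

const stallTimeout = 30 * time.Second

// partProgress (hypothetical type) records when a part last received data.
// The zero value means no data has arrived yet.
type partProgress struct {
	lastUpdated time.Time
}

// touch records that data arrived at the given time.
func (p *partProgress) touch(now time.Time) { p.lastUpdated = now }

// stalled reports whether data has arrived at least once and more than
// stallTimeout has passed since the most recent arrival.
func (p *partProgress) stalled(now time.Time) bool {
	return !p.lastUpdated.IsZero() && now.Sub(p.lastUpdated) > stallTimeout
}

// at is a demo helper: a timestamp sec seconds after an arbitrary start.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	var p partProgress
	fmt.Println(p.stalled(at(60))) // false: no data yet, so nothing to time out
	p.touch(at(60))
	fmt.Println(p.stalled(at(75))) // false: 15s since the last update
	fmt.Println(p.stalled(at(95))) // true: 35s since the last update
}
```

The `IsZero` guard means a part that has not yet received its first byte is never reported as stalled by this check.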
Parallel part configuration from `server/download.go:100-102`:
const (
	numDownloadParts          = 16
	minDownloadPartSize int64 = 100 * format.MegaByte
	maxDownloadPartSize int64 = 1000 * format.MegaByte
)
Max retries from `server/download.go:31`:
const maxRetries = 6
Quadratic backoff with jitter from `server/download.go:193-214`:
func (b *blobDownload) backoff(ctx context.Context, n int) error {
	d := min(time.Duration(n*n)*10*time.Millisecond, maxBackoff)
	d = time.Duration(float64(d) * (rand.Float64() + 0.5))
	// ...
}
Stall detection from `server/download.go:374-382`:
if !lastUpdated.IsZero() && time.Since(lastUpdated) > 30*time.Second {
	slog.Info(fmt.Sprintf("%s part %d stalled; retrying...",
		b.Digest[7:19], part.N))
	return errPartStalled
}