Heuristic:Ollama Ollama Download Retry Strategy

From Leeroopedia
Knowledge Sources
Domains Networking, Optimization, Infrastructure
Last Updated 2026-02-14 22:00 GMT

Overview

Resilient multi-part blob download strategy using 16 parallel parts (100MB-1000MB each), quadratic backoff with jitter (n^2 * 10ms, scaled by a random factor of 0.5-1.5x), 30-second stall detection, and a maximum of 6 retries per part.

Description

Ollama downloads model blobs, which can be multiple gigabytes, using a parallel chunked transfer strategy. Each blob is split into up to 16 parts, each sized between 100MB and 1000MB, and each part is downloaded independently with its own retry logic. Failed parts back off with jitter before retrying, which avoids thundering-herd problems when many clients retry simultaneously. A stall detector monitors each part and triggers a retry if no progress is made within 30 seconds.

Usage

This heuristic applies during model pulling (`PullModel` and `DownloadBlob` implementations). Understanding the retry strategy is important for diagnosing slow or failed model downloads, especially on unreliable networks or behind corporate proxies.

The Insight (Rule of Thumb)

  • Action: Split large blobs into 16 parallel parts (100MB-1000MB each).
  • Value: Max 6 retries per part. Backoff formula: `min(n^2 * 10ms, maxBackoff)`, where `n` is the attempt number, then multiplied by a random factor in the range 0.5-1.5x.
  • Trade-off: 16 parallel connections maximize throughput on high-bandwidth links but may overwhelm slow connections or restrictive proxies. The 30-second stall timeout balances tolerance for network hiccups against prompt failure detection.
  • Redirect limit: Maximum 10 HTTP redirects when resolving direct download URLs.
  • Progress polling: Download progress is checked every 1 second.

Reasoning

Large model files (7B parameters = ~4GB in Q4 format, 70B = ~40GB) require robust download strategies. The 16-part parallel download maximizes bandwidth utilization by overlapping I/O operations. The backoff formula `n^2 * 10ms` grows quadratically rather than exponentially: it gives fast initial retries (10ms, 40ms, 90ms for n = 1, 2, 3) that ramp up for persistent failures. Jitter (0.5-1.5x randomization) prevents synchronized retry storms when multiple Ollama instances pull the same model.

The 30-second stall timeout is calibrated for real-world network conditions: long enough to tolerate brief network interruptions but short enough to recover from stuck TCP connections.

Parallel part configuration from `server/download.go:100-102`:

const (
    numDownloadParts          = 16
    minDownloadPartSize int64 = 100 * format.MegaByte
    maxDownloadPartSize int64 = 1000 * format.MegaByte
)
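As a rough illustration of how these constants could interact, the sketch below derives a part size by dividing the blob size by `numDownloadParts` and clamping the result to the two bounds. The clamping rule, the `part` type, the `partSize` and `splitParts` helpers, and the local `megaByte` constant (standing in for `format.MegaByte`, assumed here to be 1,000,000 bytes) are all assumptions made for this example, not code from `server/download.go`.

```go
package main

import "fmt"

// megaByte stands in for format.MegaByte; a decimal megabyte is assumed.
const megaByte int64 = 1000 * 1000

const (
	numDownloadParts          = 16
	minDownloadPartSize int64 = 100 * megaByte
	maxDownloadPartSize int64 = 1000 * megaByte
)

// part is a hypothetical byte range [Offset, Offset+Size) within a blob.
type part struct {
	N      int
	Offset int64
	Size   int64
}

// partSize divides the blob evenly across numDownloadParts and clamps the
// result to the configured bounds (an assumed rule, see the text above).
func partSize(total int64) int64 {
	return min(max(total/numDownloadParts, minDownloadPartSize), maxDownloadPartSize)
}

// splitParts walks the blob in partSize steps; the last part takes the remainder.
func splitParts(total int64) []part {
	size := partSize(total)
	var parts []part
	for offset := int64(0); offset < total; offset += size {
		parts = append(parts, part{N: len(parts), Offset: offset, Size: min(size, total-offset)})
	}
	return parts
}

func main() {
	for _, total := range []int64{50 * megaByte, 4000 * megaByte, 40000 * megaByte} {
		parts := splitParts(total)
		fmt.Printf("%d MB blob: %d part(s), part size up to %d MB\n",
			total/megaByte, len(parts), partSize(total)/megaByte)
	}
}
```

Under this assumed rule a 4GB blob splits into exactly 16 parts of 250MB, and a blob smaller than 100MB becomes a single part. A 40GB blob would hit the 1000MB cap and produce 40 parts, in which case the figure 16 would bound concurrency rather than the part count; the excerpt above does not show how the real implementation handles that case.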

Max retries from `server/download.go:31`:

const maxRetries = 6

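Backoff with jitter (quadratic in the attempt number `n`) from `server/download.go:193-214`:

func (b *blobDownload) backoff(ctx context.Context, n int) error {
    // Base delay grows as n^2 * 10ms, capped at maxBackoff.
    d := min(time.Duration(n*n)*10*time.Millisecond, maxBackoff)
    // Scale by a random factor in [0.5, 1.5) so concurrent clients do not retry in lockstep.
    d = time.Duration(float64(d) * (rand.Float64() + 0.5))
    // ...
}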

Stall detection from `server/download.go:374-382`:

if !lastUpdated.IsZero() && time.Since(lastUpdated) > 30*time.Second {
    slog.Info(fmt.Sprintf("%s part %d stalled; retrying...",
        b.Digest[7:19], part.N))
    return errPartStalled
}
