

Heuristic:Vespa engine Vespa Config Polling Timeout Tuning

From Leeroopedia




Knowledge Sources
Domains: Configuration, Optimization
Last Updated: 2026-02-09 00:00 GMT

Overview

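Config subscription timing in Vespa uses a conservative 55-second subscription timeout, tiered request timeouts (15s for the initial request, 25s after errors, 600s after success), and a per-connection backoff that grows linearly with consecutive failures and is capped at 6x the base delay.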

Description

The Vespa config subscription system uses tuned timing values to balance responsiveness against resilience. The default subscription timeout of 55 seconds is conservative, giving config servers time to produce a config generation before the request times out. Request timeouts are tiered: short for the initial request (15s) and for error recovery (25s), and long (600s) once the system is stable. Per-connection backoff grows with each consecutive failure and is capped at 6x the base delay to prevent excessive suspension, and suspension warnings are throttled to at most once per 10 seconds per connection to avoid log spam.
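The tiered timeout strategy can be sketched as a selection based on the client's current state. This is a minimal illustration, not Vespa's code: the `PollState` enum and the `requestTimeout` function are hypothetical names, and only the three timeout values come from the documented defaults.

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical client states; Vespa tracks this state differently internally.
enum class PollState { Initial, AfterError, Stable };

// Tiered request timeout: fail fast on the first request, be moderately
// patient during error recovery, and allow long requests once stable.
// Values are the documented defaults (initialTimeout, errorTimeout, successTimeout).
std::chrono::seconds requestTimeout(PollState state) {
    switch (state) {
    case PollState::Initial:    return 15s;   // initialTimeout
    case PollState::AfterError: return 25s;   // errorTimeout
    case PollState::Stable:     return 600s;  // successTimeout
    }
    return 15s;  // not reached for valid enum values; keeps compilers quiet
}
```

A state transition, such as moving from Initial to Stable after the first successful fetch, is then the only thing that changes how long the next request may wait.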

Usage

Apply this heuristic when troubleshooting config subscription timeouts, tuning connection failover behavior, or designing similar distributed configuration systems. Understanding the timing hierarchy is critical for diagnosing "slow config" issues.

The Insight (Rule of Thumb)

  • Default Timeouts:
    • successTimeout: 600s (10 minutes) when config was recently successful.
    • errorTimeout: 25s after errors.
    • initialTimeout: 15s for first config request (fail fast).
    • subscribeTimeout: 55s for subscription operations.
  • Delay Strategy:
    • successDelay: 250ms (rapid polling when changes detected).
    • fixedDelay: 5s base delay between requests.
    • unconfiguredDelay: 1s when client just started (no config yet).
    • configuredErrorDelay: 15s after error on previously configured component.
  • Backoff:
    • Global maxDelayMultiplier: 10 (max 5s x 10 = 50s).
    • Per-connection MAX_DELAY_MULTIPLIER: 6 (max 60s x 6 = 360s).
    • Transient (connection timeout) and fatal (bad config) errors use the same backoff formula, each with its own failure counter and a 60s base delay.
  • Warning Throttle:
    • WARN_INTERVAL: 10s (log suspension warnings at most once per 10s per connection).
  • Trade-off: Conservative timeouts sacrifice fast failure detection for resilience against temporary network issues.
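The global backoff bullet can be illustrated as the base delay multiplied by the number of consecutive failures, capped at `maxDelayMultiplier`. This sketch assumes the multiplier applies to `fixedDelay` exactly as the "5s x 10 = 50s" example suggests; the function name `globalErrorDelay` is hypothetical, and the real requester's formula is not shown on this page and may differ.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

using namespace std::chrono_literals;

constexpr std::uint32_t kMaxDelayMultiplier = 10;  // maxDelayMultiplier default
constexpr std::chrono::seconds kFixedDelay = 5s;   // fixedDelay default

// Hypothetical: delay before retrying after `consecutiveFailures` failed requests.
// The delay grows linearly with the failure count and stops growing at 10x (50s).
std::chrono::seconds globalErrorDelay(std::uint32_t consecutiveFailures) {
    const std::uint32_t multiplier = std::min(consecutiveFailures, kMaxDelayMultiplier);
    return multiplier * kFixedDelay;
}
```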

Reasoning

The 55-second subscription timeout is set just below the typical 60-second TCP timeout but above typical server-side config generation time. The 250ms success delay enables rapid sequential requests when config changes are detected, which is critical for propagating updates across a cluster. The tiered timeout approach (15s -> 25s -> 600s) fails fast on initial connections while being patient with established ones. The per-connection backoff multiplier of 6 (vs. the global 10) applies to a larger 60-second base delay and limits any single connection's suspension to 360 seconds, so a failing connection is always retried within six minutes rather than being effectively disconnected for good.
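The per-connection schedule follows directly from the formula in frtconnection.cpp. After the n-th consecutive failure of one error type, with the default 60s base delay, the suspension is:

```latex
d_n = \min(6, n) \cdot 60\,\mathrm{s}
\quad\Rightarrow\quad
(d_1, \dots, d_7) = (60, 120, 180, 240, 300, 360, 360)\ \mathrm{s}
```

Growth is linear in the failure count rather than exponential, and the total time spent suspended over the first six failures is 60 + 120 + 180 + 240 + 300 + 360 = 1260 s (21 minutes).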

Code Evidence

Timing values from timingvalues.cpp:

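// Excerpt: the duration literals (600s, 250ms, ...) require std::chrono_literals.
TimingValues::TimingValues()
    : successTimeout(600s),      // request timeout after a recent successful fetch
      errorTimeout(25s),         // request timeout after errors
      initialTimeout(15s),       // request timeout for the first config request
      subscribeTimeout(55s),     // timeout for subscription operations
      fixedDelay(5s),            // base delay between requests
      successDelay(250ms),       // delay for rapid polling when changes are detected
      unconfiguredDelay(1s),     // delay when the client has no config yet
      configuredErrorDelay(15s), // delay after an error on a configured component
      maxDelayMultiplier(10),    // cap on the global backoff multiplier
      transientDelay(60s),       // base suspension for transient errors
      fatalDelay(60s)            // base suspension for fatal errors
{
}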
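The relationships the Reasoning section relies on can be expressed as compile-time checks against these defaults. This is a hypothetical sanity check, not part of Vespa: the constants below mirror the initializer list, and the 60-second bound is the TCP timeout figure cited on this page.

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical mirrors of the TimingValues defaults.
constexpr std::chrono::seconds kInitialTimeout = 15s;
constexpr std::chrono::seconds kErrorTimeout = 25s;
constexpr std::chrono::seconds kSuccessTimeout = 600s;
constexpr std::chrono::seconds kSubscribeTimeout = 55s;
constexpr std::chrono::milliseconds kSuccessDelay = 250ms;
constexpr std::chrono::seconds kFixedDelay = 5s;

// Timeouts lengthen as the client moves from startup to error recovery to stable.
static_assert(kInitialTimeout < kErrorTimeout && kErrorTimeout < kSuccessTimeout,
              "timeout tiers must increase: initial < error < success");
// The subscription timeout stays under the 60s figure cited in the Reasoning section.
static_assert(kSubscribeTimeout < 60s, "subscribeTimeout should stay below 60s");
// Polling after a detected change is faster than the base delay between requests.
static_assert(kSuccessDelay < kFixedDelay, "successDelay should be shorter than fixedDelay");
```

Encoding the ordering this way means a tuning change that breaks the intended hierarchy fails at build time instead of surfacing as "slow config" in production.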

Per-connection backoff from frtconnection.cpp:

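constexpr uint32_t MAX_DELAY_MULTIPLIER = 6u;
constexpr vespalib::duration WARN_INTERVAL = 10s;

void FRTConnection::calculateSuspension(ErrorType type) {
    vespalib::duration delay = vespalib::duration::zero();
    switch(type) {
    case TRANSIENT:
        // Linear backoff: 1x, 2x, ... up to 6x the 60s base delay.
        delay = std::min(MAX_DELAY_MULTIPLIER, ++_transientFailures) * _transientDelay;
        break;
    case FATAL:
        // Same formula with a separate failure counter and base delay.
        delay = std::min(MAX_DELAY_MULTIPLIER, ++_fatalFailures) * _fatalDelay;
        break;
    }
    // (Elided) The connection is then suspended until now + delay, and a
    // suspension warning is logged at most once per WARN_INTERVAL.
}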
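For experimenting with the per-connection policy outside Vespa, the logic can be modeled in a small self-contained class. This is a sketch under stated assumptions: `ConnectionBackoff`, `onFailure`, and the injected `now` time point are hypothetical, warnings are counted instead of logged, and the throttle check (warn only when the previous warning is at least `WARN_INTERVAL` old) is an assumed reading of the "at most once per 10s" rule.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

using namespace std::chrono_literals;
using Clock = std::chrono::steady_clock;

constexpr std::uint32_t kMaxDelayMultiplier = 6;  // MAX_DELAY_MULTIPLIER
constexpr Clock::duration kWarnInterval = 10s;    // WARN_INTERVAL

// Hypothetical model of FRTConnection's suspension policy (not Vespa code).
class ConnectionBackoff {
public:
    enum class ErrorType { Transient, Fatal };

    // Records a failure at time `now` and returns the suspension delay.
    Clock::duration onFailure(ErrorType type, Clock::time_point now) {
        Clock::duration delay{};
        switch (type) {
        case ErrorType::Transient:
            delay = std::min(kMaxDelayMultiplier, ++transientFailures_) * transientDelay_;
            break;
        case ErrorType::Fatal:
            delay = std::min(kMaxDelayMultiplier, ++fatalFailures_) * fatalDelay_;
            break;
        }
        suspendedUntil_ = now + delay;
        // Throttle: count at most one warning per kWarnInterval.
        if (!hasWarned_ || now - lastWarning_ >= kWarnInterval) {
            ++warningsEmitted_;
            lastWarning_ = now;
            hasWarned_ = true;
        }
        return delay;
    }

    bool isSuspended(Clock::time_point now) const { return now < suspendedUntil_; }
    int warningsEmitted() const { return warningsEmitted_; }

private:
    std::uint32_t transientFailures_ = 0;
    std::uint32_t fatalFailures_ = 0;
    Clock::duration transientDelay_ = 60s;  // transientDelay default
    Clock::duration fatalDelay_ = 60s;      // fatalDelay default
    Clock::time_point suspendedUntil_{};
    Clock::time_point lastWarning_{};
    bool hasWarned_ = false;
    int warningsEmitted_ = 0;
};
```

Injecting `now` instead of reading the clock inside `onFailure` keeps the policy deterministic, so a burst of failures and the resulting warning count can be replayed exactly in a test.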
