
Heuristic:SeldonIO Seldon Core Model Load Timeout Tip

From Leeroopedia
Knowledge Sources
Domains: Operations, Debugging
Last Updated: 2026-02-13 14:00 GMT

Overview

A guide to the agent's model load and unload timeout and retry defaults: loading gets 5 retries within a 120-minute total timeout, while unloading gets only 1 retry within a 15-minute timeout.

Description

Seldon Core 2's agent has asymmetric retry and timeout configurations for model load vs unload operations. Loading has generous retries (5) and a long timeout (120 minutes) to accommodate large models downloaded from remote storage. Unloading has minimal retries (1) and a short timeout (15 minutes) with only a 2-second grace period. Understanding these defaults is critical for debugging model deployment failures and for tuning timeouts for large models.

Usage

Use this heuristic when diagnosing model load failures, deploying very large models (>10GB), or tuning agent configuration for your infrastructure. Increase load timeouts for models on slow storage; increase unload retries if models get stuck during cleanup.

The Insight (Rule of Thumb)

  • Action: Adjust agent timeout environment variables based on your model sizes and storage speed:
    • `SELDON_MAX_LOAD_RETRY_COUNT`: Number of load retries (default: 5)
    • `SELDON_MAX_LOAD_ELAPSED_TIME_MINUTES`: Total load timeout including retries (default: 120)
    • `SELDON_MAX_UNLOAD_RETRY_COUNT`: Number of unload retries (default: 1)
    • `SELDON_MAX_UNLOAD_ELAPSED_TIME_MINUTES`: Total unload timeout (default: 15)
    • `SELDON_UNLOAD_GRACE_PERIOD_SECONDS`: Grace period before forced unload (default: 2)
  • Value: For large models (>10GB) on slow storage, increase `SELDON_MAX_LOAD_ELAPSED_TIME_MINUTES` to 240+.
  • Trade-off: Longer timeouts delay failure detection; shorter timeouts may cause premature failures for large models.
  • Sub-service readiness:
    • `SELDON_MAX_TIME_READY_SUB_SERVICE_AFTER_START_SECONDS`: Grace period after pod start (default: 30)
    • `SELDON_MAX_ELAPSED_TIME_READY_SUB_SERVICE_BEFORE_START_MINUTES`: Wait before giving up on services (default: 15)
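As a sketch, the variables above can be overridden in the agent container's environment. The names come from the defaults listed on this page; the values below are illustrative assumptions for a slow-storage, large-model setup, not recommended settings:

```shell
# Illustrative overrides for an agent serving large models from slow remote storage.
# Tune the numbers to your own artifact sizes and storage bandwidth.
export SELDON_MAX_LOAD_RETRY_COUNT=8                # extra retries for flaky remote storage
export SELDON_MAX_LOAD_ELAPSED_TIME_MINUTES=240     # 4-hour total load budget
export SELDON_MAX_UNLOAD_RETRY_COUNT=2              # one extra attempt on stuck cleanup
export SELDON_UNLOAD_GRACE_PERIOD_SECONDS=30        # give cleanup hooks time to finish
```

In Kubernetes these would typically be set in the agent container's `env` section of the pod spec rather than exported by hand.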

Reasoning

From `scheduler/cmd/agent/cli/cli.go` default values:

SELDON_OVERCOMMIT_PERCENTAGE = 10
SELDON_MODEL_INFERENCE_LAG_THRESHOLD = 30
SELDON_MODEL_INACTIVE_SECONDS_THRESHOLD = 30
SELDON_SCALING_STATS_PERIOD_SECONDS = 5
SELDON_MAX_TIME_READY_SUB_SERVICE_AFTER_START_SECONDS = 30
SELDON_MAX_ELAPSED_TIME_READY_SUB_SERVICE_BEFORE_START_MINUTES = 15
SELDON_MAX_LOAD_ELAPSED_TIME_MINUTES = 120
SELDON_MAX_UNLOAD_ELAPSED_TIME_MINUTES = 15
SELDON_MAX_LOAD_RETRY_COUNT = 5
SELDON_MAX_UNLOAD_RETRY_COUNT = 1
SELDON_UNLOAD_GRACE_PERIOD_SECONDS = 2

Key observation: The asymmetry between load (5 retries, 120 min) and unload (1 retry, 15 min) reflects the operational reality that:

  • Loading involves downloading artifacts from remote storage (slow, failure-prone)
  • Unloading is a local operation (fast, rarely fails)
  • A failed unload that retries many times could block new model loads
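The two limits bound each other: a load stops retrying when either the retry count or the total elapsed budget runs out, whichever comes first. A toy sketch of that control flow (an assumption for illustration, not the agent's actual implementation, with a pretend fixed per-attempt duration):

```shell
# Toy model of the dual bound: stop when retries OR the elapsed budget is exhausted.
max_retries=5
max_elapsed_secs=$(( 120 * 60 ))   # 120-minute budget in seconds
attempt_secs=1500                  # pretend each failed attempt takes 25 minutes
elapsed=0
attempts=0
while [ "$attempts" -lt "$max_retries" ] && [ "$elapsed" -lt "$max_elapsed_secs" ]; do
  attempts=$(( attempts + 1 ))
  elapsed=$(( elapsed + attempt_secs ))
done
echo "gave up after $attempts attempts (${elapsed}s)"
```

With 25-minute attempts, the retry count (5) is the binding limit; with slower attempts, the 120-minute budget would cut retries short instead.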

Operator guidance:

  • For slow storage (S3, GCS over WAN): increase load timeout and retries
  • For Triton with large TensorFlow/PyTorch models: increase sub-service readiness timeout (Triton startup can be slow)
  • For MLServer with many models: increase sub-service readiness to allow Python model initialization
  • The 2-second unload grace period is very short; increase it for models with cleanup requirements
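When picking a load timeout, a back-of-the-envelope estimate of single-attempt download time helps. The artifact size and bandwidth below are illustrative assumptions:

```shell
# Rough estimate of one download attempt for a model artifact.
size_gb=20        # artifact size in GB (assumed)
bw_mbs=25         # sustained storage bandwidth in MB/s (assumed)
secs=$(( size_gb * 1024 / bw_mbs ))
mins=$(( secs / 60 ))
echo "one attempt: ~${mins} min"
```

Budget several attempts' worth of that figure, plus load-into-memory time, in `SELDON_MAX_LOAD_ELAPSED_TIME_MINUTES`; here 5 attempts at ~13 minutes each already approaches the 120-minute default.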
