Heuristic:SeldonIO Seldon core Tracing Latency Tip

From Leeroopedia
Knowledge Sources
Domains: Optimization, Observability
Last Updated: 2026-02-13 14:00 GMT

Overview

Latency reduction tip: in production, set the OpenTelemetry tracing ratio to 0 or lower it to avoid unnecessary tracing overhead on every inference request.

Description

Seldon Core 2 includes OpenTelemetry tracing integration for distributed observability. However, the tracing middleware appears to be applied to the request path even when tracing is disabled via configuration. This is a known issue, noted in a TODO comment in the pipeline HTTP server code. Separately, the scheduler logs a warning that performance will be affected whenever the profiling settings (block rate, mutex rate) are set above 0.

Usage

Use this heuristic when optimizing inference latency in production deployments. If tracing is not actively needed, set the trace ratio to 0 to minimize overhead. Never enable block or mutex profiling in production.

The Insight (Rule of Thumb)

  • Action 1: Set tracing `ratio` to `0` in Helm values if tracing is not needed.
  • Action 2: Never set `blockRate > 0` or `mutexRate > 0` in production.
  • Action 3: If tracing is needed, use a low ratio (e.g., `0.01` for 1% sampling) instead of `1.0` (100%).
  • Value: Default trace ratio is `1` (100% sampling). Set to `0` for no tracing.
  • Trade-off: Disabling tracing loses distributed request visibility but reduces per-request latency.

Reasoning

Known issue from `scheduler/pkg/kafka/pipeline/httpserver.go:113`:

// TODO we seem to always enforce tracing middleware even if tracing is
// not enabled via configmap?? needless latency

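The comment suggests that the middleware wraps every request even when tracing is disabled in the ConfigMap. Setting the ratio to `0` therefore stops requests from being sampled, but the wrapper itself likely still adds some overhead to each request until this TODO is resolved.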
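One way the TODO could be addressed is to attach the middleware only when tracing is enabled. The sketch below illustrates that idea using only the Go standard library. It is not the Seldon Core implementation: `tracingMiddleware`, `wrapIfTracing`, `serveOnce`, `baseHandler`, and the request path are hypothetical names chosen for illustration, and the stand-in middleware merely counts invocations instead of creating spans.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// middlewareCalls counts how often the stand-in tracing middleware runs.
var middlewareCalls int64

// baseHandler is a hypothetical inference handler that always answers 200.
var baseHandler http.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// tracingMiddleware is a hypothetical stand-in for a real tracing wrapper.
// It only counts invocations; a real one would start and end a span.
func tracingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&middlewareCalls, 1)
		next.ServeHTTP(w, r)
	})
}

// wrapIfTracing attaches the middleware only when tracing is enabled, so a
// deployment with tracing turned off serves requests from the bare handler.
func wrapIfTracing(enabled bool, h http.Handler) http.Handler {
	if !enabled {
		return h
	}
	return tracingMiddleware(h)
}

// middlewareCallCount reports how many requests passed through the middleware.
func middlewareCallCount() int64 {
	return atomic.LoadInt64(&middlewareCalls)
}

// serveOnce sends one in-memory request through h and returns the status code.
// The path is illustrative only.
func serveOnce(h http.Handler) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v2/models/demo/infer", nil)
	h.ServeHTTP(rec, req)
	return rec.Code
}
```

With `wrapIfTracing(false, baseHandler)` the request never enters the middleware, so the counter stays at zero; with `true` each request increments it once. The decision is made once at handler construction time rather than on every request.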

Profiling performance warnings from `scheduler/cmd/scheduler/main.go:456-459`:

if blockRate > 0 {
    log.Warn("Block rate > 0 - performance will be affected")
}
if mutexRate > 0 {
    log.Warn("Mutex rate > 0 - performance will be affected")
}
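For context, Go exposes block and mutex profiling through `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`. The following is a minimal sketch of how such rates might be applied together with the warnings above, assuming the configured values are passed straight to the runtime; `applyProfilingRates` is a hypothetical helper, not a function from the Seldon Core code base.

```go
package main

import "runtime"

// applyProfilingRates is a hypothetical helper that forwards profiling rates
// to the Go runtime and returns a warning for each setting that is enabled.
// A value of 0 (or less) leaves the corresponding profiler switched off.
func applyProfilingRates(blockRate, mutexRate int) []string {
	var warnings []string

	// Normalize negatives to 0 so that both profilers are explicitly disabled.
	if blockRate < 0 {
		blockRate = 0
	}
	if mutexRate < 0 {
		mutexRate = 0
	}

	if blockRate > 0 {
		warnings = append(warnings, "Block rate > 0 - performance will be affected")
	}
	if mutexRate > 0 {
		warnings = append(warnings, "Mutex rate > 0 - performance will be affected")
	}

	// Both calls are part of the Go standard library.
	runtime.SetBlockProfileRate(blockRate)
	runtime.SetMutexProfileFraction(mutexRate)
	return warnings
}
```

Calling `applyProfilingRates(0, 0)` returns no warnings and keeps both profilers off, which is the setting this heuristic recommends for production.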

Configuration from `docs-gb/getting-started/configuration.md`:

  • Tracing ratio: 0-1 inclusive, fraction of requests to trace
  • Default OpenTelemetry endpoint: `seldon-collector.seldon-mesh:4317`
  • Protocol: gRPC
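To make the meaning of the ratio concrete, here is a small standard-library sketch of ratio-based sampling, similar in spirit to the trace-ID ratio samplers found in OpenTelemetry SDKs. It is an illustration under assumptions, not the code Seldon Core runs: `validateRatio` and `shouldSample` are hypothetical names, and the trace ID is simplified to a single 64-bit value.

```go
package main

import (
	"fmt"
	"math"
)

// validateRatio checks that the ratio lies in the documented range 0-1 inclusive.
func validateRatio(ratio float64) error {
	if math.IsNaN(ratio) || ratio < 0 || ratio > 1 {
		return fmt.Errorf("tracing ratio %v is outside [0, 1]", ratio)
	}
	return nil
}

// shouldSample decides deterministically whether a request is traced.
// traceID stands in for 64 random bits of a trace ID (a simplification).
// A ratio of 0 never samples, a ratio of 1 always samples, and values in
// between sample roughly that fraction of uniformly distributed IDs.
func shouldSample(traceID uint64, ratio float64) bool {
	if ratio <= 0 {
		return false
	}
	if ratio >= 1 {
		return true
	}
	// Compare the top 63 bits against a bound so the product cannot overflow.
	bound := uint64(ratio * (1 << 63))
	return traceID>>1 < bound
}
```

Because the decision depends only on the trace ID, every component that sees the same ID reaches the same verdict. With a ratio of `0.01`, about one in a hundred uniformly distributed IDs falls below the bound.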
