Implementation:Triton inference server Server Tracer

From Leeroopedia
Knowledge Sources
Domains: Observability, Tracing
Last Updated: 2026-02-13 17:00 GMT

Overview

Concrete tool for inference request tracing with support for both Triton-native file-based tracing and OpenTelemetry distributed tracing backends.

Description

The TraceManager class provides the complete tracing infrastructure for Triton. It supports two trace modes: Triton-native (writing JSON trace events to files) and OpenTelemetry (using OTLP HTTP exporters with batch span processors). The manager handles per-model and global trace configurations with runtime updates, trace sampling via rate/count controls, and OpenTelemetry context propagation across request spans using an AbstractCarrier interface. Thread safety is achieved through reader-writer mutex patterns.
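The per-model and global settings with runtime updates, guarded by a reader-writer mutex, could look roughly like the sketch below. It is not the TraceManager source: SketchSetting, SettingRegistry, Lookup, and Update are hypothetical names, and the rule that an empty model name updates the global setting is an assumption made for illustration. It requires C++17 for std::shared_mutex.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical, reduced setting; the real TraceSetting carries more fields.
struct SketchSetting {
  uint32_t rate;
  int32_t count;
};

// Hypothetical registry: frequent lookups on the request path, rare updates
// from the runtime trace API.
class SettingRegistry {
 public:
  explicit SettingRegistry(const SketchSetting& global) : global_(global) {}

  // Shared (reader) lock: concurrent lookups do not block each other.
  // Falls back to the global setting when the model has no override.
  SketchSetting Lookup(const std::string& model_name) const {
    std::shared_lock<std::shared_mutex> lock(mu_);
    auto it = per_model_.find(model_name);
    return it != per_model_.end() ? it->second : global_;
  }

  // Exclusive (writer) lock: an update briefly blocks all readers.
  // Assumption: an empty model name targets the global setting.
  void Update(const std::string& model_name, const SketchSetting& setting) {
    std::unique_lock<std::shared_mutex> lock(mu_);
    if (model_name.empty()) {
      global_ = setting;
    } else {
      per_model_[model_name] = setting;
    }
  }

 private:
  mutable std::shared_mutex mu_;
  SketchSetting global_;
  std::map<std::string, SketchSetting> per_model_;
};
```

Returning the setting by value keeps the lock scope short: callers never hold a reference into the map after the lock is released.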

Usage

Activated by Triton's command-line trace options (--trace-config, or the older --trace-file and related flags) or through the runtime trace API. Used when operators need performance analysis, latency profiling, or distributed tracing integration.

Code Reference

Source Location

Signature

class TraceManager {
 public:
  // Trace modes
  enum TraceMode { TRITON, OPENTELEMETRY };

  struct TraceSetting {
    TraceMode mode_;
    std::string trace_file_;
    TRITONSERVER_InferenceTraceLevel level_;
    uint32_t rate_;
    int32_t count_;
    int32_t log_frequency_;
  };

  TraceManager(const TraceSetting& global_setting);
  ~TraceManager();

  // Trace lifecycle
  TRITONSERVER_InferenceTrace* SampleTrace(const std::string& model_name);
  void UpdateTraceSetting(
      const std::string& model_name, const TraceSetting& setting);

  // OpenTelemetry context propagation
  class Trace {
   public:
    void StartSpan(const std::string& name);
    void EndSpan();
  };

  class AbstractCarrier {
   public:
    virtual std::string Get(const std::string& key) = 0;
    virtual void Set(const std::string& key, const std::string& value) = 0;
  };

  static void TraceActivity(
      TRITONSERVER_InferenceTrace* trace,
      TRITONSERVER_InferenceTraceActivity activity,
      uint64_t timestamp_ns, void* userp);
};
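To illustrate how the Get/Set carrier interface in the signature above can support context propagation, here is a minimal sketch of a carrier backed by a header map. The interface is redeclared at top level so the example compiles on its own. MapCarrier and ForwardTraceParent are hypothetical names, and the use of the W3C Trace Context "traceparent" header is an assumption about what gets propagated.

```cpp
#include <map>
#include <string>

// Same shape as the AbstractCarrier in the signature above, redeclared
// here so the sketch is standalone. The virtual destructor is added for
// safe polymorphic use.
class AbstractCarrier {
 public:
  virtual ~AbstractCarrier() = default;
  virtual std::string Get(const std::string& key) = 0;
  virtual void Set(const std::string& key, const std::string& value) = 0;
};

// Hypothetical carrier that stores headers in a plain map.
class MapCarrier : public AbstractCarrier {
 public:
  // Returns the header value, or an empty string when the key is absent.
  std::string Get(const std::string& key) override {
    auto it = headers_.find(key);
    return it != headers_.end() ? it->second : std::string();
  }

  void Set(const std::string& key, const std::string& value) override {
    headers_[key] = value;
  }

 private:
  std::map<std::string, std::string> headers_;
};

// Hypothetical helper: copies the trace context header from an incoming
// request carrier to an outgoing one. Returns false if none was present.
bool ForwardTraceParent(AbstractCarrier& incoming, AbstractCarrier& outgoing) {
  const std::string value = incoming.Get("traceparent");
  if (value.empty()) {
    return false;
  }
  outgoing.Set("traceparent", value);
  return true;
}
```

Keeping the carrier abstract means the same propagation logic can run over HTTP headers, gRPC metadata, or any other key-value transport.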

Import

#include "tracer.h"

I/O Contract

Inputs

Name            Type          Required  Description
global_setting  TraceSetting  Yes       Global trace configuration
model_name      string        No        Model name; selects a model-specific trace setting that overrides the global one
rate            uint32_t      No        Sample every N-th request (0 = all)
count           int32_t       No        Max number of traces to collect (-1 = unlimited)
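The rate and count inputs combine into a simple sampling decision. The sketch below follows the semantics exactly as stated in the table above (rate 0 traces every request, count -1 is unlimited). It is a hypothetical illustration, not the TraceManager implementation, and the server's actual behavior may differ. RateCountSampler and ShouldSample are made-up names.

```cpp
#include <cstdint>

// Hypothetical sampler. Not thread-safe; a real server would guard the
// counters with a mutex or use atomics.
class RateCountSampler {
 public:
  RateCountSampler(uint32_t rate, int32_t count)
      : rate_(rate), remaining_(count) {}

  // Returns true when the current request should be traced.
  bool ShouldSample() {
    ++seen_;
    if (remaining_ == 0) {
      return false;  // trace budget exhausted
    }
    if (rate_ != 0 && (seen_ % rate_) != 0) {
      return false;  // not an N-th request
    }
    if (remaining_ > 0) {
      --remaining_;  // -1 (unlimited) is never decremented
    }
    return true;
  }

 private:
  uint32_t rate_;      // sample every rate_-th request; 0 = every request
  int32_t remaining_;  // traces left to collect; -1 = unlimited
  uint64_t seen_ = 0;  // total requests observed so far
};
```

With rate 3 and count 2, for example, this sampler traces the 3rd and 6th requests and nothing after that.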

Outputs

Name        Type       Description
trace_file  JSON file  Triton-native trace events (timestamps, activities)
OTLP spans  HTTP       OpenTelemetry spans exported via OTLP
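Once timestamp events have been read from a Triton-native trace file, latency analysis comes down to subtracting nanosecond timestamps. The sketch below assumes the events are already parsed into memory; JSON parsing is omitted. TimestampEvent and ElapsedNs are hypothetical names, and the event names are assumed to follow the trace activity names such as REQUEST_START and REQUEST_END.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one timestamp entry from a trace file.
struct TimestampEvent {
  std::string name;  // activity name, e.g. "COMPUTE_START"
  uint64_t ns;       // timestamp in nanoseconds
};

// Returns the elapsed nanoseconds between the first `start` event and the
// last `end` event, or nullopt if either is missing or out of order.
std::optional<uint64_t> ElapsedNs(
    const std::vector<TimestampEvent>& events, const std::string& start,
    const std::string& end) {
  std::optional<uint64_t> start_ns;
  std::optional<uint64_t> end_ns;
  for (const auto& event : events) {
    if (event.name == start && !start_ns) {
      start_ns = event.ns;
    }
    if (event.name == end) {
      end_ns = event.ns;
    }
  }
  if (!start_ns || !end_ns || *end_ns < *start_ns) {
    return std::nullopt;
  }
  return *end_ns - *start_ns;
}
```

Calling ElapsedNs(events, "COMPUTE_START", "COMPUTE_END") on one request's events would give the compute time for that request, while the REQUEST_START to REQUEST_END pair gives the end-to-end time inside the server.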

Usage Examples

Enable Triton-Native Tracing

tritonserver \
  --model-repository=/models \
  --trace-config triton,file=/tmp/trace.json \
  --trace-config rate=100 \
  --trace-config level=TIMESTAMPS

Enable OpenTelemetry Tracing

tritonserver \
  --model-repository=/models \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://localhost:4318/v1/traces \
  --trace-config opentelemetry,resource=service.name=triton-server \
  --trace-config level=TIMESTAMPS
