Implementation:Triton inference server Server JetsonPeopleDetection

Knowledge Sources	Triton Inference Server, Triton In-Process C API
Domains	Edge_Inference, Computer_Vision
Last Updated	2026-02-13 17:00 GMT

Overview

An example application that demonstrates in-process Triton C API usage on NVIDIA Jetson for real-time people detection, with dynamic batching and concurrent inference.

Description

people_detection.cc is a complete working example that demonstrates the Triton C API (in-process server mode) on Jetson hardware. It reads video frames using OpenCV, preprocesses them on the GPU using CUDA kernels, runs PeopleNet inference through an in-process Triton server with dynamic batching and concurrent execution, and renders bounding box results on the output video. The example showcases GPU-accelerated preprocessing, asynchronous inference, and integration with the Triton shared library.
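
The asynchronous inference step can be pictured with a completion callback that fulfils a std::promise passed through the opaque userp pointer. The sketch below is only an illustration under stated assumptions: FakeResponse, OnResponseComplete, and RunOneRequest are hypothetical names, FakeResponse stands in for TRITONSERVER_InferenceResponse so that the code compiles without the Triton headers, and the worker thread stands in for the server thread that would normally invoke the callback.

```cpp
#include <cstdint>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for TRITONSERVER_InferenceResponse (assumption: used
// only so this sketch builds without libtritonserver).
struct FakeResponse {
  std::string model_name;
  int detections;
};

// Same parameter shape as InferResponseComplete: userp carries a pointer to
// the std::promise whose future the submitting thread is waiting on.
void OnResponseComplete(FakeResponse* response, const uint32_t flags, void* userp)
{
  (void)flags;  // unused in this sketch
  auto* promise = static_cast<std::promise<FakeResponse*>*>(userp);
  promise->set_value(response);
}

// Simulates one asynchronous request: a separate thread "completes" the
// request by invoking the callback while the caller blocks on the future.
int RunOneRequest(FakeResponse* response)
{
  std::promise<FakeResponse*> promise;
  std::future<FakeResponse*> done = promise.get_future();
  std::thread server_thread(
      [&]() { OnResponseComplete(response, 0, &promise); });
  FakeResponse* completed = done.get();  // blocks until the callback runs
  server_thread.join();
  return completed->detections;
}
```

Passing the promise through userp keeps the callback free of global state, so several requests can be in flight at once, each with its own promise.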

Usage

Use this as a reference application when building edge inference pipelines on Jetson platforms that call the Triton C API directly, without the HTTP/gRPC endpoints. It also demonstrates dynamic batching for throughput optimization on embedded devices.
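
As a conceptual aid, dynamic batching can be thought of as coalescing queued requests into batches up to a maximum size. The toy function below is not Triton's scheduler: in Triton the batcher runs inside the server, is enabled through the model configuration, and also takes queueing delay into account. CoalesceRequests and its parameters are hypothetical names chosen for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Groups queued request ids into batches of at most max_batch_size,
// preserving arrival order. Timing and queue-delay limits are ignored.
std::vector<std::vector<int>> CoalesceRequests(
    const std::vector<int>& queued, std::size_t max_batch_size)
{
  std::vector<std::vector<int>> batches;
  if (max_batch_size == 0) {
    return batches;  // nothing can be batched with a zero-sized limit
  }
  for (std::size_t start = 0; start < queued.size(); start += max_batch_size) {
    const std::size_t end = std::min(start + max_batch_size, queued.size());
    batches.emplace_back(queued.begin() + start, queued.begin() + end);
  }
  return batches;
}
```

With ten queued requests and a limit of four, this yields batches of four, four, and two requests, which is why a larger limit tends to raise throughput at the cost of per-request latency.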

Code Reference

Source Location

Repository: Triton Inference Server
File: docs/examples/jetson/concurrency_and_dynamic_batching/people_detection.cc
Lines: 1-1158

Signature

// Main detection pipeline
int main(int argc, char** argv);

// Custom response allocator for GPU memory
TRITONSERVER_Error* ResponseAlloc(...);
TRITONSERVER_Error* ResponseRelease(...);

// Inference completion callback
void InferResponseComplete(
    TRITONSERVER_InferenceResponse* response,
    const uint32_t flags, void* userp);

// CUDA preprocessing kernel (declaration)
void preprocess(
    const uint8_t* input, float* output,
    int batch_size, int height, int width,
    int channels, cudaStream_t stream);

Import

// Standalone example - link against libtritonserver.so
#include "triton/core/tritonserver.h"
#include <opencv2/opencv.hpp>
#include <cuda_runtime_api.h>

I/O Contract

Inputs

Name	Type	Required	Description
model_repository	directory	Yes	Path to Triton model repository with PeopleNet
input_video	file	Yes	Input video file for detection
concurrency	int	No	Number of concurrent inference threads
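
One way to read the concurrency input is as the number of worker threads that share the stream of frames. The sketch below shows that idea with a shared atomic counter. How the example actually distributes work is not described here, so this scheme is an assumption, ProcessFrames is a hypothetical name, and the "inference" is a placeholder computation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Processes frame indices [0, frame_count) with `concurrency` worker threads.
// Each worker claims the next unprocessed index from a shared atomic counter,
// so every frame is handled exactly once.
std::vector<int> ProcessFrames(int frame_count, int concurrency)
{
  if (frame_count <= 0) {
    return {};
  }
  if (concurrency < 1) {
    concurrency = 1;
  }
  std::vector<int> results(frame_count, -1);
  std::atomic<int> next{0};
  auto worker = [&]() {
    for (int i = next.fetch_add(1); i < frame_count; i = next.fetch_add(1)) {
      results[i] = 2 * i;  // placeholder for running inference on frame i
    }
  };
  std::vector<std::thread> threads;
  for (int t = 0; t < concurrency; ++t) {
    threads.emplace_back(worker);
  }
  for (auto& th : threads) {
    th.join();
  }
  return results;
}
```

Each worker writes only to the slots it claimed, so the results vector needs no lock; the atomic counter is the only shared mutable state.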

Outputs

Name	Type	Description
output_video	file	Video with bounding box overlays
stdout	text	FPS and detection statistics
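
The FPS figure reported on stdout is, in the simplest reading, processed frames divided by elapsed wall-clock time. The helper below sketches that calculation. The exact statistics the example prints are not specified here, and FramesPerSecond is a hypothetical name.

```cpp
#include <chrono>

// Average throughput over a run: frames processed per second of wall time.
// Returns 0 when no time has elapsed, to avoid dividing by zero.
double FramesPerSecond(long frames, std::chrono::duration<double> elapsed)
{
  const double seconds = elapsed.count();
  return seconds > 0.0 ? static_cast<double>(frames) / seconds : 0.0;
}
```

In practice the elapsed time would come from two std::chrono::steady_clock::now() readings taken around the processing loop, since a steady clock is not affected by system time adjustments.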

Usage Examples

Run People Detection on Jetson

# Build the example
mkdir build && cd build
cmake .. -DTRITON_ENABLE_GPU=ON
make people_detection

# Run detection
./people_detection \
  --model-repository=/models \
  --input-video=traffic.mp4 \
  --concurrency=4

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
