
Implementation:Triton inference server Server JetsonPeopleDetection

Revision as of 13:58, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Triton_inference_server_Server_JetsonPeopleDetection.md)
Knowledge Sources
Domains Edge_Inference, Computer_Vision
Last Updated 2026-02-13 17:00 GMT

Overview

Example application demonstrating in-process Triton C API usage on NVIDIA Jetson for real-time people detection with dynamic batching and concurrent inference.

Description

people_detection.cc is a complete working example that demonstrates the Triton C API (in-process server mode) on Jetson hardware. It reads video frames using OpenCV, preprocesses them on the GPU using CUDA kernels, runs PeopleNet inference through an in-process Triton server with dynamic batching and concurrent execution, and renders bounding box results on the output video. The example showcases GPU-accelerated preprocessing, asynchronous inference, and integration with the Triton shared library.
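
The asynchronous inference flow described above can be sketched without Triton headers. This is a minimal illustration, not the code in people_detection.cc: `FakeResponse`, `OnResponseComplete`, `SubmitAsync`, and `RunConcurrent` are hypothetical stand-ins. It only shows the common pattern in which the opaque user pointer passed to a completion callback carries a `std::promise` that the submitting thread waits on.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for an inference response; a real application would
// receive a TRITONSERVER_InferenceResponse* instead.
struct FakeResponse {
  int request_id;
};

// Completion callback with the same shape as InferResponseComplete: the
// opaque user pointer carries a std::promise owned by the submitter.
void OnResponseComplete(FakeResponse* response, const uint32_t /*flags*/,
                        void* userp) {
  assert(response != nullptr && userp != nullptr);
  auto* done = static_cast<std::promise<FakeResponse*>*>(userp);
  done->set_value(response);
}

// Hypothetical stand-in for an asynchronous submit call: it returns
// immediately and invokes the callback later from a worker thread.
std::thread SubmitAsync(int request_id, void* userp) {
  return std::thread([request_id, userp]() {
    auto* response = new FakeResponse{request_id};
    OnResponseComplete(response, 0, userp);
  });
}

// Issues `concurrency` requests at once, then blocks until every callback
// has fired. Returns the request ids in submission order.
std::vector<int> RunConcurrent(int concurrency) {
  assert(concurrency > 0);
  std::vector<std::promise<FakeResponse*>> promises(concurrency);
  std::vector<std::thread> workers;
  for (int i = 0; i < concurrency; ++i) {
    workers.push_back(SubmitAsync(i, &promises[i]));
  }
  std::vector<int> ids;
  for (int i = 0; i < concurrency; ++i) {
    FakeResponse* response = promises[i].get_future().get();
    ids.push_back(response->request_id);
    delete response;  // the caller owns the response after completion
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return ids;
}
```

Calling `RunConcurrent(4)` keeps four requests in flight at once. In this sketch, several outstanding requests are what would give a server-side batcher something to combine.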

Usage

Use this as a reference application when building edge inference pipelines on Jetson platforms that call the Triton C API directly, without going through HTTP/gRPC. It also demonstrates dynamic batching as a way to improve throughput on embedded devices.
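
To make the dynamic batching idea concrete, the sketch below groups queued requests into batches of at most a given size. It is an illustration only: `FormBatches` is a hypothetical helper, not part of Triton. In Triton the grouping happens inside the server and is enabled through the model configuration, where settings such as `preferred_batch_size` and `max_queue_delay_microseconds` also influence when a batch is dispatched. This sketch ignores timing entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups queued request ids into batches of at most max_batch_size,
// preserving arrival order. Hypothetical helper for illustration only.
std::vector<std::vector<int>> FormBatches(const std::vector<int>& pending,
                                          std::size_t max_batch_size) {
  assert(max_batch_size > 0);
  std::vector<std::vector<int>> batches;
  for (int id : pending) {
    // Start a new batch when there is none yet or the current one is full.
    if (batches.empty() || batches.back().size() == max_batch_size) {
      batches.emplace_back();
    }
    batches.back().push_back(id);
  }
  return batches;
}
```

With seven pending requests and a maximum batch size of four, the helper yields one batch of four followed by one batch of three, so the model would execute twice instead of seven times.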

Code Reference

Source Location

Signature

// Main detection pipeline
int main(int argc, char** argv);

// Custom response allocator for GPU memory
TRITONSERVER_Error* ResponseAlloc(...);
TRITONSERVER_Error* ResponseRelease(...);

// Inference completion callback
void InferResponseComplete(
    TRITONSERVER_InferenceResponse* response,
    const uint32_t flags, void* userp);

// CUDA preprocessing kernel (declaration)
void preprocess(
    const uint8_t* input, float* output,
    int batch_size, int height, int width,
    int channels, cudaStream_t stream);
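
The page does not show the body of the preprocessing kernel. As a rough CPU reference for what a kernel with this signature might compute, the sketch below assumes interleaved 8-bit NHWC input, planar NCHW float output, and scaling to the range [0, 1]. All three are assumptions, and `PreprocessReference` is a hypothetical name. The real kernel may use a different layout, channel order, or normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU reference for a layout and scale conversion (assumed behavior):
// interleaved uint8 NHWC in, planar float NCHW out, values divided by 255.
void PreprocessReference(const uint8_t* input, float* output, int batch_size,
                         int height, int width, int channels) {
  assert(input != nullptr && output != nullptr);
  assert(batch_size > 0 && height > 0 && width > 0 && channels > 0);
  const std::size_t plane = static_cast<std::size_t>(height) * width;
  for (int n = 0; n < batch_size; ++n) {
    const uint8_t* src = input + n * plane * channels;
    float* dst = output + n * plane * channels;
    for (std::size_t p = 0; p < plane; ++p) {
      for (int c = 0; c < channels; ++c) {
        // Pixel p, channel c moves from interleaved to planar position.
        dst[c * plane + p] = src[p * channels + c] / 255.0f;
      }
    }
  }
}
```

A reference like this can help when validating a GPU kernel: run both on the same frame and compare the outputs element by element.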

Import

// Standalone example - link against libtritonserver.so
#include <tritonserver.h>
#include <opencv2/opencv.hpp>
#include <cuda_runtime_api.h>

I/O Contract

Inputs

Name | Type | Required | Description
model_repository | directory | Yes | Path to Triton model repository with PeopleNet
input_video | file | Yes | Input video file for detection
concurrency | int | No | Number of concurrent inference threads

Outputs

Name | Type | Description
output_video | file | Video with bounding box overlays
stdout | text | FPS and detection statistics
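
Before boxes can be drawn on the output video, detections usually need to be mapped from the network input resolution back to the frame resolution. The sketch below shows one way to do that. It assumes the boxes are reported in network-input pixel coordinates and that the frame was resized without letterboxing. The page confirms neither, and `Box` and `ScaleToFrame` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned box given by its top-left and bottom-right corners, in pixels.
struct Box {
  float x1, y1, x2, y2;
};

// Rescales a box from network-input coordinates to frame coordinates and
// clamps it to the frame so that drawing never goes out of bounds.
Box ScaleToFrame(const Box& b, int net_w, int net_h, int frame_w,
                 int frame_h) {
  assert(net_w > 0 && net_h > 0 && frame_w > 0 && frame_h > 0);
  const float sx = static_cast<float>(frame_w) / net_w;
  const float sy = static_cast<float>(frame_h) / net_h;
  const float max_x = static_cast<float>(frame_w - 1);
  const float max_y = static_cast<float>(frame_h - 1);
  Box out;
  out.x1 = std::min(std::max(b.x1 * sx, 0.0f), max_x);
  out.y1 = std::min(std::max(b.y1 * sy, 0.0f), max_y);
  out.x2 = std::min(std::max(b.x2 * sx, 0.0f), max_x);
  out.y2 = std::min(std::max(b.y2 * sy, 0.0f), max_y);
  return out;
}
```

The scaled corners can then be passed to an OpenCV drawing call such as `cv::rectangle` on the output frame.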

Usage Examples

Run People Detection on Jetson

# Build the example
mkdir build && cd build
cmake .. -DTRITON_ENABLE_GPU=ON
make people_detection

# Run detection
./people_detection \
  --model-repository=/models \
  --input-video=traffic.mp4 \
  --concurrency=4
