Implementation:NVIDIA DALI C API Legacy
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, C_API |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
The legacy DALI C API (v1) provides a C-callable interface for creating, configuring, running, and managing NVIDIA DALI data processing pipelines from non-Python environments.
⚠️ DEPRECATION WARNING: Several functions in this API are deprecated. See Heuristic:NVIDIA_DALI_Warning_Deprecated_C_API_V1_Functions for migration guidance. For new integrations, prefer C API v2 (dali/dali.h).
Description
This file implements the original (v1) C API for the NVIDIA DALI (Data Loading Library) framework. It provides a comprehensive set of C functions that wrap the C++ DALI Pipeline class, enabling pipeline lifecycle management (creation, serialization, deserialization, deletion), external input feeding (both contiguous and per-tensor allocations), output retrieval and copying, operator metadata and trace inspection, checkpointing and restoration, memory management, and plugin loading.
The implementation centers around the DALIPipeline struct, which aggregates a dali::Pipeline instance along with auxiliary objects such as batch size maps, data ID maps, a workspace, and a CUDA copy stream. The C API functions operate through opaque daliPipelineHandle pointers that reference these aggregated objects. Template helper functions are used internally to dispatch operations to the appropriate CPU or GPU backend.
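The opaque-handle arrangement described above can be illustrated with a minimal, self-contained sketch. Every name here (SketchPipeline, SketchAggregate, sketch_handle, and the sketch_* functions) is hypothetical and exists only for illustration; the real DALIPipeline struct holds C++ objects and is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the wrapped C++ pipeline object; not a DALI type. */
typedef struct SketchPipeline {
  int max_batch_size;
  int runs;
} SketchPipeline;

/* Hypothetical aggregate: the pipeline plus auxiliary bookkeeping. */
typedef struct SketchAggregate {
  SketchPipeline *pipeline;
  int last_batch_size; /* stand-in for per-input bookkeeping such as batch size maps */
} SketchAggregate;

/* Callers only hold a pointer to the aggregate and never inspect its fields. */
typedef SketchAggregate *sketch_handle;

/* Creates the aggregate. Returns 0 on success, -1 on bad arguments or allocation failure. */
int sketch_create(sketch_handle *out, int max_batch_size) {
  if (out == NULL || max_batch_size <= 0) return -1;
  SketchAggregate *agg = calloc(1, sizeof *agg);
  if (agg == NULL) return -1;
  agg->pipeline = calloc(1, sizeof *agg->pipeline);
  if (agg->pipeline == NULL) {
    free(agg);
    return -1;
  }
  agg->pipeline->max_batch_size = max_batch_size;
  *out = agg;
  return 0;
}

/* Every operation goes through the handle, mirroring the C API calling style. */
void sketch_run(sketch_handle *h) { (*h)->pipeline->runs++; }

int sketch_run_count(sketch_handle *h) { return (*h)->pipeline->runs; }

/* Releases everything the aggregate owns and clears the caller's handle. */
void sketch_delete(sketch_handle *h) {
  if (h == NULL || *h == NULL) return;
  free((*h)->pipeline);
  free(*h);
  *h = NULL;
}
```

Keeping the layout private to the implementation lets the library change what the aggregate contains without breaking callers, which is the usual motivation for this pattern.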
The API offers three pipeline creation variants with increasing configuration control: daliCreatePipeline takes a single separated_execution flag, daliCreatePipeline2 adds pipelined_execution and async_execution flags, and daliCreatePipeline3 replaces the individual flags with a single dali_exec_flags_t argument. It also provides memory preallocation, checkpointing for fault-tolerant training, and deprecated compatibility shims for older function signatures.
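One way to picture the relationship between the separate integer flags and a combined flags argument is a helper that folds three booleans into one bitmask. This is a sketch under stated assumptions: the sketch_exec_flags_t enum, its bit values, and sketch_make_exec_flags are hypothetical, and the actual dali_exec_flags_t constants should be taken from dali/c_api.h.

```c
/* Hypothetical flag bits for illustration; the real values live in dali/c_api.h. */
typedef enum {
  SKETCH_EXEC_SIMPLE       = 0,
  SKETCH_EXEC_IS_PIPELINED = 1 << 0,
  SKETCH_EXEC_IS_ASYNC     = 1 << 1,
  SKETCH_EXEC_IS_SEPARATED = 1 << 2
} sketch_exec_flags_t;

/* Folds three boolean-style ints (nonzero means "on") into one bitmask. */
unsigned sketch_make_exec_flags(int pipelined, int async, int separated) {
  unsigned flags = SKETCH_EXEC_SIMPLE;
  if (pipelined) flags |= SKETCH_EXEC_IS_PIPELINED;
  if (async) flags |= SKETCH_EXEC_IS_ASYNC;
  if (separated) flags |= SKETCH_EXEC_IS_SEPARATED;
  return flags;
}
```

A single flags argument lets new executor options be added as extra bits without introducing yet another function signature.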
Usage
Use this API when integrating DALI pipelines into C or C++ applications, or when building language bindings (e.g., for TensorFlow, PyTorch, or Triton Inference Server plugins). Typical scenarios include: deserializing a Python-defined pipeline for inference serving, feeding external data into pipeline inputs from a custom data source, copying pipeline outputs to framework-specific tensor buffers, and implementing checkpointing in distributed training systems.
Code Reference
Source Location
- Repository: NVIDIA_DALI
- File: dali/c_api/c_api.cc
- Lines: 1-908
Signature
void daliInitialize();
void daliCreatePipeline(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
    int length, int max_batch_size, int num_threads, int device_id,
    int separated_execution, int prefetch_queue_depth,
    int cpu_prefetch_queue_depth, int gpu_prefetch_queue_depth,
    int enable_memory_stats);
void daliCreatePipeline2(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
    int length, int max_batch_size, int num_threads, int device_id,
    int pipelined_execution, int async_execution, int separated_execution,
    int prefetch_queue_depth, int cpu_prefetch_queue_depth,
    int gpu_prefetch_queue_depth, int enable_memory_stats);
void daliCreatePipeline3(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
    int length, int max_batch_size, int num_threads, int device_id,
    dali_exec_flags_t exec_flags, int prefetch_queue_depth,
    int cpu_prefetch_queue_depth, int gpu_prefetch_queue_depth,
    int enable_memory_stats);
void daliDeserializeDefault(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
    int length);
void daliRun(daliPipelineHandle_t pipe_handle);
void daliOutput(daliPipelineHandle_t pipe_handle);
void daliShareOutput(daliPipelineHandle_t pipe_handle);
void daliOutputRelease(daliPipelineHandle_t pipe_handle);
void daliOutputCopy(daliPipelineHandle_t pipe_handle, void *dst, int output_idx,
    device_type_t dst_type, cudaStream_t stream, unsigned int flags);
void daliDeletePipeline(daliPipelineHandle_t pipe_handle);
void daliSetExternalInput(daliPipelineHandle_t pipe_handle, const char *name, device_type_t device,
    const void *data_ptr, dali_data_type_t data_type, const int64_t *shapes,
    int sample_dim, const char *layout_str, unsigned int flags);
Import
#include "dali/c_api.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| serialized_pipeline | const char * | Yes | Protobuf-serialized DALI pipeline definition |
| length | int | Yes | Length in bytes of the serialized pipeline |
| max_batch_size | int | Yes | Maximum number of samples per batch |
| num_threads | int | Yes | Number of CPU threads for pipeline execution |
| device_id | int | Yes | CUDA device ID (CPU_ONLY_DEVICE_ID for CPU-only pipelines) |
| data_ptr | const void * | Yes | Pointer to external input data (CPU or GPU memory) |
| shapes | const int64_t * | Yes | Flat array of sample shapes |
| sample_dim | int | Yes | Number of dimensions per sample |
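To make the flat shapes layout concrete: sample i occupies entries i * sample_dim through (i + 1) * sample_dim - 1 of the array. The sketch below computes how many bytes a contiguous buffer must hold for a given batch. The helper names sketch_sample_volume and sketch_batch_bytes are hypothetical and are not part of the DALI API.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of elements in sample `i` of a flat shapes array. */
int64_t sketch_sample_volume(const int64_t *shapes, int sample_dim, int i) {
  int64_t volume = 1;
  for (int d = 0; d < sample_dim; ++d) {
    volume *= shapes[(size_t)i * (size_t)sample_dim + (size_t)d];
  }
  return volume;
}

/* Total bytes needed to store all samples back to back in one buffer. */
size_t sketch_batch_bytes(const int64_t *shapes, int num_samples, int sample_dim,
                          size_t elem_size) {
  size_t total = 0;
  for (int i = 0; i < num_samples; ++i) {
    total += (size_t)sketch_sample_volume(shapes, sample_dim, i) * elem_size;
  }
  return total;
}
```

For two 8-bit HWC images of 480x640x3 and 600x800x3, the per-sample volumes are 921600 and 1440000 elements, so the contiguous buffer needs 2361600 bytes.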
Outputs
| Name | Type | Description |
|---|---|---|
| pipe_handle | daliPipelineHandle * | Opaque handle to the created pipeline |
| dst | void * | Destination buffer for output data copy |
| shape | int64_t * | 0-terminated shape array (caller must free with daliFree) |
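A 0-terminated shape array carries no explicit length, so a consumer walks it until it reaches the terminating 0. The following sketch shows that traversal; sketch_shape_ndim and sketch_shape_volume are hypothetical helper names, not functions provided by DALI.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the extents that precede the terminating 0. */
size_t sketch_shape_ndim(const int64_t *shape) {
  size_t ndim = 0;
  while (shape[ndim] != 0) ++ndim;
  return ndim;
}

/* Multiplies all extents that precede the terminating 0. */
int64_t sketch_shape_volume(const int64_t *shape) {
  int64_t volume = 1;
  for (size_t i = 0; shape[i] != 0; ++i) volume *= shape[i];
  return volume;
}
```

One consequence of this convention is that a genuinely zero-sized extent cannot be told apart from the terminator, so the traversal is only meaningful for shapes whose extents are all nonzero.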
Usage Examples
Create and Run a Pipeline
#include <stdlib.h>
#include "dali/c_api.h"
// serialized_data, serialized_len, batch_size, num_threads, device_id and
// prefetch_depth are placeholders supplied by the caller.
// Initialize DALI
daliInitialize();
// Deserialize and create a pipeline from protobuf
daliPipelineHandle handle;
daliCreatePipeline(&handle, serialized_data, serialized_len,
                   batch_size, num_threads, device_id,
                   /*separated_execution=*/0, prefetch_depth,
                   /*cpu_prefetch_queue_depth=*/prefetch_depth,
                   /*gpu_prefetch_queue_depth=*/prefetch_depth,
                   /*enable_memory_stats=*/0);
// Prefetch initial batches
daliPrefetch(&handle);
// Get outputs
daliOutput(&handle);
// Query output info
unsigned num_outputs = daliGetNumOutput(&handle);
size_t num_tensors = daliNumTensors(&handle, 0);
// Copy output 0 to a host buffer
size_t total_size = daliTensorSize(&handle, 0);
void *output_buf = malloc(total_size);
if (output_buf != NULL) {
    // Force synchronization so the copy has finished before the buffer is freed
    daliOutputCopy(&handle, output_buf, /*output_idx=*/0, CPU,
                   /*stream=*/0, DALI_ext_force_sync);
}
// Cleanup
free(output_buf);
daliDeletePipeline(&handle);
Feed External Input
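#include "dali/c_api.h"
// Set batch size for the input operator (two samples are fed below)
daliSetExternalInputBatchSize(&handle, "input_name", current_batch_size);
// Feed contiguous data: shapes is a flat array holding sample_dim extents per
// sample, here two HWC samples of 480x640x3 and 600x800x3
int64_t shapes[] = {480, 640, 3, 600, 800, 3};
// data_ptr must hold both samples back to back and should stay valid until
// the asynchronous copy on cuda_stream has completed
daliSetExternalInputAsync(&handle, "input_name", CPU,
                          data_ptr, DALI_UINT8, shapes,
                          /*sample_dim=*/3, "HWC",
                          cuda_stream, DALI_ext_default);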