Implementation:NVIDIA DALI C API Legacy


Knowledge Sources
Domains: Data_Pipeline, C_API
Last Updated: 2026-02-08 16:00 GMT

Overview

The legacy DALI C API (v1) provides a C-callable interface for creating, configuring, running, and managing NVIDIA DALI data processing pipelines from non-Python environments.

⚠️ DEPRECATION WARNING: Several functions in this API are deprecated. See Heuristic:NVIDIA_DALI_Warning_Deprecated_C_API_V1_Functions for migration guidance. For new integrations, prefer C API v2 (dali/dali.h).

Description

This file implements the original (v1) C API for the NVIDIA DALI (Data Loading Library) framework. It provides a comprehensive set of C functions that wrap the C++ DALI Pipeline class, enabling pipeline lifecycle management (creation, serialization, deserialization, deletion), external input feeding (both contiguous and per-tensor allocations), output retrieval and copying, operator metadata and trace inspection, checkpointing and restoration, memory management, and plugin loading.

The implementation centers around the DALIPipeline struct, which aggregates a dali::Pipeline instance along with auxiliary objects such as batch size maps, data ID maps, a workspace, and a CUDA copy stream. The C API functions operate through opaque daliPipelineHandle pointers that reference these aggregated objects. Template helper functions are used internally to dispatch operations to the appropriate CPU or GPU backend.
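The opaque-handle pattern described above can be sketched in plain C. This is a toy illustration only: ToyPipeline, toyPipelineHandle, and the toy_* functions are hypothetical names, not part of DALI, and the real DALIPipeline struct aggregates far more state.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the aggregated state behind a handle.
   In a real header only the forward declaration would be visible to callers. */
struct ToyPipeline {
  int max_batch_size;  /* configuration captured at creation */
  int iterations_run;  /* mutable state updated by toy_run */
};

/* The handle is just a pointer to the (normally incomplete) struct type. */
typedef struct ToyPipeline *toyPipelineHandle;

/* Allocates the aggregate and stores it in *out. Returns 0 on success, -1 on error. */
int toy_create(toyPipelineHandle *out, int max_batch_size) {
  if (out == NULL || max_batch_size <= 0) return -1;
  struct ToyPipeline *p = malloc(sizeof *p);
  if (p == NULL) return -1;
  p->max_batch_size = max_batch_size;
  p->iterations_run = 0;
  *out = p;
  return 0;
}

/* Functions receive a pointer to the handle and dereference it internally. */
void toy_run(toyPipelineHandle *h) { (*h)->iterations_run++; }

int toy_iterations(const toyPipelineHandle *h) { return (*h)->iterations_run; }

/* Frees the aggregate and clears the caller's handle to avoid dangling use. */
void toy_delete(toyPipelineHandle *h) {
  if (h == NULL) return;
  free(*h);
  *h = NULL;
}
```

Because callers never touch the struct fields directly, the library can change the aggregated members without breaking binary compatibility for C clients.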

The API offers three pipeline creation variants with increasing configuration control: daliCreatePipeline exposes a separated_execution switch, daliCreatePipeline2 adds pipelined_execution and async_execution switches, and daliCreatePipeline3 takes a single dali_exec_flags_t executor flag value. It also provides memory preallocation, checkpointing for fault-tolerant training, and deprecated compatibility shims for older function signatures.
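As a rough illustration of how separate boolean switches can collapse into one flag word, the sketch below packs three int switches into a bitmask. The TOY_EXEC_* names and their bit values are assumptions made for this example; the actual dali_exec_flags_t enumerators and values are defined in dali/c_api.h and may differ.

```c
/* Hypothetical flag bits, chosen for illustration only. */
typedef enum {
  TOY_EXEC_PIPELINED = 1 << 0,
  TOY_EXEC_ASYNC     = 1 << 1,
  TOY_EXEC_SEPARATED = 1 << 2
} toy_exec_flags_t;

/* Packs three C-style booleans (any nonzero value means "on") into one flag word. */
unsigned toy_exec_flags_from_bools(int pipelined, int async_exec, int separated) {
  unsigned flags = 0;
  if (pipelined)  flags |= TOY_EXEC_PIPELINED;
  if (async_exec) flags |= TOY_EXEC_ASYNC;
  if (separated)  flags |= TOY_EXEC_SEPARATED;
  return flags;
}

/* Returns 1 if the given bit is set in flags, 0 otherwise. */
int toy_exec_has(unsigned flags, toy_exec_flags_t bit) {
  return (flags & (unsigned)bit) != 0;
}
```

A single flag word leaves room for new executor options without adding another positional int parameter to the creation function.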

Usage

Use this API when integrating DALI pipelines into C or C++ applications, or when building language bindings (e.g., for TensorFlow, PyTorch, or Triton Inference Server plugins). Typical scenarios include: deserializing a Python-defined pipeline for inference serving, feeding external data into pipeline inputs from a custom data source, copying pipeline outputs to framework-specific tensor buffers, and implementing checkpointing in distributed training systems.
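For the deserialization scenario, the serialized pipeline bytes first have to be loaded into memory. The helper below is a minimal sketch using only the C standard library; read_serialized_pipeline is a hypothetical name, and it assumes the pipeline was saved to a regular file by the Python side.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at `path` into a malloc'd buffer.
   On success returns the buffer and stores its size in *out_len.
   On failure returns NULL. The caller frees the buffer with free(). */
char *read_serialized_pipeline(const char *path, size_t *out_len) {
  FILE *f = fopen(path, "rb");  /* binary mode: the payload may contain NUL bytes */
  if (f == NULL) return NULL;
  char *buf = NULL;
  if (fseek(f, 0, SEEK_END) == 0) {
    long size = ftell(f);
    if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
      buf = malloc((size_t)size + 1);  /* +1 so an empty file still allocates */
      if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
      }
      if (buf != NULL) *out_len = (size_t)size;
    }
  }
  fclose(f);
  return buf;
}
```

The creation functions take the length as an int, so a caller would check that the returned size fits in an int before passing the buffer on.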

Code Reference

Source Location

Signature

void daliInitialize();
void daliCreatePipeline(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
                        int length, int max_batch_size, int num_threads, int device_id,
                        int separated_execution, int prefetch_queue_depth,
                        int cpu_prefetch_queue_depth, int gpu_prefetch_queue_depth,
                        int enable_memory_stats);
void daliCreatePipeline2(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
                         int length, int max_batch_size, int num_threads, int device_id,
                         int pipelined_execution, int async_execution, int separated_execution,
                         int prefetch_queue_depth, int cpu_prefetch_queue_depth,
                         int gpu_prefetch_queue_depth, int enable_memory_stats);
void daliCreatePipeline3(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
                         int length, int max_batch_size, int num_threads, int device_id,
                         dali_exec_flags_t exec_flags, int prefetch_queue_depth,
                         int cpu_prefetch_queue_depth, int gpu_prefetch_queue_depth,
                         int enable_memory_stats);
void daliDeserializeDefault(daliPipelineHandle *pipe_handle, const char *serialized_pipeline,
                            int length);
void daliRun(daliPipelineHandle_t pipe_handle);
void daliOutput(daliPipelineHandle_t pipe_handle);
void daliShareOutput(daliPipelineHandle_t pipe_handle);
void daliOutputRelease(daliPipelineHandle_t pipe_handle);
void daliOutputCopy(daliPipelineHandle_t pipe_handle, void *dst, int output_idx,
                    device_type_t dst_type, cudaStream_t stream, unsigned int flags);
void daliDeletePipeline(daliPipelineHandle_t pipe_handle);
void daliSetExternalInput(daliPipelineHandle_t pipe_handle, const char *name, device_type_t device,
                          const void *data_ptr, dali_data_type_t data_type, const int64_t *shapes,
                          int sample_dim, const char *layout_str, unsigned int flags);

Import

#include "dali/c_api.h"

I/O Contract

Inputs

Name | Type | Required | Description
serialized_pipeline | const char * | Yes | Protobuf-serialized DALI pipeline definition
length | int | Yes | Length in bytes of the serialized pipeline
max_batch_size | int | Yes | Maximum number of samples per batch
num_threads | int | Yes | Number of CPU threads for pipeline execution
device_id | int | Yes | CUDA device ID (-1 for CPU-only)
data_ptr | const void * | Yes | Pointer to external input data (CPU or GPU memory)
shapes | const int64_t * | Yes | Flat array of sample shapes
sample_dim | int | Yes | Number of dimensions per sample
Outputs

Name | Type | Description
pipe_handle | daliPipelineHandle * | Opaque handle to the created pipeline
dst | void * | Destination buffer for output data copy
shape | int64_t * | 0-terminated shape array (caller must free with daliFree)
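A 0-terminated shape array can be walked without knowing its length in advance. The helpers below are a small sketch of that convention; shape_ndim and shape_volume are hypothetical names, not DALI functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the dimensions in a 0-terminated shape array. */
size_t shape_ndim(const int64_t *shape) {
  size_t n = 0;
  while (shape[n] != 0) n++;
  return n;
}

/* Multiplies all extents up to the terminator; an empty shape yields 1. */
int64_t shape_volume(const int64_t *shape) {
  int64_t v = 1;
  for (size_t i = 0; shape[i] != 0; i++) v *= shape[i];
  return v;
}
```

One consequence of this encoding is that an extent of 0 cannot be told apart from the terminator, so the helpers treat the first 0 as the end of the shape.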

Usage Examples

Create and Run a Pipeline

#include <stdlib.h>  // malloc, free

#include "dali/c_api.h"

// serialized_data, serialized_len, batch_size, num_threads, device_id and
// prefetch_depth are supplied by the caller.

// Initialize DALI
daliInitialize();

// Deserialize and create a pipeline from protobuf
daliPipelineHandle handle;
daliCreatePipeline(&handle, serialized_data, serialized_len,
                   batch_size, num_threads, device_id,
                   /*separated=*/0, prefetch_depth,
                   prefetch_depth, prefetch_depth,
                   /*enable_memory_stats=*/0);

// Prefetch initial batches
daliPrefetch(&handle);

// Get outputs
daliOutput(&handle);

// Query output info
unsigned num_outputs = daliGetNumOutput(&handle);
size_t num_tensors = daliNumTensors(&handle, 0);

// Copy output 0 to a user-owned host buffer
size_t total_size = daliTensorSize(&handle, 0);
void *output_buf = malloc(total_size);
if (output_buf != NULL) {
  daliOutputCopy(&handle, output_buf, /*output_idx=*/0, CPU,
                 /*stream=*/0, DALI_ext_default);
}

// Cleanup
free(output_buf);
daliDeletePipeline(&handle);

Feed External Input

#include "dali/c_api.h"

// Set batch size for the input operator
daliSetExternalInputBatchSize(&handle, "input_name", current_batch_size);

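// Feed contiguous data: two HWC samples (480x640x3 and 600x800x3).
// shapes holds current_batch_size * sample_dim entries, so current_batch_size is 2 here.
int64_t shapes[] = {480, 640, 3, 600, 800, 3};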
daliSetExternalInputAsync(&handle, "input_name", CPU,
                          data_ptr, DALI_UINT8, shapes,
                          /*sample_dim=*/3, "HWC",
                          cuda_stream, DALI_ext_default);
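When preparing the contiguous buffer for a call like the one above, it helps to know where each sample starts and how large the whole batch is. The sketch below derives byte offsets from a flat shapes array. It assumes samples are packed back to back, in the same order as their shapes, with no padding; sample_volume and sample_byte_offsets are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of elements in sample `i` of a flat shapes array with
   `sample_dim` extents per sample. */
int64_t sample_volume(const int64_t *shapes, int sample_dim, int i) {
  int64_t v = 1;
  for (int d = 0; d < sample_dim; d++) v *= shapes[(size_t)i * sample_dim + d];
  return v;
}

/* Fills offsets[0..num_samples] with byte offsets of each sample;
   offsets[num_samples] ends up holding the total buffer size in bytes. */
void sample_byte_offsets(const int64_t *shapes, int num_samples, int sample_dim,
                         size_t elem_size, size_t *offsets) {
  offsets[0] = 0;
  for (int i = 0; i < num_samples; i++) {
    offsets[i + 1] = offsets[i] + (size_t)sample_volume(shapes, sample_dim, i) * elem_size;
  }
}
```

For the shapes in the example and one byte per element (DALI_UINT8), the second sample would start 480 * 640 * 3 bytes into the buffer.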
