Principle:Huggingface Optimum Dummy Input Generation

Field	Value
Page Type	Principle
Source Repository	https://github.com/huggingface/optimum
Domains	NLP, Computer_Vision, Export
Last Updated	2026-02-15 00:00 GMT

Overview

Dummy Input Generation is the technique for generating synthetic model inputs with correct shapes and types to enable symbolic tracing and export validation. It produces concrete tensor inputs that allow the export system to trace the model's computation graph without requiring real data.

Description

Exporting a model to ONNX or TFLite requires tracing its computation graph with concrete tensor inputs, because the exporters record the operations performed on actual tensors in order to capture the graph structure. Dummy input generators create correctly shaped tensors for each model architecture without needing real data.

The system generates inputs such as:

- Text inputs -- input_ids, attention_mask, token_type_ids (integer tensors with appropriate vocabulary range)
- Vision inputs -- pixel_values (float tensors with correct height, width, and channel dimensions)
- Audio inputs -- input_features, input_values (float tensors matching feature extraction output shapes)
- Past key-values -- past_key_values (nested float tensors matching the model's hidden state dimensions)
- Decoder inputs -- decoder_input_ids, encoder_outputs for sequence-to-sequence models
- Specialized inputs -- Bounding boxes, point coordinates, timesteps, and other architecture-specific inputs
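
The difference between integer and float inputs can be illustrated with a minimal sketch. It uses nested Python lists as stand-ins for framework tensors so that it runs without dependencies. The helper names, the vocabulary size of 30522, and the channels-first image layout are assumptions made for illustration, not values taken from Optimum.

```python
import random


def random_int_tensor(shape, max_value, min_value=0):
    """Nested list of the given shape with ints drawn from [min_value, max_value)."""
    if len(shape) == 1:
        return [random.randrange(min_value, max_value) for _ in range(shape[0])]
    return [random_int_tensor(shape[1:], max_value, min_value) for _ in range(shape[0])]


def random_float_tensor(shape, min_value=0.0, max_value=1.0):
    """Nested list of the given shape with floats drawn from [min_value, max_value]."""
    if len(shape) == 1:
        return [random.uniform(min_value, max_value) for _ in range(shape[0])]
    return [random_float_tensor(shape[1:], min_value, max_value) for _ in range(shape[0])]


def shape_of(tensor):
    """Recover the shape of a regular nested list."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


# Hypothetical model dimensions, chosen only for this example.
batch_size, sequence_length, vocab_size = 2, 16, 30522

# Text input: integer token ids bounded by the vocabulary size.
input_ids = random_int_tensor((batch_size, sequence_length), max_value=vocab_size)

# Vision input: floats laid out as (batch, channels, height, width).
pixel_values = random_float_tensor((batch_size, 3, 64, 64))

print(shape_of(input_ids), shape_of(pixel_values))
```

Bounding the token ids by the vocabulary size matters because an out-of-range id would fail at the embedding lookup during tracing.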

The system uses a registry of architecture-specific generators that read shape information from the model's normalized config. Default shapes are defined in DEFAULT_DUMMY_SHAPES:

Shape Parameter	Default Value	Description
`batch_size`	2	Number of samples in the batch
`sequence_length`	16	Length of text sequences
`num_choices`	4	Number of choices for multiple-choice tasks
`width`	64	Image width in pixels
`height`	64	Image height in pixels
`num_channels`	3	Number of image channels
`feature_size`	80	Number of audio features (e.g., MEL bins)
`nb_max_frames`	3000	Maximum number of audio frames
`audio_sequence_length`	16000	Raw audio sequence length
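
As a rough sketch of how these parameters turn into tensor shapes, the table can be transcribed into a dictionary and combined per input name. The function `dummy_shape` is hypothetical, and the axis orderings it returns (channels-first images, features before frames, an extra choices axis for multiple-choice tasks) are assumptions for illustration rather than a statement of what each Optimum generator does.

```python
# Values transcribed from the table above.
DEFAULT_DUMMY_SHAPES = {
    "batch_size": 2,
    "sequence_length": 16,
    "num_choices": 4,
    "width": 64,
    "height": 64,
    "num_channels": 3,
    "feature_size": 80,
    "nb_max_frames": 3000,
    "audio_sequence_length": 16000,
}


def dummy_shape(input_name, shapes=DEFAULT_DUMMY_SHAPES, task="feature-extraction"):
    """Hypothetical helper: map an input name to a tensor shape (assumed layouts)."""
    s = shapes
    if input_name in ("input_ids", "attention_mask", "token_type_ids"):
        if task == "multiple-choice":
            return (s["batch_size"], s["num_choices"], s["sequence_length"])
        return (s["batch_size"], s["sequence_length"])
    if input_name == "pixel_values":
        return (s["batch_size"], s["num_channels"], s["height"], s["width"])
    if input_name == "input_features":
        return (s["batch_size"], s["feature_size"], s["nb_max_frames"])
    if input_name == "input_values":
        return (s["batch_size"], s["audio_sequence_length"])
    raise KeyError(f"No shape rule for input {input_name!r}")


for name in ("input_ids", "pixel_values", "input_features", "input_values"):
    print(name, dummy_shape(name))
```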

Usage

Use Dummy Input Generation when exporting models to traced formats that require concrete input tensors for graph construction. It is invoked automatically during the export process via the ExporterConfig.generate_dummy_inputs method.

Typical scenarios:

- ONNX export via torch.onnx.export, which requires example inputs for tracing
- TFLite export, which requires concrete tensors for graph freezing
- Export validation, where dummy inputs are fed through both the original and exported models to compare outputs
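
The validation scenario can be sketched as follows. The two "models" are plain Python functions standing in for the original and exported models, and the absolute tolerance of 1e-5 is an assumed value for illustration. The function name `validate_export` is hypothetical.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def validate_export(reference_model, exported_model, dummy_inputs, atol=1e-5):
    """Run both models on the same dummy inputs and compare their outputs."""
    reference_out = reference_model(dummy_inputs)
    exported_out = exported_model(dummy_inputs)
    if len(reference_out) != len(exported_out):
        raise ValueError("Output lengths differ between reference and exported model")
    diff = max_abs_diff(reference_out, exported_out)
    return diff <= atol, diff


# Stand-in models: the "exported" one carries a tiny numerical drift.
def reference_model(xs):
    return [2.0 * x + 1.0 for x in xs]


def exported_model(xs):
    return [2.0 * x + 1.0 + 1e-7 for x in xs]


ok, diff = validate_export(reference_model, exported_model, [0.1, 0.5, 0.9])
print("outputs match:", ok)
```

Because both models receive identical dummy inputs, any difference above the tolerance points at the export itself rather than at the data.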

Users can override default shapes by passing keyword arguments to generate_dummy_inputs (e.g., custom batch size, sequence length).
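
A minimal sketch of the override behaviour, assuming keyword arguments are simply merged over the defaults. The function `resolve_shapes` is hypothetical, and only a subset of the default shapes is reproduced here.

```python
# Subset of the default shapes, reproduced for a self-contained example.
DEFAULT_DUMMY_SHAPES = {"batch_size": 2, "sequence_length": 16, "num_choices": 4}


def resolve_shapes(**overrides):
    """Hypothetical helper: keyword overrides win, everything else keeps its default."""
    return {**DEFAULT_DUMMY_SHAPES, **overrides}


shapes = resolve_shapes(batch_size=8, sequence_length=128)
print(shapes)
```

Parameters that are not overridden, such as num_choices here, keep their default values.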

Theoretical Basis

Dummy Input Generation uses the Template Method Pattern with architecture-specific subclasses. The architecture consists of:

DummyInputGenerator (ABC) -- Defines the interface with two key methods and one class attribute:
- supports_input(input_name) -- Checks whether this generator can produce the named input
- generate(input_name, framework, int_dtype, float_dtype) -- Produces a concrete tensor for the named input
- SUPPORTED_INPUT_NAMES -- A tuple, declared by each subclass, listing the input names it handles
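
The interface above can be sketched with Python's abc module. This is a simplified stand-in, not the Optimum source: the exact-match check in supports_input, the default argument values, the ToyTextInputGenerator subclass, and the use of nested lists instead of tensors are all assumptions made for illustration.

```python
import random
from abc import ABC, abstractmethod


class DummyInputGenerator(ABC):
    """Simplified sketch of the generator interface."""

    # Each subclass lists the input names it can produce.
    SUPPORTED_INPUT_NAMES = ()

    def supports_input(self, input_name):
        # Assumption: exact name match; a real implementation may be more lenient.
        return input_name in self.SUPPORTED_INPUT_NAMES

    @abstractmethod
    def generate(self, input_name, framework="pt", int_dtype="int64", float_dtype="fp32"):
        """Return a concrete dummy tensor for input_name."""


class ToyTextInputGenerator(DummyInputGenerator):
    """Hypothetical subclass producing token ids and attention masks."""

    SUPPORTED_INPUT_NAMES = ("input_ids", "attention_mask")

    def __init__(self, vocab_size, batch_size=2, sequence_length=16):
        self.vocab_size = vocab_size
        self.batch_size = batch_size
        self.sequence_length = sequence_length

    def generate(self, input_name, framework="pt", int_dtype="int64", float_dtype="fp32"):
        # framework and dtype arguments are ignored in this list-based sketch.
        max_value = self.vocab_size if input_name == "input_ids" else 2
        return [
            [random.randrange(max_value) for _ in range(self.sequence_length)]
            for _ in range(self.batch_size)
        ]


generator = ToyTextInputGenerator(vocab_size=100)
input_ids = generator.generate("input_ids")
attention_mask = generator.generate("attention_mask")
print(len(input_ids), len(input_ids[0]))
```

Marking generate as abstract means the base class cannot be instantiated directly, so every concrete generator has to provide its own generation logic.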

Modality-specific subclasses -- Each implements generation logic for its modality:
- DummyTextInputGenerator -- Generates input_ids, attention_mask, token_type_ids
- DummyVisionInputGenerator -- Generates pixel_values, pixel_mask
- DummyAudioInputGenerator -- Generates input_features, input_values
- DummyTimestepInputGenerator -- Generates timestep inputs for diffusion models
- DummyBboxInputGenerator -- Generates bounding box coordinates
- And 20+ additional specialized subclasses

ExporterConfig orchestration -- The DUMMY_INPUT_GENERATOR_CLASSES tuple on each ExporterConfig subclass lists which generators to use. The generate_dummy_inputs method iterates over declared input names, finds the first generator that supports each input, and calls generate.
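
The orchestration loop described above might look like the following sketch. All class names prefixed with Toy, the INPUT_NAMES attribute, and the way shape overrides are forwarded to every generator constructor are hypothetical simplifications. Generators return constant-filled nested lists so the example stays deterministic.

```python
def filled(shape, value):
    """Nested list of the given shape with every element set to value."""
    if not shape:
        return value
    return [filled(shape[1:], value) for _ in range(shape[0])]


class ToyTextGenerator:
    SUPPORTED_INPUT_NAMES = ("input_ids", "attention_mask")

    def __init__(self, batch_size=2, sequence_length=16, **_ignored):
        self.shape = (batch_size, sequence_length)

    def supports_input(self, input_name):
        return input_name in self.SUPPORTED_INPUT_NAMES

    def generate(self, input_name):
        return filled(self.shape, 1)


class ToyVisionGenerator:
    SUPPORTED_INPUT_NAMES = ("pixel_values",)

    def __init__(self, batch_size=2, num_channels=3, height=64, width=64, **_ignored):
        self.shape = (batch_size, num_channels, height, width)

    def supports_input(self, input_name):
        return input_name in self.SUPPORTED_INPUT_NAMES

    def generate(self, input_name):
        return filled(self.shape, 0.0)


class ToyExporterConfig:
    # Which generators this (hypothetical) architecture needs.
    DUMMY_INPUT_GENERATOR_CLASSES = (ToyTextGenerator, ToyVisionGenerator)
    # Assumed list of declared model inputs.
    INPUT_NAMES = ("input_ids", "attention_mask", "pixel_values")

    def generate_dummy_inputs(self, **shape_overrides):
        generators = [cls(**shape_overrides) for cls in self.DUMMY_INPUT_GENERATOR_CLASSES]
        dummy_inputs = {}
        for name in self.INPUT_NAMES:
            # The first generator that supports the input wins.
            generator = next((g for g in generators if g.supports_input(name)), None)
            if generator is None:
                raise RuntimeError(f"No generator supports input {name!r}")
            dummy_inputs[name] = generator.generate(name)
        return dummy_inputs


dummy_inputs = ToyExporterConfig().generate_dummy_inputs(batch_size=4)
print({name: len(value) for name, value in dummy_inputs.items()})
```

Because the config only names generator classes, supporting a new architecture in this sketch means changing two tuples rather than writing new generation code.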

This design ensures that:

- New input modalities can be added by creating new generator subclasses
- Each model architecture declares exactly which generators it needs
- Shape information is read from the model's normalized config, adapting to each model's dimensions

Related Pages

Connections

Implementation:Huggingface_Optimum_ExporterConfig_Generate_Dummy_Inputs
