Implementation:Huggingface Optimum Save Utils

From Leeroopedia
Knowledge Sources
Domains: Serialization, Preprocessing
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete tool from the Hugging Face Optimum library for loading and saving model preprocessors (tokenizers, processors, feature extractors, image processors) alongside optimized models.

Description

This module provides two functions for managing preprocessor persistence:

  • maybe_load_preprocessors: Attempts to load every available preprocessor (AutoTokenizer, AutoProcessor, AutoFeatureExtractor, AutoImageProcessor) from a model directory or Hub repo and returns the ones that loaded as a list. Each load attempt is wrapped in its own try/except, so preprocessor types the model does not provide are silently skipped.
  • maybe_save_preprocessors: Loads all available preprocessors from a source location and saves them to a destination directory, so that an exported or optimized model keeps its preprocessors alongside the model files.
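
The skip-on-failure behavior described above can be illustrated with a minimal sketch. This is not the library's source: load_available, load_tokenizer, and load_feature_extractor are hypothetical names, and the two loader functions stand in for calls such as AutoTokenizer.from_pretrained(...) so that the example runs without network access.

```python
from typing import Any, Callable, List


def load_available(loaders: List[Callable[[], Any]]) -> List[Any]:
    """Call each loader in order and keep only the ones that succeed."""
    loaded = []
    for loader in loaders:
        try:
            loaded.append(loader())
        except Exception:
            # A preprocessor type the model does not ship is not an error
            # here: skip it and move on to the next loader.
            continue
    return loaded


# Hypothetical stand-ins for real from_pretrained calls.
def load_tokenizer() -> str:
    return "tokenizer"


def load_feature_extractor() -> str:
    raise OSError("no preprocessor_config.json found")


available = load_available([load_tokenizer, load_feature_extractor])
print(available)  # ['tokenizer']
```

One consequence of this design is that a genuine failure (for example, a network error) is indistinguishable from a preprocessor that simply does not exist, so an empty list does not by itself say why nothing was loaded.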

Usage

Use maybe_save_preprocessors during model export or optimization to ensure all associated preprocessors are saved alongside the exported model. Optimum's model export pipelines call this function internally.

Code Reference

Source Location

Signature

def maybe_load_preprocessors(
    src_name_or_path: Union[str, Path],
    subfolder: str = "",
    trust_remote_code: bool = False,
) -> List:
    """Attempt to load all available preprocessors from a model source."""

def maybe_save_preprocessors(
    src_name_or_path: Union[str, Path],
    dest_dir: Union[str, Path],
    src_subfolder: str = "",
    trust_remote_code: bool = False,
):
    """Load preprocessors from source and save them to dest_dir.

    Args:
        src_name_or_path: Model name or path to load preprocessors from.
        dest_dir: Directory to save preprocessors to.
        src_subfolder: Subfolder within the model directory.
        trust_remote_code: Whether to allow running arbitrary code.
    """

Import

from optimum.utils.save_utils import maybe_save_preprocessors, maybe_load_preprocessors

I/O Contract

Inputs (maybe_save_preprocessors)

| Name | Type | Required | Description |
|------|------|----------|-------------|
| src_name_or_path | Union[str, Path] | Yes | Model name on Hub or local path to load preprocessors from |
| dest_dir | Union[str, Path] | Yes | Directory to save preprocessors to |
| src_subfolder | str | No | Subfolder within the model directory (default: "") |
| trust_remote_code | bool | No | Allow running arbitrary code from preprocessors (default: False) |

Outputs (maybe_save_preprocessors)

| Name | Type | Description |
|------|------|-------------|
| (none) | None | Preprocessors are saved as files in dest_dir |
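
The save step can be pictured as iterating over whatever was loaded and writing each item into the destination directory. The following is a minimal sketch under stated assumptions, not the library's implementation: save_all and DummyPreprocessor are hypothetical, and DummyPreprocessor only mimics the save_pretrained(directory) method that transformers preprocessors expose.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List, Union


class DummyPreprocessor:
    """Hypothetical stand-in that mimics a preprocessor's save_pretrained."""

    def __init__(self, filename: str, config: dict):
        self.filename = filename
        self.config = config

    def save_pretrained(self, save_directory: Union[str, Path]) -> None:
        # Write this object's configuration as JSON into the directory.
        path = Path(save_directory) / self.filename
        path.write_text(json.dumps(self.config))


def save_all(preprocessors: List[Any], dest_dir: Union[str, Path]) -> Path:
    """Create dest_dir if needed and save every preprocessor into it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for preprocessor in preprocessors:
        preprocessor.save_pretrained(dest_dir)
    return dest_dir


with tempfile.TemporaryDirectory() as tmp:
    out = save_all(
        [
            DummyPreprocessor("tokenizer_config.json", {"do_lower_case": True}),
            DummyPreprocessor("preprocessor_config.json", {"size": 224}),
        ],
        Path(tmp) / "exported_model",
    )
    saved_files = sorted(f.name for f in out.iterdir())

print(saved_files)  # ['preprocessor_config.json', 'tokenizer_config.json']
```

Because nothing is returned, the only way for a caller to confirm what was saved is to inspect the destination directory, as the last lines of the sketch do.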

Outputs (maybe_load_preprocessors)

| Name | Type | Description |
|------|------|-------------|
| preprocessors | List | List of successfully loaded preprocessor instances |

Usage Examples

Saving Preprocessors During Export

from optimum.utils.save_utils import maybe_save_preprocessors

# Save all preprocessors from a Hub model to a local directory
maybe_save_preprocessors(
    src_name_or_path="bert-base-uncased",
    dest_dir="./exported_model",
)
# The ./exported_model directory now contains tokenizer files

Loading Available Preprocessors

from optimum.utils.save_utils import maybe_load_preprocessors

preprocessors = maybe_load_preprocessors("bert-base-uncased")
for p in preprocessors:
    print(type(p).__name__)
# Example output (exact class names depend on the installed transformers version):
# BertTokenizerFast
# BertTokenizerFast  (from AutoProcessor)
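
As the example output suggests, the returned list can contain more than one instance of the same class, because different Auto classes may resolve to the same preprocessor. If that matters to a caller, one option is to keep only the first instance of each class. The helper below is a hypothetical sketch (unique_by_class is not part of Optimum), shown with placeholder classes so it runs without downloading a model.

```python
from typing import Any, List


def unique_by_class(preprocessors: List[Any]) -> List[Any]:
    """Keep the first instance of each class, preserving the original order."""
    seen = set()
    unique = []
    for preprocessor in preprocessors:
        cls = type(preprocessor)
        if cls not in seen:
            seen.add(cls)
            unique.append(preprocessor)
    return unique


# Placeholder classes standing in for real preprocessor types.
class FakeTokenizer:
    pass


class FakeImageProcessor:
    pass


loaded = [FakeTokenizer(), FakeTokenizer(), FakeImageProcessor()]
names = [type(p).__name__ for p in unique_by_class(loaded)]
print(names)  # ['FakeTokenizer', 'FakeImageProcessor']
```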
