

Implementation:Datajuicer Data juicer VideoTaggingFromAudioMapper

From Leeroopedia
Knowledge Sources
Domains: Data_Processing, Mapping
Last Updated: 2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer operator that generates tags for videos from their audio streams.

Description

VideoTaggingFromAudioMapper is a mapper operator that tags videos according to their audio content using the Audio Spectrogram Transformer (AST) model. For each video, it extracts the audio stream, resamples it to the sampling rate the model requires (16 kHz), and feeds the waveform through a HuggingFace AST model (default: MIT/ast-finetuned-audioset-10-10-0.4593). The label with the highest logit becomes the video's tag. Tags are stored in the sample metadata under a configurable field name, and videos without valid audio are tagged "EMPTY".
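The per-video logic described above (resample to 16 kHz, classify, take the arg-max label, fall back to "EMPTY") can be sketched in plain Python as follows. This is not Data-Juicer's code: `resample_linear`, `tag_from_logits`, and `tag_videos` are hypothetical helpers, the linear-interpolation resampler is a naive stand-in for a real audio resampler, and `classify` is any callable returning one logit per label. In practice `classify` would presumably wrap the transformers `ASTFeatureExtractor` and `ASTForAudioClassification`, with `id2label` taken from `model.config.id2label`; that wiring is an assumption here.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

TARGET_SR = 16_000  # sampling rate the AST model expects (16 kHz)

# An audio track as (waveform, sampling_rate); None means "no audio stream".
Audio = Optional[Tuple[Sequence[float], int]]


def resample_linear(wave: Sequence[float], src_sr: int, dst_sr: int = TARGET_SR) -> List[float]:
    """Hypothetical naive resampler using linear interpolation (illustration only)."""
    if src_sr == dst_sr or len(wave) < 2:
        return list(wave)
    n_out = max(1, round(len(wave) * dst_sr / src_sr))
    step = src_sr / dst_sr  # source samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        if j >= len(wave) - 1:
            out.append(float(wave[-1]))  # past the last pair: hold the final sample
        else:
            frac = pos - j
            out.append(wave[j] * (1 - frac) + wave[j + 1] * frac)
    return out


def tag_from_logits(logits: Sequence[float], id2label: Dict[int, str]) -> str:
    """Return the label whose logit is highest (arg-max)."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return id2label[best]


def tag_videos(audios: Sequence[Audio],
               classify: Callable[[List[float]], Sequence[float]],
               id2label: Dict[int, str]) -> List[str]:
    """Produce one tag per video; missing or empty audio yields "EMPTY"."""
    tags: List[str] = []
    for audio in audios:
        if audio is None or len(audio[0]) == 0:
            tags.append("EMPTY")
            continue
        wave, sr = audio
        logits = classify(resample_linear(wave, sr))
        tags.append(tag_from_logits(logits, id2label))
    return tags


if __name__ == "__main__":
    labels = {0: "Speech", 1: "Music"}
    # Toy classifier standing in for the AST model: loud input -> "Music".
    fake_classify = lambda w: [0.1, 0.9] if max(w) > 0.5 else [0.8, 0.2]
    print(tag_videos([([0.0, 1.0] * 4, 8000), None], fake_classify, labels))
    # ['Music', 'EMPTY']
```

Keeping the classifier as an injected callable isolates the model-independent steps (resampling, arg-max selection, the "EMPTY" fallback) so they can be checked without downloading a model.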

Usage

Use it when you need audio-based content classification for video datasets. It complements visual tagging approaches and supports multimodal data annotation workflows.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_tagging_from_audio_mapper")
class VideoTaggingFromAudioMapper(Mapper):
    def __init__(self, hf_ast: str = "MIT/ast-finetuned-audioset-10-10-0.4593",
                 trust_remote_code: bool = False,
                 tag_field_name: str = MetaKeys.video_audio_tags,
                 *args, **kwargs):

Import

from data_juicer.ops.mapper.video_tagging_from_audio_mapper import VideoTaggingFromAudioMapper

I/O Contract

Inputs

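| Name | Type | Required | Description |
| --- | --- | --- | --- |
| hf_ast | str | No | HuggingFace model ID or local path of the AST model (default: "MIT/ast-finetuned-audioset-10-10-0.4593") |
| trust_remote_code | bool | No | Whether to trust remote code when loading HF models (default: False) |
| tag_field_name | str | No | Metadata field name under which the tags are stored (default: MetaKeys.video_audio_tags, i.e. "video_audio_tags") |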

Outputs

| Name | Type | Description |
| --- | --- | --- |
| samples | Dict | Transformed samples with audio-derived tags in metadata |
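As a rough illustration of where the tags end up, the sketch below writes a list of per-video tags into a sample's metadata dict under `tag_field_name`. Several details are assumptions, not confirmed by this page: the metadata key `"__dj__meta__"` (assumed to be the value of Data-Juicer's `Fields.meta`), storing one tag per video in video order, and the helper name `attach_audio_tags`, which is hypothetical.

```python
from typing import Any, Dict, List

META_KEY = "__dj__meta__"              # assumed value of Data-Juicer's Fields.meta
DEFAULT_TAG_FIELD = "video_audio_tags"  # default tag_field_name from the inputs table


def attach_audio_tags(sample: Dict[str, Any], tags: List[str],
                      tag_field_name: str = DEFAULT_TAG_FIELD) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `sample` with tags stored in its metadata."""
    out = dict(sample)                      # shallow copy; the input sample is not mutated
    meta = dict(out.get(META_KEY, {}))      # keep any metadata already present
    meta[tag_field_name] = list(tags)       # assumed: one tag per video, in video order
    out[META_KEY] = meta
    return out


if __name__ == "__main__":
    sample = {"videos": ["a.mp4", "b.mp4"], "text": "a clip"}
    print(attach_audio_tags(sample, ["Music", "EMPTY"]))
    # {'videos': ['a.mp4', 'b.mp4'], 'text': 'a clip', '__dj__meta__': {'video_audio_tags': ['Music', 'EMPTY']}}
```

Changing `tag_field_name` only changes the key inside the metadata dict, so several tagging operators can write to the same sample without overwriting each other.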

Usage Examples

process:
  - video_tagging_from_audio_mapper:
      hf_ast: "MIT/ast-finetuned-audioset-10-10-0.4593"
