Implementation:Datajuicer Data juicer VideoTaggingFromAudioMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for generating video tags from audio streams.
Description
VideoTaggingFromAudioMapper is a mapper operator that generates semantic tags for videos from their audio streams using an Audio Spectrogram Transformer (AST) model. For each video it extracts the audio, resamples it to the 16 kHz sampling rate the model requires, and feeds the waveform through a HuggingFace AST model (default: MIT/ast-finetuned-audioset-10-10-0.4593). The tag with the highest logit is selected and stored in the sample metadata under a configurable field name; videos without valid audio receive the tag "EMPTY".
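The tag-selection step can be sketched in plain Python. This is a minimal illustration of "pick the label with the highest logit, else EMPTY", not the operator's actual code: the helper `pick_tag`, the toy `id2label` map, and the logit values are all hypothetical, and audio extraction, resampling, and the model forward pass are left out.

```python
from typing import Dict, List, Optional, Sequence

EMPTY_TAG = "EMPTY"  # placeholder used for videos without valid audio


def pick_tag(logits: Optional[Sequence[float]], id2label: Dict[int, str]) -> str:
    """Return the label with the largest logit, or EMPTY_TAG if there are no logits."""
    if not logits:
        # Hypothetical convention: None or an empty sequence means "no valid audio".
        return EMPTY_TAG
    best_idx = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best_idx]


# Toy label map and logits, made up for illustration only.
id2label = {0: "Speech", 1: "Music", 2: "Dog"}
per_video_logits: List[Optional[List[float]]] = [
    [0.1, 2.3, -1.0],  # highest logit at index 1
    None,              # video without a usable audio stream
    [4.0, 0.5, 0.2],   # highest logit at index 0
]

tags = [pick_tag(logits, id2label) for logits in per_video_logits]
print(tags)  # ['Music', 'EMPTY', 'Speech']
```

In a real pipeline the logits would come from the AST model and the label map from the model's configuration; only the selection logic is shown here.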
Usage
Use this operator when you need audio-based content classification for video datasets, for example to complement visual tagging or to support multimodal data annotation workflows.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_tagging_from_audio_mapper.py
Signature
@OPERATORS.register_module("video_tagging_from_audio_mapper")
class VideoTaggingFromAudioMapper(Mapper):
    def __init__(self, hf_ast: str = "MIT/ast-finetuned-audioset-10-10-0.4593", trust_remote_code: bool = False, tag_field_name: str = MetaKeys.video_audio_tags, *args, **kwargs):
Import
from data_juicer.ops.mapper.video_tagging_from_audio_mapper import VideoTaggingFromAudioMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hf_ast | str | No | Name or path of the HuggingFace AST model (default: "MIT/ast-finetuned-audioset-10-10-0.4593") |
| trust_remote_code | bool | No | Whether to trust remote code of HF models (default: False) |
| tag_field_name | str | No | Field name to store the tags (default: "video_audio_tags") |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with audio-derived tags in metadata |
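To make the output contract concrete, the sketch below shows how one tag per video might be written into a sample's metadata under the configured field name. It is a stand-alone mock-up, not Data-Juicer's implementation. The metadata key `"__dj__meta__"`, the `videos` field, the file names, and the helper `attach_tags` are assumptions made for illustration, so check the library's constants before relying on them.

```python
from typing import Any, Dict, List

META_KEY = "__dj__meta__"               # assumed name of the sample's metadata field
DEFAULT_TAG_FIELD = "video_audio_tags"  # default tag_field_name from the Inputs table


def attach_tags(
    sample: Dict[str, Any],
    tags: List[str],
    tag_field_name: str = DEFAULT_TAG_FIELD,
) -> Dict[str, Any]:
    """Return a copy of `sample` with `tags` stored in its metadata dict."""
    out = dict(sample)
    meta = dict(out.get(META_KEY, {}))  # copy so the input sample is not mutated
    meta[tag_field_name] = list(tags)   # one tag per video, in video order
    out[META_KEY] = meta
    return out


# Hypothetical sample with two videos; the second has no valid audio.
sample = {"videos": ["clip_a.mp4", "clip_b.mp4"]}
tagged = attach_tags(sample, ["Music", "EMPTY"])
print(tagged[META_KEY])  # {'video_audio_tags': ['Music', 'EMPTY']}
```

Passing a different `tag_field_name` changes only the key under which the tags are stored.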
Usage Examples
process:
  - video_tagging_from_audio_mapper:
      hf_ast: "MIT/ast-finetuned-audioset-10-10-0.4593"