Implementation:Zai org CogVideo Caption File Output

Attribute	Value
Implementation Name	Caption File Output
Workflow	Video Captioning
Step	5 of 5
Type	Pattern Doc
Source File	`tools/caption/video_caption.py:L103-112`
Repository	zai-org/CogVideo
Last Updated	2026-02-10 00:00 GMT

Overview

This page describes the final step of the video captioning pipeline: saving the captions generated by predict() to text files in a format compatible with the CogVideoX fine-tuning dataset.

Description

The caption file output pattern:

Calls predict() to generate the caption text for a video
Writes the caption to a file using standard Python I/O
The output format matches the fine-tuning dataset's caption_column expectations

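The implementation opens the output file in append mode, so a single prompts file accumulates captions across multiple videos. Each caption is written on its own line. A caption that itself contains line breaks would span several lines and break the one-caption-per-line layout, so such captions should be flattened to a single line before they are written.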
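The append step can be sketched as follows. This is a minimal sketch, not code from video_caption.py: `normalize_caption` and `append_caption` are hypothetical helper names, and collapsing every whitespace run (newlines included) into a single space is one assumed way to keep each caption on exactly one line.

```python
import re


def normalize_caption(caption: str) -> str:
    """Collapse every whitespace run (including newlines) into one space."""
    return re.sub(r"\s+", " ", caption).strip()


def append_caption(path: str, caption: str) -> None:
    """Append one caption as exactly one line of the prompts file.

    Hypothetical helper; not part of tools/caption/video_caption.py.
    """
    line = normalize_caption(caption)
    # Explicit UTF-8 so non-ASCII captions are encoded the same on every OS.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Calling `append_caption("prompts.txt", response)` once per video then builds the prompts file line by line, even when the model returns a multi-paragraph description.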

Usage

from tools.caption.video_caption import predict

with open("video.mp4", "rb") as f:
    video_data = f.read()

response = predict("Please describe this video in detail.", video_data, 0.1)

with open("prompts.txt", "a") as f:
    # Collapse internal line breaks so the caption occupies exactly one line
    f.write(" ".join(response.split()) + "\n")

Code Reference

Source Location

File	Lines	Description
`tools/caption/video_caption.py`	L103-112	Caption output section

Signature

# Pattern: Save caption text to file
# The output format should match the dataset's caption_column expectations
response = predict(prompt, video_data, temperature)
# Save to file (user implements this pattern):
with open("prompts.txt", "a") as f:
    f.write(response + "\n")

Import

from tools.caption.video_caption import predict
# Writing the caption file uses the built-in open(); no other imports are needed

I/O Contract

Inputs

Parameter	Type	Default	Description
`response`	`str`	Required	Generated caption text from `predict()`
Output file path	`str`	`"prompts.txt"`	Path to the output caption file

Outputs

Output	Type	Description
Side effect	Text file	Caption text appended to the output file, one caption per line
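Because the output contract is one caption per line, a consumer can read the file back with plain line splitting. The sketch below also compares the caption count with a line-aligned video list. It assumes a layout in which line i of the caption file describes the video on line i of a companion list (such as a videos.txt); that pairing is an assumption about the fine-tuning setup, not something defined on this page, and `read_lines` and `check_alignment` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def read_lines(path: str) -> list[str]:
    """Return the file's lines without terminators (empty lines are kept)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def check_alignment(caption_file: str, video_list_file: str) -> int:
    """Verify that two line-aligned files have the same number of lines.

    Assumes line i of caption_file describes the video on line i of
    video_list_file (a hypothetical companion list such as videos.txt).
    Returns the number of pairs, or raises ValueError on a mismatch.
    """
    captions = read_lines(caption_file)
    videos = read_lines(video_list_file)
    if len(captions) != len(videos):
        raise ValueError(
            f"{len(captions)} captions vs {len(videos)} videos: a multi-line "
            "caption or a skipped video has broken the pairing"
        )
    return len(captions)
```

Running such a check before fine-tuning catches a stray newline inside a caption early, instead of silently training on shifted caption/video pairs.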

Usage Examples

Example 1: Single video caption to file

from tools.caption.video_caption import predict

with open("my_video.mp4", "rb") as f:
    video_data = f.read()

caption = predict(
    "Please describe this video in detail.",
    video_data,
    0.1
)

with open("prompts.txt", "w") as f:
    f.write(caption + "\n")

Example 2: Batch captioning with aggregated output

import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"
output_file = "prompts.txt"

with open(output_file, "w") as out_f:
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()

            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            # Collapse internal line breaks so each caption stays on one line
            out_f.write(" ".join(caption.split()) + "\n")
            print(f"Captioned: {filename}")
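Append mode also makes a long batch job resumable. The sketch below is a variation on Example 2, not part of the repository: the hypothetical `caption_directory` counts the lines already in the output file and skips that many videos in sorted order, which assumes the directory contents do not change between runs. The model call is injected as `caption_fn` so the loop can be exercised without loading the captioning model; in practice it would wrap `predict(prompt, video_data, 0.1)`.

```python
import os
from typing import Callable


def caption_directory(
    video_dir: str,
    output_file: str,
    caption_fn: Callable[[bytes], str],
) -> int:
    """Caption every .mp4 in sorted order, appending one line per video.

    Skips as many videos as output_file already has lines, so an
    interrupted run can be restarted. Returns how many videos were
    captioned by this call. Hypothetical helper, not repository code.
    """
    videos = sorted(n for n in os.listdir(video_dir) if n.endswith(".mp4"))
    done = 0
    if os.path.exists(output_file):
        with open(output_file, encoding="utf-8") as f:
            done = sum(1 for _ in f)  # lines written by earlier runs
    new = 0
    with open(output_file, "a", encoding="utf-8") as out_f:
        for name in videos[done:]:
            with open(os.path.join(video_dir, name), "rb") as f:
                caption = caption_fn(f.read())
            # One caption per line: collapse internal whitespace and newlines.
            out_f.write(" ".join(caption.split()) + "\n")
            out_f.flush()  # push each finished caption before the next slow call
            new += 1
    return new
```

A real run might call `caption_directory("/data/training_videos/", "prompts.txt", lambda data: predict("Please describe this video in detail.", data, 0.1))`. Opening the output in `"w"` mode, as Example 2 does, would instead discard earlier progress on every run.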

Example 3: Per-video caption files

import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"
caption_dir = "/data/captions/"
os.makedirs(caption_dir, exist_ok=True)

for filename in sorted(os.listdir(video_dir)):
    if filename.endswith(".mp4"):
        video_path = os.path.join(video_dir, filename)
        # splitext swaps only the final extension (str.replace would hit every ".mp4")
        caption_path = os.path.join(
            caption_dir,
            os.path.splitext(filename)[0] + ".txt"
        )

        with open(video_path, "rb") as f:
            video_data = f.read()

        caption = predict(
            "Please describe this video in detail.",
            video_data,
            0.1
        )

        with open(caption_path, "w") as f:
            f.write(caption)
        print(f"Saved: {caption_path}")

Example 4: CSV format for dataset integration

import csv
import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"

with open("dataset.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["video_path", "caption"])

    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()

            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            writer.writerow([video_path, caption])
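Unlike the plain-text formats, the CSV output tolerates multi-line captions, because the `csv` module quotes any field containing a newline, comma, or quote character. A minimal sketch of reading the file back follows; `read_caption_csv` is a hypothetical helper name, and the file is opened with `newline=""` because the Python `csv` documentation requires it for newlines embedded in quoted fields to be read correctly.

```python
from __future__ import annotations

import csv


def read_caption_csv(path: str) -> list[tuple[str, str]]:
    """Load (video_path, caption) rows from a CSV with a header row.

    newline="" lets the csv module restore captions that span several
    lines, which csv.writer stores as quoted fields.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the "video_path,caption" header
        return [(row["video_path"], row["caption"]) for row in reader]
```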

Related Pages

Principle:Zai_org_CogVideo_Caption_Output -- Principle governing caption output to files
Environment:Zai_org_CogVideo_Video_Captioning_Environment
Zai_org_CogVideo_CogVLM2_Predict -- Previous step: caption generation providing the text to save
Zai_org_CogVideo_Captioning_Requirements_Install -- Environment setup for the captioning pipeline
