Implementation:Zai org CogVideo Caption File Output

From Leeroopedia


Attribute            Value
Implementation Name  Caption File Output
Workflow             Video Captioning
Step                 5 of 5
Type                 Pattern Doc
Source File          tools/caption/video_caption.py:L103-112
Repository           zai-org/CogVideo
Last Updated         2026-02-10 00:00 GMT

Overview

Implementation of caption text output for the video captioning pipeline. This pattern document describes how to save generated captions to files compatible with the CogVideoX fine-tuning dataset format.

Description

The caption file output pattern:

  1. Calls predict() to generate the caption text for a video
  2. Writes the caption to a file using standard Python I/O
  3. The output format matches the fine-tuning dataset's caption_column expectations

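The pattern uses a simple append-mode file write to build up a prompts file across multiple videos, writing each caption on its own line. Because the file is line-oriented, a caption that contains embedded line breaks would span several lines, so such captions need to be flattened to a single line before they are written.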
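A minimal sketch of the append step, assuming generated captions may contain line breaks that would otherwise spread one caption over several lines of the prompts file. The helper names `to_single_line` and `append_caption` are hypothetical and are not part of `video_caption.py`.

```python
def to_single_line(caption: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(caption.split())


def append_caption(path: str, caption: str) -> None:
    """Append one caption as exactly one line of a UTF-8 prompts file.

    Hypothetical helper: the name and the UTF-8 choice are assumptions.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_single_line(caption) + "\n")


# Possible usage, with `response` being the string returned by predict():
# append_caption("prompts.txt", response)
```

Flattening whitespace keeps the number of lines in the file equal to the number of captioned videos, which is what a line-oriented caption file relies on.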

Usage

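from tools.caption.video_caption import predict

# Read the raw video bytes that predict() expects
with open("video.mp4", "rb") as f:
    video_data = f.read()

# Arguments: prompt, video bytes, sampling temperature
response = predict("Please describe this video in detail.", video_data, 0.1)

# Append mode keeps captions already written by earlier runs
with open("prompts.txt", "a", encoding="utf-8") as f:
    f.write(response + "\n")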

Code Reference

Source Location

File                            Lines     Description
tools/caption/video_caption.py  L103-112  Caption output section

Signature

# Pattern: Save caption text to file
# The output format should match the dataset's caption_column expectations
response = predict(prompt, video_data, temperature)
# Save to file (user implements this pattern):
with open("prompts.txt", "a") as f:
    f.write(response + "\n")

Import

# Standard Python I/O - no additional imports needed

I/O Contract

Inputs

Parameter         Type  Default        Description
response          str   Required       Generated caption text from predict()
Output file path  str   "prompts.txt"  Path to the output caption file

Outputs

Output       Type       Description
Side effect  Text file  Caption text appended to the output file, one caption per line

Usage Examples

Example 1: Single video caption to file

from tools.caption.video_caption import predict

with open("my_video.mp4", "rb") as f:
    video_data = f.read()

caption = predict(
    "Please describe this video in detail.",
    video_data,
    0.1
)

# "w" overwrites any existing prompts.txt; use "a" to append instead
with open("prompts.txt", "w", encoding="utf-8") as f:
    f.write(caption + "\n")

Example 2: Batch captioning with aggregated output

import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"
output_file = "prompts.txt"

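# "w" starts a fresh prompts file on every run; use "a" to extend an existing one
with open(output_file, "w", encoding="utf-8") as out_f:
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()

            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            # Collapse internal line breaks so each video yields exactly one line
            out_f.write(" ".join(caption.split()) + "\n")
            print(f"Captioned: {filename}")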
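An aggregated prompts file is only useful for training if each line can be matched back to its video. The sketch below writes a second list file in lockstep with the captions, so that line N of both files refers to the same clip. The function name `write_aligned_lists`, the `videos.txt` file name, and the choice to store bare file names are assumptions made for illustration; check the fine-tuning dataset documentation for the exact layout it expects. The captioning call is passed in as `caption_fn` so the sketch does not depend on the model being loaded.

```python
import os
from typing import Callable, Iterable


def write_aligned_lists(
    video_dir: str,
    out_dir: str,
    caption_fn: Callable[[bytes], str],
    extensions: Iterable[str] = (".mp4",),
) -> int:
    """Write prompts.txt and videos.txt with one matching line per video.

    Hypothetical helper: file names and layout are assumptions.
    Returns the number of videos processed.
    """
    exts = tuple(e.lower() for e in extensions)
    names = sorted(n for n in os.listdir(video_dir) if n.lower().endswith(exts))
    os.makedirs(out_dir, exist_ok=True)
    prompts_path = os.path.join(out_dir, "prompts.txt")
    videos_path = os.path.join(out_dir, "videos.txt")

    with open(prompts_path, "w", encoding="utf-8") as p_f, \
            open(videos_path, "w", encoding="utf-8") as v_f:
        for name in names:
            with open(os.path.join(video_dir, name), "rb") as f:
                video_data = f.read()
            # Flatten whitespace so each caption occupies exactly one line
            caption = " ".join(caption_fn(video_data).split())
            p_f.write(caption + "\n")
            v_f.write(name + "\n")
    return len(names)


# Possible usage with the real captioner:
# write_aligned_lists(
#     "/data/training_videos/", "/data/dataset/",
#     lambda data: predict("Please describe this video in detail.", data, 0.1),
# )
```

Writing both files inside the same loop iteration means the two lists cannot drift apart if a file is skipped by the extension filter.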

Example 3: Per-video caption files

import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"
caption_dir = "/data/captions/"
os.makedirs(caption_dir, exist_ok=True)

for filename in sorted(os.listdir(video_dir)):
    if filename.endswith(".mp4"):
        video_path = os.path.join(video_dir, filename)
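        # splitext swaps only the final extension; str.replace would also
        # rewrite any ".mp4" that appears earlier in the file name
        caption_path = os.path.join(
            caption_dir,
            os.path.splitext(filename)[0] + ".txt"
        )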

        with open(video_path, "rb") as f:
            video_data = f.read()

        caption = predict(
            "Please describe this video in detail.",
            video_data,
            0.1
        )

        # One file per video, so the caption may keep its internal line breaks
        with open(caption_path, "w", encoding="utf-8") as f:
            f.write(caption)
        print(f"Saved: {caption_path}")

Example 4: CSV format for dataset integration

import csv
import os
from tools.caption.video_caption import predict

video_dir = "/data/training_videos/"

with open("dataset.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["video_path", "caption"])

    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()

            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            writer.writerow([video_path, caption])
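Unlike a line-oriented prompts file, a CSV written with Python's `csv` module quotes fields as needed, so captions containing commas, quotation marks, or line breaks survive a round trip. The sketch below demonstrates this with hypothetical helpers `write_rows` and `read_rows`; the column names follow the example above, and the UTF-8 encoding is an assumption.

```python
import csv
from typing import List, Tuple


def write_rows(csv_path: str, rows: List[Tuple[str, str]]) -> None:
    """Write (video_path, caption) pairs under a header row."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["video_path", "caption"])
        writer.writerows(rows)


def read_rows(csv_path: str) -> List[Tuple[str, str]]:
    """Read the pairs back, keyed by the header names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(r["video_path"], r["caption"]) for r in csv.DictReader(f)]
```

Reading the file back with `read_rows` after a captioning run is a cheap way to confirm that the row count matches the number of videos processed.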
