Implementation:Zai org CogVideo Caption File Output
| Attribute | Value |
|---|---|
| Implementation Name | Caption File Output |
| Workflow | Video Captioning |
| Step | 5 of 5 |
| Type | Pattern Doc |
| Source File | tools/caption/video_caption.py:L103-112 |
| Repository | zai-org/CogVideo |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Implementation of caption text output for the video captioning pipeline. This pattern document describes how to save generated captions to files compatible with the CogVideoX fine-tuning dataset format.
Description
The caption file output pattern:
- Calls predict() to generate the caption text for a video
- Writes the caption to a file using standard Python I/O
- The output format matches the fine-tuning dataset's caption_column expectations
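The implementation uses a simple append-mode ("a") file write to build up a prompts file across multiple videos. Each caption is written on a separate line, terminated by "\n". Append mode keeps captions from earlier runs; the usage examples that open the file with "w" start from an empty file instead.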
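Because the prompts file is line-oriented, a caption that itself contains a line break would span several lines and push later captions out of step with their videos. A minimal sketch of a guard follows. It assumes a hypothetical helper named `append_caption`, which is not part of the repository, and it assumes that collapsing whitespace is acceptable for the captions in question.

```python
def append_caption(path: str, caption: str) -> str:
    """Append `caption` to `path` as exactly one line; return the line written.

    Hypothetical helper for illustration only; it is not defined in
    tools/caption/video_caption.py.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so re-joining with single spaces yields one line.
    one_line = " ".join(caption.split())
    # Append mode ("a") keeps captions that earlier calls already wrote.
    with open(path, "a", encoding="utf-8") as f:
        f.write(one_line + "\n")
    return one_line
```

In a captioning loop, the return value of predict() would be passed as `caption`.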
Usage
from tools.caption.video_caption import predict
with open("video.mp4", "rb") as f:
    video_data = f.read()
response = predict("Please describe this video in detail.", video_data, 0.1)
with open("prompts.txt", "a") as f:
    f.write(response + "\n")
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| tools/caption/video_caption.py | L103-112 | Caption output section |
Signature
# Pattern: Save caption text to file
# The output format should match the dataset's caption_column expectations
response = predict(prompt, video_data, temperature)
# Save to file (user implements this pattern):
with open("prompts.txt", "a") as f:
    f.write(response + "\n")
Import
# Standard Python I/O - no additional imports needed
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| response | str | Required | Generated caption text from predict() |
| Output file path | str | "prompts.txt" | Path to the output caption file |
Outputs
| Output | Type | Description |
|---|---|---|
| Side effect | Text file | Caption text appended to the output file, one caption per line |
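Since the contract is one caption per line, a quick sanity check after a batch run is to compare the number of lines in the prompts file with the number of videos that were captioned. The sketch below assumes a hypothetical helper named `count_mismatch` and assumes every `.mp4` file in the directory was captioned exactly once; neither is taken from the repository.

```python
import os


def count_mismatch(video_dir: str, prompts_path: str) -> int:
    """Return (number of .mp4 files) minus (number of caption lines).

    Hypothetical check for illustration; 0 means the counts agree.
    """
    videos = [name for name in os.listdir(video_dir) if name.endswith(".mp4")]
    with open(prompts_path, encoding="utf-8") as f:
        # splitlines() drops the trailing newline, so a file with N
        # newline-terminated captions yields exactly N entries.
        captions = f.read().splitlines()
    return len(videos) - len(captions)
```

A non-zero result suggests a run was interrupted, the file was appended to twice, or a caption contained an embedded line break.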
Usage Examples
Example 1: Single video caption to file
from tools.caption.video_caption import predict
with open("my_video.mp4", "rb") as f:
    video_data = f.read()
caption = predict(
    "Please describe this video in detail.",
    video_data,
    0.1
)
with open("prompts.txt", "w") as f:
    f.write(caption + "\n")
Example 2: Batch captioning with aggregated output
import os
from tools.caption.video_caption import predict
video_dir = "/data/training_videos/"
output_file = "prompts.txt"
with open(output_file, "w") as out_f:
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()
            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            out_f.write(caption + "\n")
            print(f"Captioned: {filename}")
Example 3: Per-video caption files
import os
from tools.caption.video_caption import predict
video_dir = "/data/training_videos/"
caption_dir = "/data/captions/"
os.makedirs(caption_dir, exist_ok=True)
for filename in sorted(os.listdir(video_dir)):
    if filename.endswith(".mp4"):
        video_path = os.path.join(video_dir, filename)
        caption_path = os.path.join(
            caption_dir,
            filename.replace(".mp4", ".txt")
        )
        with open(video_path, "rb") as f:
            video_data = f.read()
        caption = predict(
            "Please describe this video in detail.",
            video_data,
            0.1
        )
        with open(caption_path, "w") as f:
            f.write(caption)
        print(f"Saved: {caption_path}")
Example 4: CSV format for dataset integration
import csv
import os
from tools.caption.video_caption import predict
video_dir = "/data/training_videos/"
with open("dataset.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["video_path", "caption"])
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4"):
            video_path = os.path.join(video_dir, filename)
            with open(video_path, "rb") as f:
                video_data = f.read()
            caption = predict(
                "Please describe this video in detail.",
                video_data,
                0.1
            )
            writer.writerow([video_path, caption])
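Captions often contain commas, quotes, or line breaks, which is one reason to write the CSV with the csv module rather than joining strings by hand. The sketch below shows the round trip with made-up rows. The helper names `write_rows` and `read_rows` are hypothetical, and the column names simply mirror Example 4.

```python
import csv


def write_rows(path, rows):
    """Write (video_path, caption) pairs with the header used in Example 4."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["video_path", "caption"])
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as a list of (video_path, caption) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["video_path"], row["caption"]) for row in csv.DictReader(f)]
```

The csv module quotes fields that contain delimiters or quote characters on write and undoes the quoting on read, so captions come back unchanged.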
Related Pages
- Principle:Zai_org_CogVideo_Caption_Output -- Principle governing caption output to files
- Environment:Zai_org_CogVideo_Video_Captioning_Environment
- Zai_org_CogVideo_CogVLM2_Predict -- Previous step: caption generation providing the text to save
- Zai_org_CogVideo_Captioning_Requirements_Install -- Environment setup for the captioning pipeline