Implementation:Huggingface Diffusers VideoProcessor
| Field | Value |
|---|---|
| Type | API Doc |
| Overview | VideoProcessor API for preprocessing input video frames and postprocessing decoded video tensors |
| Domains | Video Generation, Image Processing |
| Workflow | Video_Generation |
| Related Principle | Huggingface_Diffusers_Video_Input_Preparation |
| Source | src/diffusers/video_processor.py:L27-L176 |
| Last Updated | 2026-02-13 00:00 GMT |
Code Reference
VideoProcessor Class
Source: src/diffusers/video_processor.py:L25-L176
class VideoProcessor(VaeImageProcessor):
    """Simple video processor."""

    def preprocess_video(self, video, height: int | None = None, width: int | None = None) -> torch.Tensor:
        """Preprocesses input video(s)."""
        # Handle deprecated 5D list inputs
        if isinstance(video, list) and isinstance(video[0], np.ndarray) and video[0].ndim == 5:
            video = np.concatenate(video, axis=0)
        if isinstance(video, list) and isinstance(video[0], torch.Tensor) and video[0].ndim == 5:
            video = torch.cat(video, axis=0)

        # Normalize to list of videos
        if isinstance(video, (np.ndarray, torch.Tensor)) and video.ndim == 5:
            video = list(video)
        elif isinstance(video, list) and is_valid_image(video[0]) or is_valid_image_imagelist(video):
            video = [video]
        elif isinstance(video, list) and is_valid_image_imagelist(video[0]):
            video = video

        # Preprocess each video and stack
        video = torch.stack([self.preprocess(img, height=height, width=width) for img in video], dim=0)
        video = video.permute(0, 2, 1, 3, 4)  # (B, C, F, H, W)
        return video

    def postprocess_video(self, video: torch.Tensor, output_type: str = "np"):
        """Converts a video tensor to a list of frames for export."""
        batch_size = video.shape[0]
        outputs = []
        for batch_idx in range(batch_size):
            batch_vid = video[batch_idx].permute(1, 0, 2, 3)  # (F, C, H, W)
            batch_output = self.postprocess(batch_vid, output_type)
            outputs.append(batch_output)
        if output_type == "np":
            outputs = np.stack(outputs)
        elif output_type == "pt":
            outputs = torch.stack(outputs)
        return outputs
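The two permute calls above only reorder axes; no pixel data is changed. As a rough, dependency-free sketch of that bookkeeping (the helper permute_shape and the sizes are hypothetical and not part of diffusers), the shapes can be traced with plain tuples:

```python
def permute_shape(shape, order):
    """Return the shape that tensor.permute(*order) would produce."""
    if sorted(order) != list(range(len(shape))):
        raise ValueError("order must be a permutation of the axes")
    return tuple(shape[axis] for axis in order)


# Hypothetical sizes: 2 videos, 16 frames, 3 channels, 480x832 pixels.
B, F, C, H, W = 2, 16, 3, 480, 832

# preprocess_video: the stacked per-video frames are (B, F, C, H, W);
# permute(0, 2, 1, 3, 4) moves channels ahead of frames.
model_input = permute_shape((B, F, C, H, W), (0, 2, 1, 3, 4))

# postprocess_video: each sample is (C, F, H, W);
# permute(1, 0, 2, 3) restores a frame-major layout.
frame_major = permute_shape((C, F, H, W), (1, 0, 2, 3))

print(model_input)  # (2, 3, 16, 480, 832)
print(frame_major)  # (16, 3, 480, 832)
```

The channel-first (B, C, F, H, W) layout is what preprocess_video returns; postprocess_video undoes the swap one sample at a time before converting frames.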
resize_and_crop_tensor
Source: src/diffusers/video_processor.py:L134-L176
@staticmethod
def resize_and_crop_tensor(samples: torch.Tensor, new_width: int, new_height: int) -> torch.Tensor:
    """Resizes and crops a tensor of videos to the specified dimensions."""
    orig_height, orig_width = samples.shape[3], samples.shape[4]
    if orig_height != new_height or orig_width != new_width:
        ratio = max(new_height / orig_height, new_width / orig_width)
        resized_width = int(orig_width * ratio)
        resized_height = int(orig_height * ratio)
        n, c, t, h, w = samples.shape
        samples = samples.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
        samples = F.interpolate(samples, size=(resized_height, resized_width), mode="bilinear", align_corners=False)
        # Center crop
        start_x = (resized_width - new_width) // 2
        start_y = (resized_height - new_height) // 2
        samples = samples[:, :, start_y:start_y + new_height, start_x:start_x + new_width]
        samples = samples.reshape(n, t, c, new_height, new_width).permute(0, 2, 1, 3, 4)
    return samples
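The size arithmetic in resize_and_crop_tensor can be checked without any tensors. The following is a minimal sketch under the assumption that it mirrors the listing above; plan_resize_and_crop is a hypothetical helper, not a diffusers function, and the frame sizes are made up for illustration:

```python
def plan_resize_and_crop(orig_height, orig_width, new_height, new_width):
    """Reproduce the size arithmetic of resize_and_crop_tensor (no pixels involved)."""
    if orig_height == new_height and orig_width == new_width:
        return None  # sizes already match; the tensor would be returned unchanged

    # Use the larger ratio so the resized frame covers both target dimensions.
    ratio = max(new_height / orig_height, new_width / orig_width)
    resized_height = int(orig_height * ratio)  # int() truncates toward zero
    resized_width = int(orig_width * ratio)

    # Offsets of the centered crop window inside the resized frame.
    start_y = (resized_height - new_height) // 2
    start_x = (resized_width - new_width) // 2
    return {
        "resized": (resized_height, resized_width),
        "crop_start": (start_y, start_x),
    }


# A 480x640 frame resized to cover 720x1280.
plan = plan_resize_and_crop(480, 640, 720, 1280)
print(plan)  # {'resized': (960, 1280), 'crop_start': (120, 0)}
```

Taking the maximum of the two ratios means the frame is scaled to cover the target and the overflow is cropped away, so the aspect ratio of the content is preserved at the cost of losing the edges along one axis.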
Import
from diffusers.video_processor import VideoProcessor
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| vae_scale_factor | Spatial scale factor from VAE config; inherited from VaeImageProcessor | 8 |
| do_resize | Whether to resize input frames | True |
| do_normalize | Whether to normalize pixel values to [-1, 1] | True |
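To make the effect of these parameters concrete, here is a small scalar sketch. It assumes the inherited behavior is the usual 2x - 1 normalization, a clamped inverse, and rounding sizes down to a multiple of vae_scale_factor; the function names are hypothetical, and the exact behavior should be checked against the installed VaeImageProcessor.

```python
def normalize(value):
    """Map a pixel value from [0, 1] to [-1, 1] (assumed do_normalize behavior)."""
    return 2.0 * value - 1.0


def denormalize(value):
    """Map a value from [-1, 1] back to [0, 1], clamping anything out of range."""
    return min(max(value * 0.5 + 0.5, 0.0), 1.0)


def align_to_scale_factor(size, vae_scale_factor=8):
    """Round a height or width down to a multiple of the scale factor (assumption)."""
    return size - size % vae_scale_factor


print(normalize(0.0), normalize(1.0))  # -1.0 1.0
print(denormalize(-1.0), denormalize(3.0))  # 0.0 1.0
print(align_to_scale_factor(483))  # 480
```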
I/O Contract
preprocess_video
Inputs:
- video: One of:
  - list[PIL.Image] - Single video as list of frames
  - list[list[PIL.Image]] - Batch of videos
  - torch.Tensor (4D: F, C, H, W or 5D: B, F, C, H, W)
  - np.ndarray (4D: F, H, W, C or 5D: B, F, H, W, C)
- height (int | None): Target height
- width (int | None): Target width
Outputs:
- torch.Tensor of shape (B, C, F, H, W) with values in [-1, 1] when do_normalize is True
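The list-based input forms differ only in nesting depth. As a simplified sketch of that normalization step (frames are stood in for by strings, and both helpers are hypothetical rather than diffusers functions), a single video is wrapped into a batch of one before the output shape is determined:

```python
def as_batch_of_videos(video):
    """Return list[list[frame]] for either a single video or a batch of videos."""
    if not isinstance(video, list) or len(video) == 0:
        raise ValueError("expected a non-empty list of frames or of videos")
    if isinstance(video[0], list):
        return video  # already a batch: list[list[frame]]
    return [video]  # single video: wrap it as a batch of one


def output_shape(batch, channels, height, width):
    """Shape of the (B, C, F, H, W) tensor that stacking the batch would give."""
    frame_counts = {len(frames) for frames in batch}
    if len(frame_counts) != 1:
        raise ValueError("stacking requires every video to have the same frame count")
    return (len(batch), channels, frame_counts.pop(), height, width)


single = as_batch_of_videos(["frame0", "frame1", "frame2"])
batch = as_batch_of_videos([["a0", "a1"], ["b0", "b1"]])

print(output_shape(single, channels=3, height=480, width=832))  # (1, 3, 3, 480, 832)
print(output_shape(batch, channels=3, height=480, width=832))  # (2, 3, 2, 480, 832)
```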
postprocess_video
Inputs:
- video: torch.Tensor of shape (B, C, F, H, W)
- output_type: "np", "pt", or "pil"
Outputs:
- If "np": np.ndarray of shape (B, F, H, W, C) with values in [0, 1]
- If "pt": torch.Tensor of shape (B, F, C, H, W)
- If "pil": list[list[PIL.Image]]
Usage Examples
Preprocessing a Reference Image for Image-to-Video
from diffusers.video_processor import VideoProcessor
from PIL import Image
processor = VideoProcessor(vae_scale_factor=8)
image = Image.open("reference.png").convert("RGB")
# Wrap single image as single-frame video
video_tensor = processor.preprocess_video([image], height=480, width=832)
# Shape: (1, 3, 1, 480, 832)
Postprocessing Decoded Video for Export
# After pipeline decoding:
# decoded_video shape: (1, 3, 81, 480, 832), values in [-1, 1]
frames = processor.postprocess_video(decoded_video, output_type="np")
# frames shape: (1, 81, 480, 832, 3), values in [0, 1]
# For PIL output:
pil_frames = processor.postprocess_video(decoded_video, output_type="pil")
# pil_frames: list of list of PIL.Image
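Frames returned with output_type="np" are floats in [0, 1], while many video writers expect 8-bit values. Whether a given export helper does this conversion itself is not covered here; the sketch below only illustrates the scaling on plain Python floats, and to_uint8 is a hypothetical helper.

```python
def to_uint8(value):
    """Scale a float in [0, 1] to an integer in [0, 255], clamping first."""
    clamped = min(max(value, 0.0), 1.0)
    return int(round(clamped * 255))


# One row of three hypothetical pixel values from a postprocessed frame.
frame_row = [0.0, 0.2, 1.0]
print([to_uint8(v) for v in frame_row])  # [0, 51, 255]
```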
Resizing and Cropping Video Tensors
# Resize a (B, C, F, H, W) tensor to target dimensions
resized = VideoProcessor.resize_and_crop_tensor(video_tensor, new_width=1280, new_height=720)
Related Pages
- Huggingface_Diffusers_Video_Input_Preparation (principle for this implementation) - Theory of video preprocessing
- Huggingface_Diffusers_Video_Pipeline_From_Pretrained (creates VideoProcessor) - Pipeline initialization creates the processor
- Huggingface_Diffusers_Export_To_Video (consumes output) - Export uses postprocessed frames