Implementation:CrewAIInc CrewAI RAG YouTube Video Loader

Knowledge Sources	CrewAI
Domains	RAG, Data_Loading, Web_Scraping
Last Updated	2026-02-11 00:00 GMT

Overview

Extracts full transcripts and metadata from individual YouTube videos, supporting multiple URL formats and language fallback for transcript retrieval.

Description

YoutubeVideoLoader extends BaseLoader to process individual YouTube videos. It requires the youtube-transcript-api library (lazily imported with a clear installation message if missing).

The _extract_video_id() static method parses the video ID from a YouTube URL. It first tries regex patterns for the common URL formats (watch?v=, youtu.be/, embed/, /v/). As a fallback, it parses the URL query parameters, validating the hostname against youtube.com (including subdomains) and youtu.be.
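The two-stage parsing described above can be sketched roughly as follows. This is an illustration only, not the loader's actual code: the function name extract_video_id, the regex patterns, and the assumption that IDs are 11 characters long are all hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical patterns for illustration; the loader's real regexes may differ.
# The 11-character ID length is an assumption, not something the source states.
_ID_PATTERNS = [
    r"youtube\.com/watch\?v=([\w-]{11})",
    r"youtu\.be/([\w-]{11})",
    r"youtube\.com/embed/([\w-]{11})",
    r"youtube\.com/v/([\w-]{11})",
]


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID found in a YouTube URL, or None if there is none."""
    # First pass: regex patterns for the common URL shapes.
    for pattern in _ID_PATTERNS:
        match = re.search(pattern, url)
        if match:
            return match.group(1)

    # Fallback: read the "v" query parameter, but only for YouTube hostnames.
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    is_youtube = (
        host == "youtube.com"
        or host.endswith(".youtube.com")
        or host == "youtu.be"
    )
    if is_youtube:
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]
    return None
```

The fallback matters for URLs where v is not the first query parameter, such as watch?feature=share&v=..., which the simple watch?v= pattern in this sketch does not match.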

Transcript extraction uses YouTubeTranscriptApi with a language preference fallback strategy:

1. Manual English transcript (find_transcript(["en"]))
2. Auto-generated English transcript (find_generated_transcript(["en"]))
3. Any available transcript (first in the list)
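A minimal sketch of this preference order, written against any object that exposes find_transcript, find_generated_transcript, and iteration (the shape of the transcript list named above). The helper name pick_transcript and the broad exception handling are assumptions for illustration; the loader may catch more specific exceptions.

```python
from typing import Any


def pick_transcript(transcript_list: Any) -> Any:
    """Choose a transcript using the three-step preference order."""
    # Step 1: try find_transcript for English.
    try:
        return transcript_list.find_transcript(["en"])
    except Exception:  # assumption: any lookup failure moves to the next step
        pass

    # Step 2: try the auto-generated English transcript.
    try:
        return transcript_list.find_generated_transcript(["en"])
    except Exception:
        pass

    # Step 3: fall back to the first transcript in the list, if any.
    for transcript in transcript_list:
        return transcript

    raise LookupError("no transcripts available for this video")
```

Because the helper is duck-typed, it can be exercised with a small stand-in object in tests, without network access or the youtube-transcript-api package installed.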

The transcript text is extracted from individual entries (each having a text attribute), stripped of whitespace, and joined with spaces into a continuous text. If the pytube library is available, the loader enriches the output with video title, author, length_seconds, and a description preview (500 characters), prepending these as a header to the transcript content.
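The joining and header steps might look like the sketch below. The helper names join_transcript and build_content are hypothetical, the header layout is inferred from the usage example on this page, and skipping empty entries is an assumption rather than confirmed loader behavior.

```python
from typing import Any, Iterable, Optional


def join_transcript(entries: Iterable[Any]) -> str:
    """Strip each entry's text and join the pieces with single spaces."""
    # Assumption: entries that are empty after stripping are skipped.
    pieces = (entry.text.strip() for entry in entries)
    return " ".join(piece for piece in pieces if piece)


def build_content(
    transcript_text: str,
    title: Optional[str] = None,
    author: Optional[str] = None,
) -> str:
    """Prepend a title/author header when video details are available."""
    header = []
    if title:
        header.append(f"Title: {title}")
    if author:
        header.append(f"Author: {author}")
    if not header:
        # No enrichment available: return the bare transcript.
        return transcript_text
    header.append(f"Transcript:\n{transcript_text}")
    return "\n\n".join(header)


def description_preview(description: str, limit: int = 500) -> str:
    """Keep only the first `limit` characters of the video description."""
    return description[:limit]
```

Keeping the header optional means the transcript is still returned unchanged when pytube is not installed.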

Metadata tracks the source URL, video_id, data_type, language, and whether the transcript is auto-generated.

Usage

Import YoutubeVideoLoader when you need to extract transcripts from YouTube videos. It is typically instantiated automatically by the DataType.YOUTUBE_VIDEO registry when YouTube video URLs are detected.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py
Lines: 1-134

Signature

class YoutubeVideoLoader(BaseLoader):
    def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader

I/O Contract

Inputs

Name	Type	Required	Description
source	SourceContent	Yes	Wraps a YouTube video URL (supports watch, youtu.be, embed, and /v/ formats)
**kwargs	Any	No	Additional keyword arguments (unused)

Outputs

Name	Type	Description
return	LoaderResult	Contains full transcript text (optionally with title/author header); metadata includes source URL, video_id, data_type, language, is_generated, and optionally title, author, length_seconds, description

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader
from crewai_tools.rag.source_content import SourceContent

loader = YoutubeVideoLoader()

source = SourceContent("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
result = loader.load(source)

print(result.content)
# Title: Video Title
#
# Author: Channel Name
#
# Transcript:
# Full transcript text of the video joined as continuous text...

print(result.metadata)
# {'source': 'https://...', 'video_id': 'dQw4w9WgXcQ', 'data_type': 'youtube_video',
#  'language': 'en', 'is_generated': False, 'title': 'Video Title', ...}

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion
