Implementation:CrewAIInc CrewAI RAG YouTube Video Loader
| Knowledge Sources | |
|---|---|
| Domains | RAG, Data_Loading, Web_Scraping |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Extracts full transcripts and metadata from individual YouTube videos, supporting multiple URL formats and language fallback for transcript retrieval.
Description
YoutubeVideoLoader extends BaseLoader to process individual YouTube videos. It requires the youtube-transcript-api library (lazily imported with a clear installation message if missing).
The _extract_video_id() static method parses video IDs from various YouTube URL formats using regex patterns (watch?v=, youtu.be/, embed/, /v/) and also performs URL query parameter parsing as a fallback, validating the hostname against youtube.com (including subdomains) and youtu.be.
Transcript extraction uses YouTubeTranscriptApi with a language preference fallback strategy: 1. Manual English transcript (find_transcript(["en"])) 2. Auto-generated English transcript (find_generated_transcript(["en"])) 3. Any available transcript (first in the list)
The transcript text is extracted from individual entries (each having a text attribute), stripped of whitespace, and joined with spaces into a continuous text. If the pytube library is available, the loader enriches the output with video title, author, length_seconds, and a description preview (500 characters), prepending these as a header to the transcript content.
Metadata tracks the source URL, video_id, data_type, language, and whether the transcript is auto-generated.
Usage
Import YoutubeVideoLoader when you need to extract transcripts from YouTube videos. It is typically instantiated automatically by the DataType.YOUTUBE_VIDEO registry when YouTube video URLs are detected.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py
- Lines: 1-134
Signature
class YoutubeVideoLoader(BaseLoader):
def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...
Import
from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | SourceContent | Yes | Wraps a YouTube video URL (supports watch, youtu.be, embed, and /v/ formats) |
| **kwargs | Any | No | Additional keyword arguments (unused) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | LoaderResult | Contains full transcript text (optionally with title/author header); metadata includes source URL, video_id, data_type, language, is_generated, and optionally title, author, length_seconds, description |
Usage Examples
Basic Usage
from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader
from crewai_tools.rag.source_content import SourceContent
loader = YoutubeVideoLoader()
source = SourceContent("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
result = loader.load(source)
print(result.content)
# Title: Video Title
#
# Author: Channel Name
#
# Transcript:
# Full transcript text of the video joined as continuous text...
print(result.metadata)
# {'source': 'https://...', 'video_id': 'dQw4w9WgXcQ', 'data_type': 'youtube_video',
# 'language': 'en', 'is_generated': False, 'title': 'Video Title', ...}