Implementation:CrewAIInc CrewAI RAG YouTube Video Loader

From Leeroopedia
Knowledge Sources
Domains RAG, Data_Loading, Web_Scraping
Last Updated 2026-02-11 00:00 GMT

Overview

Extracts full transcripts and metadata from individual YouTube videos, supporting multiple URL formats and language fallback for transcript retrieval.

Description

YoutubeVideoLoader extends BaseLoader to process individual YouTube videos. It requires the youtube-transcript-api library (lazily imported with a clear installation message if missing).
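The lazy-import behavior described above can be illustrated with a minimal sketch. The helper name lazy_import, its parameters, and the error wording are assumptions for illustration; the loader's actual code and message may differ.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, pip_name: str) -> ModuleType:
    """Import a module only when it is first needed.

    Hypothetical helper: if the module is missing, re-raise ImportError
    with an installation hint instead of a bare traceback.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this loader. "
            f"Install it with: pip install {pip_name}"
        ) from exc
```

Deferring the import to call time means the package only needs to be installed by users who actually load YouTube videos.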

The _extract_video_id() static method parses the video ID from several YouTube URL formats using regex patterns (watch?v=, youtu.be/, embed/, /v/). If no pattern matches, it falls back to parsing the URL's query parameters, after validating that the hostname is youtube.com (including subdomains) or youtu.be.
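A rough sketch of this two-stage approach is shown below. The function name extract_video_id, the single combined regex, and the assumption that IDs are 11 characters long are illustrative choices, not the loader's actual patterns.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed pattern covering the four URL shapes named above;
# the real loader may use different or separate regexes.
_ID_PATTERN = re.compile(
    r"(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/embed/|youtube\.com/v/)"
    r"([A-Za-z0-9_-]{11})"
)


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID, or None if the URL is not recognized."""
    match = _ID_PATTERN.search(url)
    if match:
        return match.group(1)

    # Fallback: read the "v" query parameter, but only for YouTube hosts.
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    is_youtube = (
        host == "youtube.com"
        or host.endswith(".youtube.com")
        or host == "youtu.be"
    )
    if is_youtube:
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]
    return None
```

The query-parameter fallback handles URLs where v is not the first parameter, for example https://www.youtube.com/watch?feature=share&v=dQw4w9WgXcQ.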

Transcript extraction uses YouTubeTranscriptApi with a language preference fallback strategy:

  1. Manual English transcript (find_transcript(["en"]))
  2. Auto-generated English transcript (find_generated_transcript(["en"]))
  3. Any available transcript (first in the list)
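The fallback order can be sketched as follows. This is an illustration under assumptions: pick_transcript is a hypothetical helper, it accepts any object exposing find_transcript, find_generated_transcript, and iteration (as the library's transcript list does), and it catches a broad Exception where the loader may catch a narrower error. FakeTranscriptList is an offline stand-in so the sketch runs without network access.

```python
from typing import Any, List, Optional


def pick_transcript(transcript_list: Any) -> Optional[Any]:
    """Apply the three-step fallback and return the chosen transcript."""
    # Step 1: English transcript via find_transcript.
    try:
        return transcript_list.find_transcript(["en"])
    except Exception:  # assumption: a lookup failure raises
        pass
    # Step 2: auto-generated English transcript.
    try:
        return transcript_list.find_generated_transcript(["en"])
    except Exception:
        pass
    # Step 3: first transcript in the list, or None if the list is empty.
    return next(iter(transcript_list), None)


class FakeTranscriptList:
    """Hypothetical stand-in mimicking the three calls used above."""

    def __init__(self, manual: List[str], generated: List[str]):
        self.manual = manual
        self.generated = generated

    def find_transcript(self, languages):
        for code in languages:
            if code in self.manual:
                return ("manual", code)
        raise LookupError("no matching manual transcript")

    def find_generated_transcript(self, languages):
        for code in languages:
            if code in self.generated:
                return ("generated", code)
        raise LookupError("no matching generated transcript")

    def __iter__(self):
        items = [("manual", c) for c in self.manual]
        items += [("generated", c) for c in self.generated]
        return iter(items)
```

With the stand-in, a video that has only a German manual transcript and an English auto-generated one resolves to the auto-generated English transcript at step 2.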

The transcript text is built from the individual entries (each with a text attribute): each entry is stripped of surrounding whitespace and the results are joined with spaces into one continuous string. If the pytube library is available, the loader enriches the output with the video title, author, length_seconds, and a description preview (500 characters), prepending these as a header to the transcript content.
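A minimal sketch of the assembly step is given below. The Entry dataclass, the build_content function, the exact header layout, and the choice to skip empty entries are all assumptions for illustration; only the strip-and-join behavior and the idea of a prepended header come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a transcript entry with a text attribute."""
    text: str


def build_content(
    entries: Iterable[Entry],
    title: Optional[str] = None,
    author: Optional[str] = None,
) -> str:
    # Strip each entry and join with single spaces.
    # Skipping empty entries is an assumption, not confirmed by the source.
    parts = [e.text.strip() for e in entries]
    transcript = " ".join(p for p in parts if p)

    # Header fields are only available when video details could be fetched.
    header = []
    if title:
        header.append(f"Title: {title}")
    if author:
        header.append(f"Author: {author}")

    if header:
        return "\n\n".join(header + [f"Transcript:\n{transcript}"])
    return transcript
```

When no title or author is supplied, the function returns the bare transcript, which mirrors the case where pytube is not installed.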

Metadata tracks the source URL, video_id, data_type, language, and whether the transcript is auto-generated.

Usage

Import YoutubeVideoLoader when you need to extract transcripts from YouTube videos. It is typically instantiated automatically by the DataType.YOUTUBE_VIDEO registry when YouTube video URLs are detected.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/loaders/youtube_video_loader.py
  • Lines: 1-134

Signature

class YoutubeVideoLoader(BaseLoader):
    def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader

I/O Contract

Inputs

Name | Type | Required | Description
source | SourceContent | Yes | Wraps a YouTube video URL (supports watch, youtu.be, embed, and /v/ formats)
**kwargs | Any | No | Additional keyword arguments (unused)

Outputs

Name | Type | Description
return | LoaderResult | Contains the full transcript text (optionally with a title/author header); metadata includes source URL, video_id, data_type, language, is_generated, and optionally title, author, length_seconds, description

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.youtube_video_loader import YoutubeVideoLoader
from crewai_tools.rag.source_content import SourceContent

loader = YoutubeVideoLoader()

source = SourceContent("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
result = loader.load(source)

print(result.content)
# Title: Video Title
#
# Author: Channel Name
#
# Transcript:
# Full transcript text of the video joined as continuous text...

print(result.metadata)
# {'source': 'https://...', 'video_id': 'dQw4w9WgXcQ', 'data_type': 'youtube_video',
#  'language': 'en', 'is_generated': False, 'title': 'Video Title', ...}
