Implementation:CrewAIInc CrewAI RAG GitHub Loader

Knowledge Sources	CrewAI
Domains	RAG, Data_Loading
Last Updated	2026-02-11 00:00 GMT

Overview

Extracts various types of content from GitHub repositories including repository metadata, README files, repository structure, pull requests, and issues.

Description

GithubLoader extends BaseLoader to interact with the GitHub API via the PyGithub library. It parses GitHub repository URLs to extract the owner/repo identifier and supports both authenticated (via gh_token) and unauthenticated access.

Content extraction is controlled by the content_types parameter (default: ["code", "repo"]), which selects which categories of information to include:

"repo": Repository metadata including full name, description, language, star count, and fork count.
"code": The repository README (decoded from base64) and the top-level repository structure (up to 20 items with file/directory type indicators).
"pr": The 5 most recent open pull requests with titles and body previews (200 characters).
"issue": The 5 most recent open issues (excluding pull requests) with titles and body previews.

All content is formatted as structured text and combined into a single LoaderResult with metadata including the source URL, repository name, and selected content types.

Usage

Import GithubLoader when you need to explicitly load GitHub repository content. It is typically instantiated automatically by the DataType.GITHUB registry when GitHub URLs are detected. Pass gh_token and content_types via the metadata kwarg.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/loaders/github_loader.py
Lines: 1-110

Signature

class GithubLoader(BaseLoader):
    def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.github_loader import GithubLoader

I/O Contract

Inputs

Name	Type	Required	Description
source	SourceContent	Yes	Wraps a GitHub repository URL (https://github.com/owner/repo)
metadata (via kwargs)	dict	No	May contain gh_token (str) for authentication and content_types (list[str]) to select content categories

Outputs

Name	Type	Description
return	LoaderResult	Formatted text content with repository information; metadata includes source URL, repo name, and content_types

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.github_loader import GithubLoader
from crewai_tools.rag.source_content import SourceContent

loader = GithubLoader()

# Load repository metadata and code
source = SourceContent("https://github.com/crewAIInc/crewAI")
result = loader.load(source, metadata={
    "gh_token": "ghp_your_token_here",
    "content_types": ["repo", "code", "issue"],
})

print(result.content)
# Repository: crewAIInc/crewAI
# Description: Framework for orchestrating role-playing, autonomous AI agents
# Language: Python
# Stars: 12345
# ...

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment