Implementation:CrewAIInc CrewAI RAG GitHub Loader
| Knowledge Sources | |
|---|---|
| Domains | RAG, Data_Loading |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Extracts various types of content from GitHub repositories including repository metadata, README files, repository structure, pull requests, and issues.
Description
GithubLoader extends BaseLoader to interact with the GitHub API via the PyGithub library. It parses GitHub repository URLs to extract the owner/repo identifier and supports both authenticated (via gh_token) and unauthenticated access.
Content extraction is controlled by the content_types parameter (default: ["code", "repo"]), which selects which categories of information to include:
- "repo": Repository metadata including full name, description, language, star count, and fork count.
- "code": The repository README (decoded from base64) and the top-level repository structure (up to 20 items with file/directory type indicators).
- "pr": The 5 most recent open pull requests with titles and body previews (200 characters).
- "issue": The 5 most recent open issues (excluding pull requests) with titles and body previews.
All content is formatted as structured text and combined into a single LoaderResult with metadata including the source URL, repository name, and selected content types.
Usage
Import GithubLoader when you need to explicitly load GitHub repository content. It is typically instantiated automatically by the DataType.GITHUB registry when GitHub URLs are detected. Pass gh_token and content_types via the metadata kwarg.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/rag/loaders/github_loader.py
- Lines: 1-110
Signature
class GithubLoader(BaseLoader):
def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...
Import
from crewai_tools.rag.loaders.github_loader import GithubLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | SourceContent | Yes | Wraps a GitHub repository URL (https://github.com/owner/repo) |
| metadata (via kwargs) | dict | No | May contain gh_token (str) for authentication and content_types (list[str]) to select content categories |
Outputs
| Name | Type | Description |
|---|---|---|
| return | LoaderResult | Formatted text content with repository information; metadata includes source URL, repo name, and content_types |
Usage Examples
Basic Usage
from crewai_tools.rag.loaders.github_loader import GithubLoader
from crewai_tools.rag.source_content import SourceContent
loader = GithubLoader()
# Load repository metadata and code
source = SourceContent("https://github.com/crewAIInc/crewAI")
result = loader.load(source, metadata={
"gh_token": "ghp_your_token_here",
"content_types": ["repo", "code", "issue"],
})
print(result.content)
# Repository: crewAIInc/crewAI
# Description: Framework for orchestrating role-playing, autonomous AI agents
# Language: Python
# Stars: 12345
# ...