Implementation:CrewAIInc CrewAI RAG GitHub Loader

From Leeroopedia
Knowledge Sources
Domains: RAG, Data_Loading
Last Updated: 2026-02-11 00:00 GMT

Overview

GithubLoader extracts content from a GitHub repository: repository metadata, the README, the top-level repository structure, pull requests, and issues.

Description

GithubLoader extends BaseLoader to interact with the GitHub API via the PyGithub library. It parses GitHub repository URLs to extract the owner/repo identifier and supports both authenticated (via gh_token) and unauthenticated access.
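The URL-parsing step can be illustrated with a minimal sketch. The helper name parse_repo_identifier and its error handling are assumptions made for illustration; the loader's own parsing code is not shown on this page and may differ.

```python
from urllib.parse import urlparse


def parse_repo_identifier(url: str) -> str:
    """Return 'owner/repo' for a GitHub repository URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"Not a GitHub URL: {url}")
    # Keep only non-empty path segments, e.g. ['crewAIInc', 'crewAI'].
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a repository: {url}")
    owner, repo = parts[0], parts[1]
    # Tolerate clone-style URLs that end in '.git'.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


print(parse_repo_identifier("https://github.com/crewAIInc/crewAI"))
# prints: crewAIInc/crewAI
```

An identifier in this "owner/repo" form is what PyGithub's get_repo method accepts as a full repository name.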

Content extraction is controlled by the content_types parameter (default: ["code", "repo"]), which determines the categories of information to include:

  • "repo": Repository metadata including full name, description, language, star count, and fork count.
  • "code": The repository README (decoded from base64) and the top-level repository structure (up to 20 items with file/directory type indicators).
  • "pr": The 5 most recent open pull requests with titles and body previews (200 characters).
  • "issue": The 5 most recent open issues (excluding pull requests) with titles and body previews.
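The limits in the list above can be sketched with plain Python data. This is an illustration under stated assumptions, not the loader's code: the helper names, the dict shapes, and the "[dir]"/"[file]" markers are hypothetical. The "pull_request" key mirrors the GitHub REST API, where issue listings include pull requests and mark them with that field.

```python
import base64
from typing import Optional

MAX_STRUCTURE_ITEMS = 20  # top-level entries listed under "code"
MAX_LISTED = 5            # pull requests or issues listed
PREVIEW_CHARS = 200       # length of a body preview


def decode_readme(encoded: str) -> str:
    """Decode a base64 README payload into text."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


def format_structure(entries: list) -> list:
    """List up to 20 top-level entries with a file/directory indicator."""
    lines = []
    for entry in entries[:MAX_STRUCTURE_ITEMS]:
        marker = "[dir]" if entry["type"] == "dir" else "[file]"
        lines.append(f"{marker} {entry['path']}")
    return lines


def preview(body: Optional[str]) -> str:
    """Return at most 200 characters of a body; a missing body becomes ''."""
    return (body or "")[:PREVIEW_CHARS]


def format_issues(items: list) -> list:
    """Drop entries that are pull requests, then keep the first five."""
    only_issues = [item for item in items if not item.get("pull_request")]
    return [
        f"- {item['title']}: {preview(item.get('body'))}"
        for item in only_issues[:MAX_LISTED]
    ]
```

Filtering before slicing matters here: slicing first could return fewer than five issues whenever pull requests appear among the first five entries.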

All content is formatted as structured text and combined into a single LoaderResult with metadata including the source URL, repository name, and selected content types.
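As a rough sketch of this final step, the function below joins formatted sections into one text and attaches metadata. It returns a plain dict as a stand-in for LoaderResult, and the metadata key names "source" and "repo" are assumptions; the real class and keys may differ.

```python
def build_result(source_url: str, repo_name: str, content_types: list, sections: list) -> dict:
    """Join non-empty sections and attach metadata (stand-in for LoaderResult)."""
    content = "\n\n".join(section for section in sections if section)
    return {
        "content": content,
        "metadata": {
            "source": source_url,            # assumed key name
            "repo": repo_name,               # assumed key name
            "content_types": content_types,
        },
    }


result = build_result(
    "https://github.com/owner/repo",
    "owner/repo",
    ["repo", "code"],
    ["Repository: owner/repo", "", "README:\n# Title"],
)
print(result["metadata"]["content_types"])
# prints: ['repo', 'code']
```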

Usage

Import GithubLoader when you need to load GitHub repository content explicitly. It is typically instantiated automatically by the DataType.GITHUB registry when a GitHub URL is detected. Pass gh_token and content_types inside the metadata keyword argument of load().
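One way the options might be read from the metadata keyword argument is sketched below. The helper read_options is hypothetical and only illustrates the documented default of ["code", "repo"] and the optional token; the loader's internal handling is not shown on this page.

```python
DEFAULT_CONTENT_TYPES = ["code", "repo"]


def read_options(**kwargs) -> tuple:
    """Extract (gh_token, content_types) from a 'metadata' kwarg (hypothetical helper)."""
    metadata = kwargs.get("metadata") or {}
    # A missing token means unauthenticated access.
    gh_token = metadata.get("gh_token")
    # Copy the list so callers cannot mutate the shared default.
    content_types = list(metadata.get("content_types", DEFAULT_CONTENT_TYPES))
    return gh_token, content_types


print(read_options())
# prints: (None, ['code', 'repo'])
```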

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/loaders/github_loader.py
  • Lines: 1-110

Signature

class GithubLoader(BaseLoader):
    def load(self, source: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.github_loader import GithubLoader

I/O Contract

Inputs

Name | Type | Required | Description
source | SourceContent | Yes | Wraps a GitHub repository URL (https://github.com/owner/repo)
metadata (via kwargs) | dict | No | May contain gh_token (str) for authentication and content_types (list[str]) to select content categories

Outputs

Name | Type | Description
return | LoaderResult | Formatted text content with repository information; metadata includes source URL, repo name, and content_types

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.github_loader import GithubLoader
from crewai_tools.rag.source_content import SourceContent

loader = GithubLoader()

# Load repository metadata, the README and top-level structure, and open issues
source = SourceContent("https://github.com/crewAIInc/crewAI")
result = loader.load(source, metadata={
    # Placeholder token; omit gh_token for unauthenticated access
    "gh_token": "ghp_your_token_here",
    "content_types": ["repo", "code", "issue"],
})

print(result.content)
# Illustrative output (actual values will differ):
# Repository: crewAIInc/crewAI
# Description: Framework for orchestrating role-playing, autonomous AI agents
# Language: Python
# Stars: 12345
# ...
