Implementation:Mage ai Mage ai Google Sheets Source
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, Google_Sheets, API |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Source connector implementation for extracting data from Google Sheets spreadsheets within the Mage data integration framework.
Description
The GoogleSheets class extends the Mage Source base class to provide full-table replication from Google Sheets worksheets. It implements the three core source connector lifecycle methods: discover (introspects spreadsheet metadata to build a Singer catalog with one stream per worksheet, inferring JSON schemas from header rows and cell values), load_data (extracts worksheet data in configurable batches of up to 10,000 rows, fetching both formatted and unformatted value representations for type-accurate transformation), and test_connection (validates credentials by connecting to the Google Sheets API). Each worksheet becomes a stream with __google_sheet_row as the key property and FULL_TABLE replication method.
Usage
Instantiated by the Mage integration framework with a config containing path_to_credentials_json_file, spreadsheet_id, and optionally selected_sheet_names to filter which worksheets to sync.
Code Reference
Source Location
- Repository: mage-ai
- File: mage_integrations/mage_integrations/sources/google_sheets/__init__.py
- Lines: 1-460
Signature
class GoogleSheets(Source):
def __init__(self, **kwargs):
...
def discover(self, streams: List[str] = None) -> Catalog:
...
def load_data(self, stream, bookmarks: Dict = None, query: Dict = None, **kwargs) -> Generator[List[Dict], None, None]:
...
def test_connection(self) -> None:
...
Import
from mage_integrations.sources.google_sheets import GoogleSheets
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config.path_to_credentials_json_file | str | Yes | Path to Google service account credentials JSON file |
| config.spreadsheet_id | str | Yes | Google Sheets spreadsheet ID from the URL |
| config.selected_sheet_names | list[str] | No | Optional list of worksheet names to sync; syncs all if omitted |
Outputs
| Name | Type | Description |
|---|---|---|
| catalog | Catalog | Singer catalog with one CatalogEntry per worksheet, including inferred JSON schema |
| records | Generator[List[Dict]] | Batches of transformed row dictionaries with column names as keys |
Key Behaviors
Discovery
- Fetches spreadsheet metadata via the Google Sheets API to enumerate worksheets.
- For each worksheet, reads header rows and first data rows to infer column types and build a JSON schema.
- Creates a CatalogEntry with FULL_TABLE replication method, __google_sheet_row as key property, and UNIQUE_CONFLICT_METHOD_UPDATE for conflict resolution.
Data Loading
- Paginates through worksheet rows in batches of 10,000 rows using A1 notation ranges (e.g., A2:Z10001).
- Fetches both FORMATTED_VALUE and UNFORMATTED_VALUE representations for each batch to enable type-aware transformation.
- Delegates row transformation to internal_transform.transform_sheet_data, which converts raw cell values to typed dictionaries.
- Yields each batch as a list of dictionaries, stopping when a completely blank page is encountered.
Connection Testing
Calls self.connection.connect() to verify that the provided credentials are valid and can reach the Google Sheets API.
Usage Examples
from mage_integrations.sources.google_sheets import GoogleSheets
config = {
"path_to_credentials_json_file": "/path/to/service_account.json",
"spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
"selected_sheet_names": ["Sheet1", "Revenue"],
}
source = GoogleSheets(config=config)
# Discover available streams
catalog = source.discover()
# Load data from a stream
for batch in source.load_data(catalog.streams[0]):
for record in batch:
print(record)