Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Mage ai Mage ai Google Sheets Source

From Leeroopedia
Revision as of 15:36, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Mage_ai_Mage_ai_Google_Sheets_Source.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Data_Integration, Google_Sheets, API
Last Updated 2026-02-09 00:00 GMT

Overview

Source connector implementation for extracting data from Google Sheets spreadsheets within the Mage data integration framework.

Description

The GoogleSheets class extends the Mage Source base class to provide full-table replication from Google Sheets worksheets. It implements the three core source connector lifecycle methods: discover (introspects spreadsheet metadata to build a Singer catalog with one stream per worksheet, inferring JSON schemas from header rows and cell values), load_data (extracts worksheet data in configurable batches of up to 10,000 rows, fetching both formatted and unformatted value representations for type-accurate transformation), and test_connection (validates credentials by connecting to the Google Sheets API). Each worksheet becomes a stream with __google_sheet_row as the key property and FULL_TABLE replication method.

Usage

Instantiated by the Mage integration framework with a config containing path_to_credentials_json_file, spreadsheet_id, and optionally selected_sheet_names to filter which worksheets to sync.

Code Reference

Source Location

  • Repository: mage-ai
  • File: mage_integrations/mage_integrations/sources/google_sheets/__init__.py
  • Lines: 1-460

Signature

class GoogleSheets(Source):
    def __init__(self, **kwargs):
        ...
    def discover(self, streams: List[str] = None) -> Catalog:
        ...
    def load_data(self, stream, bookmarks: Dict = None, query: Dict = None, **kwargs) -> Generator[List[Dict], None, None]:
        ...
    def test_connection(self) -> None:
        ...

Import

from mage_integrations.sources.google_sheets import GoogleSheets

I/O Contract

Inputs

Name Type Required Description
config.path_to_credentials_json_file str Yes Path to Google service account credentials JSON file
config.spreadsheet_id str Yes Google Sheets spreadsheet ID from the URL
config.selected_sheet_names list[str] No Optional list of worksheet names to sync; syncs all if omitted

Outputs

Name Type Description
catalog Catalog Singer catalog with one CatalogEntry per worksheet, including inferred JSON schema
records Generator[List[Dict]] Batches of transformed row dictionaries with column names as keys

Key Behaviors

Discovery

  • Fetches spreadsheet metadata via the Google Sheets API to enumerate worksheets.
  • For each worksheet, reads header rows and first data rows to infer column types and build a JSON schema.
  • Creates a CatalogEntry with FULL_TABLE replication method, __google_sheet_row as key property, and UNIQUE_CONFLICT_METHOD_UPDATE for conflict resolution.

Data Loading

  • Paginates through worksheet rows in batches of 10,000 rows using A1 notation ranges (e.g., A2:Z10001).
  • Fetches both FORMATTED_VALUE and UNFORMATTED_VALUE representations for each batch to enable type-aware transformation.
  • Delegates row transformation to internal_transform.transform_sheet_data, which converts raw cell values to typed dictionaries.
  • Yields each batch as a list of dictionaries, stopping when a completely blank page is encountered.

Connection Testing

Calls self.connection.connect() to verify that the provided credentials are valid and can reach the Google Sheets API.

Usage Examples

from mage_integrations.sources.google_sheets import GoogleSheets

config = {
    "path_to_credentials_json_file": "/path/to/service_account.json",
    "spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
    "selected_sheet_names": ["Sheet1", "Revenue"],
}
source = GoogleSheets(config=config)

# Discover available streams
catalog = source.discover()

# Load data from a stream
for batch in source.load_data(catalog.streams[0]):
    for record in batch:
        print(record)

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment