Google Docs & Sheets Fetcher

Internal module — invoked by /bedrock:learn Phase 1 and /bedrock:sync Phase 2, not user-invocable.

Fetches a Google Docs document or Google Sheets spreadsheet and converts it to a local Markdown file. Supports both document types with automatic detection. Three layers in fallback order: MCP (preferred) → API / Public Export → Browser DOM extraction.

Dependency: Browser fallback (Layer 3) requires scripts/extract.js (relative to this skill directory).

Step 1 — Parse URL and Detect Type

Parse the URL. Accept these formats:

Google Docs:

https://docs.google.com/document/d/{docId}/edit
https://docs.google.com/document/d/{docId}/edit#heading=...
https://docs.google.com/document/d/{docId}/edit?tab=t.0
https://docs.google.com/document/d/{docId}
Raw document ID (no URL) — treat as Doc by default

Google Sheets:

https://docs.google.com/spreadsheets/d/{docId}/edit
https://docs.google.com/spreadsheets/d/{docId}/edit#gid=0
https://docs.google.com/spreadsheets/d/{docId}
Raw spreadsheet ID — only if the user explicitly mentions "sheet" or "spreadsheet"

Detect type:

URL contains /spreadsheets/d/ → Sheet
URL contains /document/d/ → Doc
Raw ID with no URL → default to Doc unless user context indicates Sheet

Extract the {docId} — the string between /d/ and the next / or end of path.

Step 2 — Layer 1: MCP (Google Docs)

The preferred layer. Checks if a Google Docs/Drive MCP server is installed and authenticated.

2.1 Check MCP availability

Use ToolSearch to check if any Google Docs or Google Drive MCP tools are available:

ToolSearch(query: "google docs drive document", max_results: 5)

Evaluate the result:

Google Docs/Drive MCP tools found and functional → proceed to 2.2 Fetch via MCP
MCP tools found but not authenticated → proceed to 2.3 Guide authentication
No Google Docs MCP tools found (this is the expected case today) → log and fall through:

MCP not available: No Google Docs/Drive MCP server installed. When a Google Docs MCP becomes available, install it for direct document access. Falling back to API (Layer 2).

2.2 Fetch via MCP

Use the Google Docs MCP tools to fetch the document content. The specific tool depends on what the MCP exposes.

Success → convert content to Markdown if not already, proceed to Output Contract
Error → log the error and fall through to Layer 2:

MCP fetch failed: {error message}. Falling back to API (Layer 2).

2.3 Guide authentication

If an MCP is installed but not authenticated, guide the user:

MCP not authenticated: A Google Docs MCP server is installed but requires authentication. Complete the authentication flow as prompted by the MCP server. After authentication, Google documents can be fetched directly via MCP.

Ask the user: "Would you like to authenticate the Google Docs MCP now, or skip to API fallback (Layer 2)?"

User wants to authenticate → invoke the MCP authentication tool, wait for the user to complete the flow, then retry 2.2 Fetch via MCP
User declines → log "User declined MCP authentication, falling to Layer 2" → continue to Step 3

Step 3 — Layer 2: API

Uses the Google Drive/Sheets API with a bearer token or public URL export.

Strategy selection

If GOOGLE_ACCESS_TOKEN env var exists → use Strategy A (API with token)
If GOOGLE_ACCESS_TOKEN is NOT set → use Strategy B (Public Export)
If Strategy B fails (private document) → guide and fall through to Layer 3:

API not available: This document requires authentication and no GOOGLE_ACCESS_TOKEN is set. Generate a token at https://developers.google.com/oauthplayground/ with the https://www.googleapis.com/auth/drive.readonly scope. Export: export GOOGLE_ACCESS_TOKEN="your-token". Falling back to Browser extraction (Layer 3).

Inform the caller which strategy is being used and whether the document is a Doc or Sheet.

Google Docs — Strategy A (API with token)

A.1 Fetch as Markdown

Use WebFetch:

WebFetch(
  url: "https://www.googleapis.com/drive/v3/files/{docId}/export?mimeType=text/markdown",
  headers: { "Authorization": "Bearer {GOOGLE_ACCESS_TOKEN}" },
  prompt: "Return the COMPLETE raw content exactly as-is. Do not summarize or truncate."
)

If WebFetch cannot send the Authorization header, fall back to Bash:

curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
  "https://www.googleapis.com/drive/v3/files/{docId}/export?mimeType=text/markdown"

A.2 Validate

Valid Markdown content → proceed to Output Contract
401 → guide and fall through to Layer 3:

API authentication failed: Google API returned 401. The token may be expired or invalid. Refresh your token at https://developers.google.com/oauthplayground/. Falling back to Browser extraction (Layer 3).

403 → abort (permissions issue, no fallback can bypass):

API access denied: Google API returned 403. You do not have access to this document. Verify document sharing permissions in Google Docs.

404 → abort:

Document not found: Google API returned 404. The document ID may be incorrect. Verify the URL or document ID.

Empty response → guide and fall through to Layer 3:

API returned empty: The document appears to be empty or the export failed. Falling back to Browser extraction (Layer 3).

Do not post-process the Markdown — return Google's native output as-is.

Google Docs — Strategy B (Public Export)

B.1 Fetch via public endpoint

curl -sL "https://docs.google.com/document/d/{docId}/export?format=md"

The -L flag follows the 307 redirect to *.googleusercontent.com.

B.2 Validate

Valid Markdown content → proceed to Output Contract
HTML error page or Google login page → document is private, fall through to Layer 3:

Public export not available: Document is private and requires authentication. Set GOOGLE_ACCESS_TOKEN for API access, or ensure you are logged into Google in Chrome. Falling back to Browser extraction (Layer 3).

Empty response → fall through to Layer 3:

Public export returned empty: The document appears to be empty or inaccessible. Falling back to Browser extraction (Layer 3).

Google Sheets — Strategy A (API with token)

A.1 List all sheet tabs

curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
  "https://sheets.googleapis.com/v4/spreadsheets/{docId}?fields=sheets.properties"

Returns JSON with sheets[].properties.title (sheet name) and sheets[].properties.sheetId (gid).

A.2 Export each tab as CSV

For each tab:

curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
  "https://docs.google.com/spreadsheets/d/{docId}/export?format=csv&gid={sheetGid}"

If the export endpoint fails for a specific tab, fall back to the Sheets API values endpoint:

curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
  "https://sheets.googleapis.com/v4/spreadsheets/{docId}/values/{sheetName}!A:ZZ"

Convert the JSON values array rows to comma-separated values to produce CSV.

A.3 Convert CSV to Markdown tables

For each tab's CSV:

Parse CSV correctly — respect quoted fields (fields containing commas, newlines, or double quotes wrapped in "..." are a single field)
First row = header → | col1 | col2 | ... |
Separator row → | --- | --- | ... |
Data rows → | val1 | val2 | ... |
Escape pipe characters | within cell values as \|

A.4 Concatenate all tabs

For each tab, prepend: ## {sheet_name} followed by a blank line, then the Markdown table, then a blank line. Tabs appear in the same order as returned by the Sheets API metadata.

A.5 Validate

At least one tab produced valid content → proceed to Output Contract
401

gdoc-to-markdown

How to add

Drop this on your repo README

Related skills

algorithmic-art

doc-coauthoring

blog-writing-guide

agents-md

Get new Escrita e Conteúdo skills every Monday