Google Docs & Sheets Fetcher
Internal module — invoked by /bedrock:learn Phase 1 and /bedrock:sync Phase 2, not user-invocable.
Fetches a Google Docs document or Google Sheets spreadsheet and converts it to a local Markdown file. Supports both document types with automatic detection. Three layers in fallback order: MCP (preferred) → API / Public Export → Browser DOM extraction.
Dependency: Browser fallback (Layer 3) requires scripts/extract.js (relative to this skill directory).
Step 1 — Parse URL and Detect Type
Parse the URL. Accept these formats:
Google Docs:
https://docs.google.com/document/d/{docId}/edithttps://docs.google.com/document/d/{docId}/edit#heading=...https://docs.google.com/document/d/{docId}/edit?tab=t.0https://docs.google.com/document/d/{docId}- Raw document ID (no URL) — treat as Doc by default
Google Sheets:
https://docs.google.com/spreadsheets/d/{docId}/edithttps://docs.google.com/spreadsheets/d/{docId}/edit#gid=0https://docs.google.com/spreadsheets/d/{docId}- Raw spreadsheet ID — only if the user explicitly mentions "sheet" or "spreadsheet"
Detect type:
- URL contains
/spreadsheets/d/→ Sheet - URL contains
/document/d/→ Doc - Raw ID with no URL → default to Doc unless user context indicates Sheet
Extract the {docId} — the string between /d/ and the next / or end of path.
Step 2 — Layer 1: MCP (Google Docs)
The preferred layer. Checks if a Google Docs/Drive MCP server is installed and authenticated.
2.1 Check MCP availability
Use ToolSearch to check if any Google Docs or Google Drive MCP tools are available:
ToolSearch(query: "google docs drive document", max_results: 5)
Evaluate the result:
- Google Docs/Drive MCP tools found and functional → proceed to 2.2 Fetch via MCP
- MCP tools found but not authenticated → proceed to 2.3 Guide authentication
- No Google Docs MCP tools found (this is the expected case today) → log and fall through:
MCP not available: No Google Docs/Drive MCP server installed. When a Google Docs MCP becomes available, install it for direct document access. Falling back to API (Layer 2).
2.2 Fetch via MCP
Use the Google Docs MCP tools to fetch the document content. The specific tool depends on what the MCP exposes.
- Success → convert content to Markdown if not already, proceed to Output Contract
- Error → log the error and fall through to Layer 2:
MCP fetch failed: {error message}. Falling back to API (Layer 2).
2.3 Guide authentication
If an MCP is installed but not authenticated, guide the user:
MCP not authenticated: A Google Docs MCP server is installed but requires authentication. Complete the authentication flow as prompted by the MCP server. After authentication, Google documents can be fetched directly via MCP.
Ask the user: "Would you like to authenticate the Google Docs MCP now, or skip to API fallback (Layer 2)?"
- User wants to authenticate → invoke the MCP authentication tool, wait for the user to complete the flow, then retry 2.2 Fetch via MCP
- User declines → log "User declined MCP authentication, falling to Layer 2" → continue to Step 3
Step 3 — Layer 2: API
Uses the Google Drive/Sheets API with a bearer token or public URL export.
Strategy selection
- If
GOOGLE_ACCESS_TOKENenv var exists → use Strategy A (API with token) - If
GOOGLE_ACCESS_TOKENis NOT set → use Strategy B (Public Export) - If Strategy B fails (private document) → guide and fall through to Layer 3:
API not available: This document requires authentication and no
GOOGLE_ACCESS_TOKENis set. Generate a token at https://developers.google.com/oauthplayground/ with thehttps://www.googleapis.com/auth/drive.readonlyscope. Export:export GOOGLE_ACCESS_TOKEN="your-token". Falling back to Browser extraction (Layer 3).
Inform the caller which strategy is being used and whether the document is a Doc or Sheet.
Google Docs — Strategy A (API with token)
A.1 Fetch as Markdown
Use WebFetch:
WebFetch(
url: "https://www.googleapis.com/drive/v3/files/{docId}/export?mimeType=text/markdown",
headers: { "Authorization": "Bearer {GOOGLE_ACCESS_TOKEN}" },
prompt: "Return the COMPLETE raw content exactly as-is. Do not summarize or truncate."
)
If WebFetch cannot send the Authorization header, fall back to Bash:
curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
"https://www.googleapis.com/drive/v3/files/{docId}/export?mimeType=text/markdown"
A.2 Validate
- Valid Markdown content → proceed to Output Contract
- 401 → guide and fall through to Layer 3:
API authentication failed: Google API returned 401. The token may be expired or invalid. Refresh your token at https://developers.google.com/oauthplayground/. Falling back to Browser extraction (Layer 3).
- 403 → abort (permissions issue, no fallback can bypass):
API access denied: Google API returned 403. You do not have access to this document. Verify document sharing permissions in Google Docs.
- 404 → abort:
Document not found: Google API returned 404. The document ID may be incorrect. Verify the URL or document ID.
- Empty response → guide and fall through to Layer 3:
API returned empty: The document appears to be empty or the export failed. Falling back to Browser extraction (Layer 3).
Do not post-process the Markdown — return Google's native output as-is.
Google Docs — Strategy B (Public Export)
B.1 Fetch via public endpoint
curl -sL "https://docs.google.com/document/d/{docId}/export?format=md"
The -L flag follows the 307 redirect to *.googleusercontent.com.
B.2 Validate
- Valid Markdown content → proceed to Output Contract
- HTML error page or Google login page → document is private, fall through to Layer 3:
Public export not available: Document is private and requires authentication. Set
GOOGLE_ACCESS_TOKENfor API access, or ensure you are logged into Google in Chrome. Falling back to Browser extraction (Layer 3).
- Empty response → fall through to Layer 3:
Public export returned empty: The document appears to be empty or inaccessible. Falling back to Browser extraction (Layer 3).
Google Sheets — Strategy A (API with token)
A.1 List all sheet tabs
curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
"https://sheets.googleapis.com/v4/spreadsheets/{docId}?fields=sheets.properties"
Returns JSON with sheets[].properties.title (sheet name) and sheets[].properties.sheetId (gid).
A.2 Export each tab as CSV
For each tab:
curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
"https://docs.google.com/spreadsheets/d/{docId}/export?format=csv&gid={sheetGid}"
If the export endpoint fails for a specific tab, fall back to the Sheets API values endpoint:
curl -sL -H "Authorization: Bearer ${GOOGLE_ACCESS_TOKEN}" \
"https://sheets.googleapis.com/v4/spreadsheets/{docId}/values/{sheetName}!A:ZZ"
Convert the JSON values array rows to comma-separated values to produce CSV.
A.3 Convert CSV to Markdown tables
For each tab's CSV:
- Parse CSV correctly — respect quoted fields (fields containing commas, newlines, or double quotes wrapped in
"..."are a single field) - First row = header →
| col1 | col2 | ... | - Separator row →
| --- | --- | ... | - Data rows →
| val1 | val2 | ... | - Escape pipe characters
|within cell values as\|
A.4 Concatenate all tabs
For each tab, prepend: ## {sheet_name} followed by a blank line, then the Markdown table, then a blank line. Tabs appear in the same order as returned by the Sheets API metadata.
A.5 Validate
- At least one tab produced valid content → proceed to Output Contract
- 401