Confluence Fetcher
Internal module — invoked by /bedrock:learn Phase 1 and /bedrock:sync Phase 2, not user-invocable.
Fetches a Confluence page and returns its content as Markdown. Three layers in fallback order: MCP (preferred) → REST API → Browser DOM extraction.
Dependency: Browser fallback (Layer 3) requires scripts/extract.js (relative to this skill directory).
Step 1 — Parse URL
Parse the Confluence URL. Accept these formats:
- `https://<domain>.atlassian.net/wiki/spaces/<spaceKey>/pages/<pageId>/<title>`
- `https://<domain>.atlassian.net/wiki/spaces/<spaceKey>/pages/<pageId>`
- `https://<domain>.atlassian.net/wiki/x/<shortlink>`
- `https://<domain>.atlassian.net/wiki/pages/viewpage.action?pageId=<pageId>`
Extract:
- Base URL: `https://<domain>.atlassian.net` (everything before `/wiki/...`)
- Page ID: the numeric ID from the URL path (segment after `/pages/`) or the `pageId` query parameter
- Full URL: the original URL as provided (needed for browser fallback)
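The extraction rules above can be sketched in Python. This is a minimal illustration, not the module's actual implementation: the function name `parse_confluence_url` and the returned dictionary keys are hypothetical, and the sketch assumes that a short link (`/wiki/x/<shortlink>`) carries no numeric page ID, so it returns `None` for that case.

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_confluence_url(url: str) -> dict:
    """Split a Confluence URL into base URL, page ID, and full URL (hypothetical helper)."""
    parts = urlparse(url)
    if not parts.path.startswith("/wiki/"):
        raise ValueError(f"not a Confluence wiki URL: {url}")

    # Base URL is everything before /wiki/...
    base_url = f"{parts.scheme}://{parts.netloc}"

    page_id = None
    # /wiki/spaces/<spaceKey>/pages/<pageId>[/<title>]
    match = re.search(r"/pages/(\d+)(?:/|$)", parts.path)
    if match:
        page_id = match.group(1)
    else:
        # /wiki/pages/viewpage.action?pageId=<pageId>
        ids = parse_qs(parts.query).get("pageId", [])
        if ids and ids[0].isdigit():
            page_id = ids[0]

    # Assumption: /wiki/x/<shortlink> has no numeric ID in the URL, so page_id stays None.
    return {"base_url": base_url, "page_id": page_id, "full_url": url}
```

A caller that gets `page_id` of `None` would have to rely on a layer that accepts the full URL, such as the browser fallback.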
Step 2 — Layer 1: MCP (Atlassian)
The preferred layer. Uses the plugin:atlassian:atlassian MCP server if installed and authenticated.
2.1 Check MCP availability
Use ToolSearch to check if Atlassian MCP tools are available:
ToolSearch(query: "atlassian confluence page", max_results: 5)
Evaluate the result:
- MCP tools found and functional (tools other than `authenticate` and `complete_authentication` are available) → proceed to 2.2 Fetch via MCP
- Only `authenticate`/`complete_authentication` tools found (MCP installed but not authenticated) → proceed to 2.3 Guide authentication
- No Atlassian MCP tools found → log and fall through:
MCP not available: No Atlassian MCP server installed. Install the Atlassian MCP plugin for Claude Code to enable direct Confluence access. Falling back to API (Layer 2).
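The three-way decision in 2.1 can be expressed as a small classifier. The sketch below is illustrative only: the function name, the returned labels, and the idea of matching on the substring `atlassian` are assumptions, and the tool names in the usage example other than the two authentication tools are hypothetical.

```python
AUTH_ONLY_TOOLS = {"authenticate", "complete_authentication"}


def classify_mcp_tools(tool_names: list[str]) -> str:
    """Map a list of discovered tool names to the next step of section 2.1."""
    # Keep only the short name after the last "__" separator,
    # e.g. "mcp__plugin_atlassian_atlassian__authenticate" -> "authenticate".
    short_names = {
        name.rsplit("__", 1)[-1]
        for name in tool_names
        if "atlassian" in name.lower()
    }
    if not short_names:
        return "fallback_to_api"        # no Atlassian MCP installed
    if short_names <= AUTH_ONLY_TOOLS:
        return "guide_authentication"   # installed but not authenticated (2.3)
    return "fetch_via_mcp"              # at least one functional tool (2.2)
```

For example, a result containing only the two authentication tools yields `"guide_authentication"`, while adding any other Atlassian tool yields `"fetch_via_mcp"`.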
2.2 Fetch via MCP
Use the Atlassian MCP tools to fetch the page content. The specific tool depends on what the MCP exposes after authentication (typically a page read or content retrieval tool).
Call the MCP tool passing the page ID or URL. The MCP returns the page content directly.
- Success → convert content to Markdown if not already, proceed to Output Contract
- Error → log the error and fall through to Layer 2:
MCP fetch failed: {error message}. Falling back to API (Layer 2).
2.3 Guide authentication
If the MCP is installed but not authenticated, guide the user:
MCP not authenticated: The Atlassian MCP server is installed but requires authentication. Run `mcp__plugin_atlassian_atlassian__authenticate` to start the OAuth flow, then complete it in your browser. After authentication, Confluence pages can be fetched directly via MCP.
Ask the user: "Would you like to authenticate the Atlassian MCP now, or skip to API fallback (Layer 2)?"
- User wants to authenticate → invoke `mcp__plugin_atlassian_atlassian__authenticate`, wait for the user to complete the OAuth flow, then retry 2.2 Fetch via MCP
- User declines → log "User declined MCP authentication, falling to Layer 2" → continue to Step 3
Step 3 — Layer 2: API (REST)
Uses the Confluence REST API with Basic Auth (API token + email).
3.1 Check credentials
echo "CONFLUENCE_API_TOKEN: ${CONFLUENCE_API_TOKEN:+set}" && echo "CONFLUENCE_USER_EMAIL: ${CONFLUENCE_USER_EMAIL:+set}"
- Both set → proceed to 3.2 Compute auth header
- Either missing → guide and fall through:
API not available: `CONFLUENCE_API_TOKEN` or `CONFLUENCE_USER_EMAIL` environment variable is not set. Generate an API token at https://id.atlassian.com/manage-profile/security/api-tokens and export both variables: `export CONFLUENCE_API_TOKEN="your-token"` and `export CONFLUENCE_USER_EMAIL="your-email"`. Falling back to Browser extraction (Layer 3).
3.2 Compute Basic Auth header
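printf '%s' "${CONFLUENCE_USER_EMAIL}:${CONFLUENCE_API_TOKEN}" | base64 | tr -d '\n'
The `tr -d '\n'` step removes the line breaks that GNU `base64` inserts every 76 characters; a wrapped value would break the Authorization header.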
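For comparison, the same header value can be computed in Python. This is a sketch under the assumption that the email and token are passed in as plain strings; the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> str:
    """Return the full value for an HTTP Basic Authorization header."""
    raw = f"{email}:{api_token}".encode("utf-8")
    # b64encode never inserts line breaks, so the result is safe to use in a header.
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Decoding the part after `Basic ` should give back `email:token`, which is a quick way to check the value before sending it.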
3.3 Fetch the page
Use WebFetch:
WebFetch(
  url: "{baseUrl}/wiki/api/v2/pages/{pageId}?body-format=storage",
  headers: {
    "Authorization": "Basic {base64_value}",
    "Accept": "application/json"
  }
)
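The request above can also be sketched with Python's standard library, which may help when checking the URL and headers outside the tool. The function name `build_page_request` is hypothetical; the sketch only builds the request object and does not send it.

```python
import urllib.request


def build_page_request(base_url: str, page_id: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a page in storage format."""
    url = f"{base_url}/wiki/api/v2/pages/{page_id}?body-format=storage"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": auth_header,      # e.g. "Basic <base64 value>"
            "Accept": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call; a non-2xx response is raised as `urllib.error.HTTPError`, whose `code` attribute holds the HTTP status.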
If WebFetch cannot send the Authorization header, fall back to curl via Bash:
curl -sL -H "Authorization: Basic {base64_value}" -H "Accept: application/json" \
"{baseUrl}/wiki/api/v2/pages/{pageId}?body-format=storage"
3.4 Extract content from response
The API returns JSON with:
- `title`: page title
- `body.storage.value`: XHTML content (Confluence storage format)
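Reading those two fields from the response body can be sketched as follows. The function name `extract_page_fields` is hypothetical, and returning empty strings for missing fields is an assumption made for the sketch, not documented behavior.

```python
import json


def extract_page_fields(response_text: str) -> tuple[str, str]:
    """Return (title, storage-format XHTML) from the page JSON."""
    data = json.loads(response_text)
    title = data.get("title", "")
    # body.storage.value holds the XHTML; fall back to "" if any level is missing.
    xhtml = data.get("body", {}).get("storage", {}).get("value", "")
    return title, xhtml
```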
3.5 Convert XHTML to Markdown
Convert the storage format XHTML to Markdown using these rules:
| XHTML element | Markdown output |
|---|---|
| `<h1>` through `<h6>` | `#` through `######` |
| `<p>` | Paragraph with blank line separation |
| `<strong>`, `<b>` | `**text**` |
| `<em>`, `<i>` | `*text*` |
| `<s>`, `<del>` | `~~text~~` |
| `<a href="...">` | `[text](url)` |
| `<ul>` / `<ol>` / `<li>` | Markdown lists (respect nesting) |
| `<table>` | Markdown table with `\|` separators and header row |
| `<ac:structured-macro ac:name="code">` | Fenced code block with language from `<ac:parameter ac:name="language">` |
| `<pre>` | Fenced code block |
| `<code>` (inline) | `` `code` `` |
| `<blockquote>` | `> text` |
| `<hr>` | `---` |
| Confluence macros (`<ac:*>`) with text | Extract text content |
| Confluence macros with no text (images, drawio, attachments) | Skip silently |
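A subset of these rules can be sketched with Python's `html.parser`. This is a simplified illustration, not a full converter: it covers headings, paragraphs, inline emphasis, links, inline code, lists with nesting, blockquotes, and `<hr>`, and it does not handle tables, `<pre>`, or code macros. The class and function names are hypothetical, and the 4-space indent for nested lists is a choice made for the sketch.

```python
from html.parser import HTMLParser

INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "s": "~~", "del": "~~", "code": "`"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StorageToMarkdown(HTMLParser):
    """Convert a small subset of Confluence storage XHTML to Markdown."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # output fragments, joined at the end
        self.lists = []    # stack of [list tag, item counter]
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag in ("ul", "ol"):
            self.lists.append([tag, 0])
        elif tag == "li" and self.lists:
            current = self.lists[-1]
            current[1] += 1
            marker = "-" if current[0] == "ul" else f"{current[1]}."
            # Start a new line unless the previous fragment already ended one.
            prefix = "" if (not self.out or self.out[-1].endswith("\n")) else "\n"
            self.out.append(prefix + "    " * (len(self.lists) - 1) + marker + " ")
        elif tag == "blockquote":
            self.out.append("> ")
        elif tag == "hr":
            self.out.append("---\n\n")

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
        elif tag in HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self.out.append("\n\n")   # blank line after the outermost list

    def handle_data(self, data):
        # Text inside unknown tags (including ac:* macros) passes through unchanged.
        self.out.append(data)


def storage_to_markdown(xhtml: str) -> str:
    parser = StorageToMarkdown()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out).strip()
```

Because unknown tags are ignored while their text is kept, macros with text content degrade to plain text in this sketch, which matches the "extract text content" rule only loosely.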
3.6 Error handling
| HTTP status | Action |
|---|---|
| 200 OK | Proceed to Output Contract |
| 401 Unauthorized | Log and fall through to Layer 3: "API authentication failed: API returned 401. The token may be expired or invalid. Regenerate your token at https://id.atlassian.com/manage-profile/security/api-tokens. Falling back to Browser extraction (Layer 3)." |
| 403 Forbidden | Abort (no fallback can bypass permissions): "API access denied: API returned 403. The user does not have access to this page. Verify page permissions in Confluence." |
| 404 Not Found | Abort: "Page not found: API returned 404. The page ID may be incorrect. Verify the URL: {original_url}." |
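The status handling for Layer 2 amounts to a lookup from HTTP status to next action, sketched below. The action labels and the function name are hypothetical. The source does not say what to do for other status codes, so the sketch returns `"unspecified"` for them rather than guessing.

```python
STATUS_ACTIONS = {
    200: "proceed_to_output",        # content fetched successfully
    401: "fall_through_to_layer_3",  # token expired or invalid, try the browser
    403: "abort",                    # no fallback can bypass permissions
    404: "abort",                    # page ID is probably wrong
}


def action_for_status(status: int) -> str:
    """Return the Layer 2 action for an HTTP status code."""
    # Assumption: statuses not listed in the source are reported as "unspecified".
    return STATUS_ACTIONS.get(status, "unspecified")
```

Only 401 leads to Layer 3; 403 and 404 stop the whole fetch, since a browser session would hit the same permission or missing-page condition.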
Step 4 — Layer 3: Browser (Claude in Chrome)
Last resort. Opens the page in Chrome and extracts content via DOM scraping.
4.1 Load Chrome tools
Via ToolSearch:
select:mcp__claude-in-chrome__tabs_context_mcp,mcp__claude-in-chrome__tabs_create_mcp
select:mcp__claude-in-chrome__navigate
select:mcp__claude-in-chrome__javascript_tool
If Chrome MCP tools are not available, abort:
Browser not available: Claude in Chrome MCP is not installed or not running. Install the Claude in Chrome extension and ensure it is connected. No further fallback layers available — cannot fetch this Confluence page.
4.2 Get browser context
mcp__claude-in-chrome__tabs_context_mcp(createIfEmpty: true)
4.3 Navigate to the page
mcp__claude-in-chrome__tabs_create_mcp()
mcp__claude-in-chrome__navigate(url: "<full confluence URL>", tabId: <id>)
4.4 Execute extraction script
Read scripts/extract.js from this skill's directory using the Read tool. Then execute it:
mcp__claude-in-chrome__javascript_tool(
  action: "javascript_exec",
  text: <contents of extract.js>,
  tabId: <id>
)
The script returns JSON:
{
  "status": "ready",
  "totalLength": 52969,
  "totalChunks": 6,
  "chunkSize": 10000,
  "title": "Page Title",
  "instructions": "Run window.__confluence.chunk(0), window.__confluence.chunk(1), etc."
}
If the script returns an error field: handle accordingly (login page, empty content, wrong page).
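How the status object drives the follow-up calls can be sketched as below. The function name `plan_chunk_calls` is hypothetical, and raising an exception for an `error` field or a non-`ready` status is an assumption; the source only says to handle errors according to their cause.

```python
import json


def plan_chunk_calls(status_json: str) -> list[str]:
    """Return the JavaScript expressions to evaluate, one per chunk."""
    status = json.loads(status_json)
    if "error" in status:
        # e.g. login page, empty content, or wrong page
        raise RuntimeError(f"extraction failed: {status['error']}")
    if status.get("status") != "ready":
        raise RuntimeError(f"unexpected status: {status.get('status')!r}")
    # One call per chunk index, 0 through totalChunks - 1.
    return [f"window.__confluence.chunk({n})" for n in range(status["totalChunks"])]
```

With the example status shown above (`totalChunks` of 6), this yields six expressions, from `window.__confluence.chunk(0)` to `window.__confluence.chunk(5)`.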
4.5 Read chunks
For each chunk from 0 to totalChunks - 1:
mcp__claude-in-chrome__javascript_tool(
  action: "javascript_exec",
  text: "window.__confluence.chunk(N)",
  tabId: <id>
)
Concate