SDK Docs Auditor
Produces a comprehensive, cross-referenced audit of any SDK documentation site with a fully styled downloadable HTML report.
What this skill does
- Discovers all SDK pages via llms.txt first, falling back to sitemap.xml (using curl), then a homepage nav crawl
- Fetches and reads every relevant SDK page
- Audits six fixed sections: Installation, Quick Start, Error Handling, Troubleshooting, Examples, Best Practices
- Cross-references every gap across ALL other SDK pages — never flag something as missing if it exists elsewhere
- Scores each section 0–100 and assigns a rating tier
- Generates a beautiful, self-contained, downloadable HTML report
Step 1 — Discover all SDK pages
Use a three-tier discovery strategy, trying each method in order until one succeeds.
1a. Try llms.txt first (preferred)
Run a bash curl command to fetch llms.txt:
curl -s <docs_url>/llms.txt
If found:
- Extract every URL from lines matching the pattern - [Page Title](URL): description
- Filter to SDK-relevant pages only: keep URLs whose path contains any of sdk, installation, quickstart, quick-start, error, troubleshoot, example, best-practice, getting-started, reference, api-reference, overview, client, service, memory, search, worker, job, policy, agent, team
- Exclude: marketing pages, changelog, blog, legal, community/forum pages
- Store as SDK_PAGES[], a list of {title, url, description}
- Note in the report: "Discovery method: llms.txt"
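The extraction and filtering rules above could be sketched in Python roughly as follows. This is a minimal illustration, not a required implementation: the function names are hypothetical, and the EXCLUDE_HINTS path fragments are an assumption about how the excluded page types appear in URLs (marketing pages have no obvious path marker and are not covered).

```python
import re
from urllib.parse import urlparse

# Keyword list copied from the filter rule above.
SDK_KEYWORDS = [
    "sdk", "installation", "quickstart", "quick-start", "error", "troubleshoot",
    "example", "best-practice", "getting-started", "reference", "api-reference",
    "overview", "client", "service", "memory", "search", "worker", "job",
    "policy", "agent", "team",
]

# Assumption: excluded page types can be recognised by these path fragments.
EXCLUDE_HINTS = ["changelog", "blog", "legal", "community", "forum"]

# Matches "- [Page Title](URL): description"; the description is optional.
LINK_LINE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$")


def is_sdk_relevant(url: str) -> bool:
    """True if the URL path has an SDK keyword and no exclusion hint."""
    path = urlparse(url).path.lower()
    if any(hint in path for hint in EXCLUDE_HINTS):
        return False
    return any(keyword in path for keyword in SDK_KEYWORDS)


def parse_llms_txt(text: str) -> list[dict]:
    """Return SDK_PAGES as a list of {title, url, description} dicts."""
    pages = []
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        if match is None:
            continue  # headings, blank lines, and prose are skipped
        title, url, description = (part.strip() for part in match.groups())
        if is_sdk_relevant(url):
            pages.append({"title": title, "url": url, "description": description})
    return pages
```

Matching on the URL path only (not the host) avoids false positives from domains such as docs.example.com, whose hostname happens to contain a keyword.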
1b. Fallback — Try sitemap.xml
If llms.txt is unavailable or returns no useful URLs, run a bash curl command to fetch the sitemap:
curl -s <docs_url>/sitemap.xml | python3 -c "import sys, re; print('\n'.join(re.findall(r'<loc>(.*?)</loc>', sys.stdin.read())))"
If the sitemap returns URLs:
- Parse every <loc> entry to get the full URL list
- Filter using the same keyword list above
- Store as SDK_PAGES[], a list of {title, url}
- Note in the report: "Discovery method: sitemap.xml (N total URLs found, M SDK-relevant kept)"
Also check for a sitemap index (multiple sitemaps) by looking for <sitemapindex> in the response. If found, curl each child sitemap and aggregate all URLs before filtering.
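One way to handle both a plain sitemap and a sitemap index is sketched below. It is an illustration under stated assumptions: collect_sitemap_urls and curl_fetch are hypothetical helper names, the fetcher is passed in so the logic can be exercised without network access, and the max_depth guard is an added safety measure rather than something this skill requires.

```python
import html
import re
import subprocess
from typing import Callable

LOC = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.DOTALL)


def curl_fetch(url: str) -> str:
    """Fetch a URL with curl, following redirects; returns '' on failure."""
    result = subprocess.run(
        ["curl", "-s", "-L", url], capture_output=True, text=True, check=False
    )
    return result.stdout


def collect_sitemap_urls(
    xml_text: str, fetch: Callable[[str], str], max_depth: int = 3
) -> list[str]:
    """Return page URLs, descending into child sitemaps of a <sitemapindex>."""
    locs = [html.unescape(loc) for loc in LOC.findall(xml_text)]
    if "<sitemapindex" not in xml_text:
        return locs  # ordinary sitemap: every <loc> is a page URL
    if max_depth == 0:
        return []  # stop rather than report child sitemap URLs as pages
    urls: list[str] = []
    for child_url in locs:
        urls.extend(collect_sitemap_urls(fetch(child_url), fetch, max_depth - 1))
    return urls
```

In practice this would be called as collect_sitemap_urls(curl_fetch(sitemap_url), curl_fetch), with the keyword filter applied to the aggregated result.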
1c. Final fallback — Homepage nav crawl
If both llms.txt and sitemap.xml fail, fetch the docs homepage (<docs_url>) and extract all links from the nav sidebar or sitemap structure, filtering using the same keyword list.
- Note in the report: "Discovery method: homepage nav crawl (llms.txt and sitemap.xml unavailable)"
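A rough sketch of the link extraction for this fallback, using only the Python standard library. The names LinkCollector and extract_same_site_links are hypothetical, and for simplicity the sketch collects every anchor on the page rather than isolating the nav sidebar; the keyword filter would still need to be applied afterwards.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_same_site_links(page_html: str, docs_url: str) -> list[str]:
    """Return absolute, de-duplicated links that stay on the docs host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(docs_url).netloc
    seen: set[str] = set()
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so anchors collapse.
        absolute, _fragment = urldefrag(urljoin(docs_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```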
1d. Identify section mapping
From the discovered pages, identify which pages map to the six audit targets:
| Audit section | Look for paths/titles containing |
|---|---|
| Installation | install, setup, getting-started |
| Quick Start | quick-start, quickstart, tutorial |
| Error Handling | error, exception, errors |
| Troubleshooting | troubleshoot, faq, debug |
| Examples | example, sample, cookbook, tutorial |
| Best Practices | best-practice, guide, pattern |
If a dedicated page is not found for a section, note it — the absence itself is a finding.
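The mapping table above can be applied mechanically, as in this sketch. The function names are hypothetical, and hyphenating the title before matching is an assumption added so that a title like "Quick Start" matches the quick-start keyword.

```python
from urllib.parse import urlparse

# Keywords copied from the section mapping table above.
SECTION_KEYWORDS = {
    "Installation": ["install", "setup", "getting-started"],
    "Quick Start": ["quick-start", "quickstart", "tutorial"],
    "Error Handling": ["error", "exception", "errors"],
    "Troubleshooting": ["troubleshoot", "faq", "debug"],
    "Examples": ["example", "sample", "cookbook", "tutorial"],
    "Best Practices": ["best-practice", "guide", "pattern"],
}


def map_sections(pages: list[dict]) -> dict[str, list[str]]:
    """Map each audit section to the URLs whose path or title matches it."""
    mapping: dict[str, list[str]] = {section: [] for section in SECTION_KEYWORDS}
    for page in pages:
        path = urlparse(page["url"]).path.lower()
        title = page.get("title", "").lower().replace(" ", "-")
        haystack = f"{path} {title}"
        for section, keywords in SECTION_KEYWORDS.items():
            if any(keyword in haystack for keyword in keywords):
                mapping[section].append(page["url"])
    return mapping


def missing_sections(mapping: dict[str, list[str]]) -> list[str]:
    """Sections with no dedicated page; each one is itself a finding."""
    return [section for section, urls in mapping.items() if not urls]
```

A page can legitimately map to more than one section (a tutorial counts for both Quick Start and Examples), so the sketch keeps every match instead of picking one.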
Build the corpus
All discovered pages form the page corpus used for both section auditing and cross-referencing.
Report discovery summary:
Found N pages via [sitemap.xml / llms.txt / nav crawl]. Kept M SDK-relevant pages. Fetching now...
Step 2 — Fetch every page in the corpus
Fetch each page using bash_tool with curl, stripping HTML tags via Python to extract plain text. Do NOT use web_fetch for corpus pages — curl is more reliable and avoids permission errors.
Use this pattern for each page URL:
curl -s "https://docs.example.com/sdk/some-page" -L | python3 -c "
import sys, re, html
content = sys.stdin.read()
content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)
content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
text = re.sub(r'<[^>]+>', ' ', content)
text = html.unescape(text)
text = re.sub(r'\s+', ' ', text).strip()
print(text[:6000])
"
Batch multiple pages in a single bash call using a loop to minimise round-trips:
```bash
for page in installation quick-start error-handling troubleshooting examples best-practices overview api-reference; do
echo "=== PAGE: $page ==="
curl -s "https://docs.example.com/sdk/$page" -L | python3 -c "
import sys, re, html
content = sys.stdin.read()
content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)
content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
text = re.sub(r'<[^>]+>', ' ', content)
text = html.unescape(text)
text = re.sub(r'\s+', ' ', text).strip()
print(text[:6000])
"
echo ""
done
```
For large sites (>30 pages), prioritise in this order:
- The six target section pages (installation, quick-start, error-handling, troubleshooting, examples, best-practices)
- Overview / client overview pages
- API reference
- Service-specific pages (agents, jobs, workers, policies, memory, search, etc.)
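The priority order above could be expressed as a sort key, as in the sketch below. The tier keywords are taken from the list above; the function names and the optional limit parameter are assumptions for illustration, since this skill does not state a hard cap on the number of pages fetched.

```python
from typing import Optional
from urllib.parse import urlparse

# Tiers in the order listed above; a lower index is fetched first.
PRIORITY_TIERS = [
    ["installation", "quick-start", "quickstart", "error-handling",
     "troubleshooting", "examples", "best-practices"],
    ["overview"],
    ["api-reference"],
]


def priority(url: str) -> int:
    """Return the tier index for a URL; unmatched pages fall in the last tier."""
    path = urlparse(url).path.lower()
    for tier, keywords in enumerate(PRIORITY_TIERS):
        if any(keyword in path for keyword in keywords):
            return tier
    return len(PRIORITY_TIERS)  # service-specific pages and everything else


def prioritise(urls: list[str], limit: Optional[int] = None) -> list[str]:
    """Order URLs by tier, keeping discovery order within a tier."""
    ordered = sorted(urls, key=priority)  # sorted() is stable
    return ordered if limit is None else ordered[:limit]
```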
Step 3 — Audit the six target sections
For each of the six sections below, read its page carefully and evaluate against the criteria. Then check every other fetched page to see if any gap is actually addressed elsewhere.
3a. Installation
Must have (deduct heavily if missing):
- Prerequisites with exact versions (Python/Node/language version, package manager)
- At least one complete install command (pip/npm/etc.)
- Authentication setup with exact steps to obtain and configure credentials
- A working verification snippet that proves the install succeeded
- At least one expected output or confirmation of success
Should have (deduct moderately):
- Multiple install methods (package manager, source, Docker)
- IDE/editor setup
- Environment variable configuration with .env examples
- Version compatibility table
- Optional dependency explanations (what each extra includes)
- Constructor parameter reference (all params, types, defaults)
Common gaps to check across other pages:
- Env-var auto-detection (does the client read env vars when called with no args?) — check api-reference, overview
- Advanced constructor params (timeout, retries, org) — check api-reference
- Version pinning guidance — check any getting-started pages
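As a sketch of how the must-have and should-have criteria might turn into a 0–100 score: the deduction sizes below (15 points per missing must-have, 5 per missing should-have) are purely illustrative assumptions, not values defined by this skill, and score_section is a hypothetical helper. Each criterion is marked as covered if it appears anywhere in the corpus, in line with the cross-referencing rule.

```python
# Assumed deduction sizes; tune these to whatever rubric the audit uses.
HEAVY_DEDUCTION = 15     # per missing must-have
MODERATE_DEDUCTION = 5   # per missing should-have


def score_section(must_have: dict[str, bool], should_have: dict[str, bool]) -> int:
    """Score one section from 0 to 100.

    Each dict maps a criterion to True when it is covered on the section
    page or on any other page in the corpus.
    """
    deduction = HEAVY_DEDUCTION * sum(not covered for covered in must_have.values())
    deduction += MODERATE_DEDUCTION * sum(not covered for covered in should_have.values())
    return max(0, 100 - deduction)
```

For example, an Installation page missing one must-have and two should-haves would score 100 - 15 - 10 = 75 under these assumed weights.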
3b. Quick Start
Must have:
- A minimal, numbered step-by-step path a first-time user can follow in under 5 minutes
- Complete working code from zero to first successful call
- Expected output for every code snippet
- Explanation of any prerequisite service/resource (queues, workers, etc.)
Should have:
- Both env-var and inline credential patterns
- Multiple install options (pip + poetry, etc.)
- Links to deeper docs for each concept introduced
Common gaps to check:
- Worker/queue prerequisites — check workers/environments pages
- Valid model/runtime IDs — check models/runtimes service pages
- Execute call signature accuracy — check api-reference and agents service pages
3c. Error Handling
Must have:
- Complete exception hierarchy (all exception classes, inheritance tree)
- Every exception class documented with: description, import path, code example
- At minimum: authentication, connection, timeout, rate-limit, API/HTTP errors
- HTTP status code branching (400/401/403/404/500 at minimum)
- At least one resilience pattern (retry with backoff)
Should have:
- Resource-specific exceptions (one per major service)
- Streaming error handling
- Context manager cleanup pattern
- Testing/pytest examples for error scenarios
- Do/don't anti-pattern examples
Cross-reference check — critical:
- List every exception class that appears in the api-reference or service pages
- Flag any that are missing from this page
- Check execute/method call signatures match across all pages — flag any discrepancies
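The exception cross-reference could be approximated with a name scan like the one below. It assumes exception classes are CamelCase names ending in Error or Exception, which will also pick up language built-ins such as ValueError, so the output is a candidate list to review rather than a final finding. The function names and the exception names in the usage example are hypothetical.

```python
import re

# Assumption: exception classes look like CamelCase + "Error" / "Exception".
# Bare "Error" and "Exception" are not matched because a prefix is required.
EXCEPTION_NAME = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:Error|Exception)\b")


def exception_names(text: str) -> set[str]:
    """Return every exception-like class name mentioned in the text."""
    return set(EXCEPTION_NAME.findall(text))


def missing_exceptions(
    error_page_text: str, other_pages: dict[str, str]
) -> dict[str, list[str]]:
    """Map each exception absent from the Error Handling page to the URLs that mention it."""
    documented = exception_names(error_page_text)
    missing: dict[str, list[str]] = {}
    for url, text in other_pages.items():
        for name in exception_names(text) - documented:
            missing.setdefault(name, []).append(url)
    return {name: sorted(urls) for name, urls in sorted(missing.items())}
```

Usage, with other_pages keyed by URL and holding the plain text extracted in Step 2:

```python
gaps = missing_exceptions(
    "Catch AuthenticationError and RateLimitError.",
    {"https://docs.example.com/sdk/api-reference": "Raises RateLimitError and JobNotFoundError."},
)
# gaps maps "JobNotFoundError" to the api-reference URL
```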
3d. Troubleshooting
Must have:
- Authentication failures with