SDK Docs Auditor

Produces a comprehensive, cross-referenced audit of any SDK documentation site and delivers it as a fully styled, downloadable HTML report.

What this skill does

Discovers all SDK pages via llms.txt first, falling back to sitemap.xml (using curl), then homepage nav crawl
Fetches and reads every relevant SDK page
Audits six fixed sections: Installation, Quick Start, Error Handling, Troubleshooting, Examples, Best Practices
Cross-references every gap across ALL other SDK pages — never flag something as missing if it exists elsewhere
Scores each section 0–100 and assigns a rating tier
Generates a styled, self-contained, downloadable HTML report

Step 1 — Discover all SDK pages

Use a three-tier discovery strategy, trying each method in order until one succeeds.

1a. Try llms.txt first (preferred)

Run a bash curl command to fetch llms.txt:

curl -s <docs_url>/llms.txt

If found:

Extract every URL from lines matching the pattern - [Page Title](URL): description
Filter to SDK-relevant pages only — keep URLs whose path contains any of: sdk, installation, quickstart, quick-start, error, troubleshoot, example, best-practice, getting-started, reference, api-reference, overview, client, service, memory, search, worker, job, policy, agent, team
Exclude: marketing pages, changelog, blog, legal, community/forum pages
Store as SDK_PAGES[] — list of {title, url, description}
Note in the report: "Discovery method: llms.txt"
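The extraction and filtering rules above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `parse_llms_txt` and the `EXCLUDE_KEYWORDS` path fragments are assumptions (the rules name excluded categories, not path strings), and the keyword match is applied to the URL path only so that a host such as `docs.example.com` does not trigger the `example` keyword.

```python
import re
from urllib.parse import urlparse

# Keyword list copied from the filter rule above.
SDK_KEYWORDS = (
    "sdk", "installation", "quickstart", "quick-start", "error", "troubleshoot",
    "example", "best-practice", "getting-started", "reference", "api-reference",
    "overview", "client", "service", "memory", "search", "worker", "job",
    "policy", "agent", "team",
)

# Assumed path fragments for the excluded categories (hypothetical).
EXCLUDE_KEYWORDS = ("changelog", "blog", "legal", "community", "forum")

# Matches "- [Page Title](URL): description"; the description is optional.
LINK_LINE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$")


def parse_llms_txt(text):
    """Return SDK-relevant pages as a list of {title, url, description} dicts."""
    pages = []
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        if not match:
            continue  # headings, blank lines, free text
        title, url, description = match.groups()
        path = urlparse(url).path.lower()
        if any(term in path for term in EXCLUDE_KEYWORDS):
            continue
        if any(keyword in path for keyword in SDK_KEYWORDS):
            pages.append({
                "title": title.strip(),
                "url": url,
                "description": (description or "").strip(),
            })
    return pages
```

Feed it the text returned by the curl command; an empty result is the signal to move on to the sitemap fallback.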

1b. Fallback — Try sitemap.xml

If llms.txt is unavailable or returns no useful URLs, run a bash curl command to fetch the sitemap:

curl -s <docs_url>/sitemap.xml | python3 -c "import sys, re; print('\n'.join(re.findall(r'<loc>(.*?)</loc>', sys.stdin.read())))"

If the sitemap returns URLs:

Parse every <loc> entry to get the full URL list
Filter using the same keyword list above
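Store as SDK_PAGES[]: a list of {title, url}. A sitemap carries no page titles, so derive a provisional title from the URL slug or fill it in after fetching the page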
Note in the report: "Discovery method: sitemap.xml — N total URLs found, M SDK-relevant kept"

Also check for a sitemap index (multiple sitemaps) by looking for <sitemapindex> in the response. If found, curl each child sitemap and aggregate all URLs before filtering.
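The sitemap-index check could look like the sketch below. It assumes a caller-supplied `fetch` callable (hypothetical, for example a thin wrapper around curl) that returns the XML text for a URL; `collect_sitemap_urls` and `max_depth` are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree prefixes tags with "{namespace}"; keep only the local part.
    return tag.rsplit("}", 1)[-1]


def collect_sitemap_urls(xml_text, fetch, max_depth=2):
    """Return page URLs from a sitemap, following <sitemapindex> children.

    `fetch` is a hypothetical callable(url) -> str supplied by the caller.
    """
    root = ET.fromstring(xml_text)
    # Only read <loc> elements that sit directly under <url> or <sitemap>,
    # so extension tags nested deeper (such as image entries) are ignored.
    locs = [
        child.text.strip()
        for entry in root
        for child in entry
        if local_name(child.tag) == "loc" and child.text
    ]
    if local_name(root.tag) != "sitemapindex":
        return locs  # plain <urlset>: these are the page URLs
    if max_depth <= 0:
        return []  # guard against indexes that nest too deeply
    urls = []
    for child_url in locs:
        urls.extend(collect_sitemap_urls(fetch(child_url), fetch, max_depth - 1))
    return urls
```

The aggregated list is then passed through the same keyword filter as before.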

1c. Final fallback — Homepage nav crawl

If both llms.txt and sitemap.xml fail, fetch the docs homepage (<docs_url>) and extract all links from the nav sidebar or any on-page (HTML) sitemap, then filter them using the same keyword list.

Note in the report: "Discovery method: homepage nav crawl (llms.txt and sitemap.xml unavailable)"
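Taken together, steps 1a to 1c form a fallback chain: try each method in order and stop at the first one that yields pages. A minimal sketch, assuming each method is wrapped in a caller-supplied function (the name `discover_pages` and the strategy callables are hypothetical):

```python
def discover_pages(docs_url, strategies):
    """Try each discovery strategy in order; return (label, pages) for the first hit.

    `strategies` is an ordered list of (label, callable(docs_url) -> list) pairs.
    The callables are supplied by the caller and are not defined here.
    """
    for label, strategy in strategies:
        try:
            pages = strategy(docs_url)
        except Exception:
            # Treat any failure (network error, parse error) as "method unavailable".
            pages = []
        if pages:
            return label, pages
    return None, []
```

The returned label is what goes into the "Discovery method" note in the report.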

1d. Identify section mapping

From the discovered pages, identify which pages map to the six audit targets:

| Audit section | Look for paths/titles containing |
| --- | --- |
| Installation | `install`, `setup`, `getting-started` |
| Quick Start | `quick-start`, `quickstart`, `tutorial` |
| Error Handling | `error`, `exception`, `errors` |
| Troubleshooting | `troubleshoot`, `faq`, `debug` |
| Examples | `example`, `sample`, `cookbook`, `tutorial` |
| Best Practices | `best-practice`, `guide`, `pattern` |

If a dedicated page is not found for a section, note it — the absence itself is a finding.
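The mapping can be sketched as a keyword lookup over each page's URL path and title. This is an illustrative sketch: `map_sections` is a hypothetical helper, titles are lower-cased and hyphenated so that "Quick Start" matches `quick-start`, and only the URL path is inspected so the hostname cannot produce false matches.

```python
from urllib.parse import urlparse

# Keywords copied from the section mapping table.
SECTION_KEYWORDS = {
    "Installation": ("install", "setup", "getting-started"),
    "Quick Start": ("quick-start", "quickstart", "tutorial"),
    "Error Handling": ("error", "exception", "errors"),
    "Troubleshooting": ("troubleshoot", "faq", "debug"),
    "Examples": ("example", "sample", "cookbook", "tutorial"),
    "Best Practices": ("best-practice", "guide", "pattern"),
}


def map_sections(pages):
    """Return (mapping, missing): section -> matching URLs, and sections with none."""
    mapping = {section: [] for section in SECTION_KEYWORDS}
    for page in pages:
        path = urlparse(page["url"]).path.lower()
        title = page.get("title", "").lower().replace(" ", "-")
        haystack = f"{path} {title}"
        for section, keywords in SECTION_KEYWORDS.items():
            if any(keyword in haystack for keyword in keywords):
                mapping[section].append(page["url"])
    # A section with no dedicated page is itself a finding.
    missing = [section for section, urls in mapping.items() if not urls]
    return mapping, missing
```

A page can land in more than one section (for example, `tutorial` appears under both Quick Start and Examples), which matches the table as written.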

Build the corpus

All discovered pages form the page corpus used for both section auditing and cross-referencing.

Report discovery summary:

Found N pages via [llms.txt / sitemap.xml / nav crawl]. Kept M SDK-relevant pages. Fetching now...

Step 2 — Fetch every page in the corpus

Fetch each page using bash_tool with curl, stripping HTML tags via Python to extract plain text. Do NOT use web_fetch for corpus pages — curl is more reliable and avoids permission errors.

Use this pattern for each page URL:

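```bash
curl -s "https://docs.example.com/sdk/some-page" -L | python3 -c "
import sys, re, html
content = sys.stdin.read()
# Drop script and style blocks entirely, including their contents
content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)
content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
# Replace every remaining tag with a space, then decode HTML entities
text = re.sub(r'<[^>]+>', ' ', content)
text = html.unescape(text)
# Collapse whitespace and cap the output at 6000 characters per page
text = re.sub(r'\s+', ' ', text).strip()
print(text[:6000])
"
```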

Batch multiple pages in a single bash call using a loop to minimise round-trips:

```bash
for page in installation quick-start error-handling troubleshooting examples best-practices overview api-reference; do
  echo "=== PAGE: $page ==="
  curl -s "https://docs.example.com/sdk/$page" -L | python3 -c "
import sys, re, html
content = sys.stdin.read()
content = re.sub(r'<script[^>]*>.*?</script>', '', content, flags=re.DOTALL)
content = re.sub(r'<style[^>]*>.*?</style>', '', content, flags=re.DOTALL)
text = re.sub(r'<[^>]+>', ' ', content)
text = html.unescape(text)
text = re.sub(r'\s+', ' ', text).strip()
print(text[:6000])
"
  echo ""
done
```

For large sites (>30 pages), prioritise in this order:

1. The six target section pages (installation, quick-start, error-handling, troubleshooting, examples, best-practices)
2. Overview / client overview pages
3. API reference
4. Service-specific pages (agents, jobs, workers, policies, memory, search, etc.)
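One way to apply this ordering is a stable sort keyed on the first matching tier. The sketch below is an assumption about how to implement it: the tier keywords are inferred from the list above, and `prioritise` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Tiers in fetch order; anything unmatched falls into the last (service) tier.
PRIORITY_TIERS = (
    ("installation", "quick-start", "quickstart", "error-handling",
     "troubleshooting", "examples", "best-practices"),
    ("overview",),
    ("api-reference", "reference"),
)


def priority(url):
    """Return the index of the first tier whose keyword appears in the URL path."""
    path = urlparse(url).path.lower()
    for tier, keywords in enumerate(PRIORITY_TIERS):
        if any(keyword in path for keyword in keywords):
            return tier
    return len(PRIORITY_TIERS)


def prioritise(urls):
    # sorted() is stable, so pages keep their discovery order within a tier.
    return sorted(urls, key=priority)
```

Fetching then proceeds down the sorted list, so the six target pages are always read first.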

Step 3 — Audit the six target sections

For each of the six sections below, read its page carefully and evaluate against the criteria. Then check every other fetched page to see if any gap is actually addressed elsewhere.

3a. Installation

Must have (deduct heavily if missing):

Prerequisites with exact versions (Python/Node/language version, package manager)
At least one complete install command (pip/npm/etc.)
Authentication setup with exact steps to obtain and configure credentials
A working verification snippet that proves the install succeeded
At least one expected output or confirmation of success

Should have (deduct moderately):

Multiple install methods (package manager, source, Docker)
IDE/editor setup
Environment variable configuration with .env examples
Version compatibility table
Optional dependency explanations (what each extra includes)
Constructor parameter reference (all params, types, defaults)

Common gaps to check across other pages:

Env-var auto-detection (does the client read env vars when called with no args?) — check api-reference, overview
Advanced constructor params (timeout, retries, org) — check api-reference
Version pinning guidance — check any getting-started pages
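The "deduct heavily" and "deduct moderately" rules can be turned into a 0–100 score along the lines below. The penalty weights are hypothetical placeholders, since no exact values are given here, and a criterion is treated as met when it is covered anywhere in the corpus, in line with the cross-referencing rule.

```python
MUST_HAVE_PENALTY = 15   # hypothetical weight for "deduct heavily"
SHOULD_HAVE_PENALTY = 5  # hypothetical weight for "deduct moderately"


def score_section(must_have, should_have):
    """Score a section from two dicts mapping criterion -> bool.

    A criterion is True when it is met on the section page or on any other
    page in the corpus.
    """
    score = 100
    score -= MUST_HAVE_PENALTY * sum(not met for met in must_have.values())
    score -= SHOULD_HAVE_PENALTY * sum(not met for met in should_have.values())
    return max(0, score)  # never drop below zero
```

For example, one missing must-have and two missing should-haves gives 100 - 15 - 10 = 75 under these assumed weights.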

3b. Quick Start

Must have:

A minimal, numbered step-by-step path a first-time user can follow in under 5 minutes
Complete working code from zero to first successful call
Expected output for every code snippet
Explanation of any prerequisite service/resource (queues, workers, etc.)

Should have:

Both env-var and inline credential patterns
Multiple install options (pip + poetry, etc.)
Links to deeper docs for each concept introduced

Common gaps to check:

Worker/queue prerequisites — check workers/environments pages
Valid model/runtime IDs — check models/runtimes service pages
Execute call signature accuracy — check api-reference and agents service pages

3c. Error Handling

Must have:

Complete exception hierarchy (all exception classes, inheritance tree)
Every exception class documented with: description, import path, code example
At minimum: authentication, connection, timeout, rate-limit, API/HTTP errors
HTTP status code branching (400/401/403/404/500 at minimum)
At least one resilience pattern (retry with backoff)

Should have:

Resource-specific exceptions (one per major service)
Streaming error handling
Context manager cleanup pattern
Testing/pytest examples for error scenarios
Do/don't anti-pattern examples

Cross-reference check — critical:

List every exception class that appears in the api-reference or service pages
Flag any that are missing from this page
Check execute/method call signatures match across all pages — flag any discrepancies
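The exception cross-reference can be approximated with a name scan over the fetched text. This sketch assumes exception classes follow the common `SomethingError` / `SomethingException` naming convention; it will miss classes named otherwise and will also pick up built-ins such as `ValueError`, so treat the result as a candidate list to review. The helper names are hypothetical.

```python
import re

# CamelCase identifiers ending in "Error" or "Exception" (bare "Error" is skipped).
EXCEPTION_NAME = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:Error|Exception)\b")


def find_exception_names(text):
    return set(EXCEPTION_NAME.findall(text))


def missing_exceptions(error_page_text, other_pages):
    """Return {exception name: [urls]} for names absent from the Error Handling page.

    `other_pages` maps url -> plain text of the api-reference and service pages.
    """
    documented = find_exception_names(error_page_text)
    missing = {}
    for url, text in other_pages.items():
        for name in find_exception_names(text) - documented:
            missing.setdefault(name, []).append(url)
    return missing
```

Each entry in the result is a candidate finding: an exception mentioned elsewhere in the docs but not on the Error Handling page.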

3d. Troubleshooting

Must have:

Authentication failures with
