SKILL: Dead-End Pages — No Outgoing Internal Links Audit
DESCRIPTION
Finds every blog/content page on a site that has zero outgoing links to any URL on the same domain in its body content. These pages absorb link equity but pass none on — they are structural dead-ends that silently suppress topical authority and hurt crawl efficiency across the site.
For each dead-end page the skill generates 3 suggestions: pages on the same site that the dead-end page SHOULD link out to, including the exact anchor text (the target page's top keyword), where in the dead-end page to place the link, and a ready-to-paste context copy sentence with the link already embedded.
This skill is the structural inverse of the Orphan Pages skill:
- Orphan Pages: pages with no incoming internal links (nobody links TO them)
- Dead-End Pages: pages with no outgoing internal links (they link out to nobody)
TOOLS
- Step 1 uses curl via bash_tool for sitemap discovery
- Step 2 uses curl via bash_tool for framework detection
- Step 3 uses curl via bash_tool for outgoing link detection (method varies by framework)
- Step 5 uses Ahrefs MCP (
site-explorer-top-pages) for keyword research
WORKFLOW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 1 — DISCOVER ALL PAGES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHY THIS STEP EXISTS You need a complete list of all blog/content pages to check. Curl-based sitemap discovery is fast, free, and reflects the live site — not a stale crawl cache.
SUBSTEP 1A — FIND THE SITEMAP
Run in order until one returns valid XML:
curl -sL --max-time 15 "https://www.DOMAIN.com/sitemap.xml" | head -50
curl -sL --max-time 15 "https://www.DOMAIN.com/sitemap_index.xml" | head -50
curl -sL --max-time 15 "https://www.DOMAIN.com/robots.txt" | grep -i sitemap
If none work, crawl the blog index directly:
curl -sL "https://www.DOMAIN.com/blog/" | grep -oP 'href="[^"]*"' | grep '/blog/'
SUBSTEP 1B — EXTRACT ALL CONTENT URLS
curl -sL "https://www.DOMAIN.com/sitemap.xml" \
| grep -oP '(?<=<loc>)[^<]+' \
| grep '/blog/' \
| grep -v '/blog/$' \
| grep -v '/page/' \
| sort -u > /tmp/blog_urls.txt
wc -l /tmp/blog_urls.txt
Call this LIST_ALL. Record TOTAL_PAGES = length of LIST_ALL.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 2 — DETECT SITE FRAMEWORK ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHY THIS STEP EXISTS The detection method for outgoing links depends entirely on how the site renders HTML. Running the wrong method gives completely wrong results — this step prevents that.
Run both checks:
# Check response headers
curl -sI "https://www.DOMAIN.com" | grep -i "x-powered-by\|x-nextjs\|x-vercel\|server\|generator"
# Check HTML source for framework signals
curl -sL "https://www.DOMAIN.com" \
| grep -o 'name="generator"[^>]*\|__NEXT_DATA__\|_next\|__nuxt\|ng-version\|gatsby'
DECISION TABLE:
| Signal found | Framework | Method |
|---|---|---|
| x-nextjs-prerender in headers | Next.js | Step 3A |
| _next or NEXT_DATA in HTML | Next.js | Step 3A |
| x-vercel-cache in headers + _next | Next.js/Vercel | Step 3A |
| gatsby in HTML source | Gatsby (SSG) | Step 3B |
| name="generator" content="WordPress" | WordPress | Step 3B |
| x-powered-by: PHP | Traditional CMS | Step 3B |
| Static HTML (<a> tags present) | Static/SSG | Step 3B |
| Zero <a> tags in curl response | Unknown SPA | Step 3C |
| __nuxt in HTML | Nuxt.js | Step 3C |
IMPORTANT: Always test the detection method on 3 sample pages before running the full list. Verify that a page you know has links returns links, and a page you know is sparse returns few/none. If results look wrong, re-read this step.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 3 — DETECT OUTGOING INTERNAL LINKS PER PAGE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Run the method determined in Step 2.
──────────────────────────────────────── STEP 3A — NEXT.JS SITES (RSC HEADER METHOD) ────────────────────────────────────────
HOW IT WORKS
Next.js with React Server Components exposes a server-rendered payload endpoint.
By sending RSC: 1 as a request header, the server returns the full rendered page
data as a serialised React component tree — including all links — without needing
a headless browser. This works on all Next.js sites using the App Router (Next.js 13+).
This is the correct method for any site running on Next.js / Vercel, regardless of whether it uses ISR, SSG, or SSR. The RSC payload always contains the rendered content with all internal links visible as plain text paths.
HOW TO VALIDATE THE FETCH WORKED A valid RSC response:
- Is always larger than 10KB (most blog posts return 50KB–250KB)
- Starts with React markers like
$Sreact.fragmentorJ:or1:"$ - If a page returns less than 10KB or has no
$in the first 500 chars, the fetch failed — flag that page for manual review, do NOT mark it as a dead-end
import subprocess, re, time
domain = "DOMAIN.com" # without www, e.g. "infrasity.com"
blog_prefix = "/blog/" # content prefix, e.g. "/blog/"
with open('/tmp/blog_urls.txt') as f:
urls = [u.strip() for u in f if u.strip()]
# Asset extensions to exclude — these are not outgoing content links
SKIP_EXT = (
'.png','.webp','.jpg','.jpeg','.gif','.svg','.ico',
'.woff2','.woff','.ttf','.eot','.otf',
'.css','.js','.jsx','.ts','.tsx',
'.xml','.txt','.pdf','.zip','.mp4','.mp3','.webm'
)
# Path prefixes to exclude — assets, CDN, internal Next.js paths
SKIP_PREFIX = (
'/PostImages/', '/images/', '/fonts/', '/favicon',
'/_next/', '/_vercel/', '/static/',
'/api/', '//cdn.', '//assets.'
)
def get_outgoing_links(url):
result = subprocess.run([
'curl', '-sL', '--max-time', '10',
'-H', 'RSC: 1',
'-H', 'Next-Router-State-Tree: %5B%22%22%2C%7B%22children%22%3A%5B%22__PAGE__%22%2C%7B%7D%5D%7D%2Cnull%2Cnull%2Ctrue%5D',
url
], capture_output=True, text=True, timeout=15)
data = result.stdout
# Validate RSC fetch succeeded
if len(data) < 10000 or not any(m in data[:500] for m in ['$S', 'J:', '1:"']):
return None # Fetch failed — flag for manual review
# Extract all paths pointing to the same domain
# Extract both absolute domain links and relative paths starting with /
# (excluding protocol-relative links starting with //)
raw = re.findall(
rf'(?:(?:https?://)?(?:www\\.)?{re.escape(domain)}|(?<!/))/(?!/)([^\\s\"\\\'\\\\),\\]]+)',
data
)
raw = ['/' + r for r in raw]
clean = set()
for l in raw:
# Normalise: strip trailing junk
l = l.split('\\')[0].split('"')[0].split("'")[0].rstrip('),.]\\/')
if len(l) < 2:
continue
# Exclude assets
if any(l.lower().endswith(e) for e in SKIP_EXT):
continue
if any(l.startswith(p) for p in SKIP_PREFIX):
continue
# Exclude self-link
slug = url.rstrip('/').split('/')[-1]
if l.rstrip('/') == blog_prefix.rstrip('/') + '/' + slug:
continue
clean.add(l)
return clean
dead_ends = []
has_outlinks = []
failed = []
for i, url in enumerate(urls):
slug = url.split('/')[-1]
try:
links = get_outgoing_links(url)
if links is None:
failed.append(url)
print(f"[{i+1:3d}] FAILED {slug}")
elif len(links) == 0:
dead_ends.append(url)
print(f"[{i+1:3d}] DEAD-END {slug}")
else:
has_outlinks.append(url)
print(f"[{i+1:3d}] LINKS({len(links):2d}) {slug}")
except Exception as e:
failed.append(url)
print(f"