SKILL: Dead-End Pages — No Outgoing Internal Links Audit

DESCRIPTION

Finds every blog/content page on a site that has zero outgoing links to any URL on the same domain in its body content. These pages absorb link equity but pass none on — they are structural dead-ends that silently suppress topical authority and hurt crawl efficiency across the site.

For each dead-end page the skill generates 3 suggestions: pages on the same site that the dead-end page SHOULD link out to, including the exact anchor text (the target page's top keyword), where in the dead-end page to place the link, and a ready-to-paste context copy sentence with the link already embedded.

This skill is the structural inverse of the Orphan Pages skill:

Orphan Pages: pages with no incoming internal links (nobody links TO them)
Dead-End Pages: pages with no outgoing internal links (they link out to nobody)

TOOLS

Step 1 uses curl via bash_tool for sitemap discovery
Step 2 uses curl via bash_tool for framework detection
Step 3 uses curl via bash_tool for outgoing link detection (method varies by framework)
Step 5 uses Ahrefs MCP (site-explorer-top-pages) for keyword research

WORKFLOW

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 1 — DISCOVER ALL PAGES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

WHY THIS STEP EXISTS You need a complete list of all blog/content pages to check. Curl-based sitemap discovery is fast, free, and reflects the live site — not a stale crawl cache.

SUBSTEP 1A — FIND THE SITEMAP

Run in order until one returns valid XML:

curl -sL --max-time 15 "https://www.DOMAIN.com/sitemap.xml" | head -50
curl -sL --max-time 15 "https://www.DOMAIN.com/sitemap_index.xml" | head -50
curl -sL --max-time 15 "https://www.DOMAIN.com/robots.txt" | grep -i sitemap

If none work, crawl the blog index directly:

curl -sL "https://www.DOMAIN.com/blog/" | grep -oP 'href="[^"]*"' | grep '/blog/'

SUBSTEP 1B — EXTRACT ALL CONTENT URLS

curl -sL "https://www.DOMAIN.com/sitemap.xml" \
  | grep -oP '(?<=<loc>)[^<]+' \
  | grep '/blog/' \
  | grep -v '/blog/$' \
  | grep -v '/page/' \
  | sort -u > /tmp/blog_urls.txt

wc -l /tmp/blog_urls.txt

Call this LIST_ALL. Record TOTAL_PAGES = length of LIST_ALL.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 2 — DETECT SITE FRAMEWORK ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

WHY THIS STEP EXISTS The detection method for outgoing links depends entirely on how the site renders HTML. Running the wrong method gives completely wrong results — this step prevents that.

Run both checks:

# Check response headers
curl -sI "https://www.DOMAIN.com" | grep -i "x-powered-by\|x-nextjs\|x-vercel\|server\|generator"

# Check HTML source for framework signals
curl -sL "https://www.DOMAIN.com" \
  | grep -o 'name="generator"[^>]*\|__NEXT_DATA__\|_next\|__nuxt\|ng-version\|gatsby'

DECISION TABLE:

Signal found	Framework	Method
x-nextjs-prerender in headers	Next.js	Step 3A
_next or NEXT_DATA in HTML	Next.js	Step 3A
x-vercel-cache in headers + _next	Next.js/Vercel	Step 3A
gatsby in HTML source	Gatsby (SSG)	Step 3B
name="generator" content="WordPress"	WordPress	Step 3B
x-powered-by: PHP	Traditional CMS	Step 3B
Static HTML (<a> tags present)	Static/SSG	Step 3B
Zero <a> tags in curl response	Unknown SPA	Step 3C
__nuxt in HTML	Nuxt.js	Step 3C

IMPORTANT: Always test the detection method on 3 sample pages before running the full list. Verify that a page you know has links returns links, and a page you know is sparse returns few/none. If results look wrong, re-read this step.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 3 — DETECT OUTGOING INTERNAL LINKS PER PAGE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Run the method determined in Step 2.

──────────────────────────────────────── STEP 3A — NEXT.JS SITES (RSC HEADER METHOD) ────────────────────────────────────────

HOW IT WORKS Next.js with React Server Components exposes a server-rendered payload endpoint. By sending RSC: 1 as a request header, the server returns the full rendered page data as a serialised React component tree — including all links — without needing a headless browser. This works on all Next.js sites using the App Router (Next.js 13+).

This is the correct method for any site running on Next.js / Vercel, regardless of whether it uses ISR, SSG, or SSR. The RSC payload always contains the rendered content with all internal links visible as plain text paths.

HOW TO VALIDATE THE FETCH WORKED A valid RSC response:

Is always larger than 10KB (most blog posts return 50KB–250KB)
Starts with React markers like $Sreact.fragment or J: or 1:"$
If a page returns less than 10KB or has no $ in the first 500 chars, the fetch failed — flag that page for manual review, do NOT mark it as a dead-end

import subprocess, re, time

domain = "DOMAIN.com"        # without www, e.g. "infrasity.com"
blog_prefix = "/blog/"       # content prefix, e.g. "/blog/"

with open('/tmp/blog_urls.txt') as f:
    urls = [u.strip() for u in f if u.strip()]

# Asset extensions to exclude — these are not outgoing content links
SKIP_EXT = (
    '.png','.webp','.jpg','.jpeg','.gif','.svg','.ico',
    '.woff2','.woff','.ttf','.eot','.otf',
    '.css','.js','.jsx','.ts','.tsx',
    '.xml','.txt','.pdf','.zip','.mp4','.mp3','.webm'
)

# Path prefixes to exclude — assets, CDN, internal Next.js paths
SKIP_PREFIX = (
    '/PostImages/', '/images/', '/fonts/', '/favicon',
    '/_next/', '/_vercel/', '/static/',
    '/api/', '//cdn.', '//assets.'
)

def get_outgoing_links(url):
    result = subprocess.run([
        'curl', '-sL', '--max-time', '10',
        '-H', 'RSC: 1',
        '-H', 'Next-Router-State-Tree: %5B%22%22%2C%7B%22children%22%3A%5B%22__PAGE__%22%2C%7B%7D%5D%7D%2Cnull%2Cnull%2Ctrue%5D',
        url
    ], capture_output=True, text=True, timeout=15)

    data = result.stdout

    # Validate RSC fetch succeeded
    if len(data) < 10000 or not any(m in data[:500] for m in ['$S', 'J:', '1:"']):
        return None  # Fetch failed — flag for manual review

    # Extract all paths pointing to the same domain
    # Extract both absolute domain links and relative paths starting with /
    # (excluding protocol-relative links starting with //)
    raw = re.findall(
        rf'(?:(?:https?://)?(?:www\\.)?{re.escape(domain)}|(?<!/))/(?!/)([^\\s\"\\\'\\\\),\\]]+)',
        data
    )
    raw = ['/' + r for r in raw]

    clean = set()
    for l in raw:
        # Normalise: strip trailing junk
        l = l.split('\\')[0].split('"')[0].split("'")[0].rstrip('),.]\\/')
        if len(l) < 2:
            continue
        # Exclude assets
        if any(l.lower().endswith(e) for e in SKIP_EXT):
            continue
        if any(l.startswith(p) for p in SKIP_PREFIX):
            continue
        # Exclude self-link
        slug = url.rstrip('/').split('/')[-1]
        if l.rstrip('/') == blog_prefix.rstrip('/') + '/' + slug:
            continue
        clean.add(l)

    return clean

dead_ends = []
has_outlinks = []
failed = []

for i, url in enumerate(urls):
    slug = url.split('/')[-1]
    try:
        links = get_outgoing_links(url)
        if links is None:
            failed.append(url)
            print(f"[{i+1:3d}] FAILED   {slug}")
        elif len(links) == 0:
            dead_ends.append(url)
            print(f"[{i+1:3d}] DEAD-END {slug}")
        else:
            has_outlinks.append(url)
            print(f"[{i+1:3d}] LINKS({len(links):2d}) {slug}")
    except Exception as e:
        failed.append(url)
        print(f"

no-outlinks-audit

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

doc-coauthoring

algorithmic-art

seo-aeo-blog-writer

wordpress-centric-high-seo-optimized-blogwriting-skill

Recibe nuevas skills de Escrita e Conteúdo todos los lunes

SKILL: Dead-End Pages — No Outgoing Internal Links Audit

DESCRIPTION

TOOLS

WORKFLOW

Comentarios · Sin comentarios