AI Text Detection Guide
Derived from Wikipedia's empirically documented field guide on LLM writing patterns. Applicable to any text — not just Wikipedia.
How to analyze text
Scan for signals across the categories below. Weight clusters more heavily than isolated occurrences — one "pivotal" is coincidence; five AI-vocab words in a paragraph is a strong signal.
Output format: Quote specific examples from the text, categorize each signal, then give an overall verdict: Likely AI / Possibly AI / Likely Human with brief reasoning.
Important: Newer models (GPT-5.1+, Claude 4+) actively suppress known tells like em dashes and classic AI vocabulary. Absence of classic markers is NOT proof of human origin. Look for subtler patterns: overly cautious hedging, Latinate word preference, "quietly" narratives, structural uniformity, the four deeper tells in Section 12 (abstraction trap, sensing without sensing, treadmill effect, subtext vacuum), and the in-disguise patterns in Sections 14–16 (decorative triplets, drumroll-in-disguise, false idioms/calques) — these are the most reliable signals on modern models.
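The output format described above can be held in a small data structure. The following is a minimal sketch, not a prescribed implementation: the names `Signal`, `VERDICTS`, and `format_report` are hypothetical, and the verdict is supplied by the analyst rather than computed, since this guide defines no numeric thresholds.

```python
from dataclasses import dataclass

# The three verdict labels named in the guide's output format.
VERDICTS = ("Likely AI", "Possibly AI", "Likely Human")


@dataclass
class Signal:
    """One quoted example plus the category it falls under (hypothetical type)."""
    category: str  # e.g. "AI vocabulary", "Copula avoidance"
    quote: str     # exact text quoted from the passage under review


def format_report(signals: list[Signal], verdict: str, reasoning: str) -> str:
    """Render categorized quotes followed by the overall verdict and reasoning."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}")
    lines = [f'- [{s.category}] "{s.quote}"' for s in signals]
    lines.append(f"Verdict: {verdict}. {reasoning}")
    return "\n".join(lines)
```

Keeping the quote next to its category makes it easy to see whether signals cluster in one paragraph or are isolated.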
1. AI Vocabulary Density (strongest signal)
LLMs statistically overuse specific words. Check for density/clusters.
See word-lists.md for the complete list by model era.
Quick scan: pivotal, underscore, tapestry, delve, meticulous, vibrant, intricate, testament, bolstered, garner, fostering, showcasing, align with, enhance, highlighting
One word = coincidence. Five+ in a short passage = strong AI signal.
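A density check along these lines could be sketched as follows. This is an illustration under stated assumptions: it uses only the quick-scan words above (not the full word-lists.md), the helper names are hypothetical, and the default threshold of 5 simply mirrors the "Five+" rule of thumb.

```python
import re

# Quick-scan terms copied from this guide; word-lists.md has the complete list.
QUICK_SCAN = [
    "pivotal", "underscore", "tapestry", "delve", "meticulous", "vibrant",
    "intricate", "testament", "bolstered", "garner", "fostering",
    "showcasing", "align with", "enhance", "highlighting",
]


def _term_pattern(term: str) -> str:
    """Allow a simple suffix on the first word ("enhanced", "aligns with")."""
    first, *rest = term.split()
    pattern = re.escape(first) + r"\w*"
    for word in rest:
        pattern += r"\s+" + re.escape(word)
    return pattern


VOCAB_RE = re.compile(
    r"\b(?:" + "|".join(_term_pattern(t) for t in QUICK_SCAN) + r")\b",
    re.IGNORECASE,
)


def vocab_hits(passage: str) -> list[str]:
    """Return every quick-scan match in order of appearance, lowercased."""
    return [m.group(0).lower() for m in VOCAB_RE.finditer(passage)]


def is_cluster(passage: str, threshold: int = 5) -> bool:
    """True when the passage contains at least `threshold` matches."""
    return len(vocab_hits(passage)) >= threshold
```

Suffix matching is deliberately crude: forms that drop a final "e" (such as "underscoring" or "delving") are missed, so a real scan would need stemming or explicit variants. Run it per paragraph rather than per document, since the signal is density in a short passage.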
2. Content Patterns
Significance inflation
Adds statements about how mundane facts "represent a shift," "mark a pivotal moment," or "contribute to the broader landscape" — even for population data or etymology.
Trigger phrases: stands as, serves as, marks a pivotal, underscores its importance, reflects broader, symbolizing its enduring, setting the stage for, indelible mark, deeply rooted, evolving landscape
Challenges formula
Rigid structure, usually at the end: "Despite its [positive adjective], [subject] faces challenges including... Despite these challenges, [vague optimism about the future]."
Superficial -ing clauses
Tacks present participle phrases onto sentences as pseudo-analysis: "...highlighting its importance," "...underscoring the significance," "...reflecting the broader trend."
Vague attribution weaseling
Attributes claims to no one specific: "Experts argue," "Observers have noted," "Industry reports suggest," "Several sources indicate," "Some critics argue."
Promotional/travel-guide tone
Warm, advertisement-like prose even on neutral topics: nestled, vibrant, rich cultural heritage, breathtaking, diverse array, boasts, showcasing, groundbreaking, renowned, in the heart of.
Overgeneralization from few sources
LLMs take information from 1–2 sources but present it as widely held, or frame a complete enumeration as if it were only a partial one. Watch for: "including..." or "such as..." when the enumeration actually covers everything the sources say; opinions from a single source framed as consensus; "widely regarded," "generally considered" without evidence of breadth.
"Quietly" narratives
LLMs often describe actions or developments as happening "quietly" — a pseudo-narrative detail that adds false drama or modesty. Rare in factual human writing, common across multiple LLM families.
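Several of the content patterns above come with explicit phrase lists, so a first-pass scan can group hits by category. The sketch below is one possible approach, not a definitive detector: `PHRASES` and `scan_content_patterns` are hypothetical names, the phrase lists are copied from this section, and structural patterns such as the challenges formula are not covered because they need more than phrase matching.

```python
import re

# Phrase lists taken from the subsections above.
PHRASES = {
    "significance inflation": [
        "stands as", "serves as", "marks a pivotal", "underscores its importance",
        "reflects broader", "symbolizing its enduring", "setting the stage for",
        "indelible mark", "deeply rooted", "evolving landscape",
    ],
    "vague attribution": [
        "experts argue", "observers have noted", "industry reports suggest",
        "several sources indicate", "some critics argue",
    ],
    "promotional tone": [
        "nestled", "vibrant", "rich cultural heritage", "breathtaking",
        "diverse array", "boasts", "showcasing", "groundbreaking", "renowned",
        "in the heart of",
    ],
    "quietly narrative": ["quietly"],
}


def scan_content_patterns(text: str) -> dict[str, list[str]]:
    """Map each category to the phrases from its list that occur in the text."""
    lowered = text.lower()
    found = {}
    for category, phrases in PHRASES.items():
        hits = [p for p in phrases
                if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
        if hits:
            found[category] = hits
    return found
```

A hit is only a prompt to read the sentence: "serves as" is unremarkable in isolation and matters mainly when several categories fire in the same passage.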
3. Sentence Structure Tells
Copula avoidance
Replaces simple "is/are/has" with elaborate constructions:
- ❌ "serves as the primary hub" → ✅ "is the primary hub"
- ❌ "marks a significant milestone" → ✅ "was a significant milestone"
- ❌ "boasts four separate spaces" → ✅ "has four separate spaces"
Latinate over Saxon
Systematically chooses Latin/Greek-derived words over simpler Germanic equivalents, even in plain contexts:
- ❌ "utilize" → ✅ "use"
- ❌ "facilitate" → ✅ "help"
- ❌ "is devoid of" → ✅ "has no"
- ❌ "commence" → ✅ "start/begin"
- ❌ "ameliorate" → ✅ "improve"
- ❌ "expedite" → ✅ "speed up"
This is distinct from copula avoidance — it's about lexical register, not syntax. As Orwell noted, bad writers are "haunted by the notion that Latin or Greek words are grander than Saxon ones." LLMs embody this tendency statistically.
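The Latinate/Saxon pairs listed above can be turned into a lookup that reports each Latinate form with its plainer alternative. This is a small sketch under assumptions: the table contains only the six pairs from this guide, `latinate_hits` is a hypothetical helper, and inflection handling is limited to a trailing "s" or "d".

```python
import re

# Pairs copied from the list above: Latinate form -> plainer equivalent.
LATINATE_TO_SAXON = {
    "utilize": "use",
    "facilitate": "help",
    "is devoid of": "has no",
    "commence": "start/begin",
    "ameliorate": "improve",
    "expedite": "speed up",
}


def latinate_hits(text: str) -> list[tuple[str, str, int]]:
    """Return (latinate, plain alternative, count) for each form found."""
    lowered = text.lower()
    hits = []
    for latinate, plain in LATINATE_TO_SAXON.items():
        pattern = r"\b" + re.escape(latinate)
        if " " not in latinate:
            pattern += r"(?:s|d)?"  # also catch "utilizes", "utilized"
        count = len(re.findall(pattern + r"\b", lowered))
        if count:
            hits.append((latinate, plain, count))
    return hits
```

As with vocabulary density, a single "utilize" proves nothing; the tell is a consistent preference for the Latinate option where the plain one would do.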
Negative parallelisms
Pseudo-balance: "Not only X, but also Y", "It's not just X, it's Y", "Not X — it's Y." Sounds thoughtful but is formulaic. See Section 14 for the more specific decorative triplets extension of this pattern.
Rule of three
Compulsive tripling: "adjective, adjective, and adjective" or "phrase, phrase, and phrase" even when two or four would be more natural. The decorative subtype is now treated separately — see Section 14.
Elegant variation
Avoids repeating a word by substituting awkward synonyms throughout: subject → "the eponymous figure" → "the key player" → "this individual."
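Two of the structural tells above, negative parallelisms and single-word triplets, are regular enough to count with patterns. The sketch below is a rough heuristic only: the regexes and the name `structure_counts` are assumptions made for illustration, they handle straight apostrophes only, and the triplet pattern sees "word, word, and word" but not tripled phrases.

```python
import re

# "Not only X, but also Y" and "It's not just X, it's Y" within one sentence.
NEGATIVE_PARALLELISM = re.compile(
    r"\bnot only\b[^.!?]*\bbut also\b"
    r"|\b(?:it's|it is) not just\b[^.!?]*\b(?:it's|it is)\b",
    re.IGNORECASE,
)

# Single-word triplets, with or without the serial comma.
TRIPLET = re.compile(r"\b\w+, \w+,? and \w+\b")


def structure_counts(text: str) -> dict[str, int]:
    """Count negative parallelisms and single-word triplets in the text."""
    return {
        "negative_parallelism": len(NEGATIVE_PARALLELISM.findall(text)),
        "triplets": len(TRIPLET.findall(text)),
    }
```

Humans write triplets too, so the count is meaningful only relative to passage length and alongside other signals.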
4. Formatting Tells
- Title Case In Every Section Heading (humans normally use sentence case)
- Excessive boldface — bolding key takeaways scattered throughout body text
- Inline-header lists: the "• **Bold Term**: description text" pattern
- Emoji in headings or bullet points 🎯
- Em dash overuse — used dramatically — where a comma or parenthesis would do — often multiple times per paragraph
- Unnecessary tables for 2–3 data points that should just be prose
- Curly quotes (“ ” ‘ ’) instead of straight quotes (" '), common in ChatGPT/DeepSeek
- Markdown in non-markdown contexts: "**bold**", "# Header", or "- list item" with dashes, appearing in documents, emails, wiki markup, or any other context where markdown is not the native format. Wikipedia added an edit filter specifically for this. Strong signal when combined with other tells.
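Some of the formatting tells in this list are directly countable. The following sketch tallies a few of them; `formatting_signals` is a hypothetical helper, and the counts are raw numbers with no thresholds, since this guide does not define any.

```python
import re


def formatting_signals(text: str) -> dict[str, int]:
    """Count a few mechanically detectable formatting tells."""
    return {
        # U+2014 is the em dash.
        "em_dashes": text.count("\u2014"),
        # Curly double and single quotes (U+201C, U+201D, U+2018, U+2019).
        "curly_quotes": len(re.findall("[\u201c\u201d\u2018\u2019]", text)),
        # Literal **bold** markers left in the text.
        "markdown_bold": len(re.findall(r"\*\*[^*\n]+\*\*", text)),
        # Lines that start with one to six '#' followed by a space.
        "markdown_headers": len(re.findall(r"^#{1,6} \S", text, flags=re.MULTILINE)),
    }
```

Markdown counts only matter when the text comes from a non-markdown context, and curly quotes are also produced by ordinary word processors, so both are weak signals on their own.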
5. Communication Leakage
Text meant for the AI chat interface that got pasted into the document:
- "I hope this helps," "Would you like me to...", "Let me know if you need anything else"
- Knowledge-cutoff disclaimers: "as of my last update," "not widely documented," "maintains a low profile"
- Unfilled placeholders: [INSERT NAME HERE], [Describe the specific section]
- Subject lines in body text: "Subject: Request for Edit"
- Submission statements: "This article meets WP:RS because..."
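Leakage of this kind is among the easiest tells to check mechanically. The sketch below looks for a few of the examples listed above; the names are hypothetical, the phrase list is not exhaustive, and the placeholder pattern only recognizes brackets that open with "INSERT" or "Describe" as in the guide's two examples.

```python
import re

# Chat-interface phrases quoted in the list above (lowercased for matching).
LEAK_PHRASES = [
    "i hope this helps",
    "would you like me to",
    "let me know if you need anything else",
    "as of my last update",
]

# Assumption: placeholders look like [INSERT ...] or [Describe ...].
PLACEHOLDER_RE = re.compile(r"\[(?:INSERT|Describe)\b[^\]\n]*\]")

# An email-style subject line sitting at the start of a line of body text.
SUBJECT_RE = re.compile(r"^Subject:.*$", re.MULTILINE)


def leakage_hits(text: str) -> dict[str, list[str]]:
    """Collect chat phrases, unfilled placeholders, and stray subject lines."""
    lowered = text.lower()
    return {
        "chat_phrases": [p for p in LEAK_PHRASES if p in lowered],
        "placeholders": PLACEHOLDER_RE.findall(text),
        "subject_lines": SUBJECT_RE.findall(text),
    }
```

Unlike vocabulary signals, a single hit here is strong evidence, because these strings have little reason to appear in finished human-written text.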
6. Citation & Markup Artifacts (documents/web content)
Technical residue from specific AI tools:
- turn0search0, turn0search1: ChatGPT search citation artifacts
- contentReference[oaicite:0]{index=0}: ChatGPT reference rendering bug
- oai_citation, +1 inline: ChatGPT markup bugs
- {"attribution":{"attributableIndex":"X-Y"}}: ChatGPT JSON attribution
- utm_source=chatgpt.com or utm_source=openai in URLs
- [attached_file:1], [web:1]: Perplexity artifacts
- <grok_card ...>: Grok artifacts
- Invalid/unresolvable DOIs or ISBNs with correct-looking formats
- Book citations with no page numbers
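Because these artifacts are literal strings, they can be matched with regular expressions. The sketch below also adds an ISBN-13 check-digit test as one cheap way to catch some invalid ISBNs with correct-looking formats. Two assumptions are worth stating: the numeric parts of the artifact patterns are generalized from the single examples above (for instance `turn\d+search\d+` from "turn0search0"), and the function names are hypothetical.

```python
import re

# Patterns generalized from the examples in the list above.
ARTIFACT_PATTERNS = {
    "chatgpt_search_citation": r"turn\d+search\d+",
    "chatgpt_content_reference": r"contentReference\[oaicite:\d+\]\{index=\d+\}",
    "chatgpt_oai_citation": r"oai_citation",
    "chatgpt_utm": r"utm_source=(?:chatgpt\.com|openai)",
    "perplexity_marker": r"\[(?:attached_file|web):\d+\]",
    "grok_card": r"<grok_card\b",
}


def find_artifacts(text: str) -> dict[str, list[str]]:
    """Return the artifact types present, each with its matched strings."""
    found = {}
    for name, pattern in ARTIFACT_PATTERNS.items():
        matches = re.findall(pattern, text)
        if matches:
            found[name] = matches
    return found


def isbn13_checksum_ok(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3; sum % 10 == 0)."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

A failed checksum shows the ISBN is malformed, but a passing one does not show the book exists; resolving the ISBN or DOI against a catalogue is still needed for that.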
7. Image & Caption Tells
LLMs select illustrations by keyword-matching file names or titles, not by actual visual relevance:
- Generic images loosely related to the topic but not to the specific content (e.g., a 19th-century painting of a waiter for a modern hospitality article)
- Captions that inflate a trivial detail from the image metadata: "1946, marking an important year in..." — significance inflation applied to image dates
- Multiple images that all illustrate the same general concept rather than different aspects of the text
8. Argumentative & Discussion Tells
When LLMs generate persuasive, argumentative, or discussion text (not just articles):
- Rhetorical anaphora — repeating sentence openers for emphasis in contexts where it's inappropriate (e.g., formal discussions, business emails)
- Needless length — walls of text that reiterate the same points in slightly different wording
- Hallucinated references — citing policies, rules, or frameworks that don't exist or don't appl