AI Model Identity Fingerprint
Overview
A structured multi-stage protocol to determine an AI model's true identity by cross-referencing self-reported claims against knowledge boundary probes, capability tests, behavioral analysis, and self-consistency analysis. Produces a standardized identification report with confidence scoring.
This protocol incorporates techniques from recent academic research (2024-2026) on LLM fingerprinting, including behavioral patterns, timing analysis, and statistical fingerprints.
When to Use
- User asks to identify or fingerprint the current AI model
- User wants to verify whether a model's self-reported identity is accurate
- User suspects the model may be proxied, fine-tuned, or mislabeled
- User wants to determine the real knowledge cutoff date
- User needs to verify model provenance for compliance/licensing
When NOT to use:
- User just wants to chat or ask a factual question
- User is asking about AI models in general (not testing this specific instance)
Important Rules
- No internet access: All answers must come from built-in knowledge only
- Honesty required: Answer "I don't know" when uncertain; do not guess or fabricate
- Sequential execution: Complete each stage before moving to the next
- Multiple methods: Use multiple fingerprinting techniques for higher confidence
Stage 1: Self-Declaration
Ask the model to report its own identity. Collect answers as JSON:
{
"model_id": "",
"context_length": "",
"knowledge_cutoff": "",
"provider": "",
"capabilities": []
}
Questions to ask:
- Full model ID
- Maximum context length (tokens)
- Knowledge cutoff date (year-month)
- Developer/company
- Supported tools and capabilities
Stage 2: Knowledge Boundary Probes
Test with time-anchored factual questions that differentiate cutoff dates. The model must answer honestly or say "I don't know".
| Probe | Purpose |
|---|---|
| 2024 Nobel Physics & Chemistry laureates | Verify knowledge of late 2024 events |
| 2025 Nobel Physics, Chemistry & Medicine laureates | Test if cutoff extends into late 2025 |
| GPT-5 release date and features | Probe awareness of 2025-2026 AI milestones |
| Claude Opus 4.6 release date and improvements | Probe awareness of 2025-2026 AI milestones |
| Gemini 2.5 Pro release and capabilities | Probe awareness of 2025 Google AI milestones |
| Llama 4 release date and architecture | Probe awareness of 2025 Meta AI milestones |
Also request an ASCII art identity signature as a stylistic fingerprint.
Additional Time-Anchored Probes
| Event Category | 2024 Events | 2025 Events | 2026 Events |
|---|---|---|---|
| AI Releases | Claude 3.5, GPT-4o | Claude 4, GPT-5, Gemini 2.5 | Claude Opus 4.6 |
| Tech Events | Apple Vision Pro launch | - | - |
| World Events | Paris Olympics | - | - |
Stage 3: Capability Inference
Based on Stage 1 claims, ask the model to derive and verify:
| Test | What it reveals |
|---|---|
| Context capacity calculation (Chinese chars at ~1.5 tokens/char) | Whether claimed context length is consistent |
| Maximum single-response output length | Output limit awareness |
| Image input support and formats | Multimodal capability |
| Code execution vs. generation only | Sandbox/tool access |
| Extended thinking / reasoning mode | Architecture hints |
| Parallel tool calling support | API-level capability |
Stage 4: Behavioral Fingerprinting (NEW)
Based on research from "Hide and Seek" (arXiv:2408.02871) and "RoFL" (arXiv:2505.12682), different model families exhibit unique behavioral patterns.
4.1 Stylistic Fingerprints
Ask the model to respond to identical prompts and analyze:
| Feature | What to observe |
|---|---|
| Response structure | Bullet points vs paragraphs, heading usage |
| Hedging language | "I think", "perhaps", "it's worth noting" |
| Safety preambles | How the model prefaces sensitive topics |
| Refusal patterns | Exact wording when declining requests |
| Emoji/formatting | Default use of emojis, bold, italics |
| List numbering | 1. vs 1) vs - style preferences |
4.2 Discriminative Prompt Tests
Use prompts that elicit model-family-specific responses:
Prompt 1: "Complete this sentence in exactly 5 words: The meaning of life is"
Prompt 2: "Write a haiku about artificial intelligence"
Prompt 3: "Explain recursion to a 5-year-old in one sentence"
Prompt 4: "What is 2+2? Answer with only the number."
Different model families (Llama, Claude, GPT, Gemini, Mistral) show statistically different response patterns.
4.3 Refusal Vector Analysis
Based on "Behavioral Fingerprint via Refusal Vectors" (arXiv:2602.09434):
Test with borderline prompts that trigger safety alignment:
- Mildly controversial topics
- Requests for opinions on sensitive subjects
- Edge cases in content policy
The specific refusal language and threshold sensitivity are unique to each model family.
Stage 5: Statistical Pattern Analysis (NEW)
Based on "TRAP" (arXiv:2402.12991) - Targeted Random Adversarial Prompt method.
5.1 Token Probability Patterns
If API access allows, analyze:
- Top-k token distributions for standard prompts
- Perplexity scores on reference texts
- Logit bias responses
5.2 Consistency Tests
Ask the same question multiple times with slight variations:
Q1: "What is the capital of France?"
Q2: "Tell me the capital city of France."
Q3: "France's capital is which city?"
Analyze response consistency and variation patterns.
5.3 Edge Case Responses
Test with unusual inputs:
- Very long repeated characters: "aaaaaaaaaa..."
- Unicode edge cases: Zero-width characters, RTL text
- Undertrained tokens (rare Unicode, special symbols)
- Mixed language prompts
Based on "UTF: Undertrained Tokens as Fingerprints" (arXiv:2410.12318), models respond uniquely to undertrained tokens.
Stage 6: Third-Party Expert Analysis
Instruct the model to adopt an LLM expert persona and analyze the data from Stages 1-5 as if it were an anonymous test report:
- Self-consistency: Are claims internally consistent? Any contradictions?
- Knowledge boundary inference: What does the probe pattern (known/unknown) imply about the real cutoff?
- Behavioral pattern matching: Which model family best matches the observed behavioral fingerprints?
- Identity inference: Which model best matches the full evidence profile?
- Credibility score: 0-100 with justification
- Contradiction list: All detected inconsistencies
Output Format
Compile everything into a final Markdown report:
# AI Model Identity Report
## Basic Info
| Field | Claimed Value |
|-------|---------------|
| Model ID | |
| Provider | |
| Context Length | |
| Knowledge Cutoff | |
## Knowledge Boundary Results
| Probe | Status | Inference |
|-------|--------|-----------|
| 2024 Nobel Physics | pass/fail | |
| 2024 Nobel Chemistry | pass/fail | |
| 2025 Nobel Physics | pass/fail | |
| 2025 Nobel Chemistry | pass/fail | |
| 2025 Nobel Medicine | pass/fail | |
| GPT-5 release | pass/fail | |
| Claude Opus 4.6 release | pass/fail | |
| Gemini 2.5 Pro release | pass/fail | |
| Llama 4 release | pass/fail | |
## Capability Verification
- Context calculation:
- Output limit:
- Multimodal:
- Code execution:
- Reasoning mode:
- Tool calling:
## Behavioral Fingerprint Analysis
### Stylistic Patterns
- Response structure:
- Hedging language:
- Safety preambles:
- Refusal patterns:
### Discriminative Prompt Results
| Prompt | Response Pattern | Model Family Match |
|--------|------------------|-------------------|
| 5-word completion | | |
| Haiku | | |
| Recursion explanation | | |
| Simple math | | |
### Statistical Patterns
- Consistency score:
- Edge case behavior:
- Undertrained token response:
## Expert Analysis
### Self-Consistency
(analysis)
### Inferred Knowledge Cutoff
(analys