SSkilltecabyclaudinhocode
Enviar skill
← Voltar para o catálogo

fingerprint

Desenvolvimento

Use when the user asks to identify, fingerprint, or verify an AI model's true identity, knowledge cutoff, or capabilities. Triggers on phrases like "model identification", "who are you really", "verify model identity", "knowledge cutoff test".

1estrelas
Ver no GitHub ↗Autor: ParticleGLicença: MIT

AI Model Identity Fingerprint

Overview

A structured multi-stage protocol to determine an AI model's true identity by cross-referencing self-reported claims against knowledge boundary probes, capability tests, behavioral analysis, and self-consistency analysis. Produces a standardized identification report with confidence scoring.

This protocol incorporates techniques from recent academic research (2024-2026) on LLM fingerprinting, including behavioral patterns, timing analysis, and statistical fingerprints.

When to Use

  • User asks to identify or fingerprint the current AI model
  • User wants to verify whether a model's self-reported identity is accurate
  • User suspects the model may be proxied, fine-tuned, or mislabeled
  • User wants to determine the real knowledge cutoff date
  • User needs to verify model provenance for compliance/licensing

When NOT to use:

  • User just wants to chat or ask a factual question
  • User is asking about AI models in general (not testing this specific instance)

Important Rules

  • No internet access: All answers must come from built-in knowledge only
  • Honesty required: Answer "I don't know" when uncertain; do not guess or fabricate
  • Sequential execution: Complete each stage before moving to the next
  • Multiple methods: Use multiple fingerprinting techniques for higher confidence

Stage 1: Self-Declaration

Ask the model to report its own identity. Collect answers as JSON:

{
  "model_id": "",
  "context_length": "",
  "knowledge_cutoff": "",
  "provider": "",
  "capabilities": []
}

Questions to ask:

  1. Full model ID
  2. Maximum context length (tokens)
  3. Knowledge cutoff date (year-month)
  4. Developer/company
  5. Supported tools and capabilities

Stage 2: Knowledge Boundary Probes

Test with time-anchored factual questions that differentiate cutoff dates. The model must answer honestly or say "I don't know".

ProbePurpose
2024 Nobel Physics & Chemistry laureatesVerify knowledge of late 2024 events
2025 Nobel Physics, Chemistry & Medicine laureatesTest if cutoff extends into late 2025
GPT-5 release date and featuresProbe awareness of 2025-2026 AI milestones
Claude Opus 4.6 release date and improvementsProbe awareness of 2025-2026 AI milestones
Gemini 2.5 Pro release and capabilitiesProbe awareness of 2025 Google AI milestones
Llama 4 release date and architectureProbe awareness of 2025 Meta AI milestones

Also request an ASCII art identity signature as a stylistic fingerprint.

Additional Time-Anchored Probes

Event Category2024 Events2025 Events2026 Events
AI ReleasesClaude 3.5, GPT-4oClaude 4, GPT-5, Gemini 2.5Claude Opus 4.6
Tech EventsApple Vision Pro launch--
World EventsParis Olympics--

Stage 3: Capability Inference

Based on Stage 1 claims, ask the model to derive and verify:

TestWhat it reveals
Context capacity calculation (Chinese chars at ~1.5 tokens/char)Whether claimed context length is consistent
Maximum single-response output lengthOutput limit awareness
Image input support and formatsMultimodal capability
Code execution vs. generation onlySandbox/tool access
Extended thinking / reasoning modeArchitecture hints
Parallel tool calling supportAPI-level capability

Stage 4: Behavioral Fingerprinting (NEW)

Based on research from "Hide and Seek" (arXiv:2408.02871) and "RoFL" (arXiv:2505.12682), different model families exhibit unique behavioral patterns.

4.1 Stylistic Fingerprints

Ask the model to respond to identical prompts and analyze:

FeatureWhat to observe
Response structureBullet points vs paragraphs, heading usage
Hedging language"I think", "perhaps", "it's worth noting"
Safety preamblesHow the model prefaces sensitive topics
Refusal patternsExact wording when declining requests
Emoji/formattingDefault use of emojis, bold, italics
List numbering1. vs 1) vs - style preferences

4.2 Discriminative Prompt Tests

Use prompts that elicit model-family-specific responses:

Prompt 1: "Complete this sentence in exactly 5 words: The meaning of life is"
Prompt 2: "Write a haiku about artificial intelligence"
Prompt 3: "Explain recursion to a 5-year-old in one sentence"
Prompt 4: "What is 2+2? Answer with only the number."

Different model families (Llama, Claude, GPT, Gemini, Mistral) show statistically different response patterns.

4.3 Refusal Vector Analysis

Based on "Behavioral Fingerprint via Refusal Vectors" (arXiv:2602.09434):

Test with borderline prompts that trigger safety alignment:

  • Mildly controversial topics
  • Requests for opinions on sensitive subjects
  • Edge cases in content policy

The specific refusal language and threshold sensitivity are unique to each model family.

Stage 5: Statistical Pattern Analysis (NEW)

Based on "TRAP" (arXiv:2402.12991) - Targeted Random Adversarial Prompt method.

5.1 Token Probability Patterns

If API access allows, analyze:

  • Top-k token distributions for standard prompts
  • Perplexity scores on reference texts
  • Logit bias responses

5.2 Consistency Tests

Ask the same question multiple times with slight variations:

Q1: "What is the capital of France?"
Q2: "Tell me the capital city of France."
Q3: "France's capital is which city?"

Analyze response consistency and variation patterns.

5.3 Edge Case Responses

Test with unusual inputs:

  • Very long repeated characters: "aaaaaaaaaa..."
  • Unicode edge cases: Zero-width characters, RTL text
  • Undertrained tokens (rare Unicode, special symbols)
  • Mixed language prompts

Based on "UTF: Undertrained Tokens as Fingerprints" (arXiv:2410.12318), models respond uniquely to undertrained tokens.

Stage 6: Third-Party Expert Analysis

Instruct the model to adopt an LLM expert persona and analyze the data from Stages 1-5 as if it were an anonymous test report:

  1. Self-consistency: Are claims internally consistent? Any contradictions?
  2. Knowledge boundary inference: What does the probe pattern (known/unknown) imply about the real cutoff?
  3. Behavioral pattern matching: Which model family best matches the observed behavioral fingerprints?
  4. Identity inference: Which model best matches the full evidence profile?
  5. Credibility score: 0-100 with justification
  6. Contradiction list: All detected inconsistencies

Output Format

Compile everything into a final Markdown report:

# AI Model Identity Report

## Basic Info
| Field | Claimed Value |
|-------|---------------|
| Model ID | |
| Provider | |
| Context Length | |
| Knowledge Cutoff | |

## Knowledge Boundary Results
| Probe | Status | Inference |
|-------|--------|-----------|
| 2024 Nobel Physics | pass/fail | |
| 2024 Nobel Chemistry | pass/fail | |
| 2025 Nobel Physics | pass/fail | |
| 2025 Nobel Chemistry | pass/fail | |
| 2025 Nobel Medicine | pass/fail | |
| GPT-5 release | pass/fail | |
| Claude Opus 4.6 release | pass/fail | |
| Gemini 2.5 Pro release | pass/fail | |
| Llama 4 release | pass/fail | |

## Capability Verification
- Context calculation:
- Output limit:
- Multimodal:
- Code execution:
- Reasoning mode:
- Tool calling:

## Behavioral Fingerprint Analysis
### Stylistic Patterns
- Response structure:
- Hedging language:
- Safety preambles:
- Refusal patterns:

### Discriminative Prompt Results
| Prompt | Response Pattern | Model Family Match |
|--------|------------------|-------------------|
| 5-word completion | | |
| Haiku | | |
| Recursion explanation | | |
| Simple math | | |

### Statistical Patterns
- Consistency score:
- Edge case behavior:
- Undertrained token response:

## Expert Analysis

### Self-Consistency
(analysis)

### Inferred Knowledge Cutoff
(analys

Como adicionar

/plugin marketplace add ParticleG/llm-fingerprint-skill

O comando exato pode variar conforme o repositório. Confira o README no GitHub.

Comentários · Nenhum comentário

Entre para comentar. Entrar

  • Ainda não há comentários. Seja o primeiro.