OWASP Top 10 for LLM Applications Security Audit
This skill enables AI agents to perform a comprehensive security assessment of Large Language Model (LLM) and Generative AI applications using the OWASP Top 10 for LLM Applications 2025, published by the OWASP GenAI Security Project.
The OWASP Top 10 for LLM Applications identifies the most critical security risks in systems that integrate large language models, covering vulnerabilities from prompt injection to unbounded resource consumption. This is the authoritative industry standard for LLM application security.
Use this skill to identify security vulnerabilities, assess risk exposure, prioritize remediation, and establish secure development practices for AI-powered applications.
Combine with "NIST AI RMF" for comprehensive risk management or "ISO 42001 AI Governance" for governance compliance.
When to Use This Skill
Invoke this skill when:
- Auditing security of LLM-powered applications before deployment
- Reviewing GenAI integrations for security vulnerabilities
- Assessing RAG (Retrieval-Augmented Generation) systems
- Evaluating chatbot or AI assistant security
- Conducting penetration testing of AI features
- Building secure AI application architectures
- Reviewing third-party AI API integrations
- Preparing for security compliance reviews
- Responding to AI-related security incidents
Inputs Required
When executing this audit, gather:
- application_description: Description of the AI application (purpose, LLM used, architecture, features, user base) [REQUIRED]
- architecture_details: System architecture (APIs, databases, vector stores, plugins, integrations) [OPTIONAL but recommended]
- llm_provider: LLM provider and model (OpenAI GPT-4, Anthropic Claude, self-hosted, etc.) [OPTIONAL]
- deployment_context: Deployment environment (cloud, on-premise, hybrid, edge) [OPTIONAL]
- data_sensitivity: Types of data processed (PII, financial, health, proprietary) [OPTIONAL]
- existing_controls: Current security measures (auth, rate limiting, content filtering) [OPTIONAL]
- specific_concerns: Known vulnerabilities or areas of focus [OPTIONAL]
The OWASP Top 10 for LLM Applications (2025)
LLM01: Prompt Injection
Severity: Critical
Description: Attackers manipulate LLM operations through crafted inputs, either directly or indirectly, to bypass intended functionality, access unauthorized data, or trigger unintended actions.
Attack Vectors:
- Direct injection: Malicious user prompts containing override commands
- Indirect injection: Hidden instructions in external content (web pages, documents, emails) processed by the LLM
- Jailbreaks: Techniques to bypass safety constraints and content policies
Impact:
- Unauthorized data access and exfiltration
- Bypass of content safety filters
- Manipulation of downstream system actions
- Social engineering of users through manipulated outputs
Assessment Checklist:
- Input sanitization and validation implemented
- System prompts separated from user inputs with clear delimiters
- Least privilege applied to LLM backend access
- Output validation before downstream actions
- Human-in-the-loop for critical operations
- Adversarial testing conducted with known injection techniques
- Content filtering layers applied pre- and post-LLM
Mitigation Strategies:
- Enforce privilege controls on LLM backend access
- Segregate external content from user prompts
- Maintain human oversight for critical functions
- Implement input/output validation pipelines
- Conduct regular adversarial testing
LLM02: Sensitive Information Disclosure
Severity: Critical
Description: LLMs inadvertently expose confidential data including PII, proprietary algorithms, credentials, intellectual property, or internal system information through their outputs.
Attack Vectors:
- Crafted prompts designed to extract training data
- Legitimate queries that trigger memorized sensitive content
- Model outputs revealing internal system architecture
- Embedding leakage from vector databases
Impact:
- Privacy violations and regulatory non-compliance (GDPR, CCPA)
- Intellectual property theft
- Credential exposure enabling further attacks
- Reputational damage
Assessment Checklist:
- PII and sensitive data removed from training/fine-tuning data
- Data masking and tokenization in logs and outputs
- System instructions forbidding sensitive disclosures
- Output filtering for known sensitive patterns (SSN, credit cards, API keys)
- Model access restricted to necessary information via middleware
- User education against pasting confidential content
- Output monitoring for anomalous data exposure
Mitigation Strategies:
- Sanitize training data to remove sensitive information
- Implement data loss prevention (DLP) on outputs
- Apply access controls limiting model's data reach
- Monitor outputs for sensitive data patterns
- Use differential privacy techniques in training
LLM03: Supply Chain Vulnerabilities
Severity: High
Description: Compromised third-party components (models, datasets, libraries, plugins) introduce security risks including malware, backdoors, or biased behavior.
Attack Vectors:
- Malicious pre-trained models from public repositories
- Poisoned datasets with embedded triggers
- Vulnerable ML libraries and dependencies
- Compromised plugins with unauthorized access
- Trojanized fine-tuning adapters
Impact:
- System compromise and data theft
- Backdoor access to production systems
- Model corruption affecting all users
- Legal liability from unlicensed content
Assessment Checklist:
- Models sourced from verified, reputable providers
- Digital signatures and checksums verified
- Model files scanned for suspicious code (picklescan, etc.)
- Third-party models deployed in sandboxed environments
- Dependencies regularly updated and audited
- Plugin permissions restricted with allowlists
- Complete inventory of all models and components maintained
- SBOM (Software Bill of Materials) maintained for AI components
Mitigation Strategies:
- Source models only from trusted, verified providers
- Scan model files for malicious code before deployment
- Sandbox third-party models with restricted permissions
- Maintain updated dependency inventory
- Implement model signing and integrity verification
LLM04: Data and Model Poisoning
Severity: High
Description: Attackers manipulate training or fine-tuning data to introduce vulnerabilities, backdoors, or biases that compromise model security and reliability.
Attack Vectors:
- Crafted training examples with hidden trigger phrases
- Poisoned web-scraped content absorbed during training
- Direct tampering with model weights or parameters
- Malicious fine-tuning data
- Subtle label manipulation or data anomalies
Impact:
- Biased or degraded model outputs
- Trigger-activated backdoors in production
- Erosion of model trustworthiness
- Long-term hidden threats difficult to detect
Assessment Checklist:
- Training data validated, cleaned, and audited
- Data provenance tracked and documented
- Rate limiting and moderation for crowdsourced data
- Differential privacy techniques applied
- Models tested with known trigger phrases before deployment
- Deployed models monitored for behavioral drift
- Model file checksums verified against known-good states
Mitigation Strategies:
- Validate and clean all training data sources
- Implement data provenance tracking
- Apply differential privacy to limit individual data influence
- Test with adversarial inputs before deployment
- Monitor production models for unexpected behavior
LLM05: Improper Output Handling
Severity: High
Description: Applications blindly execute or render LLM o