Loading...
Prompt Injection
Techniques to manipulate AI responses through malicious prompts
Available Techniques
Basic Prompt Injection
(BPI)Fundamental techniques to inject malicious instructions into AI prompts to bypass intended behavior.
Key Features
- •Simple instruction override
- •Context manipulation
- •Role confusion attacks
Primary Defenses
- •Input sanitization and validation
- •Prompt template isolation
- •Context boundaries enforcement
Key Risks
Indirect Prompt Injection
(IPI)Advanced technique where malicious instructions are embedded in external content that the AI processes.
Key Features
- •Hidden instruction embedding
- •Content-based manipulation
- •Cross-context attacks
Primary Defenses
- •Content preprocessing and sanitization
- •Source validation and verification
- •Context isolation mechanisms
Key Risks
Many-Shot Jailbreaking
(MSJ)Advanced technique using large number of harmful question-answer pairs to gradually shift model behavior through in-context learning.
Key Features
- •In-context learning exploitation
- •Gradual behavior modification
- •128+ shot examples
Primary Defenses
- •Context window limitations
- •Few-shot example filtering
- •Constitutional AI training
Key Risks
Indirect Prompt Injection via External Content
(IPI)Embedding malicious instructions in external content that AI systems process, causing unintended behaviors when the content is ingested.
Key Features
- •Hidden instruction embedding
- •Cross-system contamination
- •Persistent attack vectors
Primary Defenses
- •Content preprocessing and sanitization
- •Instruction filtering from external sources
- •Context isolation between user and external content
Key Risks
Copy-Paste Injection Attack
(CPI)Embedding hidden malicious prompts in copyable text that execute when pasted into AI systems, exploiting user trust in copied content.
Key Features
- •Hidden instruction embedding
- •Clipboard exploitation
- •User behavior manipulation
Primary Defenses
- •Unicode normalization and filtering
- •Character set validation
- •Hidden content detection
Key Risks
System Prompt Leakage Attacks
(SPL)Techniques to extract hidden system prompts, instructions, and configuration details from AI systems.
Key Features
- •System instruction extraction
- •Configuration revelation
- •Hidden prompt discovery
Primary Defenses
- •System prompt isolation techniques
- •Instruction filtering and detection
- •Response content filtering
Key Risks
Policy Puppetry Configuration Attack
(PPA)Formatting prompts as configuration files (XML, JSON, INI) to bypass content policies by disguising harmful requests as system configurations.
Key Features
- •Configuration file mimicry
- •Policy circumvention
- •Format-based deception
Primary Defenses
- •Configuration format detection and blocking
- •Structured input validation
- •Content-agnostic policy enforcement
Key Risks
ASCII Art Injection Attack
(AAI)Using ASCII art and visual text manipulation to bypass AI content filters that may not properly parse visual or artistic text representations.
Key Features
- •Visual obfuscation techniques
- •ASCII art exploitation
- •Character pattern manipulation
Primary Defenses
- •ASCII art pattern recognition
- •Character sequence normalization
- •Visual text parsing and analysis
Key Risks
Ethical Guidelines for Prompt Injection
When working with prompt injection techniques, always follow these ethical guidelines:
- • Only test on systems you own or have explicit written permission to test
- • Focus on building better defenses, not conducting attacks
- • Follow responsible disclosure practices for any vulnerabilities found
- • Document and report findings to improve security for everyone
- • Consider the potential impact on users and society
- • Ensure compliance with all applicable laws and regulations
Prompt Injection
Techniques to manipulate AI responses through malicious prompts
Available Techniques
Basic Prompt Injection
(BPI)Fundamental techniques to inject malicious instructions into AI prompts to bypass intended behavior.
Key Features
- •Simple instruction override
- •Context manipulation
- •Role confusion attacks
Primary Defenses
- •Input sanitization and validation
- •Prompt template isolation
- •Context boundaries enforcement
Key Risks
Indirect Prompt Injection
(IPI)Advanced technique where malicious instructions are embedded in external content that the AI processes.
Key Features
- •Hidden instruction embedding
- •Content-based manipulation
- •Cross-context attacks
Primary Defenses
- •Content preprocessing and sanitization
- •Source validation and verification
- •Context isolation mechanisms
Key Risks
Many-Shot Jailbreaking
(MSJ)Advanced technique using large number of harmful question-answer pairs to gradually shift model behavior through in-context learning.
Key Features
- •In-context learning exploitation
- •Gradual behavior modification
- •128+ shot examples
Primary Defenses
- •Context window limitations
- •Few-shot example filtering
- •Constitutional AI training
Key Risks
Indirect Prompt Injection via External Content
(IPI)Embedding malicious instructions in external content that AI systems process, causing unintended behaviors when the content is ingested.
Key Features
- •Hidden instruction embedding
- •Cross-system contamination
- •Persistent attack vectors
Primary Defenses
- •Content preprocessing and sanitization
- •Instruction filtering from external sources
- •Context isolation between user and external content
Key Risks
Copy-Paste Injection Attack
(CPI)Embedding hidden malicious prompts in copyable text that execute when pasted into AI systems, exploiting user trust in copied content.
Key Features
- •Hidden instruction embedding
- •Clipboard exploitation
- •User behavior manipulation
Primary Defenses
- •Unicode normalization and filtering
- •Character set validation
- •Hidden content detection
Key Risks
System Prompt Leakage Attacks
(SPL)Techniques to extract hidden system prompts, instructions, and configuration details from AI systems.
Key Features
- •System instruction extraction
- •Configuration revelation
- •Hidden prompt discovery
Primary Defenses
- •System prompt isolation techniques
- •Instruction filtering and detection
- •Response content filtering
Key Risks
Policy Puppetry Configuration Attack
(PPA)Formatting prompts as configuration files (XML, JSON, INI) to bypass content policies by disguising harmful requests as system configurations.
Key Features
- •Configuration file mimicry
- •Policy circumvention
- •Format-based deception
Primary Defenses
- •Configuration format detection and blocking
- •Structured input validation
- •Content-agnostic policy enforcement
Key Risks
ASCII Art Injection Attack
(AAI)Using ASCII art and visual text manipulation to bypass AI content filters that may not properly parse visual or artistic text representations.
Key Features
- •Visual obfuscation techniques
- •ASCII art exploitation
- •Character pattern manipulation
Primary Defenses
- •ASCII art pattern recognition
- •Character sequence normalization
- •Visual text parsing and analysis
Key Risks
Ethical Guidelines for Prompt Injection
When working with prompt injection techniques, always follow these ethical guidelines:
- • Only test on systems you own or have explicit written permission to test
- • Focus on building better defenses, not conducting attacks
- • Follow responsible disclosure practices for any vulnerabilities found
- • Document and report findings to improve security for everyone
- • Consider the potential impact on users and society
- • Ensure compliance with all applicable laws and regulations