Agentic Design

Patterns

Prompt Injection

Techniques to manipulate AI responses through malicious prompts

8
Techniques
2
low
low Complexity
5
medium
medium Complexity
1
high
high Complexity

Available Techniques

🎯

Basic Prompt Injection

(BPI)
low

Fundamental techniques to inject malicious instructions into AI prompts to bypass intended behavior.

Key Features

  • Simple instruction override
  • Context manipulation
  • Role confusion attacks

Primary Defenses

  • Input sanitization and validation
  • Prompt template isolation
  • Context boundaries enforcement

Key Risks

Unauthorized information disclosureSystem behavior manipulationSecurity control bypassReputation damage
🔄

Indirect Prompt Injection

(IPI)
medium

Advanced technique where malicious instructions are embedded in external content that the AI processes.

Key Features

  • Hidden instruction embedding
  • Content-based manipulation
  • Cross-context attacks

Primary Defenses

  • Content preprocessing and sanitization
  • Source validation and verification
  • Context isolation mechanisms

Key Risks

Cross-system contaminationData exfiltrationPersistent injection attacksSupply chain vulnerabilities
🎯

Many-Shot Jailbreaking

(MSJ)
high

Advanced technique using large number of harmful question-answer pairs to gradually shift model behavior through in-context learning.

Key Features

  • In-context learning exploitation
  • Gradual behavior modification
  • 128+ shot examples

Primary Defenses

  • Context window limitations
  • Few-shot example filtering
  • Constitutional AI training

Key Risks

Complete safety bypass with sufficient examplesScalable attack methodologyDifficult to detect without context analysisHigh success rates across model families
🔄

Indirect Prompt Injection via External Content

(IPI)
medium

Embedding malicious instructions in external content that AI systems process, causing unintended behaviors when the content is ingested.

Key Features

  • Hidden instruction embedding
  • Cross-system contamination
  • Persistent attack vectors

Primary Defenses

  • Content preprocessing and sanitization
  • Instruction filtering from external sources
  • Context isolation between user and external content

Key Risks

Data exfiltration through external contentCross-system attack propagationPersistent contamination of AI workflowsSupply chain attack vectors
📋

Copy-Paste Injection Attack

(CPI)
medium

Embedding hidden malicious prompts in copyable text that execute when pasted into AI systems, exploiting user trust in copied content.

Key Features

  • Hidden instruction embedding
  • Clipboard exploitation
  • User behavior manipulation

Primary Defenses

  • Unicode normalization and filtering
  • Character set validation
  • Hidden content detection

Key Risks

Silent execution of malicious instructionsUser trust exploitationWidespread distribution through sharingDifficult detection by users
🔓

System Prompt Leakage Attacks

(SPL)
low

Techniques to extract hidden system prompts, instructions, and configuration details from AI systems.

Key Features

  • System instruction extraction
  • Configuration revelation
  • Hidden prompt discovery

Primary Defenses

  • System prompt isolation techniques
  • Instruction filtering and detection
  • Response content filtering

Key Risks

Exposure of business logic and strategiesAPI key and credential theftIntellectual property leakageCompetitive advantage loss
🎭

Policy Puppetry Configuration Attack

(PPA)
medium

Formatting prompts as configuration files (XML, JSON, INI) to bypass content policies by disguising harmful requests as system configurations.

Key Features

  • Configuration file mimicry
  • Policy circumvention
  • Format-based deception

Primary Defenses

  • Configuration format detection and blocking
  • Structured input validation
  • Content-agnostic policy enforcement

Key Risks

Universal policy bypass techniqueHigh success rate across platformsEasy to automate and scaleDifficult to detect without format analysis
🎨

ASCII Art Injection Attack

(AAI)
medium

Using ASCII art and visual text manipulation to bypass AI content filters that may not properly parse visual or artistic text representations.

Key Features

  • Visual obfuscation techniques
  • ASCII art exploitation
  • Character pattern manipulation

Primary Defenses

  • ASCII art pattern recognition
  • Character sequence normalization
  • Visual text parsing and analysis

Key Risks

Bypass of keyword-based filteringVisual obfuscation of harmful contentScaling through automated art generationDifficulty in pattern-based detection

Ethical Guidelines for Prompt Injection

When working with prompt injection techniques, always follow these ethical guidelines:

  • • Only test on systems you own or have explicit written permission to test
  • • Focus on building better defenses, not conducting attacks
  • • Follow responsible disclosure practices for any vulnerabilities found
  • • Document and report findings to improve security for everyone
  • • Consider the potential impact on users and society
  • • Ensure compliance with all applicable laws and regulations

AI Red Teaming

closed

Loading...