Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #llm-security 9
- #jailbreak 6
- #in-context-learning 2
- #red-team 2
- #taxonomy 2
- #adaptive-attacks 1
- #adversarial-attacks 1
- #arms-race 1
- #base64 1
- #benchmark 1
- #chatgpt 1
- #classifier-evasion 1
- #content-filter 1
- #context-window 1
- #crescendo 1
- #cvd 1
- #dan 1
- #dan-prompt 1
- #defense 1
- #detection-evasion 1
- #encoding-attacks 1
- #evaluation 1
- #gcg 1
- #gradient-attacks 1
- #harmbench 1
- #jailbreak-history 1
- #jailbreakbench 1
- #jailbreaking 1
- #long-context 1
- #many-shot 1
- #many-shot-jailbreaking 1
- #multi-turn 1
- #obfuscation 1
- #openai 1
- #persona 1
- #prompt-injection 1
- #red-teaming 1
- #research 1
- #research-ethics 1
- #responsible-disclosure 1
- #rlhf 1
- #roleplay-jailbreak 1
- #strongreject 1
- #transferability 1
- #unicode 1
- #universal-suffix 1
- #vulnerability-disclosure 1
Categories
technique 5 posts
- The Crescendo Class: Multi-Turn Jailbreaks and Why They're Hard to Catch
  Single-turn defenses miss the jailbreak class where no individual message is harmful. How crescendo and multi-turn escalation work as a category, why...
- Encoding and Obfuscation Jailbreaks: The Filter-Model Gap
  Content filters typically operate on decoded, normalized text. LLMs process tokens, not text. The gap between these two layers is an attack surface that...
- Roleplay and Persona Jailbreaks: Why They Mostly Don't Work Now
  DAN, AIM, STAN, and dozens of variants. Persona-based jailbreaks were the dominant technique from 2022-2023. Understanding why they worked, and why...
- Universal Adversarial Suffixes: The GCG Attack and Transfer Since
  Greedy Coordinate Gradient produces adversarial suffixes that transfer across models. Two years after the original paper, where does this technique stand...
- Many-Shot Jailbreaking: How Long Context Created a New Attack
  The same architectural decision that makes LLMs better at long-context tasks (extended context windows) enabled a new class of jailbreak.
research 3 posts
- How Jailbreak Benchmarks Measure Success (ASR Explained)
  Jailbreak benchmarks measure success via attack success rate, but the behavior set, attacker, and judge decide the number.
- The Jailbreak Detection Evasion Arms Race: How Attackers Adapt
  Safety classifiers get deployed; attackers find variants that evade them. This cycle is predictable. Understanding the mechanics of classifier evasion...
- LLM Jailbreak Taxonomy 2026: How the Techniques Cluster
  Six years of jailbreak research has produced a messy literature. This taxonomy organizes working techniques by the behavioral property they exploit...