All posts

DAN Prompt Jailbreak History: From Reddit Post to Research Case Study

The complete DAN prompt jailbreak history: how 'Do Anything Now' went from a December 2022 r/ChatGPT experiment through twelve-plus iterations and became
June 12, 2026
The Crescendo Class: Multi-Turn Jailbreaks and Why They're Hard to Catch

Single-turn defenses miss the jailbreak class where no individual message is harmful. How crescendo and multi-turn escalation work as a category, why
May 22, 2026
How Jailbreak Benchmarks Measure Success (ASR Explained)

Jailbreak benchmarks measure success via attack success rate, but the behavior set, attacker, and judge decide the number.
May 22, 2026
Many-Shot vs. Single-Shot Jailbreaks: Long-Context Risks

Single-shot jailbreaks compress the entire attack into one prompt; many-shot jailbreaks exploit the model's in-context learning.
May 10, 2026
Responsible Disclosure Norms for LLM Jailbreaks

Software vulnerability disclosure has 30 years of evolved norms. LLM jailbreak disclosure is 4 years old and still contested.
May 5, 2026
Encoding and Obfuscation Jailbreaks: The Filter-Model Gap

Content filters typically operate on decoded, normalized text. LLMs process tokens, not text. The gap between these two layers is an attack surface that
May 4, 2026
The Jailbreak Detection Evasion Arms Race: How Attackers Adapt

Safety classifiers get deployed; attackers find variants that evade them. This cycle is predictable. Understanding the mechanics of classifier evasion
May 4, 2026
Roleplay and Persona Jailbreaks: Why They Mostly Don't Work Now

DAN, AIM, STAN, and dozens of variants. Persona-based jailbreaks were the dominant technique from 2022 to 2023. Understanding why they worked, and why
May 3, 2026
Universal Adversarial Suffixes: The GCG Attack and Transfer Since

Greedy Coordinate Gradient produces adversarial suffixes that transfer across models. Two years after the original paper, where does this technique stand
May 3, 2026
LLM Jailbreak Taxonomy 2026: How the Techniques Cluster

Six years of jailbreak research has produced a messy literature. This taxonomy organizes working techniques by the behavioral property they exploit —
May 2, 2026
Many-Shot Jailbreaking: How Long Context Created a New Attack

The same architectural decision that makes LLMs better at long-context tasks — extended context windows — enabled a new class of jailbreak.
May 2, 2026