OpenAI’s latest open-weight models, GPT-OSS-120b and GPT-OSS-20b, were reportedly jailbroken within hours of their release on August 7, 2025, by AI jailbreaker Pliny the Liberator, despite OpenAI’s claims of robust safety testing.
The models were touted as fast, efficient, and highly resistant to jailbreaks, having undergone “worst-case fine-tuning” in biological and cyber domains. OpenAI’s Safety Advisory Group reviewed the testing and concluded that the models did not reach high-risk thresholds. The company claimed the models performed at parity with its o4-mini model on jailbreak-resistance benchmarks such as StrongReject.
However, Pliny the Liberator announced on X, “OPENAI: PWNED GPT-OSS: LIBERATED,” sharing screenshots that showed the models generating instructions for illicit activities, including making methamphetamine, Molotov cocktails, VX nerve agent, and malware. Pliny commented, “Took some tweakin!” regarding his successful breach.
The jailbreak came as OpenAI was preparing to release its highly anticipated GPT-5 and had launched a $500,000 red-teaming challenge to uncover novel risks. Pliny’s public disclosure likely disqualifies his findings from that initiative.
Pliny’s jailbreaking technique involved a multi-stage prompt that initially appeared to be a refusal, then introduced a divider bearing his signature “LOVE PLINY” markers, and finally shifted into generating unrestricted content rendered in leetspeak to evade detection. The approach mirrored methods he had successfully employed against previous OpenAI models.
This incident marks another rapid jailbreak by Pliny, who has consistently bypassed major OpenAI releases within hours or days of their launch. His GitHub repository, L1B3RT4S, hosts a library of jailbreak prompts and has garnered over 10,000 stars, remaining a significant resource for the AI jailbreaking community.