project · personal · open source

Zephyron

Open-source adversarial red-teaming framework for LLMs — 34 attack techniques.

VAPT · AI Engineering · cross-domain

An open-source Python framework that red-teams LLMs with 34 adversarial attack techniques, translating recent research (Nature Communications 2026, ICLR 2025, NeurIPS 2025) into modular attack, judge, and mutator components.

Evaluation is multi-stage: deterministic refusal detection, LLM-as-judge classification, 10-assertion weighted signal voting, and hard-override rules — 0% false positives on a 24-case golden validation set. An adaptive Thompson-sampling orchestrator allocates scan budget across attack categories by observed bypass rate. Ships with a live dashboard, SARIF/HTML reports, and compliance mapping to OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, and the EU AI Act.
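A minimal sketch of how a Thompson-sampling allocator of this kind could work, assuming a Beta-Bernoulli model in which each probe either bypasses the target or is blocked. The class, function, and category names are hypothetical illustrations, not Zephyron's actual API, and the simulated target stands in for a real model under test.

```python
import random


class ThompsonAllocator:
    """Beta-Bernoulli Thompson sampling over attack categories (illustrative)."""

    def __init__(self, categories, seed=None):
        self._rng = random.Random(seed)
        # Beta(1, 1) prior per category, stored as [bypasses + 1, blocks + 1].
        self.posterior = {c: [1, 1] for c in categories}

    def pick(self):
        # Draw one plausible bypass rate per category from its posterior,
        # then spend the next probe on the category with the highest draw.
        draws = {
            c: self._rng.betavariate(a, b)
            for c, (a, b) in self.posterior.items()
        }
        return max(draws, key=draws.get)

    def update(self, category, bypassed):
        # A bypass increments alpha, a blocked attempt increments beta.
        self.posterior[category][0 if bypassed else 1] += 1


def simulate(true_rates, budget, seed=0):
    """Spend `budget` probes against a simulated target.

    `true_rates` maps hypothetical category names to made-up bypass
    probabilities; a real scan would replace the coin flip with an
    attack attempt plus the evaluation pipeline's verdict.
    """
    rng = random.Random(seed)
    allocator = ThompsonAllocator(true_rates, seed=seed)
    spent = {c: 0 for c in true_rates}
    for _ in range(budget):
        category = allocator.pick()
        bypassed = rng.random() < true_rates[category]
        allocator.update(category, bypassed)
        spent[category] += 1
    return spent


if __name__ == "__main__":
    rates = {"prompt_injection": 0.6, "encoding": 0.05, "roleplay": 0.1}
    print(simulate(rates, budget=500))
```

Sampling from the posterior, rather than always choosing the category with the best observed rate, keeps some budget flowing to under-tested categories while most probes go where bypasses are actually being found.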

role: Author
status: personal · open source
impact: 0% false positives on the golden set; research turned into working attacks.
stack: Python · LLM-as-judge · Thompson sampling · SARIF

// skills

AI Security · LLM Red Teaming

github