Jailbreak attacks pose a serious threat to large language models (LLMs) by bypassing built-in safety mechanisms and eliciting harmful outputs. Studying these attacks is crucial for identifying vulnerabilities and improving model security. This paper presents a systematic survey of jailbreak methods from the novel perspective of stealth.