Proj. CLJ Paper Reading: Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Abstract
- Tool: In-Context Learning
- Tool1: In-Context Attack
- Tool2: In-Context Defense
- Task: modulate the alignment of LLMs, especially Safety alignment
1. intro
P1: What is a jailbreak
P2: Categories of jailbreak
- Optimization-based: efficiency bottleneck
- Template-based: lacks scalability and flexibility
- This paper: uses in-context demonstrations, which are flexible and scalable
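The core idea above (In-Context Attack vs. In-Context Defense) can be sketched as simple prompt construction: both prepend a few demonstrations before the target query, and only the demonstration content differs. This is a minimal illustrative sketch, not the paper's actual prompts; the demo strings and function name are placeholders I introduce for illustration.

```python
# Hypothetical sketch: ICA and ICD both prepend few-shot demonstrations
# to the target query; ICA demos show compliance with harmful requests,
# ICD demos show refusals. Demo contents are placeholders.

def build_prompt(demos, query):
    """Concatenate (question, answer) demos, then the target query."""
    parts = [f"User: {q}\nAssistant: {a}" for q, a in demos]
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)

# ICA: demonstrations where the assistant complies (placeholder text)
ica_demos = [("[harmful question 1]", "[harmful answer 1]")]
# ICD: demonstrations where the assistant refuses
icd_demos = [("[harmful question 1]",
              "I'm sorry, but I cannot help with that request.")]

attack_prompt = build_prompt(ica_demos, "[target query]")
defense_prompt = build_prompt(icd_demos, "[target query]")
print(attack_prompt)
```

The point of the sketch is that the attack and defense are symmetric: the same in-context mechanism modulates safety alignment in either direction depending on the demonstrations supplied.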
