Proj. CLJ Paper Reading: Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Abstract

  • Tool: In-Context Learning
    • Tool1: In-Context Attack
    • Tool2: In-Context Defense
  • Task: modulate the alignment of LLMs, especially safety alignment
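The shared mechanism behind both tools can be sketched as prompt construction: prepend a few demonstration (request, response) pairs before the actual user query. A minimal illustrative sketch (not the paper's code; the demo strings and function name are placeholders):

```python
# Sketch of in-context demonstration prompting, as used by both
# In-Context Attack (ICA) and In-Context Defense (ICD).
def build_prompt(demos, query):
    """demos: list of (request, response) pairs used as in-context examples."""
    parts = []
    for req, resp in demos:
        parts.append(f"User: {req}\nAssistant: {resp}")
    # The real query comes last; the model continues after "Assistant:".
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)

# ICA: demonstrations show compliance with harmful requests (placeholders here).
ica_demos = [("<harmful request>", "<harmful response>")]
# ICD: demonstrations show refusals, reinforcing safe behavior.
icd_demos = [("<harmful request>", "I cannot help with that.")]
```

The same prompt-building code serves attack or defense depending only on which demonstrations are supplied, which is why the paper frames both as modulating alignment via in-context learning.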

1. intro

P1: What is jailbreaking

P2: Categories of jailbreak methods

  • Optimization-based: efficiency bottleneck
  • Template-based: lacks scalability and flexibility
    • This paper: uses in-context demonstrations, making the approach flexible and scalable
posted @ 2025-03-27 22:57  雪溯