雪溯 - 博客园

February 2, 2025

Proj CJI Paper Reading: Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models

Summary: Abstract. Background: Competitors: GCG, which uses gradient-based search to generate adversarial suffixes that jailbreak LLMs. Drawbacks of GCG: low computational efficiency, and no consideration of transferability or scalab…

posted @ 2025-02-02 00:49 雪溯 Views(26) Comments(0) Recommended(0)

January 15, 2025

Proj CJI Paper Reading: AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs

Summary: Abstract. Background: current jailbreak mutators work mostly at the semantic level, which makes them easier for defenses to detect. This paper: AdaPPA (Adaptive Position Pre-Filled Jailbreak Attack). Task: adaptive position…

posted @ 2025-01-15 23:13 雪溯 Views(50) Comments(0) Recommended(0)

Proj CJI Paper Reading: A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

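Summary: Abstract. Background: the paper argues that existing jailbreaking methods require either manual effort or a large model, whereas its own method requires neither. This paper: ReNeLLM. Task: black-box LLM jailbreaking. Method: Prompt Rewriting, Scenario Nesting, …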

posted @ 2025-01-15 23:12 雪溯 Views(90) Comments(0) Recommended(0)

January 13, 2025

Proj CJI Paper Reading: A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses

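Summary: Abstract. This paper: Tasks: Decomposition Attacks: extract leaked information from LLMs. Method: use an LLM (called ADVLLM) with few-shot examples to split a malicious question into many smaller questions, send them to the victim LLMs, and then use…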

posted @ 2025-01-13 23:52 雪溯 Views(19) Comments(0) Recommended(0)

January 12, 2025

Proj CJI Paper Reading: "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

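Summary: Abstract. GitHub: https://github.com/verazuo/jailbreak_llms Method: collect and summarize jailbreaking prompts and patterns from multiple data sources; the prompts are used for direct attacks, but the emphasis is on the summary. Tasks: Tool: JAILBREAKHUB. Task: jailbre…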

posted @ 2025-01-12 00:08 雪溯 Views(101) Comments(0) Recommended(0)

December 30, 2024

Proj CJI Paper Reading: OffsetBias: Leveraging Debiased Data for Tuning Evaluators

Summary: Goal: reduce the biases of LLMs (length, concreteness, empty reference, content continuation, nested instruction, familiar knowledge). Tool: OffsetBias: pairwis…

posted @ 2024-12-30 18:58 雪溯 Views(35) Comments(0) Recommended(0)

December 25, 2024

Proj. CLJ Paper Reading: Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Summary: Abstract. This paper: Speculative RAG. Task: improve retrieval results by combining RAG with LLM refinement. Method: use a larger generalist LM to verify RAG drafts…

posted @ 2024-12-25 01:54 雪溯 Views(77) Comments(0) Recommended(0)

December 21, 2024

Proj. CLJ Paper Reading: A Survey on LLM-as-a-Judge

Summary: Abstract. Good words: subjectivity, variability, scale. Task: survey of LLM-as-a-Judge; benchmark and evaluation of LLM-as-a-Judge systems. Core question: …

posted @ 2024-12-21 00:46 雪溯 Views(151) Comments(0) Recommended(0)

December 13, 2024

Proj. CLJ Paper Reading: Are you still on track!? Catching LLM Task Drift with Activations

Summary: Abstract. Task: defend LLMs against prompt injection attacks. Tool: TaskTracker. Methods: use activation deltas (the difference in activations before and af…

posted @ 2024-12-13 15:58 雪溯 Views(70) Comments(0) Recommended(0)

December 10, 2024

Paper Reading: JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Summary: Abstract. GitHub: https://github.com/JailbreakBench/jailbreakbench https://jailbreakbench.github.io/ Task: open-source benchmark, an evolving repository…

posted @ 2024-12-10 22:42 雪溯 Views(135) Comments(0) Recommended(0)

雪溯

Basically, when I'm in a bad mood I tend to come here and solve a couple of OJ problems; I also keep some of my notes here.
