Proj CJI Paper Reading: Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Abstract

  • Background: adversarial images/prompts can jailbreak multimodal large language models (MLLMs) and induce misaligned behaviors
  • This paper reports a serious security risk in multi-agent MLLM environments: infectious jailbreak, where a single compromised agent spreads the attack to others exponentially fast (see the toy sketch below)
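To build intuition for the "exponentially fast" claim, here is a minimal toy simulation (not the paper's actual attack pipeline; all names and the pairwise-chat infection rule are illustrative assumptions). Agents are randomly paired each round, and any agent that chats with an "infected" agent (one carrying the adversarial image in its memory) becomes infected itself, so the infected count roughly doubles per round and the whole population is compromised in about O(log N) rounds.

```python
import random

def simulate_infectious_spread(num_agents: int = 1_000_000, seed: int = 0) -> int:
    """Toy simulation: infection spreads through random pairwise chats.

    Illustrative only; it does not reproduce the paper's adversarial-image
    optimization, only the exponential spreading dynamics.
    """
    random.seed(seed)
    infected = [False] * num_agents
    infected[0] = True              # one agent starts with the adversarial image
    num_infected = 1
    round_id = 0
    while num_infected < num_agents:
        round_id += 1
        order = list(range(num_agents))
        random.shuffle(order)
        # pair agents up for one round of chats; infection passes within a pair
        for i in range(0, num_agents - 1, 2):
            a, b = order[i], order[i + 1]
            if infected[a] or infected[b]:
                infected[a] = infected[b] = True
        num_infected = sum(infected)
        print(f"round {round_id}: {num_infected} infected")
    return round_id

if __name__ == "__main__":
    # With ~1M agents the population is fully infected in roughly 20-25 rounds,
    # i.e. the infected count grows roughly exponentially per round.
    simulate_infectious_spread(num_agents=1_000_000)
```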