Proj CJI Paper Reading: Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Abstract
- Background: adversarial images/prompts can jailbreak multimodal large language models (MLLMs) and cause misaligned behaviors
- This paper reports a serious security risk in multi-agent + MLLM environments: the infectious jailbreak (a toy spread simulation is sketched after this list)
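
Why "exponentially fast": once one agent carries the adversarial image in its memory, each pairwise chat it has can plant the image into its partner's memory, so the infected population roughly doubles every round. Below is a minimal toy sketch of that spread dynamic, not the paper's actual attack or agent implementation; the random-pairing scheme, the infection probability `p`, and the function name `simulate` are assumptions for illustration only.

```python
import random

def simulate(num_agents=1_000_000, p=1.0, rounds=30, seed=0):
    """Toy model: agents chat in random pairs each round; a chat with an
    infected agent infects the other agent with probability p (i.e. the
    adversarial image gets copied into its memory)."""
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True                      # a single adversarial image seeds agent 0
    counts = [1]
    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        # pair agents (order[i], order[i+1]) for this round's one-on-one chats
        for i in range(0, num_agents - 1, 2):
            a, b = order[i], order[i + 1]
            if infected[a] != infected[b] and rng.random() < p:
                infected[a] = infected[b] = True
        counts.append(sum(infected))
        if counts[-1] == num_agents:        # everyone infected -> stop early
            break
    return counts

if __name__ == "__main__":
    # smaller population so the pure-Python loop runs quickly
    for t, c in enumerate(simulate(num_agents=10_000)):
        print(f"round {t}: {c} infected")
```

With `p=1` the infected count roughly doubles per round, so covering N agents takes on the order of log2(N) rounds, which is why a single image can reach about one million agents in a few dozen interaction rounds under these assumptions.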
