How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts Paper • 2402.13220 • Published Feb 20, 2024
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs Paper • 2510.11288 • Published 27 days ago