Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM Paper • 2509.18058 • Published Sep 22 • 12
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2 • 20
ARGUS: Hallucination and Omission Evaluation in Video-LLMs Paper • 2506.07371 • Published Jun 9 • 8
Gemstones: A Model Suite for Multi-Faceted Scaling Laws Paper • 2502.06857 • Published Feb 7 • 25
From Pixels to Prose: A Large Dataset of Dense Image Captions Paper • 2406.10328 • Published Jun 14, 2024 • 18