Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs Paper • 2510.01954 • Published Oct 2 • 12 • 2
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection Paper • 2503.24115 • Published Mar 31 • 11 • 2
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation Paper • 2502.08168 • Published Feb 12 • 12 • 4