Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15 • 27
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation Paper • 2506.01565 • Published Jun 2 • 3
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture Paper • 2406.11030 • Published Jun 16, 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Paper • 2406.02265 • Published Jun 4, 2024 • 7
Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish Paper • 2402.09759 • Published Feb 15, 2024 • 1