MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16 • 50
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models Paper • 2501.05767 • Published Jan 10 • 29
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published Dec 18, 2024 • 18
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models Paper • 2410.09342 • Published Oct 12, 2024 • 39