Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 544
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 97
Tools 4 learning AI Collection This is a collection of tools on the hub that teachers and students can use to learn AI! • 10 items • Updated Jun 26 • 67
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 32 items • Updated Jul 10 • 150
BhasaAnuvaad Collection A Speech Translation Dataset for 13 Indian Languages • 11 items • Updated Jan 16 • 21
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 294
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
🎯DART-Math Collection Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving [NeurIPS 2024] @ https://github.com/hkust-nlp/dart-math • 20 items • Updated Feb 19 • 7
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 641
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 298