view article Article Provence: efficient and robust context pruning for retrieval-augmented generation Jan 28 • 22
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 94
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5 • 23
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models Paper • 2510.03561 • Published Oct 3 • 24
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26 • 78
view article Article Introducing Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training May 17 • 10
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9 • 717
view article Article Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU! Apr 21, 2024 • 44
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 69
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Jul 21 • 210
InstantID: Zero-shot Identity-Preserving Generation in Seconds Paper • 2401.07519 • Published Jan 15, 2024 • 57
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 37
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 15