Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix 23 days ago • 45
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Paper • 2510.14967 • Published Oct 16 • 33
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7 • 382
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published Apr 30 • 59
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 7 items • Updated Jul 11 • 349
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 15 items • Updated Jul 10 • 210
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published Apr 28 • 39
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22, 2024 • 104
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 36
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232
mistralai_hackathon Collection Synthetic datasets and fine-tuned Mistral models used in the MistralAI Hackathon • 21 items • Updated Feb 4 • 4