Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 426
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era Jan 15 • 48
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
Article GaLore: Advancing Large Model Training on Consumer-grade Hardware Mar 20, 2024 • 32
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 26
MGM-Data Collection Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7