Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning Paper • 2508.07101 • Published Aug 9 • 13
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 120
Accelerating Retrieval-Augmented Language Model Serving with Speculation Paper • 2401.14021 • Published Jan 25, 2024 • 2
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention Paper • 2410.05076 • Published Oct 7, 2024 • 8
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 151
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2, 2024 • 14
Training Task Experts through Retrieval Based Distillation Paper • 2407.05463 • Published Jul 7, 2024 • 10
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17, 2024 • 16
Self-Specialization: Uncovering Latent Expertise within Large Language Models Paper • 2310.00160 • Published Sep 29, 2023
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Paper • 2403.00425 • Published Mar 1, 2024 • 1