- Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts. arXiv:2506.05229, published Jun 5, 2025.
- Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling. arXiv:2508.16745, published Aug 22, 2025.
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity. arXiv:2502.13063, published Feb 18, 2025.
- Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task. arXiv:2406.14213, published Jun 20, 2024.
- BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. arXiv:2406.10149, published Jun 14, 2024.