Article Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement 17 days ago • 4
Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Aug 18 • 87
Article You could have designed state of the art positional encoding Nov 25, 2024 • 398
Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One Jun 26 • 48
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16 • 55
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 185
Article Enabling Long Context Training with Sequence Parallelism in Axolotl Apr 4 • 14
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks Aug 4, 2024 • 30
Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣 Collection The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬 Collection Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫 Collection Question & answer, task completion, general SFT and otherwise finetuney data. • 7 items • Updated Sep 13, 2023 • 1
Article Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? May 7, 2024 • 8