SuryaBench Collection Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction • 8 items • Updated Aug 18 • 6
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 20
Paloma Collection Dataset and baseline models for Paloma, a benchmark of language model fit to 546 textual domains • 8 items • Updated about 4 hours ago • 15
Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated about 4 hours ago • 95
Accelerating LLM Inference with Staged Speculative Decoding Paper • 2308.04623 • Published Aug 8, 2023 • 25