Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Article • Jul 29 • 198
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28 • 35
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic Paper • 2408.16326 • Published Aug 29, 2024 • 1
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published Jan 3 • 18