hdong0/Qwen2.5-Math-1.5B-Open-R1-GRPO_deepscaler_mu_8_constant_lr • Text Generation • 2B • Updated Jul 7
cutelemonlili/Qwen2.5-Math-1.5B_Teacher_forget_RL_data_QwQ-Preview • Text Generation • 2B • Updated Jul 20
JayHyeon/Qwen_1.5B-math-cDPO_5e-7_0.1lsmooth-1.0vpo_constant-1ep • Text Generation • 2B • Updated Aug 5 • 1
JayHyeon/Qwen_1.5B-math-cDPO_5e-7_0.3lsmooth-1.0vpo_constant-1ep • Text Generation • 2B • Updated Aug 5 • 1
JayHyeon/Qwen_1.5B-math-rDPO_5e-7_0.1lsmooth-1.0vpo_constant-1ep • Text Generation • 2B • Updated Aug 5 • 1
JayHyeon/Qwen_1.5B-math-rDPO_5e-7_0.3lsmooth-1.0vpo_constant-1ep • Text Generation • 2B • Updated Aug 5 • 1