arXiv:2508.16949
YANG ZHOU
Yang-Zhou
AI & ML interests
RLHF and DPO
Recent Activity
Updated a dataset 6 days ago
Yang-Zhou/DAPO-Math-17k-Qwen3-235B-A22B-Thinking-2507-rejection-distill
Published a dataset 25 days ago
Yang-Zhou/DAPO-Math-17k-Qwen3-235B-A22B-Thinking-2507-rejection-distill
Authored a paper 3 months ago
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Organizations
None yet