Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

upvoted a paper about 1 month ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

upvoted a paper about 1 month ago

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

upvoted a paper about 1 month ago

Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Organizations

None yet

models 28

Sean13/llama-8b-instruct-rsimpo-full

Text Generation • 8B • Updated Sep 24 • 4

Sean13/llama-8b-instruct-simpo-full

Text Generation • 8B • Updated Sep 24 • 3

Sean13/llama-8b-instruct-ripo-full

Text Generation • 8B • Updated Sep 24 • 6

Sean13/llama-8b-instruct-ipo-full

Text Generation • 8B • Updated Sep 23 • 2

Sean13/llama-8b-instruct-rdpo-full

Text Generation • 8B • Updated Sep 23 • 3

Sean13/llama-8b-instruct-dpo-full

Text Generation • 8B • Updated Sep 23 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75

Text Generation • 7B • Updated Sep 22 • 6

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.5

datasets 0

None public yet