Xiaoyang Cao

Sean13
·
https://xiaoyangcao1113.github.io/
  • XiaoyangCao1113
  • xiaoyangcao

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

upvoted a paper about 1 month ago
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
upvoted a paper about 1 month ago
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
upvoted a paper about 1 month ago
Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment

Organizations

None yet

models 28

Sean13/llama-8b-instruct-rsimpo-full

Text Generation • 8B • Updated Sep 24 • 4

Sean13/llama-8b-instruct-simpo-full

Text Generation • 8B • Updated Sep 24 • 3

Sean13/llama-8b-instruct-ripo-full

Text Generation • 8B • Updated Sep 24 • 6

Sean13/llama-8b-instruct-ipo-full

Text Generation • 8B • Updated Sep 23 • 2

Sean13/llama-8b-instruct-rdpo-full

Text Generation • 8B • Updated Sep 23 • 3

Sean13/llama-8b-instruct-dpo-full

Text Generation • 8B • Updated Sep 23 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75

Text Generation • 7B • Updated Sep 22 • 6

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55

Text Generation • 7B • Updated Sep 22 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.5

Updated Sep 22

datasets 0

None public yet