Hugging Face
Xiaoyang Cao
Sean13
0 followers · 2 following
https://xiaoyangcao1113.github.io/
XiaoyangCao1113
xiaoyangcao
AI & ML interests
RLHF, Deep Reinforcement Learning
Recent Activity
upvoted a paper about 1 month ago
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
upvoted a paper about 1 month ago
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
upvoted a paper about 1 month ago
Latent Collective Preference Optimization: A General Framework for Robust LLM Alignment
Organizations
None yet
models
28
Sean13/llama-8b-instruct-rsimpo-full • Text Generation • 8B • Updated Sep 24 • 4
Sean13/llama-8b-instruct-simpo-full • Text Generation • 8B • Updated Sep 24 • 3
Sean13/llama-8b-instruct-ripo-full • Text Generation • 8B • Updated Sep 24 • 6
Sean13/llama-8b-instruct-ipo-full • Text Generation • 8B • Updated Sep 23 • 2
Sean13/llama-8b-instruct-rdpo-full • Text Generation • 8B • Updated Sep 23 • 3
Sean13/llama-8b-instruct-dpo-full • Text Generation • 8B • Updated Sep 23 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.99 • Text Generation • 7B • Updated Sep 22 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.75 • Text Generation • 7B • Updated Sep 22 • 6
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.55 • Text Generation • 7B • Updated Sep 22 • 3
Sean13/mistral-7b-instruct-v0.2-rdpo-full-eta0.5 • Updated Sep 22
datasets
0
None public yet