Alpha/SimPO checkpoint trained from Apertus base, actually functional this time.
Trained on a mix of curated C2, Gutenberg, and Instruct Skill-Mix.
Use the Alpaca chat template. Temp 0.5, min_p 0.05, and rep pen 1.05 seem reasonable.
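For anyone wiring this up by hand, here is a minimal sketch of the prompt format and sampler settings mentioned above. It assumes the standard no-input Alpaca template (the exact variant used in training is not stated here), and the names `build_alpaca_prompt` and `SAMPLING` are hypothetical helpers for illustration. The dictionary keys follow the keyword names used by Hugging Face `transformers` generation, which is also an assumption, since no backend is named.

```python
# Standard no-input Alpaca template. Assumption: the checkpoint was trained
# on this exact wording; adjust if your frontend uses a different variant.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a single user instruction in the Alpaca prompt format."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


# Sampler settings suggested in the notes above. Key names are assumed to
# match the generation keyword arguments in Hugging Face transformers.
SAMPLING = {
    "do_sample": True,           # sampling must be on for temperature/min_p to apply
    "temperature": 0.5,
    "min_p": 0.05,
    "repetition_penalty": 1.05,
}

if __name__ == "__main__":
    print(build_alpaca_prompt("Write a short scene set in a lighthouse."))
```

With `transformers`, the settings would typically be passed along the lines of `model.generate(**inputs, **SAMPLING)`; other backends use their own names for the same knobs.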
Now I just have to make a better preference dataset...