sophia_130m_8 / README.md

Model Card

Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | 0.95 |
| beta2 | 0.95 |
| epsilon | 1e-07 |
| gamma | 0.0125 |
| learning_rate | 0.002 |
| max_grad_norm | 1 |
| min_lr_ratio | 0 |
| train_batch_size | 128 |
| warmup | 4000 |
| weight_decay | 0.2 |
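
The values above can be collected into a single configuration object, sketched below. The class name `BestConfig` and both helper methods are hypothetical and not part of this repository. The sketch assumes that `warmup` is counted in optimizer steps, that warmup is linear from zero, and that `min_lr_ratio` scales the peak learning rate to give the final learning rate. The model card does not state the schedule shape, so treat those parts as illustrative only.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BestConfig:
    """Hyperparameters copied from the table above (hypothetical container)."""

    beta1: float = 0.95
    beta2: float = 0.95
    epsilon: float = 1e-07
    gamma: float = 0.0125
    learning_rate: float = 0.002
    max_grad_norm: float = 1.0
    min_lr_ratio: float = 0.0
    train_batch_size: int = 128
    warmup: int = 4000
    weight_decay: float = 0.2

    def min_lr(self) -> float:
        # Assumption: the final learning rate is peak LR times min_lr_ratio.
        return self.learning_rate * self.min_lr_ratio

    def warmup_lr(self, step: int) -> float:
        # Assumption: linear warmup from 0 to learning_rate over `warmup` steps.
        if step >= self.warmup:
            return self.learning_rate
        return self.learning_rate * step / self.warmup


config = BestConfig()

if __name__ == "__main__":
    # Print the configuration as a plain dict, e.g. for logging.
    print(asdict(config))
```

Freezing the dataclass keeps the reported configuration from being modified accidentally, and `asdict(config)` gives a plain dictionary that can be passed to whatever training script is in use.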