# Model Card

- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `mini`
- Model size: `300m`
- Data size: `48B`

## Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | `0.9` |
| beta2 | `0.98` |
| epsilon | `9.999999999999999e-26` |
| learning_rate | `0.002` |
| max_grad_norm | `2` |
| min_lr_ratio | `0` |
| nesterov | `False` |
| train_batch_size | `128` |
| warmup | `2000` |
| weight_decay | `0.2` |
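
## Example: loading the configuration

The hyperparameters above can be written as a plain Python dictionary, as in the minimal sketch below. The card does not state the schedule shape or the unit of `warmup`, so the helper `lr_at_step` rests on assumptions: `warmup` is taken to be a number of optimizer steps, and the schedule is taken to be linear warmup followed by cosine decay down to `min_lr_ratio * learning_rate`. The names `BEST_CONFIG`, `lr_at_step`, and `total_steps` are hypothetical and do not come from the source.

```python
import math

# Values copied verbatim from the "Best configuration" table.
BEST_CONFIG = {
    "beta1": 0.9,
    "beta2": 0.98,
    "epsilon": 9.999999999999999e-26,
    "learning_rate": 0.002,
    "max_grad_norm": 2,
    "min_lr_ratio": 0,
    "nesterov": False,
    "train_batch_size": 128,
    "warmup": 2000,
    "weight_decay": 0.2,
}


def lr_at_step(step: int, total_steps: int, config: dict = BEST_CONFIG) -> float:
    """Learning rate at a given step under an ASSUMED schedule.

    Assumption (not stated in the card): linear warmup over `warmup` steps,
    then cosine decay to `min_lr_ratio * learning_rate` at `total_steps`.
    """
    peak = config["learning_rate"]
    warmup = config["warmup"]
    floor = peak * config["min_lr_ratio"]
    if step < warmup:
        # Linear ramp that reaches the peak on the last warmup step.
        return peak * (step + 1) / warmup
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

With `min_lr_ratio` set to `0`, this assumed schedule decays the learning rate all the way to zero by the final step. `total_steps` must be supplied by the caller, since the card gives the data size but not the sequence length needed to derive a step count.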