contrastive
This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: -13.2422 (validation loss at step 4750, epoch 1.0392, the last row of the training results table)
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 552
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2
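The batch-size and scheduler entries above can be sanity-checked in plain Python. This is a minimal sketch, not the training code. It assumes a single device, so that 4 × 4 matches the listed total_train_batch_size of 16, and the common linear-warmup-then-cosine-decay shape. The function names and the `total_steps` argument are hypothetical, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.05


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup from 0, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    # total_steps=1000 is a placeholder, not a value from the card.
    for step in (0, 25, 50, 500, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

With a warmup ratio of 0.05, the first 5% of optimizer steps ramp the learning rate up to 1e-05, and the remaining steps decay it along a half cosine.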
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| -3.2816 | 0.0547 | 250 | -0.7767 |
| -22.042 | 0.1094 | 500 | -6.1697 |
| -30.8617 | 0.1641 | 750 | -7.4942 |
| -44.9144 | 0.2188 | 1000 | -9.7224 |
| -48.507 | 0.2735 | 1250 | -9.8642 |
| -46.8178 | 0.3282 | 1500 | -10.0852 |
| -46.7409 | 0.3829 | 1750 | -11.3118 |
| -50.4048 | 0.4376 | 2000 | -11.8037 |
| -45.3385 | 0.4923 | 2250 | -11.5016 |
| -32.1133 | 0.5470 | 2500 | -10.8557 |
| -45.1595 | 0.6017 | 2750 | -11.9776 |
| -50.3197 | 0.6564 | 3000 | -12.6582 |
| -53.3858 | 0.7111 | 3250 | -13.1406 |
| -62.5761 | 0.7658 | 3500 | -12.9881 |
| -63.006 | 0.8205 | 3750 | -12.7458 |
| -52.0048 | 0.8752 | 4000 | -13.9200 |
| -62.9389 | 0.9299 | 4250 | -13.7936 |
| -55.1089 | 0.9846 | 4500 | -12.9449 |
| -58.2402 | 1.0392 | 4750 | -13.2422 |
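
The table above can be summarized with a short script. This is a sketch under two assumptions: that a lower (more negative) validation loss is better for this objective, and that the ratio of step to epoch in the last row gives a fair estimate of optimizer steps per epoch. The helper names are hypothetical.

```python
# (step, epoch, validation_loss) rows copied from the table above.
ROWS = [
    (250, 0.0547, -0.7767),
    (500, 0.1094, -6.1697),
    (750, 0.1641, -7.4942),
    (1000, 0.2188, -9.7224),
    (1250, 0.2735, -9.8642),
    (1500, 0.3282, -10.0852),
    (1750, 0.3829, -11.3118),
    (2000, 0.4376, -11.8037),
    (2250, 0.4923, -11.5016),
    (2500, 0.5470, -10.8557),
    (2750, 0.6017, -11.9776),
    (3000, 0.6564, -12.6582),
    (3250, 0.7111, -13.1406),
    (3500, 0.7658, -12.9881),
    (3750, 0.8205, -12.7458),
    (4000, 0.8752, -13.9200),
    (4250, 0.9299, -13.7936),
    (4500, 0.9846, -12.9449),
    (4750, 1.0392, -13.2422),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (assumes lower is better)."""
    return min(rows, key=lambda row: row[2])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    step, epoch, _ = rows[-1]
    return step / epoch


if __name__ == "__main__":
    step, epoch, loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {loss} at step {step} (epoch {epoch})")
    print(f"about {steps_per_epoch(ROWS):.0f} optimizer steps per epoch")
```

Under these assumptions, the lowest logged validation loss is -13.9200 at step 4000, while the headline figure of -13.2422 is the last logged row at step 4750.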
Framework versions
- Transformers 4.52.3
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
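
To reproduce the environment, the versions listed above can be compared with what is installed locally. This is a minimal sketch using only the standard library. The distribution names in `EXPECTED` (for example `torch` for Pytorch) are assumptions about how the packages are published, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; the distribution names are assumed.
EXPECTED = {
    "transformers": "4.52.3",
    "torch": "2.7.0+cu126",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means the local environment matches the versions listed on this card.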
Model tree for jnsffrt/MNLP_M3_mcqa_model
- Base model: Qwen/Qwen3-0.6B-Base