contrastive

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on an unknown dataset. Its evaluation-set results are reported in the training log below; the last logged validation loss is -13.2422 at step 4750.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 552
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 2
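
The effective batch size and learning-rate schedule implied by these settings can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: it assumes a single device, a linear warmup followed by cosine decay to zero, and a hypothetical total of 9,140 optimizer steps (roughly 4,570 steps per epoch, estimated from the step and epoch columns of the training log, times 2 epochs). The function names are made up for this sketch.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.05

# Assumption: total optimizer steps, estimated from the log (not stated in the card).
TOTAL_STEPS = 9140


def effective_batch_size(per_device, accumulation, devices=1):
    """Examples consumed per optimizer step (assumes `devices` identical workers)."""
    return per_device * accumulation * devices


def lr_at_step(step, total_steps=TOTAL_STEPS, peak_lr=LEARNING_RATE,
               warmup_ratio=WARMUP_RATIO):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
    for step in (0, 250, 457, 4570, 9140):
        print(step, lr_at_step(step))
```

With one device, 4 x 4 reproduces the listed total_train_batch_size of 16. If training actually used several devices or a different step count, the warmup length and decay curve would shift accordingly.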

Training Loss	Epoch	Step	Validation Loss
-3.2816	0.0547	250	-0.7767
-22.042	0.1094	500	-6.1697
-30.8617	0.1641	750	-7.4942
-44.9144	0.2188	1000	-9.7224
-48.507	0.2735	1250	-9.8642
-46.8178	0.3282	1500	-10.0852
-46.7409	0.3829	1750	-11.3118
-50.4048	0.4376	2000	-11.8037
-45.3385	0.4923	2250	-11.5016
-32.1133	0.5470	2500	-10.8557
-45.1595	0.6017	2750	-11.9776
-50.3197	0.6564	3000	-12.6582
-53.3858	0.7111	3250	-13.1406
-62.5761	0.7658	3500	-12.9881
-63.006	0.8205	3750	-12.7458
-52.0048	0.8752	4000	-13.9200
-62.9389	0.9299	4250	-13.7936
-55.1089	0.9846	4500	-12.9449
-58.2402	1.0392	4750	-13.2422
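
To read the log more easily, the rows above can be parsed and summarized with a few lines of Python. This is a minimal sketch: the rows are copied from the table, the helper name `parse_log` is hypothetical, and "best" assumes that a lower validation loss is better, which the card does not state explicitly.

```python
# Rows copied from the training log above:
# training loss, epoch, step, validation loss.
LOG = """\
-3.2816 0.0547 250 -0.7767
-22.042 0.1094 500 -6.1697
-30.8617 0.1641 750 -7.4942
-44.9144 0.2188 1000 -9.7224
-48.507 0.2735 1250 -9.8642
-46.8178 0.3282 1500 -10.0852
-46.7409 0.3829 1750 -11.3118
-50.4048 0.4376 2000 -11.8037
-45.3385 0.4923 2250 -11.5016
-32.1133 0.5470 2500 -10.8557
-45.1595 0.6017 2750 -11.9776
-50.3197 0.6564 3000 -12.6582
-53.3858 0.7111 3250 -13.1406
-62.5761 0.7658 3500 -12.9881
-63.006 0.8205 3750 -12.7458
-52.0048 0.8752 4000 -13.9200
-62.9389 0.9299 4250 -13.7936
-55.1089 0.9846 4500 -12.9449
-58.2402 1.0392 4750 -13.2422
"""


def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_log(LOG)

# Assumption: lower validation loss is better.
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])  # 4000 -13.92

# Rough steps-per-epoch estimate from the last row (epoch values are rounded).
steps_per_epoch = rows[-1]["step"] / rows[-1]["epoch"]
print(round(steps_per_epoch))
```

Under that assumption the lowest validation loss in the log is at step 4000, not at the final logged step. The log shown here ends at step 4750 (about epoch 1.04), so it covers only part of the 2 configured epochs.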

Safetensors

Model size: 0.6B params
Tensor type: BF16
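
As a back-of-the-envelope check, the model size and tensor type listed above give a rough size for the stored weights. The sketch below assumes 2 bytes per BF16 parameter and uses the rounded 0.6B figure from the card, so the result is approximate and covers weights only, not activations or optimizer state. The helper name `weight_bytes` is hypothetical.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"BF16": 2, "F16": 2, "F32": 4}


def weight_bytes(num_params, tensor_type):
    """Approximate storage for the weights alone, in bytes."""
    return int(num_params * BYTES_PER_PARAM[tensor_type])


# Rounded figure from the card, not an exact parameter count.
approx_params = 0.6e9

print(weight_bytes(approx_params, "BF16") / 1e9)  # 1.2 (GB, weights only)
```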
