---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter3_sftsd2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_replace_iter3_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set at the final logged step (step 155):
- Loss: 1.8388
- Num Input Tokens Seen: 8195288

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
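
The batch-size and warmup settings above can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the card does not state: that the device count can be inferred from the listed totals, and that the total number of optimizer steps equals the last logged step (155) in the results table. It also assumes the warmup length is the ratio times the step count, rounded up. The helper names (`effective_batch_size`, `warmup_steps`, `lr_at`) are hypothetical and not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
TOTAL_TRAIN_BATCH_SIZE = 128
WARMUP_RATIO = 0.05

# Assumption: the card does not state the device count; infer it from the totals.
NUM_DEVICES = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS)

# Assumption: total optimizer steps equals the last logged step in the results table.
TOTAL_STEPS = 155


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * accumulation * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length, assuming the ratio is rounded up to a whole step."""
    return math.ceil(total_steps * ratio)


def lr_at(step: int, base_lr: float, num_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over num_warmup steps, then constant."""
    if step < num_warmup:
        return base_lr * step / max(1, num_warmup)
    return base_lr


n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
batch = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES)

print("devices (inferred):", NUM_DEVICES)
print("effective batch size:", batch)
print("warmup steps:", n_warmup)
for step in (0, 4, n_warmup, TOTAL_STEPS):
    print(f"step {step}: lr = {lr_at(step, LEARNING_RATE, n_warmup):.2e}")
```

Under these assumptions the listed `total_train_batch_size` of 128 is consistent with a single device (8 × 16 × 1), and the learning rate would reach its constant value of 8e-06 after roughly 8 steps.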

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5833        | 0.0321 | 5    | 1.3056          | 264760            |
| 1.4802        | 0.0642 | 10   | 1.2129          | 531232            |
| 1.0682        | 0.0963 | 15   | 1.2152          | 792048            |
| 0.8770        | 0.1285 | 20   | 1.3193          | 1057288           |
| 0.5869        | 0.1606 | 25   | 1.4425          | 1314432           |
| 0.5375        | 0.1927 | 30   | 1.5556          | 1577064           |
| 0.3261        | 0.2248 | 35   | 1.6875          | 1842448           |
| 0.2372        | 0.2569 | 40   | 1.8303          | 2102536           |
| 0.1862        | 0.2890 | 45   | 1.9071          | 2365920           |
| 0.1235        | 0.3212 | 50   | 1.9770          | 2636264           |
| 0.1133        | 0.3533 | 55   | 2.0005          | 2893776           |
| 0.0811        | 0.3854 | 60   | 1.9080          | 3156592           |
| 0.0467        | 0.4175 | 65   | 1.9028          | 3412792           |
| 0.0530        | 0.4496 | 70   | 1.9141          | 3681376           |
| 0.1024        | 0.4817 | 75   | 1.8865          | 3943248           |
| 0.0689        | 0.5138 | 80   | 1.8100          | 4209328           |
| 0.0592        | 0.5460 | 85   | 1.7858          | 4475792           |
| 0.0753        | 0.5781 | 90   | 1.7337          | 4742648           |
| 0.0373        | 0.6102 | 95   | 1.7169          | 5010136           |
| 0.0492        | 0.6423 | 100  | 1.7129          | 5275128           |
| 0.0410        | 0.6744 | 105  | 1.7290          | 5545064           |
| 0.0362        | 0.7065 | 110  | 1.7868          | 5804720           |
| 0.0454        | 0.7387 | 115  | 1.8283          | 6071728           |
| 0.0387        | 0.7708 | 120  | 1.8346          | 6344272           |
| 0.0580        | 0.8029 | 125  | 1.7726          | 6612848           |
| 0.0502        | 0.8350 | 130  | 1.7259          | 6885872           |
| 0.0512        | 0.8671 | 135  | 1.7473          | 7146016           |
| 0.0594        | 0.8992 | 140  | 1.8113          | 7410008           |
| 0.0420        | 0.9314 | 145  | 1.8112          | 7672408           |
| 0.0445        | 0.9635 | 150  | 1.7757          | 7936232           |
| 0.0329        | 0.9956 | 155  | 1.8388          | 8195288           |
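
The table suggests overfitting: training loss falls from 1.5833 to 0.0329, while validation loss reaches its minimum early and ends above its starting value (1.3956 at step 0 versus 1.8388 at step 155). The short sketch below locates the step with the lowest validation loss and estimates the average tokens consumed per optimizer step. The helper name `best_checkpoint` is hypothetical, and the per-sequence estimate assumes every step processes 128 sequences, as implied by `total_train_batch_size`.

```python
# (step, validation loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (0, 1.3956), (5, 1.3056), (10, 1.2129), (15, 1.2152),
    (20, 1.3193), (25, 1.4425), (30, 1.5556), (35, 1.6875),
    (40, 1.8303), (45, 1.9071), (50, 1.9770), (55, 2.0005),
    (60, 1.9080), (65, 1.9028), (70, 1.9141), (75, 1.8865),
    (80, 1.8100), (85, 1.7858), (90, 1.7337), (95, 1.7169),
    (100, 1.7129), (105, 1.7290), (110, 1.7868), (115, 1.8283),
    (120, 1.8346), (125, 1.7726), (130, 1.7259), (135, 1.7473),
    (140, 1.8113), (145, 1.8112), (150, 1.7757), (155, 1.8388),
]

# Final row of the table: cumulative input tokens at the last logged step.
FINAL_TOKENS_SEEN = 8195288

# Assumption: each optimizer step consumes total_train_batch_size sequences.
SEQUENCES_PER_STEP = 128


def best_checkpoint(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda row: row[1])


best_step, best_loss = best_checkpoint(VALIDATION_LOSS)
final_step, final_loss = VALIDATION_LOSS[-1]
gap = round(final_loss - best_loss, 4)

tokens_per_step = FINAL_TOKENS_SEEN / final_step
tokens_per_sequence = tokens_per_step / SEQUENCES_PER_STEP

print(f"best validation loss: {best_loss} at step {best_step}")
print(f"final validation loss: {final_loss} at step {final_step}")
print(f"final minus best: {gap}")
print(f"average tokens per optimizer step: {tokens_per_step:.0f}")
print(f"average tokens per sequence (estimated): {tokens_per_sequence:.0f}")
```

On the logged evaluation points, the lowest validation loss occurs at step 10, so the final checkpoint is not the best one by this metric.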


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
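
To compare a local environment against the versions listed above, a minimal sketch using the standard library's `importlib.metadata` might look like the following. The distribution names are assumed to be the usual PyPI names (`torch` for PyTorch), and the helper names `installed_version` and `compare_versions` are hypothetical. A mismatch does not necessarily mean the model will fail to load; the check only reports differences.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are the assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package to a (wanted, found, matches) tuple."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, matches) in compare_versions(EXPECTED).items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, found {found} ({status})")
```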