trollek/ThoughtStream-4B-v0.1

@@ -9,7 +9,7 @@ library_name: transformers
 ---
 # ThoughtStream-4B-v0.1
-This model is based on [h2oai/h2o-danube3-4b-base](https://huggingface.co/h2oai/h2o-danube3-4b-base) and fine-tuned using [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models") with LLama-Factory. It uses the ChatML template, without a system message, and was trained on the [ThoughtfulAssistant-v01](https://huggingface.co/datasets/trollek/ThoughtfulAssistant-v01) dataset.
 The idea is to abstract the thoughts away or into a thought bubble when chatting.
@@ -23,7 +23,8 @@ The idea is to abstract the thoughts away or into a thought bubble when chatting
 {{response}}<|im_end|>
 ```
-### LLama-Factory config
 ```yaml
 ### model
 model_name_or_path: danube3/thinking-base-chatml
@@ -39,7 +40,7 @@ lora_rank: 8
 lora_alpha: 16
 use_unsloth: true
 upcast_layernorm: true
-seed: 404
 additional_target: embed_tokens
 ### dataset
@@ -59,17 +60,65 @@ overwrite_output_dir: false
 ### train
 per_device_train_batch_size: 4
-gradient_accumulation_steps: 2
 learning_rate: 0.00001
-num_train_epochs: 1
 lr_scheduler_type: cosine
 warmup_ratio: 0.01
 bf16: true
 flash_attn: fa2
 ### eval
-val_size: 0.02
 per_device_eval_batch_size: 1
 eval_strategy: steps
-eval_steps: 250
 ```

 ---
 # ThoughtStream-4B-v0.1
+This model is based on [h2oai/h2o-danube3-4b-base](https://huggingface.co/h2oai/h2o-danube3-4b-base) and fine-tuned using [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models") and [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") with LLama-Factory. It uses the ChatML template, without a system message, and was trained on the [ThoughtfulAssistant-v01](https://huggingface.co/datasets/trollek/ThoughtfulAssistant-v01) dataset.
 The idea is to abstract the thoughts away or into a thought bubble when chatting.
 {{response}}<|im_end|>
 ```
+### LLama-Factory configs
 ```yaml
 ### model
 model_name_or_path: danube3/thinking-base-chatml
 lora_alpha: 16
 use_unsloth: true
 upcast_layernorm: true
+seed: 24
 additional_target: embed_tokens
 ### dataset
 ### train
 per_device_train_batch_size: 4
+gradient_accumulation_steps: 4
 learning_rate: 0.00001
+num_train_epochs: 2
 lr_scheduler_type: cosine
 warmup_ratio: 0.01
 bf16: true
 flash_attn: fa2
 ### eval
+val_size: 0.01
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
+```
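As a rough illustration of what the change to `gradient_accumulation_steps` means for the LoRA+ stage, the sketch below computes how many sequences contribute to one optimizer step. The function name and the `num_devices` parameter are hypothetical. The card does not say how many GPUs were used, so a single device is assumed.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step.

    num_devices is an assumption; the model card does not state the GPU count.
    """
    return per_device_batch * grad_accum_steps * num_devices


# Values taken from the LoRA+ config, before and after this change.
old_lora = effective_batch_size(per_device_batch=4, grad_accum_steps=2)
new_lora = effective_batch_size(per_device_batch=4, grad_accum_steps=4)
print(old_lora, new_lora)  # 8 16
```

Under the single-device assumption, the change doubles the effective batch size of the LoRA+ stage from 8 to 16.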
+```yaml
+### model
+model_name_or_path: danube3/thinking-base-chatml/merged_loraplus
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_badam: true
+badam_switch_mode: ascending
+badam_start_block: 7
+badam_switch_interval: 20
+badam_verbose: 1
+seed: 768
+### dataset
+dataset: thinking_capybara,thinking_panoia
+template: hermes_chatml
+cutoff_len: 8192
+overwrite_cache: false
+preprocessing_num_workers: 12
+### output
+output_dir: danube3/ThoughtStream-4B-v0.1
+logging_steps: 1
+save_steps: 1
+save_strategy: epoch
+plot_loss: true
+overwrite_output_dir: false
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 4
+learning_rate: 0.00001
+num_train_epochs: 1
+lr_scheduler_type: constant_with_warmup
+warmup_ratio: 0.01
+pure_bf16: true
+flash_attn: fa2
+### eval
+val_size: 0.01
 per_device_eval_batch_size: 1
 eval_strategy: steps
+eval_steps: 200
 ```
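The BAdam settings above train one block of parameters at a time. The sketch below is a simplified model of that schedule, not the actual LLaMA-Factory or BAdam code. It assumes that in `ascending` mode the active block advances by one every `badam_switch_interval` steps and wraps around after the last block. `active_block` and `num_blocks=24` are hypothetical. The real block count depends on how the library partitions the model.

```python
def active_block(step: int, start_block: int = 7, switch_interval: int = 20, num_blocks: int = 24) -> int:
    """Index of the block trained at a 0-based step (simplified, assumed schedule).

    start_block and switch_interval default to the values in the config above;
    num_blocks is a placeholder and not taken from the model card.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    # Advance one block per completed interval, wrapping around at the end.
    return (start_block + step // switch_interval) % num_blocks


for step in (0, 19, 20, 40):
    print(step, active_block(step))
```

With these assumptions, training starts on block 7, stays there for the first 20 steps, and then moves on to block 8.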