trollek committed (verified)
Commit a7cf060 · Parent: b149143

Update README.md

Files changed (1): README.md (+56 −7)
README.md CHANGED
@@ -9,7 +9,7 @@ library_name: transformers
 ---
 # ThoughtStream-4B-v0.1
 
-This model is based on [h2oai/h2o-danube3-4b-base](https://huggingface.co/h2oai/h2o-danube3-4b-base) and fine-tuned using [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models") with LLama-Factory. It uses the ChatML template, without a system message, and was trained on the [ThoughtfulAssistant-v01](https://huggingface.co/datasets/trollek/ThoughtfulAssistant-v01) dataset.
+This model is based on [h2oai/h2o-danube3-4b-base](https://huggingface.co/h2oai/h2o-danube3-4b-base) and fine-tuned using [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models") and [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") with LLama-Factory. It uses the ChatML template, without a system message, and was trained on the [ThoughtfulAssistant-v01](https://huggingface.co/datasets/trollek/ThoughtfulAssistant-v01) dataset.
 
 The idea is to abstract the thoughts away or into a thought bubble when chatting.
 
@@ -23,7 +23,8 @@ The idea is to abstract the thoughts away or into a thought bubble when chatting
 {{response}}<|im_end|>
 ```
 
-### LLama-Factory config
+### LLama-Factory configs
+
 ```yaml
 ### model
 model_name_or_path: danube3/thinking-base-chatml
@@ -39,7 +40,7 @@ lora_rank: 8
 lora_alpha: 16
 use_unsloth: true
 upcast_layernorm: true
-seed: 404
+seed: 24
 additional_target: embed_tokens
 
 ### dataset
@@ -59,17 +60,65 @@ overwrite_output_dir: false
 
 ### train
 per_device_train_batch_size: 4
-gradient_accumulation_steps: 2
+gradient_accumulation_steps: 4
 learning_rate: 0.00001
-num_train_epochs: 1
+num_train_epochs: 2
 lr_scheduler_type: cosine
 warmup_ratio: 0.01
 bf16: true
 flash_attn: fa2
 
 ### eval
-val_size: 0.02
+val_size: 0.01
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
+```
+
+
+```yaml
+### model
+model_name_or_path: danube3/thinking-base-chatml/merged_loraplus
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_badam: true
+badam_switch_mode: ascending
+badam_start_block: 7
+badam_switch_interval: 20
+badam_verbose: 1
+seed: 768
+
+### dataset
+dataset: thinking_capybara,thinking_panoia
+template: hermes_chatml
+cutoff_len: 8192
+overwrite_cache: false
+preprocessing_num_workers: 12
+
+### output
+output_dir: danube3/ThoughtStream-4B-v0.1
+logging_steps: 1
+save_steps: 1
+save_strategy: epoch
+plot_loss: true
+overwrite_output_dir: false
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 4
+learning_rate: 0.00001
+num_train_epochs: 1
+lr_scheduler_type: constant_with_warmup
+warmup_ratio: 0.01
+pure_bf16: true
+flash_attn: fa2
+
+### eval
+val_size: 0.01
 per_device_eval_batch_size: 1
 eval_strategy: steps
-eval_steps: 250
+eval_steps: 200
 ```
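
The README describes the prompt format as ChatML without a system message. As a minimal sketch of what that formatting looks like in plain Python (the helper name and example message are illustrative, not part of the repository — in practice the tokenizer's chat template would produce this string):

```python
def build_chatml_prompt(messages):
    """Format chat turns into the ChatML layout the README describes:
    <|im_start|>{role}\n{content}<|im_end|> per turn, with no system
    message, ending in an open assistant turn for the model to complete."""
    prompt = ""
    for turn in messages:
        prompt += f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
    # Leave the assistant turn open so generation continues from here.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(build_chatml_prompt([{"role": "user", "content": "What is LoRA+?"}]))
```

The resulting string would typically be tokenized and passed to the model for generation, stopping at `<|im_end|>`.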