minpeter committed · verified
Commit 7adda02 · 1 Parent(s): 20be8cb

End of training

Files changed (2)
  1. README.md +213 -0
  2. generation_config.json +9 -0
README.md ADDED
@@ -0,0 +1,213 @@
---
library_name: transformers
base_model: minpeter/tiny-ko-20m-base
tags:
- axolotl
- generated_from_trainer
datasets:
- lemon-mint/Korean-FineTome-100k
- lemon-mint/smol-koreantalk
- heegyu/open-korean-instructions-v20231020
- FreedomIntelligence/evol-instruct-korean
- FreedomIntelligence/alpaca-gpt4-korean
- FreedomIntelligence/sharegpt-korean
- coastral/korean-writing-style-instruct
- devngho/korean-instruction-mix
- youjunhyeok/Magpie-Pro-300K-Filtered-ko
- youjunhyeok/smoltalk-ko-translate
model-index:
- name: tiny-ko-20m-sft
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.11.0.dev0`
```yaml
base_model: minpeter/tiny-ko-20m-base

hub_model_id: minpeter/tiny-ko-20m-sft
output_dir: ./outputs/tiny-ko-20m-sft
wandb_project: "axolotl"
wandb_entity: "kasfiekfs-e"

chat_template: chatml
datasets:
  - path: lemon-mint/Korean-FineTome-100k
    type: chat_template
    split: train[:1000]
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: lemon-mint/smol-koreantalk
    type: chat_template
    split: train[:1000]
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: heegyu/open-korean-instructions-v20231020
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      user: ["human", "user"]
      assistant: ["gpt", "assistant", "bot"]
      system: ["system", "input"]

  # NOTE: https://github.com/FreedomIntelligence/MultilingualSIFT
  - path: FreedomIntelligence/evol-instruct-korean
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: FreedomIntelligence/alpaca-gpt4-korean
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: FreedomIntelligence/sharegpt-korean
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: coastral/korean-writing-style-instruct
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: devngho/korean-instruction-mix
    type: chat_template
    split: train[:1000]
    field_messages: messages
    message_property_mappings:
      role: from
      content: value

  - path: youjunhyeok/Magpie-Pro-300K-Filtered-ko
    type: chat_template
    split: train[:1000]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: youjunhyeok/smoltalk-ko-translate
    type: chat_template
    split: train[:1000]
    name: merge_filtered
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content

dataset_prepared_path: last_run_prepared
val_set_size: 0.05

save_steps: 200
warmup_steps: 20
eval_steps: 200

sequence_len: 8192

# <<<< experimental settings <<<<
sample_packing: false
train_on_inputs: false
# >>>> experimental settings >>>>

pad_to_sequence_len: true

gradient_accumulation_steps: 4
micro_batch_size: 16

optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-3

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

num_epochs: 1
weight_decay: 0.0
```

</details><br>

# tiny-ko-20m-sft

This model is a fine-tuned version of [minpeter/tiny-ko-20m-base](https://huggingface.co/minpeter/tiny-ko-20m-base) on the lemon-mint/Korean-FineTome-100k, lemon-mint/smol-koreantalk, heegyu/open-korean-instructions-v20231020, FreedomIntelligence/evol-instruct-korean, FreedomIntelligence/alpaca-gpt4-korean, FreedomIntelligence/sharegpt-korean, coastral/korean-writing-style-instruct, devngho/korean-instruction-mix, youjunhyeok/Magpie-Pro-300K-Filtered-ko, and youjunhyeok/smoltalk-ko-translate datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 75

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0     | 0    | 3.5333          |

### Framework versions

- Transformers 4.52.4
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
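Since the config above trains on chatml-formatted conversations (`chat_template: chatml`) and pushes to `minpeter/tiny-ko-20m-sft`, a minimal usage sketch might look like the following. This is an illustration rather than part of the original card; the prompt text and generation settings are assumptions.

```python
# Minimal sketch, assuming standard transformers chat-template usage applies to this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/tiny-ko-20m-sft"  # hub_model_id from the axolotl config above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The model was fine-tuned on chatml-formatted conversations, so build the prompt
# through the tokenizer's chat template rather than concatenating raw strings.
messages = [{"role": "user", "content": "안녕하세요, 간단히 자기소개해 주세요."}]  # illustrative prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```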
generation_config.json ADDED
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 32001,
  "pad_token_id": 32003,
  "transformers_version": "4.52.4",
  "use_cache": false
}
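These fields are the generation defaults that `transformers` loads alongside the checkpoint. A short sketch of inspecting and overriding them (the repo id is assumed from the card above; the override values are illustrative):

```python
# Sketch: reading the generation defaults shipped in generation_config.json.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("minpeter/tiny-ko-20m-sft")
print(gen_config.do_sample)     # True  -> sampling is enabled by default
print(gen_config.eos_token_id)  # 32001 -> token id that ends generation
print(gen_config.pad_token_id)  # 32003

# Any field can be overridden per call, e.g.:
# model.generate(input_ids, generation_config=gen_config, temperature=0.7, max_new_tokens=64)
```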