phee27 committed
Commit e229d07 · verified · 1 Parent(s): a5968e9

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,208 @@
+ ---
+ base_model: Qwen/Qwen3-Next-80B-A3B-Thinking
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - axolotl
+ - base_model:adapter:Qwen/Qwen3-Next-80B-A3B-Thinking
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
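The quick-start code is still a placeholder in the card. A minimal sketch of loading this LoRA adapter on top of the base model with `transformers` and `peft`, assuming a hypothetical adapter repo id `ADAPTER_ID` (the actual adapter repo id is not stated in the card; substitute your own):

```python
# Minimal sketch: attach this LoRA adapter to the Qwen3-Next base model.
# ADAPTER_ID is a placeholder -- the card does not state the real repo id.
BASE_MODEL = "Qwen/Qwen3-Next-80B-A3B-Thinking"
ADAPTER_ID = "your-username/your-adapter-repo"  # hypothetical

def load_adapter():
    # Imports kept inside the function so the sketch reads without
    # transformers/peft installed; loading the 80B base needs real hardware.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```

Calling `load_adapter()` downloads the base weights and merges in the adapter at inference time; `PeftModel.merge_and_unload()` can fold the LoRA deltas into the base weights if a standalone model is preferred.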
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
adapter_config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen3-Next-80B-A3B-Thinking",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": null,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 8,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "shared_expert_gate",
+ "linear_attn.in_proj_qkvz",
+ "shared_expert.gate_proj",
+ "shared_expert.down_proj",
+ "o_proj",
+ "shared_expert.up_proj",
+ "linear_attn.in_proj_ba",
+ "k_proj",
+ "linear_attn.out_proj",
+ "v_proj",
+ "mlp.gate"
+ ],
+ "target_parameters": [],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
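With `r: 16` and `lora_alpha: 8` in this config, the LoRA update is scaled by `alpha / r = 0.5`: the effective weight delta applied to each target module is `(alpha / r) · B @ A`, where `A` is `r × in_features` and `B` is `out_features × r`. A tiny pure-Python sketch of that composition (the shapes here are illustrative, not the model's real dimensions):

```python
# LoRA weight update: delta_W = (alpha / r) * B @ A
# A is (r x in_features), B is (out_features x r); tiny shapes for illustration.
alpha, r = 8, 16
scaling = alpha / r  # 0.5 for this adapter config

def matmul(X, Y):
    # naive matrix multiply, enough for the sketch
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

in_features, out_features = 4, 3
A = [[1.0] * in_features for _ in range(r)]    # r x in_features
B = [[1.0] * r for _ in range(out_features)]   # out_features x r
delta_W = [[scaling * v for v in row] for row in matmul(B, A)]  # out x in
```

The scaling means this adapter's deltas are applied at half the raw magnitude of `B @ A`; setting `use_rslora: true` would instead scale by `alpha / sqrt(r)`.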
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a20b6ab68a405fef7856f5b1e2890ce481f8475b508a1134ed13fda97d5dd8ef
+ size 106429272
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "</think>": 151668,
+ "</tool_call>": 151658,
+ "</tool_response>": 151666,
+ "<think>": 151667,
+ "<tool_call>": 151657,
+ "<tool_response>": 151665,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0].role == 'system' %}
+ {{- messages[0].content + '\n\n' }}
+ {%- endif %}
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+ {%- if message.content is string %}
+ {%- set content = message.content %}
+ {%- else %}
+ {%- set content = '' %}
+ {%- endif %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- if loop.index0 > ns.last_query_index %}
+ {%- if loop.last or (not loop.last and reasoning_content) %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if (loop.first and content) or (not loop.first) %}
+ {{- '\n' }}
+ {%- endif %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {%- if tool_call.arguments is string %}
+ {{- tool_call.arguments }}
+ {%- else %}
+ {{- tool_call.arguments | tojson }}
+ {%- endif %}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n<think>\n' }}
+ {%- endif %}
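The template's simplest path (no tools, a plain system + user exchange, `add_generation_prompt=true`) reduces to ChatML turns with the assistant turn opened by a forced `<think>` block. A pure-Python re-implementation of just that path, as a readable sketch (the real template additionally handles tool schemas, reasoning-content splitting, and tool responses as shown above):

```python
# Sketch of the no-tools rendering path of the chat template above:
# each message becomes <|im_start|>{role}\n{content}<|im_end|>\n, and the
# generation prompt opens an assistant turn pre-seeded with <think>.
def render(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n<think>\n")
    return "".join(out)

prompt = render([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
```

The trailing `<think>\n` is why this "Thinking" variant always begins its reply inside a reasoning block; downstream code is expected to strip everything up to `</think>` before showing the answer.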
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ddda833950388a06484c359715f38c8ba18f1e29bc495fd81bea18eb7cceb147
+ size 55092381
rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85ad5156cb5ab55f8cd0ece74445bd91b51e4b70f18b043a45705f36ab0130d4
+ size 15429
rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fec8319af6b2d6a743854f6c5ca2fa58a4eca9142a778df742e377e0ff94303
+ size 15429
rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:472a0c27c9c4a1ff02f8d7e105c3c3b7448fe82c11d32db353e38ee4846df0ef
+ size 15429
rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90778dcc377d95e6a4ed8f6f74550df1165f14dbd78b3643a5900648026d353d
+ size 15429
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:253623eb1c82cfe00d3e3cb586d78e339aadb91eb510a90f07642ff3a74a6516
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151665": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151666": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151667": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151668": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 1010000,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
trainer_state.json ADDED
@@ -0,0 +1,1277 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.972972972972973,
+ "eval_steps": 55,
+ "global_step": 110,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0,
+ "eval_loss": 6.015130996704102,
+ "eval_runtime": 175.115,
+ "eval_samples_per_second": 0.571,
+ "eval_steps_per_second": 0.074,
+ "memory/device_reserved (GiB)": 57.3,
+ "memory/max_active (GiB)": 43.52,
+ "memory/max_allocated (GiB)": 43.52,
+ "step": 0
+ },
+ {
+ "epoch": 0.018018018018018018,
+ "grad_norm": 0.601690411567688,
+ "learning_rate": 0.0,
+ "loss": 5.8902,
+ "memory/device_reserved (GiB)": 57.78,
+ "memory/max_active (GiB)": 52.56,
+ "memory/max_allocated (GiB)": 52.56,
+ "step": 1,
+ "tokens_per_second_per_gpu": 177.27
+ },
+ {
+ "epoch": 0.036036036036036036,
+ "grad_norm": 0.5987619161605835,
+ "learning_rate": 1.818181818181818e-06,
+ "loss": 5.9238,
+ "memory/device_reserved (GiB)": 57.83,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 2,
+ "tokens_per_second_per_gpu": 32.34
+ },
+ {
+ "epoch": 0.05405405405405406,
+ "grad_norm": 0.6140171885490417,
+ "learning_rate": 3.636363636363636e-06,
+ "loss": 6.108,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 3,
+ "tokens_per_second_per_gpu": 32.29
+ },
+ {
+ "epoch": 0.07207207207207207,
+ "grad_norm": 0.6393939256668091,
+ "learning_rate": 5.4545454545454545e-06,
+ "loss": 6.0509,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 4,
+ "tokens_per_second_per_gpu": 34.19
+ },
+ {
+ "epoch": 0.09009009009009009,
+ "grad_norm": 0.6186049580574036,
+ "learning_rate": 7.272727272727272e-06,
+ "loss": 6.0799,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 5,
+ "tokens_per_second_per_gpu": 48.39
+ },
+ {
+ "epoch": 0.10810810810810811,
+ "grad_norm": 0.6133891344070435,
+ "learning_rate": 9.090909090909091e-06,
+ "loss": 6.0418,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 6,
+ "tokens_per_second_per_gpu": 35.74
+ },
+ {
+ "epoch": 0.12612612612612611,
+ "grad_norm": 0.6060707569122314,
+ "learning_rate": 1.0909090909090909e-05,
+ "loss": 5.9976,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 7,
+ "tokens_per_second_per_gpu": 42.46
+ },
+ {
+ "epoch": 0.14414414414414414,
+ "grad_norm": 0.6184359192848206,
+ "learning_rate": 1.2727272727272727e-05,
+ "loss": 6.0328,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 8,
+ "tokens_per_second_per_gpu": 38.38
+ },
+ {
+ "epoch": 0.16216216216216217,
+ "grad_norm": 0.6444172859191895,
+ "learning_rate": 1.4545454545454545e-05,
+ "loss": 5.9618,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 9,
+ "tokens_per_second_per_gpu": 26.99
+ },
+ {
+ "epoch": 0.18018018018018017,
+ "grad_norm": 0.6325266361236572,
+ "learning_rate": 1.6363636363636366e-05,
+ "loss": 6.1674,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 10,
+ "tokens_per_second_per_gpu": 29.93
+ },
+ {
+ "epoch": 0.1981981981981982,
+ "grad_norm": 0.6881551146507263,
+ "learning_rate": 1.8181818181818182e-05,
+ "loss": 5.9809,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 11,
+ "tokens_per_second_per_gpu": 33.12
+ },
+ {
+ "epoch": 0.21621621621621623,
+ "grad_norm": 0.6200075745582581,
+ "learning_rate": 2e-05,
+ "loss": 5.9737,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 12,
+ "tokens_per_second_per_gpu": 33.23
+ },
+ {
+ "epoch": 0.23423423423423423,
+ "grad_norm": 0.6758139133453369,
+ "learning_rate": 2.1818181818181818e-05,
+ "loss": 5.9377,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 13,
+ "tokens_per_second_per_gpu": 31.95
+ },
+ {
+ "epoch": 0.25225225225225223,
+ "grad_norm": 0.6687948107719421,
+ "learning_rate": 2.3636363636363637e-05,
+ "loss": 6.0492,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 14,
+ "tokens_per_second_per_gpu": 31.88
+ },
+ {
+ "epoch": 0.2702702702702703,
+ "grad_norm": 0.6872105002403259,
+ "learning_rate": 2.5454545454545454e-05,
+ "loss": 6.1111,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 15,
+ "tokens_per_second_per_gpu": 40.32
+ },
+ {
+ "epoch": 0.2882882882882883,
+ "grad_norm": 0.7107413411140442,
+ "learning_rate": 2.7272727272727273e-05,
+ "loss": 5.9434,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 16,
+ "tokens_per_second_per_gpu": 31.11
+ },
+ {
+ "epoch": 0.3063063063063063,
+ "grad_norm": 0.7199774384498596,
+ "learning_rate": 2.909090909090909e-05,
+ "loss": 6.0871,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 17,
+ "tokens_per_second_per_gpu": 40.84
+ },
+ {
+ "epoch": 0.32432432432432434,
+ "grad_norm": 0.7394067049026489,
+ "learning_rate": 3.090909090909091e-05,
+ "loss": 5.8008,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 18,
+ "tokens_per_second_per_gpu": 52.0
+ },
+ {
+ "epoch": 0.34234234234234234,
+ "grad_norm": 0.7493309378623962,
+ "learning_rate": 3.272727272727273e-05,
+ "loss": 5.8617,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 19,
+ "tokens_per_second_per_gpu": 35.37
+ },
+ {
+ "epoch": 0.36036036036036034,
+ "grad_norm": 0.7457339763641357,
+ "learning_rate": 3.454545454545455e-05,
+ "loss": 5.7479,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 20,
+ "tokens_per_second_per_gpu": 35.34
+ },
+ {
+ "epoch": 0.3783783783783784,
+ "grad_norm": 0.7670865058898926,
+ "learning_rate": 3.6363636363636364e-05,
+ "loss": 5.7939,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 21,
+ "tokens_per_second_per_gpu": 37.74
+ },
+ {
+ "epoch": 0.3963963963963964,
+ "grad_norm": 0.7689312100410461,
+ "learning_rate": 3.818181818181819e-05,
+ "loss": 5.7546,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 22,
+ "tokens_per_second_per_gpu": 34.11
+ },
+ {
+ "epoch": 0.4144144144144144,
+ "grad_norm": 0.7929359674453735,
+ "learning_rate": 4e-05,
+ "loss": 5.8247,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 23,
+ "tokens_per_second_per_gpu": 22.92
+ },
+ {
+ "epoch": 0.43243243243243246,
+ "grad_norm": 0.7598868012428284,
+ "learning_rate": 4.181818181818182e-05,
+ "loss": 5.6557,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 24,
+ "tokens_per_second_per_gpu": 29.09
+ },
+ {
+ "epoch": 0.45045045045045046,
+ "grad_norm": 0.7897383570671082,
+ "learning_rate": 4.3636363636363636e-05,
+ "loss": 5.6503,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 25,
+ "tokens_per_second_per_gpu": 43.23
+ },
+ {
+ "epoch": 0.46846846846846846,
+ "grad_norm": 0.8077855706214905,
+ "learning_rate": 4.545454545454546e-05,
+ "loss": 5.7059,
+ "memory/device_reserved (GiB)": 57.85,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 26,
+ "tokens_per_second_per_gpu": 38.97
+ },
+ {
+ "epoch": 0.4864864864864865,
311
+ "grad_norm": 0.7965295910835266,
312
+ "learning_rate": 4.7272727272727275e-05,
313
+ "loss": 5.5299,
314
+ "memory/device_reserved (GiB)": 57.85,
315
+ "memory/max_active (GiB)": 52.61,
316
+ "memory/max_allocated (GiB)": 52.61,
317
+ "step": 27,
318
+ "tokens_per_second_per_gpu": 29.02
319
+ },
320
+ {
321
+ "epoch": 0.5045045045045045,
322
+ "grad_norm": 0.7723223567008972,
323
+ "learning_rate": 4.909090909090909e-05,
324
+ "loss": 5.442,
325
+ "memory/device_reserved (GiB)": 57.85,
326
+ "memory/max_active (GiB)": 52.62,
327
+ "memory/max_allocated (GiB)": 52.62,
328
+ "step": 28,
329
+ "tokens_per_second_per_gpu": 25.82
330
+ },
331
+ {
332
+ "epoch": 0.5225225225225225,
333
+ "grad_norm": 0.7679380178451538,
334
+ "learning_rate": 5.090909090909091e-05,
335
+ "loss": 5.3683,
336
+ "memory/device_reserved (GiB)": 57.86,
337
+ "memory/max_active (GiB)": 52.62,
338
+ "memory/max_allocated (GiB)": 52.62,
339
+ "step": 29,
340
+ "tokens_per_second_per_gpu": 36.19
341
+ },
342
+ {
343
+ "epoch": 0.5405405405405406,
344
+ "grad_norm": 0.7431294322013855,
345
+ "learning_rate": 5.272727272727272e-05,
346
+ "loss": 5.2314,
347
+ "memory/device_reserved (GiB)": 57.86,
348
+ "memory/max_active (GiB)": 52.62,
349
+ "memory/max_allocated (GiB)": 52.62,
350
+ "step": 30,
351
+ "tokens_per_second_per_gpu": 48.12
352
+ },
353
+ {
354
+ "epoch": 0.5585585585585585,
355
+ "grad_norm": 0.7318829298019409,
356
+ "learning_rate": 5.4545454545454546e-05,
357
+ "loss": 5.2971,
358
+ "memory/device_reserved (GiB)": 57.86,
359
+ "memory/max_active (GiB)": 52.62,
360
+ "memory/max_allocated (GiB)": 52.62,
361
+ "step": 31,
362
+ "tokens_per_second_per_gpu": 30.47
363
+ },
364
+ {
365
+ "epoch": 0.5765765765765766,
366
+ "grad_norm": 0.6973780393600464,
367
+ "learning_rate": 5.636363636363636e-05,
368
+ "loss": 5.0413,
369
+ "memory/device_reserved (GiB)": 57.86,
370
+ "memory/max_active (GiB)": 52.62,
371
+ "memory/max_allocated (GiB)": 52.62,
372
+ "step": 32,
373
+ "tokens_per_second_per_gpu": 39.86
374
+ },
375
+ {
376
+ "epoch": 0.5945945945945946,
377
+ "grad_norm": 0.7133749127388,
378
+ "learning_rate": 5.818181818181818e-05,
379
+ "loss": 5.1071,
380
+ "memory/device_reserved (GiB)": 57.86,
381
+ "memory/max_active (GiB)": 52.63,
382
+ "memory/max_allocated (GiB)": 52.63,
383
+ "step": 33,
384
+ "tokens_per_second_per_gpu": 23.77
385
+ },
386
+ {
387
+ "epoch": 0.6126126126126126,
388
+ "grad_norm": 0.6688926219940186,
389
+ "learning_rate": 6e-05,
390
+ "loss": 4.906,
391
+ "memory/device_reserved (GiB)": 57.86,
392
+ "memory/max_active (GiB)": 52.62,
393
+ "memory/max_allocated (GiB)": 52.62,
394
+ "step": 34,
395
+ "tokens_per_second_per_gpu": 24.91
396
+ },
397
+ {
398
+ "epoch": 0.6306306306306306,
399
+ "grad_norm": 0.6534309983253479,
400
+ "learning_rate": 6.181818181818182e-05,
401
+ "loss": 4.9764,
402
+ "memory/device_reserved (GiB)": 57.88,
403
+ "memory/max_active (GiB)": 52.64,
404
+ "memory/max_allocated (GiB)": 52.64,
405
+ "step": 35,
406
+ "tokens_per_second_per_gpu": 33.75
407
+ },
408
+ {
409
+ "epoch": 0.6486486486486487,
410
+ "grad_norm": 0.6284618377685547,
411
+ "learning_rate": 6.363636363636364e-05,
412
+ "loss": 4.8235,
413
+ "memory/device_reserved (GiB)": 57.91,
414
+ "memory/max_active (GiB)": 52.64,
415
+ "memory/max_allocated (GiB)": 52.64,
416
+ "step": 36,
417
+ "tokens_per_second_per_gpu": 32.66
418
+ },
419
+ {
420
+ "epoch": 0.6666666666666666,
421
+ "grad_norm": 0.5952987670898438,
422
+ "learning_rate": 6.545454545454546e-05,
423
+ "loss": 4.7779,
424
+ "memory/device_reserved (GiB)": 57.91,
425
+ "memory/max_active (GiB)": 52.63,
426
+ "memory/max_allocated (GiB)": 52.63,
427
+ "step": 37,
428
+ "tokens_per_second_per_gpu": 28.28
429
+ },
430
+ {
431
+ "epoch": 0.6846846846846847,
432
+ "grad_norm": 0.6216407418251038,
433
+ "learning_rate": 6.727272727272727e-05,
434
+ "loss": 4.7969,
435
+ "memory/device_reserved (GiB)": 57.94,
436
+ "memory/max_active (GiB)": 52.64,
437
+ "memory/max_allocated (GiB)": 52.64,
438
+ "step": 38,
439
+ "tokens_per_second_per_gpu": 30.94
440
+ },
441
+ {
442
+ "epoch": 0.7027027027027027,
443
+ "grad_norm": 0.5679822564125061,
444
+ "learning_rate": 6.90909090909091e-05,
445
+ "loss": 4.7705,
446
+ "memory/device_reserved (GiB)": 57.95,
447
+ "memory/max_active (GiB)": 52.63,
448
+ "memory/max_allocated (GiB)": 52.63,
449
+ "step": 39,
450
+ "tokens_per_second_per_gpu": 39.6
451
+ },
452
+ {
453
+ "epoch": 0.7207207207207207,
454
+ "grad_norm": 0.5590559244155884,
455
+ "learning_rate": 7.090909090909092e-05,
456
+ "loss": 4.7452,
457
+ "memory/device_reserved (GiB)": 57.95,
458
+ "memory/max_active (GiB)": 52.63,
459
+ "memory/max_allocated (GiB)": 52.63,
460
+ "step": 40,
461
+ "tokens_per_second_per_gpu": 37.24
462
+ },
463
+ {
+ "epoch": 0.7387387387387387,
+ "grad_norm": 0.5368968844413757,
+ "learning_rate": 7.272727272727273e-05,
+ "loss": 4.7783,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 41,
+ "tokens_per_second_per_gpu": 28.42
+ },
+ {
+ "epoch": 0.7567567567567568,
+ "grad_norm": 0.5522942543029785,
+ "learning_rate": 7.454545454545455e-05,
+ "loss": 4.7181,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 42,
+ "tokens_per_second_per_gpu": 32.13
+ },
+ {
+ "epoch": 0.7747747747747747,
+ "grad_norm": 0.4933941066265106,
+ "learning_rate": 7.636363636363637e-05,
+ "loss": 4.8171,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 43,
+ "tokens_per_second_per_gpu": 24.91
+ },
+ {
+ "epoch": 0.7927927927927928,
+ "grad_norm": 0.487724244594574,
+ "learning_rate": 7.818181818181818e-05,
+ "loss": 4.5267,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 44,
+ "tokens_per_second_per_gpu": 38.94
+ },
+ {
+ "epoch": 0.8108108108108109,
+ "grad_norm": 0.484387069940567,
+ "learning_rate": 8e-05,
+ "loss": 4.4651,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 45,
+ "tokens_per_second_per_gpu": 28.12
+ },
+ {
+ "epoch": 0.8288288288288288,
+ "grad_norm": 0.47269508242607117,
+ "learning_rate": 8.181818181818183e-05,
+ "loss": 4.4782,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 46,
+ "tokens_per_second_per_gpu": 37.47
+ },
+ {
+ "epoch": 0.8468468468468469,
+ "grad_norm": 0.4501938819885254,
+ "learning_rate": 8.363636363636364e-05,
+ "loss": 4.2217,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 47,
+ "tokens_per_second_per_gpu": 36.52
+ },
+ {
+ "epoch": 0.8648648648648649,
+ "grad_norm": 0.4240598678588867,
+ "learning_rate": 8.545454545454545e-05,
+ "loss": 4.2584,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 48,
+ "tokens_per_second_per_gpu": 25.0
+ },
+ {
+ "epoch": 0.8828828828828829,
+ "grad_norm": 0.4062960743904114,
+ "learning_rate": 8.727272727272727e-05,
+ "loss": 4.3664,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.66,
+ "memory/max_allocated (GiB)": 52.66,
+ "step": 49,
+ "tokens_per_second_per_gpu": 34.05
+ },
+ {
+ "epoch": 0.9009009009009009,
+ "grad_norm": 0.4040940999984741,
+ "learning_rate": 8.90909090909091e-05,
+ "loss": 4.31,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 50,
+ "tokens_per_second_per_gpu": 31.81
+ },
+ {
+ "epoch": 0.918918918918919,
+ "grad_norm": 0.38634198904037476,
+ "learning_rate": 9.090909090909092e-05,
+ "loss": 4.1829,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 51,
+ "tokens_per_second_per_gpu": 31.96
+ },
+ {
+ "epoch": 0.9369369369369369,
+ "grad_norm": 0.4119090139865875,
+ "learning_rate": 9.272727272727273e-05,
+ "loss": 4.211,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 52,
+ "tokens_per_second_per_gpu": 26.32
+ },
+ {
+ "epoch": 0.954954954954955,
+ "grad_norm": 0.39360716938972473,
+ "learning_rate": 9.454545454545455e-05,
+ "loss": 4.1027,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 53,
+ "tokens_per_second_per_gpu": 36.47
+ },
+ {
+ "epoch": 0.972972972972973,
+ "grad_norm": 0.358804851770401,
+ "learning_rate": 9.636363636363637e-05,
+ "loss": 4.1262,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 54,
+ "tokens_per_second_per_gpu": 35.0
+ },
+ {
+ "epoch": 0.990990990990991,
+ "grad_norm": 0.3619638681411743,
+ "learning_rate": 9.818181818181818e-05,
+ "loss": 3.9671,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 55,
+ "tokens_per_second_per_gpu": 21.34
+ },
+ {
+ "epoch": 0.990990990990991,
+ "eval_loss": 4.040650367736816,
+ "eval_runtime": 179.0976,
+ "eval_samples_per_second": 0.558,
+ "eval_steps_per_second": 0.073,
+ "memory/device_reserved (GiB)": 57.95,
+ "memory/max_active (GiB)": 43.5,
+ "memory/max_allocated (GiB)": 43.5,
+ "step": 55
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 0.3906014859676361,
+ "learning_rate": 0.0001,
+ "loss": 4.1367,
+ "memory/device_reserved (GiB)": 57.88,
+ "memory/max_active (GiB)": 52.55,
+ "memory/max_allocated (GiB)": 52.55,
+ "step": 56,
+ "tokens_per_second_per_gpu": 28.38
+ },
+ {
+ "epoch": 1.018018018018018,
+ "grad_norm": 0.34430640935897827,
+ "learning_rate": 0.00010181818181818181,
+ "loss": 3.9357,
+ "memory/device_reserved (GiB)": 57.91,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 57,
+ "tokens_per_second_per_gpu": 26.74
+ },
+ {
+ "epoch": 1.0360360360360361,
+ "grad_norm": 0.348283588886261,
+ "learning_rate": 0.00010363636363636364,
+ "loss": 3.9594,
+ "memory/device_reserved (GiB)": 57.91,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 58,
+ "tokens_per_second_per_gpu": 25.8
+ },
+ {
+ "epoch": 1.054054054054054,
+ "grad_norm": 0.3484898507595062,
+ "learning_rate": 0.00010545454545454545,
+ "loss": 4.0163,
+ "memory/device_reserved (GiB)": 57.91,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 59,
+ "tokens_per_second_per_gpu": 26.23
+ },
+ {
+ "epoch": 1.072072072072072,
+ "grad_norm": 0.3627394735813141,
+ "learning_rate": 0.00010727272727272728,
+ "loss": 3.9347,
+ "memory/device_reserved (GiB)": 57.91,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 60,
+ "tokens_per_second_per_gpu": 32.7
+ },
+ {
+ "epoch": 1.09009009009009,
+ "grad_norm": 0.3439123034477234,
+ "learning_rate": 0.00010909090909090909,
+ "loss": 3.9091,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.64,
+ "memory/max_allocated (GiB)": 52.64,
+ "step": 61,
+ "tokens_per_second_per_gpu": 46.09
+ },
+ {
+ "epoch": 1.1081081081081081,
+ "grad_norm": 0.34011831879615784,
+ "learning_rate": 0.00011090909090909092,
+ "loss": 3.8579,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 62,
+ "tokens_per_second_per_gpu": 34.82
+ },
+ {
+ "epoch": 1.1261261261261262,
+ "grad_norm": 0.3363277018070221,
+ "learning_rate": 0.00011272727272727272,
+ "loss": 3.8762,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 63,
+ "tokens_per_second_per_gpu": 41.68
+ },
+ {
+ "epoch": 1.1441441441441442,
+ "grad_norm": 0.30976247787475586,
+ "learning_rate": 0.00011454545454545456,
+ "loss": 3.8585,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.63,
+ "memory/max_allocated (GiB)": 52.63,
+ "step": 64,
+ "tokens_per_second_per_gpu": 37.27
+ },
+ {
+ "epoch": 1.1621621621621623,
+ "grad_norm": 0.3248283565044403,
+ "learning_rate": 0.00011636363636363636,
+ "loss": 3.7179,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 65,
+ "tokens_per_second_per_gpu": 26.17
+ },
+ {
+ "epoch": 1.1801801801801801,
+ "grad_norm": 0.3173442482948303,
+ "learning_rate": 0.0001181818181818182,
+ "loss": 3.8197,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 66,
+ "tokens_per_second_per_gpu": 28.41
+ },
+ {
+ "epoch": 1.1981981981981982,
+ "grad_norm": 0.33076199889183044,
+ "learning_rate": 0.00012,
+ "loss": 3.6631,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 67,
+ "tokens_per_second_per_gpu": 32.3
+ },
+ {
+ "epoch": 1.2162162162162162,
+ "grad_norm": 0.32531851530075073,
+ "learning_rate": 0.00012181818181818183,
+ "loss": 3.6563,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 68,
+ "tokens_per_second_per_gpu": 32.11
+ },
+ {
+ "epoch": 1.2342342342342343,
+ "grad_norm": 0.295604944229126,
+ "learning_rate": 0.00012363636363636364,
+ "loss": 3.6487,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 69,
+ "tokens_per_second_per_gpu": 31.08
+ },
+ {
+ "epoch": 1.2522522522522523,
+ "grad_norm": 0.3253607749938965,
+ "learning_rate": 0.00012545454545454546,
+ "loss": 3.741,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 70,
+ "tokens_per_second_per_gpu": 29.52
+ },
+ {
+ "epoch": 1.2702702702702702,
+ "grad_norm": 0.28945258259773254,
+ "learning_rate": 0.00012727272727272728,
+ "loss": 3.6727,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 71,
+ "tokens_per_second_per_gpu": 38.43
+ },
+ {
+ "epoch": 1.2882882882882882,
+ "grad_norm": 0.287298321723938,
+ "learning_rate": 0.0001290909090909091,
+ "loss": 3.5821,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 72,
+ "tokens_per_second_per_gpu": 29.55
+ },
+ {
+ "epoch": 1.3063063063063063,
+ "grad_norm": 0.26835423707962036,
+ "learning_rate": 0.00013090909090909093,
+ "loss": 3.648,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 73,
+ "tokens_per_second_per_gpu": 37.82
+ },
+ {
+ "epoch": 1.3243243243243243,
+ "grad_norm": 0.27674639225006104,
+ "learning_rate": 0.00013272727272727275,
+ "loss": 3.4623,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 74,
+ "tokens_per_second_per_gpu": 48.99
+ },
+ {
+ "epoch": 1.3423423423423424,
+ "grad_norm": 0.28284698724746704,
+ "learning_rate": 0.00013454545454545455,
+ "loss": 3.4366,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 75,
+ "tokens_per_second_per_gpu": 32.8
+ },
+ {
+ "epoch": 1.3603603603603602,
+ "grad_norm": 0.2780005931854248,
+ "learning_rate": 0.00013636363636363637,
+ "loss": 3.4308,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 76,
+ "tokens_per_second_per_gpu": 33.52
+ },
+ {
+ "epoch": 1.3783783783783785,
+ "grad_norm": 0.2978385388851166,
+ "learning_rate": 0.0001381818181818182,
+ "loss": 3.4822,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 77,
+ "tokens_per_second_per_gpu": 35.74
+ },
+ {
+ "epoch": 1.3963963963963963,
+ "grad_norm": 0.28048908710479736,
+ "learning_rate": 0.00014,
+ "loss": 3.4922,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.62,
+ "memory/max_allocated (GiB)": 52.62,
+ "step": 78,
+ "tokens_per_second_per_gpu": 32.76
+ },
+ {
+ "epoch": 1.4144144144144144,
+ "grad_norm": 0.2921410799026489,
+ "learning_rate": 0.00014181818181818184,
+ "loss": 3.5634,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 79,
+ "tokens_per_second_per_gpu": 22.05
+ },
+ {
+ "epoch": 1.4324324324324325,
+ "grad_norm": 0.28046825528144836,
+ "learning_rate": 0.00014363636363636363,
+ "loss": 3.4562,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 80,
+ "tokens_per_second_per_gpu": 27.93
+ },
+ {
+ "epoch": 1.4504504504504505,
+ "grad_norm": 0.28950053453445435,
+ "learning_rate": 0.00014545454545454546,
+ "loss": 3.4771,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 81,
+ "tokens_per_second_per_gpu": 43.19
+ },
+ {
+ "epoch": 1.4684684684684686,
+ "grad_norm": 0.2990242838859558,
+ "learning_rate": 0.00014727272727272728,
+ "loss": 3.4552,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 82,
+ "tokens_per_second_per_gpu": 39.02
+ },
+ {
+ "epoch": 1.4864864864864864,
+ "grad_norm": 0.3110749125480652,
+ "learning_rate": 0.0001490909090909091,
+ "loss": 3.3635,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 83,
+ "tokens_per_second_per_gpu": 28.78
+ },
+ {
+ "epoch": 1.5045045045045045,
+ "grad_norm": 0.2659832537174225,
+ "learning_rate": 0.0001509090909090909,
+ "loss": 3.309,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 84,
+ "tokens_per_second_per_gpu": 26.52
+ },
+ {
+ "epoch": 1.5225225225225225,
+ "grad_norm": 0.2891514003276825,
+ "learning_rate": 0.00015272727272727275,
+ "loss": 3.2953,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 85,
+ "tokens_per_second_per_gpu": 37.61
+ },
+ {
+ "epoch": 1.5405405405405406,
+ "grad_norm": 0.2862309217453003,
+ "learning_rate": 0.00015454545454545454,
+ "loss": 3.3016,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 86,
+ "tokens_per_second_per_gpu": 49.89
+ },
+ {
+ "epoch": 1.5585585585585586,
+ "grad_norm": 0.3269289433956146,
+ "learning_rate": 0.00015636363636363637,
+ "loss": 3.4022,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 87,
+ "tokens_per_second_per_gpu": 30.66
+ },
+ {
+ "epoch": 1.5765765765765765,
+ "grad_norm": 0.2758469581604004,
+ "learning_rate": 0.0001581818181818182,
+ "loss": 3.1596,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 88,
+ "tokens_per_second_per_gpu": 39.43
+ },
+ {
+ "epoch": 1.5945945945945947,
+ "grad_norm": 0.2842893600463867,
+ "learning_rate": 0.00016,
+ "loss": 3.2368,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 89,
+ "tokens_per_second_per_gpu": 23.64
+ },
+ {
+ "epoch": 1.6126126126126126,
+ "grad_norm": 0.27873268723487854,
+ "learning_rate": 0.00016181818181818184,
+ "loss": 3.1778,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 90,
+ "tokens_per_second_per_gpu": 24.52
+ },
+ {
+ "epoch": 1.6306306306306306,
+ "grad_norm": 0.25983887910842896,
+ "learning_rate": 0.00016363636363636366,
+ "loss": 3.2287,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 91,
+ "tokens_per_second_per_gpu": 34.4
+ },
+ {
+ "epoch": 1.6486486486486487,
+ "grad_norm": 0.2840956151485443,
+ "learning_rate": 0.00016545454545454545,
+ "loss": 3.1411,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 92,
+ "tokens_per_second_per_gpu": 33.89
+ },
+ {
+ "epoch": 1.6666666666666665,
+ "grad_norm": 0.2628091275691986,
+ "learning_rate": 0.00016727272727272728,
+ "loss": 3.1159,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 93,
+ "tokens_per_second_per_gpu": 29.27
+ },
+ {
+ "epoch": 1.6846846846846848,
+ "grad_norm": 0.2681942582130432,
+ "learning_rate": 0.0001690909090909091,
+ "loss": 3.1647,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 94,
+ "tokens_per_second_per_gpu": 30.7
+ },
+ {
+ "epoch": 1.7027027027027026,
+ "grad_norm": 0.2515859603881836,
+ "learning_rate": 0.0001709090909090909,
+ "loss": 3.1587,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 95,
+ "tokens_per_second_per_gpu": 40.07
+ },
+ {
+ "epoch": 1.7207207207207207,
+ "grad_norm": 0.2735103666782379,
+ "learning_rate": 0.00017272727272727275,
+ "loss": 3.1537,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 96,
+ "tokens_per_second_per_gpu": 37.91
+ },
+ {
+ "epoch": 1.7387387387387387,
+ "grad_norm": 0.24973994493484497,
+ "learning_rate": 0.00017454545454545454,
+ "loss": 3.2266,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 97,
+ "tokens_per_second_per_gpu": 28.61
+ },
+ {
+ "epoch": 1.7567567567567568,
+ "grad_norm": 0.26508864760398865,
+ "learning_rate": 0.00017636363636363637,
+ "loss": 3.135,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 98,
+ "tokens_per_second_per_gpu": 34.76
+ },
+ {
+ "epoch": 1.7747747747747749,
+ "grad_norm": 0.2922559678554535,
+ "learning_rate": 0.0001781818181818182,
+ "loss": 3.359,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 99,
+ "tokens_per_second_per_gpu": 26.34
+ },
+ {
+ "epoch": 1.7927927927927927,
+ "grad_norm": 0.2632916271686554,
+ "learning_rate": 0.00018,
+ "loss": 3.1131,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 100,
+ "tokens_per_second_per_gpu": 42.17
+ },
+ {
+ "epoch": 1.810810810810811,
+ "grad_norm": 0.2974204123020172,
+ "learning_rate": 0.00018181818181818183,
+ "loss": 3.1127,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 101,
+ "tokens_per_second_per_gpu": 30.71
+ },
+ {
+ "epoch": 1.8288288288288288,
+ "grad_norm": 0.28947019577026367,
+ "learning_rate": 0.00018363636363636366,
+ "loss": 3.1019,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 102,
+ "tokens_per_second_per_gpu": 40.45
+ },
+ {
+ "epoch": 1.8468468468468469,
+ "grad_norm": 0.29779183864593506,
+ "learning_rate": 0.00018545454545454545,
+ "loss": 2.8855,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 103,
+ "tokens_per_second_per_gpu": 38.82
+ },
+ {
+ "epoch": 1.864864864864865,
+ "grad_norm": 0.27393272519111633,
+ "learning_rate": 0.00018727272727272728,
+ "loss": 2.9937,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 104,
+ "tokens_per_second_per_gpu": 26.96
+ },
+ {
+ "epoch": 1.8828828828828827,
+ "grad_norm": 0.28197985887527466,
+ "learning_rate": 0.0001890909090909091,
+ "loss": 3.1189,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 105,
+ "tokens_per_second_per_gpu": 37.87
+ },
+ {
+ "epoch": 1.900900900900901,
+ "grad_norm": 0.27397748827934265,
+ "learning_rate": 0.00019090909090909092,
+ "loss": 3.0858,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.59,
+ "memory/max_allocated (GiB)": 52.59,
+ "step": 106,
+ "tokens_per_second_per_gpu": 35.13
+ },
+ {
+ "epoch": 1.9189189189189189,
+ "grad_norm": 0.274027943611145,
+ "learning_rate": 0.00019272727272727274,
+ "loss": 2.9537,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.59,
+ "memory/max_allocated (GiB)": 52.59,
+ "step": 107,
+ "tokens_per_second_per_gpu": 33.48
+ },
+ {
+ "epoch": 1.936936936936937,
+ "grad_norm": 0.2898459732532501,
+ "learning_rate": 0.00019454545454545457,
+ "loss": 2.9996,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.6,
+ "memory/max_allocated (GiB)": 52.6,
+ "step": 108,
+ "tokens_per_second_per_gpu": 27.65
+ },
+ {
+ "epoch": 1.954954954954955,
+ "grad_norm": 0.2991600036621094,
+ "learning_rate": 0.00019636363636363636,
+ "loss": 2.9123,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 109,
+ "tokens_per_second_per_gpu": 39.01
+ },
+ {
+ "epoch": 1.972972972972973,
+ "grad_norm": 0.27946925163269043,
+ "learning_rate": 0.00019818181818181821,
+ "loss": 3.0439,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 52.61,
+ "memory/max_allocated (GiB)": 52.61,
+ "step": 110,
+ "tokens_per_second_per_gpu": 35.44
+ },
+ {
+ "epoch": 1.972972972972973,
+ "eval_loss": 2.936415672302246,
+ "eval_runtime": 158.8267,
+ "eval_samples_per_second": 0.63,
+ "eval_steps_per_second": 0.082,
+ "memory/device_reserved (GiB)": 57.92,
+ "memory/max_active (GiB)": 43.5,
+ "memory/max_allocated (GiB)": 43.5,
+ "step": 110
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 1100,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 20,
+ "save_steps": 55,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.0902735727406088e+19,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6ccf0ff07d5c330083c9eacf2c2e0e307a13dbd2363503c35a015caa695253a
+ size 7313
vocab.json ADDED
The diff for this file is too large to render.
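The `trainer_state.json` added in this commit stores its metrics as a `log_history` list of records like those shown in the diff: training records carry `loss`, `learning_rate`, and `step`, while evaluation records carry `eval_loss`. A minimal sketch of how such a file could be inspected after cloning the repo (the field names are taken from the records above; the helper name is illustrative, not part of the repo):

```python
def summarize_log_history(log_history):
    """Split a Trainer-style log_history into training and eval records.

    Training records are identified by the "loss" key, eval records by
    "eval_loss", matching the record shapes in trainer_state.json.
    """
    train = [r for r in log_history if "loss" in r]
    evals = [r for r in log_history if "eval_loss" in r]
    return train, evals


# Two records mirroring step 110 in the diff above.
history = [
    {"step": 110, "loss": 3.0439, "learning_rate": 0.00019818181818181821},
    {"step": 110, "eval_loss": 2.936415672302246, "eval_runtime": 158.8267},
]
train, evals = summarize_log_history(history)
```

On the full file this would be preceded by `json.load(open("trainer_state.json"))["log_history"]`; the split lists can then be plotted or tabulated per step.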