learn3r committed
Commit 4d4ae1d · 1 Parent(s): 84ec769

End of training

Files changed (5)
  1. README.md +21 -8
  2. all_results.json +18 -0
  3. eval_results.json +13 -0
  4. train_results.json +8 -0
  5. trainer_state.json +896 -0
README.md CHANGED
@@ -1,11 +1,24 @@
 ---
+base_model: /home/co-ou1/rds/hpc-work/transformers/examples/pytorch/summarization/longt5_xl_gov_report_bp_10/checkpoint-477
 tags:
 - generated_from_trainer
+datasets:
+- learn3r/gov_report_memsum_oracle
 metrics:
 - rouge
 model-index:
 - name: longt5_xl_gov_report_bp_10_continue
-  results: []
+  results:
+  - task:
+      name: Summarization
+      type: summarization
+    dataset:
+      name: learn3r/gov_report_memsum_oracle
+      type: learn3r/gov_report_memsum_oracle
+    metrics:
+    - name: Rouge1
+      type: rouge
+      value: 71.9439
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,14 +26,14 @@ should probably proofread and complete it, then remove this comment. -->
 
 # longt5_xl_gov_report_bp_10_continue
 
-This model was trained from scratch on an unknown dataset.
+This model is a fine-tuned version of [/home/co-ou1/rds/hpc-work/transformers/examples/pytorch/summarization/longt5_xl_gov_report_bp_10/checkpoint-477](https://huggingface.co//home/co-ou1/rds/hpc-work/transformers/examples/pytorch/summarization/longt5_xl_gov_report_bp_10/checkpoint-477) on the learn3r/gov_report_memsum_oracle dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.8739
-- Rouge1: 70.7685
-- Rouge2: 42.5122
-- Rougel: 41.7454
-- Rougelsum: 68.0785
-- Gen Len: 671.4938
+- Loss: 1.4878
+- Rouge1: 71.9439
+- Rouge2: 43.7031
+- Rougel: 41.8301
+- Rougelsum: 69.1853
+- Gen Len: 833.0319
 
 ## Model description
 
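The model card reports Rouge1/Rouge2/RougeL/RougeLsum scores. As a reference point for what "Rouge1" measures, here is a deliberately simplified sketch of ROUGE-1 F1 (unigram overlap); the reported scores come from the standard `rouge_score` implementation, which additionally applies stemming and bootstrap aggregation, so treat this only as an illustration of the core computation:

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram-overlap F-measure between a candidate
    summary and a reference. Real ROUGE adds stemming and aggregation."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Two of three unigrams match -> precision = recall = F1 = 2/3.
print(rouge1_f("the cat sat", "the cat ran"))
```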
all_results.json ADDED
@@ -0,0 +1,18 @@
{
    "epoch": 3.99,
    "eval_gen_len": 833.0318930041152,
    "eval_loss": 1.487816572189331,
    "eval_rouge1": 71.9439,
    "eval_rouge2": 43.7031,
    "eval_rougeL": 41.8301,
    "eval_rougeLsum": 69.1853,
    "eval_runtime": 5147.063,
    "eval_samples": 972,
    "eval_samples_per_second": 0.189,
    "eval_steps_per_second": 0.024,
    "train_loss": 0.47826748297495003,
    "train_runtime": 56154.1614,
    "train_samples": 17457,
    "train_samples_per_second": 1.244,
    "train_steps_per_second": 0.005
}
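The throughput fields in all_results.json are derived quantities, so they can be cross-checked against the raw counts. A quick sanity check, assuming eval throughput is samples / runtime and train throughput is samples × num_train_epochs / runtime (the trainer's apparent convention here, given num_train_epochs = 4 in trainer_state.json):

```python
# Values copied from all_results.json above.
results = {
    "eval_runtime": 5147.063,
    "eval_samples": 972,
    "eval_samples_per_second": 0.189,
    "train_runtime": 56154.1614,
    "train_samples": 17457,
    "train_samples_per_second": 1.244,
}
num_train_epochs = 4  # from trainer_state.json

eval_sps = results["eval_samples"] / results["eval_runtime"]
assert round(eval_sps, 3) == results["eval_samples_per_second"]

train_sps = results["train_samples"] * num_train_epochs / results["train_runtime"]
assert round(train_sps, 3) == results["train_samples_per_second"]
```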
eval_results.json ADDED
@@ -0,0 +1,13 @@
{
    "epoch": 3.99,
    "eval_gen_len": 833.0318930041152,
    "eval_loss": 1.487816572189331,
    "eval_rouge1": 71.9439,
    "eval_rouge2": 43.7031,
    "eval_rougeL": 41.8301,
    "eval_rougeLsum": 69.1853,
    "eval_runtime": 5147.063,
    "eval_samples": 972,
    "eval_samples_per_second": 0.189,
    "eval_steps_per_second": 0.024
}
train_results.json ADDED
@@ -0,0 +1,8 @@
{
    "epoch": 3.99,
    "train_loss": 0.47826748297495003,
    "train_runtime": 56154.1614,
    "train_samples": 17457,
    "train_samples_per_second": 1.244,
    "train_steps_per_second": 0.005
}
trainer_state.json ADDED
@@ -0,0 +1,896 @@
{
    "best_metric": 1.487816572189331,
    "best_model_checkpoint": "longt5_xl_gov_report_bp_10_continue/checkpoint-68",
    "epoch": 3.9871736142922583,
    "eval_steps": 500,
    "global_step": 272,
    "is_hyper_param_search": false,
    "is_local_process_zero": true,
    "is_world_process_zero": true,
    "log_history": [
        { "epoch": 0.03, "learning_rate": 0.001, "loss": 0.6441, "step": 2 },
        { "epoch": 0.06, "learning_rate": 0.001, "loss": 0.6068, "step": 4 },
        { "epoch": 0.09, "learning_rate": 0.001, "loss": 0.6344, "step": 6 },
        { "epoch": 0.12, "learning_rate": 0.001, "loss": 0.6447, "step": 8 },
        { "epoch": 0.15, "learning_rate": 0.001, "loss": 0.6177, "step": 10 },
        { "epoch": 0.18, "learning_rate": 0.001, "loss": 0.615, "step": 12 },
        { "epoch": 0.21, "learning_rate": 0.001, "loss": 0.6113, "step": 14 },
        { "epoch": 0.23, "learning_rate": 0.001, "loss": 0.6076, "step": 16 },
        { "epoch": 0.26, "learning_rate": 0.001, "loss": 0.5993, "step": 18 },
        { "epoch": 0.29, "learning_rate": 0.001, "loss": 0.6105, "step": 20 },
        { "epoch": 0.32, "learning_rate": 0.001, "loss": 0.6164, "step": 22 },
        { "epoch": 0.35, "learning_rate": 0.001, "loss": 0.6188, "step": 24 },
        { "epoch": 0.38, "learning_rate": 0.001, "loss": 0.617, "step": 26 },
        { "epoch": 0.41, "learning_rate": 0.001, "loss": 0.6007, "step": 28 },
        { "epoch": 0.44, "learning_rate": 0.001, "loss": 0.6138, "step": 30 },
        { "epoch": 0.47, "learning_rate": 0.001, "loss": 0.6189, "step": 32 },
        { "epoch": 0.5, "learning_rate": 0.001, "loss": 0.6087, "step": 34 },
        { "epoch": 0.53, "learning_rate": 0.001, "loss": 0.6234, "step": 36 },
        { "epoch": 0.56, "learning_rate": 0.001, "loss": 0.6588, "step": 38 },
        { "epoch": 0.59, "learning_rate": 0.001, "loss": 0.6542, "step": 40 },
        { "epoch": 0.62, "learning_rate": 0.001, "loss": 0.6255, "step": 42 },
        { "epoch": 0.64, "learning_rate": 0.001, "loss": 0.6007, "step": 44 },
        { "epoch": 0.67, "learning_rate": 0.001, "loss": 0.6116, "step": 46 },
        { "epoch": 0.7, "learning_rate": 0.001, "loss": 0.6159, "step": 48 },
        { "epoch": 0.73, "learning_rate": 0.001, "loss": 0.6497, "step": 50 },
        { "epoch": 0.76, "learning_rate": 0.001, "loss": 0.6595, "step": 52 },
        { "epoch": 0.79, "learning_rate": 0.001, "loss": 0.6221, "step": 54 },
        { "epoch": 0.82, "learning_rate": 0.001, "loss": 0.6243, "step": 56 },
        { "epoch": 0.85, "learning_rate": 0.001, "loss": 0.5897, "step": 58 },
        { "epoch": 0.88, "learning_rate": 0.001, "loss": 0.5845, "step": 60 },
        { "epoch": 0.91, "learning_rate": 0.001, "loss": 0.6273, "step": 62 },
        { "epoch": 0.94, "learning_rate": 0.001, "loss": 0.5985, "step": 64 },
        { "epoch": 0.97, "learning_rate": 0.001, "loss": 0.6252, "step": 66 },
        { "epoch": 1.0, "learning_rate": 0.001, "loss": 0.6226, "step": 68 },
        { "epoch": 1.0, "eval_gen_len": 833.0318930041152, "eval_loss": 1.487816572189331, "eval_rouge1": 71.9439, "eval_rouge2": 43.7031, "eval_rougeL": 41.8301, "eval_rougeLsum": 69.1853, "eval_runtime": 5182.0273, "eval_samples_per_second": 0.188, "eval_steps_per_second": 0.024, "step": 68 },
        { "epoch": 1.03, "learning_rate": 0.001, "loss": 0.5449, "step": 70 },
        { "epoch": 1.06, "learning_rate": 0.001, "loss": 0.4986, "step": 72 },
        { "epoch": 1.08, "learning_rate": 0.001, "loss": 0.518, "step": 74 },
        { "epoch": 1.11, "learning_rate": 0.001, "loss": 0.5077, "step": 76 },
        { "epoch": 1.14, "learning_rate": 0.001, "loss": 0.5037, "step": 78 },
        { "epoch": 1.17, "learning_rate": 0.001, "loss": 0.5092, "step": 80 },
        { "epoch": 1.2, "learning_rate": 0.001, "loss": 0.5303, "step": 82 },
        { "epoch": 1.23, "learning_rate": 0.001, "loss": 0.5167, "step": 84 },
        { "epoch": 1.26, "learning_rate": 0.001, "loss": 0.5016, "step": 86 },
        { "epoch": 1.29, "learning_rate": 0.001, "loss": 0.5096, "step": 88 },
        { "epoch": 1.32, "learning_rate": 0.001, "loss": 0.5177, "step": 90 },
        { "epoch": 1.35, "learning_rate": 0.001, "loss": 0.5035, "step": 92 },
        { "epoch": 1.38, "learning_rate": 0.001, "loss": 0.5163, "step": 94 },
        { "epoch": 1.41, "learning_rate": 0.001, "loss": 0.5284, "step": 96 },
        { "epoch": 1.44, "learning_rate": 0.001, "loss": 0.513, "step": 98 },
        { "epoch": 1.47, "learning_rate": 0.001, "loss": 0.4994, "step": 100 },
        { "epoch": 1.5, "learning_rate": 0.001, "loss": 0.653, "step": 102 },
        { "epoch": 1.52, "learning_rate": 0.001, "loss": 0.5039, "step": 104 },
        { "epoch": 1.55, "learning_rate": 0.001, "loss": 0.5044, "step": 106 },
        { "epoch": 1.58, "learning_rate": 0.001, "loss": 0.5153, "step": 108 },
        { "epoch": 1.61, "learning_rate": 0.001, "loss": 0.5094, "step": 110 },
        { "epoch": 1.64, "learning_rate": 0.001, "loss": 0.5041, "step": 112 },
        { "epoch": 1.67, "learning_rate": 0.001, "loss": 0.5047, "step": 114 },
        { "epoch": 1.7, "learning_rate": 0.001, "loss": 0.4967, "step": 116 },
        { "epoch": 1.73, "learning_rate": 0.001, "loss": 0.5125, "step": 118 },
        { "epoch": 1.76, "learning_rate": 0.001, "loss": 0.5047, "step": 120 },
        { "epoch": 1.79, "learning_rate": 0.001, "loss": 0.4937, "step": 122 },
        { "epoch": 1.82, "learning_rate": 0.001, "loss": 0.5005, "step": 124 },
        { "epoch": 1.85, "learning_rate": 0.001, "loss": 0.4989, "step": 126 },
        { "epoch": 1.88, "learning_rate": 0.001, "loss": 0.5631, "step": 128 },
        { "epoch": 1.91, "learning_rate": 0.001, "loss": 0.4991, "step": 130 },
        { "epoch": 1.93, "learning_rate": 0.001, "loss": 0.4786, "step": 132 },
        { "epoch": 1.96, "learning_rate": 0.001, "loss": 0.4918, "step": 134 },
        { "epoch": 1.99, "learning_rate": 0.001, "loss": 0.4983, "step": 136 },
        { "epoch": 1.99, "eval_gen_len": 627.5030864197531, "eval_loss": 1.5908194780349731, "eval_rouge1": 70.6191, "eval_rouge2": 43.2627, "eval_rougeL": 42.581, "eval_rougeLsum": 68.0871, "eval_runtime": 4338.9375, "eval_samples_per_second": 0.224, "eval_steps_per_second": 0.028, "step": 136 },
        { "epoch": 2.02, "learning_rate": 0.001, "loss": 0.4259, "step": 138 },
        { "epoch": 2.05, "learning_rate": 0.001, "loss": 0.3974, "step": 140 },
        { "epoch": 2.08, "learning_rate": 0.001, "loss": 0.4039, "step": 142 },
        { "epoch": 2.11, "learning_rate": 0.001, "loss": 0.3859, "step": 144 },
        { "epoch": 2.14, "learning_rate": 0.001, "loss": 0.3938, "step": 146 },
        { "epoch": 2.17, "learning_rate": 0.001, "loss": 0.4126, "step": 148 },
        { "epoch": 2.2, "learning_rate": 0.001, "loss": 0.3954, "step": 150 },
        { "epoch": 2.23, "learning_rate": 0.001, "loss": 0.3978, "step": 152 },
        { "epoch": 2.26, "learning_rate": 0.001, "loss": 0.3894, "step": 154 },
        { "epoch": 2.29, "learning_rate": 0.001, "loss": 0.3894, "step": 156 },
        { "epoch": 2.32, "learning_rate": 0.001, "loss": 0.402, "step": 158 },
        { "epoch": 2.35, "learning_rate": 0.001, "loss": 0.4169, "step": 160 },
        { "epoch": 2.37, "learning_rate": 0.001, "loss": 0.412, "step": 162 },
        { "epoch": 2.4, "learning_rate": 0.001, "loss": 0.4095, "step": 164 },
        { "epoch": 2.43, "learning_rate": 0.001, "loss": 0.3844, "step": 166 },
        { "epoch": 2.46, "learning_rate": 0.001, "loss": 0.3901, "step": 168 },
        { "epoch": 2.49, "learning_rate": 0.001, "loss": 0.4013, "step": 170 },
        { "epoch": 2.52, "learning_rate": 0.001, "loss": 0.4001, "step": 172 },
        { "epoch": 2.55, "learning_rate": 0.001, "loss": 0.4127, "step": 174 },
        { "epoch": 2.58, "learning_rate": 0.001, "loss": 0.4019, "step": 176 },
        { "epoch": 2.61, "learning_rate": 0.001, "loss": 0.4057, "step": 178 },
        { "epoch": 2.64, "learning_rate": 0.001, "loss": 0.4049, "step": 180 },
        { "epoch": 2.67, "learning_rate": 0.001, "loss": 0.4242, "step": 182 },
        { "epoch": 2.7, "learning_rate": 0.001, "loss": 0.4293, "step": 184 },
        { "epoch": 2.73, "learning_rate": 0.001, "loss": 0.4003, "step": 186 },
        { "epoch": 2.76, "learning_rate": 0.001, "loss": 0.4089, "step": 188 },
        { "epoch": 2.79, "learning_rate": 0.001, "loss": 0.418, "step": 190 },
        { "epoch": 2.81, "learning_rate": 0.001, "loss": 0.4329, "step": 192 },
        { "epoch": 2.84, "learning_rate": 0.001, "loss": 0.4341, "step": 194 },
        { "epoch": 2.87, "learning_rate": 0.001, "loss": 0.4535, "step": 196 },
        { "epoch": 2.9, "learning_rate": 0.001, "loss": 0.433, "step": 198 },
        { "epoch": 2.93, "learning_rate": 0.001, "loss": 0.4192, "step": 200 },
        { "epoch": 2.96, "learning_rate": 0.001, "loss": 0.4163, "step": 202 },
        { "epoch": 2.99, "learning_rate": 0.001, "loss": 0.4175, "step": 204 },
        { "epoch": 2.99, "eval_gen_len": 737.4351851851852, "eval_loss": 1.6406528949737549, "eval_rouge1": 71.6704, "eval_rouge2": 43.1655, "eval_rougeL": 41.9746, "eval_rougeLsum": 68.992, "eval_runtime": 4895.6396, "eval_samples_per_second": 0.199, "eval_steps_per_second": 0.025, "step": 204 },
        { "epoch": 3.02, "learning_rate": 0.001, "loss": 0.3648, "step": 206 },
        { "epoch": 3.05, "learning_rate": 0.001, "loss": 0.3583, "step": 208 },
        { "epoch": 3.08, "learning_rate": 0.001, "loss": 0.3338, "step": 210 },
        { "epoch": 3.11, "learning_rate": 0.001, "loss": 0.3452, "step": 212 },
        { "epoch": 3.14, "learning_rate": 0.001, "loss": 0.3704, "step": 214 },
        { "epoch": 3.17, "learning_rate": 0.001, "loss": 0.3507, "step": 216 },
        { "epoch": 3.2, "learning_rate": 0.001, "loss": 0.3698, "step": 218 },
        { "epoch": 3.22, "learning_rate": 0.001, "loss": 0.3549, "step": 220 },
        { "epoch": 3.25, "learning_rate": 0.001, "loss": 0.3476, "step": 222 },
        { "epoch": 3.28, "learning_rate": 0.001, "loss": 0.3557, "step": 224 },
        { "epoch": 3.31, "learning_rate": 0.001, "loss": 0.3791, "step": 226 },
        { "epoch": 3.34, "learning_rate": 0.001, "loss": 0.3737, "step": 228 },
        { "epoch": 3.37, "learning_rate": 0.001, "loss": 0.3744, "step": 230 },
        { "epoch": 3.4, "learning_rate": 0.001, "loss": 0.3663, "step": 232 },
        { "epoch": 3.43, "learning_rate": 0.001, "loss": 0.3585, "step": 234 },
        { "epoch": 3.46, "learning_rate": 0.001, "loss": 0.3718, "step": 236 },
        { "epoch": 3.49, "learning_rate": 0.001, "loss": 0.3703, "step": 238 },
        { "epoch": 3.52, "learning_rate": 0.001, "loss": 0.3746, "step": 240 },
        { "epoch": 3.55, "learning_rate": 0.001, "loss": 0.3684, "step": 242 },
        { "epoch": 3.58, "learning_rate": 0.001, "loss": 0.358, "step": 244 },
        { "epoch": 3.61, "learning_rate": 0.001, "loss": 0.3735, "step": 246 },
        { "epoch": 3.64, "learning_rate": 0.001, "loss": 0.3927, "step": 248 },
        { "epoch": 3.66, "learning_rate": 0.001, "loss": 0.3775, "step": 250 },
        { "epoch": 3.69, "learning_rate": 0.001, "loss": 0.3643, "step": 252 },
        { "epoch": 3.72, "learning_rate": 0.001, "loss": 0.3714, "step": 254 },
        { "epoch": 3.75, "learning_rate": 0.001, "loss": 0.3723, "step": 256 },
        { "epoch": 3.78, "learning_rate": 0.001, "loss": 0.397, "step": 258 },
        { "epoch": 3.81, "learning_rate": 0.001, "loss": 0.3953, "step": 260 },
        { "epoch": 3.84, "learning_rate": 0.001, "loss": 0.3786, "step": 262 },
        { "epoch": 3.87, "learning_rate": 0.001, "loss": 0.3756, "step": 264 },
        { "epoch": 3.9, "learning_rate": 0.001, "loss": 0.391, "step": 266 },
        { "epoch": 3.93, "learning_rate": 0.001, "loss": 0.3809, "step": 268 },
        { "epoch": 3.96, "learning_rate": 0.001, "loss": 0.3818, "step": 270 },
        { "epoch": 3.99, "learning_rate": 0.001, "loss": 0.3958, "step": 272 },
        { "epoch": 3.99, "eval_gen_len": 671.4938271604939, "eval_loss": 1.8738882541656494, "eval_rouge1": 70.7685, "eval_rouge2": 42.5122, "eval_rougeL": 41.7454, "eval_rougeLsum": 68.0785, "eval_runtime": 4528.4783, "eval_samples_per_second": 0.215, "eval_steps_per_second": 0.027, "step": 272 },
        { "epoch": 3.99, "step": 272, "total_flos": 1.2383608053121597e+18, "train_loss": 0.47826748297495003, "train_runtime": 56154.1614, "train_samples_per_second": 1.244, "train_steps_per_second": 0.005 }
    ],
    "logging_steps": 2,
    "max_steps": 272,
    "num_train_epochs": 4,
    "save_steps": 500,
    "total_flos": 1.2383608053121597e+18,
    "trial_name": null,
    "trial_params": null
}
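The trainer state records one evaluation per epoch, and eval_loss rises after epoch 1 (1.4878 → 1.5908 → 1.6407 → 1.8739) while training loss keeps falling, which is why best_model_checkpoint points at checkpoint-68. A minimal sketch of how to recover that from a trainer_state.json log_history (eval entries here abbreviated to the fields the selection needs):

```python
def best_eval(log_history):
    """Return the evaluation record with the lowest eval_loss.
    Training-loss records (no "eval_loss" key) are skipped."""
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"])

# The four per-epoch evaluations logged above, plus a train record.
log_history = [
    {"epoch": 1.0, "eval_loss": 1.487816572189331, "step": 68},
    {"epoch": 1.99, "eval_loss": 1.5908194780349731, "step": 136},
    {"epoch": 2.99, "eval_loss": 1.6406528949737549, "step": 204},
    {"epoch": 3.99, "eval_loss": 1.8738882541656494, "step": 272},
    {"epoch": 3.99, "loss": 0.3958, "step": 272},
]

best = best_eval(log_history)
print(best["step"])  # -> 68, matching best_model_checkpoint
```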