master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[2024-11-03 20:11:35,850] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,851] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,853] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,854] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,854] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,859] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,859] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:35,860] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-03 20:11:40,199] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,225] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,400] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,558] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,618] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,642] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,673] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-11-03 20:11:40,673] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with
backend nccl [2024-11-03 20:11:40,772] [INFO] [comm.py:637:init_distributed] cdb=None 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=configs/deepspeed.json, disable_tqdm=False, do_eval=False, do_predict=False, do_train=False, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=output/echo28-20241103-201128-1e-4/runs/Nov03_20-11-34_gpu1, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=10000, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=3.0, 
optim=adamw_hf, optim_args=None, output_dir=output/echo28-20241103-201128-1e-4, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=8, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=output/echo28-20241103-201128-1e-4, save_on_each_node=False, save_safetensors=False, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) [INFO|configuration_utils.py:667] 2024-11-03 20:11:41,688 >> loading configuration file /public1/home/amzhou/lwt/model/Echo/config.json [INFO|configuration_utils.py:667] 2024-11-03 20:11:41,696 >> loading configuration file /public1/home/amzhou/lwt/model/Echo/config.json [INFO|configuration_utils.py:725] 2024-11-03 20:11:41,697 >> Model config ChatGLMConfig { "_name_or_path": "/public1/home/amzhou/lwt/model/Echo", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": true, "apply_residual_connection_post_layernorm": false, "architectures": [ "ChatGLMModel" ], "attention_dropout": 0.0, "attention_softmax_in_fp32": true, "auto_map": { "AutoConfig": "configuration_chatglm.ChatGLMConfig", "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification" }, 
"bias_dropout_fusion": true, "classifier_dropout": null, "eos_token_id": 2, "ffn_hidden_size": 13696, "fp32_residual_connection": false, "hidden_dropout": 0.0, "hidden_size": 4096, "kv_channels": 128, "layernorm_epsilon": 1e-05, "model_type": "chatglm", "multi_query_attention": true, "multi_query_group_num": 2, "num_attention_heads": 32, "num_layers": 28, "original_rope": true, "pad_token_id": 0, "padded_vocab_size": 65024, "post_layer_norm": true, "pre_seq_len": null, "prefix_projection": false, "quantization_bit": 0, "rmsnorm": true, "seq_length": 8192, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.30.2", "use_cache": true, "vocab_size": 65024 } 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: True [INFO|tokenization_utils_base.py:1821] 2024-11-03 20:11:41,709 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:1821] 2024-11-03 20:11:41,709 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:1821] 2024-11-03 20:11:41,709 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:1821] 2024-11-03 20:11:41,709 >> loading file tokenizer_config.json 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: True 11/03/2024 20:11:41 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed 
training: True, 16-bits training: True [INFO|modeling_utils.py:2575] 2024-11-03 20:11:42,057 >> loading weights file /public1/home/amzhou/lwt/model/Echo/pytorch_model.bin.index.json [INFO|configuration_utils.py:577] 2024-11-03 20:11:42,061 >> Generate config GenerationConfig { "_from_model_config": true, "eos_token_id": 2, "pad_token_id": 0, "transformers_version": "4.30.2" } Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration. [INFO|modeling_utils.py:3303] 2024-11-03 20:12:11,004 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /public1/home/amzhou/lwt/model/Echo. If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training. [INFO|modeling_utils.py:2927] 2024-11-03 20:12:11,006 >> Generation config file not found, using a generation config created from the model config. max leng of data is 25842 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 25842 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 25842 PrefixTrainer () max leng of data is 25842/public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 25842 PrefixTrainer () max leng of data is 25842 PrefixTrainer () max leng of data is 25842 PrefixTrainer () /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( max leng of data is 25842 PrefixTrainer () 11/03/2024 20:17:00 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. [INFO|trainer.py:577] 2024-11-03 20:17:00,784 >> max_steps is given, it will override any value given in num_train_epochs /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [2024-11-03 20:17:01,347] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown [2024-11-03 20:17:52,239] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2024-11-03 20:17:52,244] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2024-11-03 20:17:52,244] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-11-03 20:17:52,249] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW [2024-11-03 20:17:52,249] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type= [2024-11-03 20:17:52,249] [WARNING] [engine.py:1188:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution ***** [2024-11-03 20:17:52,249] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2024-11-03 20:17:52,249] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000 [2024-11-03 20:17:52,249] [INFO] 
[stage_1_and_2.py:150:__init__] Allgather bucket size 500000000
[2024-11-03 20:17:52,250] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-11-03 20:17:52,250] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
11/03/2024 20:18:07 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[2024-11-03 20:18:07,482] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-11-03 20:18:07,483] [INFO] [utils.py:801:see_memory_usage] MA 14.54 GB Max_MA 14.54 GB CA 14.54 GB Max_CA 15 GB
[2024-11-03 20:18:07,483] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 73.43 GB, percent = 14.6%
[2024-11-03 20:18:07,616] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-11-03 20:18:07,617] [INFO] [utils.py:801:see_memory_usage] MA 14.54 GB Max_MA 17.45 GB CA 17.45 GB Max_CA 17 GB
[2024-11-03 20:18:07,617] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 74.14 GB, percent = 14.7%
[2024-11-03 20:18:07,617] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized
[2024-11-03 20:18:07,747] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-11-03 20:18:07,748] [INFO] [utils.py:801:see_memory_usage] MA 14.54 GB Max_MA 14.54 GB CA 17.45 GB Max_CA 17 GB
[2024-11-03 20:18:07,748] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 74.83 GB, percent = 14.9%
[2024-11-03 20:18:07,749] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-11-03 20:18:07,749] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-11-03 20:18:07,749] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-11-03 20:18:07,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05, 5e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
[2024-11-03 20:18:07,750] [INFO]
[config.py:996:print] DeepSpeedEngine configuration: [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] amp_enabled .................. False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] amp_params ................... False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] bfloat16_enabled ............. False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False [2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] comms_config ................. 
[2024-11-03 20:18:07,750] [INFO] [config.py:1000:print] communication_data_type ...... None [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={} [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] data_efficiency_config ....... 
{'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] dataloader_drop_last ......... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] disable_allgather ............ False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] dump_state ................... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] elasticity_enabled ........... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] flops_profiler_config ........ 
{ "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] fp16_auto_cast ............... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] fp16_enabled ................. True [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] global_rank .................. 0 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] grad_accum_dtype ............. None [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 1 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] gradient_clipping ............ 0.0 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] graph_harvesting ............. False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] load_universal_checkpoint .... False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] loss_scale ................... 0 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] memory_breakdown ............. False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] mics_shard_size .............. -1 [2024-11-03 20:18:07,751] [INFO] [config.py:1000:print] monitor_config ............... 
tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] optimizer_name ............... None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] optimizer_params ............. None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] pld_enabled .................. False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] pld_params ................... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] prescale_gradients ........... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] scheduler_name ............... None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] scheduler_params ............. None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32 [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] sparse_attention ............. None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] steps_per_print .............. inf [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] train_batch_size ............. 
64 [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 8 [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] use_node_local_storage ....... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] weight_quantization_config ... None [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] world_size ................... 8 [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] zero_allow_untested_optimizer True [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] zero_enabled ................. True [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. 
True [2024-11-03 20:18:07,752] [INFO] [config.py:1000:print] zero_optimization_stage ...... 2 [2024-11-03 20:18:07,752] [INFO] [config.py:986:print_user_config] json = { "train_micro_batch_size_per_gpu": 8, "zero_allow_untested_optimizer": true, "fp16": { "enabled": true, "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "zero_optimization": { "stage": 2, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 1, "steps_per_print": inf, "bf16": { "enabled": false } } [INFO|trainer.py:1786] 2024-11-03 20:18:07,752 >> ***** Running training ***** [INFO|trainer.py:1787] 2024-11-03 20:18:07,752 >> Num examples = 48,888 [INFO|trainer.py:1788] 2024-11-03 20:18:07,752 >> Num Epochs = 14 [INFO|trainer.py:1789] 2024-11-03 20:18:07,753 >> Instantaneous batch size per device = 8 [INFO|trainer.py:1790] 2024-11-03 20:18:07,753 >> Total train batch size (w. parallel, distributed & accumulation) = 64 [INFO|trainer.py:1791] 2024-11-03 20:18:07,753 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1792] 2024-11-03 20:18:07,753 >> Total optimization steps = 10,000 [INFO|trainer.py:1793] 2024-11-03 20:18:07,753 >> Number of trainable parameters = 6,243,584,000 0%| | 0/10000 [00:00> The following columns in the training set don't have a corresponding argument in `ChatGLMForConditionalGeneration.forward` and have been ignored: lengg. If lengg are not expected by `ChatGLMForConditionalGeneration.forward`, you can safely ignore this message. 11/03/2024 20:18:07 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11/03/2024 20:18:15 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
11/03/2024 20:18:21 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/03/2024 20:18:28 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/03/2024 20:18:38 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/03/2024 20:18:38 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
11/03/2024 20:18:38 - WARNING - transformers_modules.Echo.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[2024-11-03 20:18:55,078] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
0%| | 1/10000 [00:47<131:27:33, 47.33s/it]
{'loss': 2.493, 'learning_rate': 5e-05, 'epoch': 0.0}
0%| | 1/10000 [00:49<131:27:33, 47.33s/it]
[2024-11-03 20:19:10,168] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 65536, reducing to 32768 0%| | 2/10000 [01:02<78:46:12, 28.36s/it] {'loss': 1.9244, 'learning_rate': 5e-05, 'epoch': 0.0} 0%| | 2/10000 [01:02<78:46:12, 28.36s/it] 0%| | 3/10000 [01:16<60:08:07, 21.66s/it] {'loss': 1.9557, 'learning_rate': 4.9995000000000005e-05, 'epoch': 0.0} 0%| | 3/10000 [01:16<60:08:07, 21.66s/it] 0%| | 4/10000 [01:29<51:36:48, 18.59s/it] {'loss': 1.5277, 'learning_rate': 4.999e-05, 'epoch': 0.01} 0%| | 4/10000 [01:30<51:36:48, 18.59s/it] 0%| | 5/10000 [01:43<47:00:25, 16.93s/it] {'loss': 1.4193, 'learning_rate': 4.9985e-05, 'epoch': 0.01} 0%| | 5/10000 [01:44<47:00:25, 16.93s/it] 0%| | 6/10000 [01:57<44:05:06, 15.88s/it] {'loss': 1.3539, 'learning_rate': 4.9980000000000006e-05, 'epoch': 0.01} 0%| | 6/10000 [01:57<44:05:06, 15.88s/it] 0%| | 7/10000 [02:11<42:20:10, 15.25s/it] {'loss': 1.3051, 'learning_rate': 4.9975e-05, 'epoch': 0.01} 0%| | 7/10000 [02:11<42:20:10, 15.25s/it] 0%| | 8/10000 [02:25<41:09:48, 14.83s/it] {'loss': 1.1617, 'learning_rate': 4.997e-05, 'epoch': 0.01} 0%| | 8/10000 [02:25<41:09:48, 14.83s/it] 0%| | 9/10000 [02:39<40:33:36, 14.61s/it] {'loss': 1.2347, 'learning_rate': 4.9965e-05, 'epoch': 0.01} 0%| | 9/10000 [02:39<40:33:36, 14.61s/it] 0%| | 10/10000 [02:53<40:02:47, 14.43s/it] {'loss': 1.5691, 'learning_rate': 4.996e-05, 'epoch': 0.01} 0%| | 10/10000 [02:53<40:02:47, 14.43s/it] 0%| | 11/10000 [03:07<39:40:14, 14.30s/it] {'loss': 1.29, 'learning_rate': 4.9955e-05, 'epoch': 0.01} 0%| | 11/10000 [03:07<39:40:14, 14.30s/it] 0%| | 12/10000 [03:21<39:24:06, 14.20s/it] {'loss': 1.2394, 'learning_rate': 4.995e-05, 'epoch': 0.02} 0%| | 12/10000 [03:21<39:24:06, 14.20s/it] 0%| | 13/10000 [03:35<39:12:03, 14.13s/it] {'loss': 1.5466, 'learning_rate': 4.9945000000000004e-05, 'epoch': 0.02} 0%| | 13/10000 [03:35<39:12:03, 14.13s/it] 0%| | 14/10000 [03:49<39:07:15, 14.10s/it] {'loss': 1.1706, 'learning_rate': 4.9940000000000006e-05, 'epoch': 0.02} 0%| | 14/10000 [03:49<39:07:15, 14.10s/it] 0%| | 15/10000 
[04:03<39:07:15, 14.10s/it] {'loss': 1.1086, 'learning_rate': 4.9935e-05, 'epoch': 0.02} 0%| | 15/10000 [04:04<39:07:15, 14.10s/it] 0%| | 16/10000 [04:17<38:56:35, 14.04s/it] {'loss': 1.2341, 'learning_rate': 4.9930000000000005e-05, 'epoch': 0.02} 0%| | 16/10000 [04:17<38:56:35, 14.04s/it] 0%| | 17/10000 [04:31<38:56:38, 14.04s/it] {'loss': 1.1465, 'learning_rate': 4.992500000000001e-05, 'epoch': 0.02} 0%| | 17/10000 [04:31<38:56:38, 14.04s/it] 0%| | 18/10000 [04:45<38:53:27, 14.03s/it] {'loss': 1.262, 'learning_rate': 4.992e-05, 'epoch': 0.02} 0%| | 18/10000 [04:45<38:53:27, 14.03s/it] 0%| | 19/10000 [04:59<38:55:40, 14.04s/it] {'loss': 1.062, 'learning_rate': 4.9915e-05, 'epoch': 0.02} 0%| | 19/10000 [05:00<38:55:40, 14.04s/it] 0%| | 20/10000 [05:13<38:54:37, 14.04s/it] {'loss': 1.0916, 'learning_rate': 4.991e-05, 'epoch': 0.03} 0%| | 20/10000 [05:14<38:54:37, 14.04s/it] 0%| | 21/10000 [05:27<38:53:16, 14.03s/it] {'loss': 1.204, 'learning_rate': 4.9905000000000004e-05, 'epoch': 0.03} 0%| | 21/10000 [05:28<38:53:16, 14.03s/it] 0%| | 22/10000 [05:41<38:46:21, 13.99s/it] {'loss': 0.9764, 'learning_rate': 4.99e-05, 'epoch': 0.03} 0%| | 22/10000 [05:41<38:46:21, 13.99s/it] 0%| | 23/10000 [05:55<38:43:54, 13.98s/it] {'loss': 1.1547, 'learning_rate': 4.9895e-05, 'epoch': 0.03} 0%| | 23/10000 [05:55<38:43:54, 13.98s/it] 0%| | 24/10000 [06:09<38:51:12, 14.02s/it] {'loss': 1.332, 'learning_rate': 4.9890000000000005e-05, 'epoch': 0.03} 0%| | 24/10000 [06:10<38:51:12, 14.02s/it] 0%| | 25/10000 [06:23<38:49:08, 14.01s/it] {'loss': 1.3518, 'learning_rate': 4.9885e-05, 'epoch': 0.03} 0%| | 25/10000 [06:23<38:49:08, 14.01s/it] 0%| | 26/10000 [06:37<38:48:13, 14.01s/it] {'loss': 0.981, 'learning_rate': 4.9880000000000004e-05, 'epoch': 0.03} 0%| | 26/10000 [06:38<38:48:13, 14.01s/it] 0%| | 27/10000 [06:51<38:45:31, 13.99s/it] {'loss': 1.1544, 'learning_rate': 4.9875000000000006e-05, 'epoch': 0.04} 0%| | 27/10000 [06:51<38:45:31, 13.99s/it] 0%| | 28/10000 [07:05<38:46:54, 
14.00s/it] {'loss': 1.0799, 'learning_rate': 4.987e-05, 'epoch': 0.04} 0%| | 28/10000 [07:05<38:46:54, 14.00s/it] 0%| | 29/10000 [07:19<38:49:49, 14.02s/it] {'loss': 1.133, 'learning_rate': 4.9865e-05, 'epoch': 0.04} 0%| | 29/10000 [07:20<38:49:49, 14.02s/it] 0%| | 30/10000 [07:34<38:52:52, 14.04s/it] {'loss': 1.2719, 'learning_rate': 4.986e-05, 'epoch': 0.04} 0%| | 30/10000 [07:34<38:52:52, 14.04s/it] 0%| | 31/10000 [07:48<38:53:24, 14.04s/it] {'loss': 1.1999, 'learning_rate': 4.9855e-05, 'epoch': 0.04} 0%| | 31/10000 [07:48<38:53:24, 14.04s/it] 0%| | 32/10000 [08:02<38:57:06, 14.07s/it] {'loss': 1.0432, 'learning_rate': 4.9850000000000006e-05, 'epoch': 0.04} 0%| | 32/10000 [08:02<38:57:06, 14.07s/it] 0%| | 33/10000 [08:16<39:02:25, 14.10s/it] {'loss': 1.1838, 'learning_rate': 4.9845e-05, 'epoch': 0.04} 0%| | 33/10000 [08:16<39:02:25, 14.10s/it] 0%| | 34/10000 [08:30<38:57:14, 14.07s/it] {'loss': 1.1303, 'learning_rate': 4.9840000000000004e-05, 'epoch': 0.04} 0%| | 34/10000 [08:30<38:57:14, 14.07s/it] 0%| | 35/10000 [08:44<38:50:52, 14.03s/it] {'loss': 1.0491, 'learning_rate': 4.9835000000000007e-05, 'epoch': 0.05} 0%| | 35/10000 [08:44<38:50:52, 14.03s/it] 0%| | 36/10000 [08:58<38:45:08, 14.00s/it] {'loss': 1.2789, 'learning_rate': 4.983e-05, 'epoch': 0.05} 0%| | 36/10000 [08:58<38:45:08, 14.00s/it] 0%| | 37/10000 [09:12<38:39:58, 13.97s/it] {'loss': 0.8952, 'learning_rate': 4.9825000000000005e-05, 'epoch': 0.05} 0%| | 37/10000 [09:12<38:39:58, 13.97s/it] 0%| | 38/10000 [09:26<38:39:29, 13.97s/it] {'loss': 1.0015, 'learning_rate': 4.982e-05, 'epoch': 0.05} 0%| | 38/10000 [09:26<38:39:29, 13.97s/it] 0%| | 39/10000 [09:40<38:37:43, 13.96s/it] {'loss': 1.1024, 'learning_rate': 4.9815e-05, 'epoch': 0.05} 0%| | 39/10000 [09:40<38:37:43, 13.96s/it] 0%| | 40/10000 [09:54<38:37:32, 13.96s/it] {'loss': 1.435, 'learning_rate': 4.981e-05, 'epoch': 0.05} 0%| | 40/10000 [09:54<38:37:32, 13.96s/it] 0%| | 41/10000 [10:08<38:39:37, 13.98s/it] {'loss': 1.1639, 'learning_rate': 
4.9805e-05, 'epoch': 0.05} 0%| | 41/10000 [10:08<38:39:37, 13.98s/it] 0%| | 42/10000 [10:21<38:31:00, 13.92s/it] {'loss': 1.1886, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.05} 0%| | 42/10000 [10:21<38:31:00, 13.92s/it] 0%| | 43/10000 [10:35<38:33:19, 13.94s/it] {'loss': 1.0329, 'learning_rate': 4.9795e-05, 'epoch': 0.06} 0%| | 43/10000 [10:35<38:33:19, 13.94s/it] 0%| | 44/10000 [10:49<38:32:26, 13.94s/it] {'loss': 1.1902, 'learning_rate': 4.979e-05, 'epoch': 0.06} 0%| | 44/10000 [10:49<38:32:26, 13.94s/it] 0%| | 45/10000 [11:03<38:40:32, 13.99s/it] {'loss': 1.0498, 'learning_rate': 4.9785000000000005e-05, 'epoch': 0.06} 0%| | 45/10000 [11:03<38:40:32, 13.99s/it] 0%| | 46/10000 [11:17<38:34:28, 13.95s/it] {'loss': 1.4877, 'learning_rate': 4.978e-05, 'epoch': 0.06} 0%| | 46/10000 [11:17<38:34:28, 13.95s/it] 0%| | 47/10000 [11:31<38:45:17, 14.02s/it] {'loss': 1.0838, 'learning_rate': 4.9775000000000004e-05, 'epoch': 0.06} 0%| | 47/10000 [11:31<38:45:17, 14.02s/it] 0%| | 48/10000 [11:45<38:42:11, 14.00s/it] {'loss': 1.196, 'learning_rate': 4.977e-05, 'epoch': 0.06} 0%| | 48/10000 [11:45<38:42:11, 14.00s/it] 0%| | 49/10000 [11:59<38:36:57, 13.97s/it] {'loss': 0.9751, 'learning_rate': 4.9765e-05, 'epoch': 0.06} 0%| | 49/10000 [11:59<38:36:57, 13.97s/it] 0%| | 50/10000 [12:13<38:40:24, 13.99s/it] {'loss': 1.1618, 'learning_rate': 4.976e-05, 'epoch': 0.07} 0%| | 50/10000 [12:13<38:40:24, 13.99s/it] 1%| | 51/10000 [12:27<38:35:46, 13.97s/it] {'loss': 1.0687, 'learning_rate': 4.9755e-05, 'epoch': 0.07} 1%| | 51/10000 [12:27<38:35:46, 13.97s/it] 1%| | 52/10000 [12:41<38:30:52, 13.94s/it] {'loss': 1.0327, 'learning_rate': 4.975e-05, 'epoch': 0.07} 1%| | 52/10000 [12:41<38:30:52, 13.94s/it] 1%| | 53/10000 [12:55<38:33:00, 13.95s/it] {'loss': 1.3049, 'learning_rate': 4.9745000000000006e-05, 'epoch': 0.07} 1%| | 53/10000 [12:55<38:33:00, 13.95s/it] 1%| | 54/10000 [13:09<38:31:07, 13.94s/it] {'loss': 1.1296, 'learning_rate': 4.974e-05, 'epoch': 0.07} 1%| | 54/10000 
[13:09<38:31:07, 13.94s/it] 1%| | 55/10000 [13:23<38:29:23, 13.93s/it] {'loss': 1.2892, 'learning_rate': 4.9735000000000004e-05, 'epoch': 0.07} 1%| | 55/10000 [13:23<38:29:23, 13.93s/it] 1%| | 56/10000 [13:37<38:29:18, 13.93s/it] {'loss': 0.9039, 'learning_rate': 4.973000000000001e-05, 'epoch': 0.07} 1%| | 56/10000 [13:37<38:29:18, 13.93s/it] 1%| | 57/10000 [13:51<38:27:02, 13.92s/it] {'loss': 1.0778, 'learning_rate': 4.9725e-05, 'epoch': 0.07} 1%| | 57/10000 [13:51<38:27:02, 13.92s/it] 1%| | 58/10000 [14:05<38:36:37, 13.98s/it] {'loss': 1.0511, 'learning_rate': 4.972e-05, 'epoch': 0.08} 1%| | 58/10000 [14:05<38:36:37, 13.98s/it] 1%| | 59/10000 [14:19<38:31:10, 13.95s/it] {'loss': 1.3157, 'learning_rate': 4.9715e-05, 'epoch': 0.08} 1%| | 59/10000 [14:19<38:31:10, 13.95s/it] 1%| | 60/10000 [14:33<38:33:05, 13.96s/it] {'loss': 0.934, 'learning_rate': 4.9710000000000003e-05, 'epoch': 0.08} 1%| | 60/10000 [14:33<38:33:05, 13.96s/it] 1%| | 61/10000 [14:47<38:27:08, 13.93s/it] {'loss': 0.9479, 'learning_rate': 4.9705e-05, 'epoch': 0.08} 1%| | 61/10000 [14:47<38:27:08, 13.93s/it] 1%| | 62/10000 [15:00<38:19:46, 13.88s/it] {'loss': 1.1344, 'learning_rate': 4.97e-05, 'epoch': 0.08} 1%| | 62/10000 [15:00<38:19:46, 13.88s/it] 1%| | 63/10000 [15:14<38:26:37, 13.93s/it] {'loss': 1.1477, 'learning_rate': 4.9695000000000004e-05, 'epoch': 0.08} 1%| | 63/10000 [15:14<38:26:37, 13.93s/it] 1%| | 64/10000 [15:28<38:30:01, 13.95s/it] {'loss': 1.2556, 'learning_rate': 4.969e-05, 'epoch': 0.08} 1%| | 64/10000 [15:28<38:30:01, 13.95s/it] 1%| | 65/10000 [15:42<38:29:40, 13.95s/it] {'loss': 1.0411, 'learning_rate': 4.9685e-05, 'epoch': 0.09} 1%| | 65/10000 [15:42<38:29:40, 13.95s/it] 1%| | 66/10000 [15:56<38:36:00, 13.99s/it] {'loss': 1.0563, 'learning_rate': 4.9680000000000005e-05, 'epoch': 0.09} 1%| | 66/10000 [15:56<38:36:00, 13.99s/it] 1%| | 67/10000 [16:10<38:33:46, 13.98s/it] {'loss': 1.1571, 'learning_rate': 4.967500000000001e-05, 'epoch': 0.09} 1%| | 67/10000 [16:10<38:33:46, 
13.98s/it] 1%| | 68/10000 [16:24<38:22:19, 13.91s/it] {'loss': 1.0354, 'learning_rate': 4.967e-05, 'epoch': 0.09} 1%| | 68/10000 [16:24<38:22:19, 13.91s/it] 1%| | 69/10000 [16:38<38:21:34, 13.91s/it] {'loss': 1.0107, 'learning_rate': 4.9665e-05, 'epoch': 0.09} 1%| | 69/10000 [16:38<38:21:34, 13.91s/it] 1%| | 70/10000 [16:52<38:26:36, 13.94s/it] {'loss': 0.9611, 'learning_rate': 4.966e-05, 'epoch': 0.09} 1%| | 70/10000 [16:52<38:26:36, 13.94s/it] 1%| | 71/10000 [17:06<38:27:06, 13.94s/it] {'loss': 1.0534, 'learning_rate': 4.9655000000000005e-05, 'epoch': 0.09} 1%| | 71/10000 [17:06<38:27:06, 13.94s/it] 1%| | 72/10000 [17:20<38:26:01, 13.94s/it] {'loss': 1.133, 'learning_rate': 4.965e-05, 'epoch': 0.09} 1%| | 72/10000 [17:20<38:26:01, 13.94s/it] 1%| | 73/10000 [17:34<38:20:48, 13.91s/it] {'loss': 1.0493, 'learning_rate': 4.9645e-05, 'epoch': 0.1} 1%| | 73/10000 [17:34<38:20:48, 13.91s/it] 1%| | 74/10000 [17:48<38:27:34, 13.95s/it] {'loss': 1.2378, 'learning_rate': 4.9640000000000006e-05, 'epoch': 0.1} 1%| | 74/10000 [17:48<38:27:34, 13.95s/it] 1%| | 75/10000 [18:02<38:28:26, 13.96s/it] {'loss': 1.0439, 'learning_rate': 4.9635e-05, 'epoch': 0.1} 1%| | 75/10000 [18:02<38:28:26, 13.96s/it] 1%| | 76/10000 [18:16<38:23:27, 13.93s/it] {'loss': 0.8049, 'learning_rate': 4.9630000000000004e-05, 'epoch': 0.1} 1%| | 76/10000 [18:16<38:23:27, 13.93s/it] 1%| | 77/10000 [18:30<38:26:34, 13.95s/it] {'loss': 1.027, 'learning_rate': 4.962500000000001e-05, 'epoch': 0.1} 1%| | 77/10000 [18:30<38:26:34, 13.95s/it] 1%| | 78/10000 [18:44<38:27:16, 13.95s/it] {'loss': 1.0521, 'learning_rate': 4.962e-05, 'epoch': 0.1} 1%| | 78/10000 [18:44<38:27:16, 13.95s/it] 1%| | 79/10000 [18:58<38:28:00, 13.96s/it] {'loss': 1.1221, 'learning_rate': 4.9615e-05, 'epoch': 0.1} 1%| | 79/10000 [18:58<38:28:00, 13.96s/it] 1%| | 80/10000 [19:12<38:27:11, 13.95s/it] {'loss': 1.0288, 'learning_rate': 4.961e-05, 'epoch': 0.1} 1%| | 80/10000 [19:12<38:27:11, 13.95s/it] 1%| | 81/10000 [19:25<38:25:01, 13.94s/it] 
{'loss': 1.0487, 'learning_rate': 4.9605000000000004e-05, 'epoch': 0.11} 1%| | 81/10000 [19:25<38:25:01, 13.94s/it] 1%| | 82/10000 [19:39<38:20:03, 13.91s/it] {'loss': 1.0576, 'learning_rate': 4.96e-05, 'epoch': 0.11} 1%| | 82/10000 [19:39<38:20:03, 13.91s/it] 1%| | 83/10000 [19:53<38:20:09, 13.92s/it] {'loss': 1.1094, 'learning_rate': 4.9595e-05, 'epoch': 0.11} 1%| | 83/10000 [19:53<38:20:09, 13.92s/it] 1%| | 84/10000 [20:07<38:18:29, 13.91s/it] {'loss': 1.0504, 'learning_rate': 4.9590000000000005e-05, 'epoch': 0.11} 1%| | 84/10000 [20:07<38:18:29, 13.91s/it] 1%| | 85/10000 [20:21<38:18:51, 13.91s/it] {'loss': 1.0848, 'learning_rate': 4.9585e-05, 'epoch': 0.11} 1%| | 85/10000 [20:21<38:18:51, 13.91s/it] 1%| | 86/10000 [20:35<38:12:16, 13.87s/it] {'loss': 1.1061, 'learning_rate': 4.958e-05, 'epoch': 0.11} 1%| | 86/10000 [20:35<38:12:16, 13.87s/it] 1%| | 87/10000 [20:49<38:10:36, 13.86s/it] {'loss': 0.8237, 'learning_rate': 4.9575000000000006e-05, 'epoch': 0.11} 1%| | 87/10000 [20:49<38:10:36, 13.86s/it] 1%| | 88/10000 [21:03<38:17:02, 13.90s/it] {'loss': 0.9871, 'learning_rate': 4.957e-05, 'epoch': 0.12} 1%| | 88/10000 [21:03<38:17:02, 13.90s/it] 1%| | 89/10000 [21:17<38:26:16, 13.96s/it] {'loss': 1.1254, 'learning_rate': 4.9565e-05, 'epoch': 0.12} 1%| | 89/10000 [21:17<38:26:16, 13.96s/it] 1%| | 90/10000 [21:31<38:25:44, 13.96s/it] {'loss': 1.0392, 'learning_rate': 4.956e-05, 'epoch': 0.12} 1%| | 90/10000 [21:31<38:25:44, 13.96s/it] 1%| | 91/10000 [21:45<38:27:23, 13.97s/it] {'loss': 0.9964, 'learning_rate': 4.9555e-05, 'epoch': 0.12} 1%| | 91/10000 [21:45<38:27:23, 13.97s/it] 1%| | 92/10000 [21:59<38:23:25, 13.95s/it] {'loss': 0.8928, 'learning_rate': 4.9550000000000005e-05, 'epoch': 0.12} 1%| | 92/10000 [21:59<38:23:25, 13.95s/it] 1%| | 93/10000 [22:13<38:22:43, 13.95s/it] {'loss': 1.0601, 'learning_rate': 4.9545e-05, 'epoch': 0.12} 1%| | 93/10000 [22:13<38:22:43, 13.95s/it] 1%| | 94/10000 [22:27<38:25:57, 13.97s/it] {'loss': 0.9606, 'learning_rate': 
4.9540000000000003e-05, 'epoch': 0.12} 1%| | 94/10000 [22:27<38:25:57, 13.97s/it] 1%| | 95/10000 [22:41<38:26:34, 13.97s/it] {'loss': 1.0289, 'learning_rate': 4.9535000000000006e-05, 'epoch': 0.12} 1%| | 95/10000 [22:41<38:26:34, 13.97s/it] 1%| | 96/10000 [22:55<38:28:18, 13.98s/it] {'loss': 0.9941, 'learning_rate': 4.953e-05, 'epoch': 0.13} 1%| | 96/10000 [22:55<38:28:18, 13.98s/it] 1%| | 97/10000 [23:08<38:18:48, 13.93s/it] {'loss': 1.1008, 'learning_rate': 4.9525000000000004e-05, 'epoch': 0.13} 1%| | 97/10000 [23:08<38:18:48, 13.93s/it] 1%| | 98/10000 [23:22<38:19:07, 13.93s/it] {'loss': 1.1122, 'learning_rate': 4.952e-05, 'epoch': 0.13} 1%| | 98/10000 [23:22<38:19:07, 13.93s/it] 1%| | 99/10000 [23:36<38:25:19, 13.97s/it] {'loss': 0.8777, 'learning_rate': 4.9515e-05, 'epoch': 0.13} 1%| | 99/10000 [23:36<38:25:19, 13.97s/it] 1%| | 100/10000 [23:50<38:23:44, 13.96s/it] {'loss': 1.3704, 'learning_rate': 4.951e-05, 'epoch': 0.13} 1%| | 100/10000 [23:50<38:23:44, 13.96s/it] 1%| | 101/10000 [24:04<38:23:25, 13.96s/it] {'loss': 0.9593, 'learning_rate': 4.9505e-05, 'epoch': 0.13} 1%| | 101/10000 [24:04<38:23:25, 13.96s/it] 1%| | 102/10000 [24:18<38:14:05, 13.91s/it] {'loss': 1.3386, 'learning_rate': 4.9500000000000004e-05, 'epoch': 0.13} 1%| | 102/10000 [24:18<38:14:05, 13.91s/it] 1%| | 103/10000 [24:32<38:20:24, 13.95s/it] {'loss': 1.0847, 'learning_rate': 4.9495e-05, 'epoch': 0.13} 1%| | 103/10000 [24:32<38:20:24, 13.95s/it] 1%| | 104/10000 [24:46<38:24:47, 13.97s/it] {'loss': 0.9998, 'learning_rate': 4.949e-05, 'epoch': 0.14} 1%| | 104/10000 [24:46<38:24:47, 13.97s/it] 1%| | 105/10000 [25:00<38:22:26, 13.96s/it] {'loss': 1.0587, 'learning_rate': 4.9485000000000005e-05, 'epoch': 0.14} 1%| | 105/10000 [25:00<38:22:26, 13.96s/it] 1%| | 106/10000 [25:14<38:24:39, 13.98s/it] {'loss': 0.8735, 'learning_rate': 4.948000000000001e-05, 'epoch': 0.14} 1%| | 106/10000 [25:14<38:24:39, 13.98s/it] 1%| | 107/10000 [25:28<38:18:34, 13.94s/it] {'loss': 0.9792, 'learning_rate': 
4.9475e-05, 'epoch': 0.14} 1%| | 107/10000 [25:28<38:18:34, 13.94s/it] 1%| | 108/10000 [25:42<38:18:37, 13.94s/it] {'loss': 1.1384, 'learning_rate': 4.947e-05, 'epoch': 0.14} 1%| | 108/10000 [25:42<38:18:37, 13.94s/it] 1%| | 109/10000 [25:56<38:21:53, 13.96s/it] {'loss': 1.162, 'learning_rate': 4.9465e-05, 'epoch': 0.14} 1%| | 109/10000 [25:56<38:21:53, 13.96s/it] 1%| | 110/10000 [26:10<38:18:56, 13.95s/it] {'loss': 0.9491, 'learning_rate': 4.946e-05, 'epoch': 0.14} 1%| | 110/10000 [26:10<38:18:56, 13.95s/it] 1%| | 111/10000 [26:24<38:15:11, 13.93s/it] {'loss': 1.1418, 'learning_rate': 4.9455e-05, 'epoch': 0.15} 1%| | 111/10000 [26:24<38:15:11, 13.93s/it] 1%| | 112/10000 [26:38<38:14:52, 13.93s/it] {'loss': 1.0659, 'learning_rate': 4.945e-05, 'epoch': 0.15} 1%| | 112/10000 [26:38<38:14:52, 13.93s/it] 1%| | 113/10000 [26:51<38:15:58, 13.93s/it] {'loss': 0.9557, 'learning_rate': 4.9445000000000005e-05, 'epoch': 0.15} 1%| | 113/10000 [26:52<38:15:58, 13.93s/it] 1%| | 114/10000 [27:05<38:13:27, 13.92s/it] {'loss': 1.2053, 'learning_rate': 4.944e-05, 'epoch': 0.15} 1%| | 114/10000 [27:05<38:13:27, 13.92s/it] 1%| | 115/10000 [27:19<38:07:57, 13.89s/it] {'loss': 1.1764, 'learning_rate': 4.9435000000000004e-05, 'epoch': 0.15} 1%| | 115/10000 [27:19<38:07:57, 13.89s/it] 1%| | 116/10000 [27:33<38:08:56, 13.89s/it] {'loss': 0.9042, 'learning_rate': 4.9430000000000006e-05, 'epoch': 0.15} 1%| | 116/10000 [27:33<38:08:56, 13.89s/it] 1%| | 117/10000 [27:47<38:08:03, 13.89s/it] {'loss': 0.7959, 'learning_rate': 4.9425e-05, 'epoch': 0.15} 1%| | 117/10000 [27:47<38:08:03, 13.89s/it] 1%| | 118/10000 [28:01<38:12:13, 13.92s/it] {'loss': 1.17, 'learning_rate': 4.942e-05, 'epoch': 0.15} 1%| | 118/10000 [28:01<38:12:13, 13.92s/it] 1%| | 119/10000 [28:15<38:14:18, 13.93s/it] {'loss': 0.9732, 'learning_rate': 4.9415e-05, 'epoch': 0.16} 1%| | 119/10000 [28:15<38:14:18, 13.93s/it] 1%| | 120/10000 [28:29<38:14:43, 13.94s/it] {'loss': 0.9346, 'learning_rate': 4.941e-05, 'epoch': 0.16} 1%| | 
120/10000 [28:29<38:14:43, 13.94s/it] 1%| | 121/10000 [28:43<38:06:04, 13.88s/it] {'loss': 1.1937, 'learning_rate': 4.9405e-05, 'epoch': 0.16} 1%| | 121/10000 [28:43<38:06:04, 13.88s/it] 1%| | 122/10000 [28:56<38:02:51, 13.87s/it] {'loss': 1.0677, 'learning_rate': 4.94e-05, 'epoch': 0.16} 1%| | 122/10000 [28:57<38:02:51, 13.87s/it] 1%| | 123/10000 [29:10<38:03:28, 13.87s/it] {'loss': 1.0693, 'learning_rate': 4.9395000000000004e-05, 'epoch': 0.16} 1%| | 123/10000 [29:10<38:03:28, 13.87s/it] 1%| | 124/10000 [29:24<38:00:55, 13.86s/it] {'loss': 1.254, 'learning_rate': 4.939e-05, 'epoch': 0.16} 1%| | 124/10000 [29:24<38:00:55, 13.86s/it] 1%|▏ | 125/10000 [29:38<38:08:36, 13.91s/it] {'loss': 1.0157, 'learning_rate': 4.9385e-05, 'epoch': 0.16} 1%|▏ | 125/10000 [29:38<38:08:36, 13.91s/it] 1%|▏ | 126/10000 [29:52<38:12:20, 13.93s/it] {'loss': 1.3604, 'learning_rate': 4.9380000000000005e-05, 'epoch': 0.16} 1%|▏ | 126/10000 [29:52<38:12:20, 13.93s/it] 1%|▏ | 127/10000 [30:06<38:08:33, 13.91s/it] {'loss': 0.9622, 'learning_rate': 4.937500000000001e-05, 'epoch': 0.17} 1%|▏ | 127/10000 [30:06<38:08:33, 13.91s/it] 1%|▏ | 128/10000 [30:20<38:09:19, 13.91s/it] {'loss': 0.9459, 'learning_rate': 4.937e-05, 'epoch': 0.17} 1%|▏ | 128/10000 [30:20<38:09:19, 13.91s/it] 1%|▏ | 129/10000 [30:34<38:11:23, 13.93s/it] {'loss': 0.9864, 'learning_rate': 4.9365e-05, 'epoch': 0.17} 1%|▏ | 129/10000 [30:34<38:11:23, 13.93s/it] 1%|▏ | 130/10000 [30:48<38:15:48, 13.96s/it] {'loss': 1.05, 'learning_rate': 4.936e-05, 'epoch': 0.17} 1%|▏ | 130/10000 [30:48<38:15:48, 13.96s/it] 1%|▏ | 131/10000 [31:02<38:16:27, 13.96s/it] {'loss': 1.1985, 'learning_rate': 4.9355000000000004e-05, 'epoch': 0.17} 1%|▏ | 131/10000 [31:02<38:16:27, 13.96s/it] 1%|▏ | 132/10000 [31:16<38:13:11, 13.94s/it] {'loss': 1.0503, 'learning_rate': 4.935e-05, 'epoch': 0.17} 1%|▏ | 132/10000 [31:16<38:13:11, 13.94s/it] 1%|▏ | 133/10000 [31:30<38:16:14, 13.96s/it] {'loss': 0.9725, 'learning_rate': 4.9345e-05, 'epoch': 0.17} 1%|▏ | 
133/10000 [31:30<38:16:14, 13.96s/it] 1%|▏ | 134/10000 [31:44<38:14:16, 13.95s/it] {'loss': 1.1097, 'learning_rate': 4.9340000000000005e-05, 'epoch': 0.18} 1%|▏ | 134/10000 [31:44<38:14:16, 13.95s/it] 1%|▏ | 135/10000 [31:58<38:13:26, 13.95s/it] {'loss': 1.0361, 'learning_rate': 4.9335e-05, 'epoch': 0.18} 1%|▏ | 135/10000 [31:58<38:13:26, 13.95s/it] 1%|▏ | 136/10000 [32:12<38:12:43, 13.95s/it] {'loss': 0.8907, 'learning_rate': 4.9330000000000004e-05, 'epoch': 0.18} 1%|▏ | 136/10000 [32:12<38:12:43, 13.95s/it] 1%|▏ | 137/10000 [32:26<38:21:52, 14.00s/it] {'loss': 1.2129, 'learning_rate': 4.9325000000000006e-05, 'epoch': 0.18} 1%|▏ | 137/10000 [32:26<38:21:52, 14.00s/it] 1%|▏ | 138/10000 [32:40<38:14:47, 13.96s/it] {'loss': 1.176, 'learning_rate': 4.932e-05, 'epoch': 0.18} 1%|▏ | 138/10000 [32:40<38:14:47, 13.96s/it] 1%|▏ | 139/10000 [32:54<38:13:41, 13.96s/it] {'loss': 0.884, 'learning_rate': 4.9315e-05, 'epoch': 0.18} 1%|▏ | 139/10000 [32:54<38:13:41, 13.96s/it] 1%|▏ | 140/10000 [33:07<38:11:29, 13.94s/it] {'loss': 0.952, 'learning_rate': 4.931e-05, 'epoch': 0.18} 1%|▏ | 140/10000 [33:08<38:11:29, 13.94s/it] 1%|▏ | 141/10000 [33:21<38:04:08, 13.90s/it] {'loss': 0.9539, 'learning_rate': 4.9305e-05, 'epoch': 0.18} 1%|▏ | 141/10000 [33:21<38:04:08, 13.90s/it] 1%|▏ | 142/10000 [33:35<38:08:58, 13.93s/it] {'loss': 1.0538, 'learning_rate': 4.93e-05, 'epoch': 0.19} 1%|▏ | 142/10000 [33:35<38:08:58, 13.93s/it] 1%|▏ | 143/10000 [33:49<38:04:38, 13.91s/it] {'loss': 1.1323, 'learning_rate': 4.9295e-05, 'epoch': 0.19} 1%|▏ | 143/10000 [33:49<38:04:38, 13.91s/it] 1%|▏ | 144/10000 [34:03<38:08:36, 13.93s/it] {'loss': 0.897, 'learning_rate': 4.9290000000000004e-05, 'epoch': 0.19} 1%|▏ | 144/10000 [34:03<38:08:36, 13.93s/it] 1%|▏ | 145/10000 [34:17<38:06:59, 13.92s/it] {'loss': 0.9171, 'learning_rate': 4.928500000000001e-05, 'epoch': 0.19} 1%|▏ | 145/10000 [34:17<38:06:59, 13.92s/it] 1%|▏ | 146/10000 [34:31<38:08:00, 13.93s/it] {'loss': 0.8657, 'learning_rate': 4.928e-05, 'epoch': 
0.19} 1%|▏ | 146/10000 [34:31<38:08:00, 13.93s/it] 1%|▏ | 147/10000 [34:45<38:04:47, 13.91s/it] {'loss': 1.1105, 'learning_rate': 4.9275000000000005e-05, 'epoch': 0.19} 1%|▏ | 147/10000 [34:45<38:04:47, 13.91s/it] 1%|▏ | 148/10000 [34:59<38:02:29, 13.90s/it] {'loss': 1.116, 'learning_rate': 4.927000000000001e-05, 'epoch': 0.19} 1%|▏ | 148/10000 [34:59<38:02:29, 13.90s/it] 1%|▏ | 149/10000 [35:13<37:58:45, 13.88s/it] {'loss': 1.194, 'learning_rate': 4.9265e-05, 'epoch': 0.2} 1%|▏ | 149/10000 [35:13<37:58:45, 13.88s/it] 2%|▏ | 150/10000 [35:26<37:59:54, 13.89s/it] {'loss': 1.0446, 'learning_rate': 4.926e-05, 'epoch': 0.2} 2%|▏ | 150/10000 [35:27<37:59:54, 13.89s/it] 2%|▏ | 151/10000 [35:40<38:02:59, 13.91s/it] {'loss': 0.9883, 'learning_rate': 4.9255e-05, 'epoch': 0.2} 2%|▏ | 151/10000 [35:40<38:02:59, 13.91s/it] 2%|▏ | 152/10000 [35:54<38:08:41, 13.94s/it] {'loss': 1.3495, 'learning_rate': 4.9250000000000004e-05, 'epoch': 0.2} 2%|▏ | 152/10000 [35:55<38:08:41, 13.94s/it] 2%|▏ | 153/10000 [36:08<38:03:44, 13.92s/it] {'loss': 1.1237, 'learning_rate': 4.9245e-05, 'epoch': 0.2} 2%|▏ | 153/10000 [36:08<38:03:44, 13.92s/it] 2%|▏ | 154/10000 [36:22<38:03:07, 13.91s/it] {'loss': 1.0164, 'learning_rate': 4.924e-05, 'epoch': 0.2} 2%|▏ | 154/10000 [36:22<38:03:07, 13.91s/it] 2%|▏ | 155/10000 [36:36<38:04:02, 13.92s/it] {'loss': 0.9186, 'learning_rate': 4.9235000000000005e-05, 'epoch': 0.2} 2%|▏ | 155/10000 [36:36<38:04:02, 13.92s/it] 2%|▏ | 156/10000 [36:50<38:08:43, 13.95s/it] {'loss': 0.924, 'learning_rate': 4.923e-05, 'epoch': 0.2} 2%|▏ | 156/10000 [36:50<38:08:43, 13.95s/it] 2%|▏ | 157/10000 [37:04<38:08:34, 13.95s/it] {'loss': 0.9716, 'learning_rate': 4.9225000000000004e-05, 'epoch': 0.21} 2%|▏ | 157/10000 [37:04<38:08:34, 13.95s/it] 2%|▏ | 158/10000 [37:18<38:04:08, 13.92s/it] {'loss': 1.0748, 'learning_rate': 4.9220000000000006e-05, 'epoch': 0.21} 2%|▏ | 158/10000 [37:18<38:04:08, 13.92s/it] 2%|▏ | 159/10000 [37:32<38:05:10, 13.93s/it] {'loss': 1.2057, 'learning_rate': 
4.9215e-05, 'epoch': 0.21} 2%|▏ | 159/10000 [37:32<38:05:10, 13.93s/it] 2%|▏ | 160/10000 [37:46<38:05:55, 13.94s/it] {'loss': 1.0141, 'learning_rate': 4.921e-05, 'epoch': 0.21} 2%|▏ | 160/10000 [37:46<38:05:55, 13.94s/it] 2%|▏ | 161/10000 [38:00<38:00:46, 13.91s/it] {'loss': 1.3414, 'learning_rate': 4.9205e-05, 'epoch': 0.21} 2%|▏ | 161/10000 [38:00<38:00:46, 13.91s/it] 2%|▏ | 162/10000 [38:14<38:07:52, 13.95s/it] {'loss': 1.1044, 'learning_rate': 4.92e-05, 'epoch': 0.21} 2%|▏ | 162/10000 [38:14<38:07:52, 13.95s/it] 2%|▏ | 163/10000 [38:28<38:15:01, 14.00s/it] {'loss': 0.8931, 'learning_rate': 4.9195e-05, 'epoch': 0.21} 2%|▏ | 163/10000 [38:28<38:15:01, 14.00s/it] 2%|▏ | 164/10000 [38:42<38:07:26, 13.95s/it] {'loss': 1.0664, 'learning_rate': 4.919e-05, 'epoch': 0.21} 2%|▏ | 164/10000 [38:42<38:07:26, 13.95s/it] 2%|▏ | 165/10000 [38:56<38:04:43, 13.94s/it] {'loss': 1.0023, 'learning_rate': 4.9185000000000004e-05, 'epoch': 0.22} 2%|▏ | 165/10000 [38:56<38:04:43, 13.94s/it] 2%|▏ | 166/10000 [39:10<38:13:51, 14.00s/it] {'loss': 0.8324, 'learning_rate': 4.918000000000001e-05, 'epoch': 0.22} 2%|▏ | 166/10000 [39:10<38:13:51, 14.00s/it] 2%|▏ | 167/10000 [39:24<38:09:42, 13.97s/it] {'loss': 0.9214, 'learning_rate': 4.9175e-05, 'epoch': 0.22} 2%|▏ | 167/10000 [39:24<38:09:42, 13.97s/it] 2%|▏ | 168/10000 [39:38<38:03:56, 13.94s/it] {'loss': 1.0435, 'learning_rate': 4.9170000000000005e-05, 'epoch': 0.22} 2%|▏ | 168/10000 [39:38<38:03:56, 13.94s/it] 2%|▏ | 169/10000 [39:52<38:07:32, 13.96s/it] {'loss': 1.092, 'learning_rate': 4.9165e-05, 'epoch': 0.22} 2%|▏ | 169/10000 [39:52<38:07:32, 13.96s/it] 2%|▏ | 170/10000 [40:06<38:13:12, 14.00s/it] {'loss': 0.9049, 'learning_rate': 4.9160000000000004e-05, 'epoch': 0.22} 2%|▏ | 170/10000 [40:06<38:13:12, 14.00s/it] 2%|▏ | 171/10000 [40:20<38:09:13, 13.97s/it] {'loss': 0.9983, 'learning_rate': 4.9155e-05, 'epoch': 0.22} 2%|▏ | 171/10000 [40:20<38:09:13, 13.97s/it] 2%|▏ | 172/10000 [40:33<38:06:19, 13.96s/it] {'loss': 1.0073, 
'learning_rate': 4.915e-05, 'epoch': 0.23} 2%|▏ | 172/10000 [40:34<38:06:19, 13.96s/it] 2%|▏ | 173/10000 [40:47<38:02:59, 13.94s/it] {'loss': 0.9658, 'learning_rate': 4.9145000000000005e-05, 'epoch': 0.23} 2%|▏ | 173/10000 [40:47<38:02:59, 13.94s/it] 2%|▏ | 174/10000 [41:01<37:57:31, 13.91s/it] {'loss': 0.8826, 'learning_rate': 4.914e-05, 'epoch': 0.23} 2%|▏ | 174/10000 [41:01<37:57:31, 13.91s/it] 2%|▏ | 175/10000 [41:15<38:03:11, 13.94s/it] {'loss': 0.8847, 'learning_rate': 4.9135e-05, 'epoch': 0.23} 2%|▏ | 175/10000 [41:15<38:03:11, 13.94s/it] 2%|▏ | 176/10000 [41:29<38:02:12, 13.94s/it] {'loss': 0.8738, 'learning_rate': 4.9130000000000006e-05, 'epoch': 0.23} 2%|▏ | 176/10000 [41:29<38:02:12, 13.94s/it] 2%|▏ | 177/10000 [41:43<38:02:55, 13.94s/it] {'loss': 0.9554, 'learning_rate': 4.9125e-05, 'epoch': 0.23} 2%|▏ | 177/10000 [41:43<38:02:55, 13.94s/it] 2%|▏ | 178/10000 [41:57<38:05:33, 13.96s/it] {'loss': 1.142, 'learning_rate': 4.9120000000000004e-05, 'epoch': 0.23} 2%|▏ | 178/10000 [41:57<38:05:33, 13.96s/it] 2%|▏ | 179/10000 [42:11<38:11:54, 14.00s/it] {'loss': 1.0584, 'learning_rate': 4.9115e-05, 'epoch': 0.23} 2%|▏ | 179/10000 [42:11<38:11:54, 14.00s/it] 2%|▏ | 180/10000 [42:25<38:08:39, 13.98s/it] {'loss': 0.9727, 'learning_rate': 4.911e-05, 'epoch': 0.24} 2%|▏ | 180/10000 [42:25<38:08:39, 13.98s/it] 2%|▏ | 181/10000 [42:39<38:07:31, 13.98s/it] {'loss': 0.859, 'learning_rate': 4.9105e-05, 'epoch': 0.24} 2%|▏ | 181/10000 [42:39<38:07:31, 13.98s/it] 2%|▏ | 182/10000 [42:53<38:07:52, 13.98s/it] {'loss': 1.0289, 'learning_rate': 4.91e-05, 'epoch': 0.24} 2%|▏ | 182/10000 [42:53<38:07:52, 13.98s/it] 2%|▏ | 183/10000 [43:07<38:03:50, 13.96s/it] {'loss': 1.002, 'learning_rate': 4.9095000000000003e-05, 'epoch': 0.24} 2%|▏ | 183/10000 [43:07<38:03:50, 13.96s/it] 2%|▏ | 184/10000 [43:21<38:01:33, 13.95s/it] {'loss': 0.7977, 'learning_rate': 4.9090000000000006e-05, 'epoch': 0.24} 2%|▏ | 184/10000 [43:21<38:01:33, 13.95s/it] 2%|▏ | 185/10000 [43:35<37:57:39, 13.92s/it] 
{'loss': 0.8837, 'learning_rate': 4.9085e-05, 'epoch': 0.24} 2%|▏ | 185/10000 [43:35<37:57:39, 13.92s/it] 2%|▏ | 186/10000 [43:49<37:55:16, 13.91s/it] {'loss': 0.8547, 'learning_rate': 4.9080000000000004e-05, 'epoch': 0.24} 2%|▏ | 186/10000 [43:49<37:55:16, 13.91s/it] 2%|▏ | 187/10000 [44:03<37:59:39, 13.94s/it] {'loss': 0.9382, 'learning_rate': 4.907500000000001e-05, 'epoch': 0.24} 2%|▏ | 187/10000 [44:03<37:59:39, 13.94s/it] 2%|▏ | 188/10000 [44:16<37:52:01, 13.89s/it] {'loss': 0.7546, 'learning_rate': 4.907e-05, 'epoch': 0.25} 2%|▏ | 188/10000 [44:17<37:52:01, 13.89s/it] 2%|▏ | 189/10000 [44:30<37:52:18, 13.90s/it] {'loss': 1.3638, 'learning_rate': 4.9065e-05, 'epoch': 0.25} 2%|▏ | 189/10000 [44:30<37:52:18, 13.90s/it] 2%|▏ | 190/10000 [44:44<37:55:19, 13.92s/it] {'loss': 1.0635, 'learning_rate': 4.906e-05, 'epoch': 0.25} 2%|▏ | 190/10000 [44:44<37:55:19, 13.92s/it] 2%|▏ | 191/10000 [44:58<37:56:01, 13.92s/it] {'loss': 0.8257, 'learning_rate': 4.9055000000000004e-05, 'epoch': 0.25} 2%|▏ | 191/10000 [44:58<37:56:01, 13.92s/it] 2%|▏ | 192/10000 [45:12<37:53:49, 13.91s/it] {'loss': 1.0229, 'learning_rate': 4.905e-05, 'epoch': 0.25} 2%|▏ | 192/10000 [45:12<37:53:49, 13.91s/it] 2%|▏ | 193/10000 [45:26<37:52:30, 13.90s/it] {'loss': 1.0845, 'learning_rate': 4.9045e-05, 'epoch': 0.25} 2%|▏ | 193/10000 [45:26<37:52:30, 13.90s/it] 2%|▏ | 194/10000 [45:40<37:52:13, 13.90s/it] {'loss': 1.3163, 'learning_rate': 4.9040000000000005e-05, 'epoch': 0.25} 2%|▏ | 194/10000 [45:40<37:52:13, 13.90s/it] 2%|▏ | 195/10000 [45:54<37:53:18, 13.91s/it] {'loss': 1.1227, 'learning_rate': 4.9035e-05, 'epoch': 0.26} 2%|▏ | 195/10000 [45:54<37:53:18, 13.91s/it] 2%|▏ | 196/10000 [46:08<38:02:18, 13.97s/it] {'loss': 1.0605, 'learning_rate': 4.903e-05, 'epoch': 0.26} 2%|▏ | 196/10000 [46:08<38:02:18, 13.97s/it] 2%|▏ | 197/10000 [46:22<37:57:24, 13.94s/it] {'loss': 1.1831, 'learning_rate': 4.9025000000000006e-05, 'epoch': 0.26} 2%|▏ | 197/10000 [46:22<37:57:24, 13.94s/it] 2%|▏ | 198/10000 
[46:36<38:04:05, 13.98s/it] {'loss': 0.9807, 'learning_rate': 4.902e-05, 'epoch': 0.26} 2%|▏ | 198/10000 [46:36<38:04:05, 13.98s/it] 2%|▏ | 199/10000 [46:50<38:00:04, 13.96s/it] {'loss': 0.8634, 'learning_rate': 4.9015e-05, 'epoch': 0.26} 2%|▏ | 199/10000 [46:50<38:00:04, 13.96s/it] 2%|▏ | 200/10000 [47:04<37:59:13, 13.95s/it] {'loss': 1.0927, 'learning_rate': 4.901e-05, 'epoch': 0.26} 2%|▏ | 200/10000 [47:04<37:59:13, 13.95s/it] 2%|▏ | 201/10000 [47:18<38:03:36, 13.98s/it] {'loss': 1.1827, 'learning_rate': 4.9005e-05, 'epoch': 0.26} 2%|▏ | 201/10000 [47:18<38:03:36, 13.98s/it] 2%|▏ | 202/10000 [47:32<37:59:48, 13.96s/it] {'loss': 0.9196, 'learning_rate': 4.9e-05, 'epoch': 0.26} 2%|▏ | 202/10000 [47:32<37:59:48, 13.96s/it] 2%|▏ | 203/10000 [47:46<37:54:58, 13.93s/it] {'loss': 0.848, 'learning_rate': 4.8995e-05, 'epoch': 0.27} 2%|▏ | 203/10000 [47:46<37:54:58, 13.93s/it] 2%|▏ | 204/10000 [47:59<37:46:49, 13.88s/it] {'loss': 1.0508, 'learning_rate': 4.8990000000000004e-05, 'epoch': 0.27} 2%|▏ | 204/10000 [47:59<37:46:49, 13.88s/it] 2%|▏ | 205/10000 [48:13<37:54:53, 13.93s/it] {'loss': 0.8686, 'learning_rate': 4.8985000000000006e-05, 'epoch': 0.27} 2%|▏ | 205/10000 [48:13<37:54:53, 13.93s/it] 2%|▏ | 206/10000 [48:27<37:51:38, 13.92s/it] {'loss': 1.1881, 'learning_rate': 4.898e-05, 'epoch': 0.27} 2%|▏ | 206/10000 [48:27<37:51:38, 13.92s/it] 2%|▏ | 207/10000 [48:41<37:46:03, 13.88s/it] {'loss': 0.8658, 'learning_rate': 4.8975000000000005e-05, 'epoch': 0.27} 2%|▏ | 207/10000 [48:41<37:46:03, 13.88s/it] 2%|▏ | 208/10000 [48:55<37:51:52, 13.92s/it] {'loss': 0.9881, 'learning_rate': 4.897000000000001e-05, 'epoch': 0.27} 2%|▏ | 208/10000 [48:55<37:51:52, 13.92s/it] 2%|▏ | 209/10000 [49:09<37:49:51, 13.91s/it] {'loss': 1.0028, 'learning_rate': 4.8965e-05, 'epoch': 0.27} 2%|▏ | 209/10000 [49:09<37:49:51, 13.91s/it] 2%|▏ | 210/10000 [49:23<37:53:26, 13.93s/it] {'loss': 1.0389, 'learning_rate': 4.896e-05, 'epoch': 0.27} 2%|▏ | 210/10000 [49:23<37:53:26, 13.93s/it] 2%|▏ | 
211/10000 [49:37<37:54:20, 13.94s/it] {'loss': 0.9998, 'learning_rate': 4.8955e-05, 'epoch': 0.28} 2%|▏ | 211/10000 [49:37<37:54:20, 13.94s/it] 2%|▏ | 212/10000 [49:51<37:53:17, 13.94s/it] {'loss': 0.8375, 'learning_rate': 4.8950000000000004e-05, 'epoch': 0.28} 2%|▏ | 212/10000 [49:51<37:53:17, 13.94s/it] 2%|▏ | 213/10000 [50:05<37:55:00, 13.95s/it] {'loss': 1.0385, 'learning_rate': 4.8945e-05, 'epoch': 0.28} 2%|▏ | 213/10000 [50:05<37:55:00, 13.95s/it] 2%|▏ | 214/10000 [50:19<37:53:25, 13.94s/it] {'loss': 0.8586, 'learning_rate': 4.894e-05, 'epoch': 0.28} 2%|▏ | 214/10000 [50:19<37:53:25, 13.94s/it] 2%|▏ | 215/10000 [50:33<37:51:45, 13.93s/it] {'loss': 1.0923, 'learning_rate': 4.8935000000000005e-05, 'epoch': 0.28} 2%|▏ | 215/10000 [50:33<37:51:45, 13.93s/it] 2%|▏ | 216/10000 [50:47<37:55:20, 13.95s/it] {'loss': 0.8516, 'learning_rate': 4.893e-05, 'epoch': 0.28} 2%|▏ | 216/10000 [50:47<37:55:20, 13.95s/it] 2%|▏ | 217/10000 [51:01<38:00:44, 13.99s/it] {'loss': 1.2719, 'learning_rate': 4.8925e-05, 'epoch': 0.28} 2%|▏ | 217/10000 [51:01<38:00:44, 13.99s/it] 2%|▏ | 218/10000 [51:15<38:02:43, 14.00s/it] {'loss': 0.8818, 'learning_rate': 4.8920000000000006e-05, 'epoch': 0.29} 2%|▏ | 218/10000 [51:15<38:02:43, 14.00s/it] 2%|▏ | 219/10000 [51:29<37:56:45, 13.97s/it] {'loss': 1.1437, 'learning_rate': 4.8915e-05, 'epoch': 0.29} 2%|▏ | 219/10000 [51:29<37:56:45, 13.97s/it] 2%|▏ | 220/10000 [51:43<37:55:14, 13.96s/it] {'loss': 0.8672, 'learning_rate': 4.891e-05, 'epoch': 0.29} 2%|▏ | 220/10000 [51:43<37:55:14, 13.96s/it] 2%|▏ | 221/10000 [51:56<37:50:04, 13.93s/it] {'loss': 1.1917, 'learning_rate': 4.8905e-05, 'epoch': 0.29} 2%|▏ | 221/10000 [51:57<37:50:04, 13.93s/it] 2%|▏ | 222/10000 [52:10<37:47:40, 13.91s/it] {'loss': 1.0064, 'learning_rate': 4.89e-05, 'epoch': 0.29} 2%|▏ | 222/10000 [52:10<37:47:40, 13.91s/it] 2%|▏ | 223/10000 [52:24<37:48:28, 13.92s/it] {'loss': 1.1262, 'learning_rate': 4.8895e-05, 'epoch': 0.29} 2%|▏ | 223/10000 [52:24<37:48:28, 13.92s/it] 2%|▏ | 
224/10000 [52:38<37:45:04, 13.90s/it] {'loss': 0.9609, 'learning_rate': 4.889e-05, 'epoch': 0.29} 2%|▏ | 224/10000 [52:38<37:45:04, 13.90s/it] 2%|▏ | 225/10000 [52:52<37:50:47, 13.94s/it] {'loss': 0.9387, 'learning_rate': 4.8885000000000004e-05, 'epoch': 0.29} 2%|▏ | 225/10000 [52:52<37:50:47, 13.94s/it] 2%|▏ | 226/10000 [53:06<37:58:32, 13.99s/it] {'loss': 1.4603, 'learning_rate': 4.8880000000000006e-05, 'epoch': 0.3} 2%|▏ | 226/10000 [53:06<37:58:32, 13.99s/it] 2%|▏ | 227/10000 [53:20<37:56:20, 13.98s/it] {'loss': 1.1638, 'learning_rate': 4.8875e-05, 'epoch': 0.3} 2%|▏ | 227/10000 [53:20<37:56:20, 13.98s/it] 2%|▏ | 228/10000 [53:34<37:58:04, 13.99s/it] {'loss': 1.1136, 'learning_rate': 4.8870000000000005e-05, 'epoch': 0.3} 2%|▏ | 228/10000 [53:34<37:58:04, 13.99s/it] 2%|▏ | 229/10000 [53:48<37:51:05, 13.95s/it] {'loss': 1.0265, 'learning_rate': 4.8865e-05, 'epoch': 0.3} 2%|▏ | 229/10000 [53:48<37:51:05, 13.95s/it] 2%|▏ | 230/10000 [54:02<37:48:43, 13.93s/it] {'loss': 0.8488, 'learning_rate': 4.886e-05, 'epoch': 0.3} 2%|▏ | 230/10000 [54:02<37:48:43, 13.93s/it] 2%|▏ | 231/10000 [54:16<37:45:27, 13.91s/it] {'loss': 1.0038, 'learning_rate': 4.8855e-05, 'epoch': 0.3} 2%|▏ | 231/10000 [54:16<37:45:27, 13.91s/it] 2%|▏ | 232/10000 [54:30<37:44:32, 13.91s/it] {'loss': 1.0268, 'learning_rate': 4.885e-05, 'epoch': 0.3} 2%|▏ | 232/10000 [54:30<37:44:32, 13.91s/it] 2%|▏ | 233/10000 [54:44<37:50:08, 13.95s/it] {'loss': 1.226, 'learning_rate': 4.8845000000000004e-05, 'epoch': 0.3} 2%|▏ | 233/10000 [54:44<37:50:08, 13.95s/it] 2%|▏ | 234/10000 [54:58<37:51:53, 13.96s/it] {'loss': 0.9708, 'learning_rate': 4.884e-05, 'epoch': 0.31} 2%|▏ | 234/10000 [54:58<37:51:53, 13.96s/it] 2%|▏ | 235/10000 [55:12<37:44:37, 13.91s/it] {'loss': 0.9059, 'learning_rate': 4.8835e-05, 'epoch': 0.31} 2%|▏ | 235/10000 [55:12<37:44:37, 13.91s/it] 2%|▏ | 236/10000 [55:26<37:44:49, 13.92s/it] {'loss': 1.1241, 'learning_rate': 4.8830000000000005e-05, 'epoch': 0.31} 2%|▏ | 236/10000 [55:26<37:44:49, 
13.92s/it] 2%|▏ | 237/10000 [55:40<37:53:25, 13.97s/it] {'loss': 0.9567, 'learning_rate': 4.8825e-05, 'epoch': 0.31} 2%|▏ | 237/10000 [55:40<37:53:25, 13.97s/it] 2%|▏ | 238/10000 [55:54<37:50:33, 13.96s/it] {'loss': 1.1141, 'learning_rate': 4.8820000000000004e-05, 'epoch': 0.31} 2%|▏ | 238/10000 [55:54<37:50:33, 13.96s/it] 2%|▏ | 239/10000 [56:08<37:55:34, 13.99s/it] {'loss': 1.3872, 'learning_rate': 4.8815e-05, 'epoch': 0.31} 2%|▏ | 239/10000 [56:08<37:55:34, 13.99s/it] 2%|▏ | 240/10000 [56:22<37:52:28, 13.97s/it] {'loss': 0.8183, 'learning_rate': 4.881e-05, 'epoch': 0.31} 2%|▏ | 240/10000 [56:22<37:52:28, 13.97s/it] 2%|▏ | 241/10000 [56:35<37:52:12, 13.97s/it] {'loss': 0.9641, 'learning_rate': 4.8805e-05, 'epoch': 0.32} 2%|▏ | 241/10000 [56:36<37:52:12, 13.97s/it] 2%|▏ | 242/10000 [56:49<37:51:59, 13.97s/it] {'loss': 1.0737, 'learning_rate': 4.88e-05, 'epoch': 0.32} 2%|▏ | 242/10000 [56:50<37:51:59, 13.97s/it] 2%|▏ | 243/10000 [57:03<37:54:35, 13.99s/it] {'loss': 1.071, 'learning_rate': 4.8795e-05, 'epoch': 0.32} 2%|▏ | 243/10000 [57:04<37:54:35, 13.99s/it] 2%|▏ | 244/10000 [57:18<37:57:51, 14.01s/it] {'loss': 0.9886, 'learning_rate': 4.8790000000000006e-05, 'epoch': 0.32} 2%|▏ | 244/10000 [57:18<37:57:51, 14.01s/it] 2%|▏ | 245/10000 [57:31<37:46:18, 13.94s/it] {'loss': 1.3119, 'learning_rate': 4.8785e-05, 'epoch': 0.32} 2%|▏ | 245/10000 [57:31<37:46:18, 13.94s/it] 2%|▏ | 246/10000 [57:45<37:53:30, 13.99s/it] {'loss': 0.9687, 'learning_rate': 4.8780000000000004e-05, 'epoch': 0.32} 2%|▏ | 246/10000 [57:45<37:53:30, 13.99s/it] 2%|▏ | 247/10000 [57:59<37:49:54, 13.96s/it] {'loss': 0.9879, 'learning_rate': 4.8775000000000007e-05, 'epoch': 0.32} 2%|▏ | 247/10000 [57:59<37:49:54, 13.96s/it] 2%|▏ | 248/10000 [58:13<37:52:47, 13.98s/it] {'loss': 0.8308, 'learning_rate': 4.877e-05, 'epoch': 0.32} 2%|▏ | 248/10000 [58:13<37:52:47, 13.98s/it] 2%|▏ | 249/10000 [58:27<37:48:15, 13.96s/it] {'loss': 1.1004, 'learning_rate': 4.8765e-05, 'epoch': 0.33} 2%|▏ | 249/10000 
[58:27<37:48:15, 13.96s/it] 2%|▎ | 250/10000 [58:41<37:48:12, 13.96s/it] {'loss': 0.951, 'learning_rate': 4.876e-05, 'epoch': 0.33} 2%|▎ | 250/10000 [58:41<37:48:12, 13.96s/it] 3%|▎ | 251/10000 [58:55<37:50:22, 13.97s/it] {'loss': 0.9113, 'learning_rate': 4.8755e-05, 'epoch': 0.33} 3%|▎ | 251/10000 [58:55<37:50:22, 13.97s/it] 3%|▎ | 252/10000 [59:09<37:50:30, 13.98s/it] {'loss': 0.8915, 'learning_rate': 4.875e-05, 'epoch': 0.33} 3%|▎ | 252/10000 [59:09<37:50:30, 13.98s/it] 3%|▎ | 253/10000 [59:23<37:53:57, 14.00s/it] {'loss': 1.001, 'learning_rate': 4.8745e-05, 'epoch': 0.33} 3%|▎ | 253/10000 [59:23<37:53:57, 14.00s/it] 3%|▎ | 254/10000 [59:37<37:47:23, 13.96s/it] {'loss': 1.1171, 'learning_rate': 4.8740000000000004e-05, 'epoch': 0.33} 3%|▎ | 254/10000 [59:37<37:47:23, 13.96s/it] 3%|▎ | 255/10000 [59:51<37:44:16, 13.94s/it] {'loss': 0.9136, 'learning_rate': 4.8735e-05, 'epoch': 0.33} 3%|▎ | 255/10000 [59:51<37:44:16, 13.94s/it] 3%|▎ | 256/10000 [1:00:05<37:40:10, 13.92s/it] {'loss': 1.165, 'learning_rate': 4.873e-05, 'epoch': 0.34} 3%|▎ | 256/10000 [1:00:05<37:40:10, 13.92s/it] 3%|▎ | 257/10000 [1:00:19<37:39:34, 13.92s/it] {'loss': 1.0931, 'learning_rate': 4.8725000000000005e-05, 'epoch': 0.34} 3%|▎ | 257/10000 [1:00:19<37:39:34, 13.92s/it] 3%|▎ | 258/10000 [1:00:33<37:40:26, 13.92s/it] {'loss': 0.8351, 'learning_rate': 4.872000000000001e-05, 'epoch': 0.34} 3%|▎ | 258/10000 [1:00:33<37:40:26, 13.92s/it] 3%|▎ | 259/10000 [1:00:47<37:39:55, 13.92s/it] {'loss': 0.8134, 'learning_rate': 4.8715000000000004e-05, 'epoch': 0.34} 3%|▎ | 259/10000 [1:00:47<37:39:55, 13.92s/it] 3%|▎ | 260/10000 [1:01:01<37:47:36, 13.97s/it] {'loss': 1.2317, 'learning_rate': 4.871e-05, 'epoch': 0.34} 3%|▎ | 260/10000 [1:01:01<37:47:36, 13.97s/it] 3%|▎ | 261/10000 [1:01:15<37:46:02, 13.96s/it] {'loss': 0.9612, 'learning_rate': 4.8705e-05, 'epoch': 0.34} 3%|▎ | 261/10000 [1:01:15<37:46:02, 13.96s/it] 3%|▎ | 262/10000 [1:01:29<37:42:22, 13.94s/it] {'loss': 0.845, 'learning_rate': 4.87e-05, 
'epoch': 0.34} 3%|▎ | 262/10000 [1:01:29<37:42:22, 13.94s/it] 3%|▎ | 263/10000 [1:01:42<37:39:41, 13.92s/it] {'loss': 0.9568, 'learning_rate': 4.8695e-05, 'epoch': 0.34} 3%|▎ | 263/10000 [1:01:43<37:39:41, 13.92s/it] 3%|▎ | 264/10000 [1:01:57<37:45:35, 13.96s/it] {'loss': 1.0104, 'learning_rate': 4.869e-05, 'epoch': 0.35} 3%|▎ | 264/10000 [1:01:57<37:45:35, 13.96s/it] 3%|▎ | 265/10000 [1:02:10<37:40:46, 13.93s/it] {'loss': 0.8468, 'learning_rate': 4.8685000000000006e-05, 'epoch': 0.35} 3%|▎ | 265/10000 [1:02:10<37:40:46, 13.93s/it] 3%|▎ | 266/10000 [1:02:24<37:39:54, 13.93s/it] {'loss': 0.9904, 'learning_rate': 4.868e-05, 'epoch': 0.35} 3%|▎ | 266/10000 [1:02:24<37:39:54, 13.93s/it] 3%|▎ | 267/10000 [1:02:38<37:36:05, 13.91s/it] {'loss': 0.9057, 'learning_rate': 4.8675000000000004e-05, 'epoch': 0.35} 3%|▎ | 267/10000 [1:02:38<37:36:05, 13.91s/it] 3%|▎ | 268/10000 [1:02:52<37:35:44, 13.91s/it] {'loss': 1.0353, 'learning_rate': 4.867000000000001e-05, 'epoch': 0.35} 3%|▎ | 268/10000 [1:02:52<37:35:44, 13.91s/it] 3%|▎ | 269/10000 [1:03:06<37:42:56, 13.95s/it] {'loss': 1.161, 'learning_rate': 4.8665e-05, 'epoch': 0.35} 3%|▎ | 269/10000 [1:03:06<37:42:56, 13.95s/it] 3%|▎ | 270/10000 [1:03:20<37:45:13, 13.97s/it] {'loss': 1.0005, 'learning_rate': 4.866e-05, 'epoch': 0.35} 3%|▎ | 270/10000 [1:03:20<37:45:13, 13.97s/it] 3%|▎ | 271/10000 [1:03:34<37:50:19, 14.00s/it] {'loss': 1.1971, 'learning_rate': 4.8655e-05, 'epoch': 0.35} 3%|▎ | 271/10000 [1:03:34<37:50:19, 14.00s/it] 3%|▎ | 272/10000 [1:03:48<37:44:01, 13.96s/it] {'loss': 0.9908, 'learning_rate': 4.8650000000000003e-05, 'epoch': 0.36} 3%|▎ | 272/10000 [1:03:48<37:44:01, 13.96s/it] 3%|▎ | 273/10000 [1:04:02<37:42:51, 13.96s/it] {'loss': 1.1434, 'learning_rate': 4.8645e-05, 'epoch': 0.36} 3%|▎ | 273/10000 [1:04:02<37:42:51, 13.96s/it] 3%|▎ | 274/10000 [1:04:16<37:42:30, 13.96s/it] {'loss': 0.9481, 'learning_rate': 4.864e-05, 'epoch': 0.36} 3%|▎ | 274/10000 [1:04:16<37:42:30, 13.96s/it] 3%|▎ | 275/10000 [1:04:30<37:47:03, 
13.99s/it] {'loss': 0.9522, 'learning_rate': 4.8635000000000004e-05, 'epoch': 0.36} 3%|▎ | 275/10000 [1:04:30<37:47:03, 13.99s/it] 3%|▎ | 276/10000 [1:04:44<37:50:12, 14.01s/it] {'loss': 0.9144, 'learning_rate': 4.863e-05, 'epoch': 0.36} 3%|▎ | 276/10000 [1:04:44<37:50:12, 14.01s/it] 3%|▎ | 277/10000 [1:04:58<37:45:28, 13.98s/it] {'loss': 0.9243, 'learning_rate': 4.8625e-05, 'epoch': 0.36} 3%|▎ | 277/10000 [1:04:58<37:45:28, 13.98s/it] 3%|▎ | 278/10000 [1:05:12<37:50:31, 14.01s/it] {'loss': 0.8601, 'learning_rate': 4.8620000000000005e-05, 'epoch': 0.36} 3%|▎ | 278/10000 [1:05:12<37:50:31, 14.01s/it] 3%|▎ | 279/10000 [1:05:26<37:42:36, 13.97s/it] {'loss': 1.0374, 'learning_rate': 4.861500000000001e-05, 'epoch': 0.37} 3%|▎ | 279/10000 [1:05:26<37:42:36, 13.97s/it] 3%|▎ | 280/10000 [1:05:40<37:43:52, 13.97s/it] {'loss': 0.8276, 'learning_rate': 4.861e-05, 'epoch': 0.37} 3%|▎ | 280/10000 [1:05:40<37:43:52, 13.97s/it] 3%|▎ | 281/10000 [1:05:54<37:44:08, 13.98s/it] {'loss': 0.8931, 'learning_rate': 4.8605e-05, 'epoch': 0.37} 3%|▎ | 281/10000 [1:05:54<37:44:08, 13.98s/it] 3%|▎ | 282/10000 [1:06:08<37:35:16, 13.92s/it] {'loss': 1.0224, 'learning_rate': 4.86e-05, 'epoch': 0.37} 3%|▎ | 282/10000 [1:06:08<37:35:16, 13.92s/it] 3%|▎ | 283/10000 [1:06:22<37:37:04, 13.94s/it] {'loss': 0.7966, 'learning_rate': 4.8595000000000005e-05, 'epoch': 0.37} 3%|▎ | 283/10000 [1:06:22<37:37:04, 13.94s/it] 3%|▎ | 284/10000 [1:06:35<37:29:02, 13.89s/it] {'loss': 0.9948, 'learning_rate': 4.859e-05, 'epoch': 0.37} 3%|▎ | 284/10000 [1:06:36<37:29:02, 13.89s/it] 3%|▎ | 285/10000 [1:06:50<37:35:50, 13.93s/it] {'loss': 0.9496, 'learning_rate': 4.8585e-05, 'epoch': 0.37} 3%|▎ | 285/10000 [1:06:50<37:35:50, 13.93s/it] 3%|▎ | 286/10000 [1:07:03<37:29:33, 13.89s/it] {'loss': 0.9637, 'learning_rate': 4.8580000000000006e-05, 'epoch': 0.37} 3%|▎ | 286/10000 [1:07:03<37:29:33, 13.89s/it] 3%|▎ | 287/10000 [1:07:17<37:34:51, 13.93s/it] {'loss': 1.0753, 'learning_rate': 4.8575e-05, 'epoch': 0.38} 3%|▎ | 
287/10000 [1:07:17<37:34:51, 13.93s/it] 3%|▎ | 288/10000 [1:07:31<37:36:42, 13.94s/it] {'loss': 1.136, 'learning_rate': 4.8570000000000004e-05, 'epoch': 0.38} 3%|▎ | 288/10000 [1:07:31<37:36:42, 13.94s/it] 3%|▎ | 289/10000 [1:07:45<37:37:10, 13.95s/it] {'loss': 0.9623, 'learning_rate': 4.856500000000001e-05, 'epoch': 0.38} 3%|▎ | 289/10000 [1:07:45<37:37:10, 13.95s/it] 3%|▎ | 290/10000 [1:07:59<37:37:48, 13.95s/it] {'loss': 0.9395, 'learning_rate': 4.856e-05, 'epoch': 0.38} 3%|▎ | 290/10000 [1:07:59<37:37:48, 13.95s/it] 3%|▎ | 291/10000 [1:08:13<37:39:56, 13.97s/it] {'loss': 1.0259, 'learning_rate': 4.8555e-05, 'epoch': 0.38} 3%|▎ | 291/10000 [1:08:13<37:39:56, 13.97s/it] 3%|▎ | 292/10000 [1:08:27<37:36:06, 13.94s/it] {'loss': 1.0602, 'learning_rate': 4.855e-05, 'epoch': 0.38} 3%|▎ | 292/10000 [1:08:27<37:36:06, 13.94s/it] 3%|▎ | 293/10000 [1:08:41<37:34:53, 13.94s/it] {'loss': 0.8122, 'learning_rate': 4.8545000000000004e-05, 'epoch': 0.38} 3%|▎ | 293/10000 [1:08:41<37:34:53, 13.94s/it] 3%|▎ | 294/10000 [1:08:55<37:28:45, 13.90s/it] {'loss': 0.9963, 'learning_rate': 4.854e-05, 'epoch': 0.38} 3%|▎ | 294/10000 [1:08:55<37:28:45, 13.90s/it] 3%|▎ | 295/10000 [1:09:09<37:28:52, 13.90s/it] {'loss': 1.0787, 'learning_rate': 4.8535e-05, 'epoch': 0.39} 3%|▎ | 295/10000 [1:09:09<37:28:52, 13.90s/it] 3%|▎ | 296/10000 [1:09:23<37:26:54, 13.89s/it] {'loss': 1.0264, 'learning_rate': 4.8530000000000005e-05, 'epoch': 0.39} 3%|▎ | 296/10000 [1:09:23<37:26:54, 13.89s/it] 3%|▎ | 297/10000 [1:09:36<37:25:57, 13.89s/it] {'loss': 0.8854, 'learning_rate': 4.8525e-05, 'epoch': 0.39} 3%|▎ | 297/10000 [1:09:37<37:25:57, 13.89s/it] 3%|▎ | 298/10000 [1:09:50<37:22:35, 13.87s/it] {'loss': 1.0472, 'learning_rate': 4.852e-05, 'epoch': 0.39} 3%|▎ | 298/10000 [1:09:50<37:22:35, 13.87s/it] 3%|▎ | 299/10000 [1:10:04<37:30:53, 13.92s/it] {'loss': 0.8604, 'learning_rate': 4.8515000000000006e-05, 'epoch': 0.39} 3%|▎ | 299/10000 [1:10:04<37:30:53, 13.92s/it] 3%|▎ | 300/10000 [1:10:18<37:32:10, 
13.93s/it] {'loss': 1.2423, 'learning_rate': 4.851e-05, 'epoch': 0.39} 3%|▎ | 300/10000 [1:10:18<37:32:10, 13.93s/it] 3%|▎ | 301/10000 [1:10:32<37:32:26, 13.93s/it] {'loss': 0.8124, 'learning_rate': 4.8505e-05, 'epoch': 0.39} 3%|▎ | 301/10000 [1:10:32<37:32:26, 13.93s/it] 3%|▎ | 302/10000 [1:10:46<37:41:07, 13.99s/it] {'loss': 0.8976, 'learning_rate': 4.85e-05, 'epoch': 0.4} 3%|▎ | 302/10000 [1:10:46<37:41:07, 13.99s/it] 3%|▎ | 303/10000 [1:11:00<37:33:46, 13.95s/it] {'loss': 0.9854, 'learning_rate': 4.8495e-05, 'epoch': 0.4} 3%|▎ | 303/10000 [1:11:00<37:33:46, 13.95s/it] 3%|▎ | 304/10000 [1:11:14<37:37:26, 13.97s/it] {'loss': 1.1059, 'learning_rate': 4.8490000000000005e-05, 'epoch': 0.4} 3%|▎ | 304/10000 [1:11:14<37:37:26, 13.97s/it] 3%|▎ | 305/10000 [1:11:28<37:34:31, 13.95s/it] {'loss': 0.9352, 'learning_rate': 4.8485e-05, 'epoch': 0.4} 3%|▎ | 305/10000 [1:11:28<37:34:31, 13.95s/it] 3%|▎ | 306/10000 [1:11:42<37:33:12, 13.95s/it] {'loss': 0.8203, 'learning_rate': 4.8480000000000003e-05, 'epoch': 0.4} 3%|▎ | 306/10000 [1:11:42<37:33:12, 13.95s/it] 3%|▎ | 307/10000 [1:11:56<37:31:42, 13.94s/it] {'loss': 0.8752, 'learning_rate': 4.8475000000000006e-05, 'epoch': 0.4} 3%|▎ | 307/10000 [1:11:56<37:31:42, 13.94s/it] 3%|▎ | 308/10000 [1:12:10<37:33:14, 13.95s/it] {'loss': 0.8176, 'learning_rate': 4.847e-05, 'epoch': 0.4} 3%|▎ | 308/10000 [1:12:10<37:33:14, 13.95s/it] 3%|▎ | 309/10000 [1:12:24<37:34:40, 13.96s/it] {'loss': 1.2748, 'learning_rate': 4.8465000000000004e-05, 'epoch': 0.4} 3%|▎ | 309/10000 [1:12:24<37:34:40, 13.96s/it] 3%|▎ | 310/10000 [1:12:38<37:35:49, 13.97s/it] {'loss': 0.8861, 'learning_rate': 4.846e-05, 'epoch': 0.41} 3%|▎ | 310/10000 [1:12:38<37:35:49, 13.97s/it] 3%|▎ | 311/10000 [1:12:52<37:35:49, 13.97s/it] {'loss': 0.8223, 'learning_rate': 4.8455e-05, 'epoch': 0.41} 3%|▎ | 311/10000 [1:12:52<37:35:49, 13.97s/it] 3%|▎ | 312/10000 [1:13:06<37:31:12, 13.94s/it] {'loss': 1.0336, 'learning_rate': 4.845e-05, 'epoch': 0.41} 3%|▎ | 312/10000 
[1:13:06<37:31:12, 13.94s/it] 3%|▎ | 313/10000 [1:13:20<37:31:42, 13.95s/it] {'loss': 1.379, 'learning_rate': 4.8445e-05, 'epoch': 0.41} 3%|▎ | 313/10000 [1:13:20<37:31:42, 13.95s/it] 3%|▎ | 314/10000 [1:13:34<37:29:07, 13.93s/it] {'loss': 1.0829, 'learning_rate': 4.8440000000000004e-05, 'epoch': 0.41} 3%|▎ | 314/10000 [1:13:34<37:29:07, 13.93s/it] 3%|▎ | 315/10000 [1:13:48<37:28:19, 13.93s/it] {'loss': 1.0375, 'learning_rate': 4.8435e-05, 'epoch': 0.41} 3%|▎ | 315/10000 [1:13:48<37:28:19, 13.93s/it] 3%|▎ | 316/10000 [1:14:02<37:27:37, 13.93s/it] {'loss': 1.2387, 'learning_rate': 4.843e-05, 'epoch': 0.41} 3%|▎ | 316/10000 [1:14:02<37:27:37, 13.93s/it] 3%|▎ | 317/10000 [1:14:15<37:30:27, 13.94s/it] {'loss': 1.1049, 'learning_rate': 4.8425000000000005e-05, 'epoch': 0.41} 3%|▎ | 317/10000 [1:14:16<37:30:27, 13.94s/it] 3%|▎ | 318/10000 [1:14:29<37:33:17, 13.96s/it] {'loss': 1.0294, 'learning_rate': 4.842000000000001e-05, 'epoch': 0.42} 3%|▎ | 318/10000 [1:14:30<37:33:17, 13.96s/it] 3%|▎ | 319/10000 [1:14:43<37:28:27, 13.94s/it] {'loss': 1.0098, 'learning_rate': 4.8415e-05, 'epoch': 0.42} 3%|▎ | 319/10000 [1:14:43<37:28:27, 13.94s/it] 3%|▎ | 320/10000 [1:14:58<37:40:02, 14.01s/it] {'loss': 1.3153, 'learning_rate': 4.841e-05, 'epoch': 0.42} 3%|▎ | 320/10000 [1:14:58<37:40:02, 14.01s/it] 3%|▎ | 321/10000 [1:15:11<37:28:56, 13.94s/it] {'loss': 1.1627, 'learning_rate': 4.8405e-05, 'epoch': 0.42} 3%|▎ | 321/10000 [1:15:11<37:28:56, 13.94s/it] 3%|▎ | 322/10000 [1:15:25<37:29:16, 13.94s/it] {'loss': 0.9937, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.42} 3%|▎ | 322/10000 [1:15:25<37:29:16, 13.94s/it] 3%|▎ | 323/10000 [1:15:39<37:28:24, 13.94s/it] {'loss': 0.9938, 'learning_rate': 4.8395e-05, 'epoch': 0.42} 3%|▎ | 323/10000 [1:15:39<37:28:24, 13.94s/it] 3%|▎ | 324/10000 [1:15:53<37:24:18, 13.92s/it] {'loss': 0.8659, 'learning_rate': 4.839e-05, 'epoch': 0.42} 3%|▎ | 324/10000 [1:15:53<37:24:18, 13.92s/it] 3%|▎ | 325/10000 [1:16:07<37:23:58, 13.92s/it] {'loss': 0.8741, 
'learning_rate': 4.8385000000000005e-05, 'epoch': 0.43} 3%|▎ | 325/10000 [1:16:07<37:23:58, 13.92s/it] 3%|▎ | 326/10000 [1:16:21<37:21:16, 13.90s/it] {'loss': 0.9767, 'learning_rate': 4.838e-05, 'epoch': 0.43} 3%|▎ | 326/10000 [1:16:21<37:21:16, 13.90s/it] 3%|▎ | 327/10000 [1:16:35<37:24:26, 13.92s/it] {'loss': 0.9759, 'learning_rate': 4.8375000000000004e-05, 'epoch': 0.43} 3%|▎ | 327/10000 [1:16:35<37:24:26, 13.92s/it] 3%|▎ | 328/10000 [1:16:49<37:25:39, 13.93s/it] {'loss': 0.8588, 'learning_rate': 4.8370000000000006e-05, 'epoch': 0.43} 3%|▎ | 328/10000 [1:16:49<37:25:39, 13.93s/it] 3%|▎ | 329/10000 [1:17:03<37:29:31, 13.96s/it] {'loss': 0.9538, 'learning_rate': 4.8365e-05, 'epoch': 0.43} 3%|▎ | 329/10000 [1:17:03<37:29:31, 13.96s/it] 3%|▎ | 330/10000 [1:17:17<37:30:19, 13.96s/it] {'loss': 0.841, 'learning_rate': 4.836e-05, 'epoch': 0.43} 3%|▎ | 330/10000 [1:17:17<37:30:19, 13.96s/it] 3%|▎ | 331/10000 [1:17:31<37:28:45, 13.95s/it] {'loss': 1.1929, 'learning_rate': 4.8355e-05, 'epoch': 0.43} 3%|▎ | 331/10000 [1:17:31<37:28:45, 13.95s/it] 3%|▎ | 332/10000 [1:17:45<37:29:09, 13.96s/it] {'loss': 0.897, 'learning_rate': 4.835e-05, 'epoch': 0.43} 3%|▎ | 332/10000 [1:17:45<37:29:09, 13.96s/it] 3%|▎ | 333/10000 [1:17:58<37:21:39, 13.91s/it] {'loss': 0.9758, 'learning_rate': 4.8345e-05, 'epoch': 0.44} 3%|▎ | 333/10000 [1:17:59<37:21:39, 13.91s/it] 3%|▎ | 334/10000 [1:18:12<37:20:16, 13.91s/it] {'loss': 0.9275, 'learning_rate': 4.834e-05, 'epoch': 0.44} 3%|▎ | 334/10000 [1:18:12<37:20:16, 13.91s/it] 3%|▎ | 335/10000 [1:18:26<37:18:54, 13.90s/it] {'loss': 1.2234, 'learning_rate': 4.8335000000000004e-05, 'epoch': 0.44} 3%|▎ | 335/10000 [1:18:26<37:18:54, 13.90s/it] 3%|▎ | 336/10000 [1:18:40<37:25:40, 13.94s/it] {'loss': 0.8679, 'learning_rate': 4.833e-05, 'epoch': 0.44} 3%|▎ | 336/10000 [1:18:40<37:25:40, 13.94s/it] 3%|▎ | 337/10000 [1:18:54<37:30:55, 13.98s/it] {'loss': 1.0121, 'learning_rate': 4.8325e-05, 'epoch': 0.44} 3%|▎ | 337/10000 [1:18:54<37:30:55, 13.98s/it] 3%|▎ | 
338/10000 [1:19:08<37:25:26, 13.94s/it] {'loss': 0.8055, 'learning_rate': 4.8320000000000005e-05, 'epoch': 0.44} 3%|▎ | 338/10000 [1:19:08<37:25:26, 13.94s/it] 3%|▎ | 339/10000 [1:19:22<37:31:28, 13.98s/it] {'loss': 0.9001, 'learning_rate': 4.831500000000001e-05, 'epoch': 0.44} 3%|▎ | 339/10000 [1:19:22<37:31:28, 13.98s/it] 3%|▎ | 340/10000 [1:19:36<37:25:57, 13.95s/it] {'loss': 1.0467, 'learning_rate': 4.8309999999999997e-05, 'epoch': 0.45} 3%|▎ | 340/10000 [1:19:36<37:25:57, 13.95s/it] 3%|▎ | 341/10000 [1:19:50<37:20:15, 13.92s/it] {'loss': 1.0771, 'learning_rate': 4.8305e-05, 'epoch': 0.45} 3%|▎ | 341/10000 [1:19:50<37:20:15, 13.92s/it] 3%|▎ | 342/10000 [1:20:04<37:20:02, 13.92s/it] {'loss': 1.0733, 'learning_rate': 4.83e-05, 'epoch': 0.45} 3%|▎ | 342/10000 [1:20:04<37:20:02, 13.92s/it] 3%|▎ | 343/10000 [1:20:18<37:21:49, 13.93s/it] {'loss': 0.8405, 'learning_rate': 4.8295000000000004e-05, 'epoch': 0.45} 3%|▎ | 343/10000 [1:20:18<37:21:49, 13.93s/it] 3%|▎ | 344/10000 [1:20:32<37:22:36, 13.94s/it] {'loss': 0.9773, 'learning_rate': 4.829e-05, 'epoch': 0.45} 3%|▎ | 344/10000 [1:20:32<37:22:36, 13.94s/it] 3%|▎ | 345/10000 [1:20:46<37:15:16, 13.89s/it] {'loss': 1.2372, 'learning_rate': 4.8285e-05, 'epoch': 0.45} 3%|▎ | 345/10000 [1:20:46<37:15:16, 13.89s/it] 3%|▎ | 346/10000 [1:20:59<37:13:44, 13.88s/it] {'loss': 0.9112, 'learning_rate': 4.8280000000000005e-05, 'epoch': 0.45} 3%|▎ | 346/10000 [1:21:00<37:13:44, 13.88s/it] 3%|▎ | 347/10000 [1:21:13<37:19:54, 13.92s/it] {'loss': 0.8111, 'learning_rate': 4.8275e-05, 'epoch': 0.45} 3%|▎ | 347/10000 [1:21:14<37:19:54, 13.92s/it] 3%|▎ | 348/10000 [1:21:27<37:17:50, 13.91s/it] {'loss': 1.0308, 'learning_rate': 4.8270000000000004e-05, 'epoch': 0.46} 3%|▎ | 348/10000 [1:21:27<37:17:50, 13.91s/it] 3%|▎ | 349/10000 [1:21:41<37:20:50, 13.93s/it] {'loss': 0.8704, 'learning_rate': 4.8265000000000006e-05, 'epoch': 0.46} 3%|▎ | 349/10000 [1:21:41<37:20:50, 13.93s/it] 4%|▎ | 350/10000 [1:21:55<37:15:31, 13.90s/it] {'loss': 1.2091, 
'learning_rate': 4.826e-05, 'epoch': 0.46} 4%|▎ | 350/10000 [1:21:55<37:15:31, 13.90s/it] 4%|▎ | 351/10000 [1:22:09<37:19:29, 13.93s/it] {'loss': 1.1954, 'learning_rate': 4.8255e-05, 'epoch': 0.46} 4%|▎ | 351/10000 [1:22:09<37:19:29, 13.93s/it] 4%|▎ | 352/10000 [1:22:23<37:23:53, 13.95s/it] {'loss': 0.91, 'learning_rate': 4.825e-05, 'epoch': 0.46} 4%|▎ | 352/10000 [1:22:23<37:23:53, 13.95s/it] 4%|▎ | 353/10000 [1:22:37<37:25:00, 13.96s/it] {'loss': 0.9495, 'learning_rate': 4.8245e-05, 'epoch': 0.46} 4%|▎ | 353/10000 [1:22:37<37:25:00, 13.96s/it] 4%|▎ | 354/10000 [1:22:51<37:16:22, 13.91s/it] {'loss': 0.912, 'learning_rate': 4.824e-05, 'epoch': 0.46} 4%|▎ | 354/10000 [1:22:51<37:16:22, 13.91s/it] 4%|▎ | 355/10000 [1:23:05<37:19:07, 13.93s/it] {'loss': 1.0716, 'learning_rate': 4.8235e-05, 'epoch': 0.46} 4%|▎ | 355/10000 [1:23:05<37:19:07, 13.93s/it] 4%|▎ | 356/10000 [1:23:19<37:20:26, 13.94s/it] {'loss': 0.9116, 'learning_rate': 4.8230000000000004e-05, 'epoch': 0.47} 4%|▎ | 356/10000 [1:23:19<37:20:26, 13.94s/it] 4%|▎ | 357/10000 [1:23:33<37:25:46, 13.97s/it] {'loss': 0.9479, 'learning_rate': 4.822500000000001e-05, 'epoch': 0.47} 4%|▎ | 357/10000 [1:23:33<37:25:46, 13.97s/it] 4%|▎ | 358/10000 [1:23:47<37:24:11, 13.97s/it] {'loss': 1.0415, 'learning_rate': 4.822e-05, 'epoch': 0.47} 4%|▎ | 358/10000 [1:23:47<37:24:11, 13.97s/it] 4%|▎ | 359/10000 [1:24:01<37:30:00, 14.00s/it] {'loss': 0.9703, 'learning_rate': 4.8215000000000005e-05, 'epoch': 0.47} 4%|▎ | 359/10000 [1:24:01<37:30:00, 14.00s/it] 4%|▎ | 360/10000 [1:24:15<37:32:50, 14.02s/it] {'loss': 0.8931, 'learning_rate': 4.821e-05, 'epoch': 0.47} 4%|▎ | 360/10000 [1:24:15<37:32:50, 14.02s/it] 4%|▎ | 361/10000 [1:24:29<37:36:38, 14.05s/it] {'loss': 1.025, 'learning_rate': 4.8205000000000003e-05, 'epoch': 0.47} 4%|▎ | 361/10000 [1:24:29<37:36:38, 14.05s/it] 4%|▎ | 362/10000 [1:24:43<37:29:09, 14.00s/it] {'loss': 0.8893, 'learning_rate': 4.82e-05, 'epoch': 0.47} 4%|▎ | 362/10000 [1:24:43<37:29:09, 14.00s/it] 4%|▎ | 
363/10000 [1:24:57<37:27:21, 13.99s/it] {'loss': 0.9929, 'learning_rate': 4.8195e-05, 'epoch': 0.48} 4%|▎ | 363/10000 [1:24:57<37:27:21, 13.99s/it] 4%|▎ | 364/10000 [1:25:11<37:22:34, 13.96s/it] {'loss': 1.2372, 'learning_rate': 4.8190000000000004e-05, 'epoch': 0.48} 4%|▎ | 364/10000 [1:25:11<37:22:34, 13.96s/it] 4%|▎ | 365/10000 [1:25:25<37:22:21, 13.96s/it] {'loss': 0.876, 'learning_rate': 4.8185e-05, 'epoch': 0.48} 4%|▎ | 365/10000 [1:25:25<37:22:21, 13.96s/it] 4%|▎ | 366/10000 [1:25:39<37:19:27, 13.95s/it] {'loss': 0.9533, 'learning_rate': 4.818e-05, 'epoch': 0.48} 4%|▎ | 366/10000 [1:25:39<37:19:27, 13.95s/it] 4%|▎ | 367/10000 [1:25:53<37:19:51, 13.95s/it] {'loss': 0.7874, 'learning_rate': 4.8175000000000005e-05, 'epoch': 0.48} 4%|▎ | 367/10000 [1:25:53<37:19:51, 13.95s/it] 4%|▎ | 368/10000 [1:26:07<37:19:04, 13.95s/it] {'loss': 0.9386, 'learning_rate': 4.817e-05, 'epoch': 0.48} 4%|▎ | 368/10000 [1:26:07<37:19:04, 13.95s/it] 4%|▎ | 369/10000 [1:26:21<37:18:28, 13.95s/it] {'loss': 0.975, 'learning_rate': 4.8165000000000004e-05, 'epoch': 0.48} 4%|▎ | 369/10000 [1:26:21<37:18:28, 13.95s/it] 4%|▎ | 370/10000 [1:26:35<37:18:01, 13.94s/it] {'loss': 1.004, 'learning_rate': 4.816e-05, 'epoch': 0.48} 4%|▎ | 370/10000 [1:26:35<37:18:01, 13.94s/it] 4%|▎ | 371/10000 [1:26:48<37:15:36, 13.93s/it] {'loss': 0.9017, 'learning_rate': 4.8155e-05, 'epoch': 0.49} 4%|▎ | 371/10000 [1:26:49<37:15:36, 13.93s/it] 4%|▎ | 372/10000 [1:27:02<37:14:44, 13.93s/it] {'loss': 0.9697, 'learning_rate': 4.815e-05, 'epoch': 0.49} 4%|▎ | 372/10000 [1:27:02<37:14:44, 13.93s/it] 4%|▎ | 373/10000 [1:27:16<37:14:37, 13.93s/it] {'loss': 1.0979, 'learning_rate': 4.8145e-05, 'epoch': 0.49} 4%|▎ | 373/10000 [1:27:16<37:14:37, 13.93s/it] 4%|▎ | 374/10000 [1:27:30<37:16:22, 13.94s/it] {'loss': 1.0298, 'learning_rate': 4.814e-05, 'epoch': 0.49} 4%|▎ | 374/10000 [1:27:30<37:16:22, 13.94s/it] 4%|▍ | 375/10000 [1:27:44<37:10:23, 13.90s/it] {'loss': 0.9384, 'learning_rate': 4.8135e-05, 'epoch': 0.49} 4%|▍ | 
375/10000 [1:27:44<37:10:23, 13.90s/it] 4%|▍ | 376/10000 [1:27:58<37:19:18, 13.96s/it] {'loss': 0.9761, 'learning_rate': 4.813e-05, 'epoch': 0.49} 4%|▍ | 376/10000 [1:27:58<37:19:18, 13.96s/it] 4%|▍ | 377/10000 [1:28:12<37:20:09, 13.97s/it] {'loss': 1.0137, 'learning_rate': 4.8125000000000004e-05, 'epoch': 0.49} 4%|▍ | 377/10000 [1:28:12<37:20:09, 13.97s/it] 4%|▍ | 378/10000 [1:28:26<37:21:54, 13.98s/it] {'loss': 0.9922, 'learning_rate': 4.812000000000001e-05, 'epoch': 0.49} 4%|▍ | 378/10000 [1:28:26<37:21:54, 13.98s/it] 4%|▍ | 379/10000 [1:28:40<37:14:59, 13.94s/it] {'loss': 0.9694, 'learning_rate': 4.8115e-05, 'epoch': 0.5} 4%|▍ | 379/10000 [1:28:40<37:14:59, 13.94s/it] 4%|▍ | 380/10000 [1:28:54<37:12:52, 13.93s/it] {'loss': 1.3526, 'learning_rate': 4.8110000000000005e-05, 'epoch': 0.5} 4%|▍ | 380/10000 [1:28:54<37:12:52, 13.93s/it] 4%|▍ | 381/10000 [1:29:08<37:12:15, 13.92s/it] {'loss': 0.8072, 'learning_rate': 4.8105e-05, 'epoch': 0.5} 4%|▍ | 381/10000 [1:29:08<37:12:15, 13.92s/it] 4%|▍ | 382/10000 [1:29:22<37:10:37, 13.92s/it] {'loss': 1.2042, 'learning_rate': 4.8100000000000004e-05, 'epoch': 0.5} 4%|▍ | 382/10000 [1:29:22<37:10:37, 13.92s/it] 4%|▍ | 383/10000 [1:29:36<37:10:59, 13.92s/it] {'loss': 0.9266, 'learning_rate': 4.8095e-05, 'epoch': 0.5} 4%|▍ | 383/10000 [1:29:36<37:10:59, 13.92s/it] 4%|▍ | 384/10000 [1:29:50<37:10:44, 13.92s/it] {'loss': 0.8513, 'learning_rate': 4.809e-05, 'epoch': 0.5} 4%|▍ | 384/10000 [1:29:50<37:10:44, 13.92s/it] 4%|▍ | 385/10000 [1:30:03<37:05:11, 13.89s/it] {'loss': 1.0164, 'learning_rate': 4.8085000000000005e-05, 'epoch': 0.5} 4%|▍ | 385/10000 [1:30:03<37:05:11, 13.89s/it] 4%|▍ | 386/10000 [1:30:17<37:10:52, 13.92s/it] {'loss': 1.1594, 'learning_rate': 4.808e-05, 'epoch': 0.51} 4%|▍ | 386/10000 [1:30:17<37:10:52, 13.92s/it] 4%|▍ | 387/10000 [1:30:31<37:10:51, 13.92s/it] {'loss': 0.9869, 'learning_rate': 4.8075e-05, 'epoch': 0.51} 4%|▍ | 387/10000 [1:30:31<37:10:51, 13.92s/it] 4%|▍ | 388/10000 [1:30:45<37:11:46, 13.93s/it] 
{'loss': 0.9873, 'learning_rate': 4.8070000000000006e-05, 'epoch': 0.51} 4%|▍ | 388/10000 [1:30:45<37:11:46, 13.93s/it] 4%|▍ | 389/10000 [1:30:59<37:11:09, 13.93s/it] {'loss': 1.0241, 'learning_rate': 4.8065e-05, 'epoch': 0.51} 4%|▍ | 389/10000 [1:30:59<37:11:09, 13.93s/it] 4%|▍ | 390/10000 [1:31:13<37:08:06, 13.91s/it] {'loss': 0.861, 'learning_rate': 4.8060000000000004e-05, 'epoch': 0.51} 4%|▍ | 390/10000 [1:31:13<37:08:06, 13.91s/it] 4%|▍ | 391/10000 [1:31:27<37:13:11, 13.94s/it] {'loss': 0.8593, 'learning_rate': 4.8055e-05, 'epoch': 0.51} 4%|▍ | 391/10000 [1:31:27<37:13:11, 13.94s/it] 4%|▍ | 392/10000 [1:31:41<37:09:38, 13.92s/it] {'loss': 1.1614, 'learning_rate': 4.805e-05, 'epoch': 0.51} 4%|▍ | 392/10000 [1:31:41<37:09:38, 13.92s/it] 4%|▍ | 393/10000 [1:31:55<37:04:53, 13.90s/it] {'loss': 0.7522, 'learning_rate': 4.8045e-05, 'epoch': 0.51} 4%|▍ | 393/10000 [1:31:55<37:04:53, 13.90s/it] 4%|▍ | 394/10000 [1:32:09<37:08:27, 13.92s/it] {'loss': 0.9687, 'learning_rate': 4.804e-05, 'epoch': 0.52} 4%|▍ | 394/10000 [1:32:09<37:08:27, 13.92s/it] 4%|▍ | 395/10000 [1:32:23<37:06:27, 13.91s/it] {'loss': 1.0756, 'learning_rate': 4.8035000000000003e-05, 'epoch': 0.52} 4%|▍ | 395/10000 [1:32:23<37:06:27, 13.91s/it] 4%|▍ | 396/10000 [1:32:37<37:06:15, 13.91s/it] {'loss': 1.0197, 'learning_rate': 4.8030000000000006e-05, 'epoch': 0.52} 4%|▍ | 396/10000 [1:32:37<37:06:15, 13.91s/it] 4%|▍ | 397/10000 [1:32:50<37:04:01, 13.90s/it] {'loss': 0.9562, 'learning_rate': 4.8025e-05, 'epoch': 0.52} 4%|▍ | 397/10000 [1:32:50<37:04:01, 13.90s/it] 4%|▍ | 398/10000 [1:33:04<37:02:51, 13.89s/it] {'loss': 0.8951, 'learning_rate': 4.8020000000000004e-05, 'epoch': 0.52} 4%|▍ | 398/10000 [1:33:04<37:02:51, 13.89s/it] 4%|▍ | 399/10000 [1:33:18<37:06:32, 13.91s/it] {'loss': 0.9743, 'learning_rate': 4.801500000000001e-05, 'epoch': 0.52} 4%|▍ | 399/10000 [1:33:18<37:06:32, 13.91s/it] 4%|▍ | 400/10000 [1:33:32<37:11:27, 13.95s/it] {'loss': 0.866, 'learning_rate': 4.801e-05, 'epoch': 0.52} 4%|▍ | 
400/10000 [1:33:32<37:11:27, 13.95s/it] 4%|▍ | 401/10000 [1:33:46<37:18:53, 13.99s/it] {'loss': 1.1208, 'learning_rate': 4.8005e-05, 'epoch': 0.52} 4%|▍ | 401/10000 [1:33:46<37:18:53, 13.99s/it] 4%|▍ | 402/10000 [1:34:00<37:14:42, 13.97s/it] {'loss': 0.9494, 'learning_rate': 4.8e-05, 'epoch': 0.53} 4%|▍ | 402/10000 [1:34:00<37:14:42, 13.97s/it] 4%|▍ | 403/10000 [1:34:14<37:13:44, 13.97s/it] {'loss': 0.9399, 'learning_rate': 4.7995000000000004e-05, 'epoch': 0.53} 4%|▍ | 403/10000 [1:34:14<37:13:44, 13.97s/it] 4%|▍ | 404/10000 [1:34:28<37:24:32, 14.03s/it] {'loss': 1.022, 'learning_rate': 4.799e-05, 'epoch': 0.53} 4%|▍ | 404/10000 [1:34:29<37:24:32, 14.03s/it] 4%|▍ | 405/10000 [1:34:43<37:25:18, 14.04s/it] {'loss': 1.1854, 'learning_rate': 4.7985e-05, 'epoch': 0.53} 4%|▍ | 405/10000 [1:34:43<37:25:18, 14.04s/it] 4%|▍ | 406/10000 [1:34:57<37:22:09, 14.02s/it] {'loss': 0.8429, 'learning_rate': 4.7980000000000005e-05, 'epoch': 0.53} 4%|▍ | 406/10000 [1:34:57<37:22:09, 14.02s/it] 4%|▍ | 407/10000 [1:35:11<37:20:23, 14.01s/it] {'loss': 0.9398, 'learning_rate': 4.7975e-05, 'epoch': 0.53} 4%|▍ | 407/10000 [1:35:11<37:20:23, 14.01s/it] 4%|▍ | 408/10000 [1:35:24<37:14:24, 13.98s/it] {'loss': 0.7363, 'learning_rate': 4.797e-05, 'epoch': 0.53} 4%|▍ | 408/10000 [1:35:24<37:14:24, 13.98s/it] 4%|▍ | 409/10000 [1:35:38<37:15:06, 13.98s/it] {'loss': 0.84, 'learning_rate': 4.7965000000000006e-05, 'epoch': 0.54} 4%|▍ | 409/10000 [1:35:38<37:15:06, 13.98s/it] 4%|▍ | 410/10000 [1:35:52<37:09:36, 13.95s/it] {'loss': 1.1485, 'learning_rate': 4.796e-05, 'epoch': 0.54} 4%|▍ | 410/10000 [1:35:52<37:09:36, 13.95s/it] 4%|▍ | 411/10000 [1:36:06<37:06:23, 13.93s/it] {'loss': 1.1136, 'learning_rate': 4.7955e-05, 'epoch': 0.54} 4%|▍ | 411/10000 [1:36:06<37:06:23, 13.93s/it] 4%|▍ | 412/10000 [1:36:20<37:05:39, 13.93s/it] {'loss': 0.9627, 'learning_rate': 4.795e-05, 'epoch': 0.54} 4%|▍ | 412/10000 [1:36:20<37:05:39, 13.93s/it] 4%|▍ | 413/10000 [1:36:34<37:10:58, 13.96s/it] {'loss': 1.0065, 
'learning_rate': 4.7945e-05, 'epoch': 0.54} 4%|▍ | 413/10000 [1:36:34<37:10:58, 13.96s/it] 4%|▍ | 414/10000 [1:36:48<37:10:31, 13.96s/it] {'loss': 1.0542, 'learning_rate': 4.794e-05, 'epoch': 0.54} 4%|▍ | 414/10000 [1:36:48<37:10:31, 13.96s/it] 4%|▍ | 415/10000 [1:37:02<37:06:37, 13.94s/it] {'loss': 1.0974, 'learning_rate': 4.7935e-05, 'epoch': 0.54} 4%|▍ | 415/10000 [1:37:02<37:06:37, 13.94s/it] 4%|▍ | 416/10000 [1:37:16<37:03:25, 13.92s/it] {'loss': 0.9376, 'learning_rate': 4.7930000000000004e-05, 'epoch': 0.54} 4%|▍ | 416/10000 [1:37:16<37:03:25, 13.92s/it] 4%|▍ | 417/10000 [1:37:30<36:53:18, 13.86s/it] {'loss': 0.8115, 'learning_rate': 4.7925000000000006e-05, 'epoch': 0.55} 4%|▍ | 417/10000 [1:37:30<36:53:18, 13.86s/it] 4%|▍ | 418/10000 [1:37:43<36:55:12, 13.87s/it] {'loss': 0.9539, 'learning_rate': 4.792e-05, 'epoch': 0.55} 4%|▍ | 418/10000 [1:37:43<36:55:12, 13.87s/it] 4%|▍ | 419/10000 [1:37:57<36:51:37, 13.85s/it] {'loss': 0.727, 'learning_rate': 4.7915000000000005e-05, 'epoch': 0.55} 4%|▍ | 419/10000 [1:37:57<36:51:37, 13.85s/it] 4%|▍ | 420/10000 [1:38:11<36:50:55, 13.85s/it] {'loss': 0.877, 'learning_rate': 4.791000000000001e-05, 'epoch': 0.55} 4%|▍ | 420/10000 [1:38:11<36:50:55, 13.85s/it] 4%|▍ | 421/10000 [1:38:25<36:47:17, 13.83s/it] {'loss': 0.9069, 'learning_rate': 4.7905e-05, 'epoch': 0.55} 4%|▍ | 421/10000 [1:38:25<36:47:17, 13.83s/it][2024-11-03 21:56:46,332] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 32768, reducing to 16384 4%|▍ | 422/10000 [1:38:38<36:17:32, 13.64s/it] {'loss': 0.911, 'learning_rate': 4.7905e-05, 'epoch': 0.55} 4%|▍ | 422/10000 [1:38:38<36:17:32, 13.64s/it] 4%|▍ | 423/10000 [1:38:52<36:31:34, 13.73s/it] {'loss': 0.9617, 'learning_rate': 4.79e-05, 'epoch': 0.55} 4%|▍ | 423/10000 [1:38:52<36:31:34, 13.73s/it] 4%|▍ | 424/10000 [1:39:06<36:42:03, 13.80s/it] {'loss': 0.8047, 'learning_rate': 4.7895e-05, 'epoch': 0.55} 4%|▍ | 424/10000 [1:39:06<36:42:03, 13.80s/it] 4%|▍ | 425/10000 [1:39:20<36:44:43, 13.82s/it] {'loss': 0.8611, 'learning_rate': 4.7890000000000004e-05, 'epoch': 0.56} 4%|▍ | 425/10000 [1:39:20<36:44:43, 13.82s/it] 4%|▍ | 426/10000 [1:39:34<36:48:46, 13.84s/it] {'loss': 0.7866, 'learning_rate': 4.7885e-05, 'epoch': 0.56} 4%|▍ | 426/10000 [1:39:34<36:48:46, 13.84s/it] 4%|▍ | 427/10000 [1:39:48<36:55:00, 13.88s/it] {'loss': 1.046, 'learning_rate': 4.788e-05, 'epoch': 0.56} 4%|▍ | 427/10000 [1:39:48<36:55:00, 13.88s/it] 4%|▍ | 428/10000 [1:40:02<37:04:29, 13.94s/it] {'loss': 0.9482, 'learning_rate': 4.7875000000000005e-05, 'epoch': 0.56} 4%|▍ | 428/10000 [1:40:02<37:04:29, 13.94s/it] 4%|▍ | 429/10000 [1:40:16<37:01:37, 13.93s/it] {'loss': 0.9005, 'learning_rate': 4.787e-05, 'epoch': 0.56} 4%|▍ | 429/10000 [1:40:16<37:01:37, 13.93s/it] 4%|▍ | 430/10000 [1:40:30<37:03:41, 13.94s/it] {'loss': 1.0614, 'learning_rate': 4.7865e-05, 'epoch': 0.56} 4%|▍ | 430/10000 [1:40:30<37:03:41, 13.94s/it] 4%|▍ | 431/10000 [1:40:44<37:11:18, 13.99s/it] {'loss': 1.1554, 'learning_rate': 4.7860000000000006e-05, 'epoch': 0.56} 4%|▍ | 431/10000 [1:40:44<37:11:18, 13.99s/it] 4%|▍ | 432/10000 [1:40:58<37:05:58, 13.96s/it] {'loss': 0.8815, 'learning_rate': 4.7855e-05, 'epoch': 0.57} 4%|▍ | 432/10000 [1:40:58<37:05:58, 13.96s/it] 4%|▍ | 433/10000 [1:41:12<37:02:12, 13.94s/it] {'loss': 1.0509, 'learning_rate': 4.785e-05, 'epoch': 0.57} 4%|▍ | 433/10000 [1:41:12<37:02:12, 13.94s/it] 4%|▍ | 434/10000 [1:41:25<36:56:33, 13.90s/it] {'loss': 0.8848, 
'learning_rate': 4.7845e-05, 'epoch': 0.57} 4%|▍ | 434/10000 [1:41:25<36:56:33, 13.90s/it] 4%|▍ | 435/10000 [1:41:39<36:58:52, 13.92s/it] {'loss': 0.961, 'learning_rate': 4.784e-05, 'epoch': 0.57} 4%|▍ | 435/10000 [1:41:39<36:58:52, 13.92s/it] 4%|▍ | 436/10000 [1:41:53<36:54:26, 13.89s/it] {'loss': 0.8655, 'learning_rate': 4.7835000000000005e-05, 'epoch': 0.57} 4%|▍ | 436/10000 [1:41:53<36:54:26, 13.89s/it] 4%|▍ | 437/10000 [1:42:07<37:02:00, 13.94s/it] {'loss': 0.9245, 'learning_rate': 4.783e-05, 'epoch': 0.57} 4%|▍ | 437/10000 [1:42:07<37:02:00, 13.94s/it] 4%|▍ | 438/10000 [1:42:21<37:03:05, 13.95s/it] {'loss': 0.956, 'learning_rate': 4.7825000000000004e-05, 'epoch': 0.57} 4%|▍ | 438/10000 [1:42:21<37:03:05, 13.95s/it] 4%|▍ | 439/10000 [1:42:35<37:08:40, 13.99s/it] {'loss': 1.072, 'learning_rate': 4.7820000000000006e-05, 'epoch': 0.57} 4%|▍ | 439/10000 [1:42:35<37:08:40, 13.99s/it] 4%|▍ | 440/10000 [1:42:49<37:13:07, 14.02s/it] {'loss': 1.0318, 'learning_rate': 4.7815e-05, 'epoch': 0.58} 4%|▍ | 440/10000 [1:42:49<37:13:07, 14.02s/it] 4%|▍ | 441/10000 [1:43:03<37:06:16, 13.97s/it] {'loss': 0.9706, 'learning_rate': 4.7810000000000005e-05, 'epoch': 0.58} 4%|▍ | 441/10000 [1:43:03<37:06:16, 13.97s/it] 4%|▍ | 442/10000 [1:43:17<37:03:23, 13.96s/it] {'loss': 1.1428, 'learning_rate': 4.7805e-05, 'epoch': 0.58} 4%|▍ | 442/10000 [1:43:17<37:03:23, 13.96s/it] 4%|▍ | 443/10000 [1:43:31<37:08:04, 13.99s/it] {'loss': 0.9171, 'learning_rate': 4.78e-05, 'epoch': 0.58} 4%|▍ | 443/10000 [1:43:31<37:08:04, 13.99s/it] 4%|▍ | 444/10000 [1:43:45<37:01:04, 13.95s/it] {'loss': 1.0811, 'learning_rate': 4.7795e-05, 'epoch': 0.58} 4%|▍ | 444/10000 [1:43:45<37:01:04, 13.95s/it] 4%|▍ | 445/10000 [1:43:59<37:06:14, 13.98s/it] {'loss': 0.8371, 'learning_rate': 4.779e-05, 'epoch': 0.58} 4%|▍ | 445/10000 [1:43:59<37:06:14, 13.98s/it] 4%|▍ | 446/10000 [1:44:13<37:02:43, 13.96s/it] {'loss': 0.9252, 'learning_rate': 4.7785000000000004e-05, 'epoch': 0.58} 4%|▍ | 446/10000 [1:44:13<37:02:43, 
13.96s/it] 4%|▍ | 447/10000 [1:44:27<37:00:17, 13.95s/it] {'loss': 1.1784, 'learning_rate': 4.778e-05, 'epoch': 0.59} 4%|▍ | 447/10000 [1:44:27<37:00:17, 13.95s/it] 4%|▍ | 448/10000 [1:44:41<37:06:55, 13.99s/it] {'loss': 0.9094, 'learning_rate': 4.7775e-05, 'epoch': 0.59} 4%|▍ | 448/10000 [1:44:41<37:06:55, 13.99s/it] 4%|▍ | 449/10000 [1:44:55<37:05:36, 13.98s/it] {'loss': 1.0783, 'learning_rate': 4.7770000000000005e-05, 'epoch': 0.59} 4%|▍ | 449/10000 [1:44:55<37:05:36, 13.98s/it] 4%|▍ | 450/10000 [1:45:09<37:00:22, 13.95s/it] {'loss': 1.0859, 'learning_rate': 4.7765e-05, 'epoch': 0.59} 4%|▍ | 450/10000 [1:45:09<37:00:22, 13.95s/it] 5%|▍ | 451/10000 [1:45:23<36:58:22, 13.94s/it] {'loss': 0.8835, 'learning_rate': 4.7760000000000004e-05, 'epoch': 0.59} 5%|▍ | 451/10000 [1:45:23<36:58:22, 13.94s/it] 5%|▍ | 452/10000 [1:45:37<36:56:21, 13.93s/it] {'loss': 1.1458, 'learning_rate': 4.7755e-05, 'epoch': 0.59} 5%|▍ | 452/10000 [1:45:37<36:56:21, 13.93s/it] 5%|▍ | 453/10000 [1:45:51<36:59:27, 13.95s/it] {'loss': 0.8774, 'learning_rate': 4.775e-05, 'epoch': 0.59} 5%|▍ | 453/10000 [1:45:51<36:59:27, 13.95s/it] 5%|▍ | 454/10000 [1:46:05<37:01:10, 13.96s/it] {'loss': 1.1058, 'learning_rate': 4.7745e-05, 'epoch': 0.59} 5%|▍ | 454/10000 [1:46:05<37:01:10, 13.96s/it] 5%|▍ | 455/10000 [1:46:19<36:58:20, 13.94s/it] {'loss': 0.8463, 'learning_rate': 4.774e-05, 'epoch': 0.6} 5%|▍ | 455/10000 [1:46:19<36:58:20, 13.94s/it] 5%|▍ | 456/10000 [1:46:32<36:55:25, 13.93s/it] {'loss': 1.0673, 'learning_rate': 4.7735e-05, 'epoch': 0.6} 5%|▍ | 456/10000 [1:46:32<36:55:25, 13.93s/it] 5%|▍ | 457/10000 [1:46:46<36:50:55, 13.90s/it] {'loss': 0.8472, 'learning_rate': 4.7730000000000005e-05, 'epoch': 0.6} 5%|▍ | 457/10000 [1:46:46<36:50:55, 13.90s/it] 5%|▍ | 458/10000 [1:47:00<36:51:55, 13.91s/it] {'loss': 0.9483, 'learning_rate': 4.7725e-05, 'epoch': 0.6} 5%|▍ | 458/10000 [1:47:00<36:51:55, 13.91s/it] 5%|▍ | 459/10000 [1:47:14<36:59:37, 13.96s/it] {'loss': 1.1638, 'learning_rate': 
4.7720000000000004e-05, 'epoch': 0.6} 5%|▍ | 459/10000 [1:47:14<36:59:37, 13.96s/it] 5%|▍ | 460/10000 [1:47:28<36:57:57, 13.95s/it] {'loss': 0.7696, 'learning_rate': 4.7715000000000006e-05, 'epoch': 0.6} 5%|▍ | 460/10000 [1:47:28<36:57:57, 13.95s/it] 5%|▍ | 461/10000 [1:47:42<37:00:07, 13.96s/it] {'loss': 0.8679, 'learning_rate': 4.771e-05, 'epoch': 0.6} 5%|▍ | 461/10000 [1:47:42<37:00:07, 13.96s/it] 5%|▍ | 462/10000 [1:47:56<36:54:08, 13.93s/it] {'loss': 0.838, 'learning_rate': 4.7705e-05, 'epoch': 0.6} 5%|▍ | 462/10000 [1:47:56<36:54:08, 13.93s/it] 5%|▍ | 463/10000 [1:48:10<36:55:28, 13.94s/it] {'loss': 0.8732, 'learning_rate': 4.77e-05, 'epoch': 0.61} 5%|▍ | 463/10000 [1:48:10<36:55:28, 13.94s/it] 5%|▍ | 464/10000 [1:48:24<36:59:41, 13.97s/it] {'loss': 0.85, 'learning_rate': 4.7695e-05, 'epoch': 0.61} 5%|▍ | 464/10000 [1:48:24<36:59:41, 13.97s/it] 5%|▍ | 465/10000 [1:48:38<37:02:11, 13.98s/it] {'loss': 0.916, 'learning_rate': 4.769e-05, 'epoch': 0.61} 5%|▍ | 465/10000 [1:48:38<37:02:11, 13.98s/it] 5%|▍ | 466/10000 [1:48:52<37:01:43, 13.98s/it] {'loss': 1.1771, 'learning_rate': 4.7685e-05, 'epoch': 0.61} 5%|▍ | 466/10000 [1:48:52<37:01:43, 13.98s/it] 5%|▍ | 467/10000 [1:49:06<37:04:29, 14.00s/it] {'loss': 0.8578, 'learning_rate': 4.7680000000000004e-05, 'epoch': 0.61} 5%|▍ | 467/10000 [1:49:06<37:04:29, 14.00s/it] 5%|▍ | 468/10000 [1:49:20<36:57:00, 13.96s/it] {'loss': 0.7909, 'learning_rate': 4.7675e-05, 'epoch': 0.61} 5%|▍ | 468/10000 [1:49:20<36:57:00, 13.96s/it] 5%|▍ | 469/10000 [1:49:34<36:50:21, 13.91s/it] {'loss': 1.102, 'learning_rate': 4.767e-05, 'epoch': 0.61} 5%|▍ | 469/10000 [1:49:34<36:50:21, 13.91s/it] 5%|▍ | 470/10000 [1:49:48<36:50:21, 13.92s/it] {'loss': 0.9071, 'learning_rate': 4.7665000000000005e-05, 'epoch': 0.62} 5%|▍ | 470/10000 [1:49:48<36:50:21, 13.92s/it] 5%|▍ | 471/10000 [1:50:01<36:42:17, 13.87s/it] {'loss': 1.4196, 'learning_rate': 4.766000000000001e-05, 'epoch': 0.62} 5%|▍ | 471/10000 [1:50:01<36:42:17, 13.87s/it] 5%|▍ | 472/10000 
[1:50:15<36:45:55, 13.89s/it] {'loss': 1.0134, 'learning_rate': 4.7655e-05, 'epoch': 0.62} 5%|▍ | 472/10000 [1:50:15<36:45:55, 13.89s/it] 5%|▍ | 473/10000 [1:50:29<36:44:29, 13.88s/it] {'loss': 1.0439, 'learning_rate': 4.765e-05, 'epoch': 0.62} 5%|▍ | 473/10000 [1:50:29<36:44:29, 13.88s/it] 5%|▍ | 474/10000 [1:50:43<36:51:14, 13.93s/it] {'loss': 1.1512, 'learning_rate': 4.7645e-05, 'epoch': 0.62} 5%|▍ | 474/10000 [1:50:43<36:51:14, 13.93s/it] 5%|▍ | 475/10000 [1:50:57<36:50:23, 13.92s/it] {'loss': 1.1807, 'learning_rate': 4.7640000000000005e-05, 'epoch': 0.62} 5%|▍ | 475/10000 [1:50:57<36:50:23, 13.92s/it] 5%|▍ | 476/10000 [1:51:11<36:46:36, 13.90s/it] {'loss': 0.84, 'learning_rate': 4.7635e-05, 'epoch': 0.62} 5%|▍ | 476/10000 [1:51:11<36:46:36, 13.90s/it] 5%|▍ | 477/10000 [1:51:25<36:48:26, 13.91s/it] {'loss': 0.803, 'learning_rate': 4.763e-05, 'epoch': 0.62} 5%|▍ | 477/10000 [1:51:25<36:48:26, 13.91s/it] 5%|▍ | 478/10000 [1:51:39<36:47:29, 13.91s/it] {'loss': 0.7105, 'learning_rate': 4.7625000000000006e-05, 'epoch': 0.63} 5%|▍ | 478/10000 [1:51:39<36:47:29, 13.91s/it] 5%|▍ | 479/10000 [1:51:53<36:51:10, 13.93s/it] {'loss': 1.1844, 'learning_rate': 4.762e-05, 'epoch': 0.63} 5%|▍ | 479/10000 [1:51:53<36:51:10, 13.93s/it] 5%|▍ | 480/10000 [1:52:07<36:47:20, 13.91s/it] {'loss': 0.8011, 'learning_rate': 4.7615000000000004e-05, 'epoch': 0.63} 5%|▍ | 480/10000 [1:52:07<36:47:20, 13.91s/it] 5%|▍ | 481/10000 [1:52:21<36:50:19, 13.93s/it] {'loss': 0.9138, 'learning_rate': 4.761000000000001e-05, 'epoch': 0.63} 5%|▍ | 481/10000 [1:52:21<36:50:19, 13.93s/it] 5%|▍ | 482/10000 [1:52:35<36:49:01, 13.93s/it] {'loss': 1.0131, 'learning_rate': 4.7605e-05, 'epoch': 0.63} 5%|▍ | 482/10000 [1:52:35<36:49:01, 13.93s/it] 5%|▍ | 483/10000 [1:52:49<36:52:13, 13.95s/it] {'loss': 0.8559, 'learning_rate': 4.76e-05, 'epoch': 0.63} 5%|▍ | 483/10000 [1:52:49<36:52:13, 13.95s/it] 5%|▍ | 484/10000 [1:53:03<36:56:38, 13.98s/it] {'loss': 1.0717, 'learning_rate': 4.7595e-05, 'epoch': 0.63} 5%|▍ | 
484/10000 [1:53:03<36:56:38, 13.98s/it] 5%|▍ | 485/10000 [1:53:17<36:51:29, 13.95s/it] {'loss': 1.0077, 'learning_rate': 4.7590000000000003e-05, 'epoch': 0.63} 5%|▍ | 485/10000 [1:53:17<36:51:29, 13.95s/it] 5%|▍ | 486/10000 [1:53:30<36:50:07, 13.94s/it] {'loss': 1.2938, 'learning_rate': 4.7585e-05, 'epoch': 0.64} 5%|▍ | 486/10000 [1:53:31<36:50:07, 13.94s/it] 5%|▍ | 487/10000 [1:53:44<36:53:27, 13.96s/it] {'loss': 0.8785, 'learning_rate': 4.758e-05, 'epoch': 0.64} 5%|▍ | 487/10000 [1:53:45<36:53:27, 13.96s/it] 5%|▍ | 488/10000 [1:53:58<36:51:06, 13.95s/it] {'loss': 0.9354, 'learning_rate': 4.7575000000000004e-05, 'epoch': 0.64} 5%|▍ | 488/10000 [1:53:58<36:51:06, 13.95s/it] 5%|▍ | 489/10000 [1:54:12<36:53:18, 13.96s/it] {'loss': 1.1447, 'learning_rate': 4.757e-05, 'epoch': 0.64} 5%|▍ | 489/10000 [1:54:12<36:53:18, 13.96s/it] 5%|▍ | 490/10000 [1:54:26<36:46:35, 13.92s/it] {'loss': 0.9996, 'learning_rate': 4.7565e-05, 'epoch': 0.64} 5%|▍ | 490/10000 [1:54:26<36:46:35, 13.92s/it] 5%|▍ | 491/10000 [1:54:40<36:44:26, 13.91s/it] {'loss': 1.0106, 'learning_rate': 4.7560000000000005e-05, 'epoch': 0.64} 5%|▍ | 491/10000 [1:54:40<36:44:26, 13.91s/it] 5%|▍ | 492/10000 [1:54:54<36:45:58, 13.92s/it] {'loss': 0.9033, 'learning_rate': 4.7555e-05, 'epoch': 0.64} 5%|▍ | 492/10000 [1:54:54<36:45:58, 13.92s/it] 5%|▍ | 493/10000 [1:55:08<36:48:25, 13.94s/it] {'loss': 1.1715, 'learning_rate': 4.755e-05, 'epoch': 0.65} 5%|▍ | 493/10000 [1:55:08<36:48:25, 13.94s/it] 5%|▍ | 494/10000 [1:55:22<36:48:41, 13.94s/it] {'loss': 0.971, 'learning_rate': 4.7545e-05, 'epoch': 0.65} 5%|▍ | 494/10000 [1:55:22<36:48:41, 13.94s/it] 5%|▍ | 495/10000 [1:55:36<36:44:15, 13.91s/it] {'loss': 0.955, 'learning_rate': 4.754e-05, 'epoch': 0.65} 5%|▍ | 495/10000 [1:55:36<36:44:15, 13.91s/it] 5%|▍ | 496/10000 [1:55:50<36:43:13, 13.91s/it] {'loss': 0.8633, 'learning_rate': 4.7535000000000005e-05, 'epoch': 0.65} 5%|▍ | 496/10000 [1:55:50<36:43:13, 13.91s/it] 5%|▍ | 497/10000 [1:56:04<36:50:51, 13.96s/it] {'loss': 
0.8306, 'learning_rate': 4.753e-05, 'epoch': 0.65} 5%|▍ | 497/10000 [1:56:04<36:50:51, 13.96s/it] 5%|▍ | 498/10000 [1:56:18<36:46:50, 13.94s/it] {'loss': 0.8759, 'learning_rate': 4.7525e-05, 'epoch': 0.65} 5%|▍ | 498/10000 [1:56:18<36:46:50, 13.94s/it] 5%|▍ | 499/10000 [1:56:31<36:41:03, 13.90s/it] {'loss': 0.955, 'learning_rate': 4.7520000000000006e-05, 'epoch': 0.65} 5%|▍ | 499/10000 [1:56:32<36:41:03, 13.90s/it] 5%|▌ | 500/10000 [1:56:45<36:40:32, 13.90s/it] {'loss': 0.7848, 'learning_rate': 4.7515e-05, 'epoch': 0.65} 5%|▌ | 500/10000 [1:56:45<36:40:32, 13.90s/it] 5%|▌ | 501/10000 [1:56:59<36:37:29, 13.88s/it] {'loss': 1.1597, 'learning_rate': 4.7510000000000004e-05, 'epoch': 0.66} 5%|▌ | 501/10000 [1:56:59<36:37:29, 13.88s/it] 5%|▌ | 502/10000 [1:57:13<36:38:49, 13.89s/it] {'loss': 0.7209, 'learning_rate': 4.7505e-05, 'epoch': 0.66} 5%|▌ | 502/10000 [1:57:13<36:38:49, 13.89s/it] 5%|▌ | 503/10000 [1:57:27<36:40:28, 13.90s/it] {'loss': 0.8751, 'learning_rate': 4.75e-05, 'epoch': 0.66} 5%|▌ | 503/10000 [1:57:27<36:40:28, 13.90s/it] 5%|▌ | 504/10000 [1:57:41<36:40:05, 13.90s/it] {'loss': 1.0763, 'learning_rate': 4.7495e-05, 'epoch': 0.66} 5%|▌ | 504/10000 [1:57:41<36:40:05, 13.90s/it] 5%|▌ | 505/10000 [1:57:55<36:47:32, 13.95s/it] {'loss': 1.0124, 'learning_rate': 4.749e-05, 'epoch': 0.66} 5%|▌ | 505/10000 [1:57:55<36:47:32, 13.95s/it] 5%|▌ | 506/10000 [1:58:09<36:45:19, 13.94s/it] {'loss': 0.8246, 'learning_rate': 4.7485000000000004e-05, 'epoch': 0.66} 5%|▌ | 506/10000 [1:58:09<36:45:19, 13.94s/it] 5%|▌ | 507/10000 [1:58:23<36:41:03, 13.91s/it] {'loss': 0.9802, 'learning_rate': 4.748e-05, 'epoch': 0.66} 5%|▌ | 507/10000 [1:58:23<36:41:03, 13.91s/it] 5%|▌ | 508/10000 [1:58:37<36:51:24, 13.98s/it] {'loss': 0.8757, 'learning_rate': 4.7475e-05, 'epoch': 0.66} 5%|▌ | 508/10000 [1:58:37<36:51:24, 13.98s/it] 5%|▌ | 509/10000 [1:58:51<36:46:56, 13.95s/it] {'loss': 1.1055, 'learning_rate': 4.7470000000000005e-05, 'epoch': 0.67} 5%|▌ | 509/10000 [1:58:51<36:46:56, 
13.95s/it] 5%|▌ | 510/10000 [1:59:05<36:46:38, 13.95s/it] {'loss': 0.9797, 'learning_rate': 4.746500000000001e-05, 'epoch': 0.67} 5%|▌ | 510/10000 [1:59:05<36:46:38, 13.95s/it] 5%|▌ | 511/10000 [1:59:19<36:40:30, 13.91s/it] {'loss': 0.8071, 'learning_rate': 4.746e-05, 'epoch': 0.67} 5%|▌ | 511/10000 [1:59:19<36:40:30, 13.91s/it] 5%|▌ | 512/10000 [1:59:33<36:44:38, 13.94s/it] {'loss': 0.8686, 'learning_rate': 4.7455000000000006e-05, 'epoch': 0.67} 5%|▌ | 512/10000 [1:59:33<36:44:38, 13.94s/it] 5%|▌ | 513/10000 [1:59:46<36:39:43, 13.91s/it] {'loss': 1.0384, 'learning_rate': 4.745e-05, 'epoch': 0.67} 5%|▌ | 513/10000 [1:59:46<36:39:43, 13.91s/it] 5%|▌ | 514/10000 [2:00:00<36:44:38, 13.94s/it] {'loss': 0.7847, 'learning_rate': 4.7445e-05, 'epoch': 0.67} 5%|▌ | 514/10000 [2:00:01<36:44:38, 13.94s/it] 5%|▌ | 515/10000 [2:00:14<36:41:40, 13.93s/it] {'loss': 0.9966, 'learning_rate': 4.744e-05, 'epoch': 0.67} 5%|▌ | 515/10000 [2:00:14<36:41:40, 13.93s/it] 5%|▌ | 516/10000 [2:00:28<36:43:05, 13.94s/it] {'loss': 0.9612, 'learning_rate': 4.7435e-05, 'epoch': 0.68} 5%|▌ | 516/10000 [2:00:28<36:43:05, 13.94s/it] 5%|▌ | 517/10000 [2:00:42<36:35:46, 13.89s/it] {'loss': 0.9641, 'learning_rate': 4.7430000000000005e-05, 'epoch': 0.68} 5%|▌ | 517/10000 [2:00:42<36:35:46, 13.89s/it] 5%|▌ | 518/10000 [2:00:56<36:45:43, 13.96s/it] {'loss': 0.9827, 'learning_rate': 4.7425e-05, 'epoch': 0.68} 5%|▌ | 518/10000 [2:00:56<36:45:43, 13.96s/it] 5%|▌ | 519/10000 [2:01:10<36:45:54, 13.96s/it] {'loss': 0.879, 'learning_rate': 4.742e-05, 'epoch': 0.68} 5%|▌ | 519/10000 [2:01:10<36:45:54, 13.96s/it] 5%|▌ | 520/10000 [2:01:24<36:43:19, 13.95s/it] {'loss': 0.9515, 'learning_rate': 4.7415000000000006e-05, 'epoch': 0.68} 5%|▌ | 520/10000 [2:01:24<36:43:19, 13.95s/it] 5%|▌ | 521/10000 [2:01:38<36:37:43, 13.91s/it] {'loss': 0.8694, 'learning_rate': 4.741e-05, 'epoch': 0.68} 5%|▌ | 521/10000 [2:01:38<36:37:43, 13.91s/it] 5%|▌ | 522/10000 [2:01:52<36:39:01, 13.92s/it] {'loss': 0.8906, 'learning_rate': 
4.7405000000000004e-05, 'epoch': 0.68} 5%|▌ | 522/10000 [2:01:52<36:39:01, 13.92s/it] 5%|▌ | 523/10000 [2:02:06<36:40:57, 13.93s/it] {'loss': 0.9185, 'learning_rate': 4.74e-05, 'epoch': 0.68} 5%|▌ | 523/10000 [2:02:06<36:40:57, 13.93s/it] 5%|▌ | 524/10000 [2:02:20<36:38:35, 13.92s/it] {'loss': 1.0126, 'learning_rate': 4.7395e-05, 'epoch': 0.69} 5%|▌ | 524/10000 [2:02:20<36:38:35, 13.92s/it] 5%|▌ | 525/10000 [2:02:34<36:37:04, 13.91s/it] {'loss': 0.9462, 'learning_rate': 4.739e-05, 'epoch': 0.69} 5%|▌ | 525/10000 [2:02:34<36:37:04, 13.91s/it] 5%|▌ | 526/10000 [2:02:47<36:30:53, 13.88s/it] {'loss': 1.103, 'learning_rate': 4.7385e-05, 'epoch': 0.69} 5%|▌ | 526/10000 [2:02:47<36:30:53, 13.88s/it] 5%|▌ | 527/10000 [2:03:01<36:33:06, 13.89s/it] {'loss': 0.9831, 'learning_rate': 4.7380000000000004e-05, 'epoch': 0.69} 5%|▌ | 527/10000 [2:03:01<36:33:06, 13.89s/it] 5%|▌ | 528/10000 [2:03:15<36:27:33, 13.86s/it] {'loss': 0.8049, 'learning_rate': 4.7375e-05, 'epoch': 0.69} 5%|▌ | 528/10000 [2:03:15<36:27:33, 13.86s/it] 5%|▌ | 529/10000 [2:03:29<36:27:32, 13.86s/it] {'loss': 0.8046, 'learning_rate': 4.737e-05, 'epoch': 0.69} 5%|▌ | 529/10000 [2:03:29<36:27:32, 13.86s/it] 5%|▌ | 530/10000 [2:03:43<36:33:57, 13.90s/it] {'loss': 0.9381, 'learning_rate': 4.7365000000000005e-05, 'epoch': 0.69} 5%|▌ | 530/10000 [2:03:43<36:33:57, 13.90s/it] 5%|▌ | 531/10000 [2:03:57<36:34:27, 13.91s/it] {'loss': 0.9118, 'learning_rate': 4.736000000000001e-05, 'epoch': 0.7} 5%|▌ | 531/10000 [2:03:57<36:34:27, 13.91s/it] 5%|▌ | 532/10000 [2:04:11<36:37:16, 13.92s/it] {'loss': 1.0331, 'learning_rate': 4.7355e-05, 'epoch': 0.7} 5%|▌ | 532/10000 [2:04:11<36:37:16, 13.92s/it] 5%|▌ | 533/10000 [2:04:25<36:43:44, 13.97s/it] {'loss': 1.1238, 'learning_rate': 4.735e-05, 'epoch': 0.7} 5%|▌ | 533/10000 [2:04:25<36:43:44, 13.97s/it] 5%|▌ | 534/10000 [2:04:39<36:46:03, 13.98s/it] {'loss': 0.7914, 'learning_rate': 4.7345e-05, 'epoch': 0.7} 5%|▌ | 534/10000 [2:04:39<36:46:03, 13.98s/it] 5%|▌ | 535/10000 
[2:04:53<36:50:18, 14.01s/it] {'loss': 0.8408, 'learning_rate': 4.7340000000000004e-05, 'epoch': 0.7} 5%|▌ | 535/10000 [2:04:53<36:50:18, 14.01s/it] 5%|▌ | 536/10000 [2:05:07<36:41:10, 13.96s/it] {'loss': 1.1953, 'learning_rate': 4.7335e-05, 'epoch': 0.7} 5%|▌ | 536/10000 [2:05:07<36:41:10, 13.96s/it] 5%|▌ | 537/10000 [2:05:21<36:33:36, 13.91s/it] {'loss': 1.2695, 'learning_rate': 4.733e-05, 'epoch': 0.7} 5%|▌ | 537/10000 [2:05:21<36:33:36, 13.91s/it] 5%|▌ | 538/10000 [2:05:35<36:32:46, 13.90s/it] {'loss': 0.8392, 'learning_rate': 4.7325000000000005e-05, 'epoch': 0.7} 5%|▌ | 538/10000 [2:05:35<36:32:46, 13.90s/it] 5%|▌ | 539/10000 [2:05:48<36:33:42, 13.91s/it] {'loss': 0.9125, 'learning_rate': 4.732e-05, 'epoch': 0.71} 5%|▌ | 539/10000 [2:05:49<36:33:42, 13.91s/it] 5%|▌ | 540/10000 [2:06:02<36:28:29, 13.88s/it] {'loss': 1.0938, 'learning_rate': 4.7315000000000004e-05, 'epoch': 0.71} 5%|▌ | 540/10000 [2:06:02<36:28:29, 13.88s/it] 5%|▌ | 541/10000 [2:06:16<36:28:20, 13.88s/it] {'loss': 1.3212, 'learning_rate': 4.7310000000000006e-05, 'epoch': 0.71} 5%|▌ | 541/10000 [2:06:16<36:28:20, 13.88s/it] 5%|▌ | 542/10000 [2:06:30<36:26:39, 13.87s/it] {'loss': 0.8802, 'learning_rate': 4.7305e-05, 'epoch': 0.71} 5%|▌ | 542/10000 [2:06:30<36:26:39, 13.87s/it] 5%|▌ | 543/10000 [2:06:44<36:37:33, 13.94s/it] {'loss': 0.8341, 'learning_rate': 4.73e-05, 'epoch': 0.71} 5%|▌ | 543/10000 [2:06:44<36:37:33, 13.94s/it] 5%|▌ | 544/10000 [2:06:58<36:33:15, 13.92s/it] {'loss': 0.999, 'learning_rate': 4.7295e-05, 'epoch': 0.71} 5%|▌ | 544/10000 [2:06:58<36:33:15, 13.92s/it] 5%|▌ | 545/10000 [2:07:12<36:32:48, 13.92s/it] {'loss': 1.2435, 'learning_rate': 4.729e-05, 'epoch': 0.71} 5%|▌ | 545/10000 [2:07:12<36:32:48, 13.92s/it] 5%|▌ | 546/10000 [2:07:26<36:30:25, 13.90s/it] {'loss': 1.0853, 'learning_rate': 4.7285e-05, 'epoch': 0.71} 5%|▌ | 546/10000 [2:07:26<36:30:25, 13.90s/it] 5%|▌ | 547/10000 [2:07:40<36:33:03, 13.92s/it] {'loss': 0.8639, 'learning_rate': 4.728e-05, 'epoch': 0.72} 5%|▌ | 
547/10000 [2:07:40<36:33:03, 13.92s/it] 5%|▌ | 548/10000 [2:07:54<36:35:00, 13.93s/it] {'loss': 0.9827, 'learning_rate': 4.7275000000000004e-05, 'epoch': 0.72} 5%|▌ | 548/10000 [2:07:54<36:35:00, 13.93s/it] 5%|▌ | 549/10000 [2:08:08<36:34:45, 13.93s/it] {'loss': 1.1498, 'learning_rate': 4.7270000000000007e-05, 'epoch': 0.72} 5%|▌ | 549/10000 [2:08:08<36:34:45, 13.93s/it] 6%|▌ | 550/10000 [2:08:21<36:31:38, 13.92s/it] {'loss': 0.7856, 'learning_rate': 4.7265e-05, 'epoch': 0.72} 6%|▌ | 550/10000 [2:08:22<36:31:38, 13.92s/it] 6%|▌ | 551/10000 [2:08:35<36:32:04, 13.92s/it] {'loss': 0.8733, 'learning_rate': 4.7260000000000005e-05, 'epoch': 0.72} 6%|▌ | 551/10000 [2:08:35<36:32:04, 13.92s/it] 6%|▌ | 552/10000 [2:08:49<36:36:06, 13.95s/it] {'loss': 1.0324, 'learning_rate': 4.725500000000001e-05, 'epoch': 0.72} 6%|▌ | 552/10000 [2:08:49<36:36:06, 13.95s/it] 6%|▌ | 553/10000 [2:09:03<36:30:21, 13.91s/it] {'loss': 1.1517, 'learning_rate': 4.7249999999999997e-05, 'epoch': 0.72} 6%|▌ | 553/10000 [2:09:03<36:30:21, 13.91s/it] 6%|▌ | 554/10000 [2:09:17<36:29:02, 13.90s/it] {'loss': 1.0397, 'learning_rate': 4.7245e-05, 'epoch': 0.73} 6%|▌ | 554/10000 [2:09:17<36:29:02, 13.90s/it] 6%|▌ | 555/10000 [2:09:31<36:35:25, 13.95s/it] {'loss': 0.9391, 'learning_rate': 4.724e-05, 'epoch': 0.73} 6%|▌ | 555/10000 [2:09:31<36:35:25, 13.95s/it] 6%|▌ | 556/10000 [2:09:45<36:31:47, 13.93s/it] {'loss': 1.0823, 'learning_rate': 4.7235000000000004e-05, 'epoch': 0.73} 6%|▌ | 556/10000 [2:09:45<36:31:47, 13.93s/it] 6%|▌ | 557/10000 [2:09:59<36:30:34, 13.92s/it] {'loss': 0.9634, 'learning_rate': 4.723e-05, 'epoch': 0.73} 6%|▌ | 557/10000 [2:09:59<36:30:34, 13.92s/it] 6%|▌ | 558/10000 [2:10:13<36:28:00, 13.90s/it] {'loss': 1.1104, 'learning_rate': 4.7225e-05, 'epoch': 0.73} 6%|▌ | 558/10000 [2:10:13<36:28:00, 13.90s/it] 6%|▌ | 559/10000 [2:10:27<36:32:05, 13.93s/it] {'loss': 0.9149, 'learning_rate': 4.7220000000000005e-05, 'epoch': 0.73} 6%|▌ | 559/10000 [2:10:27<36:32:05, 13.93s/it] 6%|▌ | 560/10000 
[2:10:41<36:36:09, 13.96s/it] {'loss': 0.8582, 'learning_rate': 4.7215e-05, 'epoch': 0.73} 6%|▌ | 560/10000 [2:10:41<36:36:09, 13.96s/it] 6%|▌ | 561/10000 [2:10:55<36:28:13, 13.91s/it] {'loss': 0.7157, 'learning_rate': 4.7210000000000004e-05, 'epoch': 0.73} 6%|▌ | 561/10000 [2:10:55<36:28:13, 13.91s/it] 6%|▌ | 562/10000 [2:11:09<36:28:33, 13.91s/it] {'loss': 0.999, 'learning_rate': 4.7205000000000006e-05, 'epoch': 0.74} 6%|▌ | 562/10000 [2:11:09<36:28:33, 13.91s/it] 6%|▌ | 563/10000 [2:11:22<36:29:12, 13.92s/it] {'loss': 1.0583, 'learning_rate': 4.72e-05, 'epoch': 0.74} 6%|▌ | 563/10000 [2:11:23<36:29:12, 13.92s/it] 6%|▌ | 564/10000 [2:11:36<36:30:06, 13.93s/it] {'loss': 0.9158, 'learning_rate': 4.7195e-05, 'epoch': 0.74} 6%|▌ | 564/10000 [2:11:36<36:30:06, 13.93s/it] 6%|▌ | 565/10000 [2:11:50<36:29:15, 13.92s/it] {'loss': 1.0309, 'learning_rate': 4.719e-05, 'epoch': 0.74} 6%|▌ | 565/10000 [2:11:50<36:29:15, 13.92s/it] 6%|▌ | 566/10000 [2:12:04<36:28:55, 13.92s/it] {'loss': 0.7801, 'learning_rate': 4.7185e-05, 'epoch': 0.74} 6%|▌ | 566/10000 [2:12:04<36:28:55, 13.92s/it] 6%|▌ | 567/10000 [2:12:18<36:26:05, 13.90s/it] {'loss': 0.9798, 'learning_rate': 4.718e-05, 'epoch': 0.74} 6%|▌ | 567/10000 [2:12:18<36:26:05, 13.90s/it] 6%|▌ | 568/10000 [2:12:32<36:20:56, 13.87s/it] {'loss': 0.8442, 'learning_rate': 4.7175e-05, 'epoch': 0.74} 6%|▌ | 568/10000 [2:12:32<36:20:56, 13.87s/it] 6%|▌ | 569/10000 [2:12:46<36:20:18, 13.87s/it] {'loss': 0.9968, 'learning_rate': 4.7170000000000004e-05, 'epoch': 0.74} 6%|▌ | 569/10000 [2:12:46<36:20:18, 13.87s/it] 6%|▌ | 570/10000 [2:13:00<36:22:45, 13.89s/it] {'loss': 0.9027, 'learning_rate': 4.716500000000001e-05, 'epoch': 0.75} 6%|▌ | 570/10000 [2:13:00<36:22:45, 13.89s/it] 6%|▌ | 571/10000 [2:13:14<36:20:37, 13.88s/it] {'loss': 1.0192, 'learning_rate': 4.716e-05, 'epoch': 0.75} 6%|▌ | 571/10000 [2:13:14<36:20:37, 13.88s/it] 6%|▌ | 572/10000 [2:13:28<36:30:57, 13.94s/it] {'loss': 0.7408, 'learning_rate': 4.7155000000000005e-05, 'epoch': 
0.75} 6%|▌ | 572/10000 [2:13:28<36:30:57, 13.94s/it] 6%|▌ | 573/10000 [2:13:42<36:28:40, 13.93s/it] {'loss': 0.9749, 'learning_rate': 4.715e-05, 'epoch': 0.75} 6%|▌ | 573/10000 [2:13:42<36:28:40, 13.93s/it] 6%|▌ | 574/10000 [2:13:55<36:22:52, 13.89s/it] {'loss': 0.9011, 'learning_rate': 4.7145000000000003e-05, 'epoch': 0.75} 6%|▌ | 574/10000 [2:13:55<36:22:52, 13.89s/it] 6%|▌ | 575/10000 [2:14:09<36:21:14, 13.89s/it] {'loss': 0.6923, 'learning_rate': 4.714e-05, 'epoch': 0.75} 6%|▌ | 575/10000 [2:14:09<36:21:14, 13.89s/it] 6%|▌ | 576/10000 [2:14:23<36:27:53, 13.93s/it] {'loss': 0.8668, 'learning_rate': 4.7135e-05, 'epoch': 0.75} 6%|▌ | 576/10000 [2:14:23<36:27:53, 13.93s/it] 6%|▌ | 577/10000 [2:14:37<36:22:49, 13.90s/it] {'loss': 0.9996, 'learning_rate': 4.7130000000000004e-05, 'epoch': 0.76} 6%|▌ | 577/10000 [2:14:37<36:22:49, 13.90s/it] 6%|▌ | 578/10000 [2:14:51<36:22:09, 13.90s/it] {'loss': 0.8956, 'learning_rate': 4.7125e-05, 'epoch': 0.76} 6%|▌ | 578/10000 [2:14:51<36:22:09, 13.90s/it] 6%|▌ | 579/10000 [2:15:05<36:16:22, 13.86s/it] {'loss': 0.8906, 'learning_rate': 4.712e-05, 'epoch': 0.76} 6%|▌ | 579/10000 [2:15:05<36:16:22, 13.86s/it] 6%|▌ | 580/10000 [2:15:19<36:27:21, 13.93s/it] {'loss': 0.951, 'learning_rate': 4.7115000000000005e-05, 'epoch': 0.76} 6%|▌ | 580/10000 [2:15:19<36:27:21, 13.93s/it] 6%|▌ | 581/10000 [2:15:33<36:30:03, 13.95s/it] {'loss': 0.9211, 'learning_rate': 4.711e-05, 'epoch': 0.76} 6%|▌ | 581/10000 [2:15:33<36:30:03, 13.95s/it] 6%|▌ | 582/10000 [2:15:47<36:38:16, 14.00s/it] {'loss': 0.8627, 'learning_rate': 4.7105000000000004e-05, 'epoch': 0.76} 6%|▌ | 582/10000 [2:15:47<36:38:16, 14.00s/it] 6%|▌ | 583/10000 [2:16:01<36:32:11, 13.97s/it] {'loss': 1.2368, 'learning_rate': 4.71e-05, 'epoch': 0.76} 6%|▌ | 583/10000 [2:16:01<36:32:11, 13.97s/it] 6%|▌ | 584/10000 [2:16:15<36:29:24, 13.95s/it] {'loss': 0.988, 'learning_rate': 4.7095e-05, 'epoch': 0.76} 6%|▌ | 584/10000 [2:16:15<36:29:24, 13.95s/it] 6%|▌ | 585/10000 [2:16:29<36:25:01, 13.92s/it] 
{'loss': 0.8478, 'learning_rate': 4.709e-05, 'epoch': 0.77} 6%|▌ | 585/10000 [2:16:29<36:25:01, 13.92s/it] 6%|▌ | 586/10000 [2:16:43<36:20:58, 13.90s/it] {'loss': 0.8419, 'learning_rate': 4.7085e-05, 'epoch': 0.77} 6%|▌ | 586/10000 [2:16:43<36:20:58, 13.90s/it] 6%|▌ | 587/10000 [2:16:56<36:17:36, 13.88s/it] {'loss': 0.9499, 'learning_rate': 4.708e-05, 'epoch': 0.77} 6%|▌ | 587/10000 [2:16:56<36:17:36, 13.88s/it] 6%|▌ | 588/10000 [2:17:11<36:31:28, 13.97s/it] {'loss': 1.0634, 'learning_rate': 4.7075e-05, 'epoch': 0.77} 6%|▌ | 588/10000 [2:17:11<36:31:28, 13.97s/it] 6%|▌ | 589/10000 [2:17:24<36:28:13, 13.95s/it] {'loss': 0.7289, 'learning_rate': 4.707e-05, 'epoch': 0.77} 6%|▌ | 589/10000 [2:17:24<36:28:13, 13.95s/it] 6%|▌ | 590/10000 [2:17:38<36:27:39, 13.95s/it] {'loss': 1.0728, 'learning_rate': 4.7065000000000004e-05, 'epoch': 0.77} 6%|▌ | 590/10000 [2:17:38<36:27:39, 13.95s/it] 6%|▌ | 591/10000 [2:17:52<36:25:18, 13.94s/it] {'loss': 0.9505, 'learning_rate': 4.706000000000001e-05, 'epoch': 0.77} 6%|▌ | 591/10000 [2:17:52<36:25:18, 13.94s/it] 6%|▌ | 592/10000 [2:18:07<36:41:18, 14.04s/it] {'loss': 1.1179, 'learning_rate': 4.7055e-05, 'epoch': 0.77} 6%|▌ | 592/10000 [2:18:07<36:41:18, 14.04s/it] 6%|▌ | 593/10000 [2:18:20<36:26:48, 13.95s/it] {'loss': 1.1087, 'learning_rate': 4.705e-05, 'epoch': 0.78} 6%|▌ | 593/10000 [2:18:20<36:26:48, 13.95s/it] 6%|▌ | 594/10000 [2:18:34<36:28:38, 13.96s/it] {'loss': 0.9962, 'learning_rate': 4.7045e-05, 'epoch': 0.78} 6%|▌ | 594/10000 [2:18:34<36:28:38, 13.96s/it] 6%|▌ | 595/10000 [2:18:48<36:24:55, 13.94s/it] {'loss': 0.8274, 'learning_rate': 4.7040000000000004e-05, 'epoch': 0.78} 6%|▌ | 595/10000 [2:18:48<36:24:55, 13.94s/it] 6%|▌ | 596/10000 [2:19:02<36:26:21, 13.95s/it] {'loss': 1.1123, 'learning_rate': 4.7035e-05, 'epoch': 0.78} 6%|▌ | 596/10000 [2:19:02<36:26:21, 13.95s/it] 6%|▌ | 597/10000 [2:19:16<36:26:55, 13.95s/it] {'loss': 0.9789, 'learning_rate': 4.703e-05, 'epoch': 0.78} 6%|▌ | 597/10000 [2:19:16<36:26:55, 13.95s/it] 
6%|▌ | 598/10000 [2:19:30<36:27:08, 13.96s/it] {'loss': 0.8778, 'learning_rate': 4.7025000000000005e-05, 'epoch': 0.78} 6%|▌ | 598/10000 [2:19:30<36:27:08, 13.96s/it] 6%|▌ | 599/10000 [2:19:44<36:24:08, 13.94s/it] {'loss': 1.1416, 'learning_rate': 4.702e-05, 'epoch': 0.78} 6%|▌ | 599/10000 [2:19:44<36:24:08, 13.94s/it] 6%|▌ | 600/10000 [2:19:58<36:25:18, 13.95s/it] {'loss': 0.9456, 'learning_rate': 4.7015e-05, 'epoch': 0.79} 6%|▌ | 600/10000 [2:19:58<36:25:18, 13.95s/it] 6%|▌ | 601/10000 [2:20:12<36:20:30, 13.92s/it] {'loss': 0.9099, 'learning_rate': 4.7010000000000006e-05, 'epoch': 0.79} 6%|▌ | 601/10000 [2:20:12<36:20:30, 13.92s/it] 6%|▌ | 602/10000 [2:20:26<36:23:07, 13.94s/it] {'loss': 0.9017, 'learning_rate': 4.7005e-05, 'epoch': 0.79} 6%|▌ | 602/10000 [2:20:26<36:23:07, 13.94s/it] 6%|▌ | 603/10000 [2:20:40<36:20:11, 13.92s/it] {'loss': 0.9054, 'learning_rate': 4.7e-05, 'epoch': 0.79} 6%|▌ | 603/10000 [2:20:40<36:20:11, 13.92s/it] 6%|▌ | 604/10000 [2:20:53<36:14:33, 13.89s/it] {'loss': 0.9617, 'learning_rate': 4.6995e-05, 'epoch': 0.79} 6%|▌ | 604/10000 [2:20:54<36:14:33, 13.89s/it] 6%|▌ | 605/10000 [2:21:07<36:15:49, 13.90s/it] {'loss': 1.0609, 'learning_rate': 4.699e-05, 'epoch': 0.79} 6%|▌ | 605/10000 [2:21:07<36:15:49, 13.90s/it] 6%|▌ | 606/10000 [2:21:21<36:22:22, 13.94s/it] {'loss': 1.064, 'learning_rate': 4.6985e-05, 'epoch': 0.79} 6%|▌ | 606/10000 [2:21:21<36:22:22, 13.94s/it] 6%|▌ | 607/10000 [2:21:35<36:25:16, 13.96s/it] {'loss': 1.0648, 'learning_rate': 4.698e-05, 'epoch': 0.79} 6%|▌ | 607/10000 [2:21:35<36:25:16, 13.96s/it] 6%|▌ | 608/10000 [2:21:49<36:28:46, 13.98s/it] {'loss': 0.9504, 'learning_rate': 4.6975000000000003e-05, 'epoch': 0.8} 6%|▌ | 608/10000 [2:21:50<36:28:46, 13.98s/it] 6%|▌ | 609/10000 [2:22:03<36:26:54, 13.97s/it] {'loss': 0.9659, 'learning_rate': 4.6970000000000006e-05, 'epoch': 0.8} 6%|▌ | 609/10000 [2:22:03<36:26:54, 13.97s/it] 6%|▌ | 610/10000 [2:22:17<36:21:32, 13.94s/it] {'loss': 1.0206, 'learning_rate': 4.6965e-05, 
'epoch': 0.8} 6%|▌ | 610/10000 [2:22:17<36:21:32, 13.94s/it] 6%|▌ | 611/10000 [2:22:31<36:25:37, 13.97s/it] {'loss': 0.8751, 'learning_rate': 4.6960000000000004e-05, 'epoch': 0.8} 6%|▌ | 611/10000 [2:22:31<36:25:37, 13.97s/it] 6%|▌ | 612/10000 [2:22:45<36:22:53, 13.95s/it] {'loss': 0.9523, 'learning_rate': 4.695500000000001e-05, 'epoch': 0.8} 6%|▌ | 612/10000 [2:22:45<36:22:53, 13.95s/it] 6%|▌ | 613/10000 [2:22:59<36:21:21, 13.94s/it] {'loss': 0.7328, 'learning_rate': 4.695e-05, 'epoch': 0.8} 6%|▌ | 613/10000 [2:22:59<36:21:21, 13.94s/it] 6%|▌ | 614/10000 [2:23:13<36:23:25, 13.96s/it] {'loss': 1.194, 'learning_rate': 4.6945e-05, 'epoch': 0.8} 6%|▌ | 614/10000 [2:23:13<36:23:25, 13.96s/it] 6%|▌ | 615/10000 [2:23:27<36:21:14, 13.95s/it] {'loss': 0.8258, 'learning_rate': 4.694e-05, 'epoch': 0.8} 6%|▌ | 615/10000 [2:23:27<36:21:14, 13.95s/it] 6%|▌ | 616/10000 [2:23:41<36:23:46, 13.96s/it] {'loss': 0.862, 'learning_rate': 4.6935000000000004e-05, 'epoch': 0.81} 6%|▌ | 616/10000 [2:23:41<36:23:46, 13.96s/it] 6%|▌ | 617/10000 [2:23:55<36:22:01, 13.95s/it] {'loss': 1.1786, 'learning_rate': 4.693e-05, 'epoch': 0.81} 6%|▌ | 617/10000 [2:23:55<36:22:01, 13.95s/it] 6%|▌ | 618/10000 [2:24:09<36:25:21, 13.98s/it] {'loss': 1.0133, 'learning_rate': 4.6925e-05, 'epoch': 0.81} 6%|▌ | 618/10000 [2:24:09<36:25:21, 13.98s/it] 6%|▌ | 619/10000 [2:24:23<36:20:28, 13.95s/it] {'loss': 1.0225, 'learning_rate': 4.6920000000000005e-05, 'epoch': 0.81} 6%|▌ | 619/10000 [2:24:23<36:20:28, 13.95s/it] 6%|▌ | 620/10000 [2:24:37<36:17:54, 13.93s/it] {'loss': 0.9821, 'learning_rate': 4.6915e-05, 'epoch': 0.81} 6%|▌ | 620/10000 [2:24:37<36:17:54, 13.93s/it] 6%|▌ | 621/10000 [2:24:51<36:11:55, 13.89s/it] {'loss': 0.9788, 'learning_rate': 4.691e-05, 'epoch': 0.81} 6%|▌ | 621/10000 [2:24:51<36:11:55, 13.89s/it] 6%|▌ | 622/10000 [2:25:04<36:10:58, 13.89s/it] {'loss': 0.821, 'learning_rate': 4.6905000000000006e-05, 'epoch': 0.81} 6%|▌ | 622/10000 [2:25:05<36:10:58, 13.89s/it] 6%|▌ | 623/10000 
[2:25:18<36:12:46, 13.90s/it] {'loss': 0.988, 'learning_rate': 4.69e-05, 'epoch': 0.82} 6%|▌ | 623/10000 [2:25:18<36:12:46, 13.90s/it] 6%|▌ | 624/10000 [2:25:32<36:12:55, 13.91s/it] {'loss': 0.9283, 'learning_rate': 4.6895e-05, 'epoch': 0.82} 6%|▌ | 624/10000 [2:25:32<36:12:55, 13.91s/it] 6%|▋ | 625/10000 [2:25:46<36:13:20, 13.91s/it] {'loss': 1.0326, 'learning_rate': 4.689e-05, 'epoch': 0.82} 6%|▋ | 625/10000 [2:25:46<36:13:20, 13.91s/it] 6%|▋ | 626/10000 [2:26:00<36:14:52, 13.92s/it] {'loss': 0.7573, 'learning_rate': 4.6885e-05, 'epoch': 0.82} 6%|▋ | 626/10000 [2:26:00<36:14:52, 13.92s/it] 6%|▋ | 627/10000 [2:26:14<36:10:32, 13.89s/it] {'loss': 0.9193, 'learning_rate': 4.688e-05, 'epoch': 0.82} 6%|▋ | 627/10000 [2:26:14<36:10:32, 13.89s/it] 6%|▋ | 628/10000 [2:26:28<36:14:03, 13.92s/it] {'loss': 0.9022, 'learning_rate': 4.6875e-05, 'epoch': 0.82} 6%|▋ | 628/10000 [2:26:28<36:14:03, 13.92s/it] 6%|▋ | 629/10000 [2:26:42<36:17:44, 13.94s/it] {'loss': 1.0624, 'learning_rate': 4.6870000000000004e-05, 'epoch': 0.82} 6%|▋ | 629/10000 [2:26:42<36:17:44, 13.94s/it] 6%|▋ | 630/10000 [2:26:56<36:19:11, 13.95s/it] {'loss': 0.8442, 'learning_rate': 4.6865000000000006e-05, 'epoch': 0.82} 6%|▋ | 630/10000 [2:26:56<36:19:11, 13.95s/it] 6%|▋ | 631/10000 [2:27:10<36:23:02, 13.98s/it] {'loss': 0.9662, 'learning_rate': 4.686e-05, 'epoch': 0.83} 6%|▋ | 631/10000 [2:27:10<36:23:02, 13.98s/it] 6%|▋ | 632/10000 [2:27:24<36:26:35, 14.00s/it] {'loss': 1.0819, 'learning_rate': 4.6855000000000005e-05, 'epoch': 0.83} 6%|▋ | 632/10000 [2:27:24<36:26:35, 14.00s/it] 6%|▋ | 633/10000 [2:27:38<36:20:36, 13.97s/it] {'loss': 0.949, 'learning_rate': 4.685000000000001e-05, 'epoch': 0.83} 6%|▋ | 633/10000 [2:27:38<36:20:36, 13.97s/it] 6%|▋ | 634/10000 [2:27:52<36:33:07, 14.05s/it] {'loss': 0.7761, 'learning_rate': 4.6845e-05, 'epoch': 0.83} 6%|▋ | 634/10000 [2:27:52<36:33:07, 14.05s/it] 6%|▋ | 635/10000 [2:28:06<36:22:28, 13.98s/it] {'loss': 0.9946, 'learning_rate': 4.684e-05, 'epoch': 0.83} 6%|▋ | 
635/10000 [2:28:06<36:22:28, 13.98s/it] 6%|▋ | 636/10000 [2:28:20<36:20:18, 13.97s/it] {'loss': 1.016, 'learning_rate': 4.6835e-05, 'epoch': 0.83} 6%|▋ | 636/10000 [2:28:20<36:20:18, 13.97s/it] 6%|▋ | 637/10000 [2:28:34<36:26:35, 14.01s/it] {'loss': 0.9721, 'learning_rate': 4.6830000000000004e-05, 'epoch': 0.83} 6%|▋ | 637/10000 [2:28:34<36:26:35, 14.01s/it] 6%|▋ | 638/10000 [2:28:48<36:18:45, 13.96s/it] {'loss': 0.8492, 'learning_rate': 4.6825e-05, 'epoch': 0.84} 6%|▋ | 638/10000 [2:28:48<36:18:45, 13.96s/it] 6%|▋ | 639/10000 [2:29:02<36:15:13, 13.94s/it] {'loss': 0.9068, 'learning_rate': 4.682e-05, 'epoch': 0.84} 6%|▋ | 639/10000 [2:29:02<36:15:13, 13.94s/it] 6%|▋ | 640/10000 [2:29:16<36:14:30, 13.94s/it] {'loss': 0.9359, 'learning_rate': 4.6815000000000005e-05, 'epoch': 0.84} 6%|▋ | 640/10000 [2:29:16<36:14:30, 13.94s/it] 6%|▋ | 641/10000 [2:29:30<36:07:31, 13.90s/it] {'loss': 0.784, 'learning_rate': 4.681e-05, 'epoch': 0.84} 6%|▋ | 641/10000 [2:29:30<36:07:31, 13.90s/it] 6%|▋ | 642/10000 [2:29:43<36:03:56, 13.87s/it] {'loss': 0.9368, 'learning_rate': 4.6805e-05, 'epoch': 0.84} 6%|▋ | 642/10000 [2:29:43<36:03:56, 13.87s/it] 6%|▋ | 643/10000 [2:29:57<36:02:28, 13.87s/it] {'loss': 0.8724, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.84} 6%|▋ | 643/10000 [2:29:57<36:02:28, 13.87s/it] 6%|▋ | 644/10000 [2:30:11<36:05:07, 13.88s/it] {'loss': 0.8871, 'learning_rate': 4.6795e-05, 'epoch': 0.84} 6%|▋ | 644/10000 [2:30:11<36:05:07, 13.88s/it] 6%|▋ | 645/10000 [2:30:25<36:09:21, 13.91s/it] {'loss': 0.8973, 'learning_rate': 4.679e-05, 'epoch': 0.84} 6%|▋ | 645/10000 [2:30:25<36:09:21, 13.91s/it] 6%|▋ | 646/10000 [2:30:39<36:08:58, 13.91s/it] {'loss': 0.9305, 'learning_rate': 4.6785e-05, 'epoch': 0.85} 6%|▋ | 646/10000 [2:30:39<36:08:58, 13.91s/it] 6%|▋ | 647/10000 [2:30:53<36:09:42, 13.92s/it] {'loss': 1.0288, 'learning_rate': 4.678e-05, 'epoch': 0.85} 6%|▋ | 647/10000 [2:30:53<36:09:42, 13.92s/it] 6%|▋ | 648/10000 [2:31:07<36:17:23, 13.97s/it] {'loss': 0.8545, 
'learning_rate': 4.6775000000000005e-05, 'epoch': 0.85} 6%|▋ | 648/10000 [2:31:07<36:17:23, 13.97s/it] 6%|▋ | 649/10000 [2:31:21<36:13:54, 13.95s/it] {'loss': 0.8731, 'learning_rate': 4.677e-05, 'epoch': 0.85} 6%|▋ | 649/10000 [2:31:21<36:13:54, 13.95s/it] 6%|▋ | 650/10000 [2:31:35<36:10:38, 13.93s/it] {'loss': 0.7454, 'learning_rate': 4.6765000000000004e-05, 'epoch': 0.85} 6%|▋ | 650/10000 [2:31:35<36:10:38, 13.93s/it] 7%|▋ | 651/10000 [2:31:49<36:07:53, 13.91s/it] {'loss': 0.9136, 'learning_rate': 4.6760000000000006e-05, 'epoch': 0.85} 7%|▋ | 651/10000 [2:31:49<36:07:53, 13.91s/it] 7%|▋ | 652/10000 [2:32:03<36:12:15, 13.94s/it] {'loss': 0.9764, 'learning_rate': 4.6755e-05, 'epoch': 0.85} 7%|▋ | 652/10000 [2:32:03<36:12:15, 13.94s/it] 7%|▋ | 653/10000 [2:32:17<36:16:39, 13.97s/it] {'loss': 0.7126, 'learning_rate': 4.6750000000000005e-05, 'epoch': 0.85} 7%|▋ | 653/10000 [2:32:17<36:16:39, 13.97s/it] 7%|▋ | 654/10000 [2:32:31<36:09:50, 13.93s/it] {'loss': 0.9466, 'learning_rate': 4.6745e-05, 'epoch': 0.86} 7%|▋ | 654/10000 [2:32:31<36:09:50, 13.93s/it] 7%|▋ | 655/10000 [2:32:45<36:12:29, 13.95s/it] {'loss': 0.9468, 'learning_rate': 4.674e-05, 'epoch': 0.86} 7%|▋ | 655/10000 [2:32:45<36:12:29, 13.95s/it] 7%|▋ | 656/10000 [2:32:59<36:11:18, 13.94s/it] {'loss': 0.8549, 'learning_rate': 4.6735e-05, 'epoch': 0.86} 7%|▋ | 656/10000 [2:32:59<36:11:18, 13.94s/it] 7%|▋ | 657/10000 [2:33:13<36:16:33, 13.98s/it] {'loss': 0.8315, 'learning_rate': 4.673e-05, 'epoch': 0.86} 7%|▋ | 657/10000 [2:33:13<36:16:33, 13.98s/it] 7%|▋ | 658/10000 [2:33:26<36:13:31, 13.96s/it] {'loss': 0.9993, 'learning_rate': 4.6725000000000004e-05, 'epoch': 0.86} 7%|▋ | 658/10000 [2:33:27<36:13:31, 13.96s/it] 7%|▋ | 659/10000 [2:33:40<36:13:48, 13.96s/it] {'loss': 1.0018, 'learning_rate': 4.672e-05, 'epoch': 0.86} 7%|▋ | 659/10000 [2:33:41<36:13:48, 13.96s/it] 7%|▋ | 660/10000 [2:33:54<36:14:43, 13.97s/it] {'loss': 0.9999, 'learning_rate': 4.6715e-05, 'epoch': 0.86} 7%|▋ | 660/10000 [2:33:54<36:14:43, 
13.97s/it] 7%|▋ | 661/10000 [2:34:08<36:02:37, 13.89s/it] {'loss': 0.7642, 'learning_rate': 4.6710000000000005e-05, 'epoch': 0.87} 7%|▋ | 661/10000 [2:34:08<36:02:37, 13.89s/it] 7%|▋ | 662/10000 [2:34:22<36:00:13, 13.88s/it] {'loss': 1.0583, 'learning_rate': 4.670500000000001e-05, 'epoch': 0.87} 7%|▋ | 662/10000 [2:34:22<36:00:13, 13.88s/it] 7%|▋ | 663/10000 [2:34:36<36:11:22, 13.95s/it] {'loss': 1.4718, 'learning_rate': 4.6700000000000003e-05, 'epoch': 0.87} 7%|▋ | 663/10000 [2:34:36<36:11:22, 13.95s/it] 7%|▋ | 664/10000 [2:34:50<36:14:40, 13.98s/it] {'loss': 1.0648, 'learning_rate': 4.6695e-05, 'epoch': 0.87} 7%|▋ | 664/10000 [2:34:50<36:14:40, 13.98s/it] 7%|▋ | 665/10000 [2:35:04<36:16:15, 13.99s/it] {'loss': 0.8954, 'learning_rate': 4.669e-05, 'epoch': 0.87} 7%|▋ | 665/10000 [2:35:04<36:16:15, 13.99s/it] 7%|▋ | 666/10000 [2:35:18<36:12:18, 13.96s/it] {'loss': 1.1181, 'learning_rate': 4.6685e-05, 'epoch': 0.87} 7%|▋ | 666/10000 [2:35:18<36:12:18, 13.96s/it] 7%|▋ | 667/10000 [2:35:32<36:09:17, 13.95s/it] {'loss': 0.9017, 'learning_rate': 4.668e-05, 'epoch': 0.87} 7%|▋ | 667/10000 [2:35:32<36:09:17, 13.95s/it] 7%|▋ | 668/10000 [2:35:46<36:09:49, 13.95s/it] {'loss': 0.9929, 'learning_rate': 4.6675e-05, 'epoch': 0.87} 7%|▋ | 668/10000 [2:35:46<36:09:49, 13.95s/it] 7%|▋ | 669/10000 [2:36:00<36:11:02, 13.96s/it] {'loss': 0.9011, 'learning_rate': 4.6670000000000005e-05, 'epoch': 0.88} 7%|▋ | 669/10000 [2:36:00<36:11:02, 13.96s/it] 7%|▋ | 670/10000 [2:36:14<36:07:12, 13.94s/it] {'loss': 0.8968, 'learning_rate': 4.6665e-05, 'epoch': 0.88} 7%|▋ | 670/10000 [2:36:14<36:07:12, 13.94s/it] 7%|▋ | 671/10000 [2:36:28<36:11:50, 13.97s/it] {'loss': 0.9369, 'learning_rate': 4.6660000000000004e-05, 'epoch': 0.88} 7%|▋ | 671/10000 [2:36:28<36:11:50, 13.97s/it] 7%|▋ | 672/10000 [2:36:42<36:14:39, 13.99s/it] {'loss': 0.7612, 'learning_rate': 4.6655000000000006e-05, 'epoch': 0.88} 7%|▋ | 672/10000 [2:36:42<36:14:39, 13.99s/it] 7%|▋ | 673/10000 [2:36:56<36:08:39, 13.95s/it] {'loss': 
0.9257, 'learning_rate': 4.665e-05, 'epoch': 0.88} 7%|▋ | 673/10000 [2:36:56<36:08:39, 13.95s/it] 7%|▋ | 674/10000 [2:37:10<36:07:39, 13.95s/it] {'loss': 1.1022, 'learning_rate': 4.6645e-05, 'epoch': 0.88} 7%|▋ | 674/10000 [2:37:10<36:07:39, 13.95s/it] 7%|▋ | 675/10000 [2:37:24<36:06:15, 13.94s/it] {'loss': 1.0178, 'learning_rate': 4.664e-05, 'epoch': 0.88} 7%|▋ | 675/10000 [2:37:24<36:06:15, 13.94s/it] 7%|▋ | 676/10000 [2:37:38<36:06:54, 13.94s/it] {'loss': 0.9943, 'learning_rate': 4.6635e-05, 'epoch': 0.88} 7%|▋ | 676/10000 [2:37:38<36:06:54, 13.94s/it] 7%|▋ | 677/10000 [2:37:51<36:01:33, 13.91s/it] {'loss': 0.8352, 'learning_rate': 4.663e-05, 'epoch': 0.89} 7%|▋ | 677/10000 [2:37:51<36:01:33, 13.91s/it] 7%|▋ | 678/10000 [2:38:05<36:03:31, 13.93s/it] {'loss': 0.9079, 'learning_rate': 4.6625e-05, 'epoch': 0.89} 7%|▋ | 678/10000 [2:38:05<36:03:31, 13.93s/it] 7%|▋ | 679/10000 [2:38:19<36:04:52, 13.94s/it] {'loss': 0.9146, 'learning_rate': 4.6620000000000004e-05, 'epoch': 0.89} 7%|▋ | 679/10000 [2:38:19<36:04:52, 13.94s/it] 7%|▋ | 680/10000 [2:38:33<36:12:14, 13.98s/it] {'loss': 0.8866, 'learning_rate': 4.6615e-05, 'epoch': 0.89} 7%|▋ | 680/10000 [2:38:33<36:12:14, 13.98s/it] 7%|▋ | 681/10000 [2:38:47<36:13:23, 13.99s/it] {'loss': 0.9406, 'learning_rate': 4.661e-05, 'epoch': 0.89} 7%|▋ | 681/10000 [2:38:47<36:13:23, 13.99s/it] 7%|▋ | 682/10000 [2:39:02<36:24:49, 14.07s/it] {'loss': 1.2457, 'learning_rate': 4.6605000000000005e-05, 'epoch': 0.89} 7%|▋ | 682/10000 [2:39:02<36:24:49, 14.07s/it] 7%|▋ | 683/10000 [2:39:16<36:21:58, 14.05s/it] {'loss': 0.9355, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.89} 7%|▋ | 683/10000 [2:39:16<36:21:58, 14.05s/it] 7%|▋ | 684/10000 [2:39:30<36:16:47, 14.02s/it] {'loss': 1.1029, 'learning_rate': 4.6595e-05, 'epoch': 0.9} 7%|▋ | 684/10000 [2:39:30<36:16:47, 14.02s/it] 7%|▋ | 685/10000 [2:39:43<36:08:08, 13.97s/it] {'loss': 1.0058, 'learning_rate': 4.659e-05, 'epoch': 0.9} 7%|▋ | 685/10000 [2:39:44<36:08:08, 13.97s/it] 7%|▋ | 
686/10000 [2:39:57<36:03:32, 13.94s/it] {'loss': 0.889, 'learning_rate': 4.6585e-05, 'epoch': 0.9} 7%|▋ | 686/10000 [2:39:57<36:03:32, 13.94s/it] 7%|▋ | 687/10000 [2:40:11<36:01:13, 13.92s/it] {'loss': 0.881, 'learning_rate': 4.6580000000000005e-05, 'epoch': 0.9} 7%|▋ | 687/10000 [2:40:11<36:01:13, 13.92s/it] 7%|▋ | 688/10000 [2:40:25<36:00:35, 13.92s/it] {'loss': 1.0272, 'learning_rate': 4.6575e-05, 'epoch': 0.9} 7%|▋ | 688/10000 [2:40:25<36:00:35, 13.92s/it] 7%|▋ | 689/10000 [2:40:39<35:56:56, 13.90s/it] {'loss': 1.0553, 'learning_rate': 4.657e-05, 'epoch': 0.9} 7%|▋ | 689/10000 [2:40:39<35:56:56, 13.90s/it] 7%|▋ | 690/10000 [2:40:53<35:52:43, 13.87s/it] {'loss': 0.8764, 'learning_rate': 4.6565000000000006e-05, 'epoch': 0.9} 7%|▋ | 690/10000 [2:40:53<35:52:43, 13.87s/it] 7%|▋ | 691/10000 [2:41:07<35:56:37, 13.90s/it] {'loss': 1.2845, 'learning_rate': 4.656e-05, 'epoch': 0.9} 7%|▋ | 691/10000 [2:41:07<35:56:37, 13.90s/it] 7%|▋ | 692/10000 [2:41:21<35:57:07, 13.91s/it] {'loss': 1.0675, 'learning_rate': 4.6555000000000004e-05, 'epoch': 0.91} 7%|▋ | 692/10000 [2:41:21<35:57:07, 13.91s/it] 7%|▋ | 693/10000 [2:41:35<35:58:01, 13.91s/it] {'loss': 0.8433, 'learning_rate': 4.655000000000001e-05, 'epoch': 0.91} 7%|▋ | 693/10000 [2:41:35<35:58:01, 13.91s/it] 7%|▋ | 694/10000 [2:41:48<35:53:05, 13.88s/it] {'loss': 0.8774, 'learning_rate': 4.6545e-05, 'epoch': 0.91} 7%|▋ | 694/10000 [2:41:49<35:53:05, 13.88s/it] 7%|▋ | 695/10000 [2:42:02<35:55:32, 13.90s/it] {'loss': 1.0627, 'learning_rate': 4.654e-05, 'epoch': 0.91} 7%|▋ | 695/10000 [2:42:02<35:55:32, 13.90s/it] 7%|▋ | 696/10000 [2:42:16<35:48:22, 13.85s/it] {'loss': 0.8818, 'learning_rate': 4.6535e-05, 'epoch': 0.91} 7%|▋ | 696/10000 [2:42:16<35:48:22, 13.85s/it] 7%|▋ | 697/10000 [2:42:30<35:44:34, 13.83s/it] {'loss': 0.877, 'learning_rate': 4.6530000000000003e-05, 'epoch': 0.91} 7%|▋ | 697/10000 [2:42:30<35:44:34, 13.83s/it] 7%|▋ | 698/10000 [2:42:44<35:50:50, 13.87s/it] {'loss': 0.956, 'learning_rate': 4.6525e-05, 
'epoch': 0.91} 7%|▋ | 698/10000 [2:42:44<35:50:50, 13.87s/it] 7%|▋ | 699/10000 [2:42:58<35:55:58, 13.91s/it] {'loss': 1.1255, 'learning_rate': 4.652e-05, 'epoch': 0.91} 7%|▋ | 699/10000 [2:42:58<35:55:58, 13.91s/it] 7%|▋ | 700/10000 [2:43:12<36:00:35, 13.94s/it] {'loss': 0.8945, 'learning_rate': 4.6515000000000004e-05, 'epoch': 0.92} 7%|▋ | 700/10000 [2:43:12<36:00:35, 13.94s/it] 7%|▋ | 701/10000 [2:43:26<35:55:21, 13.91s/it] {'loss': 1.1845, 'learning_rate': 4.651e-05, 'epoch': 0.92} 7%|▋ | 701/10000 [2:43:26<35:55:21, 13.91s/it] 7%|▋ | 702/10000 [2:43:40<35:59:19, 13.93s/it] {'loss': 0.972, 'learning_rate': 4.6505e-05, 'epoch': 0.92} 7%|▋ | 702/10000 [2:43:40<35:59:19, 13.93s/it] 7%|▋ | 703/10000 [2:43:54<36:00:53, 13.95s/it] {'loss': 1.1082, 'learning_rate': 4.6500000000000005e-05, 'epoch': 0.92} 7%|▋ | 703/10000 [2:43:54<36:00:53, 13.95s/it] 7%|▋ | 704/10000 [2:44:08<35:58:11, 13.93s/it] {'loss': 0.8104, 'learning_rate': 4.6495e-05, 'epoch': 0.92} 7%|▋ | 704/10000 [2:44:08<35:58:11, 13.93s/it] 7%|▋ | 705/10000 [2:44:21<35:53:28, 13.90s/it] {'loss': 0.7955, 'learning_rate': 4.649e-05, 'epoch': 0.92} 7%|▋ | 705/10000 [2:44:21<35:53:28, 13.90s/it] 7%|▋ | 706/10000 [2:44:35<35:50:24, 13.88s/it] {'loss': 0.9673, 'learning_rate': 4.6485e-05, 'epoch': 0.92} 7%|▋ | 706/10000 [2:44:35<35:50:24, 13.88s/it] 7%|▋ | 707/10000 [2:44:49<35:48:48, 13.87s/it] {'loss': 1.0446, 'learning_rate': 4.648e-05, 'epoch': 0.93} 7%|▋ | 707/10000 [2:44:49<35:48:48, 13.87s/it] 7%|▋ | 708/10000 [2:45:03<35:56:37, 13.93s/it] {'loss': 0.9902, 'learning_rate': 4.6475000000000005e-05, 'epoch': 0.93} 7%|▋ | 708/10000 [2:45:03<35:56:37, 13.93s/it] 7%|▋ | 709/10000 [2:45:17<36:02:45, 13.97s/it] {'loss': 0.8445, 'learning_rate': 4.647e-05, 'epoch': 0.93} 7%|▋ | 709/10000 [2:45:17<36:02:45, 13.97s/it] 7%|▋ | 710/10000 [2:45:31<35:55:16, 13.92s/it] {'loss': 0.8785, 'learning_rate': 4.6465e-05, 'epoch': 0.93} 7%|▋ | 710/10000 [2:45:31<35:55:16, 13.92s/it] 7%|▋ | 711/10000 [2:45:45<36:05:35, 13.99s/it] 
{'loss': 0.9222, 'learning_rate': 4.6460000000000006e-05, 'epoch': 0.93} 7%|▋ | 711/10000 [2:45:45<36:05:35, 13.99s/it] 7%|▋ | 712/10000 [2:45:59<35:59:23, 13.95s/it] {'loss': 0.9375, 'learning_rate': 4.6455e-05, 'epoch': 0.93} 7%|▋ | 712/10000 [2:45:59<35:59:23, 13.95s/it] 7%|▋ | 713/10000 [2:46:13<35:59:28, 13.95s/it] {'loss': 0.9171, 'learning_rate': 4.6450000000000004e-05, 'epoch': 0.93} 7%|▋ | 713/10000 [2:46:13<35:59:28, 13.95s/it] 7%|▋ | 714/10000 [2:46:27<35:59:36, 13.95s/it] {'loss': 0.8032, 'learning_rate': 4.6445e-05, 'epoch': 0.93} 7%|▋ | 714/10000 [2:46:27<35:59:36, 13.95s/it] 7%|▋ | 715/10000 [2:46:41<35:49:04, 13.89s/it] {'loss': 0.9399, 'learning_rate': 4.644e-05, 'epoch': 0.94} 7%|▋ | 715/10000 [2:46:41<35:49:04, 13.89s/it] 7%|▋ | 716/10000 [2:46:54<35:40:34, 13.83s/it] {'loss': 0.9097, 'learning_rate': 4.6435e-05, 'epoch': 0.94} 7%|▋ | 716/10000 [2:46:54<35:40:34, 13.83s/it] 7%|▋ | 717/10000 [2:47:08<35:45:03, 13.86s/it] {'loss': 0.8224, 'learning_rate': 4.643e-05, 'epoch': 0.94} 7%|▋ | 717/10000 [2:47:08<35:45:03, 13.86s/it] 7%|▋ | 718/10000 [2:47:22<35:47:22, 13.88s/it] {'loss': 0.9174, 'learning_rate': 4.6425000000000004e-05, 'epoch': 0.94} 7%|▋ | 718/10000 [2:47:22<35:47:22, 13.88s/it] 7%|▋ | 719/10000 [2:47:36<35:51:10, 13.91s/it] {'loss': 0.9025, 'learning_rate': 4.642e-05, 'epoch': 0.94} 7%|▋ | 719/10000 [2:47:36<35:51:10, 13.91s/it] 7%|▋ | 720/10000 [2:47:50<35:59:51, 13.96s/it] {'loss': 0.8342, 'learning_rate': 4.6415e-05, 'epoch': 0.94} 7%|▋ | 720/10000 [2:47:50<35:59:51, 13.96s/it] 7%|▋ | 721/10000 [2:48:04<36:04:04, 13.99s/it] {'loss': 1.1155, 'learning_rate': 4.6410000000000005e-05, 'epoch': 0.94} 7%|▋ | 721/10000 [2:48:04<36:04:04, 13.99s/it] 7%|▋ | 722/10000 [2:48:18<35:52:28, 13.92s/it] {'loss': 1.1207, 'learning_rate': 4.640500000000001e-05, 'epoch': 0.95} 7%|▋ | 722/10000 [2:48:18<35:52:28, 13.92s/it] 7%|▋ | 723/10000 [2:48:32<35:55:22, 13.94s/it] {'loss': 0.7767, 'learning_rate': 4.64e-05, 'epoch': 0.95} 7%|▋ | 723/10000 
[2:48:32<35:55:22, 13.94s/it] 7%|▋ | 724/10000 [2:48:46<36:00:44, 13.98s/it] {'loss': 1.1853, 'learning_rate': 4.6395e-05, 'epoch': 0.95} 7%|▋ | 724/10000 [2:48:46<36:00:44, 13.98s/it] 7%|▋ | 725/10000 [2:49:00<36:05:40, 14.01s/it] {'loss': 0.9474, 'learning_rate': 4.639e-05, 'epoch': 0.95} 7%|▋ | 725/10000 [2:49:00<36:05:40, 14.01s/it] 7%|▋ | 726/10000 [2:49:14<36:02:21, 13.99s/it] {'loss': 0.9361, 'learning_rate': 4.6385000000000004e-05, 'epoch': 0.95} 7%|▋ | 726/10000 [2:49:14<36:02:21, 13.99s/it] 7%|▋ | 727/10000 [2:49:28<35:59:22, 13.97s/it] {'loss': 1.2294, 'learning_rate': 4.638e-05, 'epoch': 0.95} 7%|▋ | 727/10000 [2:49:28<35:59:22, 13.97s/it] 7%|▋ | 728/10000 [2:49:42<35:57:57, 13.96s/it] {'loss': 0.7878, 'learning_rate': 4.6375e-05, 'epoch': 0.95} 7%|▋ | 728/10000 [2:49:42<35:57:57, 13.96s/it] 7%|▋ | 729/10000 [2:49:56<35:54:53, 13.95s/it] {'loss': 0.8989, 'learning_rate': 4.6370000000000005e-05, 'epoch': 0.95} 7%|▋ | 729/10000 [2:49:56<35:54:53, 13.95s/it] 7%|▋ | 730/10000 [2:50:10<35:59:35, 13.98s/it] {'loss': 0.9554, 'learning_rate': 4.6365e-05, 'epoch': 0.96} 7%|▋ | 730/10000 [2:50:10<35:59:35, 13.98s/it] 7%|▋ | 731/10000 [2:50:24<36:00:00, 13.98s/it] {'loss': 0.8646, 'learning_rate': 4.636e-05, 'epoch': 0.96} 7%|▋ | 731/10000 [2:50:24<36:00:00, 13.98s/it] 7%|▋ | 732/10000 [2:50:38<36:03:56, 14.01s/it] {'loss': 1.0831, 'learning_rate': 4.6355000000000006e-05, 'epoch': 0.96} 7%|▋ | 732/10000 [2:50:38<36:03:56, 14.01s/it] 7%|▋ | 733/10000 [2:50:52<36:04:00, 14.01s/it] {'loss': 0.9, 'learning_rate': 4.635e-05, 'epoch': 0.96} 7%|▋ | 733/10000 [2:50:52<36:04:00, 14.01s/it] 7%|▋ | 734/10000 [2:51:06<36:08:28, 14.04s/it] {'loss': 1.0377, 'learning_rate': 4.6345e-05, 'epoch': 0.96} 7%|▋ | 734/10000 [2:51:06<36:08:28, 14.04s/it] 7%|▋ | 735/10000 [2:51:20<36:08:10, 14.04s/it] {'loss': 0.9044, 'learning_rate': 4.634e-05, 'epoch': 0.96} 7%|▋ | 735/10000 [2:51:20<36:08:10, 14.04s/it] 7%|▋ | 736/10000 [2:51:34<36:09:36, 14.05s/it] {'loss': 1.0234, 'learning_rate': 
4.6335e-05, 'epoch': 0.96} 7%|▋ | 736/10000 [2:51:34<36:09:36, 14.05s/it] 7%|▋ | 737/10000 [2:51:48<36:08:35, 14.05s/it] {'loss': 0.915, 'learning_rate': 4.633e-05, 'epoch': 0.96} 7%|▋ | 737/10000 [2:51:48<36:08:35, 14.05s/it] 7%|▋ | 738/10000 [2:52:02<36:05:30, 14.03s/it] {'loss': 0.8516, 'learning_rate': 4.6325e-05, 'epoch': 0.97} 7%|▋ | 738/10000 [2:52:02<36:05:30, 14.03s/it] 7%|▋ | 739/10000 [2:52:16<36:01:30, 14.00s/it] {'loss': 0.9338, 'learning_rate': 4.6320000000000004e-05, 'epoch': 0.97} 7%|▋ | 739/10000 [2:52:16<36:01:30, 14.00s/it] 7%|▋ | 740/10000 [2:52:30<35:55:40, 13.97s/it] {'loss': 0.9885, 'learning_rate': 4.6315e-05, 'epoch': 0.97} 7%|▋ | 740/10000 [2:52:30<35:55:40, 13.97s/it] 7%|▋ | 741/10000 [2:52:44<35:52:09, 13.95s/it] {'loss': 1.0036, 'learning_rate': 4.631e-05, 'epoch': 0.97} 7%|▋ | 741/10000 [2:52:44<35:52:09, 13.95s/it] 7%|▋ | 742/10000 [2:52:58<35:58:06, 13.99s/it] {'loss': 0.8386, 'learning_rate': 4.6305000000000005e-05, 'epoch': 0.97} 7%|▋ | 742/10000 [2:52:58<35:58:06, 13.99s/it] 7%|▋ | 743/10000 [2:53:12<35:56:09, 13.98s/it] {'loss': 0.7953, 'learning_rate': 4.630000000000001e-05, 'epoch': 0.97} 7%|▋ | 743/10000 [2:53:12<35:56:09, 13.98s/it] 7%|▋ | 744/10000 [2:53:26<35:57:16, 13.98s/it] {'loss': 0.989, 'learning_rate': 4.6294999999999996e-05, 'epoch': 0.97} 7%|▋ | 744/10000 [2:53:26<35:57:16, 13.98s/it] 7%|▋ | 745/10000 [2:53:40<35:58:02, 13.99s/it] {'loss': 1.0504, 'learning_rate': 4.629e-05, 'epoch': 0.98} 7%|▋ | 745/10000 [2:53:40<35:58:02, 13.99s/it] 7%|▋ | 746/10000 [2:53:54<35:56:53, 13.98s/it] {'loss': 0.8721, 'learning_rate': 4.6285e-05, 'epoch': 0.98} 7%|▋ | 746/10000 [2:53:54<35:56:53, 13.98s/it] 7%|▋ | 747/10000 [2:54:08<35:57:04, 13.99s/it] {'loss': 0.8353, 'learning_rate': 4.6280000000000004e-05, 'epoch': 0.98} 7%|▋ | 747/10000 [2:54:08<35:57:04, 13.99s/it] 7%|▋ | 748/10000 [2:54:22<35:50:35, 13.95s/it] {'loss': 0.7908, 'learning_rate': 4.6275e-05, 'epoch': 0.98} 7%|▋ | 748/10000 [2:54:22<35:50:35, 13.95s/it] 7%|▋ | 
749/10000 [2:54:36<35:51:31, 13.95s/it] {'loss': 0.9932, 'learning_rate': 4.627e-05, 'epoch': 0.98} 7%|▋ | 749/10000 [2:54:36<35:51:31, 13.95s/it] 8%|▊ | 750/10000 [2:54:50<35:50:00, 13.95s/it] {'loss': 0.8553, 'learning_rate': 4.6265000000000005e-05, 'epoch': 0.98} 8%|▊ | 750/10000 [2:54:50<35:50:00, 13.95s/it] 8%|▊ | 751/10000 [2:55:04<35:51:01, 13.95s/it] {'loss': 0.7297, 'learning_rate': 4.626e-05, 'epoch': 0.98} 8%|▊ | 751/10000 [2:55:04<35:51:01, 13.95s/it] 8%|▊ | 752/10000 [2:55:18<35:47:04, 13.93s/it] {'loss': 0.8229, 'learning_rate': 4.6255000000000004e-05, 'epoch': 0.98} 8%|▊ | 752/10000 [2:55:18<35:47:04, 13.93s/it] 8%|▊ | 753/10000 [2:55:32<35:47:06, 13.93s/it] {'loss': 0.9847, 'learning_rate': 4.6250000000000006e-05, 'epoch': 0.99} 8%|▊ | 753/10000 [2:55:32<35:47:06, 13.93s/it] 8%|▊ | 754/10000 [2:55:46<35:44:47, 13.92s/it] {'loss': 1.0427, 'learning_rate': 4.6245e-05, 'epoch': 0.99} 8%|▊ | 754/10000 [2:55:46<35:44:47, 13.92s/it] 8%|▊ | 755/10000 [2:55:59<35:40:46, 13.89s/it] {'loss': 0.9766, 'learning_rate': 4.624e-05, 'epoch': 0.99} 8%|▊ | 755/10000 [2:55:59<35:40:46, 13.89s/it] 8%|▊ | 756/10000 [2:56:13<35:40:58, 13.90s/it] {'loss': 0.8629, 'learning_rate': 4.6235e-05, 'epoch': 0.99} 8%|▊ | 756/10000 [2:56:13<35:40:58, 13.90s/it] 8%|▊ | 757/10000 [2:56:27<35:37:59, 13.88s/it] {'loss': 1.141, 'learning_rate': 4.623e-05, 'epoch': 0.99} 8%|▊ | 757/10000 [2:56:27<35:37:59, 13.88s/it] 8%|▊ | 758/10000 [2:56:41<35:35:59, 13.87s/it] {'loss': 1.0004, 'learning_rate': 4.6225e-05, 'epoch': 0.99} 8%|▊ | 758/10000 [2:56:41<35:35:59, 13.87s/it] 8%|▊ | 759/10000 [2:56:55<35:40:17, 13.90s/it] {'loss': 1.0183, 'learning_rate': 4.622e-05, 'epoch': 0.99} 8%|▊ | 759/10000 [2:56:55<35:40:17, 13.90s/it] 8%|▊ | 760/10000 [2:57:09<35:45:31, 13.93s/it] {'loss': 1.2385, 'learning_rate': 4.6215000000000004e-05, 'epoch': 0.99} 8%|▊ | 760/10000 [2:57:09<35:45:31, 13.93s/it] 8%|▊ | 761/10000 [2:57:23<35:45:35, 13.93s/it] {'loss': 0.8413, 'learning_rate': 4.6210000000000006e-05, 
'epoch': 1.0} 8%|▊ | 761/10000 [2:57:23<35:45:35, 13.93s/it] 8%|▊ | 762/10000 [2:57:37<35:47:55, 13.95s/it] {'loss': 0.8755, 'learning_rate': 4.6205e-05, 'epoch': 1.0} 8%|▊ | 762/10000 [2:57:37<35:47:55, 13.95s/it] 8%|▊ | 763/10000 [2:57:51<35:52:47, 13.98s/it] {'loss': 0.7853, 'learning_rate': 4.6200000000000005e-05, 'epoch': 1.0} 8%|▊ | 763/10000 [2:57:51<35:52:47, 13.98s/it] 8%|▊ | 764/10000 [2:58:03<34:46:58, 13.56s/it] {'loss': 0.7659, 'learning_rate': 4.619500000000001e-05, 'epoch': 1.0} 8%|▊ | 764/10000 [2:58:03<34:46:58, 13.56s/it] 8%|▊ | 765/10000 [2:58:17<35:06:46, 13.69s/it] {'loss': 0.6121, 'learning_rate': 4.619e-05, 'epoch': 1.0} 8%|▊ | 765/10000 [2:58:17<35:06:46, 13.69s/it] 8%|▊ | 766/10000 [2:58:31<35:18:57, 13.77s/it] {'loss': 0.8022, 'learning_rate': 4.6185e-05, 'epoch': 1.0} 8%|▊ | 766/10000 [2:58:31<35:18:57, 13.77s/it] 8%|▊ | 767/10000 [2:58:46<35:36:04, 13.88s/it] {'loss': 0.6172, 'learning_rate': 4.618e-05, 'epoch': 1.0} 8%|▊ | 767/10000 [2:58:46<35:36:04, 13.88s/it] 8%|▊ | 768/10000 [2:59:00<35:44:48, 13.94s/it] {'loss': 0.6093, 'learning_rate': 4.6175000000000004e-05, 'epoch': 1.01} 8%|▊ | 768/10000 [2:59:00<35:44:48, 13.94s/it] 8%|▊ | 769/10000 [2:59:14<35:43:52, 13.93s/it] {'loss': 0.5324, 'learning_rate': 4.617e-05, 'epoch': 1.01} 8%|▊ | 769/10000 [2:59:14<35:43:52, 13.93s/it] 8%|▊ | 770/10000 [2:59:27<35:39:38, 13.91s/it] {'loss': 0.6579, 'learning_rate': 4.6165e-05, 'epoch': 1.01} 8%|▊ | 770/10000 [2:59:27<35:39:38, 13.91s/it] 8%|▊ | 771/10000 [2:59:41<35:39:43, 13.91s/it] {'loss': 0.5984, 'learning_rate': 4.6160000000000005e-05, 'epoch': 1.01} 8%|▊ | 771/10000 [2:59:41<35:39:43, 13.91s/it] 8%|▊ | 772/10000 [2:59:55<35:42:07, 13.93s/it] {'loss': 0.5234, 'learning_rate': 4.6155e-05, 'epoch': 1.01} 8%|▊ | 772/10000 [2:59:55<35:42:07, 13.93s/it] 8%|▊ | 773/10000 [3:00:09<35:37:26, 13.90s/it] {'loss': 0.7531, 'learning_rate': 4.6150000000000004e-05, 'epoch': 1.01} 8%|▊ | 773/10000 [3:00:09<35:37:26, 13.90s/it] 8%|▊ | 774/10000 
[3:00:23<35:42:04, 13.93s/it] {'loss': 0.5738, 'learning_rate': 4.6145000000000006e-05, 'epoch': 1.01} 8%|▊ | 774/10000 [3:00:23<35:42:04, 13.93s/it] 8%|▊ | 775/10000 [3:00:37<35:45:08, 13.95s/it] {'loss': 0.6364, 'learning_rate': 4.614e-05, 'epoch': 1.01} 8%|▊ | 775/10000 [3:00:37<35:45:08, 13.95s/it] 8%|▊ | 776/10000 [3:00:51<35:41:51, 13.93s/it] {'loss': 0.7311, 'learning_rate': 4.6135e-05, 'epoch': 1.02} 8%|▊ | 776/10000 [3:00:51<35:41:51, 13.93s/it] 8%|▊ | 777/10000 [3:01:05<35:44:35, 13.95s/it] {'loss': 0.5477, 'learning_rate': 4.613e-05, 'epoch': 1.02} 8%|▊ | 777/10000 [3:01:05<35:44:35, 13.95s/it] 8%|▊ | 778/10000 [3:01:19<35:44:20, 13.95s/it] {'loss': 0.5899, 'learning_rate': 4.6125e-05, 'epoch': 1.02} 8%|▊ | 778/10000 [3:01:19<35:44:20, 13.95s/it] 8%|▊ | 779/10000 [3:01:33<35:34:04, 13.89s/it] {'loss': 0.5699, 'learning_rate': 4.612e-05, 'epoch': 1.02} 8%|▊ | 779/10000 [3:01:33<35:34:04, 13.89s/it] 8%|▊ | 780/10000 [3:01:47<35:36:14, 13.90s/it] {'loss': 0.593, 'learning_rate': 4.6115e-05, 'epoch': 1.02} 8%|▊ | 780/10000 [3:01:47<35:36:14, 13.90s/it] 8%|▊ | 781/10000 [3:02:01<35:43:06, 13.95s/it] {'loss': 0.5065, 'learning_rate': 4.6110000000000004e-05, 'epoch': 1.02} 8%|▊ | 781/10000 [3:02:01<35:43:06, 13.95s/it] 8%|▊ | 782/10000 [3:02:15<35:46:35, 13.97s/it] {'loss': 0.5892, 'learning_rate': 4.610500000000001e-05, 'epoch': 1.02} 8%|▊ | 782/10000 [3:02:15<35:46:35, 13.97s/it] 8%|▊ | 783/10000 [3:02:29<35:49:17, 13.99s/it] {'loss': 0.5957, 'learning_rate': 4.61e-05, 'epoch': 1.02} 8%|▊ | 783/10000 [3:02:29<35:49:17, 13.99s/it] 8%|▊ | 784/10000 [3:02:43<35:46:06, 13.97s/it] {'loss': 0.6689, 'learning_rate': 4.6095000000000005e-05, 'epoch': 1.03} 8%|▊ | 784/10000 [3:02:43<35:46:06, 13.97s/it] 8%|▊ | 785/10000 [3:02:57<35:40:45, 13.94s/it] {'loss': 0.5482, 'learning_rate': 4.609e-05, 'epoch': 1.03} 8%|▊ | 785/10000 [3:02:57<35:40:45, 13.94s/it] 8%|▊ | 786/10000 [3:03:10<35:38:21, 13.92s/it] {'loss': 0.6407, 'learning_rate': 4.6085000000000003e-05, 'epoch': 
1.03} 8%|▊ | 786/10000 [3:03:10<35:38:21, 13.92s/it] 8%|▊ | 787/10000 [3:03:24<35:38:38, 13.93s/it] {'loss': 0.5532, 'learning_rate': 4.608e-05, 'epoch': 1.03} 8%|▊ | 787/10000 [3:03:24<35:38:38, 13.93s/it] 8%|▊ | 788/10000 [3:03:38<35:39:53, 13.94s/it] {'loss': 0.5516, 'learning_rate': 4.6075e-05, 'epoch': 1.03} 8%|▊ | 788/10000 [3:03:38<35:39:53, 13.94s/it] 8%|▊ | 789/10000 [3:03:52<35:38:25, 13.93s/it] {'loss': 0.5754, 'learning_rate': 4.6070000000000004e-05, 'epoch': 1.03} 8%|▊ | 789/10000 [3:03:52<35:38:25, 13.93s/it] 8%|▊ | 790/10000 [3:04:06<35:35:02, 13.91s/it] {'loss': 0.6982, 'learning_rate': 4.6065e-05, 'epoch': 1.03} 8%|▊ | 790/10000 [3:04:06<35:35:02, 13.91s/it] 8%|▊ | 791/10000 [3:04:20<35:34:46, 13.91s/it] {'loss': 0.5117, 'learning_rate': 4.606e-05, 'epoch': 1.04} 8%|▊ | 791/10000 [3:04:20<35:34:46, 13.91s/it] 8%|▊ | 792/10000 [3:04:34<35:40:30, 13.95s/it] {'loss': 0.5348, 'learning_rate': 4.6055000000000005e-05, 'epoch': 1.04} 8%|▊ | 792/10000 [3:04:34<35:40:30, 13.95s/it] 8%|▊ | 793/10000 [3:04:48<35:36:40, 13.92s/it] {'loss': 0.5244, 'learning_rate': 4.605e-05, 'epoch': 1.04} 8%|▊ | 793/10000 [3:04:48<35:36:40, 13.92s/it] 8%|▊ | 794/10000 [3:05:02<35:44:09, 13.97s/it] {'loss': 0.5124, 'learning_rate': 4.6045000000000004e-05, 'epoch': 1.04} 8%|▊ | 794/10000 [3:05:02<35:44:09, 13.97s/it] 8%|▊ | 795/10000 [3:05:16<35:42:52, 13.97s/it] {'loss': 0.4848, 'learning_rate': 4.604e-05, 'epoch': 1.04} 8%|▊ | 795/10000 [3:05:16<35:42:52, 13.97s/it] 8%|▊ | 796/10000 [3:05:30<35:41:55, 13.96s/it] {'loss': 0.5509, 'learning_rate': 4.6035e-05, 'epoch': 1.04} 8%|▊ | 796/10000 [3:05:30<35:41:55, 13.96s/it] 8%|▊ | 797/10000 [3:05:44<35:42:28, 13.97s/it] {'loss': 0.5407, 'learning_rate': 4.603e-05, 'epoch': 1.04} 8%|▊ | 797/10000 [3:05:44<35:42:28, 13.97s/it] 8%|▊ | 798/10000 [3:05:58<35:44:39, 13.98s/it] {'loss': 0.6558, 'learning_rate': 4.6025e-05, 'epoch': 1.04} 8%|▊ | 798/10000 [3:05:58<35:44:39, 13.98s/it] 8%|▊ | 799/10000 [3:06:12<35:49:25, 14.02s/it] {'loss': 
0.6191, 'learning_rate': 4.602e-05, 'epoch': 1.05} 8%|▊ | 799/10000 [3:06:12<35:49:25, 14.02s/it] 8%|▊ | 800/10000 [3:06:26<35:46:10, 14.00s/it] {'loss': 0.4214, 'learning_rate': 4.6015000000000006e-05, 'epoch': 1.05} 8%|▊ | 800/10000 [3:06:26<35:46:10, 14.00s/it] 8%|▊ | 801/10000 [3:06:40<35:46:05, 14.00s/it] {'loss': 0.6567, 'learning_rate': 4.601e-05, 'epoch': 1.05} 8%|▊ | 801/10000 [3:06:40<35:46:05, 14.00s/it] 8%|▊ | 802/10000 [3:06:54<35:52:33, 14.04s/it] {'loss': 0.7937, 'learning_rate': 4.6005000000000004e-05, 'epoch': 1.05} 8%|▊ | 802/10000 [3:06:54<35:52:33, 14.04s/it] 8%|▊ | 803/10000 [3:07:08<35:45:51, 14.00s/it] {'loss': 0.5406, 'learning_rate': 4.600000000000001e-05, 'epoch': 1.05} 8%|▊ | 803/10000 [3:07:08<35:45:51, 14.00s/it] 8%|▊ | 804/10000 [3:07:22<35:43:07, 13.98s/it] {'loss': 0.6405, 'learning_rate': 4.5995e-05, 'epoch': 1.05} 8%|▊ | 804/10000 [3:07:22<35:43:07, 13.98s/it] 8%|▊ | 805/10000 [3:07:36<35:43:58, 13.99s/it] {'loss': 0.6404, 'learning_rate': 4.599e-05, 'epoch': 1.05} 8%|▊ | 805/10000 [3:07:36<35:43:58, 13.99s/it] 8%|▊ | 806/10000 [3:07:50<35:37:24, 13.95s/it] {'loss': 0.522, 'learning_rate': 4.5985e-05, 'epoch': 1.05} 8%|▊ | 806/10000 [3:07:50<35:37:24, 13.95s/it] 8%|▊ | 807/10000 [3:08:04<35:39:09, 13.96s/it] {'loss': 0.4715, 'learning_rate': 4.5980000000000004e-05, 'epoch': 1.06} 8%|▊ | 807/10000 [3:08:04<35:39:09, 13.96s/it] 8%|▊ | 808/10000 [3:08:18<35:32:46, 13.92s/it] {'loss': 0.6197, 'learning_rate': 4.5975e-05, 'epoch': 1.06} 8%|▊ | 808/10000 [3:08:18<35:32:46, 13.92s/it] 8%|▊ | 809/10000 [3:08:32<35:32:51, 13.92s/it] {'loss': 0.4675, 'learning_rate': 4.597e-05, 'epoch': 1.06} 8%|▊ | 809/10000 [3:08:32<35:32:51, 13.92s/it] 8%|▊ | 810/10000 [3:08:46<35:40:21, 13.97s/it] {'loss': 0.6953, 'learning_rate': 4.5965000000000005e-05, 'epoch': 1.06} 8%|▊ | 810/10000 [3:08:46<35:40:21, 13.97s/it] 8%|▊ | 811/10000 [3:09:00<35:53:11, 14.06s/it] {'loss': 0.4702, 'learning_rate': 4.596e-05, 'epoch': 1.06} 8%|▊ | 811/10000 
[3:09:00<35:53:11, 14.06s/it] 8%|▊ | 812/10000 [3:09:14<35:44:01, 14.00s/it] {'loss': 0.64, 'learning_rate': 4.5955e-05, 'epoch': 1.06} 8%|▊ | 812/10000 [3:09:14<35:44:01, 14.00s/it] 8%|▊ | 813/10000 [3:09:28<35:48:33, 14.03s/it] {'loss': 0.7632, 'learning_rate': 4.5950000000000006e-05, 'epoch': 1.06} 8%|▊ | 813/10000 [3:09:28<35:48:33, 14.03s/it] 8%|▊ | 814/10000 [3:09:42<35:44:32, 14.01s/it] {'loss': 0.6144, 'learning_rate': 4.5945e-05, 'epoch': 1.07} 8%|▊ | 814/10000 [3:09:42<35:44:32, 14.01s/it] 8%|▊ | 815/10000 [3:09:56<35:42:35, 14.00s/it] {'loss': 0.5461, 'learning_rate': 4.594e-05, 'epoch': 1.07} 8%|▊ | 815/10000 [3:09:56<35:42:35, 14.00s/it] 8%|▊ | 816/10000 [3:10:10<35:37:52, 13.97s/it] {'loss': 0.7641, 'learning_rate': 4.5935e-05, 'epoch': 1.07} 8%|▊ | 816/10000 [3:10:10<35:37:52, 13.97s/it] 8%|▊ | 817/10000 [3:10:24<35:38:52, 13.98s/it] {'loss': 0.6639, 'learning_rate': 4.593e-05, 'epoch': 1.07} 8%|▊ | 817/10000 [3:10:24<35:38:52, 13.98s/it] 8%|▊ | 818/10000 [3:10:38<35:36:38, 13.96s/it] {'loss': 0.9299, 'learning_rate': 4.5925e-05, 'epoch': 1.07} 8%|▊ | 818/10000 [3:10:38<35:36:38, 13.96s/it] 8%|▊ | 819/10000 [3:10:52<35:44:16, 14.01s/it] {'loss': 0.5727, 'learning_rate': 4.592e-05, 'epoch': 1.07} 8%|▊ | 819/10000 [3:10:52<35:44:16, 14.01s/it] 8%|▊ | 820/10000 [3:11:06<35:38:36, 13.98s/it] {'loss': 0.6164, 'learning_rate': 4.5915000000000003e-05, 'epoch': 1.07} 8%|▊ | 820/10000 [3:11:06<35:38:36, 13.98s/it] 8%|▊ | 821/10000 [3:11:20<35:33:52, 13.95s/it] {'loss': 0.6412, 'learning_rate': 4.5910000000000006e-05, 'epoch': 1.07} 8%|▊ | 821/10000 [3:11:20<35:33:52, 13.95s/it] 8%|▊ | 822/10000 [3:11:34<35:37:17, 13.97s/it] {'loss': 0.6242, 'learning_rate': 4.5905e-05, 'epoch': 1.08} 8%|▊ | 822/10000 [3:11:34<35:37:17, 13.97s/it] 8%|▊ | 823/10000 [3:11:48<35:40:45, 14.00s/it] {'loss': 0.7899, 'learning_rate': 4.5900000000000004e-05, 'epoch': 1.08} 8%|▊ | 823/10000 [3:11:48<35:40:45, 14.00s/it] 8%|▊ | 824/10000 [3:12:02<35:37:31, 13.98s/it] {'loss': 0.5628, 
'learning_rate': 4.589500000000001e-05, 'epoch': 1.08} 8%|▊ | 824/10000 [3:12:02<35:37:31, 13.98s/it] 8%|▊ | 825/10000 [3:12:15<35:35:37, 13.97s/it] {'loss': 0.6942, 'learning_rate': 4.589e-05, 'epoch': 1.08} 8%|▊ | 825/10000 [3:12:16<35:35:37, 13.97s/it] 8%|▊ | 826/10000 [3:12:29<35:38:02, 13.98s/it] {'loss': 0.6749, 'learning_rate': 4.5885e-05, 'epoch': 1.08} 8%|▊ | 826/10000 [3:12:30<35:38:02, 13.98s/it] 8%|▊ | 827/10000 [3:12:43<35:35:27, 13.97s/it] {'loss': 0.5475, 'learning_rate': 4.588e-05, 'epoch': 1.08} 8%|▊ | 827/10000 [3:12:43<35:35:27, 13.97s/it] 8%|▊ | 828/10000 [3:12:57<35:33:48, 13.96s/it] {'loss': 0.5528, 'learning_rate': 4.5875000000000004e-05, 'epoch': 1.08} 8%|▊ | 828/10000 [3:12:57<35:33:48, 13.96s/it] 8%|▊ | 829/10000 [3:13:11<35:37:17, 13.98s/it] {'loss': 0.7676, 'learning_rate': 4.587e-05, 'epoch': 1.09} 8%|▊ | 829/10000 [3:13:11<35:37:17, 13.98s/it] 8%|▊ | 830/10000 [3:13:25<35:31:34, 13.95s/it] {'loss': 0.6449, 'learning_rate': 4.5865e-05, 'epoch': 1.09} 8%|▊ | 830/10000 [3:13:25<35:31:34, 13.95s/it] 8%|▊ | 831/10000 [3:13:39<35:28:59, 13.93s/it] {'loss': 0.5352, 'learning_rate': 4.5860000000000005e-05, 'epoch': 1.09} 8%|▊ | 831/10000 [3:13:39<35:28:59, 13.93s/it] 8%|▊ | 832/10000 [3:13:53<35:33:13, 13.96s/it] {'loss': 0.6775, 'learning_rate': 4.5855e-05, 'epoch': 1.09} 8%|▊ | 832/10000 [3:13:53<35:33:13, 13.96s/it] 8%|▊ | 833/10000 [3:14:07<35:28:30, 13.93s/it] {'loss': 0.5045, 'learning_rate': 4.585e-05, 'epoch': 1.09} 8%|▊ | 833/10000 [3:14:07<35:28:30, 13.93s/it] 8%|▊ | 834/10000 [3:14:21<35:33:45, 13.97s/it] {'loss': 0.6284, 'learning_rate': 4.5845000000000006e-05, 'epoch': 1.09} 8%|▊ | 834/10000 [3:14:21<35:33:45, 13.97s/it] 8%|▊ | 835/10000 [3:14:35<35:33:11, 13.97s/it] {'loss': 0.5957, 'learning_rate': 4.584e-05, 'epoch': 1.09} 8%|▊ | 835/10000 [3:14:35<35:33:11, 13.97s/it] 8%|▊ | 836/10000 [3:14:49<35:28:57, 13.94s/it] {'loss': 0.5215, 'learning_rate': 4.5835e-05, 'epoch': 1.09} 8%|▊ | 836/10000 [3:14:49<35:28:57, 13.94s/it] 8%|▊ | 
837/10000 [3:15:03<35:30:45, 13.95s/it] {'loss': 0.6045, 'learning_rate': 4.583e-05, 'epoch': 1.1} 8%|▊ | 837/10000 [3:15:03<35:30:45, 13.95s/it] 8%|▊ | 838/10000 [3:15:17<35:30:06, 13.95s/it] {'loss': 0.5606, 'learning_rate': 4.5825e-05, 'epoch': 1.1} 8%|▊ | 838/10000 [3:15:17<35:30:06, 13.95s/it] 8%|▊ | 839/10000 [3:15:31<35:28:49, 13.94s/it] {'loss': 0.647, 'learning_rate': 4.5820000000000005e-05, 'epoch': 1.1} 8%|▊ | 839/10000 [3:15:31<35:28:49, 13.94s/it] 8%|▊ | 840/10000 [3:15:45<35:31:35, 13.96s/it] {'loss': 0.6046, 'learning_rate': 4.5815e-05, 'epoch': 1.1} 8%|▊ | 840/10000 [3:15:45<35:31:35, 13.96s/it] 8%|▊ | 841/10000 [3:15:59<35:30:44, 13.96s/it] {'loss': 0.73, 'learning_rate': 4.5810000000000004e-05, 'epoch': 1.1} 8%|▊ | 841/10000 [3:15:59<35:30:44, 13.96s/it] 8%|▊ | 842/10000 [3:16:13<35:29:08, 13.95s/it] {'loss': 0.7749, 'learning_rate': 4.5805000000000006e-05, 'epoch': 1.1} 8%|▊ | 842/10000 [3:16:13<35:29:08, 13.95s/it] 8%|▊ | 843/10000 [3:16:27<35:25:10, 13.92s/it] {'loss': 0.6999, 'learning_rate': 4.58e-05, 'epoch': 1.1} 8%|▊ | 843/10000 [3:16:27<35:25:10, 13.92s/it] 8%|▊ | 844/10000 [3:16:40<35:23:36, 13.92s/it] {'loss': 0.6506, 'learning_rate': 4.5795000000000005e-05, 'epoch': 1.1} 8%|▊ | 844/10000 [3:16:40<35:23:36, 13.92s/it] 8%|▊ | 845/10000 [3:16:54<35:23:38, 13.92s/it] {'loss': 0.6221, 'learning_rate': 4.579e-05, 'epoch': 1.11} 8%|▊ | 845/10000 [3:16:54<35:23:38, 13.92s/it] 8%|▊ | 846/10000 [3:17:08<35:22:08, 13.91s/it] {'loss': 0.5783, 'learning_rate': 4.5785e-05, 'epoch': 1.11} 8%|▊ | 846/10000 [3:17:08<35:22:08, 13.91s/it] 8%|▊ | 847/10000 [3:17:22<35:23:50, 13.92s/it] {'loss': 0.5429, 'learning_rate': 4.578e-05, 'epoch': 1.11} 8%|▊ | 847/10000 [3:17:22<35:23:50, 13.92s/it] 8%|▊ | 848/10000 [3:17:36<35:24:17, 13.93s/it] {'loss': 0.5391, 'learning_rate': 4.5775e-05, 'epoch': 1.11} 8%|▊ | 848/10000 [3:17:36<35:24:17, 13.93s/it] 8%|▊ | 849/10000 [3:17:50<35:19:57, 13.90s/it] {'loss': 0.5789, 'learning_rate': 4.5770000000000004e-05, 'epoch': 
1.11} 8%|▊ | 849/10000 [3:17:50<35:19:57, 13.90s/it] 8%|▊ | 850/10000 [3:18:04<35:21:47, 13.91s/it] {'loss': 0.6742, 'learning_rate': 4.5765e-05, 'epoch': 1.11} 8%|▊ | 850/10000 [3:18:04<35:21:47, 13.91s/it] 9%|▊ | 851/10000 [3:18:18<35:26:28, 13.95s/it] {'loss': 0.5768, 'learning_rate': 4.576e-05, 'epoch': 1.11} 9%|▊ | 851/10000 [3:18:18<35:26:28, 13.95s/it] 9%|▊ | 852/10000 [3:18:32<35:24:36, 13.93s/it] {'loss': 0.6935, 'learning_rate': 4.5755000000000005e-05, 'epoch': 1.12} 9%|▊ | 852/10000 [3:18:32<35:24:36, 13.93s/it] 9%|▊ | 853/10000 [3:18:46<35:21:55, 13.92s/it] {'loss': 0.7472, 'learning_rate': 4.575e-05, 'epoch': 1.12} 9%|▊ | 853/10000 [3:18:46<35:21:55, 13.92s/it] 9%|▊ | 854/10000 [3:19:00<35:26:02, 13.95s/it] {'loss': 0.5289, 'learning_rate': 4.5745e-05, 'epoch': 1.12} 9%|▊ | 854/10000 [3:19:00<35:26:02, 13.95s/it] 9%|▊ | 855/10000 [3:19:14<35:29:48, 13.97s/it] {'loss': 0.636, 'learning_rate': 4.574e-05, 'epoch': 1.12} 9%|▊ | 855/10000 [3:19:14<35:29:48, 13.97s/it] 9%|▊ | 856/10000 [3:19:28<35:24:30, 13.94s/it] {'loss': 0.6183, 'learning_rate': 4.5735e-05, 'epoch': 1.12} 9%|▊ | 856/10000 [3:19:28<35:24:30, 13.94s/it] 9%|▊ | 857/10000 [3:19:42<35:25:15, 13.95s/it] {'loss': 0.6353, 'learning_rate': 4.573e-05, 'epoch': 1.12} 9%|▊ | 857/10000 [3:19:42<35:25:15, 13.95s/it] 9%|▊ | 858/10000 [3:19:56<35:25:25, 13.95s/it] {'loss': 0.5963, 'learning_rate': 4.5725e-05, 'epoch': 1.12} 9%|▊ | 858/10000 [3:19:56<35:25:25, 13.95s/it] 9%|▊ | 859/10000 [3:20:10<35:25:40, 13.95s/it] {'loss': 0.5858, 'learning_rate': 4.572e-05, 'epoch': 1.12} 9%|▊ | 859/10000 [3:20:10<35:25:40, 13.95s/it] 9%|▊ | 860/10000 [3:20:23<35:25:14, 13.95s/it] {'loss': 0.6064, 'learning_rate': 4.5715000000000005e-05, 'epoch': 1.13} 9%|▊ | 860/10000 [3:20:24<35:25:14, 13.95s/it] 9%|▊ | 861/10000 [3:20:37<35:21:25, 13.93s/it] {'loss': 0.6058, 'learning_rate': 4.571e-05, 'epoch': 1.13} 9%|▊ | 861/10000 [3:20:37<35:21:25, 13.93s/it] 9%|▊ | 862/10000 [3:20:51<35:22:37, 13.94s/it] {'loss': 0.5529, 
'learning_rate': 4.5705000000000004e-05, 'epoch': 1.13} 9%|▊ | 862/10000 [3:20:51<35:22:37, 13.94s/it] 9%|▊ | 863/10000 [3:21:05<35:20:03, 13.92s/it] {'loss': 0.5465, 'learning_rate': 4.5700000000000006e-05, 'epoch': 1.13} 9%|▊ | 863/10000 [3:21:05<35:20:03, 13.92s/it] 9%|▊ | 864/10000 [3:21:19<35:19:05, 13.92s/it] {'loss': 0.5659, 'learning_rate': 4.5695e-05, 'epoch': 1.13} 9%|▊ | 864/10000 [3:21:19<35:19:05, 13.92s/it] 9%|▊ | 865/10000 [3:21:33<35:15:51, 13.90s/it] {'loss': 0.5788, 'learning_rate': 4.569e-05, 'epoch': 1.13} 9%|▊ | 865/10000 [3:21:33<35:15:51, 13.90s/it] 9%|▊ | 866/10000 [3:21:47<35:21:29, 13.94s/it] {'loss': 0.7236, 'learning_rate': 4.5685e-05, 'epoch': 1.13} 9%|▊ | 866/10000 [3:21:47<35:21:29, 13.94s/it] 9%|▊ | 867/10000 [3:22:01<35:22:43, 13.95s/it] {'loss': 0.4999, 'learning_rate': 4.568e-05, 'epoch': 1.13} 9%|▊ | 867/10000 [3:22:01<35:22:43, 13.95s/it] 9%|▊ | 868/10000 [3:22:15<35:22:01, 13.94s/it] {'loss': 0.5608, 'learning_rate': 4.5675e-05, 'epoch': 1.14} 9%|▊ | 868/10000 [3:22:15<35:22:01, 13.94s/it] 9%|▊ | 869/10000 [3:22:29<35:22:39, 13.95s/it] {'loss': 0.5639, 'learning_rate': 4.567e-05, 'epoch': 1.14} 9%|▊ | 869/10000 [3:22:29<35:22:39, 13.95s/it] 9%|▊ | 870/10000 [3:22:43<35:26:12, 13.97s/it] {'loss': 0.7227, 'learning_rate': 4.5665000000000004e-05, 'epoch': 1.14} 9%|▊ | 870/10000 [3:22:43<35:26:12, 13.97s/it] 9%|▊ | 871/10000 [3:22:57<35:22:50, 13.95s/it] {'loss': 0.6908, 'learning_rate': 4.566e-05, 'epoch': 1.14} 9%|▊ | 871/10000 [3:22:57<35:22:50, 13.95s/it] 9%|▊ | 872/10000 [3:23:11<35:19:29, 13.93s/it] {'loss': 0.6038, 'learning_rate': 4.5655e-05, 'epoch': 1.14} 9%|▊ | 872/10000 [3:23:11<35:19:29, 13.93s/it] 9%|▊ | 873/10000 [3:23:25<35:24:54, 13.97s/it] {'loss': 0.6251, 'learning_rate': 4.5650000000000005e-05, 'epoch': 1.14} 9%|▊ | 873/10000 [3:23:25<35:24:54, 13.97s/it] 9%|▊ | 874/10000 [3:23:39<35:18:26, 13.93s/it] {'loss': 0.7227, 'learning_rate': 4.564500000000001e-05, 'epoch': 1.14} 9%|▊ | 874/10000 [3:23:39<35:18:26, 
13.93s/it] 9%|▉ | 875/10000 [3:23:53<35:20:34, 13.94s/it] {'loss': 0.6878, 'learning_rate': 4.564e-05, 'epoch': 1.15} 9%|▉ | 875/10000 [3:23:53<35:20:34, 13.94s/it] 9%|▉ | 876/10000 [3:24:06<35:18:52, 13.93s/it] {'loss': 0.6429, 'learning_rate': 4.5635e-05, 'epoch': 1.15} 9%|▉ | 876/10000 [3:24:06<35:18:52, 13.93s/it] 9%|▉ | 877/10000 [3:24:21<35:28:44, 14.00s/it] {'loss': 0.5482, 'learning_rate': 4.563e-05, 'epoch': 1.15} 9%|▉ | 877/10000 [3:24:21<35:28:44, 14.00s/it] 9%|▉ | 878/10000 [3:24:35<35:25:49, 13.98s/it] {'loss': 0.5715, 'learning_rate': 4.5625e-05, 'epoch': 1.15} 9%|▉ | 878/10000 [3:24:35<35:25:49, 13.98s/it] 9%|▉ | 879/10000 [3:24:49<35:26:22, 13.99s/it] {'loss': 0.6107, 'learning_rate': 4.562e-05, 'epoch': 1.15} 9%|▉ | 879/10000 [3:24:49<35:26:22, 13.99s/it] 9%|▉ | 880/10000 [3:25:02<35:18:01, 13.93s/it] {'loss': 0.6337, 'learning_rate': 4.5615e-05, 'epoch': 1.15} 9%|▉ | 880/10000 [3:25:02<35:18:01, 13.93s/it] 9%|▉ | 881/10000 [3:25:16<35:21:26, 13.96s/it] {'loss': 0.5822, 'learning_rate': 4.5610000000000005e-05, 'epoch': 1.15} 9%|▉ | 881/10000 [3:25:16<35:21:26, 13.96s/it] 9%|▉ | 882/10000 [3:25:30<35:23:07, 13.97s/it] {'loss': 0.6877, 'learning_rate': 4.5605e-05, 'epoch': 1.15} 9%|▉ | 882/10000 [3:25:30<35:23:07, 13.97s/it] 9%|▉ | 883/10000 [3:25:44<35:23:47, 13.98s/it] {'loss': 0.5343, 'learning_rate': 4.5600000000000004e-05, 'epoch': 1.16} 9%|▉ | 883/10000 [3:25:44<35:23:47, 13.98s/it] 9%|▉ | 884/10000 [3:25:58<35:19:32, 13.95s/it] {'loss': 0.6461, 'learning_rate': 4.5595000000000006e-05, 'epoch': 1.16} 9%|▉ | 884/10000 [3:25:58<35:19:32, 13.95s/it] 9%|▉ | 885/10000 [3:26:12<35:20:07, 13.96s/it] {'loss': 0.5533, 'learning_rate': 4.559e-05, 'epoch': 1.16} 9%|▉ | 885/10000 [3:26:12<35:20:07, 13.96s/it] 9%|▉ | 886/10000 [3:26:26<35:23:19, 13.98s/it] {'loss': 0.623, 'learning_rate': 4.5585e-05, 'epoch': 1.16} 9%|▉ | 886/10000 [3:26:26<35:23:19, 13.98s/it] 9%|▉ | 887/10000 [3:26:40<35:21:21, 13.97s/it] {'loss': 0.6016, 'learning_rate': 4.558e-05, 
'epoch': 1.16} 9%|▉ | 887/10000 [3:26:40<35:21:21, 13.97s/it] 9%|▉ | 888/10000 [3:26:54<35:17:31, 13.94s/it] {'loss': 0.4784, 'learning_rate': 4.5575e-05, 'epoch': 1.16} 9%|▉ | 888/10000 [3:26:54<35:17:31, 13.94s/it] 9%|▉ | 889/10000 [3:27:08<35:16:03, 13.94s/it] {'loss': 0.5117, 'learning_rate': 4.557e-05, 'epoch': 1.16} 9%|▉ | 889/10000 [3:27:08<35:16:03, 13.94s/it] 9%|▉ | 890/10000 [3:27:22<35:20:24, 13.97s/it] {'loss': 0.5142, 'learning_rate': 4.5565e-05, 'epoch': 1.16} 9%|▉ | 890/10000 [3:27:22<35:20:24, 13.97s/it] 9%|▉ | 891/10000 [3:27:36<35:17:35, 13.95s/it] {'loss': 0.5544, 'learning_rate': 4.5560000000000004e-05, 'epoch': 1.17} 9%|▉ | 891/10000 [3:27:36<35:17:35, 13.95s/it] 9%|▉ | 892/10000 [3:27:50<35:18:02, 13.95s/it] {'loss': 0.5979, 'learning_rate': 4.5555e-05, 'epoch': 1.17} 9%|▉ | 892/10000 [3:27:50<35:18:02, 13.95s/it] 9%|▉ | 893/10000 [3:28:04<35:14:26, 13.93s/it] {'loss': 0.6488, 'learning_rate': 4.555e-05, 'epoch': 1.17} 9%|▉ | 893/10000 [3:28:04<35:14:26, 13.93s/it] 9%|▉ | 894/10000 [3:28:18<35:25:19, 14.00s/it] {'loss': 0.589, 'learning_rate': 4.5545000000000005e-05, 'epoch': 1.17} 9%|▉ | 894/10000 [3:28:18<35:25:19, 14.00s/it] 9%|▉ | 895/10000 [3:28:32<35:16:41, 13.95s/it] {'loss': 0.5574, 'learning_rate': 4.554000000000001e-05, 'epoch': 1.17} 9%|▉ | 895/10000 [3:28:32<35:16:41, 13.95s/it] 9%|▉ | 896/10000 [3:28:46<35:18:45, 13.96s/it] {'loss': 0.5755, 'learning_rate': 4.5535e-05, 'epoch': 1.17} 9%|▉ | 896/10000 [3:28:46<35:18:45, 13.96s/it] 9%|▉ | 897/10000 [3:29:00<35:18:32, 13.96s/it] {'loss': 0.578, 'learning_rate': 4.553e-05, 'epoch': 1.17} 9%|▉ | 897/10000 [3:29:00<35:18:32, 13.96s/it] 9%|▉ | 898/10000 [3:29:14<35:27:57, 14.03s/it] {'loss': 0.6892, 'learning_rate': 4.5525e-05, 'epoch': 1.18} 9%|▉ | 898/10000 [3:29:14<35:27:57, 14.03s/it] 9%|▉ | 899/10000 [3:29:28<35:22:07, 13.99s/it] {'loss': 0.6012, 'learning_rate': 4.5520000000000005e-05, 'epoch': 1.18} 9%|▉ | 899/10000 [3:29:28<35:22:07, 13.99s/it] 9%|▉ | 900/10000 [3:29:42<35:19:56, 
13.98s/it] {'loss': 0.5878, 'learning_rate': 4.5515e-05, 'epoch': 1.18} 9%|▉ | 900/10000 [3:29:42<35:19:56, 13.98s/it] 9%|▉ | 901/10000 [3:29:56<35:25:17, 14.01s/it] {'loss': 0.6805, 'learning_rate': 4.551e-05, 'epoch': 1.18} 9%|▉ | 901/10000 [3:29:56<35:25:17, 14.01s/it] 9%|▉ | 902/10000 [3:30:10<35:31:52, 14.06s/it] {'loss': 0.5334, 'learning_rate': 4.5505000000000006e-05, 'epoch': 1.18} 9%|▉ | 902/10000 [3:30:10<35:31:52, 14.06s/it] 9%|▉ | 903/10000 [3:30:24<35:26:06, 14.02s/it] {'loss': 0.5013, 'learning_rate': 4.55e-05, 'epoch': 1.18} 9%|▉ | 903/10000 [3:30:24<35:26:06, 14.02s/it] 9%|▉ | 904/10000 [3:30:38<35:22:08, 14.00s/it] {'loss': 0.8237, 'learning_rate': 4.5495000000000004e-05, 'epoch': 1.18} 9%|▉ | 904/10000 [3:30:38<35:22:08, 14.00s/it] 9%|▉ | 905/10000 [3:30:52<35:15:09, 13.95s/it] {'loss': 0.6894, 'learning_rate': 4.549000000000001e-05, 'epoch': 1.18} 9%|▉ | 905/10000 [3:30:52<35:15:09, 13.95s/it] 9%|▉ | 906/10000 [3:31:06<35:21:44, 14.00s/it] {'loss': 0.5776, 'learning_rate': 4.5485e-05, 'epoch': 1.19} 9%|▉ | 906/10000 [3:31:06<35:21:44, 14.00s/it] 9%|▉ | 907/10000 [3:31:20<35:17:42, 13.97s/it] {'loss': 0.7534, 'learning_rate': 4.548e-05, 'epoch': 1.19} 9%|▉ | 907/10000 [3:31:20<35:17:42, 13.97s/it] 9%|▉ | 908/10000 [3:31:34<35:18:48, 13.98s/it] {'loss': 0.8477, 'learning_rate': 4.5475e-05, 'epoch': 1.19} 9%|▉ | 908/10000 [3:31:34<35:18:48, 13.98s/it] 9%|▉ | 909/10000 [3:31:48<35:19:37, 13.99s/it] {'loss': 0.5592, 'learning_rate': 4.5470000000000003e-05, 'epoch': 1.19} 9%|▉ | 909/10000 [3:31:48<35:19:37, 13.99s/it] 9%|▉ | 910/10000 [3:32:02<35:13:51, 13.95s/it] {'loss': 0.5704, 'learning_rate': 4.5465e-05, 'epoch': 1.19} 9%|▉ | 910/10000 [3:32:02<35:13:51, 13.95s/it] 9%|▉ | 911/10000 [3:32:16<35:18:18, 13.98s/it] {'loss': 0.5608, 'learning_rate': 4.546e-05, 'epoch': 1.19} 9%|▉ | 911/10000 [3:32:16<35:18:18, 13.98s/it] 9%|▉ | 912/10000 [3:32:30<35:18:51, 13.99s/it] {'loss': 0.7286, 'learning_rate': 4.5455000000000004e-05, 'epoch': 1.19} 9%|▉ | 
912/10000 [3:32:30<35:18:51, 13.99s/it] 9%|▉ | 913/10000 [3:32:44<35:26:43, 14.04s/it] {'loss': 0.5547, 'learning_rate': 4.545000000000001e-05, 'epoch': 1.2} 9%|▉ | 913/10000 [3:32:44<35:26:43, 14.04s/it] 9%|▉ | 914/10000 [3:32:58<35:20:33, 14.00s/it] {'loss': 0.6314, 'learning_rate': 4.5445e-05, 'epoch': 1.2} 9%|▉ | 914/10000 [3:32:58<35:20:33, 14.00s/it] 9%|▉ | 915/10000 [3:33:12<35:19:56, 14.00s/it] {'loss': 0.5182, 'learning_rate': 4.5440000000000005e-05, 'epoch': 1.2} 9%|▉ | 915/10000 [3:33:12<35:19:56, 14.00s/it] 9%|▉ | 916/10000 [3:33:26<35:19:57, 14.00s/it] {'loss': 0.6636, 'learning_rate': 4.5435e-05, 'epoch': 1.2} 9%|▉ | 916/10000 [3:33:26<35:19:57, 14.00s/it] 9%|▉ | 917/10000 [3:33:40<35:19:45, 14.00s/it] {'loss': 0.564, 'learning_rate': 4.543e-05, 'epoch': 1.2} 9%|▉ | 917/10000 [3:33:40<35:19:45, 14.00s/it] 9%|▉ | 918/10000 [3:33:54<35:17:03, 13.99s/it] {'loss': 0.5926, 'learning_rate': 4.5425e-05, 'epoch': 1.2} 9%|▉ | 918/10000 [3:33:54<35:17:03, 13.99s/it] 9%|▉ | 919/10000 [3:34:08<35:19:08, 14.00s/it] {'loss': 0.5602, 'learning_rate': 4.542e-05, 'epoch': 1.2} 9%|▉ | 919/10000 [3:34:08<35:19:08, 14.00s/it] 9%|▉ | 920/10000 [3:34:22<35:13:07, 13.96s/it] {'loss': 0.7353, 'learning_rate': 4.5415000000000005e-05, 'epoch': 1.2} 9%|▉ | 920/10000 [3:34:22<35:13:07, 13.96s/it] 9%|▉ | 921/10000 [3:34:35<35:05:50, 13.92s/it] {'loss': 0.6411, 'learning_rate': 4.541e-05, 'epoch': 1.21} 9%|▉ | 921/10000 [3:34:36<35:05:50, 13.92s/it] 9%|▉ | 922/10000 [3:34:49<35:05:23, 13.92s/it] {'loss': 0.6486, 'learning_rate': 4.5405e-05, 'epoch': 1.21} 9%|▉ | 922/10000 [3:34:49<35:05:23, 13.92s/it] 9%|▉ | 923/10000 [3:35:03<35:08:49, 13.94s/it] {'loss': 0.7203, 'learning_rate': 4.5400000000000006e-05, 'epoch': 1.21} 9%|▉ | 923/10000 [3:35:03<35:08:49, 13.94s/it] 9%|▉ | 924/10000 [3:35:18<35:17:46, 14.00s/it] {'loss': 0.6559, 'learning_rate': 4.5395e-05, 'epoch': 1.21} 9%|▉ | 924/10000 [3:35:18<35:17:46, 14.00s/it] 9%|▉ | 925/10000 [3:35:32<35:19:01, 14.01s/it] {'loss': 0.6701, 
'learning_rate': 4.5390000000000004e-05, 'epoch': 1.21} 9%|▉ | 925/10000 [3:35:32<35:19:01, 14.01s/it] 9%|▉ | 926/10000 [3:35:45<35:14:15, 13.98s/it] {'loss': 0.5261, 'learning_rate': 4.5385e-05, 'epoch': 1.21} 9%|▉ | 926/10000 [3:35:46<35:14:15, 13.98s/it] 9%|▉ | 927/10000 [3:36:00<35:19:25, 14.02s/it] {'loss': 0.6567, 'learning_rate': 4.538e-05, 'epoch': 1.21} 9%|▉ | 927/10000 [3:36:00<35:19:25, 14.02s/it] 9%|▉ | 928/10000 [3:36:14<35:17:23, 14.00s/it] {'loss': 0.7458, 'learning_rate': 4.5375e-05, 'epoch': 1.21} 9%|▉ | 928/10000 [3:36:14<35:17:23, 14.00s/it] 9%|▉ | 929/10000 [3:36:27<35:13:22, 13.98s/it] {'loss': 0.4503, 'learning_rate': 4.537e-05, 'epoch': 1.22} 9%|▉ | 929/10000 [3:36:27<35:13:22, 13.98s/it] 9%|▉ | 930/10000 [3:36:41<35:15:42, 14.00s/it] {'loss': 0.6208, 'learning_rate': 4.5365000000000004e-05, 'epoch': 1.22} 9%|▉ | 930/10000 [3:36:42<35:15:42, 14.00s/it] 9%|▉ | 931/10000 [3:36:55<35:09:47, 13.96s/it] {'loss': 0.6111, 'learning_rate': 4.536e-05, 'epoch': 1.22} 9%|▉ | 931/10000 [3:36:55<35:09:47, 13.96s/it] 9%|▉ | 932/10000 [3:37:09<35:13:52, 13.99s/it] {'loss': 0.8205, 'learning_rate': 4.5355e-05, 'epoch': 1.22} 9%|▉ | 932/10000 [3:37:09<35:13:52, 13.99s/it] 9%|▉ | 933/10000 [3:37:23<35:11:25, 13.97s/it] {'loss': 0.466, 'learning_rate': 4.5350000000000005e-05, 'epoch': 1.22} 9%|▉ | 933/10000 [3:37:23<35:11:25, 13.97s/it] 9%|▉ | 934/10000 [3:37:38<35:20:19, 14.03s/it] {'loss': 0.5165, 'learning_rate': 4.534500000000001e-05, 'epoch': 1.22} 9%|▉ | 934/10000 [3:37:38<35:20:19, 14.03s/it] 9%|▉ | 935/10000 [3:37:52<35:18:06, 14.02s/it] {'loss': 0.6131, 'learning_rate': 4.534e-05, 'epoch': 1.22} 9%|▉ | 935/10000 [3:37:52<35:18:06, 14.02s/it] 9%|▉ | 936/10000 [3:38:05<35:16:10, 14.01s/it] {'loss': 0.5616, 'learning_rate': 4.5335e-05, 'epoch': 1.23} 9%|▉ | 936/10000 [3:38:06<35:16:10, 14.01s/it] 9%|▉ | 937/10000 [3:38:19<35:09:17, 13.96s/it] {'loss': 0.687, 'learning_rate': 4.533e-05, 'epoch': 1.23} 9%|▉ | 937/10000 [3:38:19<35:09:17, 13.96s/it] 9%|▉ | 
938/10000 [3:38:33<35:05:01, 13.94s/it] {'loss': 0.5673, 'learning_rate': 4.5325000000000004e-05, 'epoch': 1.23} 9%|▉ | 938/10000 [3:38:33<35:05:01, 13.94s/it] 9%|▉ | 939/10000 [3:38:47<35:02:34, 13.92s/it] {'loss': 0.6733, 'learning_rate': 4.532e-05, 'epoch': 1.23} 9%|▉ | 939/10000 [3:38:47<35:02:34, 13.92s/it] 9%|▉ | 940/10000 [3:39:01<35:15:48, 14.01s/it] {'loss': 0.6051, 'learning_rate': 4.5315e-05, 'epoch': 1.23} 9%|▉ | 940/10000 [3:39:01<35:15:48, 14.01s/it] 9%|▉ | 941/10000 [3:39:15<35:07:43, 13.96s/it] {'loss': 0.6232, 'learning_rate': 4.5310000000000005e-05, 'epoch': 1.23} 9%|▉ | 941/10000 [3:39:15<35:07:43, 13.96s/it] 9%|▉ | 942/10000 [3:39:29<35:07:45, 13.96s/it] {'loss': 0.6182, 'learning_rate': 4.5305e-05, 'epoch': 1.23} 9%|▉ | 942/10000 [3:39:29<35:07:45, 13.96s/it] 9%|▉ | 943/10000 [3:39:43<35:05:06, 13.95s/it] {'loss': 0.6927, 'learning_rate': 4.53e-05, 'epoch': 1.23} 9%|▉ | 943/10000 [3:39:43<35:05:06, 13.95s/it] 9%|▉ | 944/10000 [3:39:57<35:06:18, 13.96s/it] {'loss': 0.4954, 'learning_rate': 4.5295000000000006e-05, 'epoch': 1.24} 9%|▉ | 944/10000 [3:39:57<35:06:18, 13.96s/it] 9%|▉ | 945/10000 [3:40:11<35:06:01, 13.95s/it] {'loss': 0.5703, 'learning_rate': 4.529e-05, 'epoch': 1.24} 9%|▉ | 945/10000 [3:40:11<35:06:01, 13.95s/it] 9%|▉ | 946/10000 [3:40:25<35:00:35, 13.92s/it] {'loss': 0.6132, 'learning_rate': 4.5285e-05, 'epoch': 1.24} 9%|▉ | 946/10000 [3:40:25<35:00:35, 13.92s/it] 9%|▉ | 947/10000 [3:40:39<35:11:45, 14.00s/it] {'loss': 0.4792, 'learning_rate': 4.528e-05, 'epoch': 1.24} 9%|▉ | 947/10000 [3:40:39<35:11:45, 14.00s/it] 9%|▉ | 948/10000 [3:40:53<35:08:31, 13.98s/it] {'loss': 0.6449, 'learning_rate': 4.5275e-05, 'epoch': 1.24} 9%|▉ | 948/10000 [3:40:53<35:08:31, 13.98s/it] 9%|▉ | 949/10000 [3:41:07<35:07:06, 13.97s/it] {'loss': 0.817, 'learning_rate': 4.527e-05, 'epoch': 1.24} 9%|▉ | 949/10000 [3:41:07<35:07:06, 13.97s/it] 10%|▉ | 950/10000 [3:41:21<35:12:18, 14.00s/it] {'loss': 0.4237, 'learning_rate': 4.5265e-05, 'epoch': 1.24} 10%|▉ | 
950/10000 [3:41:21<35:12:18, 14.00s/it] 10%|▉ | 951/10000 [3:41:35<35:06:59, 13.97s/it] {'loss': 0.4814, 'learning_rate': 4.5260000000000004e-05, 'epoch': 1.24} 10%|▉ | 951/10000 [3:41:35<35:06:59, 13.97s/it] 10%|▉ | 952/10000 [3:41:49<35:06:07, 13.97s/it] {'loss': 0.6678, 'learning_rate': 4.5255000000000006e-05, 'epoch': 1.25} 10%|▉ | 952/10000 [3:41:49<35:06:07, 13.97s/it] 10%|▉ | 953/10000 [3:42:03<35:00:13, 13.93s/it] {'loss': 0.5536, 'learning_rate': 4.525e-05, 'epoch': 1.25} 10%|▉ | 953/10000 [3:42:03<35:00:13, 13.93s/it] 10%|▉ | 954/10000 [3:42:17<35:00:24, 13.93s/it] {'loss': 0.5041, 'learning_rate': 4.5245000000000005e-05, 'epoch': 1.25} 10%|▉ | 954/10000 [3:42:17<35:00:24, 13.93s/it] 10%|▉ | 955/10000 [3:42:31<35:05:27, 13.97s/it] {'loss': 0.6548, 'learning_rate': 4.524000000000001e-05, 'epoch': 1.25} 10%|▉ | 955/10000 [3:42:31<35:05:27, 13.97s/it] 10%|▉ | 956/10000 [3:42:45<35:01:30, 13.94s/it] {'loss': 0.8377, 'learning_rate': 4.5234999999999996e-05, 'epoch': 1.25} 10%|▉ | 956/10000 [3:42:45<35:01:30, 13.94s/it] 10%|▉ | 957/10000 [3:42:58<34:59:13, 13.93s/it] {'loss': 0.5742, 'learning_rate': 4.523e-05, 'epoch': 1.25} 10%|▉ | 957/10000 [3:42:58<34:59:13, 13.93s/it] 10%|▉ | 958/10000 [3:43:12<35:00:23, 13.94s/it] {'loss': 0.6514, 'learning_rate': 4.5225e-05, 'epoch': 1.25} 10%|▉ | 958/10000 [3:43:12<35:00:23, 13.94s/it] 10%|▉ | 959/10000 [3:43:26<34:59:11, 13.93s/it] {'loss': 0.5995, 'learning_rate': 4.5220000000000004e-05, 'epoch': 1.26} 10%|▉ | 959/10000 [3:43:26<34:59:11, 13.93s/it] 10%|▉ | 960/10000 [3:43:40<34:54:11, 13.90s/it] {'loss': 0.5399, 'learning_rate': 4.5215e-05, 'epoch': 1.26} 10%|▉ | 960/10000 [3:43:40<34:54:11, 13.90s/it] 10%|▉ | 961/10000 [3:43:54<34:55:10, 13.91s/it] {'loss': 0.5508, 'learning_rate': 4.521e-05, 'epoch': 1.26} 10%|▉ | 961/10000 [3:43:54<34:55:10, 13.91s/it] 10%|▉ | 962/10000 [3:44:08<34:51:06, 13.88s/it] {'loss': 0.7065, 'learning_rate': 4.5205000000000005e-05, 'epoch': 1.26} 10%|▉ | 962/10000 [3:44:08<34:51:06, 
13.88s/it] 10%|▉ | 963/10000 [3:44:22<34:51:12, 13.88s/it] {'loss': 0.5331, 'learning_rate': 4.52e-05, 'epoch': 1.26} 10%|▉ | 963/10000 [3:44:22<34:51:12, 13.88s/it] 10%|▉ | 964/10000 [3:44:36<34:47:36, 13.86s/it] {'loss': 0.5958, 'learning_rate': 4.5195000000000004e-05, 'epoch': 1.26} 10%|▉ | 964/10000 [3:44:36<34:47:36, 13.86s/it] 10%|▉ | 965/10000 [3:44:49<34:47:21, 13.86s/it] {'loss': 0.7665, 'learning_rate': 4.5190000000000006e-05, 'epoch': 1.26} 10%|▉ | 965/10000 [3:44:49<34:47:21, 13.86s/it] 10%|▉ | 966/10000 [3:45:03<34:48:09, 13.87s/it] {'loss': 0.6617, 'learning_rate': 4.5185e-05, 'epoch': 1.26} 10%|▉ | 966/10000 [3:45:03<34:48:09, 13.87s/it] 10%|▉ | 967/10000 [3:45:17<34:47:21, 13.86s/it] {'loss': 0.6051, 'learning_rate': 4.518e-05, 'epoch': 1.27} 10%|▉ | 967/10000 [3:45:17<34:47:21, 13.86s/it] 10%|▉ | 968/10000 [3:45:31<34:56:10, 13.92s/it] {'loss': 0.637, 'learning_rate': 4.5175e-05, 'epoch': 1.27} 10%|▉ | 968/10000 [3:45:31<34:56:10, 13.92s/it] 10%|▉ | 969/10000 [3:45:45<34:58:29, 13.94s/it] {'loss': 0.7131, 'learning_rate': 4.517e-05, 'epoch': 1.27} 10%|▉ | 969/10000 [3:45:45<34:58:29, 13.94s/it] 10%|▉ | 970/10000 [3:45:59<35:01:03, 13.96s/it] {'loss': 0.5297, 'learning_rate': 4.5165e-05, 'epoch': 1.27} 10%|▉ | 970/10000 [3:45:59<35:01:03, 13.96s/it] 10%|▉ | 971/10000 [3:46:13<34:58:19, 13.94s/it] {'loss': 0.6778, 'learning_rate': 4.516e-05, 'epoch': 1.27} 10%|▉ | 971/10000 [3:46:13<34:58:19, 13.94s/it] 10%|▉ | 972/10000 [3:46:27<34:50:41, 13.89s/it] {'loss': 0.6273, 'learning_rate': 4.5155000000000004e-05, 'epoch': 1.27} 10%|▉ | 972/10000 [3:46:27<34:50:41, 13.89s/it] 10%|▉ | 973/10000 [3:46:41<34:46:54, 13.87s/it] {'loss': 0.6467, 'learning_rate': 4.5150000000000006e-05, 'epoch': 1.27} 10%|▉ | 973/10000 [3:46:41<34:46:54, 13.87s/it] 10%|▉ | 974/10000 [3:46:55<34:51:46, 13.91s/it] {'loss': 0.5666, 'learning_rate': 4.5145e-05, 'epoch': 1.27} 10%|▉ | 974/10000 [3:46:55<34:51:46, 13.91s/it] 10%|▉ | 975/10000 [3:47:09<34:56:00, 13.93s/it] {'loss': 
0.9065, 'learning_rate': 4.5140000000000005e-05, 'epoch': 1.28} 10%|▉ | 975/10000 [3:47:09<34:56:00, 13.93s/it] 10%|▉ | 976/10000 [3:47:22<34:49:00, 13.89s/it] {'loss': 0.5052, 'learning_rate': 4.5135e-05, 'epoch': 1.28} 10%|▉ | 976/10000 [3:47:23<34:49:00, 13.89s/it] 10%|▉ | 977/10000 [3:47:36<34:53:33, 13.92s/it] {'loss': 0.6579, 'learning_rate': 4.513e-05, 'epoch': 1.28} 10%|▉ | 977/10000 [3:47:37<34:53:33, 13.92s/it] 10%|▉ | 978/10000 [3:47:50<34:56:51, 13.94s/it] {'loss': 0.5921, 'learning_rate': 4.5125e-05, 'epoch': 1.28} 10%|▉ | 978/10000 [3:47:51<34:56:51, 13.94s/it] 10%|▉ | 979/10000 [3:48:04<34:52:25, 13.92s/it] {'loss': 0.6317, 'learning_rate': 4.512e-05, 'epoch': 1.28} 10%|▉ | 979/10000 [3:48:04<34:52:25, 13.92s/it] 10%|▉ | 980/10000 [3:48:18<34:49:36, 13.90s/it] {'loss': 0.6058, 'learning_rate': 4.5115000000000004e-05, 'epoch': 1.28} 10%|▉ | 980/10000 [3:48:18<34:49:36, 13.90s/it] 10%|▉ | 981/10000 [3:48:32<34:54:05, 13.93s/it] {'loss': 0.5437, 'learning_rate': 4.511e-05, 'epoch': 1.28} 10%|▉ | 981/10000 [3:48:32<34:54:05, 13.93s/it] 10%|▉ | 982/10000 [3:48:46<34:56:41, 13.95s/it] {'loss': 0.5201, 'learning_rate': 4.5105e-05, 'epoch': 1.29} 10%|▉ | 982/10000 [3:48:46<34:56:41, 13.95s/it] 10%|▉ | 983/10000 [3:49:00<34:54:57, 13.94s/it] {'loss': 0.4862, 'learning_rate': 4.5100000000000005e-05, 'epoch': 1.29} 10%|▉ | 983/10000 [3:49:00<34:54:57, 13.94s/it] 10%|▉ | 984/10000 [3:49:14<34:57:07, 13.96s/it] {'loss': 0.5613, 'learning_rate': 4.5095e-05, 'epoch': 1.29} 10%|▉ | 984/10000 [3:49:14<34:57:07, 13.96s/it] 10%|▉ | 985/10000 [3:49:28<34:58:32, 13.97s/it] {'loss': 0.637, 'learning_rate': 4.5090000000000004e-05, 'epoch': 1.29} 10%|▉ | 985/10000 [3:49:28<34:58:32, 13.97s/it] 10%|▉ | 986/10000 [3:49:42<34:56:53, 13.96s/it] {'loss': 0.6602, 'learning_rate': 4.5085e-05, 'epoch': 1.29} 10%|▉ | 986/10000 [3:49:42<34:56:53, 13.96s/it] 10%|▉ | 987/10000 [3:49:56<34:55:09, 13.95s/it] {'loss': 0.5879, 'learning_rate': 4.508e-05, 'epoch': 1.29} 10%|▉ | 987/10000 
[3:49:56<34:55:09, 13.95s/it] 10%|▉ | 988/10000 [3:50:10<34:57:16, 13.96s/it] {'loss': 0.5699, 'learning_rate': 4.5075e-05, 'epoch': 1.29} 10%|▉ | 988/10000 [3:50:10<34:57:16, 13.96s/it] 10%|▉ | 989/10000 [3:50:24<35:01:34, 13.99s/it] {'loss': 0.6436, 'learning_rate': 4.507e-05, 'epoch': 1.29} 10%|▉ | 989/10000 [3:50:24<35:01:34, 13.99s/it] 10%|▉ | 990/10000 [3:50:38<35:00:45, 13.99s/it] {'loss': 0.516, 'learning_rate': 4.5065e-05, 'epoch': 1.3} 10%|▉ | 990/10000 [3:50:38<35:00:45, 13.99s/it] 10%|▉ | 991/10000 [3:50:52<34:56:43, 13.96s/it] {'loss': 0.6631, 'learning_rate': 4.506e-05, 'epoch': 1.3} 10%|▉ | 991/10000 [3:50:52<34:56:43, 13.96s/it] 10%|▉ | 992/10000 [3:51:06<34:55:52, 13.96s/it] {'loss': 0.5491, 'learning_rate': 4.5055e-05, 'epoch': 1.3} 10%|▉ | 992/10000 [3:51:06<34:55:52, 13.96s/it] 10%|▉ | 993/10000 [3:51:20<34:54:56, 13.96s/it] {'loss': 0.6844, 'learning_rate': 4.5050000000000004e-05, 'epoch': 1.3} 10%|▉ | 993/10000 [3:51:20<34:54:56, 13.96s/it] 10%|▉ | 994/10000 [3:51:34<35:05:32, 14.03s/it] {'loss': 0.6052, 'learning_rate': 4.504500000000001e-05, 'epoch': 1.3} 10%|▉ | 994/10000 [3:51:34<35:05:32, 14.03s/it] 10%|▉ | 995/10000 [3:51:48<34:59:43, 13.99s/it] {'loss': 0.6889, 'learning_rate': 4.504e-05, 'epoch': 1.3} 10%|▉ | 995/10000 [3:51:48<34:59:43, 13.99s/it] 10%|▉ | 996/10000 [3:52:02<34:59:15, 13.99s/it] {'loss': 0.5212, 'learning_rate': 4.5035e-05, 'epoch': 1.3} 10%|▉ | 996/10000 [3:52:02<34:59:15, 13.99s/it] 10%|▉ | 997/10000 [3:52:16<34:58:29, 13.99s/it] {'loss': 0.5889, 'learning_rate': 4.503e-05, 'epoch': 1.3} 10%|▉ | 997/10000 [3:52:16<34:58:29, 13.99s/it] 10%|▉ | 998/10000 [3:52:30<34:59:53, 14.00s/it] {'loss': 0.5359, 'learning_rate': 4.5025000000000003e-05, 'epoch': 1.31} 10%|▉ | 998/10000 [3:52:30<34:59:53, 14.00s/it] 10%|▉ | 999/10000 [3:52:44<34:55:51, 13.97s/it] {'loss': 0.7551, 'learning_rate': 4.502e-05, 'epoch': 1.31} 10%|▉ | 999/10000 [3:52:44<34:55:51, 13.97s/it] 10%|█ | 1000/10000 [3:52:58<34:53:18, 13.96s/it] {'loss': 
0.5294, 'learning_rate': 4.5015e-05, 'epoch': 1.31} 10%|█ | 1000/10000 [3:52:58<34:53:18, 13.96s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-04 00:11:06,028 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-1000/config.json [INFO|configuration_utils.py:364] 2024-11-04 00:11:06,030 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-1000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-04 00:11:51,868 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-1000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-04 00:11:51,871 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-04 00:11:51,873 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-1000/special_tokens_map.json [2024-11-04 00:11:51,883] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved! /public1/home/amzhou/anaconda3/envs/echo/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details. warnings.warn( [2024-11-04 00:11:51,937] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt [2024-11-04 00:11:51,944] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt...
[2024-11-04 00:12:46,155] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt. [2024-11-04 00:12:46,245] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-04 00:14:30,103] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-04 00:14:30,541] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-04 00:14:30,542] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now! 10%|█ | 1001/10000 [3:56:37<188:54:56, 75.57s/it] {'loss': 0.572, 'learning_rate': 4.5010000000000004e-05, 'epoch': 1.31} 10%|█ | 1001/10000 [3:56:37<188:54:56, 75.57s/it] 10%|█ | 1002/10000 [3:56:51<142:34:48, 57.04s/it] {'loss': 0.6195, 'learning_rate': 4.5005e-05, 'epoch': 1.31} 10%|█ | 1002/10000 [3:56:51<142:34:48, 57.04s/it] 10%|█ | 1003/10000 [3:57:05<110:06:01, 44.05s/it] {'loss': 0.5311, 'learning_rate': 4.5e-05, 'epoch': 1.31} 10%|█ | 1003/10000 [3:57:05<110:06:01, 44.05s/it] 10%|█ | 1004/10000 [3:57:18<87:25:16, 34.98s/it] {'loss': 0.613, 'learning_rate': 4.4995000000000005e-05, 'epoch': 1.31} 10%|█ | 1004/10000 [3:57:18<87:25:16, 34.98s/it] 10%|█ | 1005/10000 [3:57:32<71:42:23, 28.70s/it] {'loss': 0.6714, 'learning_rate': 4.499e-05, 'epoch': 1.32} 10%|█ | 1005/10000 [3:57:33<71:42:23, 28.70s/it] 10%|█ | 1006/10000 [3:57:46<60:36:05, 24.26s/it] {'loss': 0.7039, 'learning_rate': 4.4985000000000004e-05, 'epoch': 1.32} 10%|█ | 1006/10000 [3:57:46<60:36:05, 24.26s/it] 10%|█ | 1007/10000 [3:58:00<52:50:09, 21.15s/it] {'loss': 0.5115, 'learning_rate': 4.498e-05, 
'epoch': 1.32} 10%|█ | 1007/10000 [3:58:00<52:50:09, 21.15s/it] 10%|█ | 1008/10000 [3:58:14<47:24:29, 18.98s/it] {'loss': 0.6861, 'learning_rate': 4.4975e-05, 'epoch': 1.32} 10%|█ | 1008/10000 [3:58:14<47:24:29, 18.98s/it] 10%|█ | 1009/10000 [3:58:28<43:41:08, 17.49s/it] {'loss': 0.5338, 'learning_rate': 4.497e-05, 'epoch': 1.32} 10%|█ | 1009/10000 [3:58:28<43:41:08, 17.49s/it] 10%|█ | 1010/10000 [3:58:42<41:01:36, 16.43s/it] {'loss': 0.6671, 'learning_rate': 4.4965e-05, 'epoch': 1.32} 10%|█ | 1010/10000 [3:58:42<41:01:36, 16.43s/it] 10%|█ | 1011/10000 [3:58:56<39:07:40, 15.67s/it] {'loss': 0.7405, 'learning_rate': 4.496e-05, 'epoch': 1.32} 10%|█ | 1011/10000 [3:58:56<39:07:40, 15.67s/it] 10%|█ | 1012/10000 [3:59:10<37:46:57, 15.13s/it] {'loss': 0.8011, 'learning_rate': 4.4955000000000006e-05, 'epoch': 1.32} 10%|█ | 1012/10000 [3:59:10<37:46:57, 15.13s/it] 10%|█ | 1013/10000 [3:59:24<36:56:12, 14.80s/it] {'loss': 0.5703, 'learning_rate': 4.495e-05, 'epoch': 1.33} 10%|█ | 1013/10000 [3:59:24<36:56:12, 14.80s/it] 10%|█ | 1014/10000 [3:59:38<36:19:13, 14.55s/it] {'loss': 0.4499, 'learning_rate': 4.4945000000000004e-05, 'epoch': 1.33} 10%|█ | 1014/10000 [3:59:38<36:19:13, 14.55s/it] 10%|█ | 1015/10000 [3:59:52<35:56:45, 14.40s/it] {'loss': 0.6003, 'learning_rate': 4.494000000000001e-05, 'epoch': 1.33} 10%|█ | 1015/10000 [3:59:52<35:56:45, 14.40s/it] 10%|█ | 1016/10000 [4:00:06<35:38:49, 14.28s/it] {'loss': 0.656, 'learning_rate': 4.4935e-05, 'epoch': 1.33} 10%|█ | 1016/10000 [4:00:06<35:38:49, 14.28s/it] 10%|█ | 1017/10000 [4:00:20<35:22:44, 14.18s/it] {'loss': 0.7386, 'learning_rate': 4.493e-05, 'epoch': 1.33} 10%|█ | 1017/10000 [4:00:20<35:22:44, 14.18s/it] 10%|█ | 1018/10000 [4:00:34<35:08:23, 14.08s/it] {'loss': 0.6519, 'learning_rate': 4.4925e-05, 'epoch': 1.33} 10%|█ | 1018/10000 [4:00:34<35:08:23, 14.08s/it] 10%|█ | 1019/10000 [4:00:48<34:52:20, 13.98s/it] {'loss': 0.603, 'learning_rate': 4.4920000000000004e-05, 'epoch': 1.33} 10%|█ | 1019/10000 
[4:00:48<34:52:20, 13.98s/it] 10%|█ | 1020/10000 [4:01:02<34:52:53, 13.98s/it] {'loss': 0.5307, 'learning_rate': 4.4915e-05, 'epoch': 1.34} 10%|█ | 1020/10000 [4:01:02<34:52:53, 13.98s/it] 10%|█ | 1021/10000 [4:01:15<34:50:42, 13.97s/it] {'loss': 0.6138, 'learning_rate': 4.491e-05, 'epoch': 1.34} 10%|█ | 1021/10000 [4:01:15<34:50:42, 13.97s/it] 10%|█ | 1022/10000 [4:01:29<34:44:45, 13.93s/it] {'loss': 0.6346, 'learning_rate': 4.4905000000000005e-05, 'epoch': 1.34} 10%|█ | 1022/10000 [4:01:29<34:44:45, 13.93s/it] 10%|█ | 1023/10000 [4:01:43<34:46:08, 13.94s/it] {'loss': 0.6764, 'learning_rate': 4.49e-05, 'epoch': 1.34} 10%|█ | 1023/10000 [4:01:43<34:46:08, 13.94s/it] 10%|█ | 1024/10000 [4:01:57<34:53:21, 13.99s/it] {'loss': 0.6808, 'learning_rate': 4.4895e-05, 'epoch': 1.34} 10%|█ | 1024/10000 [4:01:57<34:53:21, 13.99s/it] 10%|█ | 1025/10000 [4:02:11<34:58:08, 14.03s/it] {'loss': 0.5487, 'learning_rate': 4.4890000000000006e-05, 'epoch': 1.34} 10%|█ | 1025/10000 [4:02:12<34:58:08, 14.03s/it] 10%|█ | 1026/10000 [4:02:25<34:56:22, 14.02s/it] {'loss': 0.6566, 'learning_rate': 4.488500000000001e-05, 'epoch': 1.34} 10%|█ | 1026/10000 [4:02:26<34:56:22, 14.02s/it] 10%|█ | 1027/10000 [4:02:39<34:54:03, 14.00s/it] {'loss': 0.5869, 'learning_rate': 4.488e-05, 'epoch': 1.34} 10%|█ | 1027/10000 [4:02:39<34:54:03, 14.00s/it] 10%|█ | 1028/10000 [4:02:53<34:49:04, 13.97s/it] {'loss': 0.5351, 'learning_rate': 4.4875e-05, 'epoch': 1.35} 10%|█ | 1028/10000 [4:02:53<34:49:04, 13.97s/it] 10%|█ | 1029/10000 [4:03:07<34:47:37, 13.96s/it] {'loss': 0.6305, 'learning_rate': 4.487e-05, 'epoch': 1.35} 10%|█ | 1029/10000 [4:03:07<34:47:37, 13.96s/it] 10%|█ | 1030/10000 [4:03:21<34:48:09, 13.97s/it] {'loss': 0.6481, 'learning_rate': 4.4865e-05, 'epoch': 1.35} 10%|█ | 1030/10000 [4:03:21<34:48:09, 13.97s/it] 10%|█ | 1031/10000 [4:03:35<34:54:30, 14.01s/it] {'loss': 0.5292, 'learning_rate': 4.486e-05, 'epoch': 1.35} 10%|█ | 1031/10000 [4:03:35<34:54:30, 14.01s/it] 10%|█ | 1032/10000 
[4:03:49<34:48:06, 13.97s/it] {'loss': 0.5262, 'learning_rate': 4.4855e-05, 'epoch': 1.35} 10%|█ | 1032/10000 [4:03:49<34:48:06, 13.97s/it] 10%|█ | 1033/10000 [4:04:03<34:57:27, 14.03s/it] {'loss': 0.5387, 'learning_rate': 4.4850000000000006e-05, 'epoch': 1.35} 10%|█ | 1033/10000 [4:04:03<34:57:27, 14.03s/it] 10%|█ | 1034/10000 [4:04:17<34:57:32, 14.04s/it] {'loss': 0.5864, 'learning_rate': 4.4845e-05, 'epoch': 1.35} 10%|█ | 1034/10000 [4:04:18<34:57:32, 14.04s/it] 10%|█ | 1035/10000 [4:04:31<34:52:29, 14.00s/it] {'loss': 0.6016, 'learning_rate': 4.4840000000000004e-05, 'epoch': 1.35} 10%|█ | 1035/10000 [4:04:31<34:52:29, 14.00s/it] 10%|█ | 1036/10000 [4:04:45<34:46:21, 13.96s/it] {'loss': 0.5851, 'learning_rate': 4.483500000000001e-05, 'epoch': 1.36} 10%|█ | 1036/10000 [4:04:45<34:46:21, 13.96s/it] 10%|█ | 1037/10000 [4:04:59<34:46:05, 13.96s/it] {'loss': 0.5576, 'learning_rate': 4.483e-05, 'epoch': 1.36} 10%|█ | 1037/10000 [4:04:59<34:46:05, 13.96s/it] 10%|█ | 1038/10000 [4:05:13<34:48:49, 13.98s/it] {'loss': 0.7418, 'learning_rate': 4.4825e-05, 'epoch': 1.36} 10%|█ | 1038/10000 [4:05:13<34:48:49, 13.98s/it] 10%|█ | 1039/10000 [4:05:27<34:44:01, 13.95s/it] {'loss': 0.617, 'learning_rate': 4.482e-05, 'epoch': 1.36} 10%|█ | 1039/10000 [4:05:27<34:44:01, 13.95s/it] 10%|█ | 1040/10000 [4:05:41<34:37:52, 13.91s/it] {'loss': 0.6897, 'learning_rate': 4.4815000000000004e-05, 'epoch': 1.36} 10%|█ | 1040/10000 [4:05:41<34:37:52, 13.91s/it] 10%|█ | 1041/10000 [4:05:55<34:37:19, 13.91s/it] {'loss': 0.6209, 'learning_rate': 4.481e-05, 'epoch': 1.36} 10%|█ | 1041/10000 [4:05:55<34:37:19, 13.91s/it] 10%|█ | 1042/10000 [4:06:09<34:37:20, 13.91s/it] {'loss': 0.5322, 'learning_rate': 4.4805e-05, 'epoch': 1.36} 10%|█ | 1042/10000 [4:06:09<34:37:20, 13.91s/it] 10%|█ | 1043/10000 [4:06:23<34:37:43, 13.92s/it] {'loss': 0.8139, 'learning_rate': 4.4800000000000005e-05, 'epoch': 1.37} 10%|█ | 1043/10000 [4:06:23<34:37:43, 13.92s/it] 10%|█ | 1044/10000 [4:06:37<34:36:52, 13.91s/it] 
{'loss': 0.5749, 'learning_rate': 4.4795e-05, 'epoch': 1.37} 10%|█ | 1044/10000 [4:06:37<34:36:52, 13.91s/it] 10%|█ | 1045/10000 [4:06:51<34:38:42, 13.93s/it] {'loss': 0.6793, 'learning_rate': 4.479e-05, 'epoch': 1.37} 10%|█ | 1045/10000 [4:06:51<34:38:42, 13.93s/it] 10%|█ | 1046/10000 [4:07:05<34:39:22, 13.93s/it] {'loss': 0.6486, 'learning_rate': 4.4785000000000006e-05, 'epoch': 1.37} 10%|█ | 1046/10000 [4:07:05<34:39:22, 13.93s/it] 10%|█ | 1047/10000 [4:07:18<34:38:53, 13.93s/it] {'loss': 0.6467, 'learning_rate': 4.478e-05, 'epoch': 1.37} 10%|█ | 1047/10000 [4:07:18<34:38:53, 13.93s/it] 10%|█ | 1048/10000 [4:07:32<34:36:41, 13.92s/it] {'loss': 0.5946, 'learning_rate': 4.4775e-05, 'epoch': 1.37} 10%|█ | 1048/10000 [4:07:32<34:36:41, 13.92s/it] 10%|█ | 1049/10000 [4:07:46<34:45:05, 13.98s/it] {'loss': 0.5921, 'learning_rate': 4.477e-05, 'epoch': 1.37} 10%|█ | 1049/10000 [4:07:46<34:45:05, 13.98s/it] 10%|█ | 1050/10000 [4:08:00<34:44:28, 13.97s/it] {'loss': 0.653, 'learning_rate': 4.4765e-05, 'epoch': 1.37} 10%|█ | 1050/10000 [4:08:00<34:44:28, 13.97s/it] 11%|█ | 1051/10000 [4:08:15<34:50:47, 14.02s/it] {'loss': 0.6626, 'learning_rate': 4.4760000000000005e-05, 'epoch': 1.38} 11%|█ | 1051/10000 [4:08:15<34:50:47, 14.02s/it] 11%|█ | 1052/10000 [4:08:29<34:47:35, 14.00s/it] {'loss': 0.4525, 'learning_rate': 4.4755e-05, 'epoch': 1.38} 11%|█ | 1052/10000 [4:08:29<34:47:35, 14.00s/it] 11%|█ | 1053/10000 [4:08:43<34:47:47, 14.00s/it] {'loss': 0.619, 'learning_rate': 4.4750000000000004e-05, 'epoch': 1.38} 11%|█ | 1053/10000 [4:08:43<34:47:47, 14.00s/it] 11%|█ | 1054/10000 [4:08:56<34:41:15, 13.96s/it] {'loss': 0.7734, 'learning_rate': 4.4745000000000006e-05, 'epoch': 1.38} 11%|█ | 1054/10000 [4:08:56<34:41:15, 13.96s/it] 11%|█ | 1055/10000 [4:09:10<34:42:08, 13.97s/it] {'loss': 0.6881, 'learning_rate': 4.474e-05, 'epoch': 1.38} 11%|█ | 1055/10000 [4:09:10<34:42:08, 13.97s/it] 11%|█ | 1056/10000 [4:09:24<34:41:39, 13.96s/it] {'loss': 0.7872, 'learning_rate': 
4.4735000000000005e-05, 'epoch': 1.38} 11%|█ | 1056/10000 [4:09:24<34:41:39, 13.96s/it] 11%|█ | 1057/10000 [4:09:38<34:46:43, 14.00s/it] {'loss': 0.6717, 'learning_rate': 4.473e-05, 'epoch': 1.38} 11%|█ | 1057/10000 [4:09:38<34:46:43, 14.00s/it] 11%|█ | 1058/10000 [4:09:52<34:46:39, 14.00s/it] {'loss': 0.4917, 'learning_rate': 4.4725e-05, 'epoch': 1.38} 11%|█ | 1058/10000 [4:09:52<34:46:39, 14.00s/it] 11%|█ | 1059/10000 [4:10:06<34:40:16, 13.96s/it] {'loss': 0.827, 'learning_rate': 4.472e-05, 'epoch': 1.39} 11%|█ | 1059/10000 [4:10:06<34:40:16, 13.96s/it] 11%|█ | 1060/10000 [4:10:20<34:46:30, 14.00s/it] {'loss': 0.6613, 'learning_rate': 4.4715e-05, 'epoch': 1.39} 11%|█ | 1060/10000 [4:10:20<34:46:30, 14.00s/it] 11%|█ | 1061/10000 [4:10:34<34:41:30, 13.97s/it] {'loss': 0.5802, 'learning_rate': 4.4710000000000004e-05, 'epoch': 1.39} 11%|█ | 1061/10000 [4:10:34<34:41:30, 13.97s/it] 11%|█ | 1062/10000 [4:10:48<34:43:16, 13.98s/it] {'loss': 0.6985, 'learning_rate': 4.4705e-05, 'epoch': 1.39} 11%|█ | 1062/10000 [4:10:48<34:43:16, 13.98s/it] 11%|█ | 1063/10000 [4:11:02<34:36:45, 13.94s/it] {'loss': 0.54, 'learning_rate': 4.47e-05, 'epoch': 1.39} 11%|█ | 1063/10000 [4:11:02<34:36:45, 13.94s/it] 11%|█ | 1064/10000 [4:11:16<34:36:28, 13.94s/it] {'loss': 0.753, 'learning_rate': 4.4695000000000005e-05, 'epoch': 1.39} 11%|█ | 1064/10000 [4:11:16<34:36:28, 13.94s/it] 11%|█ | 1065/10000 [4:11:30<34:44:02, 13.99s/it] {'loss': 0.511, 'learning_rate': 4.469e-05, 'epoch': 1.39} 11%|█ | 1065/10000 [4:11:30<34:44:02, 13.99s/it] 11%|█ | 1066/10000 [4:11:44<34:44:57, 14.00s/it] {'loss': 0.7011, 'learning_rate': 4.4685e-05, 'epoch': 1.4} 11%|█ | 1066/10000 [4:11:44<34:44:57, 14.00s/it] 11%|█ | 1067/10000 [4:11:58<34:43:11, 13.99s/it] {'loss': 0.5172, 'learning_rate': 4.468e-05, 'epoch': 1.4} 11%|█ | 1067/10000 [4:11:58<34:43:11, 13.99s/it] 11%|█ | 1068/10000 [4:12:12<34:41:30, 13.98s/it] {'loss': 0.7054, 'learning_rate': 4.4675e-05, 'epoch': 1.4} 11%|█ | 1068/10000 [4:12:12<34:41:30, 
13.98s/it] 11%|█ | 1069/10000 [4:12:26<34:36:57, 13.95s/it] {'loss': 0.6131, 'learning_rate': 4.467e-05, 'epoch': 1.4} 11%|█ | 1069/10000 [4:12:26<34:36:57, 13.95s/it] 11%|█ | 1070/10000 [4:12:40<34:40:41, 13.98s/it] {'loss': 0.7326, 'learning_rate': 4.4665e-05, 'epoch': 1.4} 11%|█ | 1070/10000 [4:12:40<34:40:41, 13.98s/it] 11%|█ | 1071/10000 [4:12:54<34:40:28, 13.98s/it] {'loss': 0.8679, 'learning_rate': 4.466e-05, 'epoch': 1.4} 11%|█ | 1071/10000 [4:12:54<34:40:28, 13.98s/it] 11%|█ | 1072/10000 [4:13:08<34:34:09, 13.94s/it] {'loss': 0.5562, 'learning_rate': 4.4655000000000005e-05, 'epoch': 1.4} 11%|█ | 1072/10000 [4:13:08<34:34:09, 13.94s/it] 11%|█ | 1073/10000 [4:13:22<34:28:44, 13.90s/it] {'loss': 0.6029, 'learning_rate': 4.465e-05, 'epoch': 1.4} 11%|█ | 1073/10000 [4:13:22<34:28:44, 13.90s/it] 11%|█ | 1074/10000 [4:13:36<34:27:59, 13.90s/it] {'loss': 0.6284, 'learning_rate': 4.4645000000000004e-05, 'epoch': 1.41} 11%|█ | 1074/10000 [4:13:36<34:27:59, 13.90s/it] 11%|█ | 1075/10000 [4:13:50<34:29:16, 13.91s/it] {'loss': 0.4807, 'learning_rate': 4.4640000000000006e-05, 'epoch': 1.41} 11%|█ | 1075/10000 [4:13:50<34:29:16, 13.91s/it] 11%|█ | 1076/10000 [4:14:04<34:37:30, 13.97s/it] {'loss': 0.7166, 'learning_rate': 4.4635e-05, 'epoch': 1.41} 11%|█ | 1076/10000 [4:14:04<34:37:30, 13.97s/it] 11%|█ | 1077/10000 [4:14:18<34:35:37, 13.96s/it] {'loss': 0.5568, 'learning_rate': 4.463e-05, 'epoch': 1.41} 11%|█ | 1077/10000 [4:14:18<34:35:37, 13.96s/it] 11%|█ | 1078/10000 [4:14:32<34:36:22, 13.96s/it] {'loss': 0.5538, 'learning_rate': 4.4625e-05, 'epoch': 1.41} 11%|█ | 1078/10000 [4:14:32<34:36:22, 13.96s/it] 11%|█ | 1079/10000 [4:14:46<34:43:13, 14.01s/it] {'loss': 0.6075, 'learning_rate': 4.462e-05, 'epoch': 1.41} 11%|█ | 1079/10000 [4:14:46<34:43:13, 14.01s/it] 11%|█ | 1080/10000 [4:15:00<34:42:19, 14.01s/it] {'loss': 0.5625, 'learning_rate': 4.4615e-05, 'epoch': 1.41} 11%|█ | 1080/10000 [4:15:00<34:42:19, 14.01s/it] 11%|█ | 1081/10000 [4:15:14<34:44:47, 14.02s/it] 
{'loss': 0.596, 'learning_rate': 4.461e-05, 'epoch': 1.41} 11%|█ | 1081/10000 [4:15:14<34:44:47, 14.02s/it] 11%|█ | 1082/10000 [4:15:28<34:49:47, 14.06s/it] {'loss': 0.6121, 'learning_rate': 4.4605000000000004e-05, 'epoch': 1.42} 11%|█ | 1082/10000 [4:15:28<34:49:47, 14.06s/it] 11%|█ | 1083/10000 [4:15:42<34:43:15, 14.02s/it] {'loss': 0.5124, 'learning_rate': 4.46e-05, 'epoch': 1.42} 11%|█ | 1083/10000 [4:15:42<34:43:15, 14.02s/it] 11%|█ | 1084/10000 [4:15:56<34:43:59, 14.02s/it] {'loss': 0.5353, 'learning_rate': 4.4595e-05, 'epoch': 1.42} 11%|█ | 1084/10000 [4:15:56<34:43:59, 14.02s/it] 11%|█ | 1085/10000 [4:16:10<34:39:42, 14.00s/it] {'loss': 0.7318, 'learning_rate': 4.4590000000000005e-05, 'epoch': 1.42} 11%|█ | 1085/10000 [4:16:10<34:39:42, 14.00s/it] 11%|█ | 1086/10000 [4:16:24<34:36:57, 13.98s/it] {'loss': 0.5736, 'learning_rate': 4.458500000000001e-05, 'epoch': 1.42} 11%|█ | 1086/10000 [4:16:24<34:36:57, 13.98s/it] 11%|█ | 1087/10000 [4:16:38<34:29:52, 13.93s/it] {'loss': 0.7325, 'learning_rate': 4.458e-05, 'epoch': 1.42} 11%|█ | 1087/10000 [4:16:38<34:29:52, 13.93s/it] 11%|█ | 1088/10000 [4:16:51<34:30:27, 13.94s/it] {'loss': 0.6344, 'learning_rate': 4.4575e-05, 'epoch': 1.42} 11%|█ | 1088/10000 [4:16:52<34:30:27, 13.94s/it] 11%|█ | 1089/10000 [4:17:06<34:45:04, 14.04s/it] {'loss': 0.5642, 'learning_rate': 4.457e-05, 'epoch': 1.43} 11%|█ | 1089/10000 [4:17:06<34:45:04, 14.04s/it] 11%|█ | 1090/10000 [4:17:20<34:31:50, 13.95s/it] {'loss': 0.4136, 'learning_rate': 4.4565000000000004e-05, 'epoch': 1.43} 11%|█ | 1090/10000 [4:17:20<34:31:50, 13.95s/it] 11%|█ | 1091/10000 [4:17:33<34:30:25, 13.94s/it] {'loss': 0.9219, 'learning_rate': 4.456e-05, 'epoch': 1.43} 11%|█ | 1091/10000 [4:17:33<34:30:25, 13.94s/it] 11%|█ | 1092/10000 [4:17:47<34:27:06, 13.92s/it] {'loss': 0.5998, 'learning_rate': 4.4555e-05, 'epoch': 1.43} 11%|█ | 1092/10000 [4:17:47<34:27:06, 13.92s/it] 11%|█ | 1093/10000 [4:18:01<34:28:09, 13.93s/it] {'loss': 0.5999, 'learning_rate': 
4.4550000000000005e-05, 'epoch': 1.43} 11%|█ | 1093/10000 [4:18:01<34:28:09, 13.93s/it] 11%|█ | 1094/10000 [4:18:15<34:24:37, 13.91s/it] {'loss': 0.7621, 'learning_rate': 4.4545e-05, 'epoch': 1.43} 11%|█ | 1094/10000 [4:18:15<34:24:37, 13.91s/it] 11%|█ | 1095/10000 [4:18:29<34:26:26, 13.92s/it] {'loss': 0.6213, 'learning_rate': 4.4540000000000004e-05, 'epoch': 1.43} 11%|█ | 1095/10000 [4:18:29<34:26:26, 13.92s/it] 11%|█ | 1096/10000 [4:18:43<34:31:54, 13.96s/it] {'loss': 0.6006, 'learning_rate': 4.4535000000000006e-05, 'epoch': 1.43} 11%|█ | 1096/10000 [4:18:43<34:31:54, 13.96s/it] 11%|█ | 1097/10000 [4:18:57<34:39:47, 14.02s/it] {'loss': 0.4961, 'learning_rate': 4.453e-05, 'epoch': 1.44} 11%|█ | 1097/10000 [4:18:57<34:39:47, 14.02s/it] 11%|█ | 1098/10000 [4:19:11<34:40:39, 14.02s/it] {'loss': 0.5485, 'learning_rate': 4.4525e-05, 'epoch': 1.44} 11%|█ | 1098/10000 [4:19:11<34:40:39, 14.02s/it] 11%|█ | 1099/10000 [4:19:25<34:36:03, 13.99s/it] {'loss': 0.5146, 'learning_rate': 4.452e-05, 'epoch': 1.44} 11%|█ | 1099/10000 [4:19:25<34:36:03, 13.99s/it] 11%|█ | 1100/10000 [4:19:39<34:32:42, 13.97s/it] {'loss': 0.6867, 'learning_rate': 4.4515e-05, 'epoch': 1.44} 11%|█ | 1100/10000 [4:19:39<34:32:42, 13.97s/it] 11%|█ | 1101/10000 [4:19:53<34:26:13, 13.93s/it] {'loss': 0.5842, 'learning_rate': 4.451e-05, 'epoch': 1.44} 11%|█ | 1101/10000 [4:19:53<34:26:13, 13.93s/it] 11%|█ | 1102/10000 [4:20:07<34:25:13, 13.93s/it] {'loss': 0.5015, 'learning_rate': 4.4505e-05, 'epoch': 1.44} 11%|█ | 1102/10000 [4:20:07<34:25:13, 13.93s/it] 11%|█ | 1103/10000 [4:20:21<34:22:08, 13.91s/it] {'loss': 0.7086, 'learning_rate': 4.4500000000000004e-05, 'epoch': 1.44} 11%|█ | 1103/10000 [4:20:21<34:22:08, 13.91s/it] 11%|█ | 1104/10000 [4:20:35<34:20:26, 13.90s/it] {'loss': 0.6456, 'learning_rate': 4.4495e-05, 'epoch': 1.45} 11%|█ | 1104/10000 [4:20:35<34:20:26, 13.90s/it] 11%|█ | 1105/10000 [4:20:49<34:22:31, 13.91s/it] {'loss': 0.7506, 'learning_rate': 4.449e-05, 'epoch': 1.45} 11%|█ | 1105/10000 
[4:20:49<34:22:31, 13.91s/it] 11%|█ | 1106/10000 [4:21:03<34:22:25, 13.91s/it] {'loss': 0.6325, 'learning_rate': 4.4485000000000005e-05, 'epoch': 1.45} 11%|█ | 1106/10000 [4:21:03<34:22:25, 13.91s/it] 11%|█ | 1107/10000 [4:21:17<34:26:42, 13.94s/it] {'loss': 0.5829, 'learning_rate': 4.448e-05, 'epoch': 1.45} 11%|█ | 1107/10000 [4:21:17<34:26:42, 13.94s/it] 11%|█ | 1108/10000 [4:21:30<34:25:12, 13.94s/it] {'loss': 0.9496, 'learning_rate': 4.4475e-05, 'epoch': 1.45} 11%|█ | 1108/10000 [4:21:30<34:25:12, 13.94s/it] 11%|█ | 1109/10000 [4:21:44<34:23:13, 13.92s/it] {'loss': 0.6981, 'learning_rate': 4.447e-05, 'epoch': 1.45} 11%|█ | 1109/10000 [4:21:44<34:23:13, 13.92s/it] 11%|█ | 1110/10000 [4:21:58<34:28:12, 13.96s/it] {'loss': 0.6571, 'learning_rate': 4.4465e-05, 'epoch': 1.45} 11%|█ | 1110/10000 [4:21:58<34:28:12, 13.96s/it] 11%|█ | 1111/10000 [4:22:12<34:25:21, 13.94s/it] {'loss': 0.7308, 'learning_rate': 4.4460000000000005e-05, 'epoch': 1.45} 11%|█ | 1111/10000 [4:22:12<34:25:21, 13.94s/it] 11%|█ | 1112/10000 [4:22:26<34:22:46, 13.93s/it] {'loss': 0.6724, 'learning_rate': 4.4455e-05, 'epoch': 1.46} 11%|█ | 1112/10000 [4:22:26<34:22:46, 13.93s/it] 11%|█ | 1113/10000 [4:22:40<34:22:48, 13.93s/it] {'loss': 0.6601, 'learning_rate': 4.445e-05, 'epoch': 1.46} 11%|█ | 1113/10000 [4:22:40<34:22:48, 13.93s/it] 11%|█ | 1114/10000 [4:22:54<34:18:50, 13.90s/it] {'loss': 0.5423, 'learning_rate': 4.4445000000000006e-05, 'epoch': 1.46} 11%|█ | 1114/10000 [4:22:54<34:18:50, 13.90s/it] 11%|█ | 1115/10000 [4:23:08<34:23:55, 13.94s/it] {'loss': 0.6329, 'learning_rate': 4.444e-05, 'epoch': 1.46} 11%|█ | 1115/10000 [4:23:08<34:23:55, 13.94s/it] 11%|█ | 1116/10000 [4:23:22<34:27:49, 13.97s/it] {'loss': 0.6298, 'learning_rate': 4.4435000000000004e-05, 'epoch': 1.46} 11%|█ | 1116/10000 [4:23:22<34:27:49, 13.97s/it] 11%|█ | 1117/10000 [4:23:36<34:28:58, 13.97s/it] {'loss': 0.6714, 'learning_rate': 4.443e-05, 'epoch': 1.46} 11%|█ | 1117/10000 [4:23:36<34:28:58, 13.97s/it] 11%|█ | 1118/10000 
[4:23:50<34:32:33, 14.00s/it] {'loss': 0.5593, 'learning_rate': 4.4425e-05, 'epoch': 1.46} 11%|█ | 1118/10000 [4:23:50<34:32:33, 14.00s/it] 11%|█ | 1119/10000 [4:24:04<34:30:23, 13.99s/it] {'loss': 0.5731, 'learning_rate': 4.442e-05, 'epoch': 1.46} 11%|█ | 1119/10000 [4:24:04<34:30:23, 13.99s/it] 11%|█ | 1120/10000 [4:24:18<34:26:12, 13.96s/it] {'loss': 0.6683, 'learning_rate': 4.4415e-05, 'epoch': 1.47} 11%|█ | 1120/10000 [4:24:18<34:26:12, 13.96s/it] 11%|█ | 1121/10000 [4:24:32<34:21:34, 13.93s/it] {'loss': 0.6129, 'learning_rate': 4.4410000000000003e-05, 'epoch': 1.47} 11%|█ | 1121/10000 [4:24:32<34:21:34, 13.93s/it] 11%|█ | 1122/10000 [4:24:46<34:18:18, 13.91s/it] {'loss': 0.574, 'learning_rate': 4.4405e-05, 'epoch': 1.47} 11%|█ | 1122/10000 [4:24:46<34:18:18, 13.91s/it] 11%|█ | 1123/10000 [4:25:00<34:19:52, 13.92s/it] {'loss': 0.7732, 'learning_rate': 4.44e-05, 'epoch': 1.47} 11%|█ | 1123/10000 [4:25:00<34:19:52, 13.92s/it] 11%|█ | 1124/10000 [4:25:13<34:17:48, 13.91s/it] {'loss': 0.5939, 'learning_rate': 4.4395000000000004e-05, 'epoch': 1.47} 11%|█ | 1124/10000 [4:25:13<34:17:48, 13.91s/it] 11%|█▏ | 1125/10000 [4:25:27<34:16:30, 13.90s/it] {'loss': 0.5883, 'learning_rate': 4.439000000000001e-05, 'epoch': 1.47} 11%|█▏ | 1125/10000 [4:25:27<34:16:30, 13.90s/it] 11%|█▏ | 1126/10000 [4:25:41<34:17:46, 13.91s/it] {'loss': 0.6137, 'learning_rate': 4.4385e-05, 'epoch': 1.47} 11%|█▏ | 1126/10000 [4:25:41<34:17:46, 13.91s/it] 11%|█▏ | 1127/10000 [4:25:55<34:18:03, 13.92s/it] {'loss': 0.6114, 'learning_rate': 4.438e-05, 'epoch': 1.48} 11%|█▏ | 1127/10000 [4:25:55<34:18:03, 13.92s/it] 11%|█▏ | 1128/10000 [4:26:09<34:20:59, 13.94s/it] {'loss': 0.7731, 'learning_rate': 4.4375e-05, 'epoch': 1.48} 11%|█▏ | 1128/10000 [4:26:09<34:20:59, 13.94s/it] 11%|█▏ | 1129/10000 [4:26:23<34:17:35, 13.92s/it] {'loss': 0.7335, 'learning_rate': 4.4370000000000004e-05, 'epoch': 1.48} 11%|█▏ | 1129/10000 [4:26:23<34:17:35, 13.92s/it] 11%|█▏ | 1130/10000 [4:26:37<34:23:44, 13.96s/it] {'loss': 
0.594, 'learning_rate': 4.4365e-05, 'epoch': 1.48} 11%|█▏ | 1130/10000 [4:26:37<34:23:44, 13.96s/it] 11%|█▏ | 1131/10000 [4:26:51<34:19:13, 13.93s/it] {'loss': 0.606, 'learning_rate': 4.436e-05, 'epoch': 1.48} 11%|█▏ | 1131/10000 [4:26:51<34:19:13, 13.93s/it] 11%|█▏ | 1132/10000 [4:27:05<34:24:10, 13.97s/it] {'loss': 0.5592, 'learning_rate': 4.4355000000000005e-05, 'epoch': 1.48} 11%|█▏ | 1132/10000 [4:27:05<34:24:10, 13.97s/it] 11%|█▏ | 1133/10000 [4:27:19<34:28:17, 14.00s/it] {'loss': 0.5432, 'learning_rate': 4.435e-05, 'epoch': 1.48} 11%|█▏ | 1133/10000 [4:27:19<34:28:17, 14.00s/it] 11%|█▏ | 1134/10000 [4:27:33<34:35:22, 14.04s/it] {'loss': 0.6055, 'learning_rate': 4.4345e-05, 'epoch': 1.48} 11%|█▏ | 1134/10000 [4:27:33<34:35:22, 14.04s/it] 11%|█▏ | 1135/10000 [4:27:47<34:31:55, 14.02s/it] {'loss': 0.5841, 'learning_rate': 4.4340000000000006e-05, 'epoch': 1.49} 11%|█▏ | 1135/10000 [4:27:47<34:31:55, 14.02s/it] 11%|█▏ | 1136/10000 [4:28:01<34:28:30, 14.00s/it] {'loss': 0.61, 'learning_rate': 4.4335e-05, 'epoch': 1.49} 11%|█▏ | 1136/10000 [4:28:01<34:28:30, 14.00s/it] 11%|█▏ | 1137/10000 [4:28:15<34:24:20, 13.97s/it] {'loss': 0.5798, 'learning_rate': 4.4330000000000004e-05, 'epoch': 1.49} 11%|█▏ | 1137/10000 [4:28:15<34:24:20, 13.97s/it] 11%|█▏ | 1138/10000 [4:28:29<34:20:06, 13.95s/it] {'loss': 0.6405, 'learning_rate': 4.4325e-05, 'epoch': 1.49} 11%|█▏ | 1138/10000 [4:28:29<34:20:06, 13.95s/it] 11%|█▏ | 1139/10000 [4:28:43<34:21:58, 13.96s/it] {'loss': 0.5578, 'learning_rate': 4.432e-05, 'epoch': 1.49} 11%|█▏ | 1139/10000 [4:28:43<34:21:58, 13.96s/it] 11%|█▏ | 1140/10000 [4:28:57<34:23:06, 13.97s/it] {'loss': 0.61, 'learning_rate': 4.4315e-05, 'epoch': 1.49} 11%|█▏ | 1140/10000 [4:28:57<34:23:06, 13.97s/it] 11%|█▏ | 1141/10000 [4:29:11<34:24:46, 13.98s/it] {'loss': 0.6324, 'learning_rate': 4.431e-05, 'epoch': 1.49} 11%|█▏ | 1141/10000 [4:29:11<34:24:46, 13.98s/it] 11%|█▏ | 1142/10000 [4:29:25<34:18:16, 13.94s/it] {'loss': 0.8244, 'learning_rate': 
4.4305000000000004e-05, 'epoch': 1.49} 11%|█▏ | 1142/10000 [4:29:25<34:18:16, 13.94s/it] 11%|█▏ | 1143/10000 [4:29:39<34:22:59, 13.98s/it] {'loss': 0.5885, 'learning_rate': 4.43e-05, 'epoch': 1.5} 11%|█▏ | 1143/10000 [4:29:39<34:22:59, 13.98s/it] 11%|█▏ | 1144/10000 [4:29:53<34:17:42, 13.94s/it] {'loss': 0.8334, 'learning_rate': 4.4295e-05, 'epoch': 1.5} 11%|█▏ | 1144/10000 [4:29:53<34:17:42, 13.94s/it] 11%|█▏ | 1145/10000 [4:30:07<34:19:08, 13.95s/it] {'loss': 0.6452, 'learning_rate': 4.4290000000000005e-05, 'epoch': 1.5} 11%|█▏ | 1145/10000 [4:30:07<34:19:08, 13.95s/it] 11%|█▏ | 1146/10000 [4:30:21<34:23:25, 13.98s/it] {'loss': 0.653, 'learning_rate': 4.428500000000001e-05, 'epoch': 1.5} 11%|█▏ | 1146/10000 [4:30:21<34:23:25, 13.98s/it] 11%|█▏ | 1147/10000 [4:30:35<34:13:04, 13.91s/it] {'loss': 0.6787, 'learning_rate': 4.428e-05, 'epoch': 1.5} 11%|█▏ | 1147/10000 [4:30:35<34:13:04, 13.91s/it] 11%|█▏ | 1148/10000 [4:30:49<34:16:52, 13.94s/it] {'loss': 0.6612, 'learning_rate': 4.4275e-05, 'epoch': 1.5} 11%|█▏ | 1148/10000 [4:30:49<34:16:52, 13.94s/it] 11%|█▏ | 1149/10000 [4:31:03<34:19:16, 13.96s/it] {'loss': 0.6126, 'learning_rate': 4.427e-05, 'epoch': 1.5} 11%|█▏ | 1149/10000 [4:31:03<34:19:16, 13.96s/it] 12%|█▏ | 1150/10000 [4:31:17<34:26:32, 14.01s/it] {'loss': 0.5891, 'learning_rate': 4.4265000000000004e-05, 'epoch': 1.51} 12%|█▏ | 1150/10000 [4:31:17<34:26:32, 14.01s/it] 12%|█▏ | 1151/10000 [4:31:31<34:24:26, 14.00s/it] {'loss': 0.6445, 'learning_rate': 4.426e-05, 'epoch': 1.51} 12%|█▏ | 1151/10000 [4:31:31<34:24:26, 14.00s/it] 12%|█▏ | 1152/10000 [4:31:45<34:31:11, 14.05s/it] {'loss': 0.7164, 'learning_rate': 4.4255e-05, 'epoch': 1.51} 12%|█▏ | 1152/10000 [4:31:45<34:31:11, 14.05s/it] 12%|█▏ | 1153/10000 [4:31:59<34:31:23, 14.05s/it] {'loss': 0.7443, 'learning_rate': 4.4250000000000005e-05, 'epoch': 1.51} 12%|█▏ | 1153/10000 [4:31:59<34:31:23, 14.05s/it] 12%|█▏ | 1154/10000 [4:32:13<34:26:41, 14.02s/it] {'loss': 0.5169, 'learning_rate': 4.4245e-05, 'epoch': 
1.51} 12%|█▏ | 1154/10000 [4:32:13<34:26:41, 14.02s/it] 12%|█▏ | 1155/10000 [4:32:27<34:22:53, 13.99s/it] {'loss': 0.6542, 'learning_rate': 4.424e-05, 'epoch': 1.51} 12%|█▏ | 1155/10000 [4:32:27<34:22:53, 13.99s/it] 12%|█▏ | 1156/10000 [4:32:41<34:19:11, 13.97s/it] {'loss': 0.437, 'learning_rate': 4.4235000000000006e-05, 'epoch': 1.51} 12%|█▏ | 1156/10000 [4:32:41<34:19:11, 13.97s/it] 12%|█▏ | 1157/10000 [4:32:55<34:23:22, 14.00s/it] {'loss': 0.5236, 'learning_rate': 4.423e-05, 'epoch': 1.51} 12%|█▏ | 1157/10000 [4:32:55<34:23:22, 14.00s/it] 12%|█▏ | 1158/10000 [4:33:09<34:19:22, 13.97s/it] {'loss': 0.5756, 'learning_rate': 4.4225e-05, 'epoch': 1.52} 12%|█▏ | 1158/10000 [4:33:09<34:19:22, 13.97s/it] 12%|█▏ | 1159/10000 [4:33:23<34:16:00, 13.95s/it] {'loss': 0.6014, 'learning_rate': 4.422e-05, 'epoch': 1.52} 12%|█▏ | 1159/10000 [4:33:23<34:16:00, 13.95s/it] 12%|█▏ | 1160/10000 [4:33:37<34:23:09, 14.00s/it] {'loss': 0.6563, 'learning_rate': 4.4215e-05, 'epoch': 1.52} 12%|█▏ | 1160/10000 [4:33:37<34:23:09, 14.00s/it] 12%|█▏ | 1161/10000 [4:33:50<34:15:53, 13.96s/it] {'loss': 0.7307, 'learning_rate': 4.421e-05, 'epoch': 1.52} 12%|█▏ | 1161/10000 [4:33:51<34:15:53, 13.96s/it] 12%|█▏ | 1162/10000 [4:34:04<34:16:11, 13.96s/it] {'loss': 0.5545, 'learning_rate': 4.4205e-05, 'epoch': 1.52} 12%|█▏ | 1162/10000 [4:34:05<34:16:11, 13.96s/it] 12%|█▏ | 1163/10000 [4:34:18<34:15:27, 13.96s/it] {'loss': 0.4994, 'learning_rate': 4.4200000000000004e-05, 'epoch': 1.52} 12%|█▏ | 1163/10000 [4:34:18<34:15:27, 13.96s/it] 12%|█▏ | 1164/10000 [4:34:32<34:21:14, 14.00s/it] {'loss': 0.787, 'learning_rate': 4.4195000000000006e-05, 'epoch': 1.52} 12%|█▏ | 1164/10000 [4:34:33<34:21:14, 14.00s/it] 12%|█▏ | 1165/10000 [4:34:46<34:17:26, 13.97s/it] {'loss': 0.7274, 'learning_rate': 4.419e-05, 'epoch': 1.52} 12%|█▏ | 1165/10000 [4:34:46<34:17:26, 13.97s/it] 12%|█▏ | 1166/10000 [4:35:00<34:22:08, 14.01s/it] {'loss': 0.6336, 'learning_rate': 4.4185000000000005e-05, 'epoch': 1.53} 12%|█▏ | 1166/10000 
[4:35:01<34:22:08, 14.01s/it] 12%|█▏ | 1167/10000 [4:35:14<34:13:16, 13.95s/it] {'loss': 0.5211, 'learning_rate': 4.418000000000001e-05, 'epoch': 1.53} 12%|█▏ | 1167/10000 [4:35:14<34:13:16, 13.95s/it] 12%|█▏ | 1168/10000 [4:35:28<34:12:25, 13.94s/it] {'loss': 0.7553, 'learning_rate': 4.4174999999999996e-05, 'epoch': 1.53} 12%|█▏ | 1168/10000 [4:35:28<34:12:25, 13.94s/it] 12%|█▏ | 1169/10000 [4:35:42<34:14:10, 13.96s/it] {'loss': 0.5703, 'learning_rate': 4.417e-05, 'epoch': 1.53} 12%|█▏ | 1169/10000 [4:35:42<34:14:10, 13.96s/it] 12%|█▏ | 1170/10000 [4:35:56<34:13:23, 13.95s/it] {'loss': 0.5732, 'learning_rate': 4.4165e-05, 'epoch': 1.53} 12%|█▏ | 1170/10000 [4:35:56<34:13:23, 13.95s/it] 12%|█▏ | 1171/10000 [4:36:10<34:09:13, 13.93s/it] {'loss': 0.5925, 'learning_rate': 4.4160000000000004e-05, 'epoch': 1.53} 12%|█▏ | 1171/10000 [4:36:10<34:09:13, 13.93s/it] 12%|█▏ | 1172/10000 [4:36:24<34:07:35, 13.92s/it] {'loss': 0.6006, 'learning_rate': 4.4155e-05, 'epoch': 1.53} 12%|█▏ | 1172/10000 [4:36:24<34:07:35, 13.92s/it] 12%|█▏ | 1173/10000 [4:36:38<34:05:29, 13.90s/it] {'loss': 0.8236, 'learning_rate': 4.415e-05, 'epoch': 1.54} 12%|█▏ | 1173/10000 [4:36:38<34:05:29, 13.90s/it] 12%|█▏ | 1174/10000 [4:36:52<34:03:29, 13.89s/it] {'loss': 0.5282, 'learning_rate': 4.4145000000000005e-05, 'epoch': 1.54} 12%|█▏ | 1174/10000 [4:36:52<34:03:29, 13.89s/it] 12%|█▏ | 1175/10000 [4:37:06<34:00:32, 13.87s/it] {'loss': 0.7734, 'learning_rate': 4.414e-05, 'epoch': 1.54} 12%|█▏ | 1175/10000 [4:37:06<34:00:32, 13.87s/it] 12%|█▏ | 1176/10000 [4:37:19<34:01:07, 13.88s/it] {'loss': 0.6717, 'learning_rate': 4.4135000000000003e-05, 'epoch': 1.54} 12%|█▏ | 1176/10000 [4:37:19<34:01:07, 13.88s/it] 12%|█▏ | 1177/10000 [4:37:33<34:05:12, 13.91s/it] {'loss': 0.7127, 'learning_rate': 4.4130000000000006e-05, 'epoch': 1.54} 12%|█▏ | 1177/10000 [4:37:33<34:05:12, 13.91s/it] 12%|█▏ | 1178/10000 [4:37:47<34:03:19, 13.90s/it] {'loss': 0.6315, 'learning_rate': 4.4125e-05, 'epoch': 1.54} 12%|█▏ | 1178/10000 
[4:37:47<34:03:19, 13.90s/it] 12%|█▏ | 1179/10000 [4:38:01<34:04:23, 13.91s/it] {'loss': 0.6975, 'learning_rate': 4.412e-05, 'epoch': 1.54} 12%|█▏ | 1179/10000 [4:38:01<34:04:23, 13.91s/it] 12%|█▏ | 1180/10000 [4:38:15<34:17:10, 13.99s/it] {'loss': 0.6541, 'learning_rate': 4.4115e-05, 'epoch': 1.54} 12%|█▏ | 1180/10000 [4:38:15<34:17:10, 13.99s/it] 12%|█▏ | 1181/10000 [4:38:29<34:21:43, 14.03s/it] {'loss': 0.5614, 'learning_rate': 4.411e-05, 'epoch': 1.55} 12%|█▏ | 1181/10000 [4:38:30<34:21:43, 14.03s/it] 12%|█▏ | 1182/10000 [4:38:43<34:20:03, 14.02s/it] {'loss': 0.6569, 'learning_rate': 4.4105e-05, 'epoch': 1.55} 12%|█▏ | 1182/10000 [4:38:44<34:20:03, 14.02s/it] 12%|█▏ | 1183/10000 [4:38:57<34:11:05, 13.96s/it] {'loss': 0.7225, 'learning_rate': 4.41e-05, 'epoch': 1.55} 12%|█▏ | 1183/10000 [4:38:57<34:11:05, 13.96s/it] 12%|█▏ | 1184/10000 [4:39:11<34:17:02, 14.00s/it] {'loss': 0.5399, 'learning_rate': 4.4095000000000004e-05, 'epoch': 1.55} 12%|█▏ | 1184/10000 [4:39:11<34:17:02, 14.00s/it] 12%|█▏ | 1185/10000 [4:39:25<34:13:29, 13.98s/it] {'loss': 0.559, 'learning_rate': 4.4090000000000006e-05, 'epoch': 1.55} 12%|█▏ | 1185/10000 [4:39:25<34:13:29, 13.98s/it] 12%|█▏ | 1186/10000 [4:39:39<34:11:27, 13.97s/it] {'loss': 0.5803, 'learning_rate': 4.4085e-05, 'epoch': 1.55} 12%|█▏ | 1186/10000 [4:39:39<34:11:27, 13.97s/it] 12%|█▏ | 1187/10000 [4:39:53<34:07:40, 13.94s/it] {'loss': 0.6493, 'learning_rate': 4.4080000000000005e-05, 'epoch': 1.55} 12%|█▏ | 1187/10000 [4:39:53<34:07:40, 13.94s/it] 12%|█▏ | 1188/10000 [4:40:07<34:12:18, 13.97s/it] {'loss': 0.5793, 'learning_rate': 4.4075e-05, 'epoch': 1.55} 12%|█▏ | 1188/10000 [4:40:07<34:12:18, 13.97s/it] 12%|█▏ | 1189/10000 [4:40:21<34:09:19, 13.96s/it] {'loss': 0.63, 'learning_rate': 4.407e-05, 'epoch': 1.56} 12%|█▏ | 1189/10000 [4:40:21<34:09:19, 13.96s/it] 12%|█▏ | 1190/10000 [4:40:35<34:06:08, 13.94s/it] {'loss': 0.5882, 'learning_rate': 4.4065e-05, 'epoch': 1.56} 12%|█▏ | 1190/10000 [4:40:35<34:06:08, 13.94s/it] 12%|█▏ | 
1191/10000 [4:40:49<34:05:46, 13.93s/it] {'loss': 0.6876, 'learning_rate': 4.406e-05, 'epoch': 1.56} 12%|█▏ | 1191/10000 [4:40:49<34:05:46, 13.93s/it] 12%|█▏ | 1192/10000 [4:41:03<33:59:59, 13.90s/it] {'loss': 0.5922, 'learning_rate': 4.4055000000000004e-05, 'epoch': 1.56} 12%|█▏ | 1192/10000 [4:41:03<33:59:59, 13.90s/it] 12%|█▏ | 1193/10000 [4:41:17<33:59:51, 13.90s/it] {'loss': 0.5438, 'learning_rate': 4.405e-05, 'epoch': 1.56} 12%|█▏ | 1193/10000 [4:41:17<33:59:51, 13.90s/it] 12%|█▏ | 1194/10000 [4:41:30<33:56:28, 13.88s/it] {'loss': 0.7443, 'learning_rate': 4.4045e-05, 'epoch': 1.56} 12%|█▏ | 1194/10000 [4:41:30<33:56:28, 13.88s/it] 12%|█▏ | 1195/10000 [4:41:44<33:58:22, 13.89s/it] {'loss': 0.6469, 'learning_rate': 4.4040000000000005e-05, 'epoch': 1.56} 12%|█▏ | 1195/10000 [4:41:44<33:58:22, 13.89s/it] 12%|█▏ | 1196/10000 [4:41:58<34:05:30, 13.94s/it] {'loss': 0.666, 'learning_rate': 4.4035e-05, 'epoch': 1.57} 12%|█▏ | 1196/10000 [4:41:58<34:05:30, 13.94s/it] 12%|█▏ | 1197/10000 [4:42:12<34:03:27, 13.93s/it] {'loss': 0.6194, 'learning_rate': 4.4030000000000004e-05, 'epoch': 1.57} 12%|█▏ | 1197/10000 [4:42:12<34:03:27, 13.93s/it] 12%|█▏ | 1198/10000 [4:42:26<34:02:46, 13.92s/it] {'loss': 0.6526, 'learning_rate': 4.4025e-05, 'epoch': 1.57} 12%|█▏ | 1198/10000 [4:42:26<34:02:46, 13.92s/it] 12%|█▏ | 1199/10000 [4:42:40<34:00:40, 13.91s/it] {'loss': 0.7105, 'learning_rate': 4.402e-05, 'epoch': 1.57} 12%|█▏ | 1199/10000 [4:42:40<34:00:40, 13.91s/it] 12%|█▏ | 1200/10000 [4:42:54<34:14:23, 14.01s/it] {'loss': 0.5476, 'learning_rate': 4.4015e-05, 'epoch': 1.57} 12%|█▏ | 1200/10000 [4:42:54<34:14:23, 14.01s/it] 12%|█▏ | 1201/10000 [4:43:08<34:11:27, 13.99s/it] {'loss': 0.5121, 'learning_rate': 4.401e-05, 'epoch': 1.57} 12%|█▏ | 1201/10000 [4:43:08<34:11:27, 13.99s/it] 12%|█▏ | 1202/10000 [4:43:22<34:06:41, 13.96s/it] {'loss': 0.6477, 'learning_rate': 4.4005e-05, 'epoch': 1.57} 12%|█▏ | 1202/10000 [4:43:22<34:06:41, 13.96s/it] 12%|█▏ | 1203/10000 [4:43:36<34:05:47, 
13.95s/it] {'loss': 0.7619, 'learning_rate': 4.4000000000000006e-05, 'epoch': 1.57} 12%|█▏ | 1203/10000 [4:43:36<34:05:47, 13.95s/it] 12%|█▏ | 1204/10000 [4:43:50<34:07:15, 13.96s/it] {'loss': 0.7018, 'learning_rate': 4.3995e-05, 'epoch': 1.58} 12%|█▏ | 1204/10000 [4:43:50<34:07:15, 13.96s/it] 12%|█▏ | 1205/10000 [4:44:04<34:06:21, 13.96s/it] {'loss': 0.6238, 'learning_rate': 4.3990000000000004e-05, 'epoch': 1.58} 12%|█▏ | 1205/10000 [4:44:04<34:06:21, 13.96s/it] 12%|█▏ | 1206/10000 [4:44:18<34:09:02, 13.98s/it] {'loss': 0.5166, 'learning_rate': 4.398500000000001e-05, 'epoch': 1.58} 12%|█▏ | 1206/10000 [4:44:18<34:09:02, 13.98s/it] 12%|█▏ | 1207/10000 [4:44:32<34:07:58, 13.97s/it] {'loss': 0.6903, 'learning_rate': 4.398e-05, 'epoch': 1.58} 12%|█▏ | 1207/10000 [4:44:32<34:07:58, 13.97s/it] 12%|█▏ | 1208/10000 [4:44:46<34:04:33, 13.95s/it] {'loss': 0.7041, 'learning_rate': 4.3975e-05, 'epoch': 1.58} 12%|█▏ | 1208/10000 [4:44:46<34:04:33, 13.95s/it] 12%|█▏ | 1209/10000 [4:45:00<34:02:47, 13.94s/it] {'loss': 0.5905, 'learning_rate': 4.397e-05, 'epoch': 1.58} 12%|█▏ | 1209/10000 [4:45:00<34:02:47, 13.94s/it] 12%|█▏ | 1210/10000 [4:45:14<34:00:57, 13.93s/it] {'loss': 0.6458, 'learning_rate': 4.3965000000000003e-05, 'epoch': 1.58} 12%|█▏ | 1210/10000 [4:45:14<34:00:57, 13.93s/it] 12%|█▏ | 1211/10000 [4:45:28<34:09:22, 13.99s/it] {'loss': 0.6422, 'learning_rate': 4.396e-05, 'epoch': 1.59} 12%|█▏ | 1211/10000 [4:45:28<34:09:22, 13.99s/it] 12%|█▏ | 1212/10000 [4:45:42<34:07:52, 13.98s/it] {'loss': 0.5981, 'learning_rate': 4.3955e-05, 'epoch': 1.59} 12%|█▏ | 1212/10000 [4:45:42<34:07:52, 13.98s/it] 12%|█▏ | 1213/10000 [4:45:56<34:05:53, 13.97s/it] {'loss': 0.7856, 'learning_rate': 4.3950000000000004e-05, 'epoch': 1.59} 12%|█▏ | 1213/10000 [4:45:56<34:05:53, 13.97s/it] 12%|█▏ | 1214/10000 [4:46:10<34:07:33, 13.98s/it] {'loss': 0.6765, 'learning_rate': 4.3945e-05, 'epoch': 1.59} 12%|█▏ | 1214/10000 [4:46:10<34:07:33, 13.98s/it] 12%|█▏ | 1215/10000 [4:46:24<34:06:14, 13.98s/it] 
{'loss': 0.6675, 'learning_rate': 4.394e-05, 'epoch': 1.59} 12%|█▏ | 1215/10000 [4:46:24<34:06:14, 13.98s/it] 12%|█▏ | 1216/10000 [4:46:38<34:06:06, 13.98s/it] {'loss': 0.531, 'learning_rate': 4.3935000000000005e-05, 'epoch': 1.59} 12%|█▏ | 1216/10000 [4:46:38<34:06:06, 13.98s/it] 12%|█▏ | 1217/10000 [4:46:52<34:01:37, 13.95s/it] {'loss': 0.6272, 'learning_rate': 4.393e-05, 'epoch': 1.59} 12%|█▏ | 1217/10000 [4:46:52<34:01:37, 13.95s/it] 12%|█▏ | 1218/10000 [4:47:06<34:03:31, 13.96s/it] {'loss': 0.7354, 'learning_rate': 4.3925e-05, 'epoch': 1.59} 12%|█▏ | 1218/10000 [4:47:06<34:03:31, 13.96s/it] 12%|█▏ | 1219/10000 [4:47:20<34:04:26, 13.97s/it] {'loss': 0.7163, 'learning_rate': 4.392e-05, 'epoch': 1.6} 12%|█▏ | 1219/10000 [4:47:20<34:04:26, 13.97s/it] 12%|█▏ | 1220/10000 [4:47:34<34:10:44, 14.01s/it] {'loss': 0.5091, 'learning_rate': 4.3915e-05, 'epoch': 1.6} 12%|█▏ | 1220/10000 [4:47:34<34:10:44, 14.01s/it] 12%|█▏ | 1221/10000 [4:47:48<34:07:30, 13.99s/it] {'loss': 0.5118, 'learning_rate': 4.391e-05, 'epoch': 1.6} 12%|█▏ | 1221/10000 [4:47:48<34:07:30, 13.99s/it] 12%|█▏ | 1222/10000 [4:48:02<34:02:59, 13.96s/it] {'loss': 0.6357, 'learning_rate': 4.3905e-05, 'epoch': 1.6} 12%|█▏ | 1222/10000 [4:48:02<34:02:59, 13.96s/it] 12%|█▏ | 1223/10000 [4:48:15<33:58:00, 13.93s/it] {'loss': 0.6122, 'learning_rate': 4.39e-05, 'epoch': 1.6} 12%|█▏ | 1223/10000 [4:48:15<33:58:00, 13.93s/it] 12%|█▏ | 1224/10000 [4:48:29<33:56:11, 13.92s/it] {'loss': 0.5995, 'learning_rate': 4.3895000000000006e-05, 'epoch': 1.6} 12%|█▏ | 1224/10000 [4:48:29<33:56:11, 13.92s/it] 12%|█▏ | 1225/10000 [4:48:43<33:50:38, 13.88s/it] {'loss': 0.7743, 'learning_rate': 4.389e-05, 'epoch': 1.6} 12%|█▏ | 1225/10000 [4:48:43<33:50:38, 13.88s/it] 12%|█▏ | 1226/10000 [4:48:57<33:53:10, 13.90s/it] {'loss': 0.6441, 'learning_rate': 4.3885000000000004e-05, 'epoch': 1.6} 12%|█▏ | 1226/10000 [4:48:57<33:53:10, 13.90s/it] 12%|█▏ | 1227/10000 [4:49:11<34:02:16, 13.97s/it] {'loss': 0.6969, 'learning_rate': 
4.388000000000001e-05, 'epoch': 1.61} 12%|█▏ | 1227/10000 [4:49:11<34:02:16, 13.97s/it] 12%|█▏ | 1228/10000 [4:49:25<34:05:31, 13.99s/it] {'loss': 0.8982, 'learning_rate': 4.3875e-05, 'epoch': 1.61} 12%|█▏ | 1228/10000 [4:49:25<34:05:31, 13.99s/it] 12%|█▏ | 1229/10000 [4:49:39<34:07:59, 14.01s/it] {'loss': 0.7527, 'learning_rate': 4.387e-05, 'epoch': 1.61} 12%|█▏ | 1229/10000 [4:49:39<34:07:59, 14.01s/it] 12%|█▏ | 1230/10000 [4:49:53<33:59:26, 13.95s/it] {'loss': 0.6091, 'learning_rate': 4.3865e-05, 'epoch': 1.61} 12%|█▏ | 1230/10000 [4:49:53<33:59:26, 13.95s/it] 12%|█▏ | 1231/10000 [4:50:07<33:58:49, 13.95s/it] {'loss': 0.6958, 'learning_rate': 4.3860000000000004e-05, 'epoch': 1.61} 12%|█▏ | 1231/10000 [4:50:07<33:58:49, 13.95s/it] 12%|█▏ | 1232/10000 [4:50:21<33:58:29, 13.95s/it] {'loss': 0.6484, 'learning_rate': 4.3855e-05, 'epoch': 1.61} 12%|█▏ | 1232/10000 [4:50:21<33:58:29, 13.95s/it] 12%|█▏ | 1233/10000 [4:50:35<34:02:56, 13.98s/it] {'loss': 0.6215, 'learning_rate': 4.385e-05, 'epoch': 1.61} 12%|█▏ | 1233/10000 [4:50:35<34:02:56, 13.98s/it] 12%|█▏ | 1234/10000 [4:50:49<33:59:10, 13.96s/it] {'loss': 0.6599, 'learning_rate': 4.3845000000000005e-05, 'epoch': 1.62} 12%|█▏ | 1234/10000 [4:50:49<33:59:10, 13.96s/it] 12%|█▏ | 1235/10000 [4:51:03<33:59:02, 13.96s/it] {'loss': 0.605, 'learning_rate': 4.384e-05, 'epoch': 1.62} 12%|█▏ | 1235/10000 [4:51:03<33:59:02, 13.96s/it] 12%|█▏ | 1236/10000 [4:51:17<33:54:39, 13.93s/it] {'loss': 0.4562, 'learning_rate': 4.3835e-05, 'epoch': 1.62} 12%|█▏ | 1236/10000 [4:51:17<33:54:39, 13.93s/it] 12%|█▏ | 1237/10000 [4:51:31<33:55:23, 13.94s/it] {'loss': 0.7233, 'learning_rate': 4.3830000000000006e-05, 'epoch': 1.62} 12%|█▏ | 1237/10000 [4:51:31<33:55:23, 13.94s/it] 12%|█▏ | 1238/10000 [4:51:45<33:53:08, 13.92s/it] {'loss': 0.6468, 'learning_rate': 4.3825e-05, 'epoch': 1.62} 12%|█▏ | 1238/10000 [4:51:45<33:53:08, 13.92s/it] 12%|█▏ | 1239/10000 [4:51:59<33:52:27, 13.92s/it] {'loss': 0.7029, 'learning_rate': 4.382e-05, 'epoch': 
1.62} 12%|█▏ | 1239/10000 [4:51:59<33:52:27, 13.92s/it] 12%|█▏ | 1240/10000 [4:52:13<33:53:54, 13.93s/it] {'loss': 0.7883, 'learning_rate': 4.3815e-05, 'epoch': 1.62} 12%|█▏ | 1240/10000 [4:52:13<33:53:54, 13.93s/it] 12%|█▏ | 1241/10000 [4:52:26<33:49:46, 13.90s/it] {'loss': 0.5886, 'learning_rate': 4.381e-05, 'epoch': 1.62} 12%|█▏ | 1241/10000 [4:52:26<33:49:46, 13.90s/it] 12%|█▏ | 1242/10000 [4:52:40<33:51:54, 13.92s/it] {'loss': 0.6915, 'learning_rate': 4.3805000000000005e-05, 'epoch': 1.63} 12%|█▏ | 1242/10000 [4:52:40<33:51:54, 13.92s/it] 12%|█▏ | 1243/10000 [4:52:54<33:55:16, 13.95s/it] {'loss': 0.515, 'learning_rate': 4.38e-05, 'epoch': 1.63} 12%|█▏ | 1243/10000 [4:52:54<33:55:16, 13.95s/it] 12%|█▏ | 1244/10000 [4:53:08<34:02:15, 13.99s/it] {'loss': 0.5367, 'learning_rate': 4.3795e-05, 'epoch': 1.63} 12%|█▏ | 1244/10000 [4:53:08<34:02:15, 13.99s/it] 12%|█▏ | 1245/10000 [4:53:22<33:58:02, 13.97s/it] {'loss': 0.5799, 'learning_rate': 4.3790000000000006e-05, 'epoch': 1.63} 12%|█▏ | 1245/10000 [4:53:22<33:58:02, 13.97s/it] 12%|█▏ | 1246/10000 [4:53:36<33:55:04, 13.95s/it] {'loss': 0.7674, 'learning_rate': 4.3785e-05, 'epoch': 1.63} 12%|█▏ | 1246/10000 [4:53:36<33:55:04, 13.95s/it] 12%|█▏ | 1247/10000 [4:53:50<33:54:58, 13.95s/it] {'loss': 0.7472, 'learning_rate': 4.3780000000000004e-05, 'epoch': 1.63} 12%|█▏ | 1247/10000 [4:53:50<33:54:58, 13.95s/it] 12%|█▏ | 1248/10000 [4:54:04<34:02:43, 14.00s/it] {'loss': 0.624, 'learning_rate': 4.3775e-05, 'epoch': 1.63} 12%|█▏ | 1248/10000 [4:54:04<34:02:43, 14.00s/it] 12%|█▏ | 1249/10000 [4:54:18<34:03:43, 14.01s/it] {'loss': 0.5452, 'learning_rate': 4.377e-05, 'epoch': 1.63} 12%|█▏ | 1249/10000 [4:54:18<34:03:43, 14.01s/it] 12%|█▎ | 1250/10000 [4:54:32<34:03:50, 14.01s/it] {'loss': 0.5385, 'learning_rate': 4.3765e-05, 'epoch': 1.64} 12%|█▎ | 1250/10000 [4:54:32<34:03:50, 14.01s/it] 13%|█▎ | 1251/10000 [4:54:46<33:56:10, 13.96s/it] {'loss': 0.5707, 'learning_rate': 4.376e-05, 'epoch': 1.64} 13%|█▎ | 1251/10000 
[4:54:46<33:56:10, 13.96s/it] 13%|█▎ | 1252/10000 [4:55:00<33:58:34, 13.98s/it] {'loss': 0.7019, 'learning_rate': 4.3755000000000004e-05, 'epoch': 1.64} 13%|█▎ | 1252/10000 [4:55:00<33:58:34, 13.98s/it] 13%|█▎ | 1253/10000 [4:55:14<33:45:31, 13.89s/it] {'loss': 0.6106, 'learning_rate': 4.375e-05, 'epoch': 1.64} 13%|█▎ | 1253/10000 [4:55:14<33:45:31, 13.89s/it] 13%|█▎ | 1254/10000 [4:55:28<33:55:55, 13.97s/it] {'loss': 0.7644, 'learning_rate': 4.3745e-05, 'epoch': 1.64} 13%|█▎ | 1254/10000 [4:55:28<33:55:55, 13.97s/it] 13%|█▎ | 1255/10000 [4:55:42<33:55:20, 13.96s/it] {'loss': 0.7192, 'learning_rate': 4.3740000000000005e-05, 'epoch': 1.64} 13%|█▎ | 1255/10000 [4:55:42<33:55:20, 13.96s/it] 13%|█▎ | 1256/10000 [4:55:56<33:55:27, 13.97s/it] {'loss': 0.7011, 'learning_rate': 4.3735e-05, 'epoch': 1.64} 13%|█▎ | 1256/10000 [4:55:56<33:55:27, 13.97s/it] 13%|█▎ | 1257/10000 [4:56:10<33:50:01, 13.93s/it] {'loss': 0.4966, 'learning_rate': 4.373e-05, 'epoch': 1.65} 13%|█▎ | 1257/10000 [4:56:10<33:50:01, 13.93s/it] 13%|█▎ | 1258/10000 [4:56:24<33:59:53, 14.00s/it] {'loss': 0.7213, 'learning_rate': 4.3725000000000006e-05, 'epoch': 1.65} 13%|█▎ | 1258/10000 [4:56:24<33:59:53, 14.00s/it] 13%|█▎ | 1259/10000 [4:56:38<33:52:43, 13.95s/it] {'loss': 0.6475, 'learning_rate': 4.372e-05, 'epoch': 1.65} 13%|█▎ | 1259/10000 [4:56:38<33:52:43, 13.95s/it] 13%|█▎ | 1260/10000 [4:56:52<33:48:58, 13.93s/it] {'loss': 0.6182, 'learning_rate': 4.3715e-05, 'epoch': 1.65} 13%|█▎ | 1260/10000 [4:56:52<33:48:58, 13.93s/it] 13%|█▎ | 1261/10000 [4:57:06<33:51:59, 13.95s/it] {'loss': 0.7578, 'learning_rate': 4.371e-05, 'epoch': 1.65} 13%|█▎ | 1261/10000 [4:57:06<33:51:59, 13.95s/it] 13%|█▎ | 1262/10000 [4:57:20<33:53:25, 13.96s/it] {'loss': 0.5094, 'learning_rate': 4.3705e-05, 'epoch': 1.65} 13%|█▎ | 1262/10000 [4:57:20<33:53:25, 13.96s/it] 13%|█▎ | 1263/10000 [4:57:34<33:54:32, 13.97s/it] {'loss': 0.4847, 'learning_rate': 4.3700000000000005e-05, 'epoch': 1.65} 13%|█▎ | 1263/10000 [4:57:34<33:54:32, 
13.97s/it] 13%|█▎ | 1264/10000 [4:57:48<33:51:03, 13.95s/it] {'loss': 0.6145, 'learning_rate': 4.3695e-05, 'epoch': 1.65} 13%|█▎ | 1264/10000 [4:57:48<33:51:03, 13.95s/it] 13%|█▎ | 1265/10000 [4:58:02<33:53:41, 13.97s/it] {'loss': 0.6436, 'learning_rate': 4.3690000000000004e-05, 'epoch': 1.66} 13%|█▎ | 1265/10000 [4:58:02<33:53:41, 13.97s/it] 13%|█▎ | 1266/10000 [4:58:16<33:55:31, 13.98s/it] {'loss': 0.6649, 'learning_rate': 4.3685000000000006e-05, 'epoch': 1.66} 13%|█▎ | 1266/10000 [4:58:16<33:55:31, 13.98s/it] 13%|█▎ | 1267/10000 [4:58:30<33:51:09, 13.96s/it] {'loss': 0.6523, 'learning_rate': 4.368e-05, 'epoch': 1.66} 13%|█▎ | 1267/10000 [4:58:30<33:51:09, 13.96s/it] 13%|█▎ | 1268/10000 [4:58:44<33:52:42, 13.97s/it] {'loss': 0.5292, 'learning_rate': 4.3675000000000005e-05, 'epoch': 1.66} 13%|█▎ | 1268/10000 [4:58:44<33:52:42, 13.97s/it] 13%|█▎ | 1269/10000 [4:58:57<33:48:59, 13.94s/it] {'loss': 0.5953, 'learning_rate': 4.367e-05, 'epoch': 1.66} 13%|█▎ | 1269/10000 [4:58:57<33:48:59, 13.94s/it] 13%|█▎ | 1270/10000 [4:59:11<33:48:37, 13.94s/it] {'loss': 0.6079, 'learning_rate': 4.3665e-05, 'epoch': 1.66} 13%|█▎ | 1270/10000 [4:59:11<33:48:37, 13.94s/it] 13%|█▎ | 1271/10000 [4:59:25<33:43:30, 13.91s/it] {'loss': 0.7093, 'learning_rate': 4.366e-05, 'epoch': 1.66} 13%|█▎ | 1271/10000 [4:59:25<33:43:30, 13.91s/it] 13%|█▎ | 1272/10000 [4:59:39<33:45:07, 13.92s/it] {'loss': 0.6169, 'learning_rate': 4.3655e-05, 'epoch': 1.66} 13%|█▎ | 1272/10000 [4:59:39<33:45:07, 13.92s/it] 13%|█▎ | 1273/10000 [4:59:53<33:44:12, 13.92s/it] {'loss': 0.6944, 'learning_rate': 4.3650000000000004e-05, 'epoch': 1.67} 13%|█▎ | 1273/10000 [4:59:53<33:44:12, 13.92s/it] 13%|█▎ | 1274/10000 [5:00:07<33:41:55, 13.90s/it] {'loss': 0.5184, 'learning_rate': 4.3645e-05, 'epoch': 1.67} 13%|█▎ | 1274/10000 [5:00:07<33:41:55, 13.90s/it] 13%|█▎ | 1275/10000 [5:00:21<33:44:16, 13.92s/it] {'loss': 0.7321, 'learning_rate': 4.364e-05, 'epoch': 1.67} 13%|█▎ | 1275/10000 [5:00:21<33:44:16, 13.92s/it] 13%|█▎ | 
1276/10000 [5:00:35<33:52:23, 13.98s/it] {'loss': 0.7252, 'learning_rate': 4.3635000000000005e-05, 'epoch': 1.67} 13%|█▎ | 1276/10000 [5:00:35<33:52:23, 13.98s/it] 13%|█▎ | 1277/10000 [5:00:49<33:49:38, 13.96s/it] {'loss': 0.6179, 'learning_rate': 4.363000000000001e-05, 'epoch': 1.67} 13%|█▎ | 1277/10000 [5:00:49<33:49:38, 13.96s/it] 13%|█▎ | 1278/10000 [5:01:03<33:46:45, 13.94s/it] {'loss': 0.8616, 'learning_rate': 4.3625e-05, 'epoch': 1.67} 13%|█▎ | 1278/10000 [5:01:03<33:46:45, 13.94s/it] 13%|█▎ | 1279/10000 [5:01:17<33:41:23, 13.91s/it] {'loss': 0.5379, 'learning_rate': 4.362e-05, 'epoch': 1.67} 13%|█▎ | 1279/10000 [5:01:17<33:41:23, 13.91s/it] 13%|█▎ | 1280/10000 [5:01:31<33:43:05, 13.92s/it] {'loss': 0.6418, 'learning_rate': 4.3615e-05, 'epoch': 1.68} 13%|█▎ | 1280/10000 [5:01:31<33:43:05, 13.92s/it] 13%|█▎ | 1281/10000 [5:01:45<33:48:04, 13.96s/it] {'loss': 0.5853, 'learning_rate': 4.361e-05, 'epoch': 1.68} 13%|█▎ | 1281/10000 [5:01:45<33:48:04, 13.96s/it] 13%|█▎ | 1282/10000 [5:01:58<33:44:42, 13.93s/it] {'loss': 0.5811, 'learning_rate': 4.3605e-05, 'epoch': 1.68} 13%|█▎ | 1282/10000 [5:01:59<33:44:42, 13.93s/it] 13%|█▎ | 1283/10000 [5:02:12<33:42:05, 13.92s/it] {'loss': 0.7042, 'learning_rate': 4.36e-05, 'epoch': 1.68} 13%|█▎ | 1283/10000 [5:02:12<33:42:05, 13.92s/it] 13%|█▎ | 1284/10000 [5:02:26<33:44:04, 13.93s/it] {'loss': 0.5282, 'learning_rate': 4.3595000000000005e-05, 'epoch': 1.68} 13%|█▎ | 1284/10000 [5:02:26<33:44:04, 13.93s/it] 13%|█▎ | 1285/10000 [5:02:40<33:45:07, 13.94s/it] {'loss': 0.5164, 'learning_rate': 4.359e-05, 'epoch': 1.68} 13%|█▎ | 1285/10000 [5:02:40<33:45:07, 13.94s/it] 13%|█▎ | 1286/10000 [5:02:54<33:45:48, 13.95s/it] {'loss': 0.5158, 'learning_rate': 4.3585000000000004e-05, 'epoch': 1.68} 13%|█▎ | 1286/10000 [5:02:54<33:45:48, 13.95s/it] 13%|█▎ | 1287/10000 [5:03:08<33:44:52, 13.94s/it] {'loss': 0.7134, 'learning_rate': 4.3580000000000006e-05, 'epoch': 1.68} 13%|█▎ | 1287/10000 [5:03:08<33:44:52, 13.94s/it] 13%|█▎ | 1288/10000 
[5:03:22<33:42:17, 13.93s/it] {'loss': 0.8047, 'learning_rate': 4.3575e-05, 'epoch': 1.69} 13%|█▎ | 1288/10000 [5:03:22<33:42:17, 13.93s/it] 13%|█▎ | 1289/10000 [5:03:36<33:42:20, 13.93s/it] {'loss': 0.6878, 'learning_rate': 4.357e-05, 'epoch': 1.69} 13%|█▎ | 1289/10000 [5:03:36<33:42:20, 13.93s/it] 13%|█▎ | 1290/10000 [5:03:50<33:44:53, 13.95s/it] {'loss': 0.5307, 'learning_rate': 4.3565e-05, 'epoch': 1.69} 13%|█▎ | 1290/10000 [5:03:50<33:44:53, 13.95s/it] 13%|█▎ | 1291/10000 [5:04:04<33:43:31, 13.94s/it] {'loss': 0.6064, 'learning_rate': 4.356e-05, 'epoch': 1.69} 13%|█▎ | 1291/10000 [5:04:04<33:43:31, 13.94s/it] 13%|█▎ | 1292/10000 [5:04:18<33:46:42, 13.96s/it] {'loss': 0.7875, 'learning_rate': 4.3555e-05, 'epoch': 1.69} 13%|█▎ | 1292/10000 [5:04:18<33:46:42, 13.96s/it] 13%|█▎ | 1293/10000 [5:04:32<33:43:31, 13.94s/it] {'loss': 0.7766, 'learning_rate': 4.355e-05, 'epoch': 1.69} 13%|█▎ | 1293/10000 [5:04:32<33:43:31, 13.94s/it] 13%|█▎ | 1294/10000 [5:04:46<33:43:04, 13.94s/it] {'loss': 0.6484, 'learning_rate': 4.3545000000000004e-05, 'epoch': 1.69} 13%|█▎ | 1294/10000 [5:04:46<33:43:04, 13.94s/it][2024-11-04 01:23:07,320] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384, reducing to 8192 13%|█▎ | 1295/10000 [5:04:59<33:13:32, 13.74s/it] {'loss': 0.6046, 'learning_rate': 4.3545000000000004e-05, 'epoch': 1.7} 13%|█▎ | 1295/10000 [5:04:59<33:13:32, 13.74s/it] 13%|█▎ | 1296/10000 [5:05:13<33:19:19, 13.78s/it] {'loss': 0.6554, 'learning_rate': 4.354e-05, 'epoch': 1.7} 13%|█▎ | 1296/10000 [5:05:13<33:19:19, 13.78s/it] 13%|█▎ | 1297/10000 [5:05:27<33:28:59, 13.85s/it] {'loss': 0.6629, 'learning_rate': 4.3535e-05, 'epoch': 1.7} 13%|█▎ | 1297/10000 [5:05:27<33:28:59, 13.85s/it] 13%|█▎ | 1298/10000 [5:05:41<33:29:59, 13.86s/it] {'loss': 0.6729, 'learning_rate': 4.3530000000000005e-05, 'epoch': 1.7} 13%|█▎ | 1298/10000 [5:05:41<33:29:59, 13.86s/it] 13%|█▎ | 1299/10000 [5:05:55<33:38:10, 13.92s/it] {'loss': 0.6907, 'learning_rate': 4.352500000000001e-05, 'epoch': 1.7} 13%|█▎ | 1299/10000 [5:05:55<33:38:10, 13.92s/it] 13%|█▎ | 1300/10000 [5:06:09<33:41:18, 13.94s/it] {'loss': 0.605, 'learning_rate': 4.352e-05, 'epoch': 1.7} 13%|█▎ | 1300/10000 [5:06:09<33:41:18, 13.94s/it] 13%|█▎ | 1301/10000 [5:06:23<33:42:04, 13.95s/it] {'loss': 0.8584, 'learning_rate': 4.3515e-05, 'epoch': 1.7} 13%|█▎ | 1301/10000 [5:06:23<33:42:04, 13.95s/it] 13%|█▎ | 1302/10000 [5:06:37<33:34:01, 13.89s/it] {'loss': 0.6486, 'learning_rate': 4.351e-05, 'epoch': 1.7} 13%|█▎ | 1302/10000 [5:06:37<33:34:01, 13.89s/it] 13%|█▎ | 1303/10000 [5:06:51<33:37:02, 13.92s/it] {'loss': 0.5695, 'learning_rate': 4.3505000000000004e-05, 'epoch': 1.71} 13%|█▎ | 1303/10000 [5:06:51<33:37:02, 13.92s/it] 13%|█▎ | 1304/10000 [5:07:05<33:39:26, 13.93s/it] {'loss': 0.6643, 'learning_rate': 4.35e-05, 'epoch': 1.71} 13%|█▎ | 1304/10000 [5:07:05<33:39:26, 13.93s/it] 13%|█▎ | 1305/10000 [5:07:19<33:42:16, 13.95s/it] {'loss': 0.6483, 'learning_rate': 4.3495e-05, 'epoch': 1.71} 13%|█▎ | 1305/10000 [5:07:19<33:42:16, 13.95s/it] 13%|█▎ | 1306/10000 [5:07:33<33:43:25, 13.96s/it] {'loss': 0.5908, 'learning_rate': 4.3490000000000005e-05, 'epoch': 1.71} 13%|█▎ | 1306/10000 
[5:07:33<33:43:25, 13.96s/it] 13%|█▎ | 1307/10000 [5:07:46<33:36:42, 13.92s/it] {'loss': 0.4829, 'learning_rate': 4.3485e-05, 'epoch': 1.71} 13%|█▎ | 1307/10000 [5:07:46<33:36:42, 13.92s/it] 13%|█▎ | 1308/10000 [5:08:00<33:38:51, 13.94s/it] {'loss': 0.6889, 'learning_rate': 4.3480000000000004e-05, 'epoch': 1.71} 13%|█▎ | 1308/10000 [5:08:00<33:38:51, 13.94s/it] 13%|█▎ | 1309/10000 [5:08:14<33:38:48, 13.94s/it] {'loss': 0.5604, 'learning_rate': 4.3475000000000006e-05, 'epoch': 1.71} 13%|█▎ | 1309/10000 [5:08:14<33:38:48, 13.94s/it] 13%|█▎ | 1310/10000 [5:08:28<33:38:09, 13.93s/it] {'loss': 0.5515, 'learning_rate': 4.347e-05, 'epoch': 1.71} 13%|█▎ | 1310/10000 [5:08:28<33:38:09, 13.93s/it] 13%|█▎ | 1311/10000 [5:08:42<33:38:11, 13.94s/it] {'loss': 0.6175, 'learning_rate': 4.3465e-05, 'epoch': 1.72} 13%|█▎ | 1311/10000 [5:08:42<33:38:11, 13.94s/it] 13%|█▎ | 1312/10000 [5:08:56<33:37:10, 13.93s/it] {'loss': 0.5928, 'learning_rate': 4.346e-05, 'epoch': 1.72} 13%|█▎ | 1312/10000 [5:08:56<33:37:10, 13.93s/it] 13%|█▎ | 1313/10000 [5:09:10<33:43:08, 13.97s/it] {'loss': 0.6545, 'learning_rate': 4.3455e-05, 'epoch': 1.72} 13%|█▎ | 1313/10000 [5:09:10<33:43:08, 13.97s/it] 13%|█▎ | 1314/10000 [5:09:24<33:48:42, 14.01s/it] {'loss': 0.72, 'learning_rate': 4.345e-05, 'epoch': 1.72} 13%|█▎ | 1314/10000 [5:09:24<33:48:42, 14.01s/it] 13%|█▎ | 1315/10000 [5:09:38<33:44:18, 13.98s/it] {'loss': 0.5217, 'learning_rate': 4.3445e-05, 'epoch': 1.72} 13%|█▎ | 1315/10000 [5:09:38<33:44:18, 13.98s/it] 13%|█▎ | 1316/10000 [5:09:52<33:45:02, 13.99s/it] {'loss': 0.6959, 'learning_rate': 4.3440000000000004e-05, 'epoch': 1.72} 13%|█▎ | 1316/10000 [5:09:52<33:45:02, 13.99s/it] 13%|█▎ | 1317/10000 [5:10:06<33:37:54, 13.94s/it] {'loss': 0.6716, 'learning_rate': 4.343500000000001e-05, 'epoch': 1.72} 13%|█▎ | 1317/10000 [5:10:06<33:37:54, 13.94s/it] 13%|█▎ | 1318/10000 [5:10:20<33:41:49, 13.97s/it] {'loss': 0.5625, 'learning_rate': 4.343e-05, 'epoch': 1.73} 13%|█▎ | 1318/10000 [5:10:20<33:41:49, 
13.97s/it] 13%|█▎ | 1319/10000 [5:10:34<33:36:11, 13.94s/it] {'loss': 0.7862, 'learning_rate': 4.3425000000000005e-05, 'epoch': 1.73} 13%|█▎ | 1319/10000 [5:10:34<33:36:11, 13.94s/it] 13%|█▎ | 1320/10000 [5:10:48<33:43:45, 13.99s/it] {'loss': 0.7901, 'learning_rate': 4.342e-05, 'epoch': 1.73} 13%|█▎ | 1320/10000 [5:10:48<33:43:45, 13.99s/it] 13%|█▎ | 1321/10000 [5:11:02<33:36:44, 13.94s/it] {'loss': 0.806, 'learning_rate': 4.3415e-05, 'epoch': 1.73} 13%|█▎ | 1321/10000 [5:11:02<33:36:44, 13.94s/it] 13%|█▎ | 1322/10000 [5:11:16<33:40:39, 13.97s/it] {'loss': 0.4893, 'learning_rate': 4.341e-05, 'epoch': 1.73} 13%|█▎ | 1322/10000 [5:11:16<33:40:39, 13.97s/it] 13%|█▎ | 1323/10000 [5:11:30<33:35:12, 13.93s/it] {'loss': 0.7184, 'learning_rate': 4.3405e-05, 'epoch': 1.73} 13%|█▎ | 1323/10000 [5:11:30<33:35:12, 13.93s/it] 13%|█▎ | 1324/10000 [5:11:44<33:34:31, 13.93s/it] {'loss': 0.5296, 'learning_rate': 4.3400000000000005e-05, 'epoch': 1.73} 13%|█▎ | 1324/10000 [5:11:44<33:34:31, 13.93s/it] 13%|█▎ | 1325/10000 [5:11:58<33:33:08, 13.92s/it] {'loss': 0.5851, 'learning_rate': 4.3395e-05, 'epoch': 1.73} 13%|█▎ | 1325/10000 [5:11:58<33:33:08, 13.92s/it] 13%|█▎ | 1326/10000 [5:12:12<33:37:13, 13.95s/it] {'loss': 0.5971, 'learning_rate': 4.339e-05, 'epoch': 1.74} 13%|█▎ | 1326/10000 [5:12:12<33:37:13, 13.95s/it] 13%|█▎ | 1327/10000 [5:12:25<33:30:27, 13.91s/it] {'loss': 0.6414, 'learning_rate': 4.3385000000000006e-05, 'epoch': 1.74} 13%|█▎ | 1327/10000 [5:12:25<33:30:27, 13.91s/it] 13%|█▎ | 1328/10000 [5:12:39<33:31:53, 13.92s/it] {'loss': 0.7059, 'learning_rate': 4.338e-05, 'epoch': 1.74} 13%|█▎ | 1328/10000 [5:12:39<33:31:53, 13.92s/it] 13%|█▎ | 1329/10000 [5:12:53<33:30:56, 13.91s/it] {'loss': 0.617, 'learning_rate': 4.3375000000000004e-05, 'epoch': 1.74} 13%|█▎ | 1329/10000 [5:12:53<33:30:56, 13.91s/it] 13%|█▎ | 1330/10000 [5:13:07<33:31:43, 13.92s/it] {'loss': 0.5895, 'learning_rate': 4.337e-05, 'epoch': 1.74} 13%|█▎ | 1330/10000 [5:13:07<33:31:43, 13.92s/it] 13%|█▎ | 
1331/10000 [5:13:21<33:30:41, 13.92s/it] {'loss': 0.7116, 'learning_rate': 4.3365e-05, 'epoch': 1.74} 13%|█▎ | 1331/10000 [5:13:21<33:30:41, 13.92s/it] 13%|█▎ | 1332/10000 [5:13:35<33:33:02, 13.93s/it] {'loss': 0.6014, 'learning_rate': 4.336e-05, 'epoch': 1.74} 13%|█▎ | 1332/10000 [5:13:35<33:33:02, 13.93s/it] 13%|█▎ | 1333/10000 [5:13:49<33:32:45, 13.93s/it] {'loss': 0.599, 'learning_rate': 4.3355e-05, 'epoch': 1.74} 13%|█▎ | 1333/10000 [5:13:49<33:32:45, 13.93s/it] 13%|█▎ | 1334/10000 [5:14:03<33:30:29, 13.92s/it] {'loss': 0.6602, 'learning_rate': 4.335e-05, 'epoch': 1.75} 13%|█▎ | 1334/10000 [5:14:03<33:30:29, 13.92s/it] 13%|█▎ | 1335/10000 [5:14:17<33:33:35, 13.94s/it] {'loss': 0.5847, 'learning_rate': 4.3345e-05, 'epoch': 1.75} 13%|█▎ | 1335/10000 [5:14:17<33:33:35, 13.94s/it] 13%|█▎ | 1336/10000 [5:14:31<33:37:12, 13.97s/it] {'loss': 0.5747, 'learning_rate': 4.334e-05, 'epoch': 1.75} 13%|█▎ | 1336/10000 [5:14:31<33:37:12, 13.97s/it] 13%|█▎ | 1337/10000 [5:14:45<33:39:29, 13.99s/it] {'loss': 0.5997, 'learning_rate': 4.3335000000000004e-05, 'epoch': 1.75} 13%|█▎ | 1337/10000 [5:14:45<33:39:29, 13.99s/it] 13%|█▎ | 1338/10000 [5:14:59<33:36:54, 13.97s/it] {'loss': 0.6738, 'learning_rate': 4.333000000000001e-05, 'epoch': 1.75} 13%|█▎ | 1338/10000 [5:14:59<33:36:54, 13.97s/it] 13%|█▎ | 1339/10000 [5:15:13<33:37:41, 13.98s/it] {'loss': 0.8249, 'learning_rate': 4.3325e-05, 'epoch': 1.75} 13%|█▎ | 1339/10000 [5:15:13<33:37:41, 13.98s/it] 13%|█▎ | 1340/10000 [5:15:27<33:36:21, 13.97s/it] {'loss': 0.5425, 'learning_rate': 4.332e-05, 'epoch': 1.75} 13%|█▎ | 1340/10000 [5:15:27<33:36:21, 13.97s/it] 13%|█▎ | 1341/10000 [5:15:41<33:36:36, 13.97s/it] {'loss': 0.6793, 'learning_rate': 4.3315e-05, 'epoch': 1.76} 13%|█▎ | 1341/10000 [5:15:41<33:36:36, 13.97s/it] 13%|█▎ | 1342/10000 [5:15:55<33:30:43, 13.93s/it] {'loss': 0.5728, 'learning_rate': 4.3310000000000004e-05, 'epoch': 1.76} 13%|█▎ | 1342/10000 [5:15:55<33:30:43, 13.93s/it] 13%|█▎ | 1343/10000 [5:16:08<33:24:55, 
13.90s/it] {'loss': 0.639, 'learning_rate': 4.3305e-05, 'epoch': 1.76} 13%|█▎ | 1343/10000 [5:16:08<33:24:55, 13.90s/it] 13%|█▎ | 1344/10000 [5:16:22<33:31:06, 13.94s/it] {'loss': 0.5369, 'learning_rate': 4.33e-05, 'epoch': 1.76} 13%|█▎ | 1344/10000 [5:16:23<33:31:06, 13.94s/it] 13%|█▎ | 1345/10000 [5:16:36<33:29:44, 13.93s/it] {'loss': 0.4427, 'learning_rate': 4.3295000000000005e-05, 'epoch': 1.76} 13%|█▎ | 1345/10000 [5:16:36<33:29:44, 13.93s/it] 13%|█▎ | 1346/10000 [5:16:50<33:30:43, 13.94s/it] {'loss': 0.5626, 'learning_rate': 4.329e-05, 'epoch': 1.76} 13%|█▎ | 1346/10000 [5:16:50<33:30:43, 13.94s/it] 13%|█▎ | 1347/10000 [5:17:04<33:38:15, 13.99s/it] {'loss': 0.6742, 'learning_rate': 4.3285e-05, 'epoch': 1.76} 13%|█▎ | 1347/10000 [5:17:05<33:38:15, 13.99s/it] 13%|█▎ | 1348/10000 [5:17:18<33:38:46, 14.00s/it] {'loss': 0.6324, 'learning_rate': 4.3280000000000006e-05, 'epoch': 1.76} 13%|█▎ | 1348/10000 [5:17:19<33:38:46, 14.00s/it] 13%|█▎ | 1349/10000 [5:17:33<33:41:48, 14.02s/it] {'loss': 0.5496, 'learning_rate': 4.3275e-05, 'epoch': 1.77} 13%|█▎ | 1349/10000 [5:17:33<33:41:48, 14.02s/it] 14%|█▎ | 1350/10000 [5:17:47<33:38:12, 14.00s/it] {'loss': 0.591, 'learning_rate': 4.327e-05, 'epoch': 1.77} 14%|█▎ | 1350/10000 [5:17:47<33:38:12, 14.00s/it] 14%|█▎ | 1351/10000 [5:18:00<33:36:42, 13.99s/it] {'loss': 0.7729, 'learning_rate': 4.3265e-05, 'epoch': 1.77} 14%|█▎ | 1351/10000 [5:18:01<33:36:42, 13.99s/it] 14%|█▎ | 1352/10000 [5:18:14<33:33:00, 13.97s/it] {'loss': 0.6834, 'learning_rate': 4.326e-05, 'epoch': 1.77} 14%|█▎ | 1352/10000 [5:18:14<33:33:00, 13.97s/it] 14%|█▎ | 1353/10000 [5:18:28<33:36:18, 13.99s/it] {'loss': 0.5762, 'learning_rate': 4.3255e-05, 'epoch': 1.77} 14%|█▎ | 1353/10000 [5:18:28<33:36:18, 13.99s/it] 14%|█▎ | 1354/10000 [5:18:42<33:36:17, 13.99s/it] {'loss': 0.5916, 'learning_rate': 4.325e-05, 'epoch': 1.77} 14%|█▎ | 1354/10000 [5:18:42<33:36:17, 13.99s/it] 14%|█▎ | 1355/10000 [5:18:56<33:35:41, 13.99s/it] {'loss': 0.5055, 'learning_rate': 
4.3245000000000004e-05, 'epoch': 1.77} 14%|█▎ | 1355/10000 [5:18:56<33:35:41, 13.99s/it] 14%|█▎ | 1356/10000 [5:19:10<33:33:49, 13.98s/it] {'loss': 0.7113, 'learning_rate': 4.324e-05, 'epoch': 1.77} 14%|█▎ | 1356/10000 [5:19:10<33:33:49, 13.98s/it] 14%|█▎ | 1357/10000 [5:19:24<33:34:47, 13.99s/it] {'loss': 0.6245, 'learning_rate': 4.3235e-05, 'epoch': 1.78} 14%|█▎ | 1357/10000 [5:19:24<33:34:47, 13.99s/it] 14%|█▎ | 1358/10000 [5:19:38<33:36:18, 14.00s/it] {'loss': 0.5859, 'learning_rate': 4.3230000000000005e-05, 'epoch': 1.78} 14%|█▎ | 1358/10000 [5:19:38<33:36:18, 14.00s/it] 14%|█▎ | 1359/10000 [5:19:52<33:32:57, 13.98s/it] {'loss': 0.4327, 'learning_rate': 4.322500000000001e-05, 'epoch': 1.78} 14%|█▎ | 1359/10000 [5:19:52<33:32:57, 13.98s/it] 14%|█▎ | 1360/10000 [5:20:06<33:25:59, 13.93s/it] {'loss': 0.5536, 'learning_rate': 4.3219999999999996e-05, 'epoch': 1.78} 14%|█▎ | 1360/10000 [5:20:06<33:25:59, 13.93s/it] 14%|█▎ | 1361/10000 [5:20:20<33:36:24, 14.00s/it] {'loss': 0.5287, 'learning_rate': 4.3215e-05, 'epoch': 1.78} 14%|█▎ | 1361/10000 [5:20:20<33:36:24, 14.00s/it] 14%|█▎ | 1362/10000 [5:20:34<33:34:40, 13.99s/it] {'loss': 0.6322, 'learning_rate': 4.321e-05, 'epoch': 1.78} 14%|█▎ | 1362/10000 [5:20:34<33:34:40, 13.99s/it] 14%|█▎ | 1363/10000 [5:20:48<33:31:35, 13.97s/it] {'loss': 0.6076, 'learning_rate': 4.3205000000000004e-05, 'epoch': 1.78} 14%|█▎ | 1363/10000 [5:20:48<33:31:35, 13.97s/it] 14%|█▎ | 1364/10000 [5:21:02<33:27:56, 13.95s/it] {'loss': 0.5162, 'learning_rate': 4.32e-05, 'epoch': 1.79} 14%|█▎ | 1364/10000 [5:21:02<33:27:56, 13.95s/it] 14%|█▎ | 1365/10000 [5:21:16<33:27:25, 13.95s/it] {'loss': 0.4868, 'learning_rate': 4.3195e-05, 'epoch': 1.79} 14%|█▎ | 1365/10000 [5:21:16<33:27:25, 13.95s/it] 14%|█▎ | 1366/10000 [5:21:30<33:26:11, 13.94s/it] {'loss': 0.6183, 'learning_rate': 4.3190000000000005e-05, 'epoch': 1.79} 14%|█▎ | 1366/10000 [5:21:30<33:26:11, 13.94s/it] 14%|█▎ | 1367/10000 [5:21:44<33:26:55, 13.95s/it] {'loss': 0.606, 'learning_rate': 
4.3185e-05, 'epoch': 1.79} 14%|█▎ | 1367/10000 [5:21:44<33:26:55, 13.95s/it] 14%|█▎ | 1368/10000 [5:21:58<33:27:02, 13.95s/it] {'loss': 0.6122, 'learning_rate': 4.318e-05, 'epoch': 1.79} 14%|█▎ | 1368/10000 [5:21:58<33:27:02, 13.95s/it] 14%|█▎ | 1369/10000 [5:22:12<33:22:00, 13.92s/it] {'loss': 0.5956, 'learning_rate': 4.3175000000000006e-05, 'epoch': 1.79} 14%|█▎ | 1369/10000 [5:22:12<33:22:00, 13.92s/it] 14%|█▎ | 1370/10000 [5:22:26<33:23:11, 13.93s/it] {'loss': 0.7241, 'learning_rate': 4.317e-05, 'epoch': 1.79} 14%|█▎ | 1370/10000 [5:22:26<33:23:11, 13.93s/it] 14%|█▎ | 1371/10000 [5:22:40<33:25:08, 13.94s/it] {'loss': 0.6037, 'learning_rate': 4.3165e-05, 'epoch': 1.79} 14%|█▎ | 1371/10000 [5:22:40<33:25:08, 13.94s/it] 14%|█▎ | 1372/10000 [5:22:54<33:23:14, 13.93s/it] {'loss': 0.5904, 'learning_rate': 4.316e-05, 'epoch': 1.8} 14%|█▎ | 1372/10000 [5:22:54<33:23:14, 13.93s/it] 14%|█▎ | 1373/10000 [5:23:07<33:19:44, 13.91s/it] {'loss': 0.6897, 'learning_rate': 4.3155e-05, 'epoch': 1.8} 14%|█▎ | 1373/10000 [5:23:07<33:19:44, 13.91s/it] 14%|█▎ | 1374/10000 [5:23:21<33:14:39, 13.87s/it] {'loss': 0.5639, 'learning_rate': 4.315e-05, 'epoch': 1.8} 14%|█▎ | 1374/10000 [5:23:21<33:14:39, 13.87s/it] 14%|█▍ | 1375/10000 [5:23:35<33:21:25, 13.92s/it] {'loss': 0.5964, 'learning_rate': 4.3145e-05, 'epoch': 1.8} 14%|█▍ | 1375/10000 [5:23:35<33:21:25, 13.92s/it] 14%|█▍ | 1376/10000 [5:23:49<33:22:19, 13.93s/it] {'loss': 0.6373, 'learning_rate': 4.3140000000000004e-05, 'epoch': 1.8} 14%|█▍ | 1376/10000 [5:23:49<33:22:19, 13.93s/it] 14%|█▍ | 1377/10000 [5:24:03<33:22:27, 13.93s/it] {'loss': 0.6552, 'learning_rate': 4.3135000000000006e-05, 'epoch': 1.8} 14%|█▍ | 1377/10000 [5:24:03<33:22:27, 13.93s/it] 14%|█▍ | 1378/10000 [5:24:17<33:19:34, 13.91s/it] {'loss': 0.6531, 'learning_rate': 4.313e-05, 'epoch': 1.8} 14%|█▍ | 1378/10000 [5:24:17<33:19:34, 13.91s/it] 14%|█▍ | 1379/10000 [5:24:31<33:29:57, 13.99s/it] {'loss': 0.6091, 'learning_rate': 4.3125000000000005e-05, 'epoch': 1.8} 
14%|█▍ | 1379/10000 [5:24:31<33:29:57, 13.99s/it] 14%|█▍ | 1380/10000 [5:24:45<33:31:46, 14.00s/it] {'loss': 0.6423, 'learning_rate': 4.312000000000001e-05, 'epoch': 1.81} 14%|█▍ | 1380/10000 [5:24:45<33:31:46, 14.00s/it] 14%|█▍ | 1381/10000 [5:24:59<33:29:59, 13.99s/it] {'loss': 0.6043, 'learning_rate': 4.3115e-05, 'epoch': 1.81} 14%|█▍ | 1381/10000 [5:24:59<33:29:59, 13.99s/it] 14%|█▍ | 1382/10000 [5:25:13<33:27:30, 13.98s/it] {'loss': 0.5102, 'learning_rate': 4.311e-05, 'epoch': 1.81} 14%|█▍ | 1382/10000 [5:25:13<33:27:30, 13.98s/it] 14%|█▍ | 1383/10000 [5:25:27<33:21:54, 13.94s/it] {'loss': 0.6609, 'learning_rate': 4.3105e-05, 'epoch': 1.81} 14%|█▍ | 1383/10000 [5:25:27<33:21:54, 13.94s/it] 14%|█▍ | 1384/10000 [5:25:41<33:23:14, 13.95s/it] {'loss': 0.581, 'learning_rate': 4.3100000000000004e-05, 'epoch': 1.81} 14%|█▍ | 1384/10000 [5:25:41<33:23:14, 13.95s/it] 14%|█▍ | 1385/10000 [5:25:55<33:20:41, 13.93s/it] {'loss': 0.6147, 'learning_rate': 4.3095e-05, 'epoch': 1.81} 14%|█▍ | 1385/10000 [5:25:55<33:20:41, 13.93s/it] 14%|█▍ | 1386/10000 [5:26:09<33:25:22, 13.97s/it] {'loss': 0.7118, 'learning_rate': 4.309e-05, 'epoch': 1.81} 14%|█▍ | 1386/10000 [5:26:09<33:25:22, 13.97s/it] 14%|█▍ | 1387/10000 [5:26:23<33:28:27, 13.99s/it] {'loss': 0.7869, 'learning_rate': 4.3085000000000005e-05, 'epoch': 1.82} 14%|█▍ | 1387/10000 [5:26:23<33:28:27, 13.99s/it] 14%|█▍ | 1388/10000 [5:26:37<33:24:45, 13.97s/it] {'loss': 0.6782, 'learning_rate': 4.308e-05, 'epoch': 1.82} 14%|█▍ | 1388/10000 [5:26:37<33:24:45, 13.97s/it] 14%|█▍ | 1389/10000 [5:26:51<33:27:21, 13.99s/it] {'loss': 0.6579, 'learning_rate': 4.3075000000000003e-05, 'epoch': 1.82} 14%|█▍ | 1389/10000 [5:26:51<33:27:21, 13.99s/it] 14%|█▍ | 1390/10000 [5:27:05<33:22:28, 13.95s/it] {'loss': 0.6512, 'learning_rate': 4.3070000000000006e-05, 'epoch': 1.82} 14%|█▍ | 1390/10000 [5:27:05<33:22:28, 13.95s/it] 14%|█▍ | 1391/10000 [5:27:19<33:20:24, 13.94s/it] {'loss': 0.8332, 'learning_rate': 4.3065e-05, 'epoch': 1.82} 14%|█▍ | 
1391/10000 [5:27:19<33:20:24, 13.94s/it] 14%|█▍ | 1392/10000 [5:27:33<33:23:24, 13.96s/it] {'loss': 0.589, 'learning_rate': 4.306e-05, 'epoch': 1.82} 14%|█▍ | 1392/10000 [5:27:33<33:23:24, 13.96s/it] 14%|█▍ | 1393/10000 [5:27:47<33:23:07, 13.96s/it] {'loss': 0.6721, 'learning_rate': 4.3055e-05, 'epoch': 1.82} 14%|█▍ | 1393/10000 [5:27:47<33:23:07, 13.96s/it] 14%|█▍ | 1394/10000 [5:28:01<33:19:48, 13.94s/it] {'loss': 0.6522, 'learning_rate': 4.305e-05, 'epoch': 1.82} 14%|█▍ | 1394/10000 [5:28:01<33:19:48, 13.94s/it] 14%|█▍ | 1395/10000 [5:28:14<33:17:46, 13.93s/it] {'loss': 0.7928, 'learning_rate': 4.3045e-05, 'epoch': 1.83} 14%|█▍ | 1395/10000 [5:28:15<33:17:46, 13.93s/it] 14%|█▍ | 1396/10000 [5:28:28<33:20:42, 13.95s/it] {'loss': 0.6789, 'learning_rate': 4.304e-05, 'epoch': 1.83} 14%|█▍ | 1396/10000 [5:28:29<33:20:42, 13.95s/it] 14%|█▍ | 1397/10000 [5:28:43<33:28:56, 14.01s/it] {'loss': 0.4857, 'learning_rate': 4.3035000000000004e-05, 'epoch': 1.83} 14%|█▍ | 1397/10000 [5:28:43<33:28:56, 14.01s/it] 14%|█▍ | 1398/10000 [5:28:57<33:28:03, 14.01s/it] {'loss': 0.6034, 'learning_rate': 4.3030000000000006e-05, 'epoch': 1.83} 14%|█▍ | 1398/10000 [5:28:57<33:28:03, 14.01s/it] 14%|█▍ | 1399/10000 [5:29:10<33:21:46, 13.96s/it] {'loss': 0.3824, 'learning_rate': 4.3025e-05, 'epoch': 1.83} 14%|█▍ | 1399/10000 [5:29:11<33:21:46, 13.96s/it] 14%|█▍ | 1400/10000 [5:29:24<33:18:59, 13.95s/it] {'loss': 0.483, 'learning_rate': 4.3020000000000005e-05, 'epoch': 1.83} 14%|█▍ | 1400/10000 [5:29:24<33:18:59, 13.95s/it] 14%|█▍ | 1401/10000 [5:29:38<33:11:27, 13.90s/it] {'loss': 0.657, 'learning_rate': 4.3015e-05, 'epoch': 1.83} 14%|█▍ | 1401/10000 [5:29:38<33:11:27, 13.90s/it] 14%|█▍ | 1402/10000 [5:29:52<33:17:07, 13.94s/it] {'loss': 0.6302, 'learning_rate': 4.301e-05, 'epoch': 1.84} 14%|█▍ | 1402/10000 [5:29:52<33:17:07, 13.94s/it] 14%|█▍ | 1403/10000 [5:30:06<33:21:18, 13.97s/it] {'loss': 0.6237, 'learning_rate': 4.3005e-05, 'epoch': 1.84} 14%|█▍ | 1403/10000 [5:30:06<33:21:18, 
13.97s/it] 14%|█▍ | 1404/10000 [5:30:20<33:19:05, 13.95s/it] {'loss': 0.6418, 'learning_rate': 4.3e-05, 'epoch': 1.84} 14%|█▍ | 1404/10000 [5:30:20<33:19:05, 13.95s/it] 14%|█▍ | 1405/10000 [5:30:34<33:14:41, 13.92s/it] {'loss': 0.5741, 'learning_rate': 4.2995000000000004e-05, 'epoch': 1.84} 14%|█▍ | 1405/10000 [5:30:34<33:14:41, 13.92s/it] 14%|█▍ | 1406/10000 [5:30:48<33:15:02, 13.93s/it] {'loss': 0.5755, 'learning_rate': 4.299e-05, 'epoch': 1.84} 14%|█▍ | 1406/10000 [5:30:48<33:15:02, 13.93s/it] 14%|█▍ | 1407/10000 [5:31:02<33:09:50, 13.89s/it] {'loss': 0.5613, 'learning_rate': 4.2985e-05, 'epoch': 1.84} 14%|█▍ | 1407/10000 [5:31:02<33:09:50, 13.89s/it] 14%|█▍ | 1408/10000 [5:31:16<33:13:26, 13.92s/it] {'loss': 0.5921, 'learning_rate': 4.2980000000000005e-05, 'epoch': 1.84} 14%|█▍ | 1408/10000 [5:31:16<33:13:26, 13.92s/it] 14%|█▍ | 1409/10000 [5:31:30<33:09:06, 13.89s/it] {'loss': 0.5783, 'learning_rate': 4.2975e-05, 'epoch': 1.84} 14%|█▍ | 1409/10000 [5:31:30<33:09:06, 13.89s/it] 14%|█▍ | 1410/10000 [5:31:43<33:05:44, 13.87s/it] {'loss': 0.5934, 'learning_rate': 4.2970000000000004e-05, 'epoch': 1.85} 14%|█▍ | 1410/10000 [5:31:43<33:05:44, 13.87s/it] 14%|█▍ | 1411/10000 [5:31:57<33:13:08, 13.92s/it] {'loss': 0.635, 'learning_rate': 4.2965e-05, 'epoch': 1.85} 14%|█▍ | 1411/10000 [5:31:57<33:13:08, 13.92s/it] 14%|█▍ | 1412/10000 [5:32:12<33:20:57, 13.98s/it] {'loss': 0.6309, 'learning_rate': 4.296e-05, 'epoch': 1.85} 14%|█▍ | 1412/10000 [5:32:12<33:20:57, 13.98s/it] 14%|█▍ | 1413/10000 [5:32:25<33:18:12, 13.96s/it] {'loss': 0.568, 'learning_rate': 4.2955e-05, 'epoch': 1.85} 14%|█▍ | 1413/10000 [5:32:26<33:18:12, 13.96s/it] 14%|█▍ | 1414/10000 [5:32:39<33:17:23, 13.96s/it] {'loss': 0.5965, 'learning_rate': 4.295e-05, 'epoch': 1.85} 14%|█▍ | 1414/10000 [5:32:39<33:17:23, 13.96s/it] 14%|█▍ | 1415/10000 [5:32:53<33:18:56, 13.97s/it] {'loss': 0.8903, 'learning_rate': 4.2945e-05, 'epoch': 1.85} 14%|█▍ | 1415/10000 [5:32:53<33:18:56, 13.97s/it] 14%|█▍ | 1416/10000 
[5:33:08<33:25:39, 14.02s/it] {'loss': 0.7283, 'learning_rate': 4.2940000000000006e-05, 'epoch': 1.85} 14%|█▍ | 1416/10000 [5:33:08<33:25:39, 14.02s/it] 14%|█▍ | 1417/10000 [5:33:21<33:16:51, 13.96s/it] {'loss': 0.5194, 'learning_rate': 4.2935e-05, 'epoch': 1.85} 14%|█▍ | 1417/10000 [5:33:21<33:16:51, 13.96s/it] 14%|█▍ | 1418/10000 [5:33:35<33:23:55, 14.01s/it] {'loss': 0.5518, 'learning_rate': 4.2930000000000004e-05, 'epoch': 1.86} 14%|█▍ | 1418/10000 [5:33:36<33:23:55, 14.01s/it] 14%|█▍ | 1419/10000 [5:33:49<33:18:47, 13.98s/it] {'loss': 0.5839, 'learning_rate': 4.2925000000000007e-05, 'epoch': 1.86} 14%|█▍ | 1419/10000 [5:33:49<33:18:47, 13.98s/it] 14%|█▍ | 1420/10000 [5:34:03<33:14:34, 13.95s/it] {'loss': 0.728, 'learning_rate': 4.292e-05, 'epoch': 1.86} 14%|█▍ | 1420/10000 [5:34:03<33:14:34, 13.95s/it] 14%|█▍ | 1421/10000 [5:34:17<33:15:02, 13.95s/it] {'loss': 0.8006, 'learning_rate': 4.2915e-05, 'epoch': 1.86} 14%|█▍ | 1421/10000 [5:34:17<33:15:02, 13.95s/it] 14%|█▍ | 1422/10000 [5:34:31<33:12:14, 13.94s/it] {'loss': 0.6852, 'learning_rate': 4.291e-05, 'epoch': 1.86} 14%|█▍ | 1422/10000 [5:34:31<33:12:14, 13.94s/it] 14%|█▍ | 1423/10000 [5:34:45<33:09:03, 13.91s/it] {'loss': 0.6508, 'learning_rate': 4.2905000000000003e-05, 'epoch': 1.86} 14%|█▍ | 1423/10000 [5:34:45<33:09:03, 13.91s/it] 14%|█▍ | 1424/10000 [5:34:59<33:10:56, 13.93s/it] {'loss': 0.6889, 'learning_rate': 4.29e-05, 'epoch': 1.86} 14%|█▍ | 1424/10000 [5:34:59<33:10:56, 13.93s/it] 14%|█▍ | 1425/10000 [5:35:13<33:18:23, 13.98s/it] {'loss': 0.5331, 'learning_rate': 4.2895e-05, 'epoch': 1.87} 14%|█▍ | 1425/10000 [5:35:13<33:18:23, 13.98s/it] 14%|█▍ | 1426/10000 [5:35:27<33:19:04, 13.99s/it] {'loss': 0.6851, 'learning_rate': 4.2890000000000004e-05, 'epoch': 1.87} 14%|█▍ | 1426/10000 [5:35:27<33:19:04, 13.99s/it] 14%|█▍ | 1427/10000 [5:35:41<33:15:51, 13.97s/it] {'loss': 0.5964, 'learning_rate': 4.2885e-05, 'epoch': 1.87} 14%|█▍ | 1427/10000 [5:35:41<33:15:51, 13.97s/it] 14%|█▍ | 1428/10000 
[5:35:55<33:13:57, 13.96s/it] {'loss': 0.6599, 'learning_rate': 4.288e-05, 'epoch': 1.87} 14%|█▍ | 1428/10000 [5:35:55<33:13:57, 13.96s/it] 14%|█▍ | 1429/10000 [5:36:09<33:15:18, 13.97s/it] {'loss': 0.6395, 'learning_rate': 4.2875000000000005e-05, 'epoch': 1.87} 14%|█▍ | 1429/10000 [5:36:09<33:15:18, 13.97s/it] 14%|█▍ | 1430/10000 [5:36:23<33:09:55, 13.93s/it] {'loss': 0.7758, 'learning_rate': 4.287000000000001e-05, 'epoch': 1.87} 14%|█▍ | 1430/10000 [5:36:23<33:09:55, 13.93s/it] 14%|█▍ | 1431/10000 [5:36:37<33:08:03, 13.92s/it] {'loss': 0.6046, 'learning_rate': 4.2865e-05, 'epoch': 1.87} 14%|█▍ | 1431/10000 [5:36:37<33:08:03, 13.92s/it] 14%|█▍ | 1432/10000 [5:36:51<33:05:43, 13.91s/it] {'loss': 0.6941, 'learning_rate': 4.286e-05, 'epoch': 1.87} 14%|█▍ | 1432/10000 [5:36:51<33:05:43, 13.91s/it] 14%|█▍ | 1433/10000 [5:37:05<33:13:26, 13.96s/it] {'loss': 0.5298, 'learning_rate': 4.2855e-05, 'epoch': 1.88} 14%|█▍ | 1433/10000 [5:37:05<33:13:26, 13.96s/it] 14%|█▍ | 1434/10000 [5:37:19<33:14:02, 13.97s/it] {'loss': 0.6617, 'learning_rate': 4.285e-05, 'epoch': 1.88} 14%|█▍ | 1434/10000 [5:37:19<33:14:02, 13.97s/it] 14%|█▍ | 1435/10000 [5:37:33<33:11:27, 13.95s/it] {'loss': 0.6122, 'learning_rate': 4.2845e-05, 'epoch': 1.88} 14%|█▍ | 1435/10000 [5:37:33<33:11:27, 13.95s/it] 14%|█▍ | 1436/10000 [5:37:47<33:16:18, 13.99s/it] {'loss': 0.5886, 'learning_rate': 4.284e-05, 'epoch': 1.88} 14%|█▍ | 1436/10000 [5:37:47<33:16:18, 13.99s/it] 14%|█▍ | 1437/10000 [5:38:00<33:11:49, 13.96s/it] {'loss': 0.551, 'learning_rate': 4.2835000000000006e-05, 'epoch': 1.88} 14%|█▍ | 1437/10000 [5:38:00<33:11:49, 13.96s/it] 14%|█▍ | 1438/10000 [5:38:14<33:02:35, 13.89s/it] {'loss': 0.7972, 'learning_rate': 4.283e-05, 'epoch': 1.88} 14%|█▍ | 1438/10000 [5:38:14<33:02:35, 13.89s/it] 14%|█▍ | 1439/10000 [5:38:28<33:01:50, 13.89s/it] {'loss': 0.6932, 'learning_rate': 4.2825000000000004e-05, 'epoch': 1.88} 14%|█▍ | 1439/10000 [5:38:28<33:01:50, 13.89s/it] 14%|█▍ | 1440/10000 [5:38:42<33:02:09, 
13.89s/it] {'loss': 0.6305, 'learning_rate': 4.282000000000001e-05, 'epoch': 1.88} 14%|█▍ | 1440/10000 [5:38:42<33:02:09, 13.89s/it] 14%|█▍ | 1441/10000 [5:38:56<32:57:46, 13.86s/it] {'loss': 0.6887, 'learning_rate': 4.2815e-05, 'epoch': 1.89} 14%|█▍ | 1441/10000 [5:38:56<32:57:46, 13.86s/it] 14%|█▍ | 1442/10000 [5:39:09<32:50:52, 13.82s/it] {'loss': 0.9205, 'learning_rate': 4.281e-05, 'epoch': 1.89} 14%|█▍ | 1442/10000 [5:39:10<32:50:52, 13.82s/it] 14%|█▍ | 1443/10000 [5:39:23<32:52:47, 13.83s/it] {'loss': 0.7403, 'learning_rate': 4.2805e-05, 'epoch': 1.89} 14%|█▍ | 1443/10000 [5:39:23<32:52:47, 13.83s/it] 14%|█▍ | 1444/10000 [5:39:37<32:54:34, 13.85s/it] {'loss': 0.7084, 'learning_rate': 4.2800000000000004e-05, 'epoch': 1.89} 14%|█▍ | 1444/10000 [5:39:37<32:54:34, 13.85s/it] 14%|█▍ | 1445/10000 [5:39:51<32:54:54, 13.85s/it] {'loss': 0.5992, 'learning_rate': 4.2795e-05, 'epoch': 1.89} 14%|█▍ | 1445/10000 [5:39:51<32:54:54, 13.85s/it] 14%|█▍ | 1446/10000 [5:40:05<32:56:06, 13.86s/it] {'loss': 0.5849, 'learning_rate': 4.279e-05, 'epoch': 1.89} 14%|█▍ | 1446/10000 [5:40:05<32:56:06, 13.86s/it] 14%|█▍ | 1447/10000 [5:40:19<33:04:02, 13.92s/it] {'loss': 0.6968, 'learning_rate': 4.2785000000000005e-05, 'epoch': 1.89} 14%|█▍ | 1447/10000 [5:40:19<33:04:02, 13.92s/it] 14%|█▍ | 1448/10000 [5:40:33<33:05:55, 13.93s/it] {'loss': 0.5597, 'learning_rate': 4.278e-05, 'epoch': 1.9} 14%|█▍ | 1448/10000 [5:40:33<33:05:55, 13.93s/it] 14%|█▍ | 1449/10000 [5:40:47<33:02:49, 13.91s/it] {'loss': 0.6324, 'learning_rate': 4.2775e-05, 'epoch': 1.9} 14%|█▍ | 1449/10000 [5:40:47<33:02:49, 13.91s/it] 14%|█▍ | 1450/10000 [5:41:01<33:02:57, 13.92s/it] {'loss': 0.7743, 'learning_rate': 4.2770000000000006e-05, 'epoch': 1.9} 14%|█▍ | 1450/10000 [5:41:01<33:02:57, 13.92s/it] 15%|█▍ | 1451/10000 [5:41:15<32:58:28, 13.89s/it] {'loss': 0.6024, 'learning_rate': 4.2765e-05, 'epoch': 1.9} 15%|█▍ | 1451/10000 [5:41:15<32:58:28, 13.89s/it] 15%|█▍ | 1452/10000 [5:41:28<32:56:39, 13.87s/it] {'loss': 0.5417, 
'learning_rate': 4.276e-05, 'epoch': 1.9} 15%|█▍ | 1452/10000 [5:41:28<32:56:39, 13.87s/it] 15%|█▍ | 1453/10000 [5:41:42<32:56:59, 13.88s/it] {'loss': 0.6219, 'learning_rate': 4.2755e-05, 'epoch': 1.9} 15%|█▍ | 1453/10000 [5:41:42<32:56:59, 13.88s/it] 15%|█▍ | 1454/10000 [5:41:56<32:54:07, 13.86s/it] {'loss': 0.6311, 'learning_rate': 4.275e-05, 'epoch': 1.9} 15%|█▍ | 1454/10000 [5:41:56<32:54:07, 13.86s/it] 15%|█▍ | 1455/10000 [5:42:10<32:51:39, 13.84s/it] {'loss': 0.7234, 'learning_rate': 4.2745000000000005e-05, 'epoch': 1.9} 15%|█▍ | 1455/10000 [5:42:10<32:51:39, 13.84s/it] 15%|█▍ | 1456/10000 [5:42:24<32:50:32, 13.84s/it] {'loss': 0.6164, 'learning_rate': 4.274e-05, 'epoch': 1.91} 15%|█▍ | 1456/10000 [5:42:24<32:50:32, 13.84s/it] 15%|█▍ | 1457/10000 [5:42:38<32:45:57, 13.81s/it] {'loss': 0.7138, 'learning_rate': 4.2735e-05, 'epoch': 1.91} 15%|█▍ | 1457/10000 [5:42:38<32:45:57, 13.81s/it] 15%|█▍ | 1458/10000 [5:42:51<32:50:10, 13.84s/it] {'loss': 0.5615, 'learning_rate': 4.2730000000000006e-05, 'epoch': 1.91} 15%|█▍ | 1458/10000 [5:42:51<32:50:10, 13.84s/it] 15%|█▍ | 1459/10000 [5:43:05<32:48:59, 13.83s/it] {'loss': 0.4963, 'learning_rate': 4.2725e-05, 'epoch': 1.91} 15%|█▍ | 1459/10000 [5:43:05<32:48:59, 13.83s/it] 15%|█▍ | 1460/10000 [5:43:19<32:50:14, 13.84s/it] {'loss': 0.5612, 'learning_rate': 4.2720000000000004e-05, 'epoch': 1.91} 15%|█▍ | 1460/10000 [5:43:19<32:50:14, 13.84s/it] 15%|█▍ | 1461/10000 [5:43:33<32:52:41, 13.86s/it] {'loss': 0.5955, 'learning_rate': 4.2715e-05, 'epoch': 1.91} 15%|█▍ | 1461/10000 [5:43:33<32:52:41, 13.86s/it] 15%|█▍ | 1462/10000 [5:43:47<32:50:21, 13.85s/it] {'loss': 0.6282, 'learning_rate': 4.271e-05, 'epoch': 1.91} 15%|█▍ | 1462/10000 [5:43:47<32:50:21, 13.85s/it] 15%|█▍ | 1463/10000 [5:44:01<32:55:49, 13.89s/it] {'loss': 0.5416, 'learning_rate': 4.2705e-05, 'epoch': 1.91} 15%|█▍ | 1463/10000 [5:44:01<32:55:49, 13.89s/it] 15%|█▍ | 1464/10000 [5:44:15<32:52:04, 13.86s/it] {'loss': 0.588, 'learning_rate': 4.27e-05, 'epoch': 
1.92} 15%|█▍ | 1464/10000 [5:44:15<32:52:04, 13.86s/it] 15%|█▍ | 1465/10000 [5:44:28<32:49:12, 13.84s/it] {'loss': 0.6563, 'learning_rate': 4.2695000000000004e-05, 'epoch': 1.92} 15%|█▍ | 1465/10000 [5:44:28<32:49:12, 13.84s/it] 15%|█▍ | 1466/10000 [5:44:42<32:46:26, 13.83s/it] {'loss': 0.6123, 'learning_rate': 4.269e-05, 'epoch': 1.92} 15%|█▍ | 1466/10000 [5:44:42<32:46:26, 13.83s/it] 15%|█▍ | 1467/10000 [5:44:56<32:48:01, 13.84s/it] {'loss': 0.5904, 'learning_rate': 4.2685e-05, 'epoch': 1.92} 15%|█▍ | 1467/10000 [5:44:56<32:48:01, 13.84s/it] 15%|█▍ | 1468/10000 [5:45:10<32:46:32, 13.83s/it] {'loss': 0.647, 'learning_rate': 4.2680000000000005e-05, 'epoch': 1.92} 15%|█▍ | 1468/10000 [5:45:10<32:46:32, 13.83s/it] 15%|█▍ | 1469/10000 [5:45:24<32:41:37, 13.80s/it] {'loss': 0.7227, 'learning_rate': 4.2675e-05, 'epoch': 1.92} 15%|█▍ | 1469/10000 [5:45:24<32:41:37, 13.80s/it] 15%|█▍ | 1470/10000 [5:45:37<32:42:43, 13.81s/it] {'loss': 0.6995, 'learning_rate': 4.267e-05, 'epoch': 1.92} 15%|█▍ | 1470/10000 [5:45:37<32:42:43, 13.81s/it] 15%|█▍ | 1471/10000 [5:45:51<32:45:09, 13.82s/it] {'loss': 0.5114, 'learning_rate': 4.2665e-05, 'epoch': 1.93} 15%|█▍ | 1471/10000 [5:45:51<32:45:09, 13.82s/it] 15%|█▍ | 1472/10000 [5:46:05<32:50:33, 13.86s/it] {'loss': 0.5814, 'learning_rate': 4.266e-05, 'epoch': 1.93} 15%|█▍ | 1472/10000 [5:46:05<32:50:33, 13.86s/it] 15%|█▍ | 1473/10000 [5:46:19<32:53:43, 13.89s/it] {'loss': 0.693, 'learning_rate': 4.2655e-05, 'epoch': 1.93} 15%|█▍ | 1473/10000 [5:46:19<32:53:43, 13.89s/it] 15%|█▍ | 1474/10000 [5:46:33<32:51:19, 13.87s/it] {'loss': 0.777, 'learning_rate': 4.265e-05, 'epoch': 1.93} 15%|█▍ | 1474/10000 [5:46:33<32:51:19, 13.87s/it] 15%|█▍ | 1475/10000 [5:46:47<32:54:41, 13.90s/it] {'loss': 0.6855, 'learning_rate': 4.2645e-05, 'epoch': 1.93} 15%|█▍ | 1475/10000 [5:46:47<32:54:41, 13.90s/it] 15%|█▍ | 1476/10000 [5:47:01<32:51:53, 13.88s/it] {'loss': 0.7619, 'learning_rate': 4.2640000000000005e-05, 'epoch': 1.93} 15%|█▍ | 1476/10000 
[5:47:01<32:51:53, 13.88s/it] 15%|█▍ | 1477/10000 [5:47:15<32:45:19, 13.84s/it] {'loss': 0.9015, 'learning_rate': 4.2635e-05, 'epoch': 1.93} 15%|█▍ | 1477/10000 [5:47:15<32:45:19, 13.84s/it] 15%|█▍ | 1478/10000 [5:47:28<32:42:19, 13.82s/it] {'loss': 0.6491, 'learning_rate': 4.2630000000000004e-05, 'epoch': 1.93} 15%|█▍ | 1478/10000 [5:47:28<32:42:19, 13.82s/it] 15%|█▍ | 1479/10000 [5:47:42<32:39:25, 13.80s/it] {'loss': 0.5643, 'learning_rate': 4.2625000000000006e-05, 'epoch': 1.94} 15%|█▍ | 1479/10000 [5:47:42<32:39:25, 13.80s/it] 15%|█▍ | 1480/10000 [5:47:56<32:41:38, 13.81s/it] {'loss': 0.6608, 'learning_rate': 4.262e-05, 'epoch': 1.94} 15%|█▍ | 1480/10000 [5:47:56<32:41:38, 13.81s/it] 15%|█▍ | 1481/10000 [5:48:10<32:38:39, 13.80s/it] {'loss': 0.506, 'learning_rate': 4.2615e-05, 'epoch': 1.94} 15%|█▍ | 1481/10000 [5:48:10<32:38:39, 13.80s/it] 15%|█▍ | 1482/10000 [5:48:24<32:44:04, 13.83s/it] {'loss': 0.5918, 'learning_rate': 4.261e-05, 'epoch': 1.94} 15%|█▍ | 1482/10000 [5:48:24<32:44:04, 13.83s/it] 15%|█▍ | 1483/10000 [5:48:37<32:41:25, 13.82s/it] {'loss': 0.7319, 'learning_rate': 4.2605e-05, 'epoch': 1.94} 15%|█▍ | 1483/10000 [5:48:37<32:41:25, 13.82s/it] 15%|█▍ | 1484/10000 [5:48:51<32:37:37, 13.79s/it] {'loss': 0.5844, 'learning_rate': 4.26e-05, 'epoch': 1.94} 15%|█▍ | 1484/10000 [5:48:51<32:37:37, 13.79s/it] 15%|█▍ | 1485/10000 [5:49:05<32:39:15, 13.81s/it] {'loss': 0.7239, 'learning_rate': 4.2595e-05, 'epoch': 1.94} 15%|█▍ | 1485/10000 [5:49:05<32:39:15, 13.81s/it] 15%|█▍ | 1486/10000 [5:49:19<32:43:43, 13.84s/it] {'loss': 0.7651, 'learning_rate': 4.2590000000000004e-05, 'epoch': 1.95} 15%|█▍ | 1486/10000 [5:49:19<32:43:43, 13.84s/it] 15%|█▍ | 1487/10000 [5:49:33<32:44:41, 13.85s/it] {'loss': 0.5031, 'learning_rate': 4.2585e-05, 'epoch': 1.95} 15%|█▍ | 1487/10000 [5:49:33<32:44:41, 13.85s/it] 15%|█▍ | 1488/10000 [5:49:47<32:42:59, 13.84s/it] {'loss': 0.6131, 'learning_rate': 4.258e-05, 'epoch': 1.95} 15%|█▍ | 1488/10000 [5:49:47<32:42:59, 13.84s/it] 15%|█▍ 
| 1489/10000 [5:50:01<32:48:20, 13.88s/it] {'loss': 0.6242, 'learning_rate': 4.2575000000000005e-05, 'epoch': 1.95} 15%|█▍ | 1489/10000 [5:50:01<32:48:20, 13.88s/it] 15%|█▍ | 1490/10000 [5:50:14<32:47:14, 13.87s/it] {'loss': 0.5771, 'learning_rate': 4.257000000000001e-05, 'epoch': 1.95} 15%|█▍ | 1490/10000 [5:50:14<32:47:14, 13.87s/it] 15%|█▍ | 1491/10000 [5:50:28<32:48:37, 13.88s/it] {'loss': 0.5546, 'learning_rate': 4.2564999999999997e-05, 'epoch': 1.95} 15%|█▍ | 1491/10000 [5:50:28<32:48:37, 13.88s/it] 15%|█▍ | 1492/10000 [5:50:42<32:49:11, 13.89s/it] {'loss': 0.6122, 'learning_rate': 4.256e-05, 'epoch': 1.95} 15%|█▍ | 1492/10000 [5:50:42<32:49:11, 13.89s/it] 15%|█▍ | 1493/10000 [5:50:56<32:43:12, 13.85s/it] {'loss': 0.5872, 'learning_rate': 4.2555e-05, 'epoch': 1.95} 15%|█▍ | 1493/10000 [5:50:56<32:43:12, 13.85s/it] 15%|█▍ | 1494/10000 [5:51:10<32:43:12, 13.85s/it] {'loss': 0.5594, 'learning_rate': 4.2550000000000004e-05, 'epoch': 1.96} 15%|█▍ | 1494/10000 [5:51:10<32:43:12, 13.85s/it] 15%|█▍ | 1495/10000 [5:51:24<32:42:34, 13.85s/it] {'loss': 0.5677, 'learning_rate': 4.2545e-05, 'epoch': 1.96} 15%|█▍ | 1495/10000 [5:51:24<32:42:34, 13.85s/it] 15%|█▍ | 1496/10000 [5:51:38<32:45:43, 13.87s/it] {'loss': 0.898, 'learning_rate': 4.254e-05, 'epoch': 1.96} 15%|█▍ | 1496/10000 [5:51:38<32:45:43, 13.87s/it] 15%|█▍ | 1497/10000 [5:51:52<32:51:14, 13.91s/it] {'loss': 0.652, 'learning_rate': 4.2535000000000005e-05, 'epoch': 1.96} 15%|█▍ | 1497/10000 [5:51:52<32:51:14, 13.91s/it] 15%|█▍ | 1498/10000 [5:52:05<32:46:11, 13.88s/it] {'loss': 0.5977, 'learning_rate': 4.253e-05, 'epoch': 1.96} 15%|█▍ | 1498/10000 [5:52:05<32:46:11, 13.88s/it] 15%|█▍ | 1499/10000 [5:52:19<32:44:45, 13.87s/it] {'loss': 0.819, 'learning_rate': 4.2525000000000004e-05, 'epoch': 1.96} 15%|█▍ | 1499/10000 [5:52:19<32:44:45, 13.87s/it] 15%|█▌ | 1500/10000 [5:52:33<32:43:22, 13.86s/it] {'loss': 0.6304, 'learning_rate': 4.2520000000000006e-05, 'epoch': 1.96} 15%|█▌ | 1500/10000 [5:52:33<32:43:22, 
13.86s/it] 15%|█▌ | 1501/10000 [5:52:47<32:50:48, 13.91s/it] {'loss': 0.7215, 'learning_rate': 4.2515e-05, 'epoch': 1.96} 15%|█▌ | 1501/10000 [5:52:47<32:50:48, 13.91s/it] 15%|█▌ | 1502/10000 [5:53:01<32:49:02, 13.90s/it] {'loss': 0.6221, 'learning_rate': 4.251e-05, 'epoch': 1.97} 15%|█▌ | 1502/10000 [5:53:01<32:49:02, 13.90s/it] 15%|█▌ | 1503/10000 [5:53:15<32:47:19, 13.89s/it] {'loss': 0.4996, 'learning_rate': 4.2505e-05, 'epoch': 1.97} 15%|█▌ | 1503/10000 [5:53:15<32:47:19, 13.89s/it] 15%|█▌ | 1504/10000 [5:53:29<32:46:48, 13.89s/it] {'loss': 0.5926, 'learning_rate': 4.25e-05, 'epoch': 1.97} 15%|█▌ | 1504/10000 [5:53:29<32:46:48, 13.89s/it] 15%|█▌ | 1505/10000 [5:53:43<32:44:04, 13.87s/it] {'loss': 0.5357, 'learning_rate': 4.2495e-05, 'epoch': 1.97} 15%|█▌ | 1505/10000 [5:53:43<32:44:04, 13.87s/it] 15%|█▌ | 1506/10000 [5:53:56<32:41:44, 13.86s/it] {'loss': 0.7315, 'learning_rate': 4.249e-05, 'epoch': 1.97} 15%|█▌ | 1506/10000 [5:53:56<32:41:44, 13.86s/it] 15%|█▌ | 1507/10000 [5:54:10<32:37:15, 13.83s/it] {'loss': 0.5113, 'learning_rate': 4.2485000000000004e-05, 'epoch': 1.97} 15%|█▌ | 1507/10000 [5:54:10<32:37:15, 13.83s/it] 15%|█▌ | 1508/10000 [5:54:24<32:46:33, 13.89s/it] {'loss': 0.6161, 'learning_rate': 4.248e-05, 'epoch': 1.97} 15%|█▌ | 1508/10000 [5:54:24<32:46:33, 13.89s/it] 15%|█▌ | 1509/10000 [5:54:38<32:46:07, 13.89s/it] {'loss': 0.5406, 'learning_rate': 4.2475e-05, 'epoch': 1.98} 15%|█▌ | 1509/10000 [5:54:38<32:46:07, 13.89s/it] 15%|█▌ | 1510/10000 [5:54:52<32:42:08, 13.87s/it] {'loss': 0.8068, 'learning_rate': 4.2470000000000005e-05, 'epoch': 1.98} 15%|█▌ | 1510/10000 [5:54:52<32:42:08, 13.87s/it] 15%|█▌ | 1511/10000 [5:55:06<32:37:36, 13.84s/it] {'loss': 0.5028, 'learning_rate': 4.246500000000001e-05, 'epoch': 1.98} 15%|█▌ | 1511/10000 [5:55:06<32:37:36, 13.84s/it] 15%|█▌ | 1512/10000 [5:55:19<32:38:02, 13.84s/it] {'loss': 0.6376, 'learning_rate': 4.246e-05, 'epoch': 1.98} 15%|█▌ | 1512/10000 [5:55:20<32:38:02, 13.84s/it] 15%|█▌ | 1513/10000 
[5:55:33<32:38:16, 13.84s/it] {'loss': 0.6484, 'learning_rate': 4.2455e-05, 'epoch': 1.98} 15%|█▌ | 1513/10000 [5:55:33<32:38:16, 13.84s/it] 15%|█▌ | 1514/10000 [5:55:47<32:32:47, 13.81s/it] {'loss': 0.6228, 'learning_rate': 4.245e-05, 'epoch': 1.98} 15%|█▌ | 1514/10000 [5:55:47<32:32:47, 13.81s/it] 15%|█▌ | 1515/10000 [5:56:01<32:31:03, 13.80s/it] {'loss': 0.6398, 'learning_rate': 4.2445000000000004e-05, 'epoch': 1.98} 15%|█▌ | 1515/10000 [5:56:01<32:31:03, 13.80s/it] 15%|█▌ | 1516/10000 [5:56:15<32:35:36, 13.83s/it] {'loss': 0.5078, 'learning_rate': 4.244e-05, 'epoch': 1.98} 15%|█▌ | 1516/10000 [5:56:15<32:35:36, 13.83s/it] 15%|█▌ | 1517/10000 [5:56:29<32:37:38, 13.85s/it] {'loss': 0.622, 'learning_rate': 4.2435e-05, 'epoch': 1.99} 15%|█▌ | 1517/10000 [5:56:29<32:37:38, 13.85s/it] 15%|█▌ | 1518/10000 [5:56:42<32:35:33, 13.83s/it] {'loss': 0.5719, 'learning_rate': 4.2430000000000005e-05, 'epoch': 1.99} 15%|█▌ | 1518/10000 [5:56:42<32:35:33, 13.83s/it] 15%|█▌ | 1519/10000 [5:56:56<32:42:51, 13.89s/it] {'loss': 0.6554, 'learning_rate': 4.2425e-05, 'epoch': 1.99} 15%|█▌ | 1519/10000 [5:56:56<32:42:51, 13.89s/it] 15%|█▌ | 1520/10000 [5:57:10<32:41:03, 13.88s/it] {'loss': 0.6234, 'learning_rate': 4.2420000000000004e-05, 'epoch': 1.99} 15%|█▌ | 1520/10000 [5:57:10<32:41:03, 13.88s/it] 15%|█▌ | 1521/10000 [5:57:24<32:38:04, 13.86s/it] {'loss': 0.5471, 'learning_rate': 4.2415000000000006e-05, 'epoch': 1.99} 15%|█▌ | 1521/10000 [5:57:24<32:38:04, 13.86s/it] 15%|█▌ | 1522/10000 [5:57:38<32:34:50, 13.83s/it] {'loss': 0.7987, 'learning_rate': 4.241e-05, 'epoch': 1.99} 15%|█▌ | 1522/10000 [5:57:38<32:34:50, 13.83s/it] 15%|█▌ | 1523/10000 [5:57:52<32:36:20, 13.85s/it] {'loss': 0.7057, 'learning_rate': 4.2405e-05, 'epoch': 1.99} 15%|█▌ | 1523/10000 [5:57:52<32:36:20, 13.85s/it] 15%|█▌ | 1524/10000 [5:58:06<32:37:12, 13.85s/it] {'loss': 0.6362, 'learning_rate': 4.24e-05, 'epoch': 1.99} 15%|█▌ | 1524/10000 [5:58:06<32:37:12, 13.85s/it] 15%|█▌ | 1525/10000 [5:58:19<32:34:39, 
13.84s/it] {'loss': 0.7612, 'learning_rate': 4.2395e-05, 'epoch': 2.0} 15%|█▌ | 1525/10000 [5:58:19<32:34:39, 13.84s/it] 15%|█▌ | 1526/10000 [5:58:33<32:31:47, 13.82s/it] {'loss': 0.8176, 'learning_rate': 4.239e-05, 'epoch': 2.0} 15%|█▌ | 1526/10000 [5:58:33<32:31:47, 13.82s/it] 15%|█▌ | 1527/10000 [5:58:47<32:29:53, 13.81s/it] {'loss': 0.6831, 'learning_rate': 4.2385e-05, 'epoch': 2.0} 15%|█▌ | 1527/10000 [5:58:47<32:29:53, 13.81s/it] 15%|█▌ | 1528/10000 [5:59:00<31:41:09, 13.46s/it] {'loss': 0.6111, 'learning_rate': 4.2380000000000004e-05, 'epoch': 2.0} 15%|█▌ | 1528/10000 [5:59:00<31:41:09, 13.46s/it] 15%|█▌ | 1529/10000 [5:59:13<31:52:18, 13.54s/it] {'loss': 0.4211, 'learning_rate': 4.237500000000001e-05, 'epoch': 2.0} 15%|█▌ | 1529/10000 [5:59:13<31:52:18, 13.54s/it] 15%|█▌ | 1530/10000 [5:59:27<32:10:16, 13.67s/it] {'loss': 0.2717, 'learning_rate': 4.237e-05, 'epoch': 2.0} 15%|█▌ | 1530/10000 [5:59:27<32:10:16, 13.67s/it] 15%|█▌ | 1531/10000 [5:59:41<32:21:23, 13.75s/it] {'loss': 0.3564, 'learning_rate': 4.2365000000000005e-05, 'epoch': 2.0} 15%|█▌ | 1531/10000 [5:59:41<32:21:23, 13.75s/it] 15%|█▌ | 1532/10000 [5:59:55<32:27:47, 13.80s/it] {'loss': 0.3298, 'learning_rate': 4.236e-05, 'epoch': 2.01} 15%|█▌ | 1532/10000 [5:59:55<32:27:47, 13.80s/it] 15%|█▌ | 1533/10000 [6:00:09<32:23:31, 13.77s/it] {'loss': 0.3159, 'learning_rate': 4.2355000000000004e-05, 'epoch': 2.01} 15%|█▌ | 1533/10000 [6:00:09<32:23:31, 13.77s/it] 15%|█▌ | 1534/10000 [6:00:23<32:26:48, 13.80s/it] {'loss': 0.3569, 'learning_rate': 4.235e-05, 'epoch': 2.01} 15%|█▌ | 1534/10000 [6:00:23<32:26:48, 13.80s/it] 15%|█▌ | 1535/10000 [6:00:37<32:31:46, 13.83s/it] {'loss': 0.3231, 'learning_rate': 4.2345e-05, 'epoch': 2.01} 15%|█▌ | 1535/10000 [6:00:37<32:31:46, 13.83s/it] 15%|█▌ | 1536/10000 [6:00:51<32:32:25, 13.84s/it] {'loss': 0.3051, 'learning_rate': 4.2340000000000005e-05, 'epoch': 2.01} 15%|█▌ | 1536/10000 [6:00:51<32:32:25, 13.84s/it] 15%|█▌ | 1537/10000 [6:01:04<32:29:58, 13.82s/it] {'loss': 
0.3269, 'learning_rate': 4.2335e-05, 'epoch': 2.01} 15%|█▌ | 1537/10000 [6:01:04<32:29:58, 13.82s/it] 15%|█▌ | 1538/10000 [6:01:18<32:33:50, 13.85s/it] {'loss': 0.3232, 'learning_rate': 4.233e-05, 'epoch': 2.01} 15%|█▌ | 1538/10000 [6:01:18<32:33:50, 13.85s/it] 15%|█▌ | 1539/10000 [6:01:32<32:39:34, 13.90s/it] {'loss': 0.3587, 'learning_rate': 4.2325000000000006e-05, 'epoch': 2.01} 15%|█▌ | 1539/10000 [6:01:32<32:39:34, 13.90s/it] 15%|█▌ | 1540/10000 [6:01:46<32:36:34, 13.88s/it] {'loss': 0.2753, 'learning_rate': 4.232e-05, 'epoch': 2.02} 15%|█▌ | 1540/10000 [6:01:46<32:36:34, 13.88s/it] 15%|█▌ | 1541/10000 [6:02:00<32:34:02, 13.86s/it] {'loss': 0.3948, 'learning_rate': 4.2315000000000004e-05, 'epoch': 2.02} 15%|█▌ | 1541/10000 [6:02:00<32:34:02, 13.86s/it] 15%|█▌ | 1542/10000 [6:02:14<32:38:12, 13.89s/it] {'loss': 0.3288, 'learning_rate': 4.231e-05, 'epoch': 2.02} 15%|█▌ | 1542/10000 [6:02:14<32:38:12, 13.89s/it] 15%|█▌ | 1543/10000 [6:02:28<32:31:24, 13.84s/it] {'loss': 0.2817, 'learning_rate': 4.2305e-05, 'epoch': 2.02} 15%|█▌ | 1543/10000 [6:02:28<32:31:24, 13.84s/it] 15%|█▌ | 1544/10000 [6:02:42<32:43:09, 13.93s/it] {'loss': 0.2588, 'learning_rate': 4.23e-05, 'epoch': 2.02} 15%|█▌ | 1544/10000 [6:02:42<32:43:09, 13.93s/it] 15%|█▌ | 1545/10000 [6:02:56<32:38:56, 13.90s/it] {'loss': 0.3694, 'learning_rate': 4.2295e-05, 'epoch': 2.02} 15%|█▌ | 1545/10000 [6:02:56<32:38:56, 13.90s/it] 15%|█▌ | 1546/10000 [6:03:10<32:39:41, 13.91s/it] {'loss': 0.312, 'learning_rate': 4.229e-05, 'epoch': 2.02} 15%|█▌ | 1546/10000 [6:03:10<32:39:41, 13.91s/it] 15%|█▌ | 1547/10000 [6:03:23<32:39:26, 13.91s/it] {'loss': 0.3899, 'learning_rate': 4.2285e-05, 'epoch': 2.02} 15%|█▌ | 1547/10000 [6:03:23<32:39:26, 13.91s/it] 15%|█▌ | 1548/10000 [6:03:37<32:33:35, 13.87s/it] {'loss': 0.3143, 'learning_rate': 4.228e-05, 'epoch': 2.03} 15%|█▌ | 1548/10000 [6:03:37<32:33:35, 13.87s/it] 15%|█▌ | 1549/10000 [6:03:51<32:37:33, 13.90s/it] {'loss': 0.2959, 'learning_rate': 4.2275000000000004e-05, 
'epoch': 2.03} 15%|█▌ | 1549/10000 [6:03:51<32:37:33, 13.90s/it] 16%|█▌ | 1550/10000 [6:04:05<32:31:11, 13.85s/it] {'loss': 0.2729, 'learning_rate': 4.227000000000001e-05, 'epoch': 2.03} 16%|█▌ | 1550/10000 [6:04:05<32:31:11, 13.85s/it] 16%|█▌ | 1551/10000 [6:04:19<32:32:57, 13.87s/it] {'loss': 0.2988, 'learning_rate': 4.2265e-05, 'epoch': 2.03} 16%|█▌ | 1551/10000 [6:04:19<32:32:57, 13.87s/it] 16%|█▌ | 1552/10000 [6:04:33<32:30:26, 13.85s/it] {'loss': 0.382, 'learning_rate': 4.226e-05, 'epoch': 2.03} 16%|█▌ | 1552/10000 [6:04:33<32:30:26, 13.85s/it] 16%|█▌ | 1553/10000 [6:04:46<32:29:39, 13.85s/it] {'loss': 0.3827, 'learning_rate': 4.2255e-05, 'epoch': 2.03} 16%|█▌ | 1553/10000 [6:04:46<32:29:39, 13.85s/it] 16%|█▌ | 1554/10000 [6:05:00<32:30:25, 13.86s/it] {'loss': 0.3008, 'learning_rate': 4.2250000000000004e-05, 'epoch': 2.03} 16%|█▌ | 1554/10000 [6:05:00<32:30:25, 13.86s/it] 16%|█▌ | 1555/10000 [6:05:14<32:24:30, 13.82s/it] {'loss': 0.2906, 'learning_rate': 4.2245e-05, 'epoch': 2.04} 16%|█▌ | 1555/10000 [6:05:14<32:24:30, 13.82s/it] 16%|█▌ | 1556/10000 [6:05:28<32:28:28, 13.85s/it] {'loss': 0.272, 'learning_rate': 4.224e-05, 'epoch': 2.04} 16%|█▌ | 1556/10000 [6:05:28<32:28:28, 13.85s/it] 16%|█▌ | 1557/10000 [6:05:42<32:28:01, 13.84s/it] {'loss': 0.3354, 'learning_rate': 4.2235000000000005e-05, 'epoch': 2.04} 16%|█▌ | 1557/10000 [6:05:42<32:28:01, 13.84s/it] 16%|█▌ | 1558/10000 [6:05:56<32:31:16, 13.87s/it] {'loss': 0.3328, 'learning_rate': 4.223e-05, 'epoch': 2.04} 16%|█▌ | 1558/10000 [6:05:56<32:31:16, 13.87s/it] 16%|█▌ | 1559/10000 [6:06:10<32:36:10, 13.90s/it] {'loss': 0.3303, 'learning_rate': 4.2225e-05, 'epoch': 2.04} 16%|█▌ | 1559/10000 [6:06:10<32:36:10, 13.90s/it] 16%|█▌ | 1560/10000 [6:06:24<32:38:02, 13.92s/it] {'loss': 0.2802, 'learning_rate': 4.2220000000000006e-05, 'epoch': 2.04} 16%|█▌ | 1560/10000 [6:06:24<32:38:02, 13.92s/it] 16%|█▌ | 1561/10000 [6:06:38<32:45:08, 13.97s/it] {'loss': 0.3158, 'learning_rate': 4.2215e-05, 'epoch': 2.04} 16%|█▌ | 
1561/10000 [6:06:38<32:45:08, 13.97s/it] 16%|█▌ | 1562/10000 [6:06:52<32:38:22, 13.93s/it] {'loss': 0.292, 'learning_rate': 4.221e-05, 'epoch': 2.04} 16%|█▌ | 1562/10000 [6:06:52<32:38:22, 13.93s/it] 16%|█▌ | 1563/10000 [6:07:05<32:31:40, 13.88s/it] {'loss': 0.2472, 'learning_rate': 4.2205e-05, 'epoch': 2.05} 16%|█▌ | 1563/10000 [6:07:05<32:31:40, 13.88s/it] 16%|█▌ | 1564/10000 [6:07:19<32:29:54, 13.87s/it] {'loss': 0.3769, 'learning_rate': 4.22e-05, 'epoch': 2.05} 16%|█▌ | 1564/10000 [6:07:19<32:29:54, 13.87s/it] 16%|█▌ | 1565/10000 [6:07:33<32:32:11, 13.89s/it] {'loss': 0.3355, 'learning_rate': 4.2195e-05, 'epoch': 2.05} 16%|█▌ | 1565/10000 [6:07:33<32:32:11, 13.89s/it] 16%|█▌ | 1566/10000 [6:07:47<32:32:41, 13.89s/it] {'loss': 0.3446, 'learning_rate': 4.219e-05, 'epoch': 2.05} 16%|█▌ | 1566/10000 [6:07:47<32:32:41, 13.89s/it] 16%|█▌ | 1567/10000 [6:08:01<32:30:30, 13.88s/it] {'loss': 0.2497, 'learning_rate': 4.2185000000000004e-05, 'epoch': 2.05} 16%|█▌ | 1567/10000 [6:08:01<32:30:30, 13.88s/it] 16%|█▌ | 1568/10000 [6:08:15<32:28:49, 13.87s/it] {'loss': 0.3456, 'learning_rate': 4.2180000000000006e-05, 'epoch': 2.05} 16%|█▌ | 1568/10000 [6:08:15<32:28:49, 13.87s/it] 16%|█▌ | 1569/10000 [6:08:29<32:34:52, 13.91s/it] {'loss': 0.4579, 'learning_rate': 4.2175e-05, 'epoch': 2.05} 16%|█▌ | 1569/10000 [6:08:29<32:34:52, 13.91s/it] 16%|█▌ | 1570/10000 [6:08:43<32:31:25, 13.89s/it] {'loss': 0.2384, 'learning_rate': 4.2170000000000005e-05, 'epoch': 2.05} 16%|█▌ | 1570/10000 [6:08:43<32:31:25, 13.89s/it] 16%|█▌ | 1571/10000 [6:08:56<32:30:50, 13.89s/it] {'loss': 0.3328, 'learning_rate': 4.216500000000001e-05, 'epoch': 2.06} 16%|█▌ | 1571/10000 [6:08:56<32:30:50, 13.89s/it] 16%|█▌ | 1572/10000 [6:09:10<32:24:01, 13.84s/it] {'loss': 0.28, 'learning_rate': 4.2159999999999996e-05, 'epoch': 2.06} 16%|█▌ | 1572/10000 [6:09:10<32:24:01, 13.84s/it] 16%|█▌ | 1573/10000 [6:09:24<32:22:19, 13.83s/it] {'loss': 0.3203, 'learning_rate': 4.2155e-05, 'epoch': 2.06} 16%|█▌ | 1573/10000 
[6:09:24<32:22:19, 13.83s/it] 16%|█▌ | 1574/10000 [6:09:38<32:23:24, 13.84s/it] {'loss': 0.4162, 'learning_rate': 4.215e-05, 'epoch': 2.06} 16%|█▌ | 1574/10000 [6:09:38<32:23:24, 13.84s/it] 16%|█▌ | 1575/10000 [6:09:52<32:24:55, 13.85s/it] {'loss': 0.4055, 'learning_rate': 4.2145000000000004e-05, 'epoch': 2.06} 16%|█▌ | 1575/10000 [6:09:52<32:24:55, 13.85s/it] 16%|█▌ | 1576/10000 [6:10:06<32:24:41, 13.85s/it] {'loss': 0.324, 'learning_rate': 4.214e-05, 'epoch': 2.06} 16%|█▌ | 1576/10000 [6:10:06<32:24:41, 13.85s/it] 16%|█▌ | 1577/10000 [6:10:20<32:27:41, 13.87s/it] {'loss': 0.2356, 'learning_rate': 4.2135e-05, 'epoch': 2.06} 16%|█▌ | 1577/10000 [6:10:20<32:27:41, 13.87s/it] 16%|█▌ | 1578/10000 [6:10:33<32:21:52, 13.83s/it] {'loss': 0.3229, 'learning_rate': 4.2130000000000005e-05, 'epoch': 2.07} 16%|█▌ | 1578/10000 [6:10:33<32:21:52, 13.83s/it] 16%|█▌ | 1579/10000 [6:10:47<32:25:51, 13.86s/it] {'loss': 0.3569, 'learning_rate': 4.2125e-05, 'epoch': 2.07} 16%|█▌ | 1579/10000 [6:10:47<32:25:51, 13.86s/it] 16%|█▌ | 1580/10000 [6:11:01<32:27:18, 13.88s/it] {'loss': 0.325, 'learning_rate': 4.212e-05, 'epoch': 2.07} 16%|█▌ | 1580/10000 [6:11:01<32:27:18, 13.88s/it] 16%|█▌ | 1581/10000 [6:11:15<32:29:00, 13.89s/it] {'loss': 0.2554, 'learning_rate': 4.2115000000000006e-05, 'epoch': 2.07} 16%|█▌ | 1581/10000 [6:11:15<32:29:00, 13.89s/it] 16%|█▌ | 1582/10000 [6:11:29<32:27:25, 13.88s/it] {'loss': 0.4428, 'learning_rate': 4.211e-05, 'epoch': 2.07} 16%|█▌ | 1582/10000 [6:11:29<32:27:25, 13.88s/it] 16%|█▌ | 1583/10000 [6:11:43<32:23:58, 13.86s/it] {'loss': 0.251, 'learning_rate': 4.2105e-05, 'epoch': 2.07} 16%|█▌ | 1583/10000 [6:11:43<32:23:58, 13.86s/it] 16%|█▌ | 1584/10000 [6:11:56<32:17:37, 13.81s/it] {'loss': 0.4141, 'learning_rate': 4.21e-05, 'epoch': 2.07} 16%|█▌ | 1584/10000 [6:11:56<32:17:37, 13.81s/it] 16%|█▌ | 1585/10000 [6:12:10<32:21:48, 13.85s/it] {'loss': 0.275, 'learning_rate': 4.2095e-05, 'epoch': 2.07} 16%|█▌ | 1585/10000 [6:12:10<32:21:48, 13.85s/it] 16%|█▌ | 
1586/10000 [6:12:24<32:17:22, 13.82s/it] {'loss': 0.3538, 'learning_rate': 4.209e-05, 'epoch': 2.08} 16%|█▌ | 1586/10000 [6:12:24<32:17:22, 13.82s/it] 16%|█▌ | 1587/10000 [6:12:38<32:13:35, 13.79s/it] {'loss': 0.2654, 'learning_rate': 4.2085e-05, 'epoch': 2.08} 16%|█▌ | 1587/10000 [6:12:38<32:13:35, 13.79s/it] 16%|█▌ | 1588/10000 [6:12:52<32:15:29, 13.81s/it] {'loss': 0.3514, 'learning_rate': 4.2080000000000004e-05, 'epoch': 2.08} 16%|█▌ | 1588/10000 [6:12:52<32:15:29, 13.81s/it] 16%|█▌ | 1589/10000 [6:13:05<32:16:59, 13.82s/it] {'loss': 0.3404, 'learning_rate': 4.2075000000000006e-05, 'epoch': 2.08} 16%|█▌ | 1589/10000 [6:13:05<32:16:59, 13.82s/it] 16%|█▌ | 1590/10000 [6:13:19<32:13:43, 13.80s/it] {'loss': 0.3179, 'learning_rate': 4.207e-05, 'epoch': 2.08} 16%|█▌ | 1590/10000 [6:13:19<32:13:43, 13.80s/it] 16%|█▌ | 1591/10000 [6:13:33<32:16:37, 13.82s/it] {'loss': 0.3524, 'learning_rate': 4.2065000000000005e-05, 'epoch': 2.08} 16%|█▌ | 1591/10000 [6:13:33<32:16:37, 13.82s/it] 16%|█▌ | 1592/10000 [6:13:47<32:16:02, 13.82s/it] {'loss': 0.2821, 'learning_rate': 4.206e-05, 'epoch': 2.08} 16%|█▌ | 1592/10000 [6:13:47<32:16:02, 13.82s/it] 16%|█▌ | 1593/10000 [6:14:01<32:19:01, 13.84s/it] {'loss': 0.2664, 'learning_rate': 4.2055e-05, 'epoch': 2.09} 16%|█▌ | 1593/10000 [6:14:01<32:19:01, 13.84s/it] 16%|█▌ | 1594/10000 [6:14:15<32:20:32, 13.85s/it] {'loss': 0.344, 'learning_rate': 4.205e-05, 'epoch': 2.09} 16%|█▌ | 1594/10000 [6:14:15<32:20:32, 13.85s/it] 16%|█▌ | 1595/10000 [6:14:29<32:24:35, 13.88s/it] {'loss': 0.2593, 'learning_rate': 4.2045e-05, 'epoch': 2.09} 16%|█▌ | 1595/10000 [6:14:29<32:24:35, 13.88s/it] 16%|█▌ | 1596/10000 [6:14:42<32:20:42, 13.86s/it] {'loss': 0.4171, 'learning_rate': 4.2040000000000004e-05, 'epoch': 2.09} 16%|█▌ | 1596/10000 [6:14:42<32:20:42, 13.86s/it] 16%|█▌ | 1597/10000 [6:14:56<32:19:29, 13.85s/it] {'loss': 0.2749, 'learning_rate': 4.2035e-05, 'epoch': 2.09} 16%|█▌ | 1597/10000 [6:14:56<32:19:29, 13.85s/it] 16%|█▌ | 1598/10000 
[6:15:10<32:15:04, 13.82s/it] {'loss': 0.4619, 'learning_rate': 4.203e-05, 'epoch': 2.09} 16%|█▌ | 1598/10000 [6:15:10<32:15:04, 13.82s/it] 16%|█▌ | 1599/10000 [6:15:24<32:24:05, 13.88s/it] {'loss': 0.3171, 'learning_rate': 4.2025000000000005e-05, 'epoch': 2.09} 16%|█▌ | 1599/10000 [6:15:24<32:24:05, 13.88s/it] 16%|█▌ | 1600/10000 [6:15:38<32:21:18, 13.87s/it] {'loss': 0.348, 'learning_rate': 4.202e-05, 'epoch': 2.09} 16%|█▌ | 1600/10000 [6:15:38<32:21:18, 13.87s/it] 16%|█▌ | 1601/10000 [6:15:52<32:20:01, 13.86s/it] {'loss': 0.2944, 'learning_rate': 4.2015000000000003e-05, 'epoch': 2.1} 16%|█▌ | 1601/10000 [6:15:52<32:20:01, 13.86s/it] 16%|█▌ | 1602/10000 [6:16:06<32:19:09, 13.85s/it] {'loss': 0.2174, 'learning_rate': 4.201e-05, 'epoch': 2.1} 16%|█▌ | 1602/10000 [6:16:06<32:19:09, 13.85s/it] 16%|█▌ | 1603/10000 [6:16:19<32:20:30, 13.87s/it] {'loss': 0.324, 'learning_rate': 4.2005e-05, 'epoch': 2.1} 16%|█▌ | 1603/10000 [6:16:19<32:20:30, 13.87s/it] 16%|█▌ | 1604/10000 [6:16:33<32:19:03, 13.86s/it] {'loss': 0.3175, 'learning_rate': 4.2e-05, 'epoch': 2.1} 16%|█▌ | 1604/10000 [6:16:33<32:19:03, 13.86s/it] 16%|█▌ | 1605/10000 [6:16:47<32:26:09, 13.91s/it] {'loss': 0.2976, 'learning_rate': 4.1995e-05, 'epoch': 2.1} 16%|█▌ | 1605/10000 [6:16:47<32:26:09, 13.91s/it] 16%|█▌ | 1606/10000 [6:17:01<32:29:58, 13.94s/it] {'loss': 0.3703, 'learning_rate': 4.199e-05, 'epoch': 2.1} 16%|█▌ | 1606/10000 [6:17:01<32:29:58, 13.94s/it] 16%|█▌ | 1607/10000 [6:17:15<32:28:09, 13.93s/it] {'loss': 0.2903, 'learning_rate': 4.1985000000000005e-05, 'epoch': 2.1} 16%|█▌ | 1607/10000 [6:17:15<32:28:09, 13.93s/it] 16%|█▌ | 1608/10000 [6:17:29<32:25:06, 13.91s/it] {'loss': 0.4046, 'learning_rate': 4.198e-05, 'epoch': 2.1} 16%|█▌ | 1608/10000 [6:17:29<32:25:06, 13.91s/it] 16%|█▌ | 1609/10000 [6:17:43<32:19:44, 13.87s/it] {'loss': 0.3156, 'learning_rate': 4.1975000000000004e-05, 'epoch': 2.11} 16%|█▌ | 1609/10000 [6:17:43<32:19:44, 13.87s/it] 16%|█▌ | 1610/10000 [6:17:57<32:27:07, 13.92s/it] 
{'loss': 0.3706, 'learning_rate': 4.1970000000000006e-05, 'epoch': 2.11} 16%|█▌ | 1610/10000 [6:17:57<32:27:07, 13.92s/it] 16%|█▌ | 1611/10000 [6:18:11<32:24:12, 13.91s/it] {'loss': 0.2953, 'learning_rate': 4.1965e-05, 'epoch': 2.11} 16%|█▌ | 1611/10000 [6:18:11<32:24:12, 13.91s/it] 16%|█▌ | 1612/10000 [6:18:25<32:21:32, 13.89s/it] {'loss': 0.2764, 'learning_rate': 4.196e-05, 'epoch': 2.11} 16%|█▌ | 1612/10000 [6:18:25<32:21:32, 13.89s/it] 16%|█▌ | 1613/10000 [6:18:38<32:19:35, 13.88s/it] {'loss': 0.3728, 'learning_rate': 4.1955e-05, 'epoch': 2.11} 16%|█▌ | 1613/10000 [6:18:38<32:19:35, 13.88s/it] 16%|█▌ | 1614/10000 [6:18:52<32:23:21, 13.90s/it] {'loss': 0.2934, 'learning_rate': 4.195e-05, 'epoch': 2.11} 16%|█▌ | 1614/10000 [6:18:52<32:23:21, 13.90s/it] 16%|█▌ | 1615/10000 [6:19:06<32:21:50, 13.90s/it] {'loss': 0.3308, 'learning_rate': 4.1945e-05, 'epoch': 2.11} 16%|█▌ | 1615/10000 [6:19:06<32:21:50, 13.90s/it] 16%|█▌ | 1616/10000 [6:19:20<32:16:36, 13.86s/it] {'loss': 0.315, 'learning_rate': 4.194e-05, 'epoch': 2.12} 16%|█▌ | 1616/10000 [6:19:20<32:16:36, 13.86s/it] 16%|█▌ | 1617/10000 [6:19:34<32:12:44, 13.83s/it] {'loss': 0.3041, 'learning_rate': 4.1935000000000004e-05, 'epoch': 2.12} 16%|█▌ | 1617/10000 [6:19:34<32:12:44, 13.83s/it] 16%|█▌ | 1618/10000 [6:19:48<32:11:29, 13.83s/it] {'loss': 0.2571, 'learning_rate': 4.193e-05, 'epoch': 2.12} 16%|█▌ | 1618/10000 [6:19:48<32:11:29, 13.83s/it] 16%|█▌ | 1619/10000 [6:20:02<32:12:32, 13.84s/it] {'loss': 0.3105, 'learning_rate': 4.1925e-05, 'epoch': 2.12} 16%|█▌ | 1619/10000 [6:20:02<32:12:32, 13.84s/it] 16%|█▌ | 1620/10000 [6:20:15<32:11:06, 13.83s/it] {'loss': 0.2692, 'learning_rate': 4.1920000000000005e-05, 'epoch': 2.12} 16%|█▌ | 1620/10000 [6:20:15<32:11:06, 13.83s/it] 16%|█▌ | 1621/10000 [6:20:29<32:16:48, 13.87s/it] {'loss': 0.3936, 'learning_rate': 4.1915e-05, 'epoch': 2.12} 16%|█▌ | 1621/10000 [6:20:29<32:16:48, 13.87s/it] 16%|█▌ | 1622/10000 [6:20:43<32:16:43, 13.87s/it] {'loss': 0.2815, 'learning_rate': 
4.191e-05, 'epoch': 2.12} 16%|█▌ | 1622/10000 [6:20:43<32:16:43, 13.87s/it] 16%|█▌ | 1623/10000 [6:20:57<32:16:01, 13.87s/it] {'loss': 0.3448, 'learning_rate': 4.1905e-05, 'epoch': 2.12} 16%|█▌ | 1623/10000 [6:20:57<32:16:01, 13.87s/it] 16%|█▌ | 1624/10000 [6:21:11<32:13:54, 13.85s/it] {'loss': 0.3261, 'learning_rate': 4.19e-05, 'epoch': 2.13} 16%|█▌ | 1624/10000 [6:21:11<32:13:54, 13.85s/it] 16%|█▋ | 1625/10000 [6:21:25<32:13:14, 13.85s/it] {'loss': 0.2742, 'learning_rate': 4.1895e-05, 'epoch': 2.13} 16%|█▋ | 1625/10000 [6:21:25<32:13:14, 13.85s/it] 16%|█▋ | 1626/10000 [6:21:38<32:10:03, 13.83s/it] {'loss': 0.3567, 'learning_rate': 4.189e-05, 'epoch': 2.13} 16%|█▋ | 1626/10000 [6:21:38<32:10:03, 13.83s/it] 16%|█▋ | 1627/10000 [6:21:52<32:15:07, 13.87s/it] {'loss': 0.3429, 'learning_rate': 4.1885e-05, 'epoch': 2.13} 16%|█▋ | 1627/10000 [6:21:52<32:15:07, 13.87s/it] 16%|█▋ | 1628/10000 [6:22:06<32:09:36, 13.83s/it] {'loss': 0.3237, 'learning_rate': 4.1880000000000006e-05, 'epoch': 2.13} 16%|█▋ | 1628/10000 [6:22:06<32:09:36, 13.83s/it] 16%|█▋ | 1629/10000 [6:22:20<32:10:54, 13.84s/it] {'loss': 0.3011, 'learning_rate': 4.1875e-05, 'epoch': 2.13} 16%|█▋ | 1629/10000 [6:22:20<32:10:54, 13.84s/it] 16%|█▋ | 1630/10000 [6:22:34<32:10:56, 13.84s/it] {'loss': 0.2864, 'learning_rate': 4.1870000000000004e-05, 'epoch': 2.13} 16%|█▋ | 1630/10000 [6:22:34<32:10:56, 13.84s/it] 16%|█▋ | 1631/10000 [6:22:48<32:12:49, 13.86s/it] {'loss': 0.2656, 'learning_rate': 4.1865000000000007e-05, 'epoch': 2.13} 16%|█▋ | 1631/10000 [6:22:48<32:12:49, 13.86s/it] 16%|█▋ | 1632/10000 [6:23:02<32:13:23, 13.86s/it] {'loss': 0.2928, 'learning_rate': 4.186e-05, 'epoch': 2.14} 16%|█▋ | 1632/10000 [6:23:02<32:13:23, 13.86s/it] 16%|█▋ | 1633/10000 [6:23:16<32:16:37, 13.89s/it] {'loss': 0.3293, 'learning_rate': 4.1855e-05, 'epoch': 2.14} 16%|█▋ | 1633/10000 [6:23:16<32:16:37, 13.89s/it] 16%|█▋ | 1634/10000 [6:23:29<32:13:14, 13.87s/it] {'loss': 0.3051, 'learning_rate': 4.185e-05, 'epoch': 2.14} 16%|█▋ | 
1634/10000 [6:23:29<32:13:14, 13.87s/it] 16%|█▋ | 1635/10000 [6:23:43<32:09:10, 13.84s/it] {'loss': 0.3039, 'learning_rate': 4.1845000000000003e-05, 'epoch': 2.14} 16%|█▋ | 1635/10000 [6:23:43<32:09:10, 13.84s/it] 16%|█▋ | 1636/10000 [6:23:57<32:06:22, 13.82s/it] {'loss': 0.2867, 'learning_rate': 4.184e-05, 'epoch': 2.14} 16%|█▋ | 1636/10000 [6:23:57<32:06:22, 13.82s/it] 16%|█▋ | 1637/10000 [6:24:11<32:06:09, 13.82s/it] {'loss': 0.3641, 'learning_rate': 4.1835e-05, 'epoch': 2.14} 16%|█▋ | 1637/10000 [6:24:11<32:06:09, 13.82s/it] 16%|█▋ | 1638/10000 [6:24:25<32:05:59, 13.82s/it] {'loss': 0.349, 'learning_rate': 4.1830000000000004e-05, 'epoch': 2.14} 16%|█▋ | 1638/10000 [6:24:25<32:05:59, 13.82s/it] 16%|█▋ | 1639/10000 [6:24:38<32:08:39, 13.84s/it] {'loss': 0.3757, 'learning_rate': 4.1825e-05, 'epoch': 2.15} 16%|█▋ | 1639/10000 [6:24:39<32:08:39, 13.84s/it] 16%|█▋ | 1640/10000 [6:24:53<32:20:26, 13.93s/it] {'loss': 0.3391, 'learning_rate': 4.182e-05, 'epoch': 2.15} 16%|█▋ | 1640/10000 [6:24:53<32:20:26, 13.93s/it] 16%|█▋ | 1641/10000 [6:25:06<32:17:01, 13.90s/it] {'loss': 0.2923, 'learning_rate': 4.1815000000000005e-05, 'epoch': 2.15} 16%|█▋ | 1641/10000 [6:25:06<32:17:01, 13.90s/it] 16%|█▋ | 1642/10000 [6:25:20<32:11:38, 13.87s/it] {'loss': 0.3192, 'learning_rate': 4.181000000000001e-05, 'epoch': 2.15} 16%|█▋ | 1642/10000 [6:25:20<32:11:38, 13.87s/it] 16%|█▋ | 1643/10000 [6:25:34<32:13:20, 13.88s/it] {'loss': 0.3486, 'learning_rate': 4.1805e-05, 'epoch': 2.15} 16%|█▋ | 1643/10000 [6:25:34<32:13:20, 13.88s/it] 16%|█▋ | 1644/10000 [6:25:48<32:15:25, 13.90s/it] {'loss': 0.3567, 'learning_rate': 4.18e-05, 'epoch': 2.15} 16%|█▋ | 1644/10000 [6:25:48<32:15:25, 13.90s/it] 16%|█▋ | 1645/10000 [6:26:02<32:13:28, 13.88s/it] {'loss': 0.2219, 'learning_rate': 4.1795e-05, 'epoch': 2.15} 16%|█▋ | 1645/10000 [6:26:02<32:13:28, 13.88s/it] 16%|█▋ | 1646/10000 [6:26:16<32:09:01, 13.85s/it] {'loss': 0.2944, 'learning_rate': 4.179e-05, 'epoch': 2.15} 16%|█▋ | 1646/10000 
[6:26:16<32:09:01, 13.85s/it] 16%|█▋ | 1647/10000 [6:26:30<32:14:55, 13.90s/it] {'loss': 0.2494, 'learning_rate': 4.1785e-05, 'epoch': 2.16} 16%|█▋ | 1647/10000 [6:26:30<32:14:55, 13.90s/it] 16%|█▋ | 1648/10000 [6:26:44<32:12:09, 13.88s/it] {'loss': 0.2891, 'learning_rate': 4.178e-05, 'epoch': 2.16} 16%|█▋ | 1648/10000 [6:26:44<32:12:09, 13.88s/it] 16%|█▋ | 1649/10000 [6:26:57<32:10:24, 13.87s/it] {'loss': 0.2877, 'learning_rate': 4.1775000000000006e-05, 'epoch': 2.16} 16%|█▋ | 1649/10000 [6:26:57<32:10:24, 13.87s/it] 16%|█▋ | 1650/10000 [6:27:11<32:10:38, 13.87s/it] {'loss': 0.2797, 'learning_rate': 4.177e-05, 'epoch': 2.16} 16%|█▋ | 1650/10000 [6:27:11<32:10:38, 13.87s/it] 17%|█▋ | 1651/10000 [6:27:25<32:07:35, 13.85s/it] {'loss': 0.3754, 'learning_rate': 4.1765000000000004e-05, 'epoch': 2.16} 17%|█▋ | 1651/10000 [6:27:25<32:07:35, 13.85s/it] 17%|█▋ | 1652/10000 [6:27:39<32:07:35, 13.85s/it] {'loss': 0.4105, 'learning_rate': 4.176000000000001e-05, 'epoch': 2.16} 17%|█▋ | 1652/10000 [6:27:39<32:07:35, 13.85s/it] 17%|█▋ | 1653/10000 [6:27:53<32:07:16, 13.85s/it] {'loss': 0.3697, 'learning_rate': 4.1755e-05, 'epoch': 2.16} 17%|█▋ | 1653/10000 [6:27:53<32:07:16, 13.85s/it] 17%|█▋ | 1654/10000 [6:28:07<32:06:38, 13.85s/it] {'loss': 0.3338, 'learning_rate': 4.175e-05, 'epoch': 2.16} 17%|█▋ | 1654/10000 [6:28:07<32:06:38, 13.85s/it] 17%|█▋ | 1655/10000 [6:28:20<32:04:04, 13.83s/it] {'loss': 0.3007, 'learning_rate': 4.1745e-05, 'epoch': 2.17} 17%|█▋ | 1655/10000 [6:28:20<32:04:04, 13.83s/it] 17%|█▋ | 1656/10000 [6:28:34<31:59:08, 13.80s/it] {'loss': 0.3441, 'learning_rate': 4.1740000000000004e-05, 'epoch': 2.17} 17%|█▋ | 1656/10000 [6:28:34<31:59:08, 13.80s/it] 17%|█▋ | 1657/10000 [6:28:48<32:04:09, 13.84s/it] {'loss': 0.3035, 'learning_rate': 4.1735e-05, 'epoch': 2.17} 17%|█▋ | 1657/10000 [6:28:48<32:04:09, 13.84s/it] 17%|█▋ | 1658/10000 [6:29:02<32:11:59, 13.90s/it] {'loss': 0.32, 'learning_rate': 4.173e-05, 'epoch': 2.17} 17%|█▋ | 1658/10000 [6:29:02<32:11:59, 
13.90s/it] 17%|█▋ | 1659/10000 [6:29:16<32:08:46, 13.87s/it] {'loss': 0.442, 'learning_rate': 4.1725000000000005e-05, 'epoch': 2.17} 17%|█▋ | 1659/10000 [6:29:16<32:08:46, 13.87s/it] 17%|█▋ | 1660/10000 [6:29:30<32:05:35, 13.85s/it] {'loss': 0.3129, 'learning_rate': 4.172e-05, 'epoch': 2.17} 17%|█▋ | 1660/10000 [6:29:30<32:05:35, 13.85s/it] 17%|█▋ | 1661/10000 [6:29:44<32:02:30, 13.83s/it] {'loss': 0.2412, 'learning_rate': 4.1715e-05, 'epoch': 2.17} 17%|█▋ | 1661/10000 [6:29:44<32:02:30, 13.83s/it] 17%|█▋ | 1662/10000 [6:29:57<32:02:00, 13.83s/it] {'loss': 0.3548, 'learning_rate': 4.1710000000000006e-05, 'epoch': 2.18} 17%|█▋ | 1662/10000 [6:29:57<32:02:00, 13.83s/it] 17%|█▋ | 1663/10000 [6:30:11<32:04:39, 13.85s/it] {'loss': 0.2905, 'learning_rate': 4.1705e-05, 'epoch': 2.18} 17%|█▋ | 1663/10000 [6:30:11<32:04:39, 13.85s/it] 17%|█▋ | 1664/10000 [6:30:25<32:01:37, 13.83s/it] {'loss': 0.3689, 'learning_rate': 4.17e-05, 'epoch': 2.18} 17%|█▋ | 1664/10000 [6:30:25<32:01:37, 13.83s/it] 17%|█▋ | 1665/10000 [6:30:39<32:06:25, 13.87s/it] {'loss': 0.299, 'learning_rate': 4.1695e-05, 'epoch': 2.18} 17%|█▋ | 1665/10000 [6:30:39<32:06:25, 13.87s/it] 17%|█▋ | 1666/10000 [6:30:53<32:04:42, 13.86s/it] {'loss': 0.359, 'learning_rate': 4.169e-05, 'epoch': 2.18} 17%|█▋ | 1666/10000 [6:30:53<32:04:42, 13.86s/it] 17%|█▋ | 1667/10000 [6:31:07<31:58:31, 13.81s/it] {'loss': 0.4102, 'learning_rate': 4.1685000000000005e-05, 'epoch': 2.18} 17%|█▋ | 1667/10000 [6:31:07<31:58:31, 13.81s/it] 17%|█▋ | 1668/10000 [6:31:20<31:55:38, 13.79s/it] {'loss': 0.4641, 'learning_rate': 4.168e-05, 'epoch': 2.18} 17%|█▋ | 1668/10000 [6:31:20<31:55:38, 13.79s/it] 17%|█▋ | 1669/10000 [6:31:34<31:50:47, 13.76s/it] {'loss': 0.3554, 'learning_rate': 4.1675e-05, 'epoch': 2.18} 17%|█▋ | 1669/10000 [6:31:34<31:50:47, 13.76s/it] 17%|█▋ | 1670/10000 [6:31:48<31:56:45, 13.81s/it] {'loss': 0.2876, 'learning_rate': 4.1670000000000006e-05, 'epoch': 2.19} 17%|█▋ | 1670/10000 [6:31:48<31:56:45, 13.81s/it] 17%|█▋ | 
1671/10000 [6:32:02<31:59:11, 13.83s/it] {'loss': 0.3166, 'learning_rate': 4.1665e-05, 'epoch': 2.19} 17%|█▋ | 1671/10000 [6:32:02<31:59:11, 13.83s/it] 17%|█▋ | 1672/10000 [6:32:16<32:01:30, 13.84s/it] {'loss': 0.3159, 'learning_rate': 4.1660000000000004e-05, 'epoch': 2.19} 17%|█▋ | 1672/10000 [6:32:16<32:01:30, 13.84s/it] 17%|█▋ | 1673/10000 [6:32:29<31:59:11, 13.83s/it] {'loss': 0.3192, 'learning_rate': 4.1655e-05, 'epoch': 2.19} 17%|█▋ | 1673/10000 [6:32:29<31:59:11, 13.83s/it] 17%|█▋ | 1674/10000 [6:32:43<31:56:35, 13.81s/it] {'loss': 0.2619, 'learning_rate': 4.165e-05, 'epoch': 2.19} 17%|█▋ | 1674/10000 [6:32:43<31:56:35, 13.81s/it] 17%|█▋ | 1675/10000 [6:32:57<31:53:58, 13.79s/it] {'loss': 0.304, 'learning_rate': 4.1645e-05, 'epoch': 2.19} 17%|█▋ | 1675/10000 [6:32:57<31:53:58, 13.79s/it] 17%|█▋ | 1676/10000 [6:33:11<31:51:41, 13.78s/it] {'loss': 0.2741, 'learning_rate': 4.164e-05, 'epoch': 2.19} 17%|█▋ | 1676/10000 [6:33:11<31:51:41, 13.78s/it] 17%|█▋ | 1677/10000 [6:33:25<31:57:57, 13.83s/it] {'loss': 0.2538, 'learning_rate': 4.1635000000000004e-05, 'epoch': 2.2} 17%|█▋ | 1677/10000 [6:33:25<31:57:57, 13.83s/it] 17%|█▋ | 1678/10000 [6:33:38<31:56:30, 13.82s/it] {'loss': 0.3124, 'learning_rate': 4.163e-05, 'epoch': 2.2} 17%|█▋ | 1678/10000 [6:33:38<31:56:30, 13.82s/it] 17%|█▋ | 1679/10000 [6:33:52<31:53:08, 13.80s/it] {'loss': 0.2812, 'learning_rate': 4.1625e-05, 'epoch': 2.2} 17%|█▋ | 1679/10000 [6:33:52<31:53:08, 13.80s/it] 17%|█▋ | 1680/10000 [6:34:06<31:59:29, 13.84s/it] {'loss': 0.2628, 'learning_rate': 4.1620000000000005e-05, 'epoch': 2.2} 17%|█▋ | 1680/10000 [6:34:06<31:59:29, 13.84s/it] 17%|█▋ | 1681/10000 [6:34:20<31:59:15, 13.84s/it] {'loss': 0.3342, 'learning_rate': 4.161500000000001e-05, 'epoch': 2.2} 17%|█▋ | 1681/10000 [6:34:20<31:59:15, 13.84s/it] 17%|█▋ | 1682/10000 [6:34:34<32:00:48, 13.86s/it] {'loss': 0.3308, 'learning_rate': 4.161e-05, 'epoch': 2.2} 17%|█▋ | 1682/10000 [6:34:34<32:00:48, 13.86s/it] 17%|█▋ | 1683/10000 [6:34:48<32:00:44, 
13.86s/it] {'loss': 0.3962, 'learning_rate': 4.1605e-05, 'epoch': 2.2} 17%|█▋ | 1683/10000 [6:34:48<32:00:44, 13.86s/it] 17%|█▋ | 1684/10000 [6:35:02<32:00:28, 13.86s/it] {'loss': 0.3415, 'learning_rate': 4.16e-05, 'epoch': 2.2} 17%|█▋ | 1684/10000 [6:35:02<32:00:28, 13.86s/it] 17%|█▋ | 1685/10000 [6:35:16<32:05:45, 13.90s/it] {'loss': 0.2747, 'learning_rate': 4.1595e-05, 'epoch': 2.21} 17%|█▋ | 1685/10000 [6:35:16<32:05:45, 13.90s/it] 17%|█▋ | 1686/10000 [6:35:29<32:03:23, 13.88s/it] {'loss': 0.2318, 'learning_rate': 4.159e-05, 'epoch': 2.21} 17%|█▋ | 1686/10000 [6:35:29<32:03:23, 13.88s/it] 17%|█▋ | 1687/10000 [6:35:43<31:59:27, 13.85s/it] {'loss': 0.3392, 'learning_rate': 4.1585e-05, 'epoch': 2.21} 17%|█▋ | 1687/10000 [6:35:43<31:59:27, 13.85s/it] 17%|█▋ | 1688/10000 [6:35:57<32:05:38, 13.90s/it] {'loss': 0.2673, 'learning_rate': 4.1580000000000005e-05, 'epoch': 2.21} 17%|█▋ | 1688/10000 [6:35:57<32:05:38, 13.90s/it] 17%|█▋ | 1689/10000 [6:36:11<32:05:58, 13.90s/it] {'loss': 0.3236, 'learning_rate': 4.1575e-05, 'epoch': 2.21} 17%|█▋ | 1689/10000 [6:36:11<32:05:58, 13.90s/it] 17%|█▋ | 1690/10000 [6:36:25<32:03:39, 13.89s/it] {'loss': 0.3188, 'learning_rate': 4.1570000000000003e-05, 'epoch': 2.21} 17%|█▋ | 1690/10000 [6:36:25<32:03:39, 13.89s/it] 17%|█▋ | 1691/10000 [6:36:39<32:03:26, 13.89s/it] {'loss': 0.3549, 'learning_rate': 4.1565000000000006e-05, 'epoch': 2.21} 17%|█▋ | 1691/10000 [6:36:39<32:03:26, 13.89s/it] 17%|█▋ | 1692/10000 [6:36:53<32:06:37, 13.91s/it] {'loss': 0.3499, 'learning_rate': 4.156e-05, 'epoch': 2.21} 17%|█▋ | 1692/10000 [6:36:53<32:06:37, 13.91s/it] 17%|█▋ | 1693/10000 [6:37:07<31:59:08, 13.86s/it] {'loss': 0.2864, 'learning_rate': 4.1555e-05, 'epoch': 2.22} 17%|█▋ | 1693/10000 [6:37:07<31:59:08, 13.86s/it] 17%|█▋ | 1694/10000 [6:37:21<32:01:31, 13.88s/it] {'loss': 0.2763, 'learning_rate': 4.155e-05, 'epoch': 2.22} 17%|█▋ | 1694/10000 [6:37:21<32:01:31, 13.88s/it] 17%|█▋ | 1695/10000 [6:37:34<31:59:53, 13.87s/it] {'loss': 0.2873, 
'learning_rate': 4.1545e-05, 'epoch': 2.22} 17%|█▋ | 1695/10000 [6:37:34<31:59:53, 13.87s/it] 17%|█▋ | 1696/10000 [6:37:48<32:00:07, 13.87s/it] {'loss': 0.323, 'learning_rate': 4.154e-05, 'epoch': 2.22} 17%|█▋ | 1696/10000 [6:37:48<32:00:07, 13.87s/it] 17%|█▋ | 1697/10000 [6:38:02<32:04:17, 13.91s/it] {'loss': 0.348, 'learning_rate': 4.1535e-05, 'epoch': 2.22} 17%|█▋ | 1697/10000 [6:38:02<32:04:17, 13.91s/it] 17%|█▋ | 1698/10000 [6:38:16<32:13:04, 13.97s/it] {'loss': 0.3022, 'learning_rate': 4.1530000000000004e-05, 'epoch': 2.22} 17%|█▋ | 1698/10000 [6:38:16<32:13:04, 13.97s/it] 17%|█▋ | 1699/10000 [6:38:30<32:02:17, 13.89s/it] {'loss': 0.2634, 'learning_rate': 4.1525e-05, 'epoch': 2.22} 17%|█▋ | 1699/10000 [6:38:30<32:02:17, 13.89s/it] 17%|█▋ | 1700/10000 [6:38:44<31:57:21, 13.86s/it] {'loss': 0.3013, 'learning_rate': 4.152e-05, 'epoch': 2.23} 17%|█▋ | 1700/10000 [6:38:44<31:57:21, 13.86s/it] 17%|█▋ | 1701/10000 [6:38:58<31:54:34, 13.84s/it] {'loss': 0.2834, 'learning_rate': 4.1515000000000005e-05, 'epoch': 2.23} 17%|█▋ | 1701/10000 [6:38:58<31:54:34, 13.84s/it] 17%|█▋ | 1702/10000 [6:39:11<31:51:07, 13.82s/it] {'loss': 0.3295, 'learning_rate': 4.151000000000001e-05, 'epoch': 2.23} 17%|█▋ | 1702/10000 [6:39:11<31:51:07, 13.82s/it] 17%|█▋ | 1703/10000 [6:39:25<31:51:27, 13.82s/it] {'loss': 0.2767, 'learning_rate': 4.1504999999999996e-05, 'epoch': 2.23} 17%|█▋ | 1703/10000 [6:39:25<31:51:27, 13.82s/it] 17%|█▋ | 1704/10000 [6:39:39<31:51:34, 13.83s/it] {'loss': 0.2919, 'learning_rate': 4.15e-05, 'epoch': 2.23} 17%|█▋ | 1704/10000 [6:39:39<31:51:34, 13.83s/it] 17%|█▋ | 1705/10000 [6:39:53<31:51:07, 13.82s/it] {'loss': 0.3793, 'learning_rate': 4.1495e-05, 'epoch': 2.23} 17%|█▋ | 1705/10000 [6:39:53<31:51:07, 13.82s/it] 17%|█▋ | 1706/10000 [6:40:07<31:57:55, 13.87s/it] {'loss': 0.3633, 'learning_rate': 4.1490000000000004e-05, 'epoch': 2.23} 17%|█▋ | 1706/10000 [6:40:07<31:57:55, 13.87s/it] 17%|█▋ | 1707/10000 [6:40:21<32:05:59, 13.93s/it] {'loss': 0.3292, 
'learning_rate': 4.1485e-05, 'epoch': 2.23} 17%|█▋ | 1707/10000 [6:40:21<32:05:59, 13.93s/it] 17%|█▋ | 1708/10000 [6:40:35<32:03:01, 13.91s/it] {'loss': 0.2708, 'learning_rate': 4.148e-05, 'epoch': 2.24} 17%|█▋ | 1708/10000 [6:40:35<32:03:01, 13.91s/it] 17%|█▋ | 1709/10000 [6:40:49<32:00:19, 13.90s/it] {'loss': 0.3505, 'learning_rate': 4.1475000000000005e-05, 'epoch': 2.24} 17%|█▋ | 1709/10000 [6:40:49<32:00:19, 13.90s/it] 17%|█▋ | 1710/10000 [6:41:03<31:57:04, 13.88s/it] {'loss': 0.3505, 'learning_rate': 4.147e-05, 'epoch': 2.24} 17%|█▋ | 1710/10000 [6:41:03<31:57:04, 13.88s/it] 17%|█▋ | 1711/10000 [6:41:16<31:58:22, 13.89s/it] {'loss': 0.4256, 'learning_rate': 4.1465000000000004e-05, 'epoch': 2.24} 17%|█▋ | 1711/10000 [6:41:16<31:58:22, 13.89s/it] 17%|█▋ | 1712/10000 [6:41:30<31:54:52, 13.86s/it] {'loss': 0.2889, 'learning_rate': 4.1460000000000006e-05, 'epoch': 2.24} 17%|█▋ | 1712/10000 [6:41:30<31:54:52, 13.86s/it] 17%|█▋ | 1713/10000 [6:41:44<31:52:03, 13.84s/it] {'loss': 0.2609, 'learning_rate': 4.1455e-05, 'epoch': 2.24} 17%|█▋ | 1713/10000 [6:41:44<31:52:03, 13.84s/it] 17%|█▋ | 1714/10000 [6:41:58<31:51:36, 13.84s/it] {'loss': 0.3167, 'learning_rate': 4.145e-05, 'epoch': 2.24} 17%|█▋ | 1714/10000 [6:41:58<31:51:36, 13.84s/it] 17%|█▋ | 1715/10000 [6:42:12<31:53:22, 13.86s/it] {'loss': 0.357, 'learning_rate': 4.1445e-05, 'epoch': 2.24} 17%|█▋ | 1715/10000 [6:42:12<31:53:22, 13.86s/it] 17%|█▋ | 1716/10000 [6:42:25<31:47:27, 13.82s/it] {'loss': 0.2885, 'learning_rate': 4.144e-05, 'epoch': 2.25} 17%|█▋ | 1716/10000 [6:42:26<31:47:27, 13.82s/it] 17%|█▋ | 1717/10000 [6:42:39<31:53:31, 13.86s/it] {'loss': 0.2972, 'learning_rate': 4.1435e-05, 'epoch': 2.25} 17%|█▋ | 1717/10000 [6:42:39<31:53:31, 13.86s/it] 17%|█▋ | 1718/10000 [6:42:53<31:49:56, 13.84s/it] {'loss': 0.3058, 'learning_rate': 4.143e-05, 'epoch': 2.25} 17%|█▋ | 1718/10000 [6:42:53<31:49:56, 13.84s/it] 17%|█▋ | 1719/10000 [6:43:07<31:48:03, 13.82s/it] {'loss': 0.3495, 'learning_rate': 
4.1425000000000004e-05, 'epoch': 2.25} 17%|█▋ | 1719/10000 [6:43:07<31:48:03, 13.82s/it] 17%|█▋ | 1720/10000 [6:43:21<31:51:07, 13.85s/it] {'loss': 0.3858, 'learning_rate': 4.142000000000001e-05, 'epoch': 2.25} 17%|█▋ | 1720/10000 [6:43:21<31:51:07, 13.85s/it] 17%|█▋ | 1721/10000 [6:43:35<31:47:41, 13.83s/it] {'loss': 0.3398, 'learning_rate': 4.1415e-05, 'epoch': 2.25} 17%|█▋ | 1721/10000 [6:43:35<31:47:41, 13.83s/it] 17%|█▋ | 1722/10000 [6:43:48<31:44:47, 13.81s/it] {'loss': 0.3383, 'learning_rate': 4.1410000000000005e-05, 'epoch': 2.25} 17%|█▋ | 1722/10000 [6:43:48<31:44:47, 13.81s/it] 17%|█▋ | 1723/10000 [6:44:02<31:46:18, 13.82s/it] {'loss': 0.3423, 'learning_rate': 4.1405e-05, 'epoch': 2.26} 17%|█▋ | 1723/10000 [6:44:02<31:46:18, 13.82s/it] 17%|█▋ | 1724/10000 [6:44:16<31:55:38, 13.89s/it] {'loss': 0.3133, 'learning_rate': 4.14e-05, 'epoch': 2.26} 17%|█▋ | 1724/10000 [6:44:16<31:55:38, 13.89s/it] 17%|█▋ | 1725/10000 [6:44:30<31:52:46, 13.87s/it] {'loss': 0.3498, 'learning_rate': 4.1395e-05, 'epoch': 2.26} 17%|█▋ | 1725/10000 [6:44:30<31:52:46, 13.87s/it] 17%|█▋ | 1726/10000 [6:44:44<31:48:35, 13.84s/it] {'loss': 0.3363, 'learning_rate': 4.139e-05, 'epoch': 2.26} 17%|█▋ | 1726/10000 [6:44:44<31:48:35, 13.84s/it] 17%|█▋ | 1727/10000 [6:44:58<31:45:21, 13.82s/it] {'loss': 0.36, 'learning_rate': 4.1385000000000004e-05, 'epoch': 2.26} 17%|█▋ | 1727/10000 [6:44:58<31:45:21, 13.82s/it] 17%|█▋ | 1728/10000 [6:45:12<31:50:45, 13.86s/it] {'loss': 0.258, 'learning_rate': 4.138e-05, 'epoch': 2.26} 17%|█▋ | 1728/10000 [6:45:12<31:50:45, 13.86s/it] 17%|█▋ | 1729/10000 [6:45:26<31:51:53, 13.87s/it] {'loss': 0.342, 'learning_rate': 4.1375e-05, 'epoch': 2.26} 17%|█▋ | 1729/10000 [6:45:26<31:51:53, 13.87s/it] 17%|█▋ | 1730/10000 [6:45:39<31:54:00, 13.89s/it] {'loss': 0.347, 'learning_rate': 4.1370000000000005e-05, 'epoch': 2.26} 17%|█▋ | 1730/10000 [6:45:40<31:54:00, 13.89s/it] 17%|█▋ | 1731/10000 [6:45:53<31:50:49, 13.87s/it] {'loss': 0.2887, 'learning_rate': 4.1365e-05, 
'epoch': 2.27} 17%|█▋ | 1731/10000 [6:45:53<31:50:49, 13.87s/it] 17%|█▋ | 1732/10000 [6:46:07<31:50:58, 13.87s/it] {'loss': 0.2634, 'learning_rate': 4.1360000000000004e-05, 'epoch': 2.27} 17%|█▋ | 1732/10000 [6:46:07<31:50:58, 13.87s/it] 17%|█▋ | 1733/10000 [6:46:21<31:51:19, 13.87s/it] {'loss': 0.3289, 'learning_rate': 4.1355e-05, 'epoch': 2.27} 17%|█▋ | 1733/10000 [6:46:21<31:51:19, 13.87s/it] 17%|█▋ | 1734/10000 [6:46:35<31:49:41, 13.86s/it] {'loss': 0.2948, 'learning_rate': 4.135e-05, 'epoch': 2.27} 17%|█▋ | 1734/10000 [6:46:35<31:49:41, 13.86s/it] 17%|█▋ | 1735/10000 [6:46:49<31:51:14, 13.87s/it] {'loss': 0.3676, 'learning_rate': 4.1345e-05, 'epoch': 2.27} 17%|█▋ | 1735/10000 [6:46:49<31:51:14, 13.87s/it] 17%|█▋ | 1736/10000 [6:47:03<31:51:21, 13.88s/it] {'loss': 0.2393, 'learning_rate': 4.134e-05, 'epoch': 2.27} 17%|█▋ | 1736/10000 [6:47:03<31:51:21, 13.88s/it] 17%|█▋ | 1737/10000 [6:47:17<31:54:25, 13.90s/it] {'loss': 0.3351, 'learning_rate': 4.1335e-05, 'epoch': 2.27} 17%|█▋ | 1737/10000 [6:47:17<31:54:25, 13.90s/it] 17%|█▋ | 1738/10000 [6:47:31<31:52:46, 13.89s/it] {'loss': 0.3519, 'learning_rate': 4.133e-05, 'epoch': 2.27} 17%|█▋ | 1738/10000 [6:47:31<31:52:46, 13.89s/it] 17%|█▋ | 1739/10000 [6:47:44<31:47:28, 13.85s/it] {'loss': 0.3845, 'learning_rate': 4.1325e-05, 'epoch': 2.28} 17%|█▋ | 1739/10000 [6:47:44<31:47:28, 13.85s/it] 17%|█▋ | 1740/10000 [6:47:58<31:43:53, 13.83s/it] {'loss': 0.2989, 'learning_rate': 4.1320000000000004e-05, 'epoch': 2.28} 17%|█▋ | 1740/10000 [6:47:58<31:43:53, 13.83s/it] 17%|█▋ | 1741/10000 [6:48:12<31:40:11, 13.80s/it] {'loss': 0.3242, 'learning_rate': 4.131500000000001e-05, 'epoch': 2.28} 17%|█▋ | 1741/10000 [6:48:12<31:40:11, 13.80s/it] 17%|█▋ | 1742/10000 [6:48:26<31:38:20, 13.79s/it] {'loss': 0.3054, 'learning_rate': 4.131e-05, 'epoch': 2.28} 17%|█▋ | 1742/10000 [6:48:26<31:38:20, 13.79s/it] 17%|█▋ | 1743/10000 [6:48:39<31:39:04, 13.80s/it] {'loss': 0.2942, 'learning_rate': 4.1305e-05, 'epoch': 2.28} 17%|█▋ | 1743/10000 
[6:48:39<31:39:04, 13.80s/it] 17%|█▋ | 1744/10000 [6:48:53<31:37:38, 13.79s/it] {'loss': 0.2807, 'learning_rate': 4.13e-05, 'epoch': 2.28} 17%|█▋ | 1744/10000 [6:48:53<31:37:38, 13.79s/it] 17%|█▋ | 1745/10000 [6:49:07<31:35:00, 13.77s/it] {'loss': 0.2782, 'learning_rate': 4.1295000000000004e-05, 'epoch': 2.28} 17%|█▋ | 1745/10000 [6:49:07<31:35:00, 13.77s/it] 17%|█▋ | 1746/10000 [6:49:21<31:45:18, 13.85s/it] {'loss': 0.3007, 'learning_rate': 4.129e-05, 'epoch': 2.29} 17%|█▋ | 1746/10000 [6:49:21<31:45:18, 13.85s/it] 17%|█▋ | 1747/10000 [6:49:35<31:46:30, 13.86s/it] {'loss': 0.3307, 'learning_rate': 4.1285e-05, 'epoch': 2.29} 17%|█▋ | 1747/10000 [6:49:35<31:46:30, 13.86s/it] 17%|█▋ | 1748/10000 [6:49:49<31:46:20, 13.86s/it] {'loss': 0.3663, 'learning_rate': 4.1280000000000005e-05, 'epoch': 2.29} 17%|█▋ | 1748/10000 [6:49:49<31:46:20, 13.86s/it] 17%|█▋ | 1749/10000 [6:50:02<31:41:41, 13.83s/it] {'loss': 0.4097, 'learning_rate': 4.1275e-05, 'epoch': 2.29} 17%|█▋ | 1749/10000 [6:50:02<31:41:41, 13.83s/it] 18%|█▊ | 1750/10000 [6:50:16<31:47:11, 13.87s/it] {'loss': 0.2834, 'learning_rate': 4.127e-05, 'epoch': 2.29} 18%|█▊ | 1750/10000 [6:50:16<31:47:11, 13.87s/it] 18%|█▊ | 1751/10000 [6:50:30<31:42:48, 13.84s/it] {'loss': 0.2654, 'learning_rate': 4.1265000000000006e-05, 'epoch': 2.29} 18%|█▊ | 1751/10000 [6:50:30<31:42:48, 13.84s/it] 18%|█▊ | 1752/10000 [6:50:44<31:46:12, 13.87s/it] {'loss': 0.3431, 'learning_rate': 4.126e-05, 'epoch': 2.29} 18%|█▊ | 1752/10000 [6:50:44<31:46:12, 13.87s/it] 18%|█▊ | 1753/10000 [6:50:58<31:44:59, 13.86s/it] {'loss': 0.3615, 'learning_rate': 4.1255e-05, 'epoch': 2.29} 18%|█▊ | 1753/10000 [6:50:58<31:44:59, 13.86s/it] 18%|█▊ | 1754/10000 [6:51:12<31:47:54, 13.88s/it] {'loss': 0.3973, 'learning_rate': 4.125e-05, 'epoch': 2.3} 18%|█▊ | 1754/10000 [6:51:12<31:47:54, 13.88s/it] 18%|█▊ | 1755/10000 [6:51:26<31:44:43, 13.86s/it] {'loss': 0.2994, 'learning_rate': 4.1245e-05, 'epoch': 2.3} 18%|█▊ | 1755/10000 [6:51:26<31:44:43, 13.86s/it] 18%|█▊ | 
1756/10000 [6:51:39<31:42:13, 13.84s/it] {'loss': 0.2908, 'learning_rate': 4.124e-05, 'epoch': 2.3} 18%|█▊ | 1756/10000 [6:51:39<31:42:13, 13.84s/it] 18%|█▊ | 1757/10000 [6:51:53<31:40:56, 13.84s/it] {'loss': 0.3081, 'learning_rate': 4.1235e-05, 'epoch': 2.3} 18%|█▊ | 1757/10000 [6:51:53<31:40:56, 13.84s/it] 18%|█▊ | 1758/10000 [6:52:07<31:40:40, 13.84s/it] {'loss': 0.4095, 'learning_rate': 4.123e-05, 'epoch': 2.3} 18%|█▊ | 1758/10000 [6:52:07<31:40:40, 13.84s/it] 18%|█▊ | 1759/10000 [6:52:21<31:35:35, 13.80s/it] {'loss': 0.3492, 'learning_rate': 4.1225e-05, 'epoch': 2.3} 18%|█▊ | 1759/10000 [6:52:21<31:35:35, 13.80s/it] 18%|█▊ | 1760/10000 [6:52:35<31:40:07, 13.84s/it] {'loss': 0.3958, 'learning_rate': 4.122e-05, 'epoch': 2.3} 18%|█▊ | 1760/10000 [6:52:35<31:40:07, 13.84s/it] 18%|█▊ | 1761/10000 [6:52:49<31:41:14, 13.85s/it] {'loss': 0.2831, 'learning_rate': 4.1215000000000004e-05, 'epoch': 2.3} 18%|█▊ | 1761/10000 [6:52:49<31:41:14, 13.85s/it] 18%|█▊ | 1762/10000 [6:53:03<31:43:52, 13.87s/it] {'loss': 0.304, 'learning_rate': 4.121000000000001e-05, 'epoch': 2.31} 18%|█▊ | 1762/10000 [6:53:03<31:43:52, 13.87s/it] 18%|█▊ | 1763/10000 [6:53:16<31:44:32, 13.87s/it] {'loss': 0.3452, 'learning_rate': 4.1205e-05, 'epoch': 2.31} 18%|█▊ | 1763/10000 [6:53:16<31:44:32, 13.87s/it] 18%|█▊ | 1764/10000 [6:53:30<31:42:41, 13.86s/it] {'loss': 0.3454, 'learning_rate': 4.12e-05, 'epoch': 2.31} 18%|█▊ | 1764/10000 [6:53:30<31:42:41, 13.86s/it] 18%|█▊ | 1765/10000 [6:53:44<31:41:42, 13.86s/it] {'loss': 0.3856, 'learning_rate': 4.1195e-05, 'epoch': 2.31} 18%|█▊ | 1765/10000 [6:53:44<31:41:42, 13.86s/it] 18%|█▊ | 1766/10000 [6:53:58<31:43:05, 13.87s/it] {'loss': 0.3027, 'learning_rate': 4.1190000000000004e-05, 'epoch': 2.31} 18%|█▊ | 1766/10000 [6:53:58<31:43:05, 13.87s/it] 18%|█▊ | 1767/10000 [6:54:12<31:44:02, 13.88s/it] {'loss': 0.3281, 'learning_rate': 4.1185e-05, 'epoch': 2.31} 18%|█▊ | 1767/10000 [6:54:12<31:44:02, 13.88s/it] 18%|█▊ | 1768/10000 [6:54:26<31:44:27, 13.88s/it] 
{'loss': 0.243, 'learning_rate': 4.118e-05, 'epoch': 2.31} 18%|█▊ | 1768/10000 [6:54:26<31:44:27, 13.88s/it] 18%|█▊ | 1769/10000 [6:54:40<31:44:41, 13.88s/it] {'loss': 0.3376, 'learning_rate': 4.1175000000000005e-05, 'epoch': 2.32} 18%|█▊ | 1769/10000 [6:54:40<31:44:41, 13.88s/it] 18%|█▊ | 1770/10000 [6:54:53<31:41:04, 13.86s/it] {'loss': 0.283, 'learning_rate': 4.117e-05, 'epoch': 2.32} 18%|█▊ | 1770/10000 [6:54:54<31:41:04, 13.86s/it] 18%|█▊ | 1771/10000 [6:55:07<31:39:48, 13.85s/it] {'loss': 0.2676, 'learning_rate': 4.1165e-05, 'epoch': 2.32} 18%|█▊ | 1771/10000 [6:55:07<31:39:48, 13.85s/it] 18%|█▊ | 1772/10000 [6:55:21<31:37:24, 13.84s/it] {'loss': 0.2968, 'learning_rate': 4.1160000000000006e-05, 'epoch': 2.32} 18%|█▊ | 1772/10000 [6:55:21<31:37:24, 13.84s/it] 18%|█▊ | 1773/10000 [6:55:35<31:33:42, 13.81s/it] {'loss': 0.3453, 'learning_rate': 4.1155e-05, 'epoch': 2.32} 18%|█▊ | 1773/10000 [6:55:35<31:33:42, 13.81s/it] 18%|█▊ | 1774/10000 [6:55:49<31:36:29, 13.83s/it] {'loss': 0.3079, 'learning_rate': 4.115e-05, 'epoch': 2.32} 18%|█▊ | 1774/10000 [6:55:49<31:36:29, 13.83s/it] 18%|█▊ | 1775/10000 [6:56:03<31:42:19, 13.88s/it] {'loss': 0.3592, 'learning_rate': 4.1145e-05, 'epoch': 2.32} 18%|█▊ | 1775/10000 [6:56:03<31:42:19, 13.88s/it] 18%|█▊ | 1776/10000 [6:56:17<31:39:40, 13.86s/it] {'loss': 0.3574, 'learning_rate': 4.114e-05, 'epoch': 2.32} 18%|█▊ | 1776/10000 [6:56:17<31:39:40, 13.86s/it] 18%|█▊ | 1777/10000 [6:56:30<31:32:01, 13.81s/it] {'loss': 0.313, 'learning_rate': 4.1135e-05, 'epoch': 2.33} 18%|█▊ | 1777/10000 [6:56:30<31:32:01, 13.81s/it] 18%|█▊ | 1778/10000 [6:56:44<31:39:33, 13.86s/it] {'loss': 0.3518, 'learning_rate': 4.113e-05, 'epoch': 2.33} 18%|█▊ | 1778/10000 [6:56:44<31:39:33, 13.86s/it] 18%|█▊ | 1779/10000 [6:56:58<31:37:00, 13.85s/it] {'loss': 0.3197, 'learning_rate': 4.1125000000000004e-05, 'epoch': 2.33} 18%|█▊ | 1779/10000 [6:56:58<31:37:00, 13.85s/it] 18%|█▊ | 1780/10000 [6:57:12<31:43:14, 13.89s/it] {'loss': 0.3873, 'learning_rate': 
4.1120000000000006e-05, 'epoch': 2.33} 18%|█▊ | 1780/10000 [6:57:12<31:43:14, 13.89s/it] 18%|█▊ | 1781/10000 [6:57:26<31:36:26, 13.84s/it] {'loss': 0.2871, 'learning_rate': 4.1115e-05, 'epoch': 2.33} 18%|█▊ | 1781/10000 [6:57:26<31:36:26, 13.84s/it] 18%|█▊ | 1782/10000 [6:57:40<31:39:08, 13.87s/it] {'loss': 0.3961, 'learning_rate': 4.1110000000000005e-05, 'epoch': 2.33} 18%|█▊ | 1782/10000 [6:57:40<31:39:08, 13.87s/it] 18%|█▊ | 1783/10000 [6:57:54<31:38:10, 13.86s/it] {'loss': 0.3354, 'learning_rate': 4.110500000000001e-05, 'epoch': 2.33} 18%|█▊ | 1783/10000 [6:57:54<31:38:10, 13.86s/it] 18%|█▊ | 1784/10000 [6:58:07<31:40:13, 13.88s/it] {'loss': 0.2912, 'learning_rate': 4.11e-05, 'epoch': 2.34} 18%|█▊ | 1784/10000 [6:58:07<31:40:13, 13.88s/it] 18%|█▊ | 1785/10000 [6:58:21<31:33:48, 13.83s/it] {'loss': 0.3255, 'learning_rate': 4.1095e-05, 'epoch': 2.34} 18%|█▊ | 1785/10000 [6:58:21<31:33:48, 13.83s/it] 18%|█▊ | 1786/10000 [6:58:35<31:35:08, 13.84s/it] {'loss': 0.3752, 'learning_rate': 4.109e-05, 'epoch': 2.34} 18%|█▊ | 1786/10000 [6:58:35<31:35:08, 13.84s/it] 18%|█▊ | 1787/10000 [6:58:49<31:29:31, 13.80s/it] {'loss': 0.3857, 'learning_rate': 4.1085000000000004e-05, 'epoch': 2.34} 18%|█▊ | 1787/10000 [6:58:49<31:29:31, 13.80s/it] 18%|█▊ | 1788/10000 [6:59:03<31:27:38, 13.79s/it] {'loss': 0.3509, 'learning_rate': 4.108e-05, 'epoch': 2.34} 18%|█▊ | 1788/10000 [6:59:03<31:27:38, 13.79s/it] 18%|█▊ | 1789/10000 [6:59:16<31:26:50, 13.79s/it] {'loss': 0.288, 'learning_rate': 4.1075e-05, 'epoch': 2.34} 18%|█▊ | 1789/10000 [6:59:16<31:26:50, 13.79s/it] 18%|█▊ | 1790/10000 [6:59:30<31:28:13, 13.80s/it] {'loss': 0.3108, 'learning_rate': 4.1070000000000005e-05, 'epoch': 2.34} 18%|█▊ | 1790/10000 [6:59:30<31:28:13, 13.80s/it] 18%|█▊ | 1791/10000 [6:59:44<31:26:00, 13.78s/it] {'loss': 0.3539, 'learning_rate': 4.1065e-05, 'epoch': 2.34} 18%|█▊ | 1791/10000 [6:59:44<31:26:00, 13.78s/it] 18%|█▊ | 1792/10000 [6:59:58<31:31:19, 13.83s/it] {'loss': 0.3774, 'learning_rate': 4.106e-05, 
'epoch': 2.35} 18%|█▊ | 1792/10000 [6:59:58<31:31:19, 13.83s/it] 18%|█▊ | 1793/10000 [7:00:12<31:32:09, 13.83s/it] {'loss': 0.3917, 'learning_rate': 4.1055000000000006e-05, 'epoch': 2.35} 18%|█▊ | 1793/10000 [7:00:12<31:32:09, 13.83s/it] 18%|█▊ | 1794/10000 [7:00:25<31:30:55, 13.83s/it] {'loss': 0.4233, 'learning_rate': 4.105e-05, 'epoch': 2.35} 18%|█▊ | 1794/10000 [7:00:25<31:30:55, 13.83s/it] 18%|█▊ | 1795/10000 [7:00:39<31:34:52, 13.86s/it] {'loss': 0.3654, 'learning_rate': 4.1045e-05, 'epoch': 2.35} 18%|█▊ | 1795/10000 [7:00:39<31:34:52, 13.86s/it] 18%|█▊ | 1796/10000 [7:00:53<31:34:31, 13.86s/it] {'loss': 0.291, 'learning_rate': 4.104e-05, 'epoch': 2.35} 18%|█▊ | 1796/10000 [7:00:53<31:34:31, 13.86s/it] 18%|█▊ | 1797/10000 [7:01:07<31:39:29, 13.89s/it] {'loss': 0.3032, 'learning_rate': 4.1035e-05, 'epoch': 2.35} 18%|█▊ | 1797/10000 [7:01:07<31:39:29, 13.89s/it] 18%|█▊ | 1798/10000 [7:01:21<31:34:16, 13.86s/it] {'loss': 0.3674, 'learning_rate': 4.103e-05, 'epoch': 2.35} 18%|█▊ | 1798/10000 [7:01:21<31:34:16, 13.86s/it] 18%|█▊ | 1799/10000 [7:01:35<31:33:20, 13.85s/it] {'loss': 0.3337, 'learning_rate': 4.1025e-05, 'epoch': 2.35} 18%|█▊ | 1799/10000 [7:01:35<31:33:20, 13.85s/it] 18%|█▊ | 1800/10000 [7:01:49<31:36:13, 13.87s/it] {'loss': 0.3046, 'learning_rate': 4.1020000000000004e-05, 'epoch': 2.36} 18%|█▊ | 1800/10000 [7:01:49<31:36:13, 13.87s/it] 18%|█▊ | 1801/10000 [7:02:03<31:31:37, 13.84s/it] {'loss': 0.3291, 'learning_rate': 4.1015000000000006e-05, 'epoch': 2.36} 18%|█▊ | 1801/10000 [7:02:03<31:31:37, 13.84s/it] 18%|█▊ | 1802/10000 [7:02:17<31:37:24, 13.89s/it] {'loss': 0.3165, 'learning_rate': 4.101e-05, 'epoch': 2.36} 18%|█▊ | 1802/10000 [7:02:17<31:37:24, 13.89s/it] 18%|█▊ | 1803/10000 [7:02:30<31:33:52, 13.86s/it] {'loss': 0.3604, 'learning_rate': 4.1005000000000005e-05, 'epoch': 2.36} 18%|█▊ | 1803/10000 [7:02:30<31:33:52, 13.86s/it] 18%|█▊ | 1804/10000 [7:02:44<31:33:11, 13.86s/it] {'loss': 0.2907, 'learning_rate': 4.1e-05, 'epoch': 2.36} 18%|█▊ | 
1804/10000 [7:02:44<31:33:11, 13.86s/it] 18%|█▊ | 1805/10000 [7:02:58<31:37:52, 13.90s/it] {'loss': 0.2677, 'learning_rate': 4.0995e-05, 'epoch': 2.36} 18%|█▊ | 1805/10000 [7:02:58<31:37:52, 13.90s/it] 18%|█▊ | 1806/10000 [7:03:12<31:36:27, 13.89s/it] {'loss': 0.289, 'learning_rate': 4.099e-05, 'epoch': 2.36} 18%|█▊ | 1806/10000 [7:03:12<31:36:27, 13.89s/it] 18%|█▊ | 1807/10000 [7:03:26<31:33:47, 13.87s/it] {'loss': 0.4074, 'learning_rate': 4.0985e-05, 'epoch': 2.37} 18%|█▊ | 1807/10000 [7:03:26<31:33:47, 13.87s/it] 18%|█▊ | 1808/10000 [7:03:40<31:39:05, 13.91s/it] {'loss': 0.3201, 'learning_rate': 4.0980000000000004e-05, 'epoch': 2.37} 18%|█▊ | 1808/10000 [7:03:40<31:39:05, 13.91s/it] 18%|█▊ | 1809/10000 [7:03:54<31:37:21, 13.90s/it] {'loss': 0.3037, 'learning_rate': 4.0975e-05, 'epoch': 2.37} 18%|█▊ | 1809/10000 [7:03:54<31:37:21, 13.90s/it] 18%|█▊ | 1810/10000 [7:04:08<31:38:36, 13.91s/it] {'loss': 0.3757, 'learning_rate': 4.097e-05, 'epoch': 2.37} 18%|█▊ | 1810/10000 [7:04:08<31:38:36, 13.91s/it] 18%|█▊ | 1811/10000 [7:04:22<31:40:47, 13.93s/it] {'loss': 0.3026, 'learning_rate': 4.0965000000000005e-05, 'epoch': 2.37} 18%|█▊ | 1811/10000 [7:04:22<31:40:47, 13.93s/it] 18%|█▊ | 1812/10000 [7:04:35<31:29:59, 13.85s/it] {'loss': 0.3501, 'learning_rate': 4.096e-05, 'epoch': 2.37} 18%|█▊ | 1812/10000 [7:04:35<31:29:59, 13.85s/it] 18%|█▊ | 1813/10000 [7:04:49<31:35:34, 13.89s/it] {'loss': 0.4073, 'learning_rate': 4.0955000000000003e-05, 'epoch': 2.37} 18%|█▊ | 1813/10000 [7:04:49<31:35:34, 13.89s/it] 18%|█▊ | 1814/10000 [7:05:03<31:36:47, 13.90s/it] {'loss': 0.3419, 'learning_rate': 4.095e-05, 'epoch': 2.37} 18%|█▊ | 1814/10000 [7:05:03<31:36:47, 13.90s/it] 18%|█▊ | 1815/10000 [7:05:17<31:34:20, 13.89s/it] {'loss': 0.3219, 'learning_rate': 4.0945e-05, 'epoch': 2.38} 18%|█▊ | 1815/10000 [7:05:17<31:34:20, 13.89s/it] 18%|█▊ | 1816/10000 [7:05:31<31:38:54, 13.92s/it] {'loss': 0.3447, 'learning_rate': 4.094e-05, 'epoch': 2.38} 18%|█▊ | 1816/10000 [7:05:31<31:38:54, 
13.92s/it] 18%|█▊ | 1817/10000 [7:05:45<31:32:15, 13.87s/it] {'loss': 0.241, 'learning_rate': 4.0935e-05, 'epoch': 2.38} 18%|█▊ | 1817/10000 [7:05:45<31:32:15, 13.87s/it] 18%|█▊ | 1818/10000 [7:05:59<31:30:54, 13.87s/it] {'loss': 0.3074, 'learning_rate': 4.093e-05, 'epoch': 2.38} 18%|█▊ | 1818/10000 [7:05:59<31:30:54, 13.87s/it] 18%|█▊ | 1819/10000 [7:06:12<31:26:39, 13.84s/it] {'loss': 0.3184, 'learning_rate': 4.0925000000000005e-05, 'epoch': 2.38} 18%|█▊ | 1819/10000 [7:06:12<31:26:39, 13.84s/it] 18%|█▊ | 1820/10000 [7:06:26<31:29:39, 13.86s/it] {'loss': 0.4582, 'learning_rate': 4.092e-05, 'epoch': 2.38} 18%|█▊ | 1820/10000 [7:06:26<31:29:39, 13.86s/it] 18%|█▊ | 1821/10000 [7:06:40<31:29:16, 13.86s/it] {'loss': 0.3076, 'learning_rate': 4.0915000000000004e-05, 'epoch': 2.38} 18%|█▊ | 1821/10000 [7:06:40<31:29:16, 13.86s/it] 18%|█▊ | 1822/10000 [7:06:54<31:30:59, 13.87s/it] {'loss': 0.2425, 'learning_rate': 4.0910000000000006e-05, 'epoch': 2.38} 18%|█▊ | 1822/10000 [7:06:54<31:30:59, 13.87s/it] 18%|█▊ | 1823/10000 [7:07:08<31:26:46, 13.84s/it] {'loss': 0.2814, 'learning_rate': 4.0905e-05, 'epoch': 2.39} 18%|█▊ | 1823/10000 [7:07:08<31:26:46, 13.84s/it] 18%|█▊ | 1824/10000 [7:07:22<31:32:33, 13.89s/it] {'loss': 0.3428, 'learning_rate': 4.09e-05, 'epoch': 2.39} 18%|█▊ | 1824/10000 [7:07:22<31:32:33, 13.89s/it] 18%|█▊ | 1825/10000 [7:07:36<31:42:27, 13.96s/it] {'loss': 0.3085, 'learning_rate': 4.0895e-05, 'epoch': 2.39} 18%|█▊ | 1825/10000 [7:07:36<31:42:27, 13.96s/it] 18%|█▊ | 1826/10000 [7:07:50<31:36:17, 13.92s/it] {'loss': 0.3901, 'learning_rate': 4.089e-05, 'epoch': 2.39} 18%|█▊ | 1826/10000 [7:07:50<31:36:17, 13.92s/it] 18%|█▊ | 1827/10000 [7:08:04<31:39:59, 13.95s/it] {'loss': 0.3415, 'learning_rate': 4.0885e-05, 'epoch': 2.39} 18%|█▊ | 1827/10000 [7:08:04<31:39:59, 13.95s/it] 18%|█▊ | 1828/10000 [7:08:18<31:33:47, 13.90s/it] {'loss': 0.4483, 'learning_rate': 4.088e-05, 'epoch': 2.39} 18%|█▊ | 1828/10000 [7:08:18<31:33:47, 13.90s/it] 18%|█▊ | 1829/10000 
[7:08:31<31:29:07, 13.87s/it] {'loss': 0.2692, 'learning_rate': 4.0875000000000004e-05, 'epoch': 2.39} 18%|█▊ | 1829/10000 [7:08:32<31:29:07, 13.87s/it] 18%|█▊ | 1830/10000 [7:08:45<31:31:30, 13.89s/it] {'loss': 0.3915, 'learning_rate': 4.087e-05, 'epoch': 2.4} 18%|█▊ | 1830/10000 [7:08:45<31:31:30, 13.89s/it] 18%|█▊ | 1831/10000 [7:08:59<31:34:55, 13.92s/it] {'loss': 0.3217, 'learning_rate': 4.0865e-05, 'epoch': 2.4} 18%|█▊ | 1831/10000 [7:08:59<31:34:55, 13.92s/it] 18%|█▊ | 1832/10000 [7:09:13<31:30:40, 13.89s/it] {'loss': 0.3205, 'learning_rate': 4.0860000000000005e-05, 'epoch': 2.4} 18%|█▊ | 1832/10000 [7:09:13<31:30:40, 13.89s/it] 18%|█▊ | 1833/10000 [7:09:27<31:31:04, 13.89s/it] {'loss': 0.3109, 'learning_rate': 4.0855e-05, 'epoch': 2.4} 18%|█▊ | 1833/10000 [7:09:27<31:31:04, 13.89s/it] 18%|█▊ | 1834/10000 [7:09:41<31:30:06, 13.89s/it] {'loss': 0.3544, 'learning_rate': 4.085e-05, 'epoch': 2.4} 18%|█▊ | 1834/10000 [7:09:41<31:30:06, 13.89s/it] 18%|█▊ | 1835/10000 [7:09:55<31:30:45, 13.89s/it] {'loss': 0.3056, 'learning_rate': 4.0845e-05, 'epoch': 2.4} 18%|█▊ | 1835/10000 [7:09:55<31:30:45, 13.89s/it] 18%|█▊ | 1836/10000 [7:10:09<31:27:10, 13.87s/it] {'loss': 0.3749, 'learning_rate': 4.084e-05, 'epoch': 2.4} 18%|█▊ | 1836/10000 [7:10:09<31:27:10, 13.87s/it] 18%|█▊ | 1837/10000 [7:10:23<31:25:36, 13.86s/it] {'loss': 0.3414, 'learning_rate': 4.0835e-05, 'epoch': 2.4} 18%|█▊ | 1837/10000 [7:10:23<31:25:36, 13.86s/it] 18%|█▊ | 1838/10000 [7:10:36<31:23:23, 13.85s/it] {'loss': 0.3458, 'learning_rate': 4.083e-05, 'epoch': 2.41} 18%|█▊ | 1838/10000 [7:10:36<31:23:23, 13.85s/it] 18%|█▊ | 1839/10000 [7:10:50<31:29:55, 13.89s/it] {'loss': 0.3, 'learning_rate': 4.0825e-05, 'epoch': 2.41} 18%|█▊ | 1839/10000 [7:10:50<31:29:55, 13.89s/it] 18%|█▊ | 1840/10000 [7:11:04<31:38:06, 13.96s/it] {'loss': 0.3048, 'learning_rate': 4.0820000000000006e-05, 'epoch': 2.41} 18%|█▊ | 1840/10000 [7:11:05<31:38:06, 13.96s/it] 18%|█▊ | 1841/10000 [7:11:18<31:30:43, 13.90s/it] {'loss': 0.3354, 
'learning_rate': 4.0815e-05, 'epoch': 2.41} 18%|█▊ | 1841/10000 [7:11:18<31:30:43, 13.90s/it] 18%|█▊ | 1842/10000 [7:11:32<31:23:15, 13.85s/it] {'loss': 0.2943, 'learning_rate': 4.0810000000000004e-05, 'epoch': 2.41} 18%|█▊ | 1842/10000 [7:11:32<31:23:15, 13.85s/it] 18%|█▊ | 1843/10000 [7:11:46<31:22:54, 13.85s/it] {'loss': 0.3474, 'learning_rate': 4.0805000000000007e-05, 'epoch': 2.41} 18%|█▊ | 1843/10000 [7:11:46<31:22:54, 13.85s/it] 18%|█▊ | 1844/10000 [7:12:00<31:18:04, 13.82s/it] {'loss': 0.3252, 'learning_rate': 4.08e-05, 'epoch': 2.41} 18%|█▊ | 1844/10000 [7:12:00<31:18:04, 13.82s/it] 18%|█▊ | 1845/10000 [7:12:13<31:16:09, 13.80s/it] {'loss': 0.431, 'learning_rate': 4.0795e-05, 'epoch': 2.41} 18%|█▊ | 1845/10000 [7:12:13<31:16:09, 13.80s/it] 18%|█▊ | 1846/10000 [7:12:27<31:14:57, 13.80s/it] {'loss': 0.328, 'learning_rate': 4.079e-05, 'epoch': 2.42} 18%|█▊ | 1846/10000 [7:12:27<31:14:57, 13.80s/it] 18%|█▊ | 1847/10000 [7:12:41<31:13:54, 13.79s/it] {'loss': 0.2933, 'learning_rate': 4.0785e-05, 'epoch': 2.42} 18%|█▊ | 1847/10000 [7:12:41<31:13:54, 13.79s/it] 18%|█▊ | 1848/10000 [7:12:55<31:12:34, 13.78s/it] {'loss': 0.3405, 'learning_rate': 4.078e-05, 'epoch': 2.42} 18%|█▊ | 1848/10000 [7:12:55<31:12:34, 13.78s/it] 18%|█▊ | 1849/10000 [7:13:09<31:17:47, 13.82s/it] {'loss': 0.3414, 'learning_rate': 4.0775e-05, 'epoch': 2.42} 18%|█▊ | 1849/10000 [7:13:09<31:17:47, 13.82s/it] 18%|█▊ | 1850/10000 [7:13:22<31:16:27, 13.81s/it] {'loss': 0.3559, 'learning_rate': 4.0770000000000004e-05, 'epoch': 2.42} 18%|█▊ | 1850/10000 [7:13:22<31:16:27, 13.81s/it] 19%|█▊ | 1851/10000 [7:13:36<31:23:29, 13.87s/it] {'loss': 0.4188, 'learning_rate': 4.0765e-05, 'epoch': 2.42} 19%|█▊ | 1851/10000 [7:13:36<31:23:29, 13.87s/it] 19%|█▊ | 1852/10000 [7:13:50<31:24:00, 13.87s/it] {'loss': 0.2406, 'learning_rate': 4.076e-05, 'epoch': 2.42} 19%|█▊ | 1852/10000 [7:13:50<31:24:00, 13.87s/it] 19%|█▊ | 1853/10000 [7:14:04<31:20:50, 13.85s/it] {'loss': 0.3752, 'learning_rate': 
4.0755000000000005e-05, 'epoch': 2.43} 19%|█▊ | 1853/10000 [7:14:04<31:20:50, 13.85s/it] 19%|█▊ | 1854/10000 [7:14:18<31:21:37, 13.86s/it] {'loss': 0.3736, 'learning_rate': 4.075e-05, 'epoch': 2.43} 19%|█▊ | 1854/10000 [7:14:18<31:21:37, 13.86s/it] 19%|█▊ | 1855/10000 [7:14:32<31:20:35, 13.85s/it] {'loss': 0.3293, 'learning_rate': 4.0745e-05, 'epoch': 2.43} 19%|█▊ | 1855/10000 [7:14:32<31:20:35, 13.85s/it] 19%|█▊ | 1856/10000 [7:14:46<31:24:22, 13.88s/it] {'loss': 0.3737, 'learning_rate': 4.074e-05, 'epoch': 2.43} 19%|█▊ | 1856/10000 [7:14:46<31:24:22, 13.88s/it] 19%|█▊ | 1857/10000 [7:14:59<31:18:46, 13.84s/it] {'loss': 0.2964, 'learning_rate': 4.0735e-05, 'epoch': 2.43} 19%|█▊ | 1857/10000 [7:15:00<31:18:46, 13.84s/it] 19%|█▊ | 1858/10000 [7:15:13<31:16:54, 13.83s/it] {'loss': 0.322, 'learning_rate': 4.0730000000000005e-05, 'epoch': 2.43} 19%|█▊ | 1858/10000 [7:15:13<31:16:54, 13.83s/it] 19%|█▊ | 1859/10000 [7:15:27<31:14:58, 13.82s/it] {'loss': 0.3356, 'learning_rate': 4.0725e-05, 'epoch': 2.43} 19%|█▊ | 1859/10000 [7:15:27<31:14:58, 13.82s/it] 19%|█▊ | 1860/10000 [7:15:41<31:12:55, 13.81s/it] {'loss': 0.3987, 'learning_rate': 4.072e-05, 'epoch': 2.43} 19%|█▊ | 1860/10000 [7:15:41<31:12:55, 13.81s/it] 19%|█▊ | 1861/10000 [7:15:55<31:08:33, 13.77s/it] {'loss': 0.2965, 'learning_rate': 4.0715000000000006e-05, 'epoch': 2.44} 19%|█▊ | 1861/10000 [7:15:55<31:08:33, 13.77s/it] 19%|█▊ | 1862/10000 [7:16:09<31:16:27, 13.83s/it] {'loss': 0.3215, 'learning_rate': 4.071e-05, 'epoch': 2.44} 19%|█▊ | 1862/10000 [7:16:09<31:16:27, 13.83s/it] 19%|█▊ | 1863/10000 [7:16:22<31:19:12, 13.86s/it] {'loss': 0.3685, 'learning_rate': 4.0705000000000004e-05, 'epoch': 2.44} 19%|█▊ | 1863/10000 [7:16:22<31:19:12, 13.86s/it] 19%|█▊ | 1864/10000 [7:16:36<31:22:43, 13.88s/it] {'loss': 0.3369, 'learning_rate': 4.07e-05, 'epoch': 2.44} 19%|█▊ | 1864/10000 [7:16:36<31:22:43, 13.88s/it] 19%|█▊ | 1865/10000 [7:16:50<31:24:03, 13.90s/it] {'loss': 0.3531, 'learning_rate': 4.0695e-05, 'epoch': 2.44} 
19%|█▊ | 1865/10000 [7:16:50<31:24:03, 13.90s/it] 19%|█▊ | 1866/10000 [7:17:04<31:22:22, 13.89s/it] {'loss': 0.3957, 'learning_rate': 4.069e-05, 'epoch': 2.44} 19%|█▊ | 1866/10000 [7:17:04<31:22:22, 13.89s/it] 19%|█▊ | 1867/10000 [7:17:18<31:15:23, 13.84s/it] {'loss': 0.3223, 'learning_rate': 4.0685e-05, 'epoch': 2.44} 19%|█▊ | 1867/10000 [7:17:18<31:15:23, 13.84s/it] 19%|█▊ | 1868/10000 [7:17:32<31:23:59, 13.90s/it] {'loss': 0.3611, 'learning_rate': 4.0680000000000004e-05, 'epoch': 2.45} 19%|█▊ | 1868/10000 [7:17:32<31:23:59, 13.90s/it] 19%|█▊ | 1869/10000 [7:17:46<31:18:48, 13.86s/it] {'loss': 0.3873, 'learning_rate': 4.0675e-05, 'epoch': 2.45} 19%|█▊ | 1869/10000 [7:17:46<31:18:48, 13.86s/it] 19%|█▊ | 1870/10000 [7:18:00<31:21:24, 13.88s/it] {'loss': 0.3736, 'learning_rate': 4.067e-05, 'epoch': 2.45} 19%|█▊ | 1870/10000 [7:18:00<31:21:24, 13.88s/it] 19%|█▊ | 1871/10000 [7:18:13<31:18:42, 13.87s/it] {'loss': 0.3745, 'learning_rate': 4.0665000000000005e-05, 'epoch': 2.45} 19%|█▊ | 1871/10000 [7:18:13<31:18:42, 13.87s/it] 19%|█▊ | 1872/10000 [7:18:27<31:23:20, 13.90s/it] {'loss': 0.3636, 'learning_rate': 4.066e-05, 'epoch': 2.45} 19%|█▊ | 1872/10000 [7:18:27<31:23:20, 13.90s/it] 19%|█▊ | 1873/10000 [7:18:41<31:22:14, 13.90s/it] {'loss': 0.2981, 'learning_rate': 4.0655e-05, 'epoch': 2.45} 19%|█▊ | 1873/10000 [7:18:41<31:22:14, 13.90s/it] 19%|█▊ | 1874/10000 [7:18:55<31:24:35, 13.92s/it] {'loss': 0.3844, 'learning_rate': 4.065e-05, 'epoch': 2.45} 19%|█▊ | 1874/10000 [7:18:55<31:24:35, 13.92s/it] 19%|█▉ | 1875/10000 [7:19:09<31:23:06, 13.91s/it] {'loss': 0.4024, 'learning_rate': 4.0645e-05, 'epoch': 2.45} 19%|█▉ | 1875/10000 [7:19:09<31:23:06, 13.91s/it] 19%|█▉ | 1876/10000 [7:19:23<31:20:30, 13.89s/it] {'loss': 0.3305, 'learning_rate': 4.064e-05, 'epoch': 2.46} 19%|█▉ | 1876/10000 [7:19:23<31:20:30, 13.89s/it] 19%|█▉ | 1877/10000 [7:19:37<31:17:34, 13.87s/it] {'loss': 0.3903, 'learning_rate': 4.0635e-05, 'epoch': 2.46} 19%|█▉ | 1877/10000 [7:19:37<31:17:34, 
13.87s/it] 19%|█▉ | 1878/10000 [7:19:51<31:16:30, 13.86s/it] {'loss': 0.3736, 'learning_rate': 4.063e-05, 'epoch': 2.46} 19%|█▉ | 1878/10000 [7:19:51<31:16:30, 13.86s/it] 19%|█▉ | 1879/10000 [7:20:05<31:21:10, 13.90s/it] {'loss': 0.3656, 'learning_rate': 4.0625000000000005e-05, 'epoch': 2.46} 19%|█▉ | 1879/10000 [7:20:05<31:21:10, 13.90s/it] 19%|█▉ | 1880/10000 [7:20:18<31:16:04, 13.86s/it] {'loss': 0.3847, 'learning_rate': 4.062e-05, 'epoch': 2.46} 19%|█▉ | 1880/10000 [7:20:18<31:16:04, 13.86s/it] 19%|█▉ | 1881/10000 [7:20:32<31:19:53, 13.89s/it] {'loss': 0.3859, 'learning_rate': 4.0615e-05, 'epoch': 2.46} 19%|█▉ | 1881/10000 [7:20:32<31:19:53, 13.89s/it] 19%|█▉ | 1882/10000 [7:20:46<31:17:58, 13.88s/it] {'loss': 0.3001, 'learning_rate': 4.0610000000000006e-05, 'epoch': 2.46} 19%|█▉ | 1882/10000 [7:20:46<31:17:58, 13.88s/it] 19%|█▉ | 1883/10000 [7:21:00<31:17:12, 13.88s/it] {'loss': 0.3194, 'learning_rate': 4.0605e-05, 'epoch': 2.46} 19%|█▉ | 1883/10000 [7:21:00<31:17:12, 13.88s/it] 19%|█▉ | 1884/10000 [7:21:14<31:12:15, 13.84s/it] {'loss': 0.279, 'learning_rate': 4.0600000000000004e-05, 'epoch': 2.47} 19%|█▉ | 1884/10000 [7:21:14<31:12:15, 13.84s/it] 19%|█▉ | 1885/10000 [7:21:28<31:17:52, 13.88s/it] {'loss': 0.3395, 'learning_rate': 4.0595e-05, 'epoch': 2.47} 19%|█▉ | 1885/10000 [7:21:28<31:17:52, 13.88s/it] 19%|█▉ | 1886/10000 [7:21:42<31:12:59, 13.85s/it] {'loss': 0.343, 'learning_rate': 4.059e-05, 'epoch': 2.47} 19%|█▉ | 1886/10000 [7:21:42<31:12:59, 13.85s/it] 19%|█▉ | 1887/10000 [7:21:56<31:15:57, 13.87s/it] {'loss': 0.3112, 'learning_rate': 4.0585e-05, 'epoch': 2.47} 19%|█▉ | 1887/10000 [7:21:56<31:15:57, 13.87s/it] 19%|█▉ | 1888/10000 [7:22:09<31:16:37, 13.88s/it] {'loss': 0.4298, 'learning_rate': 4.058e-05, 'epoch': 2.47} 19%|█▉ | 1888/10000 [7:22:10<31:16:37, 13.88s/it] 19%|█▉ | 1889/10000 [7:22:23<31:14:51, 13.87s/it] {'loss': 0.2715, 'learning_rate': 4.0575000000000004e-05, 'epoch': 2.47} 19%|█▉ | 1889/10000 [7:22:23<31:14:51, 13.87s/it] 19%|█▉ | 
1890/10000 [7:22:37<31:12:08, 13.85s/it] {'loss': 0.2858, 'learning_rate': 4.057e-05, 'epoch': 2.47} 19%|█▉ | 1890/10000 [7:22:37<31:12:08, 13.85s/it] 19%|█▉ | 1891/10000 [7:22:51<31:10:17, 13.84s/it] {'loss': 0.3608, 'learning_rate': 4.0565e-05, 'epoch': 2.48} 19%|█▉ | 1891/10000 [7:22:51<31:10:17, 13.84s/it] 19%|█▉ | 1892/10000 [7:23:05<31:13:35, 13.86s/it] {'loss': 0.2997, 'learning_rate': 4.0560000000000005e-05, 'epoch': 2.48} 19%|█▉ | 1892/10000 [7:23:05<31:13:35, 13.86s/it] 19%|█▉ | 1893/10000 [7:23:19<31:10:15, 13.84s/it] {'loss': 0.4019, 'learning_rate': 4.055500000000001e-05, 'epoch': 2.48} 19%|█▉ | 1893/10000 [7:23:19<31:10:15, 13.84s/it] 19%|█▉ | 1894/10000 [7:23:33<31:10:34, 13.85s/it] {'loss': 0.3876, 'learning_rate': 4.055e-05, 'epoch': 2.48} 19%|█▉ | 1894/10000 [7:23:33<31:10:34, 13.85s/it] 19%|█▉ | 1895/10000 [7:23:46<31:08:34, 13.83s/it] {'loss': 0.3923, 'learning_rate': 4.0545e-05, 'epoch': 2.48} 19%|█▉ | 1895/10000 [7:23:46<31:08:34, 13.83s/it] 19%|█▉ | 1896/10000 [7:24:00<31:09:46, 13.84s/it] {'loss': 0.3376, 'learning_rate': 4.054e-05, 'epoch': 2.48} 19%|█▉ | 1896/10000 [7:24:00<31:09:46, 13.84s/it] 19%|█▉ | 1897/10000 [7:24:14<31:07:54, 13.83s/it] {'loss': 0.3032, 'learning_rate': 4.0535000000000004e-05, 'epoch': 2.48} 19%|█▉ | 1897/10000 [7:24:14<31:07:54, 13.83s/it] 19%|█▉ | 1898/10000 [7:24:28<31:12:55, 13.87s/it] {'loss': 0.3847, 'learning_rate': 4.053e-05, 'epoch': 2.48} 19%|█▉ | 1898/10000 [7:24:28<31:12:55, 13.87s/it] 19%|█▉ | 1899/10000 [7:24:42<31:14:17, 13.88s/it] {'loss': 0.3973, 'learning_rate': 4.0525e-05, 'epoch': 2.49} 19%|█▉ | 1899/10000 [7:24:42<31:14:17, 13.88s/it] 19%|█▉ | 1900/10000 [7:24:56<31:10:01, 13.85s/it] {'loss': 0.2824, 'learning_rate': 4.0520000000000005e-05, 'epoch': 2.49} 19%|█▉ | 1900/10000 [7:24:56<31:10:01, 13.85s/it] 19%|█▉ | 1901/10000 [7:25:09<31:07:21, 13.83s/it] {'loss': 0.457, 'learning_rate': 4.0515e-05, 'epoch': 2.49} 19%|█▉ | 1901/10000 [7:25:09<31:07:21, 13.83s/it] 19%|█▉ | 1902/10000 
[7:25:23<31:09:29, 13.85s/it] {'loss': 0.3124, 'learning_rate': 4.0510000000000003e-05, 'epoch': 2.49} 19%|█▉ | 1902/10000 [7:25:23<31:09:29, 13.85s/it] 19%|█▉ | 1903/10000 [7:25:37<31:06:46, 13.83s/it] {'loss': 0.2899, 'learning_rate': 4.0505000000000006e-05, 'epoch': 2.49} 19%|█▉ | 1903/10000 [7:25:37<31:06:46, 13.83s/it] 19%|█▉ | 1904/10000 [7:25:51<31:05:06, 13.82s/it] {'loss': 0.2965, 'learning_rate': 4.05e-05, 'epoch': 2.49} 19%|█▉ | 1904/10000 [7:25:51<31:05:06, 13.82s/it] 19%|█▉ | 1905/10000 [7:26:05<31:06:15, 13.83s/it] {'loss': 0.3596, 'learning_rate': 4.0495e-05, 'epoch': 2.49} 19%|█▉ | 1905/10000 [7:26:05<31:06:15, 13.83s/it] 19%|█▉ | 1906/10000 [7:26:19<31:17:57, 13.92s/it] {'loss': 0.3727, 'learning_rate': 4.049e-05, 'epoch': 2.49} 19%|█▉ | 1906/10000 [7:26:19<31:17:57, 13.92s/it] 19%|█▉ | 1907/10000 [7:26:33<31:12:13, 13.88s/it] {'loss': 0.463, 'learning_rate': 4.0485e-05, 'epoch': 2.5} 19%|█▉ | 1907/10000 [7:26:33<31:12:13, 13.88s/it] 19%|█▉ | 1908/10000 [7:26:46<31:06:25, 13.84s/it] {'loss': 0.3569, 'learning_rate': 4.048e-05, 'epoch': 2.5} 19%|█▉ | 1908/10000 [7:26:46<31:06:25, 13.84s/it] 19%|█▉ | 1909/10000 [7:27:00<31:09:51, 13.87s/it] {'loss': 0.3118, 'learning_rate': 4.0475e-05, 'epoch': 2.5} 19%|█▉ | 1909/10000 [7:27:00<31:09:51, 13.87s/it] 19%|█▉ | 1910/10000 [7:27:14<31:06:46, 13.85s/it] {'loss': 0.3549, 'learning_rate': 4.0470000000000004e-05, 'epoch': 2.5} 19%|█▉ | 1910/10000 [7:27:14<31:06:46, 13.85s/it] 19%|█▉ | 1911/10000 [7:27:28<31:08:56, 13.86s/it] {'loss': 0.3339, 'learning_rate': 4.0465e-05, 'epoch': 2.5} 19%|█▉ | 1911/10000 [7:27:28<31:08:56, 13.86s/it] 19%|█▉ | 1912/10000 [7:27:42<31:07:23, 13.85s/it] {'loss': 0.3528, 'learning_rate': 4.046e-05, 'epoch': 2.5} 19%|█▉ | 1912/10000 [7:27:42<31:07:23, 13.85s/it] 19%|█▉ | 1913/10000 [7:27:56<31:04:18, 13.83s/it] {'loss': 0.3114, 'learning_rate': 4.0455000000000005e-05, 'epoch': 2.5} 19%|█▉ | 1913/10000 [7:27:56<31:04:18, 13.83s/it] 19%|█▉ | 1914/10000 [7:28:09<31:01:56, 13.82s/it] 
{'loss': 0.3621, 'learning_rate': 4.045000000000001e-05, 'epoch': 2.51} 19%|█▉ | 1914/10000 [7:28:09<31:01:56, 13.82s/it] 19%|█▉ | 1915/10000 [7:28:23<30:58:15, 13.79s/it] {'loss': 0.4449, 'learning_rate': 4.0444999999999996e-05, 'epoch': 2.51} 19%|█▉ | 1915/10000 [7:28:23<30:58:15, 13.79s/it] 19%|█▉ | 1916/10000 [7:28:37<31:01:41, 13.82s/it] {'loss': 0.353, 'learning_rate': 4.044e-05, 'epoch': 2.51} 19%|█▉ | 1916/10000 [7:28:37<31:01:41, 13.82s/it] 19%|█▉ | 1917/10000 [7:28:51<31:00:10, 13.81s/it] {'loss': 0.3421, 'learning_rate': 4.0435e-05, 'epoch': 2.51} 19%|█▉ | 1917/10000 [7:28:51<31:00:10, 13.81s/it] 19%|█▉ | 1918/10000 [7:29:05<30:58:01, 13.79s/it] {'loss': 0.3354, 'learning_rate': 4.0430000000000004e-05, 'epoch': 2.51} 19%|█▉ | 1918/10000 [7:29:05<30:58:01, 13.79s/it] 19%|█▉ | 1919/10000 [7:29:18<30:56:07, 13.78s/it] {'loss': 0.2878, 'learning_rate': 4.0425e-05, 'epoch': 2.51} 19%|█▉ | 1919/10000 [7:29:18<30:56:07, 13.78s/it] 19%|█▉ | 1920/10000 [7:29:32<30:58:30, 13.80s/it] {'loss': 0.4211, 'learning_rate': 4.042e-05, 'epoch': 2.51} 19%|█▉ | 1920/10000 [7:29:32<30:58:30, 13.80s/it] 19%|█▉ | 1921/10000 [7:29:46<31:03:20, 13.84s/it] {'loss': 0.3111, 'learning_rate': 4.0415000000000005e-05, 'epoch': 2.51} 19%|█▉ | 1921/10000 [7:29:46<31:03:20, 13.84s/it] 19%|█▉ | 1922/10000 [7:30:00<31:06:53, 13.87s/it] {'loss': 0.4217, 'learning_rate': 4.041e-05, 'epoch': 2.52} 19%|█▉ | 1922/10000 [7:30:00<31:06:53, 13.87s/it] 19%|█▉ | 1923/10000 [7:30:14<31:10:35, 13.90s/it] {'loss': 0.3267, 'learning_rate': 4.0405000000000004e-05, 'epoch': 2.52} 19%|█▉ | 1923/10000 [7:30:14<31:10:35, 13.90s/it] 19%|█▉ | 1924/10000 [7:30:28<31:12:18, 13.91s/it] {'loss': 0.3275, 'learning_rate': 4.0400000000000006e-05, 'epoch': 2.52} 19%|█▉ | 1924/10000 [7:30:28<31:12:18, 13.91s/it] 19%|█▉ | 1925/10000 [7:30:42<31:13:07, 13.92s/it] {'loss': 0.3608, 'learning_rate': 4.0395e-05, 'epoch': 2.52} 19%|█▉ | 1925/10000 [7:30:42<31:13:07, 13.92s/it] 19%|█▉ | 1926/10000 [7:30:56<31:11:40, 13.91s/it] 
{'loss': 0.3314, 'learning_rate': 4.039e-05, 'epoch': 2.52} 19%|█▉ | 1926/10000 [7:30:56<31:11:40, 13.91s/it] 19%|█▉ | 1927/10000 [7:31:10<31:08:21, 13.89s/it] {'loss': 0.4196, 'learning_rate': 4.0385e-05, 'epoch': 2.52} 19%|█▉ | 1927/10000 [7:31:10<31:08:21, 13.89s/it] 19%|█▉ | 1928/10000 [7:31:23<31:05:50, 13.87s/it] {'loss': 0.3102, 'learning_rate': 4.038e-05, 'epoch': 2.52} 19%|█▉ | 1928/10000 [7:31:23<31:05:50, 13.87s/it] 19%|█▉ | 1929/10000 [7:31:37<31:06:37, 13.88s/it] {'loss': 0.3281, 'learning_rate': 4.0375e-05, 'epoch': 2.52} 19%|█▉ | 1929/10000 [7:31:37<31:06:37, 13.88s/it] 19%|█▉ | 1930/10000 [7:31:51<31:04:57, 13.87s/it] {'loss': 0.364, 'learning_rate': 4.037e-05, 'epoch': 2.53} 19%|█▉ | 1930/10000 [7:31:51<31:04:57, 13.87s/it] 19%|█▉ | 1931/10000 [7:32:05<31:03:37, 13.86s/it] {'loss': 0.3542, 'learning_rate': 4.0365000000000004e-05, 'epoch': 2.53} 19%|█▉ | 1931/10000 [7:32:05<31:03:37, 13.86s/it] 19%|█▉ | 1932/10000 [7:32:19<31:06:06, 13.88s/it] {'loss': 0.3358, 'learning_rate': 4.0360000000000007e-05, 'epoch': 2.53} 19%|█▉ | 1932/10000 [7:32:19<31:06:06, 13.88s/it] 19%|█▉ | 1933/10000 [7:32:33<31:02:15, 13.85s/it] {'loss': 0.3069, 'learning_rate': 4.0355e-05, 'epoch': 2.53} 19%|█▉ | 1933/10000 [7:32:33<31:02:15, 13.85s/it] 19%|█▉ | 1934/10000 [7:32:47<31:04:42, 13.87s/it] {'loss': 0.3884, 'learning_rate': 4.0350000000000005e-05, 'epoch': 2.53} 19%|█▉ | 1934/10000 [7:32:47<31:04:42, 13.87s/it] 19%|█▉ | 1935/10000 [7:33:00<31:01:36, 13.85s/it] {'loss': 0.3582, 'learning_rate': 4.0345e-05, 'epoch': 2.53} 19%|█▉ | 1935/10000 [7:33:00<31:01:36, 13.85s/it] 19%|█▉ | 1936/10000 [7:33:14<31:01:57, 13.85s/it] {'loss': 0.355, 'learning_rate': 4.034e-05, 'epoch': 2.53} 19%|█▉ | 1936/10000 [7:33:14<31:01:57, 13.85s/it] 19%|█▉ | 1937/10000 [7:33:28<31:01:06, 13.85s/it] {'loss': 0.3981, 'learning_rate': 4.0335e-05, 'epoch': 2.54} 19%|█▉ | 1937/10000 [7:33:28<31:01:06, 13.85s/it] 19%|█▉ | 1938/10000 [7:33:42<30:59:04, 13.84s/it] {'loss': 0.3513, 'learning_rate': 
4.033e-05, 'epoch': 2.54} 19%|█▉ | 1938/10000 [7:33:42<30:59:04, 13.84s/it] 19%|█▉ | 1939/10000 [7:33:56<30:59:28, 13.84s/it] {'loss': 0.3902, 'learning_rate': 4.0325000000000004e-05, 'epoch': 2.54} 19%|█▉ | 1939/10000 [7:33:56<30:59:28, 13.84s/it] 19%|█▉ | 1940/10000 [7:34:10<31:02:55, 13.87s/it] {'loss': 0.3943, 'learning_rate': 4.032e-05, 'epoch': 2.54} 19%|█▉ | 1940/10000 [7:34:10<31:02:55, 13.87s/it] 19%|█▉ | 1941/10000 [7:34:24<31:03:20, 13.87s/it] {'loss': 0.354, 'learning_rate': 4.0315e-05, 'epoch': 2.54} 19%|█▉ | 1941/10000 [7:34:24<31:03:20, 13.87s/it] 19%|█▉ | 1942/10000 [7:34:37<31:00:49, 13.86s/it] {'loss': 0.3658, 'learning_rate': 4.0310000000000005e-05, 'epoch': 2.54} 19%|█▉ | 1942/10000 [7:34:37<31:00:49, 13.86s/it] 19%|█▉ | 1943/10000 [7:34:51<31:06:53, 13.90s/it] {'loss': 0.3507, 'learning_rate': 4.0305e-05, 'epoch': 2.54} 19%|█▉ | 1943/10000 [7:34:51<31:06:53, 13.90s/it] 19%|█▉ | 1944/10000 [7:35:05<31:05:27, 13.89s/it] {'loss': 0.3172, 'learning_rate': 4.0300000000000004e-05, 'epoch': 2.54} 19%|█▉ | 1944/10000 [7:35:05<31:05:27, 13.89s/it] 19%|█▉ | 1945/10000 [7:35:19<31:07:03, 13.91s/it] {'loss': 0.3145, 'learning_rate': 4.0295e-05, 'epoch': 2.55} 19%|█▉ | 1945/10000 [7:35:19<31:07:03, 13.91s/it] 19%|█▉ | 1946/10000 [7:35:33<31:01:24, 13.87s/it] {'loss': 0.3091, 'learning_rate': 4.029e-05, 'epoch': 2.55} 19%|█▉ | 1946/10000 [7:35:33<31:01:24, 13.87s/it] 19%|█▉ | 1947/10000 [7:35:47<31:01:39, 13.87s/it] {'loss': 0.3479, 'learning_rate': 4.0285e-05, 'epoch': 2.55} 19%|█▉ | 1947/10000 [7:35:47<31:01:39, 13.87s/it] 19%|█▉ | 1948/10000 [7:36:01<31:01:40, 13.87s/it] {'loss': 0.3559, 'learning_rate': 4.028e-05, 'epoch': 2.55} 19%|█▉ | 1948/10000 [7:36:01<31:01:40, 13.87s/it] 19%|█▉ | 1949/10000 [7:36:15<30:59:50, 13.86s/it] {'loss': 0.3871, 'learning_rate': 4.0275e-05, 'epoch': 2.55} 19%|█▉ | 1949/10000 [7:36:15<30:59:50, 13.86s/it] 20%|█▉ | 1950/10000 [7:36:28<30:57:09, 13.84s/it] {'loss': 0.3534, 'learning_rate': 4.027e-05, 'epoch': 2.55} 20%|█▉ | 
1950/10000 [7:36:28<30:57:09, 13.84s/it] 20%|█▉ | 1951/10000 [7:36:42<30:57:46, 13.85s/it] {'loss': 0.3505, 'learning_rate': 4.0265e-05, 'epoch': 2.55} 20%|█▉ | 1951/10000 [7:36:42<30:57:46, 13.85s/it] 20%|█▉ | 1952/10000 [7:36:56<31:02:15, 13.88s/it] {'loss': 0.3875, 'learning_rate': 4.0260000000000004e-05, 'epoch': 2.55} 20%|█▉ | 1952/10000 [7:36:56<31:02:15, 13.88s/it] 20%|█▉ | 1953/10000 [7:37:10<30:57:01, 13.85s/it] {'loss': 0.3907, 'learning_rate': 4.025500000000001e-05, 'epoch': 2.56} 20%|█▉ | 1953/10000 [7:37:10<30:57:01, 13.85s/it] 20%|█▉ | 1954/10000 [7:37:24<30:57:09, 13.85s/it] {'loss': 0.2574, 'learning_rate': 4.025e-05, 'epoch': 2.56} 20%|█▉ | 1954/10000 [7:37:24<30:57:09, 13.85s/it] 20%|█▉ | 1955/10000 [7:37:38<30:56:53, 13.85s/it] {'loss': 0.3622, 'learning_rate': 4.0245e-05, 'epoch': 2.56} 20%|█▉ | 1955/10000 [7:37:38<30:56:53, 13.85s/it] 20%|█▉ | 1956/10000 [7:37:52<30:55:38, 13.84s/it] {'loss': 0.3146, 'learning_rate': 4.024e-05, 'epoch': 2.56} 20%|█▉ | 1956/10000 [7:37:52<30:55:38, 13.84s/it] 20%|█▉ | 1957/10000 [7:38:05<30:51:47, 13.81s/it] {'loss': 0.3393, 'learning_rate': 4.0235000000000004e-05, 'epoch': 2.56} 20%|█▉ | 1957/10000 [7:38:05<30:51:47, 13.81s/it] 20%|█▉ | 1958/10000 [7:38:19<30:53:36, 13.83s/it] {'loss': 0.3141, 'learning_rate': 4.023e-05, 'epoch': 2.56} 20%|█▉ | 1958/10000 [7:38:19<30:53:36, 13.83s/it] 20%|█▉ | 1959/10000 [7:38:33<30:49:42, 13.80s/it] {'loss': 0.3921, 'learning_rate': 4.0225e-05, 'epoch': 2.56} 20%|█▉ | 1959/10000 [7:38:33<30:49:42, 13.80s/it] 20%|█▉ | 1960/10000 [7:38:47<30:54:08, 13.84s/it] {'loss': 0.3441, 'learning_rate': 4.0220000000000005e-05, 'epoch': 2.57} 20%|█▉ | 1960/10000 [7:38:47<30:54:08, 13.84s/it] 20%|█▉ | 1961/10000 [7:39:01<30:52:40, 13.83s/it] {'loss': 0.3335, 'learning_rate': 4.0215e-05, 'epoch': 2.57} 20%|█▉ | 1961/10000 [7:39:01<30:52:40, 13.83s/it] 20%|█▉ | 1962/10000 [7:39:14<30:49:08, 13.80s/it] {'loss': 0.4799, 'learning_rate': 4.021e-05, 'epoch': 2.57} 20%|█▉ | 1962/10000 
[7:39:14<30:49:08, 13.80s/it] 20%|█▉ | 1963/10000 [7:39:28<30:49:07, 13.80s/it] {'loss': 0.294, 'learning_rate': 4.0205000000000006e-05, 'epoch': 2.57} 20%|█▉ | 1963/10000 [7:39:28<30:49:07, 13.80s/it] 20%|█▉ | 1964/10000 [7:39:42<30:52:46, 13.83s/it] {'loss': 0.3542, 'learning_rate': 4.02e-05, 'epoch': 2.57} 20%|█▉ | 1964/10000 [7:39:42<30:52:46, 13.83s/it] 20%|█▉ | 1965/10000 [7:39:56<30:53:15, 13.84s/it] {'loss': 0.3136, 'learning_rate': 4.0195e-05, 'epoch': 2.57} 20%|█▉ | 1965/10000 [7:39:56<30:53:15, 13.84s/it] 20%|█▉ | 1966/10000 [7:40:10<30:50:49, 13.82s/it] {'loss': 0.277, 'learning_rate': 4.019e-05, 'epoch': 2.57} 20%|█▉ | 1966/10000 [7:40:10<30:50:49, 13.82s/it] 20%|█▉ | 1967/10000 [7:40:24<30:49:39, 13.82s/it] {'loss': 0.3058, 'learning_rate': 4.0185e-05, 'epoch': 2.57} 20%|█▉ | 1967/10000 [7:40:24<30:49:39, 13.82s/it] 20%|█▉ | 1968/10000 [7:40:37<30:51:05, 13.83s/it] {'loss': 0.2766, 'learning_rate': 4.018e-05, 'epoch': 2.58} 20%|█▉ | 1968/10000 [7:40:37<30:51:05, 13.83s/it] 20%|█▉ | 1969/10000 [7:40:51<30:47:48, 13.81s/it] {'loss': 0.3284, 'learning_rate': 4.0175e-05, 'epoch': 2.58} 20%|█▉ | 1969/10000 [7:40:51<30:47:48, 13.81s/it] 20%|█▉ | 1970/10000 [7:41:05<30:45:35, 13.79s/it] {'loss': 0.3695, 'learning_rate': 4.017e-05, 'epoch': 2.58} 20%|█▉ | 1970/10000 [7:41:05<30:45:35, 13.79s/it] 20%|█▉ | 1971/10000 [7:41:19<30:51:41, 13.84s/it] {'loss': 0.4057, 'learning_rate': 4.0165000000000006e-05, 'epoch': 2.58} 20%|█▉ | 1971/10000 [7:41:19<30:51:41, 13.84s/it] 20%|█▉ | 1972/10000 [7:41:33<30:52:23, 13.84s/it] {'loss': 0.3651, 'learning_rate': 4.016e-05, 'epoch': 2.58} 20%|█▉ | 1972/10000 [7:41:33<30:52:23, 13.84s/it] 20%|█▉ | 1973/10000 [7:41:47<30:58:03, 13.89s/it] {'loss': 0.3805, 'learning_rate': 4.0155000000000004e-05, 'epoch': 2.58} 20%|█▉ | 1973/10000 [7:41:47<30:58:03, 13.89s/it] 20%|█▉ | 1974/10000 [7:42:01<30:56:40, 13.88s/it] {'loss': 0.3536, 'learning_rate': 4.015000000000001e-05, 'epoch': 2.58} 20%|█▉ | 1974/10000 [7:42:01<30:56:40, 
13.88s/it] 20%|█▉ | 1975/10000 [7:42:14<30:54:23, 13.86s/it] {'loss': 0.3362, 'learning_rate': 4.0144999999999996e-05, 'epoch': 2.59} 20%|█▉ | 1975/10000 [7:42:14<30:54:23, 13.86s/it] 20%|█▉ | 1976/10000 [7:42:28<31:00:24, 13.91s/it] {'loss': 0.3047, 'learning_rate': 4.014e-05, 'epoch': 2.59} 20%|█▉ | 1976/10000 [7:42:28<31:00:24, 13.91s/it] 20%|█▉ | 1977/10000 [7:42:42<31:00:56, 13.92s/it] {'loss': 0.3169, 'learning_rate': 4.0135e-05, 'epoch': 2.59} 20%|█▉ | 1977/10000 [7:42:42<31:00:56, 13.92s/it] 20%|█▉ | 1978/10000 [7:42:56<31:01:08, 13.92s/it] {'loss': 0.352, 'learning_rate': 4.0130000000000004e-05, 'epoch': 2.59} 20%|█▉ | 1978/10000 [7:42:56<31:01:08, 13.92s/it] 20%|█▉ | 1979/10000 [7:43:10<31:03:07, 13.94s/it] {'loss': 0.3912, 'learning_rate': 4.0125e-05, 'epoch': 2.59} 20%|█▉ | 1979/10000 [7:43:10<31:03:07, 13.94s/it] 20%|█▉ | 1980/10000 [7:43:24<31:00:10, 13.92s/it] {'loss': 0.4038, 'learning_rate': 4.012e-05, 'epoch': 2.59} 20%|█▉ | 1980/10000 [7:43:24<31:00:10, 13.92s/it] 20%|█▉ | 1981/10000 [7:43:38<31:00:04, 13.92s/it] {'loss': 0.4145, 'learning_rate': 4.0115000000000005e-05, 'epoch': 2.59} 20%|█▉ | 1981/10000 [7:43:38<31:00:04, 13.92s/it] 20%|█▉ | 1982/10000 [7:43:52<30:58:26, 13.91s/it] {'loss': 0.4171, 'learning_rate': 4.011e-05, 'epoch': 2.59} 20%|█▉ | 1982/10000 [7:43:52<30:58:26, 13.91s/it] 20%|█▉ | 1983/10000 [7:44:06<30:57:41, 13.90s/it] {'loss': 0.3067, 'learning_rate': 4.0105e-05, 'epoch': 2.6} 20%|█▉ | 1983/10000 [7:44:06<30:57:41, 13.90s/it] 20%|█▉ | 1984/10000 [7:44:20<30:56:21, 13.89s/it] {'loss': 0.405, 'learning_rate': 4.0100000000000006e-05, 'epoch': 2.6} 20%|█▉ | 1984/10000 [7:44:20<30:56:21, 13.89s/it] 20%|█▉ | 1985/10000 [7:44:34<30:56:48, 13.90s/it] {'loss': 0.3285, 'learning_rate': 4.0095e-05, 'epoch': 2.6} 20%|█▉ | 1985/10000 [7:44:34<30:56:48, 13.90s/it] 20%|█▉ | 1986/10000 [7:44:47<30:54:58, 13.89s/it] {'loss': 0.3333, 'learning_rate': 4.009e-05, 'epoch': 2.6} 20%|█▉ | 1986/10000 [7:44:47<30:54:58, 13.89s/it] 20%|█▉ | 
1987/10000 [7:45:01<30:48:38, 13.84s/it] {'loss': 0.2913, 'learning_rate': 4.0085e-05, 'epoch': 2.6} 20%|█▉ | 1987/10000 [7:45:01<30:48:38, 13.84s/it] 20%|█▉ | 1988/10000 [7:45:15<30:48:10, 13.84s/it] {'loss': 0.3593, 'learning_rate': 4.008e-05, 'epoch': 2.6} 20%|█▉ | 1988/10000 [7:45:15<30:48:10, 13.84s/it] 20%|█▉ | 1989/10000 [7:45:29<30:47:35, 13.84s/it] {'loss': 0.3486, 'learning_rate': 4.0075e-05, 'epoch': 2.6} 20%|█▉ | 1989/10000 [7:45:29<30:47:35, 13.84s/it] 20%|█▉ | 1990/10000 [7:45:43<30:57:48, 13.92s/it] {'loss': 0.3708, 'learning_rate': 4.007e-05, 'epoch': 2.6} 20%|█▉ | 1990/10000 [7:45:43<30:57:48, 13.92s/it] 20%|█▉ | 1991/10000 [7:45:57<30:55:11, 13.90s/it] {'loss': 0.3751, 'learning_rate': 4.0065000000000003e-05, 'epoch': 2.61} 20%|█▉ | 1991/10000 [7:45:57<30:55:11, 13.90s/it] 20%|█▉ | 1992/10000 [7:46:11<30:58:30, 13.92s/it] {'loss': 0.3515, 'learning_rate': 4.0060000000000006e-05, 'epoch': 2.61} 20%|█▉ | 1992/10000 [7:46:11<30:58:30, 13.92s/it] 20%|█▉ | 1993/10000 [7:46:25<30:55:44, 13.91s/it] {'loss': 0.4082, 'learning_rate': 4.0055e-05, 'epoch': 2.61} 20%|█▉ | 1993/10000 [7:46:25<30:55:44, 13.91s/it] 20%|█▉ | 1994/10000 [7:46:38<30:53:59, 13.89s/it] {'loss': 0.3594, 'learning_rate': 4.0050000000000004e-05, 'epoch': 2.61} 20%|█▉ | 1994/10000 [7:46:39<30:53:59, 13.89s/it] 20%|█▉ | 1995/10000 [7:46:52<30:51:34, 13.88s/it] {'loss': 0.3905, 'learning_rate': 4.0045e-05, 'epoch': 2.61} 20%|█▉ | 1995/10000 [7:46:52<30:51:34, 13.88s/it] 20%|█▉ | 1996/10000 [7:47:06<30:45:37, 13.84s/it] {'loss': 0.3814, 'learning_rate': 4.004e-05, 'epoch': 2.61} 20%|█▉ | 1996/10000 [7:47:06<30:45:37, 13.84s/it] 20%|█▉ | 1997/10000 [7:47:20<30:45:49, 13.84s/it] {'loss': 0.3407, 'learning_rate': 4.0035e-05, 'epoch': 2.61} 20%|█▉ | 1997/10000 [7:47:20<30:45:49, 13.84s/it] 20%|█▉ | 1998/10000 [7:47:34<30:46:33, 13.85s/it] {'loss': 0.2639, 'learning_rate': 4.003e-05, 'epoch': 2.62} 20%|█▉ | 1998/10000 [7:47:34<30:46:33, 13.85s/it] 20%|█▉ | 1999/10000 [7:47:48<30:45:55, 
13.84s/it] {'loss': 0.392, 'learning_rate': 4.0025000000000004e-05, 'epoch': 2.62} 20%|█▉ | 1999/10000 [7:47:48<30:45:55, 13.84s/it] 20%|██ | 2000/10000 [7:48:02<30:48:28, 13.86s/it] {'loss': 0.3274, 'learning_rate': 4.002e-05, 'epoch': 2.62} 20%|██ | 2000/10000 [7:48:02<30:48:28, 13.86s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-04 04:06:09,820 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-2000/config.json
[INFO|configuration_utils.py:364] 2024-11-04 04:06:09,822 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-2000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-04 04:06:58,526 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-2000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-04 04:06:58,529 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-04 04:06:58,530 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-2000/special_tokens_map.json
[2024-11-04 04:06:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is about to be saved!
[2024-11-04 04:06:58,567] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt
[2024-11-04 04:06:58,567] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt...
[2024-11-04 04:07:48,267] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/mp_rank_00_model_states.pt.
[2024-11-04 04:07:48,408] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-04 04:08:29,046] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-04 04:08:30,550] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-04 04:08:30,550] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step2000 is ready now!
20%|██ | 2001/10000 [7:51:36<164:26:48, 74.01s/it] {'loss': 0.3518, 'learning_rate': 4.0015e-05, 'epoch': 2.62} 20%|██ | 2001/10000 [7:51:36<164:26:48, 74.01s/it] 20%|██ | 2002/10000 [7:51:50<124:17:06, 55.94s/it] {'loss': 0.3017, 'learning_rate': 4.0010000000000005e-05, 'epoch': 2.62} 20%|██ | 2002/10000 [7:51:50<124:17:06, 55.94s/it] 20%|██ | 2003/10000 [7:52:03<96:10:57, 43.30s/it] {'loss': 0.3142, 'learning_rate': 4.0005e-05, 'epoch': 2.62} 20%|██ | 2003/10000 [7:52:03<96:10:57, 43.30s/it] 20%|██ | 2004/10000 [7:52:17<76:30:58, 34.45s/it] {'loss': 0.3865, 'learning_rate': 4e-05, 'epoch': 2.62} 20%|██ | 2004/10000 [7:52:17<76:30:58, 34.45s/it] 20%|██ | 2005/10000 [7:52:31<62:49:57, 28.29s/it] {'loss': 0.3674, 'learning_rate': 3.9995000000000006e-05, 'epoch': 2.62} 20%|██ | 2005/10000 [7:52:31<62:49:57, 28.29s/it] 20%|██ | 2006/10000 [7:52:45<53:19:43, 24.02s/it] {'loss': 0.3394, 'learning_rate': 3.999e-05, 'epoch': 2.63} 20%|██ | 2006/10000 [7:52:45<53:19:43, 24.02s/it] 20%|██ | 2007/10000 [7:52:59<46:32:05, 20.96s/it] {'loss': 0.3544, 'learning_rate': 3.9985e-05, 'epoch': 2.63} 20%|██ | 2007/10000 [7:52:59<46:32:05, 20.96s/it] 20%|██ | 2008/10000 [7:53:13<41:47:36, 18.83s/it] {'loss': 0.3147, 'learning_rate': 3.998e-05, 'epoch': 2.63} 20%|██ | 2008/10000 [7:53:13<41:47:36, 18.83s/it] 20%|██ | 2009/10000 [7:53:27<38:29:10, 17.34s/it] {'loss': 0.3867, 'learning_rate': 3.9975e-05, 'epoch': 2.63} 20%|██ | 2009/10000 [7:53:27<38:29:10, 17.34s/it] 20%|██ |
2010/10000 [7:53:41<36:13:24, 16.32s/it] {'loss': 0.3062, 'learning_rate': 3.9970000000000005e-05, 'epoch': 2.63} 20%|██ | 2010/10000 [7:53:41<36:13:24, 16.32s/it] 20%|██ | 2011/10000 [7:53:55<34:37:57, 15.61s/it] {'loss': 0.3969, 'learning_rate': 3.9965e-05, 'epoch': 2.63} 20%|██ | 2011/10000 [7:53:55<34:37:57, 15.61s/it] 20%|██ | 2012/10000 [7:54:09<33:28:54, 15.09s/it] {'loss': 0.3453, 'learning_rate': 3.9960000000000004e-05, 'epoch': 2.63} 20%|██ | 2012/10000 [7:54:09<33:28:54, 15.09s/it] 20%|██ | 2013/10000 [7:54:22<32:40:52, 14.73s/it] {'loss': 0.3114, 'learning_rate': 3.9955000000000006e-05, 'epoch': 2.63} 20%|██ | 2013/10000 [7:54:22<32:40:52, 14.73s/it] 20%|██ | 2014/10000 [7:54:36<32:07:20, 14.48s/it] {'loss': 0.3963, 'learning_rate': 3.995e-05, 'epoch': 2.64} 20%|██ | 2014/10000 [7:54:36<32:07:20, 14.48s/it] 20%|██ | 2015/10000 [7:54:50<31:41:34, 14.29s/it] {'loss': 0.3387, 'learning_rate': 3.9945000000000005e-05, 'epoch': 2.64} 20%|██ | 2015/10000 [7:54:50<31:41:34, 14.29s/it] 20%|██ | 2016/10000 [7:55:04<31:29:12, 14.20s/it] {'loss': 0.4589, 'learning_rate': 3.994e-05, 'epoch': 2.64} 20%|██ | 2016/10000 [7:55:04<31:29:12, 14.20s/it] 20%|██ | 2017/10000 [7:55:18<31:20:34, 14.13s/it] {'loss': 0.4004, 'learning_rate': 3.9935e-05, 'epoch': 2.64} 20%|██ | 2017/10000 [7:55:18<31:20:34, 14.13s/it] 20%|██ | 2018/10000 [7:55:32<31:13:15, 14.08s/it] {'loss': 0.3234, 'learning_rate': 3.993e-05, 'epoch': 2.64} 20%|██ | 2018/10000 [7:55:32<31:13:15, 14.08s/it] 20%|██ | 2019/10000 [7:55:46<31:04:33, 14.02s/it] {'loss': 0.3351, 'learning_rate': 3.9925e-05, 'epoch': 2.64} 20%|██ | 2019/10000 [7:55:46<31:04:33, 14.02s/it] 20%|██ | 2020/10000 [7:56:00<30:58:47, 13.98s/it] {'loss': 0.3222, 'learning_rate': 3.9920000000000004e-05, 'epoch': 2.64} 20%|██ | 2020/10000 [7:56:00<30:58:47, 13.98s/it] 20%|██ | 2021/10000 [7:56:14<30:58:15, 13.97s/it] {'loss': 0.3768, 'learning_rate': 3.9915e-05, 'epoch': 2.65} 20%|██ | 2021/10000 [7:56:14<30:58:15, 13.97s/it] 20%|██ | 2022/10000 
[7:56:28<30:54:55, 13.95s/it] {'loss': 0.4757, 'learning_rate': 3.991e-05, 'epoch': 2.65} 20%|██ | 2022/10000 [7:56:28<30:54:55, 13.95s/it] 20%|██ | 2023/10000 [7:56:41<30:47:34, 13.90s/it] {'loss': 0.3501, 'learning_rate': 3.9905000000000005e-05, 'epoch': 2.65} 20%|██ | 2023/10000 [7:56:42<30:47:34, 13.90s/it] 20%|██ | 2024/10000 [7:56:55<30:45:33, 13.88s/it] {'loss': 0.3616, 'learning_rate': 3.99e-05, 'epoch': 2.65} 20%|██ | 2024/10000 [7:56:55<30:45:33, 13.88s/it] 20%|██ | 2025/10000 [7:57:09<30:45:22, 13.88s/it] {'loss': 0.3546, 'learning_rate': 3.9895000000000003e-05, 'epoch': 2.65} 20%|██ | 2025/10000 [7:57:09<30:45:22, 13.88s/it] 20%|██ | 2026/10000 [7:57:23<30:48:57, 13.91s/it] {'loss': 0.3603, 'learning_rate': 3.989e-05, 'epoch': 2.65} 20%|██ | 2026/10000 [7:57:23<30:48:57, 13.91s/it] 20%|██ | 2027/10000 [7:57:37<30:50:46, 13.93s/it] {'loss': 0.5043, 'learning_rate': 3.9885e-05, 'epoch': 2.65} 20%|██ | 2027/10000 [7:57:37<30:50:46, 13.93s/it] 20%|██ | 2028/10000 [7:57:51<30:50:24, 13.93s/it] {'loss': 0.3673, 'learning_rate': 3.988e-05, 'epoch': 2.65} 20%|██ | 2028/10000 [7:57:51<30:50:24, 13.93s/it] 20%|██ | 2029/10000 [7:58:05<30:54:23, 13.96s/it] {'loss': 0.3671, 'learning_rate': 3.9875e-05, 'epoch': 2.66} 20%|██ | 2029/10000 [7:58:05<30:54:23, 13.96s/it] 20%|██ | 2030/10000 [7:58:19<30:53:04, 13.95s/it] {'loss': 0.3281, 'learning_rate': 3.987e-05, 'epoch': 2.66} 20%|██ | 2030/10000 [7:58:19<30:53:04, 13.95s/it] 20%|██ | 2031/10000 [7:58:33<30:50:29, 13.93s/it] {'loss': 0.3363, 'learning_rate': 3.9865000000000005e-05, 'epoch': 2.66} 20%|██ | 2031/10000 [7:58:33<30:50:29, 13.93s/it] 20%|██ | 2032/10000 [7:58:47<30:44:59, 13.89s/it] {'loss': 0.3211, 'learning_rate': 3.986e-05, 'epoch': 2.66} 20%|██ | 2032/10000 [7:58:47<30:44:59, 13.89s/it] 20%|██ | 2033/10000 [7:59:01<30:44:43, 13.89s/it] {'loss': 0.3283, 'learning_rate': 3.9855000000000004e-05, 'epoch': 2.66} 20%|██ | 2033/10000 [7:59:01<30:44:43, 13.89s/it] 20%|██ | 2034/10000 [7:59:15<30:44:59, 
13.90s/it] {'loss': 0.3677, 'learning_rate': 3.9850000000000006e-05, 'epoch': 2.66} 20%|██ | 2034/10000 [7:59:15<30:44:59, 13.90s/it] 20%|██ | 2035/10000 [7:59:29<30:48:20, 13.92s/it] {'loss': 0.3823, 'learning_rate': 3.9845e-05, 'epoch': 2.66} 20%|██ | 2035/10000 [7:59:29<30:48:20, 13.92s/it] 20%|██ | 2036/10000 [7:59:42<30:47:42, 13.92s/it] {'loss': 0.3694, 'learning_rate': 3.984e-05, 'epoch': 2.66} 20%|██ | 2036/10000 [7:59:42<30:47:42, 13.92s/it] 20%|██ | 2037/10000 [7:59:56<30:49:09, 13.93s/it] {'loss': 0.3303, 'learning_rate': 3.9835e-05, 'epoch': 2.67} 20%|██ | 2037/10000 [7:59:56<30:49:09, 13.93s/it] 20%|██ | 2038/10000 [8:00:10<30:51:55, 13.96s/it] {'loss': 0.3785, 'learning_rate': 3.983e-05, 'epoch': 2.67} 20%|██ | 2038/10000 [8:00:10<30:51:55, 13.96s/it] 20%|██ | 2039/10000 [8:00:24<30:48:01, 13.93s/it] {'loss': 0.3727, 'learning_rate': 3.9825e-05, 'epoch': 2.67} 20%|██ | 2039/10000 [8:00:24<30:48:01, 13.93s/it] 20%|██ | 2040/10000 [8:00:38<30:44:36, 13.90s/it] {'loss': 0.3473, 'learning_rate': 3.982e-05, 'epoch': 2.67} 20%|██ | 2040/10000 [8:00:38<30:44:36, 13.90s/it] 20%|██ | 2041/10000 [8:00:52<30:46:55, 13.92s/it] {'loss': 0.3221, 'learning_rate': 3.9815000000000004e-05, 'epoch': 2.67} 20%|██ | 2041/10000 [8:00:52<30:46:55, 13.92s/it] 20%|██ | 2042/10000 [8:01:06<30:50:43, 13.95s/it] {'loss': 0.3174, 'learning_rate': 3.981e-05, 'epoch': 2.67} 20%|██ | 2042/10000 [8:01:06<30:50:43, 13.95s/it] 20%|██ | 2043/10000 [8:01:20<30:49:53, 13.95s/it] {'loss': 0.2973, 'learning_rate': 3.9805e-05, 'epoch': 2.67} 20%|██ | 2043/10000 [8:01:20<30:49:53, 13.95s/it] 20%|██ | 2044/10000 [8:01:34<30:52:45, 13.97s/it] {'loss': 0.36, 'learning_rate': 3.9800000000000005e-05, 'epoch': 2.68} 20%|██ | 2044/10000 [8:01:34<30:52:45, 13.97s/it] 20%|██ | 2045/10000 [8:01:48<30:52:54, 13.98s/it] {'loss': 0.3062, 'learning_rate': 3.979500000000001e-05, 'epoch': 2.68} 20%|██ | 2045/10000 [8:01:48<30:52:54, 13.98s/it] 20%|██ | 2046/10000 [8:02:02<30:47:54, 13.94s/it] {'loss': 
0.3585, 'learning_rate': 3.979e-05, 'epoch': 2.68} 20%|██ | 2046/10000 [8:02:02<30:47:54, 13.94s/it] 20%|██ | 2047/10000 [8:02:16<30:49:26, 13.95s/it] {'loss': 0.3475, 'learning_rate': 3.9785e-05, 'epoch': 2.68} 20%|██ | 2047/10000 [8:02:16<30:49:26, 13.95s/it] 20%|██ | 2048/10000 [8:02:30<30:42:48, 13.90s/it] {'loss': 0.3851, 'learning_rate': 3.978e-05, 'epoch': 2.68} 20%|██ | 2048/10000 [8:02:30<30:42:48, 13.90s/it] 20%|██ | 2049/10000 [8:02:44<30:45:18, 13.93s/it] {'loss': 0.4498, 'learning_rate': 3.9775e-05, 'epoch': 2.68} 20%|██ | 2049/10000 [8:02:44<30:45:18, 13.93s/it] 20%|██ | 2050/10000 [8:02:58<30:44:56, 13.92s/it] {'loss': 0.4059, 'learning_rate': 3.977e-05, 'epoch': 2.68} 20%|██ | 2050/10000 [8:02:58<30:44:56, 13.92s/it] 21%|██ | 2051/10000 [8:03:11<30:42:45, 13.91s/it] {'loss': 0.2794, 'learning_rate': 3.9765e-05, 'epoch': 2.68} 21%|██ | 2051/10000 [8:03:12<30:42:45, 13.91s/it] 21%|██ | 2052/10000 [8:03:25<30:46:41, 13.94s/it] {'loss': 0.3834, 'learning_rate': 3.9760000000000006e-05, 'epoch': 2.69} 21%|██ | 2052/10000 [8:03:26<30:46:41, 13.94s/it] 21%|██ | 2053/10000 [8:03:39<30:44:12, 13.92s/it] {'loss': 0.3558, 'learning_rate': 3.9755e-05, 'epoch': 2.69} 21%|██ | 2053/10000 [8:03:39<30:44:12, 13.92s/it] 21%|██ | 2054/10000 [8:03:53<30:47:28, 13.95s/it] {'loss': 0.4561, 'learning_rate': 3.9750000000000004e-05, 'epoch': 2.69} 21%|██ | 2054/10000 [8:03:53<30:47:28, 13.95s/it] 21%|██ | 2055/10000 [8:04:07<30:45:52, 13.94s/it] {'loss': 0.3138, 'learning_rate': 3.9745000000000007e-05, 'epoch': 2.69} 21%|██ | 2055/10000 [8:04:07<30:45:52, 13.94s/it] 21%|██ | 2056/10000 [8:04:21<30:40:56, 13.90s/it] {'loss': 0.3181, 'learning_rate': 3.974e-05, 'epoch': 2.69} 21%|██ | 2056/10000 [8:04:21<30:40:56, 13.90s/it] 21%|██ | 2057/10000 [8:04:35<30:37:48, 13.88s/it] {'loss': 0.3657, 'learning_rate': 3.9735e-05, 'epoch': 2.69} 21%|██ | 2057/10000 [8:04:35<30:37:48, 13.88s/it] 21%|██ | 2058/10000 [8:04:49<30:32:34, 13.84s/it] {'loss': 0.3915, 'learning_rate': 3.973e-05, 
'epoch': 2.69} 21%|██ | 2058/10000 [8:04:49<30:32:34, 13.84s/it] 21%|██ | 2059/10000 [8:05:03<30:32:35, 13.85s/it] {'loss': 0.2536, 'learning_rate': 3.9725e-05, 'epoch': 2.7} 21%|██ | 2059/10000 [8:05:03<30:32:35, 13.85s/it] 21%|██ | 2060/10000 [8:05:16<30:32:06, 13.84s/it] {'loss': 0.3201, 'learning_rate': 3.972e-05, 'epoch': 2.7} 21%|██ | 2060/10000 [8:05:16<30:32:06, 13.84s/it] 21%|██ | 2061/10000 [8:05:30<30:32:09, 13.85s/it] {'loss': 0.2973, 'learning_rate': 3.9715e-05, 'epoch': 2.7} 21%|██ | 2061/10000 [8:05:30<30:32:09, 13.85s/it] 21%|██ | 2062/10000 [8:05:44<30:34:45, 13.87s/it] {'loss': 0.2983, 'learning_rate': 3.9710000000000004e-05, 'epoch': 2.7} 21%|██ | 2062/10000 [8:05:44<30:34:45, 13.87s/it] 21%|██ | 2063/10000 [8:05:58<30:37:36, 13.89s/it] {'loss': 0.4328, 'learning_rate': 3.9705e-05, 'epoch': 2.7} 21%|██ | 2063/10000 [8:05:58<30:37:36, 13.89s/it] 21%|██ | 2064/10000 [8:06:12<30:33:35, 13.86s/it] {'loss': 0.3735, 'learning_rate': 3.97e-05, 'epoch': 2.7} 21%|██ | 2064/10000 [8:06:12<30:33:35, 13.86s/it] 21%|██ | 2065/10000 [8:06:26<30:41:20, 13.92s/it] {'loss': 0.3353, 'learning_rate': 3.9695000000000005e-05, 'epoch': 2.7} 21%|██ | 2065/10000 [8:06:26<30:41:20, 13.92s/it] 21%|██ | 2066/10000 [8:06:40<30:42:16, 13.93s/it] {'loss': 0.3085, 'learning_rate': 3.969e-05, 'epoch': 2.7} 21%|██ | 2066/10000 [8:06:40<30:42:16, 13.93s/it] 21%|██ | 2067/10000 [8:06:54<30:43:14, 13.94s/it] {'loss': 0.3053, 'learning_rate': 3.9685e-05, 'epoch': 2.71} 21%|██ | 2067/10000 [8:06:54<30:43:14, 13.94s/it] 21%|██ | 2068/10000 [8:07:08<30:42:14, 13.94s/it] {'loss': 0.3517, 'learning_rate': 3.968e-05, 'epoch': 2.71} 21%|██ | 2068/10000 [8:07:08<30:42:14, 13.94s/it] 21%|██ | 2069/10000 [8:07:22<30:42:39, 13.94s/it] {'loss': 0.3695, 'learning_rate': 3.9675e-05, 'epoch': 2.71} 21%|██ | 2069/10000 [8:07:22<30:42:39, 13.94s/it] 21%|██ | 2070/10000 [8:07:36<30:43:02, 13.94s/it] {'loss': 0.3424, 'learning_rate': 3.9670000000000005e-05, 'epoch': 2.71} 21%|██ | 2070/10000 
[8:07:36<30:43:02, 13.94s/it] 21%|██ | 2071/10000 [8:07:50<30:42:50, 13.95s/it] {'loss': 0.3276, 'learning_rate': 3.9665e-05, 'epoch': 2.71} 21%|██ | 2071/10000 [8:07:50<30:42:50, 13.95s/it] 21%|██ | 2072/10000 [8:08:04<30:39:21, 13.92s/it] {'loss': 0.3246, 'learning_rate': 3.966e-05, 'epoch': 2.71} 21%|██ | 2072/10000 [8:08:04<30:39:21, 13.92s/it] 21%|██ | 2073/10000 [8:08:17<30:38:26, 13.92s/it] {'loss': 0.3121, 'learning_rate': 3.9655000000000006e-05, 'epoch': 2.71} 21%|██ | 2073/10000 [8:08:17<30:38:26, 13.92s/it] 21%|██ | 2074/10000 [8:08:31<30:35:42, 13.90s/it] {'loss': 0.2926, 'learning_rate': 3.965e-05, 'epoch': 2.71} 21%|██ | 2074/10000 [8:08:31<30:35:42, 13.90s/it] 21%|██ | 2075/10000 [8:08:45<30:37:51, 13.91s/it] {'loss': 0.4189, 'learning_rate': 3.9645000000000004e-05, 'epoch': 2.72} 21%|██ | 2075/10000 [8:08:45<30:37:51, 13.91s/it] 21%|██ | 2076/10000 [8:08:59<30:39:17, 13.93s/it] {'loss': 0.3517, 'learning_rate': 3.964e-05, 'epoch': 2.72} 21%|██ | 2076/10000 [8:08:59<30:39:17, 13.93s/it] 21%|██ | 2077/10000 [8:09:13<30:39:25, 13.93s/it] {'loss': 0.3133, 'learning_rate': 3.9635e-05, 'epoch': 2.72} 21%|██ | 2077/10000 [8:09:13<30:39:25, 13.93s/it] 21%|██ | 2078/10000 [8:09:27<30:41:57, 13.95s/it] {'loss': 0.3965, 'learning_rate': 3.963e-05, 'epoch': 2.72} 21%|██ | 2078/10000 [8:09:27<30:41:57, 13.95s/it] 21%|██ | 2079/10000 [8:09:41<30:38:15, 13.92s/it] {'loss': 0.4023, 'learning_rate': 3.9625e-05, 'epoch': 2.72} 21%|██ | 2079/10000 [8:09:41<30:38:15, 13.92s/it] 21%|██ | 2080/10000 [8:09:55<30:37:41, 13.92s/it] {'loss': 0.3506, 'learning_rate': 3.9620000000000004e-05, 'epoch': 2.72} 21%|██ | 2080/10000 [8:09:55<30:37:41, 13.92s/it] 21%|██ | 2081/10000 [8:10:09<30:30:09, 13.87s/it] {'loss': 0.3768, 'learning_rate': 3.9615e-05, 'epoch': 2.72} 21%|██ | 2081/10000 [8:10:09<30:30:09, 13.87s/it] 21%|██ | 2082/10000 [8:10:23<30:32:49, 13.89s/it] {'loss': 0.3514, 'learning_rate': 3.961e-05, 'epoch': 2.73} 21%|██ | 2082/10000 [8:10:23<30:32:49, 13.89s/it] 21%|██ 
| 2083/10000 [8:10:36<30:29:59, 13.87s/it] {'loss': 0.3087, 'learning_rate': 3.9605000000000005e-05, 'epoch': 2.73} 21%|██ | 2083/10000 [8:10:36<30:29:59, 13.87s/it] 21%|██ | 2084/10000 [8:10:50<30:31:51, 13.88s/it] {'loss': 0.3844, 'learning_rate': 3.960000000000001e-05, 'epoch': 2.73} 21%|██ | 2084/10000 [8:10:50<30:31:51, 13.88s/it] 21%|██ | 2085/10000 [8:11:04<30:29:09, 13.87s/it] {'loss': 0.4455, 'learning_rate': 3.9595e-05, 'epoch': 2.73} 21%|██ | 2085/10000 [8:11:04<30:29:09, 13.87s/it] 21%|██ | 2086/10000 [8:11:18<30:29:08, 13.87s/it] {'loss': 0.39, 'learning_rate': 3.959e-05, 'epoch': 2.73} 21%|██ | 2086/10000 [8:11:18<30:29:08, 13.87s/it] 21%|██ | 2087/10000 [8:11:32<30:35:36, 13.92s/it] {'loss': 0.3586, 'learning_rate': 3.9585e-05, 'epoch': 2.73} 21%|██ | 2087/10000 [8:11:32<30:35:36, 13.92s/it] 21%|██ | 2088/10000 [8:11:46<30:37:28, 13.93s/it] {'loss': 0.3862, 'learning_rate': 3.958e-05, 'epoch': 2.73} 21%|██ | 2088/10000 [8:11:46<30:37:28, 13.93s/it] 21%|██ | 2089/10000 [8:12:00<30:34:34, 13.91s/it] {'loss': 0.3552, 'learning_rate': 3.9575e-05, 'epoch': 2.73} 21%|██ | 2089/10000 [8:12:00<30:34:34, 13.91s/it] 21%|██ | 2090/10000 [8:12:14<30:32:15, 13.90s/it] {'loss': 0.3988, 'learning_rate': 3.957e-05, 'epoch': 2.74} 21%|██ | 2090/10000 [8:12:14<30:32:15, 13.90s/it] 21%|██ | 2091/10000 [8:12:28<30:33:50, 13.91s/it] {'loss': 0.4096, 'learning_rate': 3.9565000000000005e-05, 'epoch': 2.74} 21%|██ | 2091/10000 [8:12:28<30:33:50, 13.91s/it] 21%|██ | 2092/10000 [8:12:42<30:33:30, 13.91s/it] {'loss': 0.3731, 'learning_rate': 3.956e-05, 'epoch': 2.74} 21%|██ | 2092/10000 [8:12:42<30:33:30, 13.91s/it] 21%|██ | 2093/10000 [8:12:55<30:30:53, 13.89s/it] {'loss': 0.3867, 'learning_rate': 3.9555e-05, 'epoch': 2.74} 21%|██ | 2093/10000 [8:12:55<30:30:53, 13.89s/it] 21%|██ | 2094/10000 [8:13:09<30:33:26, 13.91s/it] {'loss': 0.3715, 'learning_rate': 3.9550000000000006e-05, 'epoch': 2.74} 21%|██ | 2094/10000 [8:13:09<30:33:26, 13.91s/it] 21%|██ | 2095/10000 
[8:13:23<30:31:53, 13.90s/it] {'loss': 0.315, 'learning_rate': 3.9545e-05, 'epoch': 2.74} 21%|██ | 2095/10000 [8:13:23<30:31:53, 13.90s/it] 21%|██ | 2096/10000 [8:13:37<30:31:44, 13.90s/it] {'loss': 0.3273, 'learning_rate': 3.954e-05, 'epoch': 2.74} 21%|██ | 2096/10000 [8:13:37<30:31:44, 13.90s/it] 21%|██ | 2097/10000 [8:13:51<30:32:16, 13.91s/it] {'loss': 0.3192, 'learning_rate': 3.9535e-05, 'epoch': 2.74} 21%|██ | 2097/10000 [8:13:51<30:32:16, 13.91s/it] 21%|██ | 2098/10000 [8:14:05<30:27:58, 13.88s/it] {'loss': 0.2966, 'learning_rate': 3.953e-05, 'epoch': 2.75} 21%|██ | 2098/10000 [8:14:05<30:27:58, 13.88s/it] 21%|██ | 2099/10000 [8:14:19<30:29:13, 13.89s/it] {'loss': 0.3566, 'learning_rate': 3.9525e-05, 'epoch': 2.75} 21%|██ | 2099/10000 [8:14:19<30:29:13, 13.89s/it] 21%|██ | 2100/10000 [8:14:33<30:35:59, 13.94s/it] {'loss': 0.2955, 'learning_rate': 3.952e-05, 'epoch': 2.75} 21%|██ | 2100/10000 [8:14:33<30:35:59, 13.94s/it] 21%|██ | 2101/10000 [8:14:47<30:33:47, 13.93s/it] {'loss': 0.406, 'learning_rate': 3.9515000000000004e-05, 'epoch': 2.75} 21%|██ | 2101/10000 [8:14:47<30:33:47, 13.93s/it] 21%|██ | 2102/10000 [8:15:01<30:36:31, 13.95s/it] {'loss': 0.3417, 'learning_rate': 3.951e-05, 'epoch': 2.75} 21%|██ | 2102/10000 [8:15:01<30:36:31, 13.95s/it] 21%|██ | 2103/10000 [8:15:15<30:31:47, 13.92s/it] {'loss': 0.3727, 'learning_rate': 3.9505e-05, 'epoch': 2.75} 21%|██ | 2103/10000 [8:15:15<30:31:47, 13.92s/it] 21%|██ | 2104/10000 [8:15:29<30:29:51, 13.90s/it] {'loss': 0.3218, 'learning_rate': 3.9500000000000005e-05, 'epoch': 2.75} 21%|██ | 2104/10000 [8:15:29<30:29:51, 13.90s/it] 21%|██ | 2105/10000 [8:15:42<30:26:34, 13.88s/it] {'loss': 0.3099, 'learning_rate': 3.949500000000001e-05, 'epoch': 2.76} 21%|██ | 2105/10000 [8:15:42<30:26:34, 13.88s/it] 21%|██ | 2106/10000 [8:15:56<30:28:51, 13.90s/it] {'loss': 0.3741, 'learning_rate': 3.9489999999999996e-05, 'epoch': 2.76} 21%|██ | 2106/10000 [8:15:56<30:28:51, 13.90s/it] 21%|██ | 2107/10000 [8:16:10<30:27:56, 
13.90s/it] {'loss': 0.3754, 'learning_rate': 3.9485e-05, 'epoch': 2.76} 21%|██ | 2107/10000 [8:16:10<30:27:56, 13.90s/it] 21%|██ | 2108/10000 [8:16:24<30:28:30, 13.90s/it] {'loss': 0.4158, 'learning_rate': 3.948e-05, 'epoch': 2.76} 21%|██ | 2108/10000 [8:16:24<30:28:30, 13.90s/it] 21%|██ | 2109/10000 [8:16:38<30:32:19, 13.93s/it] {'loss': 0.3372, 'learning_rate': 3.9475000000000004e-05, 'epoch': 2.76} 21%|██ | 2109/10000 [8:16:38<30:32:19, 13.93s/it] 21%|██ | 2110/10000 [8:16:52<30:27:08, 13.89s/it] {'loss': 0.3395, 'learning_rate': 3.947e-05, 'epoch': 2.76} 21%|██ | 2110/10000 [8:16:52<30:27:08, 13.89s/it] 21%|██ | 2111/10000 [8:17:06<30:26:02, 13.89s/it] {'loss': 0.4418, 'learning_rate': 3.9465e-05, 'epoch': 2.76} 21%|██ | 2111/10000 [8:17:06<30:26:02, 13.89s/it] 21%|██ | 2112/10000 [8:17:20<30:26:37, 13.89s/it] {'loss': 0.3166, 'learning_rate': 3.9460000000000005e-05, 'epoch': 2.76} 21%|██ | 2112/10000 [8:17:20<30:26:37, 13.89s/it] 21%|██ | 2113/10000 [8:17:34<30:26:52, 13.90s/it] {'loss': 0.3565, 'learning_rate': 3.9455e-05, 'epoch': 2.77} 21%|██ | 2113/10000 [8:17:34<30:26:52, 13.90s/it] 21%|██ | 2114/10000 [8:17:48<30:27:54, 13.91s/it] {'loss': 0.3647, 'learning_rate': 3.9450000000000003e-05, 'epoch': 2.77} 21%|██ | 2114/10000 [8:17:48<30:27:54, 13.91s/it] 21%|██ | 2115/10000 [8:18:02<30:33:43, 13.95s/it] {'loss': 0.3598, 'learning_rate': 3.9445000000000006e-05, 'epoch': 2.77} 21%|██ | 2115/10000 [8:18:02<30:33:43, 13.95s/it] 21%|██ | 2116/10000 [8:18:16<30:35:16, 13.97s/it] {'loss': 0.3991, 'learning_rate': 3.944e-05, 'epoch': 2.77} 21%|██ | 2116/10000 [8:18:16<30:35:16, 13.97s/it] 21%|██ | 2117/10000 [8:18:29<30:29:58, 13.93s/it] {'loss': 0.3914, 'learning_rate': 3.9435e-05, 'epoch': 2.77} 21%|██ | 2117/10000 [8:18:29<30:29:58, 13.93s/it] 21%|██ | 2118/10000 [8:18:43<30:26:14, 13.90s/it] {'loss': 0.2433, 'learning_rate': 3.943e-05, 'epoch': 2.77} 21%|██ | 2118/10000 [8:18:43<30:26:14, 13.90s/it] 21%|██ | 2119/10000 [8:18:57<30:29:12, 13.93s/it] {'loss': 
0.3602, 'learning_rate': 3.9425e-05, 'epoch': 2.77} 21%|██ | 2119/10000 [8:18:57<30:29:12, 13.93s/it] 21%|██ | 2120/10000 [8:19:11<30:31:59, 13.95s/it] {'loss': 0.3086, 'learning_rate': 3.942e-05, 'epoch': 2.77} 21%|██ | 2120/10000 [8:19:11<30:31:59, 13.95s/it] 21%|██ | 2121/10000 [8:19:25<30:28:50, 13.93s/it] {'loss': 0.3214, 'learning_rate': 3.9415e-05, 'epoch': 2.78} 21%|██ | 2121/10000 [8:19:25<30:28:50, 13.93s/it] 21%|██ | 2122/10000 [8:19:39<30:28:40, 13.93s/it] {'loss': 0.4753, 'learning_rate': 3.9410000000000004e-05, 'epoch': 2.78} 21%|██ | 2122/10000 [8:19:39<30:28:40, 13.93s/it] 21%|██ | 2123/10000 [8:19:53<30:25:11, 13.90s/it] {'loss': 0.3987, 'learning_rate': 3.9405e-05, 'epoch': 2.78} 21%|██ | 2123/10000 [8:19:53<30:25:11, 13.90s/it] 21%|██ | 2124/10000 [8:20:07<30:22:27, 13.88s/it] {'loss': 0.3298, 'learning_rate': 3.94e-05, 'epoch': 2.78} 21%|██ | 2124/10000 [8:20:07<30:22:27, 13.88s/it] 21%|██▏ | 2125/10000 [8:20:21<30:20:17, 13.87s/it] {'loss': 0.3705, 'learning_rate': 3.9395000000000005e-05, 'epoch': 2.78} 21%|██▏ | 2125/10000 [8:20:21<30:20:17, 13.87s/it] 21%|██▏ | 2126/10000 [8:20:35<30:26:46, 13.92s/it] {'loss': 0.3801, 'learning_rate': 3.939e-05, 'epoch': 2.78} 21%|██▏ | 2126/10000 [8:20:35<30:26:46, 13.92s/it] 21%|██▏ | 2127/10000 [8:20:49<30:37:19, 14.00s/it] {'loss': 0.3036, 'learning_rate': 3.9384999999999996e-05, 'epoch': 2.78} 21%|██▏ | 2127/10000 [8:20:49<30:37:19, 14.00s/it] 21%|██▏ | 2128/10000 [8:21:03<30:29:00, 13.94s/it] {'loss': 0.2943, 'learning_rate': 3.938e-05, 'epoch': 2.79} 21%|██▏ | 2128/10000 [8:21:03<30:29:00, 13.94s/it] 21%|██▏ | 2129/10000 [8:21:17<30:28:47, 13.94s/it] {'loss': 0.3567, 'learning_rate': 3.9375e-05, 'epoch': 2.79} 21%|██▏ | 2129/10000 [8:21:17<30:28:47, 13.94s/it] 21%|██▏ | 2130/10000 [8:21:30<30:28:33, 13.94s/it] {'loss': 0.3797, 'learning_rate': 3.9370000000000004e-05, 'epoch': 2.79} 21%|██▏ | 2130/10000 [8:21:31<30:28:33, 13.94s/it] 21%|██▏ | 2131/10000 [8:21:44<30:25:22, 13.92s/it] {'loss': 0.3006, 
'learning_rate': 3.9365e-05, 'epoch': 2.79} 21%|██▏ | 2131/10000 [8:21:44<30:25:22, 13.92s/it] 21%|██▏ | 2132/10000 [8:21:58<30:27:37, 13.94s/it] {'loss': 0.2769, 'learning_rate': 3.936e-05, 'epoch': 2.79} 21%|██▏ | 2132/10000 [8:21:58<30:27:37, 13.94s/it] 21%|██▏ | 2133/10000 [8:22:12<30:27:17, 13.94s/it] {'loss': 0.3399, 'learning_rate': 3.9355000000000005e-05, 'epoch': 2.79} 21%|██▏ | 2133/10000 [8:22:12<30:27:17, 13.94s/it] 21%|██▏ | 2134/10000 [8:22:26<30:28:08, 13.94s/it] {'loss': 0.3453, 'learning_rate': 3.935e-05, 'epoch': 2.79} 21%|██▏ | 2134/10000 [8:22:26<30:28:08, 13.94s/it] 21%|██▏ | 2135/10000 [8:22:40<30:26:57, 13.94s/it] {'loss': 0.3499, 'learning_rate': 3.9345000000000004e-05, 'epoch': 2.79} 21%|██▏ | 2135/10000 [8:22:40<30:26:57, 13.94s/it] 21%|██▏ | 2136/10000 [8:22:54<30:27:10, 13.94s/it] {'loss': 0.3166, 'learning_rate': 3.9340000000000006e-05, 'epoch': 2.8} 21%|██▏ | 2136/10000 [8:22:54<30:27:10, 13.94s/it] 21%|██▏ | 2137/10000 [8:23:08<30:30:00, 13.96s/it] {'loss': 0.3433, 'learning_rate': 3.9335e-05, 'epoch': 2.8} 21%|██▏ | 2137/10000 [8:23:08<30:30:00, 13.96s/it] 21%|██▏ | 2138/10000 [8:23:22<30:26:21, 13.94s/it] {'loss': 0.379, 'learning_rate': 3.933e-05, 'epoch': 2.8} 21%|██▏ | 2138/10000 [8:23:22<30:26:21, 13.94s/it] 21%|██▏ | 2139/10000 [8:23:36<30:22:10, 13.91s/it] {'loss': 0.3577, 'learning_rate': 3.9325e-05, 'epoch': 2.8} 21%|██▏ | 2139/10000 [8:23:36<30:22:10, 13.91s/it] 21%|██▏ | 2140/10000 [8:23:50<30:21:26, 13.90s/it] {'loss': 0.322, 'learning_rate': 3.932e-05, 'epoch': 2.8} 21%|██▏ | 2140/10000 [8:23:50<30:21:26, 13.90s/it] 21%|██▏ | 2141/10000 [8:24:04<30:22:05, 13.91s/it] {'loss': 0.3339, 'learning_rate': 3.9315e-05, 'epoch': 2.8} 21%|██▏ | 2141/10000 [8:24:04<30:22:05, 13.91s/it] 21%|██▏ | 2142/10000 [8:24:18<30:19:57, 13.90s/it] {'loss': 0.3165, 'learning_rate': 3.931e-05, 'epoch': 2.8} 21%|██▏ | 2142/10000 [8:24:18<30:19:57, 13.90s/it] 21%|██▏ | 2143/10000 [8:24:31<30:22:10, 13.92s/it] {'loss': 0.4252, 'learning_rate': 
3.9305000000000004e-05, 'epoch': 2.8} 21%|██▏ | 2143/10000 [8:24:32<30:22:10, 13.92s/it] 21%|██▏ | 2144/10000 [8:24:45<30:21:50, 13.91s/it] {'loss': 0.3811, 'learning_rate': 3.9300000000000007e-05, 'epoch': 2.81} 21%|██▏ | 2144/10000 [8:24:45<30:21:50, 13.91s/it] 21%|██▏ | 2145/10000 [8:24:59<30:20:01, 13.90s/it] {'loss': 0.3691, 'learning_rate': 3.9295e-05, 'epoch': 2.81} 21%|██▏ | 2145/10000 [8:24:59<30:20:01, 13.90s/it] 21%|██▏ | 2146/10000 [8:25:13<30:19:42, 13.90s/it] {'loss': 0.4013, 'learning_rate': 3.9290000000000005e-05, 'epoch': 2.81} 21%|██▏ | 2146/10000 [8:25:13<30:19:42, 13.90s/it] 21%|██▏ | 2147/10000 [8:25:27<30:20:31, 13.91s/it] {'loss': 0.3839, 'learning_rate': 3.9285e-05, 'epoch': 2.81} 21%|██▏ | 2147/10000 [8:25:27<30:20:31, 13.91s/it] 21%|██▏ | 2148/10000 [8:25:41<30:18:24, 13.90s/it] {'loss': 0.3021, 'learning_rate': 3.9280000000000003e-05, 'epoch': 2.81} 21%|██▏ | 2148/10000 [8:25:41<30:18:24, 13.90s/it] 21%|██▏ | 2149/10000 [8:25:55<30:20:20, 13.91s/it] {'loss': 0.3311, 'learning_rate': 3.9275e-05, 'epoch': 2.81} 21%|██▏ | 2149/10000 [8:25:55<30:20:20, 13.91s/it] 22%|██▏ | 2150/10000 [8:26:09<30:19:10, 13.90s/it] {'loss': 0.3191, 'learning_rate': 3.927e-05, 'epoch': 2.81} 22%|██▏ | 2150/10000 [8:26:09<30:19:10, 13.90s/it] 22%|██▏ | 2151/10000 [8:26:23<30:26:27, 13.96s/it] {'loss': 0.3228, 'learning_rate': 3.9265000000000004e-05, 'epoch': 2.82} 22%|██▏ | 2151/10000 [8:26:23<30:26:27, 13.96s/it] 22%|██▏ | 2152/10000 [8:26:37<30:25:35, 13.96s/it] {'loss': 0.2992, 'learning_rate': 3.926e-05, 'epoch': 2.82} 22%|██▏ | 2152/10000 [8:26:37<30:25:35, 13.96s/it] 22%|██▏ | 2153/10000 [8:26:51<30:19:07, 13.91s/it] {'loss': 0.3077, 'learning_rate': 3.9255e-05, 'epoch': 2.82} 22%|██▏ | 2153/10000 [8:26:51<30:19:07, 13.91s/it] 22%|██▏ | 2154/10000 [8:27:05<30:21:48, 13.93s/it] {'loss': 0.3489, 'learning_rate': 3.9250000000000005e-05, 'epoch': 2.82} 22%|██▏ | 2154/10000 [8:27:05<30:21:48, 13.93s/it] 22%|██▏ | 2155/10000 [8:27:18<30:17:37, 13.90s/it] {'loss': 
0.3957, 'learning_rate': 3.9245e-05, 'epoch': 2.82} 22%|██▏ | 2155/10000 [8:27:19<30:17:37, 13.90s/it] 22%|██▏ | 2156/10000 [8:27:32<30:21:27, 13.93s/it] {'loss': 0.3527, 'learning_rate': 3.9240000000000004e-05, 'epoch': 2.82} 22%|██▏ | 2156/10000 [8:27:33<30:21:27, 13.93s/it] 22%|██▏ | 2157/10000 [8:27:46<30:24:39, 13.96s/it] {'loss': 0.4673, 'learning_rate': 3.9235e-05, 'epoch': 2.82} 22%|██▏ | 2157/10000 [8:27:47<30:24:39, 13.96s/it] 22%|██▏ | 2158/10000 [8:28:00<30:23:06, 13.95s/it] {'loss': 0.3519, 'learning_rate': 3.923e-05, 'epoch': 2.82} 22%|██▏ | 2158/10000 [8:28:00<30:23:06, 13.95s/it] 22%|██▏ | 2159/10000 [8:28:14<30:20:11, 13.93s/it] {'loss': 0.3471, 'learning_rate': 3.9225e-05, 'epoch': 2.83} 22%|██▏ | 2159/10000 [8:28:14<30:20:11, 13.93s/it] 22%|██▏ | 2160/10000 [8:28:28<30:20:32, 13.93s/it] {'loss': 0.3161, 'learning_rate': 3.922e-05, 'epoch': 2.83} 22%|██▏ | 2160/10000 [8:28:28<30:20:32, 13.93s/it] 22%|██▏ | 2161/10000 [8:28:42<30:24:42, 13.97s/it] {'loss': 0.3125, 'learning_rate': 3.9215e-05, 'epoch': 2.83} 22%|██▏ | 2161/10000 [8:28:42<30:24:42, 13.97s/it] 22%|██▏ | 2162/10000 [8:28:56<30:31:22, 14.02s/it] {'loss': 0.3479, 'learning_rate': 3.921e-05, 'epoch': 2.83} 22%|██▏ | 2162/10000 [8:28:56<30:31:22, 14.02s/it] 22%|██▏ | 2163/10000 [8:29:10<30:24:44, 13.97s/it] {'loss': 0.35, 'learning_rate': 3.9205e-05, 'epoch': 2.83} 22%|██▏ | 2163/10000 [8:29:10<30:24:44, 13.97s/it] 22%|██▏ | 2164/10000 [8:29:24<30:20:06, 13.94s/it] {'loss': 0.3094, 'learning_rate': 3.9200000000000004e-05, 'epoch': 2.83} 22%|██▏ | 2164/10000 [8:29:24<30:20:06, 13.94s/it] 22%|██▏ | 2165/10000 [8:29:38<30:19:09, 13.93s/it] {'loss': 0.4238, 'learning_rate': 3.919500000000001e-05, 'epoch': 2.83} 22%|██▏ | 2165/10000 [8:29:38<30:19:09, 13.93s/it] 22%|██▏ | 2166/10000 [8:29:52<30:14:16, 13.90s/it] {'loss': 0.3569, 'learning_rate': 3.919e-05, 'epoch': 2.84} 22%|██▏ | 2166/10000 [8:29:52<30:14:16, 13.90s/it] 22%|██▏ | 2167/10000 [8:30:06<30:14:52, 13.90s/it] {'loss': 0.2969, 
'learning_rate': 3.9185e-05, 'epoch': 2.84} 22%|██▏ | 2167/10000 [8:30:06<30:14:52, 13.90s/it] 22%|██▏ | 2168/10000 [8:30:20<30:18:50, 13.93s/it] {'loss': 0.4248, 'learning_rate': 3.918e-05, 'epoch': 2.84} 22%|██▏ | 2168/10000 [8:30:20<30:18:50, 13.93s/it] 22%|██▏ | 2169/10000 [8:30:34<30:13:55, 13.90s/it] {'loss': 0.3741, 'learning_rate': 3.9175000000000004e-05, 'epoch': 2.84} 22%|██▏ | 2169/10000 [8:30:34<30:13:55, 13.90s/it] 22%|██▏ | 2170/10000 [8:30:48<30:17:41, 13.93s/it] {'loss': 0.3347, 'learning_rate': 3.917e-05, 'epoch': 2.84} 22%|██▏ | 2170/10000 [8:30:48<30:17:41, 13.93s/it] 22%|██▏ | 2171/10000 [8:31:01<30:14:10, 13.90s/it] {'loss': 0.3477, 'learning_rate': 3.9165e-05, 'epoch': 2.84} 22%|██▏ | 2171/10000 [8:31:01<30:14:10, 13.90s/it] 22%|██▏ | 2172/10000 [8:31:15<30:10:09, 13.87s/it] {'loss': 0.4127, 'learning_rate': 3.9160000000000005e-05, 'epoch': 2.84} 22%|██▏ | 2172/10000 [8:31:15<30:10:09, 13.87s/it] 22%|██▏ | 2173/10000 [8:31:29<30:16:42, 13.93s/it] {'loss': 0.3885, 'learning_rate': 3.9155e-05, 'epoch': 2.84} 22%|██▏ | 2173/10000 [8:31:29<30:16:42, 13.93s/it] 22%|██▏ | 2174/10000 [8:31:43<30:17:47, 13.94s/it] {'loss': 0.34, 'learning_rate': 3.915e-05, 'epoch': 2.85} 22%|██▏ | 2174/10000 [8:31:43<30:17:47, 13.94s/it] 22%|██▏ | 2175/10000 [8:31:57<30:23:01, 13.98s/it] {'loss': 0.3091, 'learning_rate': 3.9145000000000006e-05, 'epoch': 2.85} 22%|██▏ | 2175/10000 [8:31:57<30:23:01, 13.98s/it] 22%|██▏ | 2176/10000 [8:32:11<30:20:02, 13.96s/it] {'loss': 0.3324, 'learning_rate': 3.914e-05, 'epoch': 2.85} 22%|██▏ | 2176/10000 [8:32:11<30:20:02, 13.96s/it] 22%|██▏ | 2177/10000 [8:32:25<30:18:48, 13.95s/it] {'loss': 0.3018, 'learning_rate': 3.9135e-05, 'epoch': 2.85} 22%|██▏ | 2177/10000 [8:32:25<30:18:48, 13.95s/it] 22%|██▏ | 2178/10000 [8:32:39<30:18:30, 13.95s/it] {'loss': 0.3195, 'learning_rate': 3.913e-05, 'epoch': 2.85} 22%|██▏ | 2178/10000 [8:32:39<30:18:30, 13.95s/it] 22%|██▏ | 2179/10000 [8:32:53<30:22:48, 13.98s/it] {'loss': 0.3756, 
'learning_rate': 3.9125e-05, 'epoch': 2.85} 22%|██▏ | 2179/10000 [8:32:53<30:22:48, 13.98s/it] 22%|██▏ | 2180/10000 [8:33:07<30:23:00, 13.99s/it] {'loss': 0.3127, 'learning_rate': 3.912e-05, 'epoch': 2.85} 22%|██▏ | 2180/10000 [8:33:07<30:23:00, 13.99s/it] 22%|██▏ | 2181/10000 [8:33:21<30:19:11, 13.96s/it] {'loss': 0.3331, 'learning_rate': 3.9115e-05, 'epoch': 2.85} 22%|██▏ | 2181/10000 [8:33:21<30:19:11, 13.96s/it] 22%|██▏ | 2182/10000 [8:33:35<30:12:34, 13.91s/it] {'loss': 0.3245, 'learning_rate': 3.911e-05, 'epoch': 2.86} 22%|██▏ | 2182/10000 [8:33:35<30:12:34, 13.91s/it] 22%|██▏ | 2183/10000 [8:33:49<30:18:41, 13.96s/it] {'loss': 0.3203, 'learning_rate': 3.9105000000000006e-05, 'epoch': 2.86} 22%|██▏ | 2183/10000 [8:33:49<30:18:41, 13.96s/it] 22%|██▏ | 2184/10000 [8:34:03<30:18:06, 13.96s/it] {'loss': 0.3262, 'learning_rate': 3.91e-05, 'epoch': 2.86} 22%|██▏ | 2184/10000 [8:34:03<30:18:06, 13.96s/it] 22%|██▏ | 2185/10000 [8:34:17<30:17:06, 13.95s/it] {'loss': 0.4796, 'learning_rate': 3.9095000000000004e-05, 'epoch': 2.86} 22%|██▏ | 2185/10000 [8:34:17<30:17:06, 13.95s/it] 22%|██▏ | 2186/10000 [8:34:31<30:17:51, 13.96s/it] {'loss': 0.3845, 'learning_rate': 3.909000000000001e-05, 'epoch': 2.86} 22%|██▏ | 2186/10000 [8:34:31<30:17:51, 13.96s/it] 22%|██▏ | 2187/10000 [8:34:45<30:14:51, 13.94s/it] {'loss': 0.2725, 'learning_rate': 3.9085e-05, 'epoch': 2.86} 22%|██▏ | 2187/10000 [8:34:45<30:14:51, 13.94s/it] 22%|██▏ | 2188/10000 [8:34:59<30:14:09, 13.93s/it] {'loss': 0.3443, 'learning_rate': 3.908e-05, 'epoch': 2.86} 22%|██▏ | 2188/10000 [8:34:59<30:14:09, 13.93s/it] 22%|██▏ | 2189/10000 [8:35:13<30:13:19, 13.93s/it] {'loss': 0.3239, 'learning_rate': 3.9075e-05, 'epoch': 2.87} 22%|██▏ | 2189/10000 [8:35:13<30:13:19, 13.93s/it] 22%|██▏ | 2190/10000 [8:35:27<30:14:37, 13.94s/it] {'loss': 0.3233, 'learning_rate': 3.9070000000000004e-05, 'epoch': 2.87} 22%|██▏ | 2190/10000 [8:35:27<30:14:37, 13.94s/it] 22%|██▏ | 2191/10000 [8:35:40<30:12:42, 13.93s/it] {'loss': 0.2848, 
'learning_rate': 3.9065e-05, 'epoch': 2.87} 22%|██▏ | 2191/10000 [8:35:40<30:12:42, 13.93s/it] 22%|██▏ | 2192/10000 [8:35:54<30:12:13, 13.93s/it] {'loss': 0.4323, 'learning_rate': 3.906e-05, 'epoch': 2.87} 22%|██▏ | 2192/10000 [8:35:54<30:12:13, 13.93s/it] 22%|██▏ | 2193/10000 [8:36:08<30:18:22, 13.97s/it] {'loss': 0.3577, 'learning_rate': 3.9055000000000005e-05, 'epoch': 2.87} 22%|██▏ | 2193/10000 [8:36:08<30:18:22, 13.97s/it] 22%|██▏ | 2194/10000 [8:36:22<30:17:21, 13.97s/it] {'loss': 0.4078, 'learning_rate': 3.905e-05, 'epoch': 2.87} 22%|██▏ | 2194/10000 [8:36:22<30:17:21, 13.97s/it] 22%|██▏ | 2195/10000 [8:36:36<30:12:33, 13.93s/it] {'loss': 0.4691, 'learning_rate': 3.9045e-05, 'epoch': 2.87} 22%|██▏ | 2195/10000 [8:36:36<30:12:33, 13.93s/it] 22%|██▏ | 2196/10000 [8:36:50<30:13:39, 13.94s/it] {'loss': 0.4545, 'learning_rate': 3.9040000000000006e-05, 'epoch': 2.87} 22%|██▏ | 2196/10000 [8:36:50<30:13:39, 13.94s/it] 22%|██▏ | 2197/10000 [8:37:04<30:13:26, 13.94s/it] {'loss': 0.4028, 'learning_rate': 3.9035e-05, 'epoch': 2.88} 22%|██▏ | 2197/10000 [8:37:04<30:13:26, 13.94s/it] 22%|██▏ | 2198/10000 [8:37:18<30:05:04, 13.88s/it] {'loss': 0.3825, 'learning_rate': 3.903e-05, 'epoch': 2.88} 22%|██▏ | 2198/10000 [8:37:18<30:05:04, 13.88s/it] 22%|██▏ | 2199/10000 [8:37:32<30:00:33, 13.85s/it] {'loss': 0.3502, 'learning_rate': 3.9025e-05, 'epoch': 2.88} 22%|██▏ | 2199/10000 [8:37:32<30:00:33, 13.85s/it] 22%|██▏ | 2200/10000 [8:37:46<30:02:20, 13.86s/it] {'loss': 0.3968, 'learning_rate': 3.902e-05, 'epoch': 2.88} 22%|██▏ | 2200/10000 [8:37:46<30:02:20, 13.86s/it] 22%|██▏ | 2201/10000 [8:38:00<30:06:01, 13.89s/it] {'loss': 0.3001, 'learning_rate': 3.9015e-05, 'epoch': 2.88} 22%|██▏ | 2201/10000 [8:38:00<30:06:01, 13.89s/it] 22%|██▏ | 2202/10000 [8:38:13<30:02:53, 13.87s/it] {'loss': 0.2937, 'learning_rate': 3.901e-05, 'epoch': 2.88} 22%|██▏ | 2202/10000 [8:38:13<30:02:53, 13.87s/it] 22%|██▏ | 2203/10000 [8:38:27<30:03:56, 13.88s/it] {'loss': 0.3334, 'learning_rate': 
3.9005000000000003e-05, 'epoch': 2.88} 22%|██▏ | 2203/10000 [8:38:27<30:03:56, 13.88s/it] 22%|██▏ | 2204/10000 [8:38:41<30:04:07, 13.88s/it] {'loss': 0.478, 'learning_rate': 3.9000000000000006e-05, 'epoch': 2.88} 22%|██▏ | 2204/10000 [8:38:41<30:04:07, 13.88s/it] 22%|██▏ | 2205/10000 [8:38:55<29:59:06, 13.85s/it] {'loss': 0.2333, 'learning_rate': 3.8995e-05, 'epoch': 2.89} 22%|██▏ | 2205/10000 [8:38:55<29:59:06, 13.85s/it] 22%|██▏ | 2206/10000 [8:39:09<30:02:32, 13.88s/it] {'loss': 0.3829, 'learning_rate': 3.8990000000000004e-05, 'epoch': 2.89} 22%|██▏ | 2206/10000 [8:39:09<30:02:32, 13.88s/it] 22%|██▏ | 2207/10000 [8:39:23<30:10:28, 13.94s/it] {'loss': 0.3906, 'learning_rate': 3.8985e-05, 'epoch': 2.89} 22%|██▏ | 2207/10000 [8:39:23<30:10:28, 13.94s/it] 22%|██▏ | 2208/10000 [8:39:37<30:14:09, 13.97s/it] {'loss': 0.3699, 'learning_rate': 3.898e-05, 'epoch': 2.89} 22%|██▏ | 2208/10000 [8:39:37<30:14:09, 13.97s/it] 22%|██▏ | 2209/10000 [8:39:51<30:07:41, 13.92s/it] {'loss': 0.3615, 'learning_rate': 3.8975e-05, 'epoch': 2.89} 22%|██▏ | 2209/10000 [8:39:51<30:07:41, 13.92s/it] 22%|██▏ | 2210/10000 [8:40:05<30:02:58, 13.89s/it] {'loss': 0.3205, 'learning_rate': 3.897e-05, 'epoch': 2.89} 22%|██▏ | 2210/10000 [8:40:05<30:02:58, 13.89s/it] 22%|██▏ | 2211/10000 [8:40:18<30:02:34, 13.89s/it] {'loss': 0.3541, 'learning_rate': 3.8965000000000004e-05, 'epoch': 2.89} 22%|██▏ | 2211/10000 [8:40:18<30:02:34, 13.89s/it] 22%|██▏ | 2212/10000 [8:40:32<29:59:41, 13.87s/it] {'loss': 0.3946, 'learning_rate': 3.896e-05, 'epoch': 2.9} 22%|██▏ | 2212/10000 [8:40:32<29:59:41, 13.87s/it] 22%|██▏ | 2213/10000 [8:40:46<30:03:33, 13.90s/it] {'loss': 0.4003, 'learning_rate': 3.8955e-05, 'epoch': 2.9} 22%|██▏ | 2213/10000 [8:40:46<30:03:33, 13.90s/it] 22%|██▏ | 2214/10000 [8:41:00<30:00:16, 13.87s/it] {'loss': 0.2914, 'learning_rate': 3.8950000000000005e-05, 'epoch': 2.9} 22%|██▏ | 2214/10000 [8:41:00<30:00:16, 13.87s/it] 22%|██▏ | 2215/10000 [8:41:14<30:08:16, 13.94s/it] {'loss': 0.4146, 
'learning_rate': 3.8945e-05, 'epoch': 2.9} 22%|██▏ | 2215/10000 [8:41:14<30:08:16, 13.94s/it] 22%|██▏ | 2216/10000 [8:41:28<30:10:26, 13.96s/it] {'loss': 0.3312, 'learning_rate': 3.894e-05, 'epoch': 2.9} 22%|██▏ | 2216/10000 [8:41:28<30:10:26, 13.96s/it] 22%|██▏ | 2217/10000 [8:41:42<30:12:08, 13.97s/it] {'loss': 0.3657, 'learning_rate': 3.8935e-05, 'epoch': 2.9} 22%|██▏ | 2217/10000 [8:41:42<30:12:08, 13.97s/it] 22%|██▏ | 2218/10000 [8:41:56<30:09:34, 13.95s/it] {'loss': 0.3677, 'learning_rate': 3.893e-05, 'epoch': 2.9} 22%|██▏ | 2218/10000 [8:41:56<30:09:34, 13.95s/it] 22%|██▏ | 2219/10000 [8:42:10<30:13:10, 13.98s/it] {'loss': 0.3457, 'learning_rate': 3.8925e-05, 'epoch': 2.9} 22%|██▏ | 2219/10000 [8:42:10<30:13:10, 13.98s/it] 22%|██▏ | 2220/10000 [8:42:24<30:09:13, 13.95s/it] {'loss': 0.3544, 'learning_rate': 3.892e-05, 'epoch': 2.91} 22%|██▏ | 2220/10000 [8:42:24<30:09:13, 13.95s/it] 22%|██▏ | 2221/10000 [8:42:38<30:03:23, 13.91s/it] {'loss': 0.3513, 'learning_rate': 3.8915e-05, 'epoch': 2.91} 22%|██▏ | 2221/10000 [8:42:38<30:03:23, 13.91s/it] 22%|██▏ | 2222/10000 [8:42:52<30:07:01, 13.94s/it] {'loss': 0.3333, 'learning_rate': 3.8910000000000005e-05, 'epoch': 2.91} 22%|██▏ | 2222/10000 [8:42:52<30:07:01, 13.94s/it] 22%|██▏ | 2223/10000 [8:43:06<30:06:38, 13.94s/it] {'loss': 0.3839, 'learning_rate': 3.8905e-05, 'epoch': 2.91} 22%|██▏ | 2223/10000 [8:43:06<30:06:38, 13.94s/it] 22%|██▏ | 2224/10000 [8:43:20<30:06:00, 13.94s/it] {'loss': 0.3348, 'learning_rate': 3.8900000000000004e-05, 'epoch': 2.91} 22%|██▏ | 2224/10000 [8:43:20<30:06:00, 13.94s/it] 22%|██▏ | 2225/10000 [8:43:34<30:01:54, 13.91s/it] {'loss': 0.4732, 'learning_rate': 3.8895000000000006e-05, 'epoch': 2.91} 22%|██▏ | 2225/10000 [8:43:34<30:01:54, 13.91s/it] 22%|██▏ | 2226/10000 [8:43:47<30:03:23, 13.92s/it] {'loss': 0.2938, 'learning_rate': 3.889e-05, 'epoch': 2.91} 22%|██▏ | 2226/10000 [8:43:48<30:03:23, 13.92s/it] 22%|██▏ | 2227/10000 [8:44:01<30:01:11, 13.90s/it] {'loss': 0.3608, 'learning_rate': 
3.8885e-05, 'epoch': 2.91} 22%|██▏ | 2227/10000 [8:44:01<30:01:11, 13.90s/it] 22%|██▏ | 2228/10000 [8:44:15<30:04:40, 13.93s/it] {'loss': 0.3225, 'learning_rate': 3.888e-05, 'epoch': 2.92} 22%|██▏ | 2228/10000 [8:44:15<30:04:40, 13.93s/it] 22%|██▏ | 2229/10000 [8:44:29<30:04:25, 13.93s/it] {'loss': 0.3497, 'learning_rate': 3.8875e-05, 'epoch': 2.92} 22%|██▏ | 2229/10000 [8:44:29<30:04:25, 13.93s/it] 22%|██▏ | 2230/10000 [8:44:43<30:03:12, 13.92s/it] {'loss': 0.3939, 'learning_rate': 3.887e-05, 'epoch': 2.92} 22%|██▏ | 2230/10000 [8:44:43<30:03:12, 13.92s/it] 22%|██▏ | 2231/10000 [8:44:57<30:02:48, 13.92s/it] {'loss': 0.3885, 'learning_rate': 3.8865e-05, 'epoch': 2.92} 22%|██▏ | 2231/10000 [8:44:57<30:02:48, 13.92s/it] 22%|██▏ | 2232/10000 [8:45:11<29:56:46, 13.88s/it] {'loss': 0.5581, 'learning_rate': 3.8860000000000004e-05, 'epoch': 2.92} 22%|██▏ | 2232/10000 [8:45:11<29:56:46, 13.88s/it] 22%|██▏ | 2233/10000 [8:45:25<30:00:39, 13.91s/it] {'loss': 0.3681, 'learning_rate': 3.8855e-05, 'epoch': 2.92} 22%|██▏ | 2233/10000 [8:45:25<30:00:39, 13.91s/it] 22%|██▏ | 2234/10000 [8:45:39<30:00:56, 13.91s/it] {'loss': 0.3383, 'learning_rate': 3.885e-05, 'epoch': 2.92} 22%|██▏ | 2234/10000 [8:45:39<30:00:56, 13.91s/it] 22%|██▏ | 2235/10000 [8:45:53<29:59:32, 13.91s/it] {'loss': 0.3911, 'learning_rate': 3.8845000000000005e-05, 'epoch': 2.93} 22%|██▏ | 2235/10000 [8:45:53<29:59:32, 13.91s/it] 22%|██▏ | 2236/10000 [8:46:07<29:57:33, 13.89s/it] {'loss': 0.3524, 'learning_rate': 3.884e-05, 'epoch': 2.93} 22%|██▏ | 2236/10000 [8:46:07<29:57:33, 13.89s/it] 22%|██▏ | 2237/10000 [8:46:20<29:57:55, 13.90s/it] {'loss': 0.3077, 'learning_rate': 3.8835e-05, 'epoch': 2.93} 22%|██▏ | 2237/10000 [8:46:20<29:57:55, 13.90s/it] 22%|██▏ | 2238/10000 [8:46:34<29:57:13, 13.89s/it] {'loss': 0.3071, 'learning_rate': 3.883e-05, 'epoch': 2.93} 22%|██▏ | 2238/10000 [8:46:34<29:57:13, 13.89s/it] 22%|██▏ | 2239/10000 [8:46:48<29:57:28, 13.90s/it] {'loss': 0.3446, 'learning_rate': 3.8825e-05, 'epoch': 
2.93} 22%|██▏ | 2239/10000 [8:46:48<29:57:28, 13.90s/it] 22%|██▏ | 2240/10000 [8:47:02<29:54:08, 13.87s/it] {'loss': 0.551, 'learning_rate': 3.882e-05, 'epoch': 2.93} 22%|██▏ | 2240/10000 [8:47:02<29:54:08, 13.87s/it] 22%|██▏ | 2241/10000 [8:47:16<29:55:08, 13.88s/it] {'loss': 0.3325, 'learning_rate': 3.8815e-05, 'epoch': 2.93} 22%|██▏ | 2241/10000 [8:47:16<29:55:08, 13.88s/it] 22%|██▏ | 2242/10000 [8:47:30<29:53:08, 13.87s/it] {'loss': 0.375, 'learning_rate': 3.881e-05, 'epoch': 2.93} 22%|██▏ | 2242/10000 [8:47:30<29:53:08, 13.87s/it] 22%|██▏ | 2243/10000 [8:47:44<29:58:01, 13.91s/it] {'loss': 0.3296, 'learning_rate': 3.8805000000000005e-05, 'epoch': 2.94} 22%|██▏ | 2243/10000 [8:47:44<29:58:01, 13.91s/it] 22%|██▏ | 2244/10000 [8:47:58<29:59:40, 13.92s/it] {'loss': 0.3733, 'learning_rate': 3.88e-05, 'epoch': 2.94} 22%|██▏ | 2244/10000 [8:47:58<29:59:40, 13.92s/it] 22%|██▏ | 2245/10000 [8:48:12<29:59:42, 13.92s/it] {'loss': 0.3502, 'learning_rate': 3.8795000000000004e-05, 'epoch': 2.94} 22%|██▏ | 2245/10000 [8:48:12<29:59:42, 13.92s/it] 22%|██▏ | 2246/10000 [8:48:26<30:00:09, 13.93s/it] {'loss': 0.3342, 'learning_rate': 3.8790000000000006e-05, 'epoch': 2.94} 22%|██▏ | 2246/10000 [8:48:26<30:00:09, 13.93s/it] 22%|██▏ | 2247/10000 [8:48:40<30:01:17, 13.94s/it] {'loss': 0.3906, 'learning_rate': 3.8785e-05, 'epoch': 2.94} 22%|██▏ | 2247/10000 [8:48:40<30:01:17, 13.94s/it] 22%|██▏ | 2248/10000 [8:48:54<30:01:18, 13.94s/it] {'loss': 0.3529, 'learning_rate': 3.878e-05, 'epoch': 2.94} 22%|██▏ | 2248/10000 [8:48:54<30:01:18, 13.94s/it] 22%|██▏ | 2249/10000 [8:49:07<30:02:41, 13.95s/it] {'loss': 0.3154, 'learning_rate': 3.8775e-05, 'epoch': 2.94} 22%|██▏ | 2249/10000 [8:49:08<30:02:41, 13.95s/it] 22%|██▎ | 2250/10000 [8:49:21<29:57:29, 13.92s/it] {'loss': 0.3415, 'learning_rate': 3.877e-05, 'epoch': 2.95} 22%|██▎ | 2250/10000 [8:49:21<29:57:29, 13.92s/it] 23%|██▎ | 2251/10000 [8:49:35<29:59:05, 13.93s/it] {'loss': 0.3821, 'learning_rate': 3.8765e-05, 'epoch': 2.95} 23%|██▎ | 
2251/10000 [8:49:35<29:59:05, 13.93s/it] 23%|██▎ | 2252/10000 [8:49:49<29:57:04, 13.92s/it] {'loss': 0.3084, 'learning_rate': 3.876e-05, 'epoch': 2.95} 23%|██▎ | 2252/10000 [8:49:49<29:57:04, 13.92s/it] 23%|██▎ | 2253/10000 [8:50:03<29:56:41, 13.92s/it] {'loss': 0.3834, 'learning_rate': 3.8755000000000004e-05, 'epoch': 2.95} 23%|██▎ | 2253/10000 [8:50:03<29:56:41, 13.92s/it] 23%|██▎ | 2254/10000 [8:50:17<29:57:42, 13.92s/it] {'loss': 0.2939, 'learning_rate': 3.875e-05, 'epoch': 2.95} 23%|██▎ | 2254/10000 [8:50:17<29:57:42, 13.92s/it] 23%|██▎ | 2255/10000 [8:50:31<29:55:17, 13.91s/it] {'loss': 0.4187, 'learning_rate': 3.8745e-05, 'epoch': 2.95} 23%|██▎ | 2255/10000 [8:50:31<29:55:17, 13.91s/it] 23%|██▎ | 2256/10000 [8:50:45<29:58:17, 13.93s/it] {'loss': 0.3591, 'learning_rate': 3.8740000000000005e-05, 'epoch': 2.95} 23%|██▎ | 2256/10000 [8:50:45<29:58:17, 13.93s/it] 23%|██▎ | 2257/10000 [8:50:59<29:58:10, 13.93s/it] {'loss': 0.33, 'learning_rate': 3.873500000000001e-05, 'epoch': 2.95} 23%|██▎ | 2257/10000 [8:50:59<29:58:10, 13.93s/it] 23%|██▎ | 2258/10000 [8:51:13<29:57:14, 13.93s/it] {'loss': 0.2997, 'learning_rate': 3.873e-05, 'epoch': 2.96} 23%|██▎ | 2258/10000 [8:51:13<29:57:14, 13.93s/it] 23%|██▎ | 2259/10000 [8:51:27<29:54:53, 13.91s/it] {'loss': 0.4219, 'learning_rate': 3.8725e-05, 'epoch': 2.96} 23%|██▎ | 2259/10000 [8:51:27<29:54:53, 13.91s/it] 23%|██▎ | 2260/10000 [8:51:41<29:53:53, 13.91s/it] {'loss': 0.3514, 'learning_rate': 3.872e-05, 'epoch': 2.96} 23%|██▎ | 2260/10000 [8:51:41<29:53:53, 13.91s/it] 23%|██▎ | 2261/10000 [8:51:55<30:03:22, 13.98s/it] {'loss': 0.3604, 'learning_rate': 3.8715000000000005e-05, 'epoch': 2.96} 23%|██▎ | 2261/10000 [8:51:55<30:03:22, 13.98s/it] 23%|██▎ | 2262/10000 [8:52:08<29:55:50, 13.92s/it] {'loss': 0.3819, 'learning_rate': 3.871e-05, 'epoch': 2.96} 23%|██▎ | 2262/10000 [8:52:09<29:55:50, 13.92s/it] 23%|██▎ | 2263/10000 [8:52:22<29:58:05, 13.94s/it] {'loss': 0.3438, 'learning_rate': 3.8705e-05, 'epoch': 2.96} 23%|██▎ | 
2263/10000 [8:52:22<29:58:05, 13.94s/it] 23%|██▎ | 2264/10000 [8:52:36<29:56:34, 13.93s/it] {'loss': 0.3181, 'learning_rate': 3.8700000000000006e-05, 'epoch': 2.96} 23%|██▎ | 2264/10000 [8:52:36<29:56:34, 13.93s/it] 23%|██▎ | 2265/10000 [8:52:50<29:51:09, 13.89s/it] {'loss': 0.4213, 'learning_rate': 3.8695e-05, 'epoch': 2.96} 23%|██▎ | 2265/10000 [8:52:50<29:51:09, 13.89s/it] 23%|██▎ | 2266/10000 [8:53:04<29:53:46, 13.92s/it] {'loss': 0.4253, 'learning_rate': 3.8690000000000004e-05, 'epoch': 2.97} 23%|██▎ | 2266/10000 [8:53:04<29:53:46, 13.92s/it] 23%|██▎ | 2267/10000 [8:53:18<29:59:25, 13.96s/it] {'loss': 0.3352, 'learning_rate': 3.8685000000000007e-05, 'epoch': 2.97} 23%|██▎ | 2267/10000 [8:53:18<29:59:25, 13.96s/it] 23%|██▎ | 2268/10000 [8:53:32<30:02:20, 13.99s/it] {'loss': 0.3154, 'learning_rate': 3.868e-05, 'epoch': 2.97} 23%|██▎ | 2268/10000 [8:53:32<30:02:20, 13.99s/it] 23%|██▎ | 2269/10000 [8:53:46<30:01:22, 13.98s/it] {'loss': 0.3981, 'learning_rate': 3.8675e-05, 'epoch': 2.97} 23%|██▎ | 2269/10000 [8:53:46<30:01:22, 13.98s/it] 23%|██▎ | 2270/10000 [8:54:00<30:01:40, 13.98s/it] {'loss': 0.3172, 'learning_rate': 3.867e-05, 'epoch': 2.97} 23%|██▎ | 2270/10000 [8:54:00<30:01:40, 13.98s/it] 23%|██▎ | 2271/10000 [8:54:14<29:55:50, 13.94s/it] {'loss': 0.3335, 'learning_rate': 3.8665e-05, 'epoch': 2.97} 23%|██▎ | 2271/10000 [8:54:14<29:55:50, 13.94s/it] 23%|██▎ | 2272/10000 [8:54:28<29:56:12, 13.95s/it] {'loss': 0.3302, 'learning_rate': 3.866e-05, 'epoch': 2.97} 23%|██▎ | 2272/10000 [8:54:28<29:56:12, 13.95s/it] 23%|██▎ | 2273/10000 [8:54:42<29:59:45, 13.98s/it] {'loss': 0.3963, 'learning_rate': 3.8655e-05, 'epoch': 2.98} 23%|██▎ | 2273/10000 [8:54:42<29:59:45, 13.98s/it] 23%|██▎ | 2274/10000 [8:54:56<30:02:46, 14.00s/it] {'loss': 0.3598, 'learning_rate': 3.8650000000000004e-05, 'epoch': 2.98} 23%|██▎ | 2274/10000 [8:54:56<30:02:46, 14.00s/it] 23%|██▎ | 2275/10000 [8:55:10<30:00:24, 13.98s/it] {'loss': 0.3352, 'learning_rate': 3.8645e-05, 'epoch': 2.98} 23%|██▎ 
| 2275/10000 [8:55:10<30:00:24, 13.98s/it] 23%|██▎ | 2276/10000 [8:55:24<29:57:23, 13.96s/it] {'loss': 0.4012, 'learning_rate': 3.864e-05, 'epoch': 2.98} 23%|██▎ | 2276/10000 [8:55:24<29:57:23, 13.96s/it] 23%|██▎ | 2277/10000 [8:55:38<29:58:10, 13.97s/it] {'loss': 0.3249, 'learning_rate': 3.8635000000000005e-05, 'epoch': 2.98} 23%|██▎ | 2277/10000 [8:55:38<29:58:10, 13.97s/it] 23%|██▎ | 2278/10000 [8:55:52<29:54:37, 13.94s/it] {'loss': 0.3611, 'learning_rate': 3.863e-05, 'epoch': 2.98} 23%|██▎ | 2278/10000 [8:55:52<29:54:37, 13.94s/it] 23%|██▎ | 2279/10000 [8:56:06<29:59:17, 13.98s/it] {'loss': 0.4381, 'learning_rate': 3.8625e-05, 'epoch': 2.98} 23%|██▎ | 2279/10000 [8:56:06<29:59:17, 13.98s/it] 23%|██▎ | 2280/10000 [8:56:20<29:58:58, 13.98s/it] {'loss': 0.3648, 'learning_rate': 3.862e-05, 'epoch': 2.98} 23%|██▎ | 2280/10000 [8:56:20<29:58:58, 13.98s/it] 23%|██▎ | 2281/10000 [8:56:34<29:59:25, 13.99s/it] {'loss': 0.3679, 'learning_rate': 3.8615e-05, 'epoch': 2.99} 23%|██▎ | 2281/10000 [8:56:34<29:59:25, 13.99s/it] 23%|██▎ | 2282/10000 [8:56:48<29:56:56, 13.97s/it] {'loss': 0.3302, 'learning_rate': 3.8610000000000005e-05, 'epoch': 2.99} 23%|██▎ | 2282/10000 [8:56:48<29:56:56, 13.97s/it] 23%|██▎ | 2283/10000 [8:57:02<29:55:53, 13.96s/it] {'loss': 0.3491, 'learning_rate': 3.8605e-05, 'epoch': 2.99} 23%|██▎ | 2283/10000 [8:57:02<29:55:53, 13.96s/it] 23%|██▎ | 2284/10000 [8:57:16<29:58:34, 13.99s/it] {'loss': 0.2951, 'learning_rate': 3.86e-05, 'epoch': 2.99} 23%|██▎ | 2284/10000 [8:57:16<29:58:34, 13.99s/it] 23%|██▎ | 2285/10000 [8:57:30<29:57:41, 13.98s/it] {'loss': 0.3828, 'learning_rate': 3.8595000000000006e-05, 'epoch': 2.99} 23%|██▎ | 2285/10000 [8:57:30<29:57:41, 13.98s/it] 23%|██▎ | 2286/10000 [8:57:44<29:51:52, 13.94s/it] {'loss': 0.3982, 'learning_rate': 3.859e-05, 'epoch': 2.99} 23%|██▎ | 2286/10000 [8:57:44<29:51:52, 13.94s/it] 23%|██▎ | 2287/10000 [8:57:58<29:51:24, 13.94s/it] {'loss': 0.3322, 'learning_rate': 3.8585000000000004e-05, 'epoch': 2.99} 23%|██▎ | 
2287/10000 [8:57:58<29:51:24, 13.94s/it] 23%|██▎ | 2288/10000 [8:58:11<29:50:29, 13.93s/it] {'loss': 0.4381, 'learning_rate': 3.858e-05, 'epoch': 2.99} 23%|██▎ | 2288/10000 [8:58:11<29:50:29, 13.93s/it] 23%|██▎ | 2289/10000 [8:58:25<29:54:24, 13.96s/it] {'loss': 0.3841, 'learning_rate': 3.8575e-05, 'epoch': 3.0} 23%|██▎ | 2289/10000 [8:58:26<29:54:24, 13.96s/it] 23%|██▎ | 2290/10000 [8:58:40<29:59:03, 14.00s/it] {'loss': 0.3365, 'learning_rate': 3.857e-05, 'epoch': 3.0} 23%|██▎ | 2290/10000 [8:58:40<29:59:03, 14.00s/it] 23%|██▎ | 2291/10000 [8:58:53<29:52:38, 13.95s/it] {'loss': 0.4769, 'learning_rate': 3.8565e-05, 'epoch': 3.0} 23%|██▎ | 2291/10000 [8:58:53<29:52:38, 13.95s/it] 23%|██▎ | 2292/10000 [8:59:06<29:03:50, 13.57s/it] {'loss': 0.332, 'learning_rate': 3.8560000000000004e-05, 'epoch': 3.0} 23%|██▎ | 2292/10000 [8:59:06<29:03:50, 13.57s/it] 23%|██▎ | 2293/10000 [8:59:20<29:12:20, 13.64s/it] {'loss': 0.1694, 'learning_rate': 3.8555e-05, 'epoch': 3.0} 23%|██▎ | 2293/10000 [8:59:20<29:12:20, 13.64s/it] 23%|██▎ | 2294/10000 [8:59:34<29:20:28, 13.71s/it] {'loss': 0.1913, 'learning_rate': 3.855e-05, 'epoch': 3.0} 23%|██▎ | 2294/10000 [8:59:34<29:20:28, 13.71s/it] 23%|██▎ | 2295/10000 [8:59:48<29:30:19, 13.79s/it] {'loss': 0.1581, 'learning_rate': 3.8545000000000004e-05, 'epoch': 3.0} 23%|██▎ | 2295/10000 [8:59:48<29:30:19, 13.79s/it] 23%|██▎ | 2296/10000 [9:00:02<29:35:40, 13.83s/it] {'loss': 0.1788, 'learning_rate': 3.854000000000001e-05, 'epoch': 3.01} 23%|██▎ | 2296/10000 [9:00:02<29:35:40, 13.83s/it] 23%|██▎ | 2297/10000 [9:00:16<29:39:43, 13.86s/it] {'loss': 0.1341, 'learning_rate': 3.8535e-05, 'epoch': 3.01} 23%|██▎ | 2297/10000 [9:00:16<29:39:43, 13.86s/it] 23%|██▎ | 2298/10000 [9:00:30<29:41:06, 13.88s/it] {'loss': 0.126, 'learning_rate': 3.853e-05, 'epoch': 3.01} 23%|██▎ | 2298/10000 [9:00:30<29:41:06, 13.88s/it] 23%|██▎ | 2299/10000 [9:00:43<29:44:37, 13.90s/it] {'loss': 0.1357, 'learning_rate': 3.8525e-05, 'epoch': 3.01} 23%|██▎ | 2299/10000 
[9:00:44<29:44:37, 13.90s/it] 23%|██▎ | 2300/10000 [9:00:57<29:47:43, 13.93s/it] {'loss': 0.1671, 'learning_rate': 3.8520000000000004e-05, 'epoch': 3.01} 23%|██▎ | 2300/10000 [9:00:58<29:47:43, 13.93s/it] 23%|██▎ | 2301/10000 [9:01:11<29:47:22, 13.93s/it] {'loss': 0.1385, 'learning_rate': 3.8515e-05, 'epoch': 3.01} 23%|██▎ | 2301/10000 [9:01:11<29:47:22, 13.93s/it] 23%|██▎ | 2302/10000 [9:01:25<29:49:35, 13.95s/it] {'loss': 0.1836, 'learning_rate': 3.851e-05, 'epoch': 3.01} 23%|██▎ | 2302/10000 [9:01:25<29:49:35, 13.95s/it] 23%|██▎ | 2303/10000 [9:01:39<29:48:15, 13.94s/it] {'loss': 0.1718, 'learning_rate': 3.8505000000000005e-05, 'epoch': 3.01} 23%|██▎ | 2303/10000 [9:01:39<29:48:15, 13.94s/it] 23%|██▎ | 2304/10000 [9:01:53<29:47:23, 13.93s/it] {'loss': 0.148, 'learning_rate': 3.85e-05, 'epoch': 3.02} 23%|██▎ | 2304/10000 [9:01:53<29:47:23, 13.93s/it] 23%|██▎ | 2305/10000 [9:02:07<29:49:31, 13.95s/it] {'loss': 0.1634, 'learning_rate': 3.8495e-05, 'epoch': 3.02} 23%|██▎ | 2305/10000 [9:02:07<29:49:31, 13.95s/it] 23%|██▎ | 2306/10000 [9:02:21<29:55:42, 14.00s/it] {'loss': 0.149, 'learning_rate': 3.8490000000000006e-05, 'epoch': 3.02} 23%|██▎ | 2306/10000 [9:02:21<29:55:42, 14.00s/it] 23%|██▎ | 2307/10000 [9:02:35<29:47:11, 13.94s/it] {'loss': 0.1442, 'learning_rate': 3.8485e-05, 'epoch': 3.02} 23%|██▎ | 2307/10000 [9:02:35<29:47:11, 13.94s/it] 23%|██▎ | 2308/10000 [9:02:49<29:52:01, 13.98s/it] {'loss': 0.1526, 'learning_rate': 3.848e-05, 'epoch': 3.02} 23%|██▎ | 2308/10000 [9:02:49<29:52:01, 13.98s/it] 23%|██▎ | 2309/10000 [9:03:03<29:48:11, 13.95s/it] {'loss': 0.1401, 'learning_rate': 3.8475e-05, 'epoch': 3.02} 23%|██▎ | 2309/10000 [9:03:03<29:48:11, 13.95s/it] 23%|██▎ | 2310/10000 [9:03:17<29:45:24, 13.93s/it] {'loss': 0.1137, 'learning_rate': 3.847e-05, 'epoch': 3.02} 23%|██▎ | 2310/10000 [9:03:17<29:45:24, 13.93s/it] 23%|██▎ | 2311/10000 [9:03:31<29:48:02, 13.95s/it] {'loss': 0.1519, 'learning_rate': 3.8465e-05, 'epoch': 3.02} 23%|██▎ | 2311/10000 
[9:03:31<29:48:02, 13.95s/it] 23%|██▎ | 2312/10000 [9:03:45<29:51:09, 13.98s/it] {'loss': 0.1247, 'learning_rate': 3.846e-05, 'epoch': 3.03} 23%|██▎ | 2312/10000 [9:03:45<29:51:09, 13.98s/it] 23%|██▎ | 2313/10000 [9:03:59<29:52:30, 13.99s/it] {'loss': 0.1412, 'learning_rate': 3.8455000000000004e-05, 'epoch': 3.03} 23%|██▎ | 2313/10000 [9:03:59<29:52:30, 13.99s/it] 23%|██▎ | 2314/10000 [9:04:13<29:48:55, 13.97s/it] {'loss': 0.1283, 'learning_rate': 3.845e-05, 'epoch': 3.03} 23%|██▎ | 2314/10000 [9:04:13<29:48:55, 13.97s/it] 23%|██▎ | 2315/10000 [9:04:27<29:47:24, 13.95s/it] {'loss': 0.1489, 'learning_rate': 3.8445e-05, 'epoch': 3.03} 23%|██▎ | 2315/10000 [9:04:27<29:47:24, 13.95s/it] 23%|██▎ | 2316/10000 [9:04:41<29:48:08, 13.96s/it] {'loss': 0.1191, 'learning_rate': 3.8440000000000005e-05, 'epoch': 3.03} 23%|██▎ | 2316/10000 [9:04:41<29:48:08, 13.96s/it] 23%|██▎ | 2317/10000 [9:04:55<29:48:34, 13.97s/it] {'loss': 0.1726, 'learning_rate': 3.843500000000001e-05, 'epoch': 3.03} 23%|██▎ | 2317/10000 [9:04:55<29:48:34, 13.97s/it] 23%|██▎ | 2318/10000 [9:05:09<29:54:20, 14.01s/it] {'loss': 0.1837, 'learning_rate': 3.8429999999999996e-05, 'epoch': 3.03} 23%|██▎ | 2318/10000 [9:05:09<29:54:20, 14.01s/it] 23%|██▎ | 2319/10000 [9:05:23<29:54:46, 14.02s/it] {'loss': 0.1449, 'learning_rate': 3.8425e-05, 'epoch': 3.04} 23%|██▎ | 2319/10000 [9:05:23<29:54:46, 14.02s/it] 23%|██▎ | 2320/10000 [9:05:37<29:48:35, 13.97s/it] {'loss': 0.1189, 'learning_rate': 3.842e-05, 'epoch': 3.04} 23%|██▎ | 2320/10000 [9:05:37<29:48:35, 13.97s/it] 23%|██▎ | 2321/10000 [9:05:51<29:46:40, 13.96s/it] {'loss': 0.1732, 'learning_rate': 3.8415000000000004e-05, 'epoch': 3.04} 23%|██▎ | 2321/10000 [9:05:51<29:46:40, 13.96s/it] 23%|██▎ | 2322/10000 [9:06:05<29:46:30, 13.96s/it] {'loss': 0.154, 'learning_rate': 3.841e-05, 'epoch': 3.04} 23%|██▎ | 2322/10000 [9:06:05<29:46:30, 13.96s/it] 23%|██▎ | 2323/10000 [9:06:19<29:40:37, 13.92s/it] {'loss': 0.1506, 'learning_rate': 3.8405e-05, 'epoch': 3.04} 23%|██▎ | 
2323/10000 [9:06:19<29:40:37, 13.92s/it] 23%|██▎ | 2324/10000 [9:06:33<29:43:50, 13.94s/it] {'loss': 0.1587, 'learning_rate': 3.8400000000000005e-05, 'epoch': 3.04} 23%|██▎ | 2324/10000 [9:06:33<29:43:50, 13.94s/it] 23%|██▎ | 2325/10000 [9:06:46<29:40:42, 13.92s/it] {'loss': 0.1511, 'learning_rate': 3.8395e-05, 'epoch': 3.04} 23%|██▎ | 2325/10000 [9:06:46<29:40:42, 13.92s/it] 23%|██▎ | 2326/10000 [9:07:00<29:44:16, 13.95s/it] {'loss': 0.139, 'learning_rate': 3.8390000000000003e-05, 'epoch': 3.04} 23%|██▎ | 2326/10000 [9:07:00<29:44:16, 13.95s/it] 23%|██▎ | 2327/10000 [9:07:14<29:40:56, 13.93s/it] {'loss': 0.1291, 'learning_rate': 3.8385000000000006e-05, 'epoch': 3.05} 23%|██▎ | 2327/10000 [9:07:14<29:40:56, 13.93s/it] 23%|██▎ | 2328/10000 [9:07:28<29:39:20, 13.92s/it] {'loss': 0.157, 'learning_rate': 3.838e-05, 'epoch': 3.05} 23%|██▎ | 2328/10000 [9:07:28<29:39:20, 13.92s/it] 23%|██▎ | 2329/10000 [9:07:42<29:46:10, 13.97s/it] {'loss': 0.1223, 'learning_rate': 3.8375e-05, 'epoch': 3.05} 23%|██▎ | 2329/10000 [9:07:42<29:46:10, 13.97s/it] 23%|██▎ | 2330/10000 [9:07:56<29:48:35, 13.99s/it] {'loss': 0.1402, 'learning_rate': 3.837e-05, 'epoch': 3.05} 23%|██▎ | 2330/10000 [9:07:56<29:48:35, 13.99s/it] 23%|██▎ | 2331/10000 [9:08:10<29:53:54, 14.04s/it] {'loss': 0.1695, 'learning_rate': 3.8365e-05, 'epoch': 3.05} 23%|██▎ | 2331/10000 [9:08:11<29:53:54, 14.04s/it] 23%|██▎ | 2332/10000 [9:08:24<29:51:14, 14.02s/it] {'loss': 0.1395, 'learning_rate': 3.836e-05, 'epoch': 3.05} 23%|██▎ | 2332/10000 [9:08:25<29:51:14, 14.02s/it] 23%|██▎ | 2333/10000 [9:08:38<29:47:38, 13.99s/it] {'loss': 0.1465, 'learning_rate': 3.8355e-05, 'epoch': 3.05} 23%|██▎ | 2333/10000 [9:08:38<29:47:38, 13.99s/it] 23%|██▎ | 2334/10000 [9:08:52<29:44:44, 13.97s/it] {'loss': 0.1703, 'learning_rate': 3.8350000000000004e-05, 'epoch': 3.05} 23%|██▎ | 2334/10000 [9:08:52<29:44:44, 13.97s/it] 23%|██▎ | 2335/10000 [9:09:06<29:39:31, 13.93s/it] {'loss': 0.1352, 'learning_rate': 3.8345000000000006e-05, 'epoch': 
3.06} 23%|██▎ | 2335/10000 [9:09:06<29:39:31, 13.93s/it] 23%|██▎ | 2336/10000 [9:09:20<29:40:03, 13.94s/it] {'loss': 0.1411, 'learning_rate': 3.834e-05, 'epoch': 3.06} 23%|██▎ | 2336/10000 [9:09:20<29:40:03, 13.94s/it] 23%|██▎ | 2337/10000 [9:09:34<29:40:16, 13.94s/it] {'loss': 0.2126, 'learning_rate': 3.8335000000000005e-05, 'epoch': 3.06} 23%|██▎ | 2337/10000 [9:09:34<29:40:16, 13.94s/it] 23%|██▎ | 2338/10000 [9:09:48<29:40:05, 13.94s/it] {'loss': 0.1434, 'learning_rate': 3.833e-05, 'epoch': 3.06} 23%|██▎ | 2338/10000 [9:09:48<29:40:05, 13.94s/it] 23%|██▎ | 2339/10000 [9:10:02<29:35:06, 13.90s/it] {'loss': 0.1473, 'learning_rate': 3.8324999999999996e-05, 'epoch': 3.06} 23%|██▎ | 2339/10000 [9:10:02<29:35:06, 13.90s/it] 23%|██▎ | 2340/10000 [9:10:16<29:37:50, 13.93s/it] {'loss': 0.1447, 'learning_rate': 3.832e-05, 'epoch': 3.06} 23%|██▎ | 2340/10000 [9:10:16<29:37:50, 13.93s/it] 23%|██▎ | 2341/10000 [9:10:30<29:35:48, 13.91s/it] {'loss': 0.1425, 'learning_rate': 3.8315e-05, 'epoch': 3.06} 23%|██▎ | 2341/10000 [9:10:30<29:35:48, 13.91s/it] 23%|██▎ | 2342/10000 [9:10:44<29:39:19, 13.94s/it] {'loss': 0.1535, 'learning_rate': 3.8310000000000004e-05, 'epoch': 3.07} 23%|██▎ | 2342/10000 [9:10:44<29:39:19, 13.94s/it] 23%|██▎ | 2343/10000 [9:10:58<29:37:04, 13.93s/it] {'loss': 0.1271, 'learning_rate': 3.8305e-05, 'epoch': 3.07} 23%|██▎ | 2343/10000 [9:10:58<29:37:04, 13.93s/it] 23%|██▎ | 2344/10000 [9:11:11<29:31:35, 13.88s/it] {'loss': 0.1623, 'learning_rate': 3.83e-05, 'epoch': 3.07} 23%|██▎ | 2344/10000 [9:11:11<29:31:35, 13.88s/it] 23%|██▎ | 2345/10000 [9:11:25<29:31:52, 13.89s/it] {'loss': 0.1424, 'learning_rate': 3.8295000000000005e-05, 'epoch': 3.07} 23%|██▎ | 2345/10000 [9:11:25<29:31:52, 13.89s/it] 23%|██▎ | 2346/10000 [9:11:39<29:35:16, 13.92s/it] {'loss': 0.1472, 'learning_rate': 3.829e-05, 'epoch': 3.07} 23%|██▎ | 2346/10000 [9:11:39<29:35:16, 13.92s/it] 23%|██▎ | 2347/10000 [9:11:53<29:34:33, 13.91s/it] {'loss': 0.1603, 'learning_rate': 
3.8285000000000004e-05, 'epoch': 3.07} 23%|██▎ | 2347/10000 [9:11:53<29:34:33, 13.91s/it] 23%|██▎ | 2348/10000 [9:12:07<29:37:12, 13.94s/it] {'loss': 0.1878, 'learning_rate': 3.828e-05, 'epoch': 3.07} 23%|██▎ | 2348/10000 [9:12:07<29:37:12, 13.94s/it] 23%|██▎ | 2349/10000 [9:12:21<29:39:26, 13.95s/it] {'loss': 0.1335, 'learning_rate': 3.8275e-05, 'epoch': 3.07} 23%|██▎ | 2349/10000 [9:12:21<29:39:26, 13.95s/it] 24%|██▎ | 2350/10000 [9:12:35<29:38:48, 13.95s/it] {'loss': 0.1569, 'learning_rate': 3.827e-05, 'epoch': 3.08} 24%|██▎ | 2350/10000 [9:12:35<29:38:48, 13.95s/it] 24%|██▎ | 2351/10000 [9:12:49<29:39:42, 13.96s/it] {'loss': 0.1588, 'learning_rate': 3.8265e-05, 'epoch': 3.08} 24%|██▎ | 2351/10000 [9:12:49<29:39:42, 13.96s/it] 24%|██▎ | 2352/10000 [9:13:03<29:35:00, 13.93s/it] {'loss': 0.1511, 'learning_rate': 3.826e-05, 'epoch': 3.08} 24%|██▎ | 2352/10000 [9:13:03<29:35:00, 13.93s/it] 24%|██▎ | 2353/10000 [9:13:17<29:41:10, 13.98s/it] {'loss': 0.1501, 'learning_rate': 3.8255e-05, 'epoch': 3.08} 24%|██▎ | 2353/10000 [9:13:17<29:41:10, 13.98s/it] 24%|██▎ | 2354/10000 [9:13:31<29:39:26, 13.96s/it] {'loss': 0.1561, 'learning_rate': 3.825e-05, 'epoch': 3.08} 24%|██▎ | 2354/10000 [9:13:31<29:39:26, 13.96s/it] 24%|██▎ | 2355/10000 [9:13:45<29:34:54, 13.93s/it] {'loss': 0.1588, 'learning_rate': 3.8245000000000004e-05, 'epoch': 3.08} 24%|██▎ | 2355/10000 [9:13:45<29:34:54, 13.93s/it] 24%|██▎ | 2356/10000 [9:13:59<29:32:53, 13.92s/it] {'loss': 0.1494, 'learning_rate': 3.8240000000000007e-05, 'epoch': 3.08} 24%|██▎ | 2356/10000 [9:13:59<29:32:53, 13.92s/it] 24%|██▎ | 2357/10000 [9:14:13<29:31:15, 13.90s/it] {'loss': 0.1404, 'learning_rate': 3.8235e-05, 'epoch': 3.09} 24%|██▎ | 2357/10000 [9:14:13<29:31:15, 13.90s/it] 24%|██▎ | 2358/10000 [9:14:26<29:30:43, 13.90s/it] {'loss': 0.1528, 'learning_rate': 3.823e-05, 'epoch': 3.09} 24%|██▎ | 2358/10000 [9:14:26<29:30:43, 13.90s/it] 24%|██▎ | 2359/10000 [9:14:40<29:31:36, 13.91s/it] {'loss': 0.1407, 'learning_rate': 3.8225e-05, 
'epoch': 3.09} 24%|██▎ | 2359/10000 [9:14:40<29:31:36, 13.91s/it] 24%|██▎ | 2360/10000 [9:14:54<29:30:56, 13.91s/it] {'loss': 0.1688, 'learning_rate': 3.822e-05, 'epoch': 3.09} 24%|██▎ | 2360/10000 [9:14:54<29:30:56, 13.91s/it] 24%|██▎ | 2361/10000 [9:15:08<29:31:27, 13.91s/it] {'loss': 0.1267, 'learning_rate': 3.8215e-05, 'epoch': 3.09} 24%|██▎ | 2361/10000 [9:15:08<29:31:27, 13.91s/it] 24%|██▎ | 2362/10000 [9:15:22<29:28:54, 13.90s/it] {'loss': 0.1055, 'learning_rate': 3.821e-05, 'epoch': 3.09} 24%|██▎ | 2362/10000 [9:15:22<29:28:54, 13.90s/it] 24%|██▎ | 2363/10000 [9:15:36<29:29:26, 13.90s/it] {'loss': 0.1278, 'learning_rate': 3.8205000000000004e-05, 'epoch': 3.09} 24%|██▎ | 2363/10000 [9:15:36<29:29:26, 13.90s/it] 24%|██▎ | 2364/10000 [9:15:50<29:29:02, 13.90s/it] {'loss': 0.1304, 'learning_rate': 3.82e-05, 'epoch': 3.09} 24%|██▎ | 2364/10000 [9:15:50<29:29:02, 13.90s/it] 24%|██▎ | 2365/10000 [9:16:04<29:32:11, 13.93s/it] {'loss': 0.1568, 'learning_rate': 3.8195e-05, 'epoch': 3.1} 24%|██▎ | 2365/10000 [9:16:04<29:32:11, 13.93s/it] 24%|██▎ | 2366/10000 [9:16:18<29:32:30, 13.93s/it] {'loss': 0.1596, 'learning_rate': 3.8190000000000005e-05, 'epoch': 3.1} 24%|██▎ | 2366/10000 [9:16:18<29:32:30, 13.93s/it] 24%|██▎ | 2367/10000 [9:16:32<29:34:05, 13.95s/it] {'loss': 0.1394, 'learning_rate': 3.8185e-05, 'epoch': 3.1} 24%|██▎ | 2367/10000 [9:16:32<29:34:05, 13.95s/it] 24%|██▎ | 2368/10000 [9:16:46<29:34:11, 13.95s/it] {'loss': 0.1414, 'learning_rate': 3.818e-05, 'epoch': 3.1} 24%|██▎ | 2368/10000 [9:16:46<29:34:11, 13.95s/it] 24%|██▎ | 2369/10000 [9:17:00<29:32:52, 13.94s/it] {'loss': 0.1654, 'learning_rate': 3.8175e-05, 'epoch': 3.1} 24%|██▎ | 2369/10000 [9:17:00<29:32:52, 13.94s/it] 24%|██▎ | 2370/10000 [9:17:14<29:36:01, 13.97s/it] {'loss': 0.1472, 'learning_rate': 3.817e-05, 'epoch': 3.1} 24%|██▎ | 2370/10000 [9:17:14<29:36:01, 13.97s/it] 24%|██▎ | 2371/10000 [9:17:28<29:33:15, 13.95s/it] {'loss': 0.1317, 'learning_rate': 3.8165e-05, 'epoch': 3.1} 24%|██▎ | 
2371/10000 [9:17:28<29:33:15, 13.95s/it] 24%|██▎ | 2372/10000 [9:17:42<29:38:36, 13.99s/it] {'loss': 0.1674, 'learning_rate': 3.816e-05, 'epoch': 3.1} 24%|██▎ | 2372/10000 [9:17:42<29:38:36, 13.99s/it] 24%|██▎ | 2373/10000 [9:17:56<29:33:53, 13.95s/it] {'loss': 0.1583, 'learning_rate': 3.8155e-05, 'epoch': 3.11} 24%|██▎ | 2373/10000 [9:17:56<29:33:53, 13.95s/it] 24%|██▎ | 2374/10000 [9:18:09<29:33:14, 13.95s/it] {'loss': 0.1404, 'learning_rate': 3.8150000000000006e-05, 'epoch': 3.11} 24%|██▎ | 2374/10000 [9:18:10<29:33:14, 13.95s/it] 24%|██▍ | 2375/10000 [9:18:23<29:31:44, 13.94s/it] {'loss': 0.1477, 'learning_rate': 3.8145e-05, 'epoch': 3.11} 24%|██▍ | 2375/10000 [9:18:23<29:31:44, 13.94s/it] 24%|██▍ | 2376/10000 [9:18:37<29:32:05, 13.95s/it] {'loss': 0.1557, 'learning_rate': 3.8140000000000004e-05, 'epoch': 3.11} 24%|██▍ | 2376/10000 [9:18:37<29:32:05, 13.95s/it] 24%|██▍ | 2377/10000 [9:18:51<29:31:27, 13.94s/it] {'loss': 0.1376, 'learning_rate': 3.813500000000001e-05, 'epoch': 3.11} 24%|██▍ | 2377/10000 [9:18:51<29:31:27, 13.94s/it] 24%|██▍ | 2378/10000 [9:19:05<29:31:17, 13.94s/it] {'loss': 0.1654, 'learning_rate': 3.8129999999999996e-05, 'epoch': 3.11} 24%|██▍ | 2378/10000 [9:19:05<29:31:17, 13.94s/it] 24%|██▍ | 2379/10000 [9:19:19<29:32:54, 13.96s/it] {'loss': 0.1243, 'learning_rate': 3.8125e-05, 'epoch': 3.11} 24%|██▍ | 2379/10000 [9:19:19<29:32:54, 13.96s/it] 24%|██▍ | 2380/10000 [9:19:33<29:37:00, 13.99s/it] {'loss': 0.1358, 'learning_rate': 3.812e-05, 'epoch': 3.12} 24%|██▍ | 2380/10000 [9:19:33<29:37:00, 13.99s/it] 24%|██▍ | 2381/10000 [9:19:47<29:38:09, 14.00s/it] {'loss': 0.146, 'learning_rate': 3.8115000000000004e-05, 'epoch': 3.12} 24%|██▍ | 2381/10000 [9:19:47<29:38:09, 14.00s/it] 24%|██▍ | 2382/10000 [9:20:01<29:36:18, 13.99s/it] {'loss': 0.1281, 'learning_rate': 3.811e-05, 'epoch': 3.12} 24%|██▍ | 2382/10000 [9:20:01<29:36:18, 13.99s/it] 24%|██▍ | 2383/10000 [9:20:15<29:37:46, 14.00s/it] {'loss': 0.1626, 'learning_rate': 3.8105e-05, 'epoch': 3.12} 
24%|██▍ | 2383/10000 [9:20:15<29:37:46, 14.00s/it] 24%|██▍ | 2384/10000 [9:20:29<29:31:00, 13.95s/it] {'loss': 0.1384, 'learning_rate': 3.8100000000000005e-05, 'epoch': 3.12} 24%|██▍ | 2384/10000 [9:20:29<29:31:00, 13.95s/it] 24%|██▍ | 2385/10000 [9:20:43<29:26:20, 13.92s/it] {'loss': 0.1351, 'learning_rate': 3.8095e-05, 'epoch': 3.12} 24%|██▍ | 2385/10000 [9:20:43<29:26:20, 13.92s/it] 24%|██▍ | 2386/10000 [9:20:57<29:25:26, 13.91s/it] {'loss': 0.1316, 'learning_rate': 3.809e-05, 'epoch': 3.12} 24%|██▍ | 2386/10000 [9:20:57<29:25:26, 13.91s/it] 24%|██▍ | 2387/10000 [9:21:11<29:23:59, 13.90s/it] {'loss': 0.1387, 'learning_rate': 3.8085000000000006e-05, 'epoch': 3.12} 24%|██▍ | 2387/10000 [9:21:11<29:23:59, 13.90s/it] 24%|██▍ | 2388/10000 [9:21:25<29:22:54, 13.90s/it] {'loss': 0.1109, 'learning_rate': 3.808e-05, 'epoch': 3.13} 24%|██▍ | 2388/10000 [9:21:25<29:22:54, 13.90s/it] 24%|██▍ | 2389/10000 [9:21:39<29:21:03, 13.88s/it] {'loss': 0.1527, 'learning_rate': 3.8075e-05, 'epoch': 3.13} 24%|██▍ | 2389/10000 [9:21:39<29:21:03, 13.88s/it] 24%|██▍ | 2390/10000 [9:21:52<29:14:41, 13.83s/it] {'loss': 0.1627, 'learning_rate': 3.807e-05, 'epoch': 3.13} 24%|██▍ | 2390/10000 [9:21:52<29:14:41, 13.83s/it] 24%|██▍ | 2391/10000 [9:22:06<29:19:25, 13.87s/it] {'loss': 0.1576, 'learning_rate': 3.8065e-05, 'epoch': 3.13} 24%|██▍ | 2391/10000 [9:22:06<29:19:25, 13.87s/it] 24%|██▍ | 2392/10000 [9:22:20<29:14:25, 13.84s/it] {'loss': 0.1465, 'learning_rate': 3.806e-05, 'epoch': 3.13} 24%|██▍ | 2392/10000 [9:22:20<29:14:25, 13.84s/it] 24%|██▍ | 2393/10000 [9:22:34<29:20:51, 13.89s/it] {'loss': 0.1494, 'learning_rate': 3.8055e-05, 'epoch': 3.13} 24%|██▍ | 2393/10000 [9:22:34<29:20:51, 13.89s/it] 24%|██▍ | 2394/10000 [9:22:48<29:22:28, 13.90s/it] {'loss': 0.1439, 'learning_rate': 3.805e-05, 'epoch': 3.13} 24%|██▍ | 2394/10000 [9:22:48<29:22:28, 13.90s/it] 24%|██▍ | 2395/10000 [9:23:02<29:26:01, 13.93s/it] {'loss': 0.1323, 'learning_rate': 3.8045000000000006e-05, 'epoch': 3.13} 24%|██▍ | 
2395/10000 [9:23:02<29:26:01, 13.93s/it] 24%|██▍ | 2396/10000 [9:23:16<29:23:16, 13.91s/it] {'loss': 0.1658, 'learning_rate': 3.804e-05, 'epoch': 3.14} 24%|██▍ | 2396/10000 [9:23:16<29:23:16, 13.91s/it] 24%|██▍ | 2397/10000 [9:23:30<29:24:06, 13.92s/it] {'loss': 0.1742, 'learning_rate': 3.8035000000000004e-05, 'epoch': 3.14} 24%|██▍ | 2397/10000 [9:23:30<29:24:06, 13.92s/it] 24%|██▍ | 2398/10000 [9:23:44<29:26:03, 13.94s/it] {'loss': 0.1499, 'learning_rate': 3.803000000000001e-05, 'epoch': 3.14} 24%|██▍ | 2398/10000 [9:23:44<29:26:03, 13.94s/it] 24%|██▍ | 2399/10000 [9:23:58<29:33:30, 14.00s/it] {'loss': 0.1335, 'learning_rate': 3.8025e-05, 'epoch': 3.14} 24%|██▍ | 2399/10000 [9:23:58<29:33:30, 14.00s/it] 24%|██▍ | 2400/10000 [9:24:12<29:32:22, 13.99s/it] {'loss': 0.1364, 'learning_rate': 3.802e-05, 'epoch': 3.14} 24%|██▍ | 2400/10000 [9:24:12<29:32:22, 13.99s/it] 24%|██▍ | 2401/10000 [9:24:26<29:28:28, 13.96s/it] {'loss': 0.1485, 'learning_rate': 3.8015e-05, 'epoch': 3.14} 24%|██▍ | 2401/10000 [9:24:26<29:28:28, 13.96s/it] 24%|██▍ | 2402/10000 [9:24:40<29:24:48, 13.94s/it] {'loss': 0.141, 'learning_rate': 3.8010000000000004e-05, 'epoch': 3.14} 24%|██▍ | 2402/10000 [9:24:40<29:24:48, 13.94s/it] 24%|██▍ | 2403/10000 [9:24:53<29:23:03, 13.92s/it] {'loss': 0.1368, 'learning_rate': 3.8005e-05, 'epoch': 3.15} 24%|██▍ | 2403/10000 [9:24:53<29:23:03, 13.92s/it] 24%|██▍ | 2404/10000 [9:25:07<29:23:40, 13.93s/it] {'loss': 0.1388, 'learning_rate': 3.8e-05, 'epoch': 3.15} 24%|██▍ | 2404/10000 [9:25:07<29:23:40, 13.93s/it] 24%|██▍ | 2405/10000 [9:25:21<29:25:31, 13.95s/it] {'loss': 0.1455, 'learning_rate': 3.7995000000000005e-05, 'epoch': 3.15} 24%|██▍ | 2405/10000 [9:25:21<29:25:31, 13.95s/it] 24%|██▍ | 2406/10000 [9:25:35<29:24:00, 13.94s/it] {'loss': 0.1191, 'learning_rate': 3.799e-05, 'epoch': 3.15} 24%|██▍ | 2406/10000 [9:25:35<29:24:00, 13.94s/it] 24%|██▍ | 2407/10000 [9:25:49<29:24:19, 13.94s/it] {'loss': 0.1473, 'learning_rate': 3.7985e-05, 'epoch': 3.15} 24%|██▍ | 
2407/10000 [9:25:49<29:24:19, 13.94s/it] 24%|██▍ | 2408/10000 [9:26:03<29:27:50, 13.97s/it] {'loss': 0.1323, 'learning_rate': 3.7980000000000006e-05, 'epoch': 3.15} 24%|██▍ | 2408/10000 [9:26:03<29:27:50, 13.97s/it] 24%|██▍ | 2409/10000 [9:26:17<29:24:48, 13.95s/it] {'loss': 0.1428, 'learning_rate': 3.7975e-05, 'epoch': 3.15} 24%|██▍ | 2409/10000 [9:26:17<29:24:48, 13.95s/it] 24%|██▍ | 2410/10000 [9:26:31<29:27:57, 13.98s/it] {'loss': 0.1342, 'learning_rate': 3.797e-05, 'epoch': 3.15} 24%|██▍ | 2410/10000 [9:26:31<29:27:57, 13.98s/it] 24%|██▍ | 2411/10000 [9:26:45<29:25:42, 13.96s/it] {'loss': 0.1613, 'learning_rate': 3.7965e-05, 'epoch': 3.16} 24%|██▍ | 2411/10000 [9:26:45<29:25:42, 13.96s/it] 24%|██▍ | 2412/10000 [9:26:59<29:25:47, 13.96s/it] {'loss': 0.1419, 'learning_rate': 3.796e-05, 'epoch': 3.16} 24%|██▍ | 2412/10000 [9:26:59<29:25:47, 13.96s/it] 24%|██▍ | 2413/10000 [9:27:13<29:19:32, 13.91s/it] {'loss': 0.1327, 'learning_rate': 3.7955e-05, 'epoch': 3.16} 24%|██▍ | 2413/10000 [9:27:13<29:19:32, 13.91s/it] 24%|██▍ | 2414/10000 [9:27:27<29:23:13, 13.95s/it] {'loss': 0.1249, 'learning_rate': 3.795e-05, 'epoch': 3.16} 24%|██▍ | 2414/10000 [9:27:27<29:23:13, 13.95s/it] 24%|██▍ | 2415/10000 [9:27:41<29:24:33, 13.96s/it] {'loss': 0.1483, 'learning_rate': 3.7945000000000003e-05, 'epoch': 3.16} 24%|██▍ | 2415/10000 [9:27:41<29:24:33, 13.96s/it] 24%|██▍ | 2416/10000 [9:27:55<29:24:49, 13.96s/it] {'loss': 0.1301, 'learning_rate': 3.7940000000000006e-05, 'epoch': 3.16} 24%|██▍ | 2416/10000 [9:27:55<29:24:49, 13.96s/it] 24%|██▍ | 2417/10000 [9:28:09<29:20:44, 13.93s/it] {'loss': 0.1348, 'learning_rate': 3.7935e-05, 'epoch': 3.16} 24%|██▍ | 2417/10000 [9:28:09<29:20:44, 13.93s/it] 24%|██▍ | 2418/10000 [9:28:23<29:25:57, 13.97s/it] {'loss': 0.1317, 'learning_rate': 3.7930000000000004e-05, 'epoch': 3.16} 24%|██▍ | 2418/10000 [9:28:23<29:25:57, 13.97s/it] 24%|██▍ | 2419/10000 [9:28:37<29:19:33, 13.93s/it] {'loss': 0.1568, 'learning_rate': 3.7925e-05, 'epoch': 3.17} 24%|██▍ 
| 2419/10000 [9:28:37<29:19:33, 13.93s/it] 24%|██▍ | 2420/10000 [9:28:50<29:14:41, 13.89s/it] {'loss': 0.1527, 'learning_rate': 3.792e-05, 'epoch': 3.17} 24%|██▍ | 2420/10000 [9:28:50<29:14:41, 13.89s/it] 24%|██▍ | 2421/10000 [9:29:04<29:16:14, 13.90s/it] {'loss': 0.1487, 'learning_rate': 3.7915e-05, 'epoch': 3.17} 24%|██▍ | 2421/10000 [9:29:04<29:16:14, 13.90s/it] 24%|██▍ | 2422/10000 [9:29:18<29:10:59, 13.86s/it] {'loss': 0.1077, 'learning_rate': 3.791e-05, 'epoch': 3.17} 24%|██▍ | 2422/10000 [9:29:18<29:10:59, 13.86s/it] 24%|██▍ | 2423/10000 [9:29:32<29:17:00, 13.91s/it] {'loss': 0.1348, 'learning_rate': 3.7905000000000004e-05, 'epoch': 3.17} 24%|██▍ | 2423/10000 [9:29:32<29:17:00, 13.91s/it] 24%|██▍ | 2424/10000 [9:29:46<29:17:13, 13.92s/it] {'loss': 0.1489, 'learning_rate': 3.79e-05, 'epoch': 3.17} 24%|██▍ | 2424/10000 [9:29:46<29:17:13, 13.92s/it] 24%|██▍ | 2425/10000 [9:30:00<29:17:58, 13.92s/it] {'loss': 0.1819, 'learning_rate': 3.7895e-05, 'epoch': 3.17} 24%|██▍ | 2425/10000 [9:30:00<29:17:58, 13.92s/it] 24%|██▍ | 2426/10000 [9:30:14<29:17:25, 13.92s/it] {'loss': 0.1449, 'learning_rate': 3.7890000000000005e-05, 'epoch': 3.18} 24%|██▍ | 2426/10000 [9:30:14<29:17:25, 13.92s/it] 24%|██▍ | 2427/10000 [9:30:28<29:17:21, 13.92s/it] {'loss': 0.134, 'learning_rate': 3.7885e-05, 'epoch': 3.18} 24%|██▍ | 2427/10000 [9:30:28<29:17:21, 13.92s/it] 24%|██▍ | 2428/10000 [9:30:42<29:17:57, 13.93s/it] {'loss': 0.1581, 'learning_rate': 3.788e-05, 'epoch': 3.18} 24%|██▍ | 2428/10000 [9:30:42<29:17:57, 13.93s/it] 24%|██▍ | 2429/10000 [9:30:56<29:16:45, 13.92s/it] {'loss': 0.1493, 'learning_rate': 3.7875e-05, 'epoch': 3.18} 24%|██▍ | 2429/10000 [9:30:56<29:16:45, 13.92s/it] 24%|██▍ | 2430/10000 [9:31:10<29:15:03, 13.91s/it] {'loss': 0.1378, 'learning_rate': 3.787e-05, 'epoch': 3.18} 24%|██▍ | 2430/10000 [9:31:10<29:15:03, 13.91s/it] 24%|██▍ | 2431/10000 [9:31:24<29:13:51, 13.90s/it] {'loss': 0.1387, 'learning_rate': 3.7865e-05, 'epoch': 3.18} 24%|██▍ | 2431/10000 
[9:31:24<29:13:51, 13.90s/it] 24%|██▍ | 2432/10000 [9:31:37<29:13:48, 13.90s/it] {'loss': 0.1591, 'learning_rate': 3.786e-05, 'epoch': 3.18} 24%|██▍ | 2432/10000 [9:31:37<29:13:48, 13.90s/it] 24%|██▍ | 2433/10000 [9:31:51<29:12:20, 13.89s/it] {'loss': 0.1638, 'learning_rate': 3.7855e-05, 'epoch': 3.18} 24%|██▍ | 2433/10000 [9:31:51<29:12:20, 13.89s/it] 24%|██▍ | 2434/10000 [9:32:05<29:13:49, 13.91s/it] {'loss': 0.1333, 'learning_rate': 3.7850000000000005e-05, 'epoch': 3.19} 24%|██▍ | 2434/10000 [9:32:05<29:13:49, 13.91s/it] 24%|██▍ | 2435/10000 [9:32:19<29:10:43, 13.89s/it] {'loss': 0.1273, 'learning_rate': 3.7845e-05, 'epoch': 3.19} 24%|██▍ | 2435/10000 [9:32:19<29:10:43, 13.89s/it] 24%|██▍ | 2436/10000 [9:32:33<29:17:11, 13.94s/it] {'loss': 0.16, 'learning_rate': 3.7840000000000004e-05, 'epoch': 3.19} 24%|██▍ | 2436/10000 [9:32:33<29:17:11, 13.94s/it] 24%|██▍ | 2437/10000 [9:32:47<29:18:36, 13.95s/it] {'loss': 0.1505, 'learning_rate': 3.7835000000000006e-05, 'epoch': 3.19} 24%|██▍ | 2437/10000 [9:32:47<29:18:36, 13.95s/it] 24%|██▍ | 2438/10000 [9:33:01<29:17:43, 13.95s/it] {'loss': 0.1252, 'learning_rate': 3.783e-05, 'epoch': 3.19} 24%|██▍ | 2438/10000 [9:33:01<29:17:43, 13.95s/it] 24%|██▍ | 2439/10000 [9:33:15<29:19:03, 13.96s/it] {'loss': 0.1233, 'learning_rate': 3.7825e-05, 'epoch': 3.19} 24%|██▍ | 2439/10000 [9:33:15<29:19:03, 13.96s/it] 24%|██▍ | 2440/10000 [9:33:29<29:14:21, 13.92s/it] {'loss': 0.1302, 'learning_rate': 3.782e-05, 'epoch': 3.19} 24%|██▍ | 2440/10000 [9:33:29<29:14:21, 13.92s/it] 24%|██▍ | 2441/10000 [9:33:43<29:21:48, 13.98s/it] {'loss': 0.1622, 'learning_rate': 3.7815e-05, 'epoch': 3.2} 24%|██▍ | 2441/10000 [9:33:43<29:21:48, 13.98s/it] 24%|██▍ | 2442/10000 [9:33:57<29:20:04, 13.97s/it] {'loss': 0.1274, 'learning_rate': 3.781e-05, 'epoch': 3.2} 24%|██▍ | 2442/10000 [9:33:57<29:20:04, 13.97s/it] 24%|██▍ | 2443/10000 [9:34:11<29:20:35, 13.98s/it] {'loss': 0.1397, 'learning_rate': 3.7805e-05, 'epoch': 3.2} 24%|██▍ | 2443/10000 
[9:34:11<29:20:35, 13.98s/it] 24%|██▍ | 2444/10000 [9:34:25<29:17:26, 13.96s/it] {'loss': 0.1677, 'learning_rate': 3.7800000000000004e-05, 'epoch': 3.2} 24%|██▍ | 2444/10000 [9:34:25<29:17:26, 13.96s/it] 24%|██▍ | 2445/10000 [9:34:39<29:12:05, 13.91s/it] {'loss': 0.0998, 'learning_rate': 3.7795e-05, 'epoch': 3.2} 24%|██▍ | 2445/10000 [9:34:39<29:12:05, 13.91s/it] 24%|██▍ | 2446/10000 [9:34:53<29:11:53, 13.91s/it] {'loss': 0.163, 'learning_rate': 3.779e-05, 'epoch': 3.2} 24%|██▍ | 2446/10000 [9:34:53<29:11:53, 13.91s/it] 24%|██▍ | 2447/10000 [9:35:06<29:10:12, 13.90s/it] {'loss': 0.1507, 'learning_rate': 3.7785000000000005e-05, 'epoch': 3.2} 24%|██▍ | 2447/10000 [9:35:07<29:10:12, 13.90s/it] 24%|██▍ | 2448/10000 [9:35:20<29:09:44, 13.90s/it] {'loss': 0.1299, 'learning_rate': 3.778000000000001e-05, 'epoch': 3.2} 24%|██▍ | 2448/10000 [9:35:20<29:09:44, 13.90s/it] 24%|██▍ | 2449/10000 [9:35:34<29:08:14, 13.89s/it] {'loss': 0.1445, 'learning_rate': 3.7775e-05, 'epoch': 3.21} 24%|██▍ | 2449/10000 [9:35:34<29:08:14, 13.89s/it] 24%|██▍ | 2450/10000 [9:35:48<29:08:54, 13.90s/it] {'loss': 0.1395, 'learning_rate': 3.777e-05, 'epoch': 3.21} 24%|██▍ | 2450/10000 [9:35:48<29:08:54, 13.90s/it] 25%|██▍ | 2451/10000 [9:36:02<29:15:55, 13.96s/it] {'loss': 0.1444, 'learning_rate': 3.7765e-05, 'epoch': 3.21} 25%|██▍ | 2451/10000 [9:36:02<29:15:55, 13.96s/it] 25%|██▍ | 2452/10000 [9:36:16<29:11:45, 13.92s/it] {'loss': 0.1548, 'learning_rate': 3.776e-05, 'epoch': 3.21} 25%|██▍ | 2452/10000 [9:36:16<29:11:45, 13.92s/it] 25%|██▍ | 2453/10000 [9:36:30<29:15:41, 13.96s/it] {'loss': 0.1301, 'learning_rate': 3.7755e-05, 'epoch': 3.21} 25%|██▍ | 2453/10000 [9:36:30<29:15:41, 13.96s/it] 25%|██▍ | 2454/10000 [9:36:44<29:15:30, 13.96s/it] {'loss': 0.1324, 'learning_rate': 3.775e-05, 'epoch': 3.21} 25%|██▍ | 2454/10000 [9:36:44<29:15:30, 13.96s/it] 25%|██▍ | 2455/10000 [9:36:58<29:12:34, 13.94s/it] {'loss': 0.1202, 'learning_rate': 3.7745000000000005e-05, 'epoch': 3.21} 25%|██▍ | 2455/10000 
[9:36:58<29:12:34, 13.94s/it] 25%|██▍ | 2456/10000 [9:37:12<29:07:33, 13.90s/it] {'loss': 0.1044, 'learning_rate': 3.774e-05, 'epoch': 3.21} 25%|██▍ | 2456/10000 [9:37:12<29:07:33, 13.90s/it] 25%|██▍ | 2457/10000 [9:37:26<29:06:53, 13.90s/it] {'loss': 0.1556, 'learning_rate': 3.7735000000000004e-05, 'epoch': 3.22} 25%|██▍ | 2457/10000 [9:37:26<29:06:53, 13.90s/it] 25%|██▍ | 2458/10000 [9:37:40<29:06:13, 13.89s/it] {'loss': 0.1416, 'learning_rate': 3.7730000000000006e-05, 'epoch': 3.22} 25%|██▍ | 2458/10000 [9:37:40<29:06:13, 13.89s/it] 25%|██▍ | 2459/10000 [9:37:54<29:10:40, 13.93s/it] {'loss': 0.1887, 'learning_rate': 3.7725e-05, 'epoch': 3.22} 25%|██▍ | 2459/10000 [9:37:54<29:10:40, 13.93s/it] 25%|██▍ | 2460/10000 [9:38:08<29:11:55, 13.94s/it] {'loss': 0.1401, 'learning_rate': 3.772e-05, 'epoch': 3.22} 25%|██▍ | 2460/10000 [9:38:08<29:11:55, 13.94s/it] 25%|██▍ | 2461/10000 [9:38:22<29:13:26, 13.95s/it] {'loss': 0.137, 'learning_rate': 3.7715e-05, 'epoch': 3.22} 25%|██▍ | 2461/10000 [9:38:22<29:13:26, 13.95s/it] 25%|██▍ | 2462/10000 [9:38:35<29:13:14, 13.96s/it] {'loss': 0.1415, 'learning_rate': 3.771e-05, 'epoch': 3.22} 25%|██▍ | 2462/10000 [9:38:36<29:13:14, 13.96s/it] 25%|██▍ | 2463/10000 [9:38:49<29:11:46, 13.95s/it] {'loss': 0.1645, 'learning_rate': 3.7705e-05, 'epoch': 3.22} 25%|██▍ | 2463/10000 [9:38:49<29:11:46, 13.95s/it] 25%|██▍ | 2464/10000 [9:39:03<29:08:44, 13.92s/it] {'loss': 0.143, 'learning_rate': 3.77e-05, 'epoch': 3.23} 25%|██▍ | 2464/10000 [9:39:03<29:08:44, 13.92s/it] 25%|██▍ | 2465/10000 [9:39:17<29:11:03, 13.94s/it] {'loss': 0.1216, 'learning_rate': 3.7695000000000004e-05, 'epoch': 3.23} 25%|██▍ | 2465/10000 [9:39:17<29:11:03, 13.94s/it] 25%|██▍ | 2466/10000 [9:39:31<29:05:06, 13.90s/it] {'loss': 0.1648, 'learning_rate': 3.769e-05, 'epoch': 3.23} 25%|██▍ | 2466/10000 [9:39:31<29:05:06, 13.90s/it] 25%|██▍ | 2467/10000 [9:39:45<29:08:27, 13.93s/it] {'loss': 0.1555, 'learning_rate': 3.7685e-05, 'epoch': 3.23} 25%|██▍ | 2467/10000 
[9:39:45<29:08:27, 13.93s/it] 25%|██▍ | 2468/10000 [9:39:59<29:07:30, 13.92s/it] {'loss': 0.14, 'learning_rate': 3.7680000000000005e-05, 'epoch': 3.23} 25%|██▍ | 2468/10000 [9:39:59<29:07:30, 13.92s/it] 25%|██▍ | 2469/10000 [9:40:13<29:07:05, 13.92s/it] {'loss': 0.14, 'learning_rate': 3.7675e-05, 'epoch': 3.23} 25%|██▍ | 2469/10000 [9:40:13<29:07:05, 13.92s/it] 25%|██▍ | 2470/10000 [9:40:27<29:05:23, 13.91s/it] {'loss': 0.1656, 'learning_rate': 3.767e-05, 'epoch': 3.23} 25%|██▍ | 2470/10000 [9:40:27<29:05:23, 13.91s/it] 25%|██▍ | 2471/10000 [9:40:41<29:08:48, 13.94s/it] {'loss': 0.1578, 'learning_rate': 3.7665e-05, 'epoch': 3.23} 25%|██▍ | 2471/10000 [9:40:41<29:08:48, 13.94s/it] 25%|██▍ | 2472/10000 [9:40:55<29:03:37, 13.90s/it] {'loss': 0.1588, 'learning_rate': 3.766e-05, 'epoch': 3.24} 25%|██▍ | 2472/10000 [9:40:55<29:03:37, 13.90s/it] 25%|██▍ | 2473/10000 [9:41:08<29:03:19, 13.90s/it] {'loss': 0.1284, 'learning_rate': 3.7655000000000005e-05, 'epoch': 3.24} 25%|██▍ | 2473/10000 [9:41:08<29:03:19, 13.90s/it] 25%|██▍ | 2474/10000 [9:41:22<29:05:17, 13.91s/it] {'loss': 0.1687, 'learning_rate': 3.765e-05, 'epoch': 3.24} 25%|██▍ | 2474/10000 [9:41:22<29:05:17, 13.91s/it] 25%|██▍ | 2475/10000 [9:41:36<29:03:37, 13.90s/it] {'loss': 0.1596, 'learning_rate': 3.7645e-05, 'epoch': 3.24} 25%|██▍ | 2475/10000 [9:41:36<29:03:37, 13.90s/it] 25%|██▍ | 2476/10000 [9:41:50<29:01:23, 13.89s/it] {'loss': 0.1386, 'learning_rate': 3.7640000000000006e-05, 'epoch': 3.24} 25%|██▍ | 2476/10000 [9:41:50<29:01:23, 13.89s/it] 25%|██▍ | 2477/10000 [9:42:04<29:02:21, 13.90s/it] {'loss': 0.1361, 'learning_rate': 3.7635e-05, 'epoch': 3.24} 25%|██▍ | 2477/10000 [9:42:04<29:02:21, 13.90s/it] 25%|██▍ | 2478/10000 [9:42:18<29:04:31, 13.92s/it] {'loss': 0.1318, 'learning_rate': 3.7630000000000004e-05, 'epoch': 3.24} 25%|██▍ | 2478/10000 [9:42:18<29:04:31, 13.92s/it] 25%|██▍ | 2479/10000 [9:42:32<29:03:27, 13.91s/it] {'loss': 0.1496, 'learning_rate': 3.7625e-05, 'epoch': 3.24} 25%|██▍ | 2479/10000 
[9:42:32<29:03:27, 13.91s/it] 25%|██▍ | 2480/10000 [9:42:46<29:01:13, 13.89s/it] {'loss': 0.1271, 'learning_rate': 3.762e-05, 'epoch': 3.25} 25%|██▍ | 2480/10000 [9:42:46<29:01:13, 13.89s/it] 25%|██▍ | 2481/10000 [9:43:00<29:01:13, 13.89s/it] {'loss': 0.1415, 'learning_rate': 3.7615e-05, 'epoch': 3.25} 25%|██▍ | 2481/10000 [9:43:00<29:01:13, 13.89s/it] 25%|██▍ | 2482/10000 [9:43:14<29:04:59, 13.93s/it] {'loss': 0.1324, 'learning_rate': 3.761e-05, 'epoch': 3.25} 25%|██▍ | 2482/10000 [9:43:14<29:04:59, 13.93s/it] 25%|██▍ | 2483/10000 [9:43:28<29:07:49, 13.95s/it] {'loss': 0.2164, 'learning_rate': 3.7605e-05, 'epoch': 3.25} 25%|██▍ | 2483/10000 [9:43:28<29:07:49, 13.95s/it] 25%|██▍ | 2484/10000 [9:43:42<29:09:07, 13.96s/it] {'loss': 0.1723, 'learning_rate': 3.76e-05, 'epoch': 3.25} 25%|██▍ | 2484/10000 [9:43:42<29:09:07, 13.96s/it] 25%|██▍ | 2485/10000 [9:43:56<29:08:36, 13.96s/it] {'loss': 0.1245, 'learning_rate': 3.7595e-05, 'epoch': 3.25} 25%|██▍ | 2485/10000 [9:43:56<29:08:36, 13.96s/it] 25%|██▍ | 2486/10000 [9:44:10<29:12:42, 14.00s/it] {'loss': 0.1492, 'learning_rate': 3.7590000000000004e-05, 'epoch': 3.25} 25%|██▍ | 2486/10000 [9:44:10<29:12:42, 14.00s/it] 25%|██▍ | 2487/10000 [9:44:24<29:13:44, 14.01s/it] {'loss': 0.1865, 'learning_rate': 3.758500000000001e-05, 'epoch': 3.26} 25%|██▍ | 2487/10000 [9:44:24<29:13:44, 14.01s/it] 25%|██▍ | 2488/10000 [9:44:38<29:08:24, 13.96s/it] {'loss': 0.1473, 'learning_rate': 3.758e-05, 'epoch': 3.26} 25%|██▍ | 2488/10000 [9:44:38<29:08:24, 13.96s/it] 25%|██▍ | 2489/10000 [9:44:52<29:09:59, 13.98s/it] {'loss': 0.1462, 'learning_rate': 3.7575e-05, 'epoch': 3.26} 25%|██▍ | 2489/10000 [9:44:52<29:09:59, 13.98s/it] 25%|██▍ | 2490/10000 [9:45:05<29:06:37, 13.95s/it] {'loss': 0.117, 'learning_rate': 3.757e-05, 'epoch': 3.26} 25%|██▍ | 2490/10000 [9:45:06<29:06:37, 13.95s/it] 25%|██▍ | 2491/10000 [9:45:19<29:04:29, 13.94s/it] {'loss': 0.1509, 'learning_rate': 3.7565e-05, 'epoch': 3.26} 25%|██▍ | 2491/10000 [9:45:19<29:04:29, 
13.94s/it] 25%|██▍ | 2492/10000 [9:45:33<29:05:40, 13.95s/it] {'loss': 0.142, 'learning_rate': 3.756e-05, 'epoch': 3.26} 25%|██▍ | 2492/10000 [9:45:33<29:05:40, 13.95s/it] 25%|██▍ | 2493/10000 [9:45:47<29:07:51, 13.97s/it] {'loss': 0.1387, 'learning_rate': 3.7555e-05, 'epoch': 3.26} 25%|██▍ | 2493/10000 [9:45:47<29:07:51, 13.97s/it] 25%|██▍ | 2494/10000 [9:46:01<29:08:15, 13.97s/it] {'loss': 0.1447, 'learning_rate': 3.7550000000000005e-05, 'epoch': 3.26} 25%|██▍ | 2494/10000 [9:46:01<29:08:15, 13.97s/it] 25%|██▍ | 2495/10000 [9:46:15<29:07:21, 13.97s/it] {'loss': 0.1412, 'learning_rate': 3.7545e-05, 'epoch': 3.27} 25%|██▍ | 2495/10000 [9:46:15<29:07:21, 13.97s/it] 25%|██▍ | 2496/10000 [9:46:29<29:04:45, 13.95s/it] {'loss': 0.1493, 'learning_rate': 3.754e-05, 'epoch': 3.27} 25%|██▍ | 2496/10000 [9:46:29<29:04:45, 13.95s/it] 25%|██▍ | 2497/10000 [9:46:43<29:06:53, 13.97s/it] {'loss': 0.1414, 'learning_rate': 3.7535000000000006e-05, 'epoch': 3.27} 25%|██▍ | 2497/10000 [9:46:43<29:06:53, 13.97s/it] 25%|██▍ | 2498/10000 [9:46:57<29:09:26, 13.99s/it] {'loss': 0.1492, 'learning_rate': 3.753e-05, 'epoch': 3.27} 25%|██▍ | 2498/10000 [9:46:57<29:09:26, 13.99s/it] 25%|██▍ | 2499/10000 [9:47:11<29:06:13, 13.97s/it] {'loss': 0.1126, 'learning_rate': 3.7525e-05, 'epoch': 3.27} 25%|██▍ | 2499/10000 [9:47:11<29:06:13, 13.97s/it] 25%|██▌ | 2500/10000 [9:47:25<29:07:13, 13.98s/it] {'loss': 0.1258, 'learning_rate': 3.752e-05, 'epoch': 3.27} 25%|██▌ | 2500/10000 [9:47:25<29:07:13, 13.98s/it] 25%|██▌ | 2501/10000 [9:47:39<29:06:16, 13.97s/it] {'loss': 0.1456, 'learning_rate': 3.7515e-05, 'epoch': 3.27} 25%|██▌ | 2501/10000 [9:47:39<29:06:16, 13.97s/it] 25%|██▌ | 2502/10000 [9:47:53<29:06:03, 13.97s/it] {'loss': 0.1337, 'learning_rate': 3.751e-05, 'epoch': 3.27} 25%|██▌ | 2502/10000 [9:47:53<29:06:03, 13.97s/it] 25%|██▌ | 2503/10000 [9:48:07<29:05:39, 13.97s/it] {'loss': 0.1582, 'learning_rate': 3.7505e-05, 'epoch': 3.28} 25%|██▌ | 2503/10000 [9:48:07<29:05:39, 13.97s/it] 25%|██▌ | 
2504/10000 [9:48:21<29:07:01, 13.98s/it] {'loss': 0.1425, 'learning_rate': 3.7500000000000003e-05, 'epoch': 3.28} 25%|██▌ | 2504/10000 [9:48:21<29:07:01, 13.98s/it] 25%|██▌ | 2505/10000 [9:48:35<29:07:56, 13.99s/it] {'loss': 0.138, 'learning_rate': 3.7495e-05, 'epoch': 3.28} 25%|██▌ | 2505/10000 [9:48:35<29:07:56, 13.99s/it] 25%|██▌ | 2506/10000 [9:48:49<29:13:22, 14.04s/it] {'loss': 0.1112, 'learning_rate': 3.749e-05, 'epoch': 3.28} 25%|██▌ | 2506/10000 [9:48:49<29:13:22, 14.04s/it] 25%|██▌ | 2507/10000 [9:49:03<29:07:41, 13.99s/it] {'loss': 0.1776, 'learning_rate': 3.7485000000000004e-05, 'epoch': 3.28} 25%|██▌ | 2507/10000 [9:49:03<29:07:41, 13.99s/it] 25%|██▌ | 2508/10000 [9:49:17<29:06:35, 13.99s/it] {'loss': 0.1573, 'learning_rate': 3.748000000000001e-05, 'epoch': 3.28} 25%|██▌ | 2508/10000 [9:49:17<29:06:35, 13.99s/it] 25%|██▌ | 2509/10000 [9:49:31<29:09:42, 14.01s/it] {'loss': 0.1806, 'learning_rate': 3.7475e-05, 'epoch': 3.28} 25%|██▌ | 2509/10000 [9:49:31<29:09:42, 14.01s/it] 25%|██▌ | 2510/10000 [9:49:45<29:05:45, 13.98s/it] {'loss': 0.1243, 'learning_rate': 3.747e-05, 'epoch': 3.29} 25%|██▌ | 2510/10000 [9:49:45<29:05:45, 13.98s/it] 25%|██▌ | 2511/10000 [9:49:59<29:08:37, 14.01s/it] {'loss': 0.1912, 'learning_rate': 3.7465e-05, 'epoch': 3.29} 25%|██▌ | 2511/10000 [9:49:59<29:08:37, 14.01s/it] 25%|██▌ | 2512/10000 [9:50:13<29:04:42, 13.98s/it] {'loss': 0.1115, 'learning_rate': 3.7460000000000004e-05, 'epoch': 3.29} 25%|██▌ | 2512/10000 [9:50:13<29:04:42, 13.98s/it] 25%|██▌ | 2513/10000 [9:50:27<29:02:45, 13.97s/it] {'loss': 0.1435, 'learning_rate': 3.7455e-05, 'epoch': 3.29} 25%|██▌ | 2513/10000 [9:50:27<29:02:45, 13.97s/it] 25%|██▌ | 2514/10000 [9:50:41<29:00:56, 13.95s/it] {'loss': 0.1208, 'learning_rate': 3.745e-05, 'epoch': 3.29} 25%|██▌ | 2514/10000 [9:50:41<29:00:56, 13.95s/it] 25%|██▌ | 2515/10000 [9:50:55<29:00:32, 13.95s/it] {'loss': 0.1336, 'learning_rate': 3.7445000000000005e-05, 'epoch': 3.29} 25%|██▌ | 2515/10000 [9:50:55<29:00:32, 
13.95s/it] 25%|██▌ | 2516/10000 [9:51:09<28:58:47, 13.94s/it] {'loss': 0.1469, 'learning_rate': 3.744e-05, 'epoch': 3.29} 25%|██▌ | 2516/10000 [9:51:09<28:58:47, 13.94s/it] 25%|██▌ | 2517/10000 [9:51:23<29:00:38, 13.96s/it] {'loss': 0.156, 'learning_rate': 3.7435e-05, 'epoch': 3.29} 25%|██▌ | 2517/10000 [9:51:23<29:00:38, 13.96s/it] 25%|██▌ | 2518/10000 [9:51:37<28:59:01, 13.95s/it] {'loss': 0.1678, 'learning_rate': 3.7430000000000006e-05, 'epoch': 3.3} 25%|██▌ | 2518/10000 [9:51:37<28:59:01, 13.95s/it] 25%|██▌ | 2519/10000 [9:51:51<29:00:46, 13.96s/it] {'loss': 0.1256, 'learning_rate': 3.7425e-05, 'epoch': 3.3} 25%|██▌ | 2519/10000 [9:51:51<29:00:46, 13.96s/it] 25%|██▌ | 2520/10000 [9:52:05<29:00:47, 13.96s/it] {'loss': 0.1492, 'learning_rate': 3.742e-05, 'epoch': 3.3} 25%|██▌ | 2520/10000 [9:52:05<29:00:47, 13.96s/it] 25%|██▌ | 2521/10000 [9:52:19<28:55:39, 13.92s/it] {'loss': 0.1379, 'learning_rate': 3.7415e-05, 'epoch': 3.3} 25%|██▌ | 2521/10000 [9:52:19<28:55:39, 13.92s/it] 25%|██▌ | 2522/10000 [9:52:32<28:52:35, 13.90s/it] {'loss': 0.1329, 'learning_rate': 3.741e-05, 'epoch': 3.3} 25%|██▌ | 2522/10000 [9:52:32<28:52:35, 13.90s/it] 25%|██▌ | 2523/10000 [9:52:46<28:54:47, 13.92s/it] {'loss': 0.1532, 'learning_rate': 3.7405e-05, 'epoch': 3.3} 25%|██▌ | 2523/10000 [9:52:46<28:54:47, 13.92s/it] 25%|██▌ | 2524/10000 [9:53:00<28:50:58, 13.89s/it] {'loss': 0.138, 'learning_rate': 3.74e-05, 'epoch': 3.3} 25%|██▌ | 2524/10000 [9:53:00<28:50:58, 13.89s/it] 25%|██▌ | 2525/10000 [9:53:14<28:53:09, 13.91s/it] {'loss': 0.1645, 'learning_rate': 3.7395000000000004e-05, 'epoch': 3.3} 25%|██▌ | 2525/10000 [9:53:14<28:53:09, 13.91s/it] 25%|██▌ | 2526/10000 [9:53:28<28:54:40, 13.93s/it] {'loss': 0.1615, 'learning_rate': 3.739e-05, 'epoch': 3.31} 25%|██▌ | 2526/10000 [9:53:28<28:54:40, 13.93s/it] 25%|██▌ | 2527/10000 [9:53:42<28:55:27, 13.93s/it] {'loss': 0.1488, 'learning_rate': 3.7385e-05, 'epoch': 3.31} 25%|██▌ | 2527/10000 [9:53:42<28:55:27, 13.93s/it] 25%|██▌ | 2528/10000 
[9:53:56<28:51:22, 13.90s/it] {'loss': 0.1497, 'learning_rate': 3.7380000000000005e-05, 'epoch': 3.31} 25%|██▌ | 2528/10000 [9:53:56<28:51:22, 13.90s/it] 25%|██▌ | 2529/10000 [9:54:10<28:50:00, 13.89s/it] {'loss': 0.1632, 'learning_rate': 3.737500000000001e-05, 'epoch': 3.31} 25%|██▌ | 2529/10000 [9:54:10<28:50:00, 13.89s/it] 25%|██▌ | 2530/10000 [9:54:24<28:52:28, 13.92s/it] {'loss': 0.1799, 'learning_rate': 3.7369999999999996e-05, 'epoch': 3.31} 25%|██▌ | 2530/10000 [9:54:24<28:52:28, 13.92s/it] 25%|██▌ | 2531/10000 [9:54:38<29:02:29, 14.00s/it] {'loss': 0.1476, 'learning_rate': 3.7365e-05, 'epoch': 3.31} 25%|██▌ | 2531/10000 [9:54:38<29:02:29, 14.00s/it] 25%|██▌ | 2532/10000 [9:54:52<28:59:02, 13.97s/it] {'loss': 0.1399, 'learning_rate': 3.736e-05, 'epoch': 3.31} 25%|██▌ | 2532/10000 [9:54:52<28:59:02, 13.97s/it] 25%|██▌ | 2533/10000 [9:55:06<29:03:05, 14.01s/it] {'loss': 0.1645, 'learning_rate': 3.7355000000000004e-05, 'epoch': 3.32} 25%|██▌ | 2533/10000 [9:55:06<29:03:05, 14.01s/it] 25%|██▌ | 2534/10000 [9:55:20<28:59:32, 13.98s/it] {'loss': 0.1295, 'learning_rate': 3.735e-05, 'epoch': 3.32} 25%|██▌ | 2534/10000 [9:55:20<28:59:32, 13.98s/it] 25%|██▌ | 2535/10000 [9:55:34<28:54:41, 13.94s/it] {'loss': 0.1571, 'learning_rate': 3.7345e-05, 'epoch': 3.32} 25%|██▌ | 2535/10000 [9:55:34<28:54:41, 13.94s/it] 25%|██▌ | 2536/10000 [9:55:48<28:55:41, 13.95s/it] {'loss': 0.1488, 'learning_rate': 3.7340000000000005e-05, 'epoch': 3.32} 25%|██▌ | 2536/10000 [9:55:48<28:55:41, 13.95s/it] 25%|██▌ | 2537/10000 [9:56:02<28:51:44, 13.92s/it] {'loss': 0.144, 'learning_rate': 3.7335e-05, 'epoch': 3.32} 25%|██▌ | 2537/10000 [9:56:02<28:51:44, 13.92s/it] 25%|██▌ | 2538/10000 [9:56:15<28:51:49, 13.93s/it] {'loss': 0.1636, 'learning_rate': 3.7330000000000003e-05, 'epoch': 3.32} 25%|██▌ | 2538/10000 [9:56:15<28:51:49, 13.93s/it] 25%|██▌ | 2539/10000 [9:56:29<28:52:00, 13.93s/it] {'loss': 0.147, 'learning_rate': 3.7325000000000006e-05, 'epoch': 3.32} 25%|██▌ | 2539/10000 
[9:56:29<28:52:00, 13.93s/it] 25%|██▌ | 2540/10000 [9:56:43<28:52:12, 13.93s/it] {'loss': 0.1452, 'learning_rate': 3.732e-05, 'epoch': 3.32} 25%|██▌ | 2540/10000 [9:56:43<28:52:12, 13.93s/it] 25%|██▌ | 2541/10000 [9:56:57<28:55:04, 13.96s/it] {'loss': 0.1564, 'learning_rate': 3.7315e-05, 'epoch': 3.33} 25%|██▌ | 2541/10000 [9:56:57<28:55:04, 13.96s/it] 25%|██▌ | 2542/10000 [9:57:11<28:55:07, 13.96s/it] {'loss': 0.15, 'learning_rate': 3.731e-05, 'epoch': 3.33} 25%|██▌ | 2542/10000 [9:57:11<28:55:07, 13.96s/it] 25%|██▌ | 2543/10000 [9:57:25<28:57:29, 13.98s/it] {'loss': 0.1463, 'learning_rate': 3.7305e-05, 'epoch': 3.33} 25%|██▌ | 2543/10000 [9:57:25<28:57:29, 13.98s/it] 25%|██▌ | 2544/10000 [9:57:39<29:01:22, 14.01s/it] {'loss': 0.1357, 'learning_rate': 3.73e-05, 'epoch': 3.33} 25%|██▌ | 2544/10000 [9:57:39<29:01:22, 14.01s/it] 25%|██▌ | 2545/10000 [9:57:53<28:57:42, 13.99s/it] {'loss': 0.1525, 'learning_rate': 3.7295e-05, 'epoch': 3.33} 25%|██▌ | 2545/10000 [9:57:53<28:57:42, 13.99s/it] 25%|██▌ | 2546/10000 [9:58:07<29:01:25, 14.02s/it] {'loss': 0.1679, 'learning_rate': 3.7290000000000004e-05, 'epoch': 3.33} 25%|██▌ | 2546/10000 [9:58:07<29:01:25, 14.02s/it] 25%|██▌ | 2547/10000 [9:58:21<28:56:40, 13.98s/it] {'loss': 0.1273, 'learning_rate': 3.7285000000000006e-05, 'epoch': 3.33} 25%|██▌ | 2547/10000 [9:58:21<28:56:40, 13.98s/it] 25%|██▌ | 2548/10000 [9:58:35<28:52:04, 13.95s/it] {'loss': 0.1364, 'learning_rate': 3.728e-05, 'epoch': 3.34} 25%|██▌ | 2548/10000 [9:58:35<28:52:04, 13.95s/it] 25%|██▌ | 2549/10000 [9:58:49<28:52:23, 13.95s/it] {'loss': 0.1465, 'learning_rate': 3.7275000000000005e-05, 'epoch': 3.34} 25%|██▌ | 2549/10000 [9:58:49<28:52:23, 13.95s/it] 26%|██▌ | 2550/10000 [9:59:03<28:49:33, 13.93s/it] {'loss': 0.1481, 'learning_rate': 3.727e-05, 'epoch': 3.34} 26%|██▌ | 2550/10000 [9:59:03<28:49:33, 13.93s/it] 26%|██▌ | 2551/10000 [9:59:17<28:47:44, 13.92s/it] {'loss': 0.1596, 'learning_rate': 3.7265e-05, 'epoch': 3.34} 26%|██▌ | 2551/10000 
[9:59:17<28:47:44, 13.92s/it] 26%|██▌ | 2552/10000 [9:59:31<28:48:15, 13.92s/it] {'loss': 0.1375, 'learning_rate': 3.726e-05, 'epoch': 3.34} 26%|██▌ | 2552/10000 [9:59:31<28:48:15, 13.92s/it] 26%|██▌ | 2553/10000 [9:59:45<28:48:48, 13.93s/it] {'loss': 0.1525, 'learning_rate': 3.7255e-05, 'epoch': 3.34} 26%|██▌ | 2553/10000 [9:59:45<28:48:48, 13.93s/it] 26%|██▌ | 2554/10000 [9:59:59<28:47:38, 13.92s/it] {'loss': 0.1748, 'learning_rate': 3.7250000000000004e-05, 'epoch': 3.34} 26%|██▌ | 2554/10000 [9:59:59<28:47:38, 13.92s/it] 26%|██▌ | 2555/10000 [10:00:13<28:49:08, 13.94s/it] {'loss': 0.1645, 'learning_rate': 3.7245e-05, 'epoch': 3.34} 26%|██▌ | 2555/10000 [10:00:13<28:49:08, 13.94s/it] 26%|██▌ | 2556/10000 [10:00:27<28:52:30, 13.96s/it] {'loss': 0.1376, 'learning_rate': 3.724e-05, 'epoch': 3.35} 26%|██▌ | 2556/10000 [10:00:27<28:52:30, 13.96s/it] 26%|██▌ | 2557/10000 [10:00:41<28:51:12, 13.96s/it] {'loss': 0.2017, 'learning_rate': 3.7235000000000005e-05, 'epoch': 3.35} 26%|██▌ | 2557/10000 [10:00:41<28:51:12, 13.96s/it] 26%|██▌ | 2558/10000 [10:00:54<28:44:42, 13.91s/it] {'loss': 0.1169, 'learning_rate': 3.723e-05, 'epoch': 3.35} 26%|██▌ | 2558/10000 [10:00:54<28:44:42, 13.91s/it] 26%|██▌ | 2559/10000 [10:01:08<28:48:20, 13.94s/it] {'loss': 0.161, 'learning_rate': 3.7225000000000004e-05, 'epoch': 3.35} 26%|██▌ | 2559/10000 [10:01:08<28:48:20, 13.94s/it] 26%|██▌ | 2560/10000 [10:01:22<28:45:25, 13.91s/it] {'loss': 0.1593, 'learning_rate': 3.722e-05, 'epoch': 3.35} 26%|██▌ | 2560/10000 [10:01:22<28:45:25, 13.91s/it] 26%|██▌ | 2561/10000 [10:01:36<28:43:35, 13.90s/it] {'loss': 0.1598, 'learning_rate': 3.7215e-05, 'epoch': 3.35} 26%|██▌ | 2561/10000 [10:01:36<28:43:35, 13.90s/it] 26%|██▌ | 2562/10000 [10:01:50<28:49:18, 13.95s/it] {'loss': 0.1813, 'learning_rate': 3.721e-05, 'epoch': 3.35} 26%|██▌ | 2562/10000 [10:01:50<28:49:18, 13.95s/it] 26%|██▌ | 2563/10000 [10:02:04<28:52:25, 13.98s/it] {'loss': 0.1368, 'learning_rate': 3.7205e-05, 'epoch': 3.35} 26%|██▌ | 
2563/10000 [10:02:04<28:52:25, 13.98s/it] 26%|██▌ | 2564/10000 [10:02:18<28:48:24, 13.95s/it] {'loss': 0.1134, 'learning_rate': 3.72e-05, 'epoch': 3.36} 26%|██▌ | 2564/10000 [10:02:18<28:48:24, 13.95s/it] 26%|██▌ | 2565/10000 [10:02:32<28:47:36, 13.94s/it] {'loss': 0.1619, 'learning_rate': 3.7195e-05, 'epoch': 3.36} 26%|██▌ | 2565/10000 [10:02:32<28:47:36, 13.94s/it] 26%|██▌ | 2566/10000 [10:02:46<28:44:12, 13.92s/it] {'loss': 0.1264, 'learning_rate': 3.719e-05, 'epoch': 3.36} 26%|██▌ | 2566/10000 [10:02:46<28:44:12, 13.92s/it] 26%|██▌ | 2567/10000 [10:03:00<28:37:37, 13.86s/it] {'loss': 0.136, 'learning_rate': 3.7185000000000004e-05, 'epoch': 3.36} 26%|██▌ | 2567/10000 [10:03:00<28:37:37, 13.86s/it] 26%|██▌ | 2568/10000 [10:03:13<28:34:54, 13.84s/it] {'loss': 0.1278, 'learning_rate': 3.7180000000000007e-05, 'epoch': 3.36} 26%|██▌ | 2568/10000 [10:03:14<28:34:54, 13.84s/it] 26%|██▌ | 2569/10000 [10:03:28<28:42:32, 13.91s/it] {'loss': 0.1562, 'learning_rate': 3.7175e-05, 'epoch': 3.36} 26%|██▌ | 2569/10000 [10:03:28<28:42:32, 13.91s/it] 26%|██▌ | 2570/10000 [10:03:41<28:42:04, 13.91s/it] {'loss': 0.1306, 'learning_rate': 3.717e-05, 'epoch': 3.36} 26%|██▌ | 2570/10000 [10:03:41<28:42:04, 13.91s/it] 26%|██▌ | 2571/10000 [10:03:56<28:47:23, 13.95s/it] {'loss': 0.1789, 'learning_rate': 3.7165e-05, 'epoch': 3.37} 26%|██▌ | 2571/10000 [10:03:56<28:47:23, 13.95s/it] 26%|██▌ | 2572/10000 [10:04:09<28:48:25, 13.96s/it] {'loss': 0.1246, 'learning_rate': 3.716e-05, 'epoch': 3.37} 26%|██▌ | 2572/10000 [10:04:10<28:48:25, 13.96s/it] 26%|██▌ | 2573/10000 [10:04:23<28:48:04, 13.96s/it] {'loss': 0.1732, 'learning_rate': 3.7155e-05, 'epoch': 3.37} 26%|██▌ | 2573/10000 [10:04:23<28:48:04, 13.96s/it] 26%|██▌ | 2574/10000 [10:04:37<28:50:29, 13.98s/it] {'loss': 0.1746, 'learning_rate': 3.715e-05, 'epoch': 3.37} 26%|██▌ | 2574/10000 [10:04:38<28:50:29, 13.98s/it] 26%|██▌ | 2575/10000 [10:04:52<28:54:01, 14.01s/it] {'loss': 0.1664, 'learning_rate': 3.7145000000000004e-05, 'epoch': 3.37} 
26%|██▌ | 2575/10000 [10:04:52<28:54:01, 14.01s/it] 26%|██▌ | 2576/10000 [10:05:06<28:54:24, 14.02s/it] {'loss': 0.1502, 'learning_rate': 3.714e-05, 'epoch': 3.37} 26%|██▌ | 2576/10000 [10:05:06<28:54:24, 14.02s/it] 26%|██▌ | 2577/10000 [10:05:19<28:48:28, 13.97s/it] {'loss': 0.1803, 'learning_rate': 3.7135e-05, 'epoch': 3.37} 26%|██▌ | 2577/10000 [10:05:19<28:48:28, 13.97s/it] 26%|██▌ | 2578/10000 [10:05:33<28:48:22, 13.97s/it] {'loss': 0.1489, 'learning_rate': 3.7130000000000005e-05, 'epoch': 3.37} 26%|██▌ | 2578/10000 [10:05:33<28:48:22, 13.97s/it] 26%|██▌ | 2579/10000 [10:05:47<28:44:52, 13.95s/it] {'loss': 0.1435, 'learning_rate': 3.7125e-05, 'epoch': 3.38} 26%|██▌ | 2579/10000 [10:05:47<28:44:52, 13.95s/it] 26%|██▌ | 2580/10000 [10:06:01<28:39:36, 13.91s/it] {'loss': 0.1678, 'learning_rate': 3.712e-05, 'epoch': 3.38} 26%|██▌ | 2580/10000 [10:06:01<28:39:36, 13.91s/it] 26%|██▌ | 2581/10000 [10:06:15<28:38:09, 13.90s/it] {'loss': 0.1678, 'learning_rate': 3.7115e-05, 'epoch': 3.38} 26%|██▌ | 2581/10000 [10:06:15<28:38:09, 13.90s/it] 26%|██▌ | 2582/10000 [10:06:29<28:40:13, 13.91s/it] {'loss': 0.1668, 'learning_rate': 3.711e-05, 'epoch': 3.38} 26%|██▌ | 2582/10000 [10:06:29<28:40:13, 13.91s/it] 26%|██▌ | 2583/10000 [10:06:43<28:36:23, 13.88s/it] {'loss': 0.1381, 'learning_rate': 3.7105e-05, 'epoch': 3.38} 26%|██▌ | 2583/10000 [10:06:43<28:36:23, 13.88s/it] 26%|██▌ | 2584/10000 [10:06:57<28:38:04, 13.90s/it] {'loss': 0.1448, 'learning_rate': 3.71e-05, 'epoch': 3.38} 26%|██▌ | 2584/10000 [10:06:57<28:38:04, 13.90s/it] 26%|██▌ | 2585/10000 [10:07:11<28:39:30, 13.91s/it] {'loss': 0.1684, 'learning_rate': 3.7095e-05, 'epoch': 3.38} 26%|██▌ | 2585/10000 [10:07:11<28:39:30, 13.91s/it] 26%|██▌ | 2586/10000 [10:07:25<28:43:09, 13.95s/it] {'loss': 0.1424, 'learning_rate': 3.7090000000000006e-05, 'epoch': 3.38} 26%|██▌ | 2586/10000 [10:07:25<28:43:09, 13.95s/it] 26%|██▌ | 2587/10000 [10:07:39<28:41:24, 13.93s/it] {'loss': 0.1395, 'learning_rate': 3.7085e-05, 'epoch': 3.39} 
26%|██▌ | 2587/10000 [10:07:39<28:41:24, 13.93s/it] 26%|██▌ | 2588/10000 [10:07:52<28:36:49, 13.90s/it] {'loss': 0.1175, 'learning_rate': 3.7080000000000004e-05, 'epoch': 3.39} 26%|██▌ | 2588/10000 [10:07:52<28:36:49, 13.90s/it] 26%|██▌ | 2589/10000 [10:08:06<28:35:11, 13.89s/it] {'loss': 0.1622, 'learning_rate': 3.707500000000001e-05, 'epoch': 3.39} 26%|██▌ | 2589/10000 [10:08:06<28:35:11, 13.89s/it] 26%|██▌ | 2590/10000 [10:08:20<28:39:09, 13.92s/it] {'loss': 0.125, 'learning_rate': 3.707e-05, 'epoch': 3.39} 26%|██▌ | 2590/10000 [10:08:20<28:39:09, 13.92s/it] 26%|██▌ | 2591/10000 [10:08:34<28:38:03, 13.91s/it] {'loss': 0.1387, 'learning_rate': 3.7065e-05, 'epoch': 3.39} 26%|██▌ | 2591/10000 [10:08:34<28:38:03, 13.91s/it] 26%|██▌ | 2592/10000 [10:08:48<28:43:09, 13.96s/it] {'loss': 0.1575, 'learning_rate': 3.706e-05, 'epoch': 3.39} 26%|██▌ | 2592/10000 [10:08:48<28:43:09, 13.96s/it] 26%|██▌ | 2593/10000 [10:09:02<28:48:21, 14.00s/it] {'loss': 0.1526, 'learning_rate': 3.7055000000000004e-05, 'epoch': 3.39} 26%|██▌ | 2593/10000 [10:09:02<28:48:21, 14.00s/it] 26%|██▌ | 2594/10000 [10:09:16<28:45:32, 13.98s/it] {'loss': 0.1623, 'learning_rate': 3.705e-05, 'epoch': 3.4} 26%|██▌ | 2594/10000 [10:09:16<28:45:32, 13.98s/it] 26%|██▌ | 2595/10000 [10:09:30<28:44:26, 13.97s/it] {'loss': 0.1756, 'learning_rate': 3.7045e-05, 'epoch': 3.4} 26%|██▌ | 2595/10000 [10:09:30<28:44:26, 13.97s/it] 26%|██▌ | 2596/10000 [10:09:44<28:44:28, 13.97s/it] {'loss': 0.1571, 'learning_rate': 3.7040000000000005e-05, 'epoch': 3.4} 26%|██▌ | 2596/10000 [10:09:44<28:44:28, 13.97s/it] 26%|██▌ | 2597/10000 [10:09:58<28:49:45, 14.02s/it] {'loss': 0.1504, 'learning_rate': 3.7035e-05, 'epoch': 3.4} 26%|██▌ | 2597/10000 [10:09:58<28:49:45, 14.02s/it] 26%|██▌ | 2598/10000 [10:10:12<28:46:47, 14.00s/it] {'loss': 0.1881, 'learning_rate': 3.703e-05, 'epoch': 3.4} 26%|██▌ | 2598/10000 [10:10:12<28:46:47, 14.00s/it] 26%|██▌ | 2599/10000 [10:10:26<28:46:54, 14.00s/it] {'loss': 0.1826, 'learning_rate': 
3.7025000000000005e-05, 'epoch': 3.4} 26%|██▌ | 2599/10000 [10:10:26<28:46:54, 14.00s/it] 26%|██▌ | 2600/10000 [10:10:40<28:48:18, 14.01s/it] {'loss': 0.1749, 'learning_rate': 3.702e-05, 'epoch': 3.4} 26%|██▌ | 2600/10000 [10:10:40<28:48:18, 14.01s/it] 26%|██▌ | 2601/10000 [10:10:54<28:43:19, 13.97s/it] {'loss': 0.1832, 'learning_rate': 3.7015e-05, 'epoch': 3.4} 26%|██▌ | 2601/10000 [10:10:54<28:43:19, 13.97s/it] 26%|██▌ | 2602/10000 [10:11:08<28:45:54, 14.00s/it] {'loss': 0.1462, 'learning_rate': 3.701e-05, 'epoch': 3.41} 26%|██▌ | 2602/10000 [10:11:08<28:45:54, 14.00s/it] 26%|██▌ | 2603/10000 [10:11:22<28:44:10, 13.99s/it] {'loss': 0.1481, 'learning_rate': 3.7005e-05, 'epoch': 3.41} 26%|██▌ | 2603/10000 [10:11:22<28:44:10, 13.99s/it] 26%|██▌ | 2604/10000 [10:11:36<28:37:21, 13.93s/it] {'loss': 0.1265, 'learning_rate': 3.7e-05, 'epoch': 3.41} 26%|██▌ | 2604/10000 [10:11:36<28:37:21, 13.93s/it] 26%|██▌ | 2605/10000 [10:11:50<28:36:37, 13.93s/it] {'loss': 0.1819, 'learning_rate': 3.6995e-05, 'epoch': 3.41} 26%|██▌ | 2605/10000 [10:11:50<28:36:37, 13.93s/it] 26%|██▌ | 2606/10000 [10:12:04<28:41:18, 13.97s/it] {'loss': 0.1567, 'learning_rate': 3.699e-05, 'epoch': 3.41} 26%|██▌ | 2606/10000 [10:12:04<28:41:18, 13.97s/it] 26%|██▌ | 2607/10000 [10:12:18<28:40:47, 13.97s/it] {'loss': 0.1559, 'learning_rate': 3.6985000000000006e-05, 'epoch': 3.41} 26%|██▌ | 2607/10000 [10:12:18<28:40:47, 13.97s/it] 26%|██▌ | 2608/10000 [10:12:32<28:38:34, 13.95s/it] {'loss': 0.16, 'learning_rate': 3.698e-05, 'epoch': 3.41} 26%|██▌ | 2608/10000 [10:12:32<28:38:34, 13.95s/it] 26%|██▌ | 2609/10000 [10:12:46<28:39:10, 13.96s/it] {'loss': 0.1566, 'learning_rate': 3.6975000000000004e-05, 'epoch': 3.41} 26%|██▌ | 2609/10000 [10:12:46<28:39:10, 13.96s/it] 26%|██▌ | 2610/10000 [10:13:00<28:36:51, 13.94s/it] {'loss': 0.1733, 'learning_rate': 3.697e-05, 'epoch': 3.42} 26%|██▌ | 2610/10000 [10:13:00<28:36:51, 13.94s/it] 26%|██▌ | 2611/10000 [10:13:14<28:41:16, 13.98s/it] {'loss': 0.1434, 
'learning_rate': 3.6965e-05, 'epoch': 3.42} 26%|██▌ | 2611/10000 [10:13:14<28:41:16, 13.98s/it] 26%|██▌ | 2612/10000 [10:13:28<28:40:31, 13.97s/it] {'loss': 0.1215, 'learning_rate': 3.696e-05, 'epoch': 3.42} 26%|██▌ | 2612/10000 [10:13:28<28:40:31, 13.97s/it] 26%|██▌ | 2613/10000 [10:13:42<28:36:53, 13.95s/it] {'loss': 0.1722, 'learning_rate': 3.6955e-05, 'epoch': 3.42} 26%|██▌ | 2613/10000 [10:13:42<28:36:53, 13.95s/it] 26%|██▌ | 2614/10000 [10:13:56<28:39:14, 13.97s/it] {'loss': 0.1987, 'learning_rate': 3.6950000000000004e-05, 'epoch': 3.42} 26%|██▌ | 2614/10000 [10:13:56<28:39:14, 13.97s/it] 26%|██▌ | 2615/10000 [10:14:10<28:38:36, 13.96s/it] {'loss': 0.1804, 'learning_rate': 3.6945e-05, 'epoch': 3.42} 26%|██▌ | 2615/10000 [10:14:10<28:38:36, 13.96s/it] 26%|██▌ | 2616/10000 [10:14:24<28:38:07, 13.96s/it] {'loss': 0.1468, 'learning_rate': 3.694e-05, 'epoch': 3.42} 26%|██▌ | 2616/10000 [10:14:24<28:38:07, 13.96s/it] 26%|██▌ | 2617/10000 [10:14:37<28:31:51, 13.91s/it] {'loss': 0.1565, 'learning_rate': 3.6935000000000005e-05, 'epoch': 3.43} 26%|██▌ | 2617/10000 [10:14:37<28:31:51, 13.91s/it] 26%|██▌ | 2618/10000 [10:14:51<28:30:39, 13.90s/it] {'loss': 0.1245, 'learning_rate': 3.693e-05, 'epoch': 3.43} 26%|██▌ | 2618/10000 [10:14:51<28:30:39, 13.90s/it] 26%|██▌ | 2619/10000 [10:15:05<28:27:59, 13.88s/it] {'loss': 0.1508, 'learning_rate': 3.6925e-05, 'epoch': 3.43} 26%|██▌ | 2619/10000 [10:15:05<28:27:59, 13.88s/it] 26%|██▌ | 2620/10000 [10:15:19<28:29:01, 13.89s/it] {'loss': 0.1286, 'learning_rate': 3.692e-05, 'epoch': 3.43} 26%|██▌ | 2620/10000 [10:15:19<28:29:01, 13.89s/it] 26%|██▌ | 2621/10000 [10:15:33<28:30:47, 13.91s/it] {'loss': 0.1744, 'learning_rate': 3.6915e-05, 'epoch': 3.43} 26%|██▌ | 2621/10000 [10:15:33<28:30:47, 13.91s/it] 26%|██▌ | 2622/10000 [10:15:47<28:33:35, 13.94s/it] {'loss': 0.1662, 'learning_rate': 3.691e-05, 'epoch': 3.43} 26%|██▌ | 2622/10000 [10:15:47<28:33:35, 13.94s/it] 26%|██▌ | 2623/10000 [10:16:01<28:31:25, 13.92s/it] {'loss': 0.1711, 
'learning_rate': 3.6905e-05, 'epoch': 3.43} 26%|██▌ | 2623/10000 [10:16:01<28:31:25, 13.92s/it] 26%|██▌ | 2624/10000 [10:16:15<28:36:31, 13.96s/it] {'loss': 0.1519, 'learning_rate': 3.69e-05, 'epoch': 3.43} 26%|██▌ | 2624/10000 [10:16:15<28:36:31, 13.96s/it] 26%|██▋ | 2625/10000 [10:16:29<28:32:03, 13.93s/it] {'loss': 0.1229, 'learning_rate': 3.6895000000000005e-05, 'epoch': 3.44} 26%|██▋ | 2625/10000 [10:16:29<28:32:03, 13.93s/it] 26%|██▋ | 2626/10000 [10:16:43<28:31:29, 13.93s/it] {'loss': 0.1551, 'learning_rate': 3.689e-05, 'epoch': 3.44} 26%|██▋ | 2626/10000 [10:16:43<28:31:29, 13.93s/it] 26%|██▋ | 2627/10000 [10:16:57<28:33:15, 13.94s/it] {'loss': 0.1245, 'learning_rate': 3.6885000000000003e-05, 'epoch': 3.44} 26%|██▋ | 2627/10000 [10:16:57<28:33:15, 13.94s/it] 26%|██▋ | 2628/10000 [10:17:11<28:38:38, 13.99s/it] {'loss': 0.1661, 'learning_rate': 3.6880000000000006e-05, 'epoch': 3.44} 26%|██▋ | 2628/10000 [10:17:11<28:38:38, 13.99s/it] 26%|██▋ | 2629/10000 [10:17:25<28:40:23, 14.00s/it] {'loss': 0.137, 'learning_rate': 3.6875e-05, 'epoch': 3.44} 26%|██▋ | 2629/10000 [10:17:25<28:40:23, 14.00s/it] 26%|██▋ | 2630/10000 [10:17:39<28:35:27, 13.97s/it] {'loss': 0.1414, 'learning_rate': 3.6870000000000004e-05, 'epoch': 3.44} 26%|██▋ | 2630/10000 [10:17:39<28:35:27, 13.97s/it] 26%|██▋ | 2631/10000 [10:17:53<28:42:06, 14.02s/it] {'loss': 0.1596, 'learning_rate': 3.6865e-05, 'epoch': 3.44} 26%|██▋ | 2631/10000 [10:17:53<28:42:06, 14.02s/it] 26%|██▋ | 2632/10000 [10:18:07<28:35:43, 13.97s/it] {'loss': 0.1414, 'learning_rate': 3.686e-05, 'epoch': 3.45} 26%|██▋ | 2632/10000 [10:18:07<28:35:43, 13.97s/it] 26%|██▋ | 2633/10000 [10:18:21<28:34:37, 13.96s/it] {'loss': 0.1408, 'learning_rate': 3.6855e-05, 'epoch': 3.45} 26%|██▋ | 2633/10000 [10:18:21<28:34:37, 13.96s/it] 26%|██▋ | 2634/10000 [10:18:35<28:35:50, 13.98s/it] {'loss': 0.1744, 'learning_rate': 3.685e-05, 'epoch': 3.45} 26%|██▋ | 2634/10000 [10:18:35<28:35:50, 13.98s/it] 26%|██▋ | 2635/10000 [10:18:48<28:31:54, 
13.95s/it] {'loss': 0.1367, 'learning_rate': 3.6845000000000004e-05, 'epoch': 3.45} 26%|██▋ | 2635/10000 [10:18:49<28:31:54, 13.95s/it] 26%|██▋ | 2636/10000 [10:19:02<28:26:35, 13.90s/it] {'loss': 0.1503, 'learning_rate': 3.684e-05, 'epoch': 3.45} 26%|██▋ | 2636/10000 [10:19:02<28:26:35, 13.90s/it] 26%|██▋ | 2637/10000 [10:19:16<28:30:07, 13.94s/it] {'loss': 0.2008, 'learning_rate': 3.6835e-05, 'epoch': 3.45} 26%|██▋ | 2637/10000 [10:19:16<28:30:07, 13.94s/it] 26%|██▋ | 2638/10000 [10:19:30<28:29:03, 13.93s/it] {'loss': 0.1714, 'learning_rate': 3.6830000000000005e-05, 'epoch': 3.45} 26%|██▋ | 2638/10000 [10:19:30<28:29:03, 13.93s/it] 26%|██▋ | 2639/10000 [10:19:44<28:29:05, 13.93s/it] {'loss': 0.1357, 'learning_rate': 3.6825e-05, 'epoch': 3.45} 26%|██▋ | 2639/10000 [10:19:44<28:29:05, 13.93s/it] 26%|██▋ | 2640/10000 [10:19:58<28:27:56, 13.92s/it] {'loss': 0.145, 'learning_rate': 3.682e-05, 'epoch': 3.46} 26%|██▋ | 2640/10000 [10:19:58<28:27:56, 13.92s/it] 26%|██▋ | 2641/10000 [10:20:12<28:25:11, 13.90s/it] {'loss': 0.1504, 'learning_rate': 3.6815e-05, 'epoch': 3.46} 26%|██▋ | 2641/10000 [10:20:12<28:25:11, 13.90s/it] 26%|██▋ | 2642/10000 [10:20:26<28:26:08, 13.91s/it] {'loss': 0.1455, 'learning_rate': 3.681e-05, 'epoch': 3.46} 26%|██▋ | 2642/10000 [10:20:26<28:26:08, 13.91s/it] 26%|██▋ | 2643/10000 [10:20:40<28:26:19, 13.92s/it] {'loss': 0.1922, 'learning_rate': 3.6805e-05, 'epoch': 3.46} 26%|██▋ | 2643/10000 [10:20:40<28:26:19, 13.92s/it] 26%|██▋ | 2644/10000 [10:20:54<28:24:20, 13.90s/it] {'loss': 0.1357, 'learning_rate': 3.68e-05, 'epoch': 3.46} 26%|██▋ | 2644/10000 [10:20:54<28:24:20, 13.90s/it] 26%|██▋ | 2645/10000 [10:21:08<28:27:34, 13.93s/it] {'loss': 0.1582, 'learning_rate': 3.6795e-05, 'epoch': 3.46} 26%|██▋ | 2645/10000 [10:21:08<28:27:34, 13.93s/it] 26%|██▋ | 2646/10000 [10:21:22<28:25:22, 13.91s/it] {'loss': 0.1341, 'learning_rate': 3.6790000000000005e-05, 'epoch': 3.46} 26%|██▋ | 2646/10000 [10:21:22<28:25:22, 13.91s/it] 26%|██▋ | 2647/10000 
[10:21:35<28:25:12, 13.91s/it] {'loss': 0.1702, 'learning_rate': 3.6785e-05, 'epoch': 3.46} 26%|██▋ | 2647/10000 [10:21:35<28:25:12, 13.91s/it] 26%|██▋ | 2648/10000 [10:21:49<28:26:57, 13.93s/it] {'loss': 0.1274, 'learning_rate': 3.6780000000000004e-05, 'epoch': 3.47} 26%|██▋ | 2648/10000 [10:21:49<28:26:57, 13.93s/it] 26%|██▋ | 2649/10000 [10:22:03<28:26:21, 13.93s/it] {'loss': 0.147, 'learning_rate': 3.6775000000000006e-05, 'epoch': 3.47} 26%|██▋ | 2649/10000 [10:22:03<28:26:21, 13.93s/it] 26%|██▋ | 2650/10000 [10:22:17<28:30:54, 13.97s/it] {'loss': 0.1679, 'learning_rate': 3.677e-05, 'epoch': 3.47} 26%|██▋ | 2650/10000 [10:22:17<28:30:54, 13.97s/it] 27%|██▋ | 2651/10000 [10:22:31<28:30:02, 13.96s/it] {'loss': 0.1719, 'learning_rate': 3.6765e-05, 'epoch': 3.47} 27%|██▋ | 2651/10000 [10:22:31<28:30:02, 13.96s/it] 27%|██▋ | 2652/10000 [10:22:45<28:27:33, 13.94s/it] {'loss': 0.1922, 'learning_rate': 3.676e-05, 'epoch': 3.47} 27%|██▋ | 2652/10000 [10:22:45<28:27:33, 13.94s/it] 27%|██▋ | 2653/10000 [10:22:59<28:29:07, 13.96s/it] {'loss': 0.1356, 'learning_rate': 3.6755e-05, 'epoch': 3.47} 27%|██▋ | 2653/10000 [10:22:59<28:29:07, 13.96s/it] 27%|██▋ | 2654/10000 [10:23:13<28:30:35, 13.97s/it] {'loss': 0.1534, 'learning_rate': 3.675e-05, 'epoch': 3.47} 27%|██▋ | 2654/10000 [10:23:13<28:30:35, 13.97s/it] 27%|██▋ | 2655/10000 [10:23:27<28:24:12, 13.92s/it] {'loss': 0.1729, 'learning_rate': 3.6745e-05, 'epoch': 3.48} 27%|██▋ | 2655/10000 [10:23:27<28:24:12, 13.92s/it] 27%|██▋ | 2656/10000 [10:23:41<28:27:26, 13.95s/it] {'loss': 0.1561, 'learning_rate': 3.6740000000000004e-05, 'epoch': 3.48} 27%|██▋ | 2656/10000 [10:23:41<28:27:26, 13.95s/it] 27%|██▋ | 2657/10000 [10:23:55<28:21:11, 13.90s/it] {'loss': 0.1503, 'learning_rate': 3.6735e-05, 'epoch': 3.48} 27%|██▋ | 2657/10000 [10:23:55<28:21:11, 13.90s/it] 27%|██▋ | 2658/10000 [10:24:09<28:21:23, 13.90s/it] {'loss': 0.1587, 'learning_rate': 3.673e-05, 'epoch': 3.48} 27%|██▋ | 2658/10000 [10:24:09<28:21:23, 13.90s/it] 27%|██▋ | 
2659/10000 [10:24:23<28:24:46, 13.93s/it] {'loss': 0.1376, 'learning_rate': 3.6725000000000005e-05, 'epoch': 3.48} 27%|██▋ | 2659/10000 [10:24:23<28:24:46, 13.93s/it] 27%|██▋ | 2660/10000 [10:24:37<28:23:26, 13.92s/it] {'loss': 0.113, 'learning_rate': 3.672000000000001e-05, 'epoch': 3.48} 27%|██▋ | 2660/10000 [10:24:37<28:23:26, 13.92s/it] 27%|██▋ | 2661/10000 [10:24:51<28:24:03, 13.93s/it] {'loss': 0.1244, 'learning_rate': 3.6714999999999997e-05, 'epoch': 3.48} 27%|██▋ | 2661/10000 [10:24:51<28:24:03, 13.93s/it] 27%|██▋ | 2662/10000 [10:25:05<28:28:29, 13.97s/it] {'loss': 0.1494, 'learning_rate': 3.671e-05, 'epoch': 3.48} 27%|██▋ | 2662/10000 [10:25:05<28:28:29, 13.97s/it] 27%|██▋ | 2663/10000 [10:25:19<28:33:25, 14.01s/it] {'loss': 0.1488, 'learning_rate': 3.6705e-05, 'epoch': 3.49} 27%|██▋ | 2663/10000 [10:25:19<28:33:25, 14.01s/it] 27%|██▋ | 2664/10000 [10:25:33<28:28:23, 13.97s/it] {'loss': 0.1562, 'learning_rate': 3.6700000000000004e-05, 'epoch': 3.49} 27%|██▋ | 2664/10000 [10:25:33<28:28:23, 13.97s/it] 27%|██▋ | 2665/10000 [10:25:46<28:23:51, 13.94s/it] {'loss': 0.155, 'learning_rate': 3.6695e-05, 'epoch': 3.49} 27%|██▋ | 2665/10000 [10:25:47<28:23:51, 13.94s/it] 27%|██▋ | 2666/10000 [10:26:01<28:26:53, 13.96s/it] {'loss': 0.1576, 'learning_rate': 3.669e-05, 'epoch': 3.49} 27%|██▋ | 2666/10000 [10:26:01<28:26:53, 13.96s/it] 27%|██▋ | 2667/10000 [10:26:14<28:23:16, 13.94s/it] {'loss': 0.1748, 'learning_rate': 3.6685000000000005e-05, 'epoch': 3.49} 27%|██▋ | 2667/10000 [10:26:14<28:23:16, 13.94s/it] 27%|██▋ | 2668/10000 [10:26:28<28:25:56, 13.96s/it] {'loss': 0.1692, 'learning_rate': 3.668e-05, 'epoch': 3.49} 27%|██▋ | 2668/10000 [10:26:28<28:25:56, 13.96s/it] 27%|██▋ | 2669/10000 [10:26:42<28:28:06, 13.98s/it] {'loss': 0.1512, 'learning_rate': 3.6675000000000004e-05, 'epoch': 3.49} 27%|██▋ | 2669/10000 [10:26:42<28:28:06, 13.98s/it] 27%|██▋ | 2670/10000 [10:26:56<28:27:49, 13.98s/it] {'loss': 0.1341, 'learning_rate': 3.6670000000000006e-05, 'epoch': 3.49} 
27%|██▋ | 2670/10000 [10:26:56<28:27:49, 13.98s/it] 27%|██▋ | 2671/10000 [10:27:10<28:26:53, 13.97s/it] {'loss': 0.1294, 'learning_rate': 3.6665e-05, 'epoch': 3.5} 27%|██▋ | 2671/10000 [10:27:10<28:26:53, 13.97s/it] 27%|██▋ | 2672/10000 [10:27:24<28:22:54, 13.94s/it] {'loss': 0.1392, 'learning_rate': 3.666e-05, 'epoch': 3.5} 27%|██▋ | 2672/10000 [10:27:24<28:22:54, 13.94s/it] 27%|██▋ | 2673/10000 [10:27:38<28:21:21, 13.93s/it] {'loss': 0.1448, 'learning_rate': 3.6655e-05, 'epoch': 3.5} 27%|██▋ | 2673/10000 [10:27:38<28:21:21, 13.93s/it] 27%|██▋ | 2674/10000 [10:27:52<28:26:09, 13.97s/it] {'loss': 0.1517, 'learning_rate': 3.665e-05, 'epoch': 3.5} 27%|██▋ | 2674/10000 [10:27:52<28:26:09, 13.97s/it] 27%|██▋ | 2675/10000 [10:28:06<28:23:06, 13.95s/it] {'loss': 0.183, 'learning_rate': 3.6645e-05, 'epoch': 3.5} 27%|██▋ | 2675/10000 [10:28:06<28:23:06, 13.95s/it] 27%|██▋ | 2676/10000 [10:28:20<28:19:42, 13.92s/it] {'loss': 0.1574, 'learning_rate': 3.664e-05, 'epoch': 3.5} 27%|██▋ | 2676/10000 [10:28:20<28:19:42, 13.92s/it] 27%|██▋ | 2677/10000 [10:28:34<28:22:05, 13.95s/it] {'loss': 0.1415, 'learning_rate': 3.6635000000000004e-05, 'epoch': 3.5} 27%|██▋ | 2677/10000 [10:28:34<28:22:05, 13.95s/it] 27%|██▋ | 2678/10000 [10:28:48<28:22:48, 13.95s/it] {'loss': 0.1669, 'learning_rate': 3.663e-05, 'epoch': 3.51} 27%|██▋ | 2678/10000 [10:28:48<28:22:48, 13.95s/it] 27%|██▋ | 2679/10000 [10:29:02<28:29:32, 14.01s/it] {'loss': 0.1644, 'learning_rate': 3.6625e-05, 'epoch': 3.51} 27%|██▋ | 2679/10000 [10:29:02<28:29:32, 14.01s/it] 27%|██▋ | 2680/10000 [10:29:16<28:27:35, 14.00s/it] {'loss': 0.194, 'learning_rate': 3.6620000000000005e-05, 'epoch': 3.51} 27%|██▋ | 2680/10000 [10:29:16<28:27:35, 14.00s/it] 27%|██▋ | 2681/10000 [10:29:30<28:26:21, 13.99s/it] {'loss': 0.1651, 'learning_rate': 3.6615e-05, 'epoch': 3.51} 27%|██▋ | 2681/10000 [10:29:30<28:26:21, 13.99s/it] 27%|██▋ | 2682/10000 [10:29:44<28:25:19, 13.98s/it] {'loss': 0.1412, 'learning_rate': 3.661e-05, 'epoch': 3.51} 27%|██▋ | 
2682/10000 [10:29:44<28:25:19, 13.98s/it] 27%|██▋ | 2683/10000 [10:29:58<28:23:25, 13.97s/it] {'loss': 0.1766, 'learning_rate': 3.6605e-05, 'epoch': 3.51} 27%|██▋ | 2683/10000 [10:29:58<28:23:25, 13.97s/it] 27%|██▋ | 2684/10000 [10:30:12<28:18:38, 13.93s/it] {'loss': 0.1516, 'learning_rate': 3.66e-05, 'epoch': 3.51} 27%|██▋ | 2684/10000 [10:30:12<28:18:38, 13.93s/it] 27%|██▋ | 2685/10000 [10:30:26<28:21:33, 13.96s/it] {'loss': 0.1508, 'learning_rate': 3.6595000000000005e-05, 'epoch': 3.51} 27%|██▋ | 2685/10000 [10:30:26<28:21:33, 13.96s/it] 27%|██▋ | 2686/10000 [10:30:40<28:22:34, 13.97s/it] {'loss': 0.1295, 'learning_rate': 3.659e-05, 'epoch': 3.52} 27%|██▋ | 2686/10000 [10:30:40<28:22:34, 13.97s/it] 27%|██▋ | 2687/10000 [10:30:54<28:23:15, 13.97s/it] {'loss': 0.1504, 'learning_rate': 3.6585e-05, 'epoch': 3.52} 27%|██▋ | 2687/10000 [10:30:54<28:23:15, 13.97s/it] 27%|██▋ | 2688/10000 [10:31:08<28:19:03, 13.94s/it] {'loss': 0.1629, 'learning_rate': 3.6580000000000006e-05, 'epoch': 3.52} 27%|██▋ | 2688/10000 [10:31:08<28:19:03, 13.94s/it] 27%|██▋ | 2689/10000 [10:31:22<28:19:27, 13.95s/it] {'loss': 0.1831, 'learning_rate': 3.6575e-05, 'epoch': 3.52} 27%|██▋ | 2689/10000 [10:31:22<28:19:27, 13.95s/it] 27%|██▋ | 2690/10000 [10:31:35<28:17:54, 13.94s/it] {'loss': 0.1694, 'learning_rate': 3.6570000000000004e-05, 'epoch': 3.52} 27%|██▋ | 2690/10000 [10:31:36<28:17:54, 13.94s/it] 27%|██▋ | 2691/10000 [10:31:49<28:18:40, 13.94s/it] {'loss': 0.1578, 'learning_rate': 3.6565e-05, 'epoch': 3.52} 27%|██▋ | 2691/10000 [10:31:50<28:18:40, 13.94s/it] 27%|██▋ | 2692/10000 [10:32:03<28:18:35, 13.95s/it] {'loss': 0.1351, 'learning_rate': 3.656e-05, 'epoch': 3.52} 27%|██▋ | 2692/10000 [10:32:03<28:18:35, 13.95s/it] 27%|██▋ | 2693/10000 [10:32:17<28:14:50, 13.92s/it] {'loss': 0.1378, 'learning_rate': 3.6555e-05, 'epoch': 3.52} 27%|██▋ | 2693/10000 [10:32:17<28:14:50, 13.92s/it] 27%|██▋ | 2694/10000 [10:32:31<28:14:22, 13.91s/it] {'loss': 0.1432, 'learning_rate': 3.655e-05, 'epoch': 
3.53} 27%|██▋ | 2694/10000 [10:32:31<28:14:22, 13.91s/it] 27%|██▋ | 2695/10000 [10:32:45<28:19:36, 13.96s/it] {'loss': 0.1431, 'learning_rate': 3.6545e-05, 'epoch': 3.53} 27%|██▋ | 2695/10000 [10:32:45<28:19:36, 13.96s/it] 27%|██▋ | 2696/10000 [10:32:59<28:16:23, 13.94s/it] {'loss': 0.1689, 'learning_rate': 3.654e-05, 'epoch': 3.53} 27%|██▋ | 2696/10000 [10:32:59<28:16:23, 13.94s/it] 27%|██▋ | 2697/10000 [10:33:13<28:14:24, 13.92s/it] {'loss': 0.154, 'learning_rate': 3.6535e-05, 'epoch': 3.53} 27%|██▋ | 2697/10000 [10:33:13<28:14:24, 13.92s/it] 27%|██▋ | 2698/10000 [10:33:27<28:18:09, 13.95s/it] {'loss': 0.1639, 'learning_rate': 3.6530000000000004e-05, 'epoch': 3.53} 27%|██▋ | 2698/10000 [10:33:27<28:18:09, 13.95s/it] 27%|██▋ | 2699/10000 [10:33:41<28:20:55, 13.98s/it] {'loss': 0.1652, 'learning_rate': 3.652500000000001e-05, 'epoch': 3.53} 27%|██▋ | 2699/10000 [10:33:41<28:20:55, 13.98s/it] 27%|██▋ | 2700/10000 [10:33:55<28:20:18, 13.98s/it] {'loss': 0.1783, 'learning_rate': 3.652e-05, 'epoch': 3.53} 27%|██▋ | 2700/10000 [10:33:55<28:20:18, 13.98s/it] 27%|██▋ | 2701/10000 [10:34:09<28:19:49, 13.97s/it] {'loss': 0.1639, 'learning_rate': 3.6515e-05, 'epoch': 3.54} 27%|██▋ | 2701/10000 [10:34:09<28:19:49, 13.97s/it] 27%|██▋ | 2702/10000 [10:34:23<28:21:49, 13.99s/it] {'loss': 0.1711, 'learning_rate': 3.651e-05, 'epoch': 3.54} 27%|██▋ | 2702/10000 [10:34:23<28:21:49, 13.99s/it] 27%|██▋ | 2703/10000 [10:34:37<28:23:23, 14.01s/it] {'loss': 0.16, 'learning_rate': 3.6505e-05, 'epoch': 3.54} 27%|██▋ | 2703/10000 [10:34:37<28:23:23, 14.01s/it] 27%|██▋ | 2704/10000 [10:34:51<28:18:15, 13.97s/it] {'loss': 0.1556, 'learning_rate': 3.65e-05, 'epoch': 3.54} 27%|██▋ | 2704/10000 [10:34:51<28:18:15, 13.97s/it] 27%|██▋ | 2705/10000 [10:35:05<28:20:24, 13.99s/it] {'loss': 0.1434, 'learning_rate': 3.6495e-05, 'epoch': 3.54} 27%|██▋ | 2705/10000 [10:35:05<28:20:24, 13.99s/it] 27%|██▋ | 2706/10000 [10:35:19<28:20:04, 13.98s/it] {'loss': 0.1437, 'learning_rate': 3.6490000000000005e-05, 
'epoch': 3.54} 27%|██▋ | 2706/10000 [10:35:19<28:20:04, 13.98s/it] 27%|██▋ | 2707/10000 [10:35:33<28:20:59, 13.99s/it] {'loss': 0.1764, 'learning_rate': 3.6485e-05, 'epoch': 3.54} 27%|██▋ | 2707/10000 [10:35:33<28:20:59, 13.99s/it] 27%|██▋ | 2708/10000 [10:35:47<28:21:30, 14.00s/it] {'loss': 0.1677, 'learning_rate': 3.648e-05, 'epoch': 3.54} 27%|██▋ | 2708/10000 [10:35:47<28:21:30, 14.00s/it] 27%|██▋ | 2709/10000 [10:36:01<28:20:22, 13.99s/it] {'loss': 0.1737, 'learning_rate': 3.6475000000000006e-05, 'epoch': 3.55} 27%|██▋ | 2709/10000 [10:36:01<28:20:22, 13.99s/it] 27%|██▋ | 2710/10000 [10:36:15<28:19:05, 13.98s/it] {'loss': 0.1666, 'learning_rate': 3.647e-05, 'epoch': 3.55} 27%|██▋ | 2710/10000 [10:36:15<28:19:05, 13.98s/it] 27%|██▋ | 2711/10000 [10:36:29<28:12:26, 13.93s/it] {'loss': 0.1523, 'learning_rate': 3.6465e-05, 'epoch': 3.55} 27%|██▋ | 2711/10000 [10:36:29<28:12:26, 13.93s/it] 27%|██▋ | 2712/10000 [10:36:43<28:17:30, 13.98s/it] {'loss': 0.1594, 'learning_rate': 3.646e-05, 'epoch': 3.55} 27%|██▋ | 2712/10000 [10:36:43<28:17:30, 13.98s/it] 27%|██▋ | 2713/10000 [10:36:57<28:21:38, 14.01s/it] {'loss': 0.1692, 'learning_rate': 3.6455e-05, 'epoch': 3.55} 27%|██▋ | 2713/10000 [10:36:57<28:21:38, 14.01s/it] 27%|██▋ | 2714/10000 [10:37:11<28:18:40, 13.99s/it] {'loss': 0.1672, 'learning_rate': 3.645e-05, 'epoch': 3.55} 27%|██▋ | 2714/10000 [10:37:11<28:18:40, 13.99s/it] 27%|██▋ | 2715/10000 [10:37:25<28:16:41, 13.97s/it] {'loss': 0.1548, 'learning_rate': 3.6445e-05, 'epoch': 3.55} 27%|██▋ | 2715/10000 [10:37:25<28:16:41, 13.97s/it] 27%|██▋ | 2716/10000 [10:37:39<28:18:45, 13.99s/it] {'loss': 0.1773, 'learning_rate': 3.6440000000000003e-05, 'epoch': 3.55} 27%|██▋ | 2716/10000 [10:37:39<28:18:45, 13.99s/it] 27%|██▋ | 2717/10000 [10:37:53<28:12:35, 13.94s/it] {'loss': 0.1423, 'learning_rate': 3.6435e-05, 'epoch': 3.56} 27%|██▋ | 2717/10000 [10:37:53<28:12:35, 13.94s/it] 27%|██▋ | 2718/10000 [10:38:07<28:12:30, 13.95s/it] {'loss': 0.1358, 'learning_rate': 3.643e-05, 
'epoch': 3.56} 27%|██▋ | 2718/10000 [10:38:07<28:12:30, 13.95s/it] 27%|██▋ | 2719/10000 [10:38:21<28:15:33, 13.97s/it] {'loss': 0.1536, 'learning_rate': 3.6425000000000004e-05, 'epoch': 3.56} 27%|██▋ | 2719/10000 [10:38:21<28:15:33, 13.97s/it] 27%|██▋ | 2720/10000 [10:38:35<28:17:56, 13.99s/it] {'loss': 0.173, 'learning_rate': 3.642000000000001e-05, 'epoch': 3.56} 27%|██▋ | 2720/10000 [10:38:35<28:17:56, 13.99s/it] 27%|██▋ | 2721/10000 [10:38:48<28:10:27, 13.93s/it] {'loss': 0.1279, 'learning_rate': 3.6414999999999996e-05, 'epoch': 3.56} 27%|██▋ | 2721/10000 [10:38:49<28:10:27, 13.93s/it] 27%|██▋ | 2722/10000 [10:39:02<28:05:14, 13.89s/it] {'loss': 0.1263, 'learning_rate': 3.641e-05, 'epoch': 3.56} 27%|██▋ | 2722/10000 [10:39:02<28:05:14, 13.89s/it] 27%|██▋ | 2723/10000 [10:39:16<28:05:23, 13.90s/it] {'loss': 0.1531, 'learning_rate': 3.6405e-05, 'epoch': 3.56} 27%|██▋ | 2723/10000 [10:39:16<28:05:23, 13.90s/it] 27%|██▋ | 2724/10000 [10:39:30<28:09:35, 13.93s/it] {'loss': 0.1518, 'learning_rate': 3.6400000000000004e-05, 'epoch': 3.57} 27%|██▋ | 2724/10000 [10:39:30<28:09:35, 13.93s/it] 27%|██▋ | 2725/10000 [10:39:44<28:13:11, 13.96s/it] {'loss': 0.1705, 'learning_rate': 3.6395e-05, 'epoch': 3.57} 27%|██▋ | 2725/10000 [10:39:44<28:13:11, 13.96s/it] 27%|██▋ | 2726/10000 [10:39:58<28:14:14, 13.98s/it] {'loss': 0.1383, 'learning_rate': 3.639e-05, 'epoch': 3.57} 27%|██▋ | 2726/10000 [10:39:58<28:14:14, 13.98s/it] 27%|██▋ | 2727/10000 [10:40:12<28:13:45, 13.97s/it] {'loss': 0.1673, 'learning_rate': 3.6385000000000005e-05, 'epoch': 3.57} 27%|██▋ | 2727/10000 [10:40:12<28:13:45, 13.97s/it] 27%|██▋ | 2728/10000 [10:40:26<28:08:42, 13.93s/it] {'loss': 0.1588, 'learning_rate': 3.638e-05, 'epoch': 3.57} 27%|██▋ | 2728/10000 [10:40:26<28:08:42, 13.93s/it] 27%|██▋ | 2729/10000 [10:40:40<28:10:26, 13.95s/it] {'loss': 0.1721, 'learning_rate': 3.6375e-05, 'epoch': 3.57} 27%|██▋ | 2729/10000 [10:40:40<28:10:26, 13.95s/it] 27%|██▋ | 2730/10000 [10:40:54<28:10:27, 13.95s/it] {'loss': 
0.1505, 'learning_rate': 3.6370000000000006e-05, 'epoch': 3.57} 27%|██▋ | 2730/10000 [10:40:54<28:10:27, 13.95s/it] 27%|██▋ | 2731/10000 [10:41:08<28:14:44, 13.99s/it] {'loss': 0.1809, 'learning_rate': 3.6365e-05, 'epoch': 3.57} 27%|██▋ | 2731/10000 [10:41:08<28:14:44, 13.99s/it] 27%|██▋ | 2732/10000 [10:41:22<28:18:33, 14.02s/it] {'loss': 0.1494, 'learning_rate': 3.636e-05, 'epoch': 3.58} 27%|██▋ | 2732/10000 [10:41:22<28:18:33, 14.02s/it] 27%|██▋ | 2733/10000 [10:41:36<28:13:35, 13.98s/it] {'loss': 0.1536, 'learning_rate': 3.6355e-05, 'epoch': 3.58} 27%|██▋ | 2733/10000 [10:41:36<28:13:35, 13.98s/it] 27%|██▋ | 2734/10000 [10:41:50<28:09:21, 13.95s/it] {'loss': 0.1632, 'learning_rate': 3.635e-05, 'epoch': 3.58} 27%|██▋ | 2734/10000 [10:41:50<28:09:21, 13.95s/it] 27%|██▋ | 2735/10000 [10:42:04<28:10:09, 13.96s/it] {'loss': 0.1899, 'learning_rate': 3.6345e-05, 'epoch': 3.58} 27%|██▋ | 2735/10000 [10:42:04<28:10:09, 13.96s/it] 27%|██▋ | 2736/10000 [10:42:18<28:16:24, 14.01s/it] {'loss': 0.1528, 'learning_rate': 3.634e-05, 'epoch': 3.58} 27%|██▋ | 2736/10000 [10:42:18<28:16:24, 14.01s/it] 27%|██▋ | 2737/10000 [10:42:32<28:13:16, 13.99s/it] {'loss': 0.1637, 'learning_rate': 3.6335000000000004e-05, 'epoch': 3.58} 27%|██▋ | 2737/10000 [10:42:32<28:13:16, 13.99s/it] 27%|██▋ | 2738/10000 [10:42:46<28:11:24, 13.97s/it] {'loss': 0.1817, 'learning_rate': 3.6330000000000006e-05, 'epoch': 3.58} 27%|██▋ | 2738/10000 [10:42:46<28:11:24, 13.97s/it] 27%|██▋ | 2739/10000 [10:43:00<28:09:53, 13.96s/it] {'loss': 0.1984, 'learning_rate': 3.6325e-05, 'epoch': 3.59} 27%|██▋ | 2739/10000 [10:43:00<28:09:53, 13.96s/it] 27%|██▋ | 2740/10000 [10:43:14<28:07:46, 13.95s/it] {'loss': 0.1697, 'learning_rate': 3.6320000000000005e-05, 'epoch': 3.59} 27%|██▋ | 2740/10000 [10:43:14<28:07:46, 13.95s/it] 27%|██▋ | 2741/10000 [10:43:28<28:06:04, 13.94s/it] {'loss': 0.1661, 'learning_rate': 3.6315e-05, 'epoch': 3.59} 27%|██▋ | 2741/10000 [10:43:28<28:06:04, 13.94s/it] 27%|██▋ | 2742/10000 
[10:43:42<28:03:37, 13.92s/it] {'loss': 0.151, 'learning_rate': 3.6309999999999996e-05, 'epoch': 3.59} 27%|██▋ | 2742/10000 [10:43:42<28:03:37, 13.92s/it] 27%|██▋ | 2743/10000 [10:43:56<28:05:14, 13.93s/it] {'loss': 0.1525, 'learning_rate': 3.6305e-05, 'epoch': 3.59} 27%|██▋ | 2743/10000 [10:43:56<28:05:14, 13.93s/it] 27%|██▋ | 2744/10000 [10:44:09<28:02:19, 13.91s/it] {'loss': 0.1356, 'learning_rate': 3.63e-05, 'epoch': 3.59} 27%|██▋ | 2744/10000 [10:44:09<28:02:19, 13.91s/it] 27%|██▋ | 2745/10000 [10:44:23<28:04:44, 13.93s/it] {'loss': 0.1725, 'learning_rate': 3.6295000000000004e-05, 'epoch': 3.59} 27%|██▋ | 2745/10000 [10:44:23<28:04:44, 13.93s/it] 27%|██▋ | 2746/10000 [10:44:37<28:02:43, 13.92s/it] {'loss': 0.1541, 'learning_rate': 3.629e-05, 'epoch': 3.59} 27%|██▋ | 2746/10000 [10:44:37<28:02:43, 13.92s/it] 27%|██▋ | 2747/10000 [10:44:51<28:00:22, 13.90s/it] {'loss': 0.1395, 'learning_rate': 3.6285e-05, 'epoch': 3.6} 27%|██▋ | 2747/10000 [10:44:51<28:00:22, 13.90s/it] 27%|██▋ | 2748/10000 [10:45:05<28:01:34, 13.91s/it] {'loss': 0.1696, 'learning_rate': 3.6280000000000005e-05, 'epoch': 3.6} 27%|██▋ | 2748/10000 [10:45:05<28:01:34, 13.91s/it] 27%|██▋ | 2749/10000 [10:45:19<28:06:58, 13.96s/it] {'loss': 0.1731, 'learning_rate': 3.6275e-05, 'epoch': 3.6} 27%|██▋ | 2749/10000 [10:45:19<28:06:58, 13.96s/it] 28%|██▊ | 2750/10000 [10:45:33<28:02:50, 13.93s/it] {'loss': 0.1759, 'learning_rate': 3.6270000000000003e-05, 'epoch': 3.6} 28%|██▊ | 2750/10000 [10:45:33<28:02:50, 13.93s/it] 28%|██▊ | 2751/10000 [10:45:47<28:03:34, 13.93s/it] {'loss': 0.1426, 'learning_rate': 3.6265e-05, 'epoch': 3.6} 28%|██▊ | 2751/10000 [10:45:47<28:03:34, 13.93s/it] 28%|██▊ | 2752/10000 [10:46:01<28:03:40, 13.94s/it] {'loss': 0.1652, 'learning_rate': 3.626e-05, 'epoch': 3.6} 28%|██▊ | 2752/10000 [10:46:01<28:03:40, 13.94s/it] 28%|██▊ | 2753/10000 [10:46:15<28:00:26, 13.91s/it] {'loss': 0.1455, 'learning_rate': 3.6255e-05, 'epoch': 3.6} 28%|██▊ | 2753/10000 [10:46:15<28:00:26, 13.91s/it] 
28%|██▊ | 2754/10000 [10:46:29<28:02:48, 13.93s/it] {'loss': 0.1495, 'learning_rate': 3.625e-05, 'epoch': 3.6} 28%|██▊ | 2754/10000 [10:46:29<28:02:48, 13.93s/it] 28%|██▊ | 2755/10000 [10:46:43<28:08:59, 13.99s/it] {'loss': 0.1568, 'learning_rate': 3.6245e-05, 'epoch': 3.61} 28%|██▊ | 2755/10000 [10:46:43<28:08:59, 13.99s/it] 28%|██▊ | 2756/10000 [10:46:57<28:08:56, 13.99s/it] {'loss': 0.1499, 'learning_rate': 3.624e-05, 'epoch': 3.61} 28%|██▊ | 2756/10000 [10:46:57<28:08:56, 13.99s/it] 28%|██▊ | 2757/10000 [10:47:11<28:08:37, 13.99s/it] {'loss': 0.1458, 'learning_rate': 3.6235e-05, 'epoch': 3.61} 28%|██▊ | 2757/10000 [10:47:11<28:08:37, 13.99s/it] 28%|██▊ | 2758/10000 [10:47:25<28:08:57, 13.99s/it] {'loss': 0.182, 'learning_rate': 3.6230000000000004e-05, 'epoch': 3.61} 28%|██▊ | 2758/10000 [10:47:25<28:08:57, 13.99s/it] 28%|██▊ | 2759/10000 [10:47:39<28:13:48, 14.04s/it] {'loss': 0.1533, 'learning_rate': 3.6225000000000006e-05, 'epoch': 3.61} 28%|██▊ | 2759/10000 [10:47:39<28:13:48, 14.04s/it] 28%|██▊ | 2760/10000 [10:47:53<28:09:07, 14.00s/it] {'loss': 0.1412, 'learning_rate': 3.622e-05, 'epoch': 3.61} 28%|██▊ | 2760/10000 [10:47:53<28:09:07, 14.00s/it] 28%|██▊ | 2761/10000 [10:48:07<28:06:49, 13.98s/it] {'loss': 0.1684, 'learning_rate': 3.6215000000000005e-05, 'epoch': 3.61} 28%|██▊ | 2761/10000 [10:48:07<28:06:49, 13.98s/it] 28%|██▊ | 2762/10000 [10:48:21<28:03:47, 13.96s/it] {'loss': 0.1298, 'learning_rate': 3.621e-05, 'epoch': 3.62} 28%|██▊ | 2762/10000 [10:48:21<28:03:47, 13.96s/it] 28%|██▊ | 2763/10000 [10:48:35<28:05:17, 13.97s/it] {'loss': 0.1909, 'learning_rate': 3.6205e-05, 'epoch': 3.62} 28%|██▊ | 2763/10000 [10:48:35<28:05:17, 13.97s/it] 28%|██▊ | 2764/10000 [10:48:49<28:03:17, 13.96s/it] {'loss': 0.1893, 'learning_rate': 3.62e-05, 'epoch': 3.62} 28%|██▊ | 2764/10000 [10:48:49<28:03:17, 13.96s/it] 28%|██▊ | 2765/10000 [10:49:02<27:56:15, 13.90s/it] {'loss': 0.155, 'learning_rate': 3.6195e-05, 'epoch': 3.62} 28%|██▊ | 2765/10000 [10:49:02<27:56:15, 
13.90s/it] 28%|██▊ | 2766/10000 [10:49:16<27:50:34, 13.86s/it] {'loss': 0.1579, 'learning_rate': 3.6190000000000004e-05, 'epoch': 3.62} 28%|██▊ | 2766/10000 [10:49:16<27:50:34, 13.86s/it] 28%|██▊ | 2767/10000 [10:49:30<27:52:08, 13.87s/it] {'loss': 0.1725, 'learning_rate': 3.6185e-05, 'epoch': 3.62} 28%|██▊ | 2767/10000 [10:49:30<27:52:08, 13.87s/it] 28%|██▊ | 2768/10000 [10:49:44<27:53:05, 13.88s/it] {'loss': 0.1741, 'learning_rate': 3.618e-05, 'epoch': 3.62} 28%|██▊ | 2768/10000 [10:49:44<27:53:05, 13.88s/it] 28%|██▊ | 2769/10000 [10:49:58<27:51:49, 13.87s/it] {'loss': 0.1675, 'learning_rate': 3.6175000000000005e-05, 'epoch': 3.62} 28%|██▊ | 2769/10000 [10:49:58<27:51:49, 13.87s/it] 28%|██▊ | 2770/10000 [10:50:12<27:53:38, 13.89s/it] {'loss': 0.1573, 'learning_rate': 3.617e-05, 'epoch': 3.63} 28%|██▊ | 2770/10000 [10:50:12<27:53:38, 13.89s/it] 28%|██▊ | 2771/10000 [10:50:26<27:52:00, 13.88s/it] {'loss': 0.1391, 'learning_rate': 3.6165000000000004e-05, 'epoch': 3.63} 28%|██▊ | 2771/10000 [10:50:26<27:52:00, 13.88s/it] 28%|██▊ | 2772/10000 [10:50:40<27:58:38, 13.93s/it] {'loss': 0.1918, 'learning_rate': 3.616e-05, 'epoch': 3.63} 28%|██▊ | 2772/10000 [10:50:40<27:58:38, 13.93s/it] 28%|██▊ | 2773/10000 [10:50:54<27:57:21, 13.93s/it] {'loss': 0.1613, 'learning_rate': 3.6155e-05, 'epoch': 3.63} 28%|██▊ | 2773/10000 [10:50:54<27:57:21, 13.93s/it] 28%|██▊ | 2774/10000 [10:51:07<27:55:16, 13.91s/it] {'loss': 0.1531, 'learning_rate': 3.615e-05, 'epoch': 3.63} 28%|██▊ | 2774/10000 [10:51:07<27:55:16, 13.91s/it] 28%|██▊ | 2775/10000 [10:51:21<27:53:50, 13.90s/it] {'loss': 0.1809, 'learning_rate': 3.6145e-05, 'epoch': 3.63} 28%|██▊ | 2775/10000 [10:51:21<27:53:50, 13.90s/it] 28%|██▊ | 2776/10000 [10:51:35<27:57:53, 13.94s/it] {'loss': 0.1678, 'learning_rate': 3.614e-05, 'epoch': 3.63} 28%|██▊ | 2776/10000 [10:51:35<27:57:53, 13.94s/it] 28%|██▊ | 2777/10000 [10:51:49<27:59:34, 13.95s/it] {'loss': 0.1515, 'learning_rate': 3.6135000000000006e-05, 'epoch': 3.63} 28%|██▊ | 
2777/10000 [10:51:49<27:59:34, 13.95s/it] 28%|██▊ | 2778/10000 [10:52:03<27:58:07, 13.94s/it] {'loss': 0.157, 'learning_rate': 3.613e-05, 'epoch': 3.64} 28%|██▊ | 2778/10000 [10:52:03<27:58:07, 13.94s/it] 28%|██▊ | 2779/10000 [10:52:17<27:55:44, 13.92s/it] {'loss': 0.1513, 'learning_rate': 3.6125000000000004e-05, 'epoch': 3.64} 28%|██▊ | 2779/10000 [10:52:17<27:55:44, 13.92s/it] 28%|██▊ | 2780/10000 [10:52:31<27:55:36, 13.92s/it] {'loss': 0.2127, 'learning_rate': 3.6120000000000007e-05, 'epoch': 3.64} 28%|██▊ | 2780/10000 [10:52:31<27:55:36, 13.92s/it] 28%|██▊ | 2781/10000 [10:52:45<27:55:12, 13.92s/it] {'loss': 0.1424, 'learning_rate': 3.6115e-05, 'epoch': 3.64} 28%|██▊ | 2781/10000 [10:52:45<27:55:12, 13.92s/it] 28%|██▊ | 2782/10000 [10:52:59<27:57:09, 13.94s/it] {'loss': 0.1602, 'learning_rate': 3.611e-05, 'epoch': 3.64} 28%|██▊ | 2782/10000 [10:52:59<27:57:09, 13.94s/it] 28%|██▊ | 2783/10000 [10:53:13<27:54:29, 13.92s/it] {'loss': 0.1779, 'learning_rate': 3.6105e-05, 'epoch': 3.64} 28%|██▊ | 2783/10000 [10:53:13<27:54:29, 13.92s/it] 28%|██▊ | 2784/10000 [10:53:27<27:55:35, 13.93s/it] {'loss': 0.1549, 'learning_rate': 3.61e-05, 'epoch': 3.64} 28%|██▊ | 2784/10000 [10:53:27<27:55:35, 13.93s/it] 28%|██▊ | 2785/10000 [10:53:41<27:53:10, 13.91s/it] {'loss': 0.1627, 'learning_rate': 3.6095e-05, 'epoch': 3.65} 28%|██▊ | 2785/10000 [10:53:41<27:53:10, 13.91s/it] 28%|██▊ | 2786/10000 [10:53:55<27:56:02, 13.94s/it] {'loss': 0.181, 'learning_rate': 3.609e-05, 'epoch': 3.65} 28%|██▊ | 2786/10000 [10:53:55<27:56:02, 13.94s/it] 28%|██▊ | 2787/10000 [10:54:09<27:55:14, 13.94s/it] {'loss': 0.1744, 'learning_rate': 3.6085000000000004e-05, 'epoch': 3.65} 28%|██▊ | 2787/10000 [10:54:09<27:55:14, 13.94s/it] 28%|██▊ | 2788/10000 [10:54:23<27:56:54, 13.95s/it] {'loss': 0.1652, 'learning_rate': 3.608e-05, 'epoch': 3.65} 28%|██▊ | 2788/10000 [10:54:23<27:56:54, 13.95s/it] 28%|██▊ | 2789/10000 [10:54:37<27:57:33, 13.96s/it] {'loss': 0.1758, 'learning_rate': 3.6075e-05, 'epoch': 3.65} 
28%|██▊ | 2789/10000 [10:54:37<27:57:33, 13.96s/it] 28%|██▊ | 2790/10000 [10:54:50<27:56:06, 13.95s/it] {'loss': 0.1647, 'learning_rate': 3.6070000000000005e-05, 'epoch': 3.65} 28%|██▊ | 2790/10000 [10:54:51<27:56:06, 13.95s/it] 28%|██▊ | 2791/10000 [10:55:04<27:50:58, 13.91s/it] {'loss': 0.1297, 'learning_rate': 3.6065e-05, 'epoch': 3.65} 28%|██▊ | 2791/10000 [10:55:04<27:50:58, 13.91s/it] 28%|██▊ | 2792/10000 [10:55:18<27:51:49, 13.92s/it] {'loss': 0.1536, 'learning_rate': 3.606e-05, 'epoch': 3.65} 28%|██▊ | 2792/10000 [10:55:18<27:51:49, 13.92s/it] 28%|██▊ | 2793/10000 [10:55:32<27:49:56, 13.90s/it] {'loss': 0.1872, 'learning_rate': 3.6055e-05, 'epoch': 3.66} 28%|██▊ | 2793/10000 [10:55:32<27:49:56, 13.90s/it] 28%|██▊ | 2794/10000 [10:55:46<27:55:22, 13.95s/it] {'loss': 0.1623, 'learning_rate': 3.605e-05, 'epoch': 3.66} 28%|██▊ | 2794/10000 [10:55:46<27:55:22, 13.95s/it] 28%|██▊ | 2795/10000 [10:56:00<27:50:54, 13.91s/it] {'loss': 0.1667, 'learning_rate': 3.6045e-05, 'epoch': 3.66} 28%|██▊ | 2795/10000 [10:56:00<27:50:54, 13.91s/it] 28%|██▊ | 2796/10000 [10:56:14<27:53:35, 13.94s/it] {'loss': 0.1716, 'learning_rate': 3.604e-05, 'epoch': 3.66} 28%|██▊ | 2796/10000 [10:56:14<27:53:35, 13.94s/it] 28%|██▊ | 2797/10000 [10:56:28<27:48:56, 13.90s/it] {'loss': 0.1479, 'learning_rate': 3.6035e-05, 'epoch': 3.66} 28%|██▊ | 2797/10000 [10:56:28<27:48:56, 13.90s/it] 28%|██▊ | 2798/10000 [10:56:42<27:50:41, 13.92s/it] {'loss': 0.1522, 'learning_rate': 3.6030000000000006e-05, 'epoch': 3.66} 28%|██▊ | 2798/10000 [10:56:42<27:50:41, 13.92s/it] 28%|██▊ | 2799/10000 [10:56:56<27:51:30, 13.93s/it] {'loss': 0.2088, 'learning_rate': 3.6025e-05, 'epoch': 3.66} 28%|██▊ | 2799/10000 [10:56:56<27:51:30, 13.93s/it] 28%|██▊ | 2800/10000 [10:57:10<27:56:47, 13.97s/it] {'loss': 0.1484, 'learning_rate': 3.6020000000000004e-05, 'epoch': 3.66} 28%|██▊ | 2800/10000 [10:57:10<27:56:47, 13.97s/it] 28%|██▊ | 2801/10000 [10:57:24<27:56:53, 13.98s/it] {'loss': 0.159, 'learning_rate': 
3.601500000000001e-05, 'epoch': 3.67} 28%|██▊ | 2801/10000 [10:57:24<27:56:53, 13.98s/it] 28%|██▊ | 2802/10000 [10:57:38<28:02:14, 14.02s/it] {'loss': 0.1157, 'learning_rate': 3.601e-05, 'epoch': 3.67} 28%|██▊ | 2802/10000 [10:57:38<28:02:14, 14.02s/it] 28%|██▊ | 2803/10000 [10:57:52<28:01:35, 14.02s/it] {'loss': 0.1801, 'learning_rate': 3.6005e-05, 'epoch': 3.67} 28%|██▊ | 2803/10000 [10:57:52<28:01:35, 14.02s/it] 28%|██▊ | 2804/10000 [10:58:06<27:54:15, 13.96s/it] {'loss': 0.1871, 'learning_rate': 3.6e-05, 'epoch': 3.67} 28%|██▊ | 2804/10000 [10:58:06<27:54:15, 13.96s/it] 28%|██▊ | 2805/10000 [10:58:20<27:51:17, 13.94s/it] {'loss': 0.1419, 'learning_rate': 3.5995000000000004e-05, 'epoch': 3.67} 28%|██▊ | 2805/10000 [10:58:20<27:51:17, 13.94s/it] 28%|██▊ | 2806/10000 [10:58:33<27:48:27, 13.92s/it] {'loss': 0.1341, 'learning_rate': 3.599e-05, 'epoch': 3.67} 28%|██▊ | 2806/10000 [10:58:34<27:48:27, 13.92s/it] 28%|██▊ | 2807/10000 [10:58:47<27:49:24, 13.93s/it] {'loss': 0.1656, 'learning_rate': 3.5985e-05, 'epoch': 3.67} 28%|██▊ | 2807/10000 [10:58:47<27:49:24, 13.93s/it] 28%|██▊ | 2808/10000 [10:59:01<27:48:50, 13.92s/it] {'loss': 0.1722, 'learning_rate': 3.5980000000000004e-05, 'epoch': 3.68} 28%|██▊ | 2808/10000 [10:59:01<27:48:50, 13.92s/it] 28%|██▊ | 2809/10000 [10:59:15<27:56:47, 13.99s/it] {'loss': 0.1548, 'learning_rate': 3.5975e-05, 'epoch': 3.68} 28%|██▊ | 2809/10000 [10:59:16<27:56:47, 13.99s/it] 28%|██▊ | 2810/10000 [10:59:29<27:50:39, 13.94s/it] {'loss': 0.1638, 'learning_rate': 3.597e-05, 'epoch': 3.68} 28%|██▊ | 2810/10000 [10:59:29<27:50:39, 13.94s/it] 28%|██▊ | 2811/10000 [10:59:43<27:55:37, 13.98s/it] {'loss': 0.1704, 'learning_rate': 3.5965000000000005e-05, 'epoch': 3.68} 28%|██▊ | 2811/10000 [10:59:43<27:55:37, 13.98s/it] 28%|██▊ | 2812/10000 [10:59:57<27:51:54, 13.96s/it] {'loss': 0.1538, 'learning_rate': 3.596e-05, 'epoch': 3.68} 28%|██▊ | 2812/10000 [10:59:57<27:51:54, 13.96s/it] 28%|██▊ | 2813/10000 [11:00:11<27:51:36, 13.96s/it] {'loss': 
0.1431, 'learning_rate': 3.5955e-05, 'epoch': 3.68} 28%|██▊ | 2813/10000 [11:00:11<27:51:36, 13.96s/it] 28%|██▊ | 2814/10000 [11:00:25<27:47:53, 13.93s/it] {'loss': 0.1435, 'learning_rate': 3.595e-05, 'epoch': 3.68} 28%|██▊ | 2814/10000 [11:00:25<27:47:53, 13.93s/it] 28%|██▊ | 2815/10000 [11:00:39<27:55:28, 13.99s/it] {'loss': 0.1781, 'learning_rate': 3.5945e-05, 'epoch': 3.68} 28%|██▊ | 2815/10000 [11:00:39<27:55:28, 13.99s/it] 28%|██▊ | 2816/10000 [11:00:53<27:53:58, 13.98s/it] {'loss': 0.1915, 'learning_rate': 3.594e-05, 'epoch': 3.69} 28%|██▊ | 2816/10000 [11:00:53<27:53:58, 13.98s/it] 28%|██▊ | 2817/10000 [11:01:07<27:57:26, 14.01s/it] {'loss': 0.1633, 'learning_rate': 3.5935e-05, 'epoch': 3.69} 28%|██▊ | 2817/10000 [11:01:07<27:57:26, 14.01s/it] 28%|██▊ | 2818/10000 [11:01:21<27:50:28, 13.96s/it] {'loss': 0.2085, 'learning_rate': 3.593e-05, 'epoch': 3.69} 28%|██▊ | 2818/10000 [11:01:21<27:50:28, 13.96s/it] 28%|██▊ | 2819/10000 [11:01:35<27:47:57, 13.94s/it] {'loss': 0.1368, 'learning_rate': 3.5925000000000006e-05, 'epoch': 3.69} 28%|██▊ | 2819/10000 [11:01:35<27:47:57, 13.94s/it] 28%|██▊ | 2820/10000 [11:01:49<27:44:41, 13.91s/it] {'loss': 0.1281, 'learning_rate': 3.592e-05, 'epoch': 3.69} 28%|██▊ | 2820/10000 [11:01:49<27:44:41, 13.91s/it] 28%|██▊ | 2821/10000 [11:02:03<27:42:37, 13.90s/it] {'loss': 0.1705, 'learning_rate': 3.5915000000000004e-05, 'epoch': 3.69} 28%|██▊ | 2821/10000 [11:02:03<27:42:37, 13.90s/it] 28%|██▊ | 2822/10000 [11:02:17<27:40:09, 13.88s/it] {'loss': 0.1673, 'learning_rate': 3.591e-05, 'epoch': 3.69} 28%|██▊ | 2822/10000 [11:02:17<27:40:09, 13.88s/it] 28%|██▊ | 2823/10000 [11:02:31<27:46:00, 13.93s/it] {'loss': 0.1735, 'learning_rate': 3.5905e-05, 'epoch': 3.7} 28%|██▊ | 2823/10000 [11:02:31<27:46:00, 13.93s/it] 28%|██▊ | 2824/10000 [11:02:45<27:47:21, 13.94s/it] {'loss': 0.1456, 'learning_rate': 3.59e-05, 'epoch': 3.7} 28%|██▊ | 2824/10000 [11:02:45<27:47:21, 13.94s/it] 28%|██▊ | 2825/10000 [11:02:58<27:46:13, 13.93s/it] {'loss': 
0.1482, 'learning_rate': 3.5895e-05, 'epoch': 3.7} 28%|██▊ | 2825/10000 [11:02:59<27:46:13, 13.93s/it] 28%|██▊ | 2826/10000 [11:03:12<27:45:30, 13.93s/it] {'loss': 0.1875, 'learning_rate': 3.5890000000000004e-05, 'epoch': 3.7} 28%|██▊ | 2826/10000 [11:03:12<27:45:30, 13.93s/it] 28%|██▊ | 2827/10000 [11:03:26<27:47:22, 13.95s/it] {'loss': 0.1894, 'learning_rate': 3.5885e-05, 'epoch': 3.7} 28%|██▊ | 2827/10000 [11:03:26<27:47:22, 13.95s/it] 28%|██▊ | 2828/10000 [11:03:40<27:46:46, 13.94s/it] {'loss': 0.1788, 'learning_rate': 3.588e-05, 'epoch': 3.7} 28%|██▊ | 2828/10000 [11:03:40<27:46:46, 13.94s/it] 28%|██▊ | 2829/10000 [11:03:54<27:42:41, 13.91s/it] {'loss': 0.1648, 'learning_rate': 3.5875000000000005e-05, 'epoch': 3.7} 28%|██▊ | 2829/10000 [11:03:54<27:42:41, 13.91s/it] 28%|██▊ | 2830/10000 [11:04:08<27:42:07, 13.91s/it] {'loss': 0.1495, 'learning_rate': 3.587e-05, 'epoch': 3.7} 28%|██▊ | 2830/10000 [11:04:08<27:42:07, 13.91s/it] 28%|██▊ | 2831/10000 [11:04:22<27:44:53, 13.93s/it] {'loss': 0.1619, 'learning_rate': 3.5865e-05, 'epoch': 3.71} 28%|██▊ | 2831/10000 [11:04:22<27:44:53, 13.93s/it] 28%|██▊ | 2832/10000 [11:04:36<27:43:06, 13.92s/it] {'loss': 0.1317, 'learning_rate': 3.586e-05, 'epoch': 3.71} 28%|██▊ | 2832/10000 [11:04:36<27:43:06, 13.92s/it] 28%|██▊ | 2833/10000 [11:04:50<27:44:00, 13.93s/it] {'loss': 0.1331, 'learning_rate': 3.5855e-05, 'epoch': 3.71} 28%|██▊ | 2833/10000 [11:04:50<27:44:00, 13.93s/it] 28%|██▊ | 2834/10000 [11:05:04<27:49:09, 13.98s/it] {'loss': 0.1572, 'learning_rate': 3.585e-05, 'epoch': 3.71} 28%|██▊ | 2834/10000 [11:05:04<27:49:09, 13.98s/it] 28%|██▊ | 2835/10000 [11:05:18<27:45:53, 13.95s/it] {'loss': 0.1805, 'learning_rate': 3.5845e-05, 'epoch': 3.71} 28%|██▊ | 2835/10000 [11:05:18<27:45:53, 13.95s/it] 28%|██▊ | 2836/10000 [11:05:32<27:49:16, 13.98s/it] {'loss': 0.139, 'learning_rate': 3.584e-05, 'epoch': 3.71} 28%|██▊ | 2836/10000 [11:05:32<27:49:16, 13.98s/it] 28%|██▊ | 2837/10000 [11:05:46<27:52:02, 14.01s/it] {'loss': 0.2029, 
'learning_rate': 3.5835000000000005e-05, 'epoch': 3.71} 28%|██▊ | 2837/10000 [11:05:46<27:52:02, 14.01s/it] 28%|██▊ | 2838/10000 [11:06:00<27:49:11, 13.98s/it] {'loss': 0.1246, 'learning_rate': 3.583e-05, 'epoch': 3.71} 28%|██▊ | 2838/10000 [11:06:00<27:49:11, 13.98s/it] 28%|██▊ | 2839/10000 [11:06:14<27:49:06, 13.99s/it] {'loss': 0.1788, 'learning_rate': 3.5825000000000003e-05, 'epoch': 3.72} 28%|██▊ | 2839/10000 [11:06:14<27:49:06, 13.99s/it] 28%|██▊ | 2840/10000 [11:06:28<27:48:51, 13.98s/it] {'loss': 0.1509, 'learning_rate': 3.5820000000000006e-05, 'epoch': 3.72} 28%|██▊ | 2840/10000 [11:06:28<27:48:51, 13.98s/it] 28%|██▊ | 2841/10000 [11:06:42<27:48:00, 13.98s/it] {'loss': 0.1648, 'learning_rate': 3.5815e-05, 'epoch': 3.72} 28%|██▊ | 2841/10000 [11:06:42<27:48:00, 13.98s/it] 28%|██▊ | 2842/10000 [11:06:56<27:46:03, 13.97s/it] {'loss': 0.1717, 'learning_rate': 3.581e-05, 'epoch': 3.72} 28%|██▊ | 2842/10000 [11:06:56<27:46:03, 13.97s/it] 28%|██▊ | 2843/10000 [11:07:10<27:45:47, 13.96s/it] {'loss': 0.1819, 'learning_rate': 3.5805e-05, 'epoch': 3.72} 28%|██▊ | 2843/10000 [11:07:10<27:45:47, 13.96s/it] 28%|██▊ | 2844/10000 [11:07:24<27:44:13, 13.95s/it] {'loss': 0.1464, 'learning_rate': 3.58e-05, 'epoch': 3.72} 28%|██▊ | 2844/10000 [11:07:24<27:44:13, 13.95s/it] 28%|██▊ | 2845/10000 [11:07:38<27:42:55, 13.94s/it] {'loss': 0.1706, 'learning_rate': 3.5795e-05, 'epoch': 3.72} 28%|██▊ | 2845/10000 [11:07:38<27:42:55, 13.94s/it] 28%|██▊ | 2846/10000 [11:07:51<27:38:13, 13.91s/it] {'loss': 0.1839, 'learning_rate': 3.579e-05, 'epoch': 3.73} 28%|██▊ | 2846/10000 [11:07:51<27:38:13, 13.91s/it] 28%|██▊ | 2847/10000 [11:08:05<27:38:30, 13.91s/it] {'loss': 0.1731, 'learning_rate': 3.5785000000000004e-05, 'epoch': 3.73} 28%|██▊ | 2847/10000 [11:08:05<27:38:30, 13.91s/it] 28%|██▊ | 2848/10000 [11:08:19<27:38:39, 13.91s/it] {'loss': 0.1665, 'learning_rate': 3.578e-05, 'epoch': 3.73} 28%|██▊ | 2848/10000 [11:08:19<27:38:39, 13.91s/it] 28%|██▊ | 2849/10000 [11:08:33<27:39:15, 
13.92s/it] {'loss': 0.1571, 'learning_rate': 3.5775e-05, 'epoch': 3.73} 28%|██▊ | 2849/10000 [11:08:33<27:39:15, 13.92s/it] 28%|██▊ | 2850/10000 [11:08:47<27:37:35, 13.91s/it] {'loss': 0.1501, 'learning_rate': 3.5770000000000005e-05, 'epoch': 3.73} 28%|██▊ | 2850/10000 [11:08:47<27:37:35, 13.91s/it] 29%|██▊ | 2851/10000 [11:09:01<27:40:52, 13.94s/it] {'loss': 0.1519, 'learning_rate': 3.576500000000001e-05, 'epoch': 3.73} 29%|██▊ | 2851/10000 [11:09:01<27:40:52, 13.94s/it] 29%|██▊ | 2852/10000 [11:09:15<27:39:26, 13.93s/it] {'loss': 0.1304, 'learning_rate': 3.5759999999999996e-05, 'epoch': 3.73} 29%|██▊ | 2852/10000 [11:09:15<27:39:26, 13.93s/it] 29%|██▊ | 2853/10000 [11:09:29<27:36:48, 13.91s/it] {'loss': 0.1578, 'learning_rate': 3.5755e-05, 'epoch': 3.73} 29%|██▊ | 2853/10000 [11:09:29<27:36:48, 13.91s/it] 29%|██▊ | 2854/10000 [11:09:43<27:36:40, 13.91s/it] {'loss': 0.1598, 'learning_rate': 3.575e-05, 'epoch': 3.74} 29%|██▊ | 2854/10000 [11:09:43<27:36:40, 13.91s/it] 29%|██▊ | 2855/10000 [11:09:57<27:35:56, 13.91s/it] {'loss': 0.1903, 'learning_rate': 3.5745e-05, 'epoch': 3.74} 29%|██▊ | 2855/10000 [11:09:57<27:35:56, 13.91s/it] 29%|██▊ | 2856/10000 [11:10:11<27:36:05, 13.91s/it] {'loss': 0.1639, 'learning_rate': 3.574e-05, 'epoch': 3.74} 29%|██▊ | 2856/10000 [11:10:11<27:36:05, 13.91s/it] 29%|██▊ | 2857/10000 [11:10:25<27:39:15, 13.94s/it] {'loss': 0.1678, 'learning_rate': 3.5735e-05, 'epoch': 3.74} 29%|██▊ | 2857/10000 [11:10:25<27:39:15, 13.94s/it] 29%|██▊ | 2858/10000 [11:10:39<27:46:27, 14.00s/it] {'loss': 0.17, 'learning_rate': 3.5730000000000005e-05, 'epoch': 3.74} 29%|██▊ | 2858/10000 [11:10:39<27:46:27, 14.00s/it] 29%|██▊ | 2859/10000 [11:10:53<27:42:13, 13.97s/it] {'loss': 0.1642, 'learning_rate': 3.5725e-05, 'epoch': 3.74} 29%|██▊ | 2859/10000 [11:10:53<27:42:13, 13.97s/it] 29%|██▊ | 2860/10000 [11:11:07<27:42:26, 13.97s/it] {'loss': 0.1587, 'learning_rate': 3.5720000000000004e-05, 'epoch': 3.74} 29%|██▊ | 2860/10000 [11:11:07<27:42:26, 13.97s/it] 
29%|██▊ | 2861/10000 [11:11:20<27:38:58, 13.94s/it] {'loss': 0.1618, 'learning_rate': 3.5715000000000006e-05, 'epoch': 3.74} 29%|██▊ | 2861/10000 [11:11:21<27:38:58, 13.94s/it] 29%|██▊ | 2862/10000 [11:11:34<27:39:17, 13.95s/it] {'loss': 0.1471, 'learning_rate': 3.571e-05, 'epoch': 3.75} 29%|██▊ | 2862/10000 [11:11:34<27:39:17, 13.95s/it] 29%|██▊ | 2863/10000 [11:11:49<27:43:02, 13.98s/it] {'loss': 0.1666, 'learning_rate': 3.5705e-05, 'epoch': 3.75} 29%|██▊ | 2863/10000 [11:11:49<27:43:02, 13.98s/it] 29%|██▊ | 2864/10000 [11:12:02<27:38:08, 13.94s/it] {'loss': 0.1861, 'learning_rate': 3.57e-05, 'epoch': 3.75} 29%|██▊ | 2864/10000 [11:12:02<27:38:08, 13.94s/it] 29%|██▊ | 2865/10000 [11:12:16<27:41:10, 13.97s/it] {'loss': 0.1436, 'learning_rate': 3.5695e-05, 'epoch': 3.75} 29%|██▊ | 2865/10000 [11:12:16<27:41:10, 13.97s/it] 29%|██▊ | 2866/10000 [11:12:30<27:38:59, 13.95s/it] {'loss': 0.1442, 'learning_rate': 3.569e-05, 'epoch': 3.75} 29%|██▊ | 2866/10000 [11:12:30<27:38:59, 13.95s/it] 29%|██▊ | 2867/10000 [11:12:44<27:40:36, 13.97s/it] {'loss': 0.1702, 'learning_rate': 3.5685e-05, 'epoch': 3.75} 29%|██▊ | 2867/10000 [11:12:44<27:40:36, 13.97s/it] 29%|██▊ | 2868/10000 [11:12:58<27:41:40, 13.98s/it] {'loss': 0.1446, 'learning_rate': 3.5680000000000004e-05, 'epoch': 3.75} 29%|██▊ | 2868/10000 [11:12:58<27:41:40, 13.98s/it] 29%|██▊ | 2869/10000 [11:13:12<27:41:35, 13.98s/it] {'loss': 0.117, 'learning_rate': 3.5675e-05, 'epoch': 3.76} 29%|██▊ | 2869/10000 [11:13:12<27:41:35, 13.98s/it] 29%|██▊ | 2870/10000 [11:13:26<27:42:35, 13.99s/it] {'loss': 0.1779, 'learning_rate': 3.567e-05, 'epoch': 3.76} 29%|██▊ | 2870/10000 [11:13:26<27:42:35, 13.99s/it] 29%|██▊ | 2871/10000 [11:13:40<27:37:58, 13.95s/it] {'loss': 0.1636, 'learning_rate': 3.5665000000000005e-05, 'epoch': 3.76} 29%|██▊ | 2871/10000 [11:13:40<27:37:58, 13.95s/it] 29%|██▊ | 2872/10000 [11:13:54<27:37:49, 13.95s/it] {'loss': 0.1988, 'learning_rate': 3.566e-05, 'epoch': 3.76} 29%|██▊ | 2872/10000 [11:13:54<27:37:49, 
13.95s/it] 29%|██▊ | 2873/10000 [11:14:08<27:35:12, 13.93s/it] {'loss': 0.1624, 'learning_rate': 3.5654999999999997e-05, 'epoch': 3.76} 29%|██▊ | 2873/10000 [11:14:08<27:35:12, 13.93s/it] 29%|██▊ | 2874/10000 [11:14:22<27:36:43, 13.95s/it] {'loss': 0.1691, 'learning_rate': 3.565e-05, 'epoch': 3.76} 29%|██▊ | 2874/10000 [11:14:22<27:36:43, 13.95s/it] 29%|██▉ | 2875/10000 [11:14:36<27:39:10, 13.97s/it] {'loss': 0.1532, 'learning_rate': 3.5645e-05, 'epoch': 3.76} 29%|██▉ | 2875/10000 [11:14:36<27:39:10, 13.97s/it] 29%|██▉ | 2876/10000 [11:14:50<27:41:03, 13.99s/it] {'loss': 0.1125, 'learning_rate': 3.5640000000000004e-05, 'epoch': 3.76} 29%|██▉ | 2876/10000 [11:14:50<27:41:03, 13.99s/it] 29%|██▉ | 2877/10000 [11:15:04<27:38:03, 13.97s/it] {'loss': 0.1912, 'learning_rate': 3.5635e-05, 'epoch': 3.77} 29%|██▉ | 2877/10000 [11:15:04<27:38:03, 13.97s/it] 29%|██▉ | 2878/10000 [11:15:18<27:41:02, 13.99s/it] {'loss': 0.1909, 'learning_rate': 3.563e-05, 'epoch': 3.77} 29%|██▉ | 2878/10000 [11:15:18<27:41:02, 13.99s/it] 29%|██▉ | 2879/10000 [11:15:32<27:34:37, 13.94s/it] {'loss': 0.1674, 'learning_rate': 3.5625000000000005e-05, 'epoch': 3.77} 29%|██▉ | 2879/10000 [11:15:32<27:34:37, 13.94s/it] 29%|██▉ | 2880/10000 [11:15:46<27:34:06, 13.94s/it] {'loss': 0.166, 'learning_rate': 3.562e-05, 'epoch': 3.77} 29%|██▉ | 2880/10000 [11:15:46<27:34:06, 13.94s/it] 29%|██▉ | 2881/10000 [11:16:00<27:36:05, 13.96s/it] {'loss': 0.1524, 'learning_rate': 3.5615000000000004e-05, 'epoch': 3.77} 29%|██▉ | 2881/10000 [11:16:00<27:36:05, 13.96s/it] 29%|██▉ | 2882/10000 [11:16:14<27:40:55, 14.00s/it] {'loss': 0.1856, 'learning_rate': 3.5610000000000006e-05, 'epoch': 3.77} 29%|██▉ | 2882/10000 [11:16:14<27:40:55, 14.00s/it] 29%|██▉ | 2883/10000 [11:16:28<27:34:45, 13.95s/it] {'loss': 0.1528, 'learning_rate': 3.5605e-05, 'epoch': 3.77} 29%|██▉ | 2883/10000 [11:16:28<27:34:45, 13.95s/it] 29%|██▉ | 2884/10000 [11:16:42<27:31:24, 13.92s/it] {'loss': 0.1532, 'learning_rate': 3.56e-05, 'epoch': 3.77} 
29%|██▉ | 2884/10000 [11:16:42<27:31:24, 13.92s/it] 29%|██▉ | 2885/10000 [11:16:56<27:31:29, 13.93s/it] {'loss': 0.2325, 'learning_rate': 3.5595e-05, 'epoch': 3.78} 29%|██▉ | 2885/10000 [11:16:56<27:31:29, 13.93s/it] 29%|██▉ | 2886/10000 [11:17:10<27:33:53, 13.95s/it] {'loss': 0.1946, 'learning_rate': 3.559e-05, 'epoch': 3.78} 29%|██▉ | 2886/10000 [11:17:10<27:33:53, 13.95s/it] 29%|██▉ | 2887/10000 [11:17:24<27:36:23, 13.97s/it] {'loss': 0.1651, 'learning_rate': 3.5585e-05, 'epoch': 3.78} 29%|██▉ | 2887/10000 [11:17:24<27:36:23, 13.97s/it] 29%|██▉ | 2888/10000 [11:17:37<27:32:19, 13.94s/it] {'loss': 0.1491, 'learning_rate': 3.558e-05, 'epoch': 3.78} 29%|██▉ | 2888/10000 [11:17:37<27:32:19, 13.94s/it] 29%|██▉ | 2889/10000 [11:17:51<27:30:23, 13.93s/it] {'loss': 0.176, 'learning_rate': 3.5575000000000004e-05, 'epoch': 3.78} 29%|██▉ | 2889/10000 [11:17:51<27:30:23, 13.93s/it] 29%|██▉ | 2890/10000 [11:18:05<27:32:11, 13.94s/it] {'loss': 0.1637, 'learning_rate': 3.557e-05, 'epoch': 3.78} 29%|██▉ | 2890/10000 [11:18:05<27:32:11, 13.94s/it] 29%|██▉ | 2891/10000 [11:18:19<27:33:11, 13.95s/it] {'loss': 0.1482, 'learning_rate': 3.5565e-05, 'epoch': 3.78} 29%|██▉ | 2891/10000 [11:18:19<27:33:11, 13.95s/it] 29%|██▉ | 2892/10000 [11:18:33<27:29:44, 13.93s/it] {'loss': 0.1764, 'learning_rate': 3.5560000000000005e-05, 'epoch': 3.79} 29%|██▉ | 2892/10000 [11:18:33<27:29:44, 13.93s/it] 29%|██▉ | 2893/10000 [11:18:47<27:31:22, 13.94s/it] {'loss': 0.1737, 'learning_rate': 3.5555e-05, 'epoch': 3.79} 29%|██▉ | 2893/10000 [11:18:47<27:31:22, 13.94s/it] 29%|██▉ | 2894/10000 [11:19:01<27:29:39, 13.93s/it] {'loss': 0.1646, 'learning_rate': 3.555e-05, 'epoch': 3.79} 29%|██▉ | 2894/10000 [11:19:01<27:29:39, 13.93s/it] 29%|██▉ | 2895/10000 [11:19:15<27:21:49, 13.86s/it] {'loss': 0.1527, 'learning_rate': 3.5545e-05, 'epoch': 3.79} 29%|██▉ | 2895/10000 [11:19:15<27:21:49, 13.86s/it] 29%|██▉ | 2896/10000 [11:19:29<27:19:42, 13.85s/it] {'loss': 0.1851, 'learning_rate': 3.554e-05, 'epoch': 3.79} 
29%|██▉ | 2896/10000 [11:19:29<27:19:42, 13.85s/it] 29%|██▉ | 2897/10000 [11:19:42<27:22:35, 13.88s/it] {'loss': 0.1675, 'learning_rate': 3.5535000000000005e-05, 'epoch': 3.79} 29%|██▉ | 2897/10000 [11:19:43<27:22:35, 13.88s/it] 29%|██▉ | 2898/10000 [11:19:57<27:28:40, 13.93s/it] {'loss': 0.1487, 'learning_rate': 3.553e-05, 'epoch': 3.79} 29%|██▉ | 2898/10000 [11:19:57<27:28:40, 13.93s/it] 29%|██▉ | 2899/10000 [11:20:10<27:27:05, 13.92s/it] {'loss': 0.1454, 'learning_rate': 3.5525e-05, 'epoch': 3.79} 29%|██▉ | 2899/10000 [11:20:10<27:27:05, 13.92s/it] 29%|██▉ | 2900/10000 [11:20:24<27:30:14, 13.95s/it] {'loss': 0.1683, 'learning_rate': 3.5520000000000006e-05, 'epoch': 3.8} 29%|██▉ | 2900/10000 [11:20:24<27:30:14, 13.95s/it] 29%|██▉ | 2901/10000 [11:20:38<27:28:14, 13.93s/it] {'loss': 0.2025, 'learning_rate': 3.5515e-05, 'epoch': 3.8} 29%|██▉ | 2901/10000 [11:20:38<27:28:14, 13.93s/it] 29%|██▉ | 2902/10000 [11:20:52<27:25:21, 13.91s/it] {'loss': 0.2055, 'learning_rate': 3.5510000000000004e-05, 'epoch': 3.8} 29%|██▉ | 2902/10000 [11:20:52<27:25:21, 13.91s/it] 29%|██▉ | 2903/10000 [11:21:06<27:27:46, 13.93s/it] {'loss': 0.1901, 'learning_rate': 3.5505e-05, 'epoch': 3.8} 29%|██▉ | 2903/10000 [11:21:06<27:27:46, 13.93s/it] 29%|██▉ | 2904/10000 [11:21:20<27:27:14, 13.93s/it] {'loss': 0.1677, 'learning_rate': 3.55e-05, 'epoch': 3.8} 29%|██▉ | 2904/10000 [11:21:20<27:27:14, 13.93s/it] 29%|██▉ | 2905/10000 [11:21:34<27:28:55, 13.94s/it] {'loss': 0.1581, 'learning_rate': 3.5495e-05, 'epoch': 3.8} 29%|██▉ | 2905/10000 [11:21:34<27:28:55, 13.94s/it] 29%|██▉ | 2906/10000 [11:21:48<27:29:25, 13.95s/it] {'loss': 0.161, 'learning_rate': 3.549e-05, 'epoch': 3.8} 29%|██▉ | 2906/10000 [11:21:48<27:29:25, 13.95s/it] 29%|██▉ | 2907/10000 [11:22:02<27:26:22, 13.93s/it] {'loss': 0.1588, 'learning_rate': 3.5485e-05, 'epoch': 3.8} 29%|██▉ | 2907/10000 [11:22:02<27:26:22, 13.93s/it] 29%|██▉ | 2908/10000 [11:22:16<27:25:11, 13.92s/it] {'loss': 0.158, 'learning_rate': 3.548e-05, 'epoch': 
3.81} 29%|██▉ | 2908/10000 [11:22:16<27:25:11, 13.92s/it] 29%|██▉ | 2909/10000 [11:22:30<27:28:16, 13.95s/it] {'loss': 0.1577, 'learning_rate': 3.5475e-05, 'epoch': 3.81} 29%|██▉ | 2909/10000 [11:22:30<27:28:16, 13.95s/it] 29%|██▉ | 2910/10000 [11:22:44<27:25:34, 13.93s/it] {'loss': 0.1261, 'learning_rate': 3.5470000000000004e-05, 'epoch': 3.81} 29%|██▉ | 2910/10000 [11:22:44<27:25:34, 13.93s/it] 29%|██▉ | 2911/10000 [11:22:58<27:30:20, 13.97s/it] {'loss': 0.1702, 'learning_rate': 3.546500000000001e-05, 'epoch': 3.81} 29%|██▉ | 2911/10000 [11:22:58<27:30:20, 13.97s/it] 29%|██▉ | 2912/10000 [11:23:12<27:31:56, 13.98s/it] {'loss': 0.1426, 'learning_rate': 3.546e-05, 'epoch': 3.81} 29%|██▉ | 2912/10000 [11:23:12<27:31:56, 13.98s/it] 29%|██▉ | 2913/10000 [11:23:26<27:32:24, 13.99s/it] {'loss': 0.1669, 'learning_rate': 3.5455e-05, 'epoch': 3.81} 29%|██▉ | 2913/10000 [11:23:26<27:32:24, 13.99s/it] 29%|██▉ | 2914/10000 [11:23:40<27:39:22, 14.05s/it] {'loss': 0.1628, 'learning_rate': 3.545e-05, 'epoch': 3.81} 29%|██▉ | 2914/10000 [11:23:40<27:39:22, 14.05s/it] 29%|██▉ | 2915/10000 [11:23:54<27:29:55, 13.97s/it] {'loss': 0.1518, 'learning_rate': 3.5445000000000004e-05, 'epoch': 3.82} 29%|██▉ | 2915/10000 [11:23:54<27:29:55, 13.97s/it] 29%|██▉ | 2916/10000 [11:24:08<27:25:01, 13.93s/it] {'loss': 0.1443, 'learning_rate': 3.544e-05, 'epoch': 3.82} 29%|██▉ | 2916/10000 [11:24:08<27:25:01, 13.93s/it] 29%|██▉ | 2917/10000 [11:24:22<27:27:27, 13.96s/it] {'loss': 0.2022, 'learning_rate': 3.5435e-05, 'epoch': 3.82} 29%|██▉ | 2917/10000 [11:24:22<27:27:27, 13.96s/it] 29%|██▉ | 2918/10000 [11:24:35<27:21:18, 13.91s/it] {'loss': 0.1911, 'learning_rate': 3.5430000000000005e-05, 'epoch': 3.82} 29%|██▉ | 2918/10000 [11:24:35<27:21:18, 13.91s/it] 29%|██▉ | 2919/10000 [11:24:49<27:22:38, 13.92s/it] {'loss': 0.1315, 'learning_rate': 3.5425e-05, 'epoch': 3.82} 29%|██▉ | 2919/10000 [11:24:49<27:22:38, 13.92s/it] 29%|██▉ | 2920/10000 [11:25:03<27:19:10, 13.89s/it] {'loss': 0.1824, 
'learning_rate': 3.542e-05, 'epoch': 3.82} 29%|██▉ | 2920/10000 [11:25:03<27:19:10, 13.89s/it] 29%|██▉ | 2921/10000 [11:25:17<27:19:44, 13.90s/it] {'loss': 0.1709, 'learning_rate': 3.5415000000000006e-05, 'epoch': 3.82} 29%|██▉ | 2921/10000 [11:25:17<27:19:44, 13.90s/it] 29%|██▉ | 2922/10000 [11:25:31<27:26:32, 13.96s/it] {'loss': 0.1349, 'learning_rate': 3.541e-05, 'epoch': 3.82} 29%|██▉ | 2922/10000 [11:25:31<27:26:32, 13.96s/it] 29%|██▉ | 2923/10000 [11:25:45<27:26:14, 13.96s/it] {'loss': 0.1482, 'learning_rate': 3.5405e-05, 'epoch': 3.83} 29%|██▉ | 2923/10000 [11:25:45<27:26:14, 13.96s/it] 29%|██▉ | 2924/10000 [11:25:59<27:24:34, 13.94s/it] {'loss': 0.1922, 'learning_rate': 3.54e-05, 'epoch': 3.83} 29%|██▉ | 2924/10000 [11:25:59<27:24:34, 13.94s/it] 29%|██▉ | 2925/10000 [11:26:13<27:20:16, 13.91s/it] {'loss': 0.1759, 'learning_rate': 3.5395e-05, 'epoch': 3.83} 29%|██▉ | 2925/10000 [11:26:13<27:20:16, 13.91s/it] 29%|██▉ | 2926/10000 [11:26:27<27:24:06, 13.94s/it] {'loss': 0.1544, 'learning_rate': 3.539e-05, 'epoch': 3.83} 29%|██▉ | 2926/10000 [11:26:27<27:24:06, 13.94s/it] 29%|██▉ | 2927/10000 [11:26:41<27:27:39, 13.98s/it] {'loss': 0.1568, 'learning_rate': 3.5385e-05, 'epoch': 3.83} 29%|██▉ | 2927/10000 [11:26:41<27:27:39, 13.98s/it] 29%|██▉ | 2928/10000 [11:26:55<27:21:49, 13.93s/it] {'loss': 0.1331, 'learning_rate': 3.5380000000000003e-05, 'epoch': 3.83} 29%|██▉ | 2928/10000 [11:26:55<27:21:49, 13.93s/it] 29%|██▉ | 2929/10000 [11:27:09<27:21:23, 13.93s/it] {'loss': 0.1808, 'learning_rate': 3.5375e-05, 'epoch': 3.83} 29%|██▉ | 2929/10000 [11:27:09<27:21:23, 13.93s/it] 29%|██▉ | 2930/10000 [11:27:23<27:28:08, 13.99s/it] {'loss': 0.1718, 'learning_rate': 3.537e-05, 'epoch': 3.84} 29%|██▉ | 2930/10000 [11:27:23<27:28:08, 13.99s/it] 29%|██▉ | 2931/10000 [11:27:37<27:24:50, 13.96s/it] {'loss': 0.1264, 'learning_rate': 3.5365000000000004e-05, 'epoch': 3.84} 29%|██▉ | 2931/10000 [11:27:37<27:24:50, 13.96s/it] 29%|██▉ | 2932/10000 [11:27:51<27:24:19, 13.96s/it] 
{'loss': 0.1915, 'learning_rate': 3.536000000000001e-05, 'epoch': 3.84} 29%|██▉ | 2932/10000 [11:27:51<27:24:19, 13.96s/it] 29%|██▉ | 2933/10000 [11:28:05<27:23:41, 13.96s/it] {'loss': 0.2023, 'learning_rate': 3.5354999999999996e-05, 'epoch': 3.84} 29%|██▉ | 2933/10000 [11:28:05<27:23:41, 13.96s/it] 29%|██▉ | 2934/10000 [11:28:19<27:24:02, 13.96s/it] {'loss': 0.1623, 'learning_rate': 3.535e-05, 'epoch': 3.84} 29%|██▉ | 2934/10000 [11:28:19<27:24:02, 13.96s/it] 29%|██▉ | 2935/10000 [11:28:33<27:22:53, 13.95s/it] {'loss': 0.1637, 'learning_rate': 3.5345e-05, 'epoch': 3.84} 29%|██▉ | 2935/10000 [11:28:33<27:22:53, 13.95s/it] 29%|██▉ | 2936/10000 [11:28:46<27:19:47, 13.93s/it] {'loss': 0.1789, 'learning_rate': 3.5340000000000004e-05, 'epoch': 3.84} 29%|██▉ | 2936/10000 [11:28:46<27:19:47, 13.93s/it] 29%|██▉ | 2937/10000 [11:29:00<27:13:10, 13.87s/it] {'loss': 0.1769, 'learning_rate': 3.5335e-05, 'epoch': 3.84} 29%|██▉ | 2937/10000 [11:29:00<27:13:10, 13.87s/it] 29%|██▉ | 2938/10000 [11:29:14<27:14:27, 13.89s/it] {'loss': 0.172, 'learning_rate': 3.533e-05, 'epoch': 3.85} 29%|██▉ | 2938/10000 [11:29:14<27:14:27, 13.89s/it] 29%|██▉ | 2939/10000 [11:29:28<27:19:27, 13.93s/it] {'loss': 0.1736, 'learning_rate': 3.5325000000000005e-05, 'epoch': 3.85} 29%|██▉ | 2939/10000 [11:29:28<27:19:27, 13.93s/it] 29%|██▉ | 2940/10000 [11:29:42<27:19:16, 13.93s/it] {'loss': 0.2014, 'learning_rate': 3.532e-05, 'epoch': 3.85} 29%|██▉ | 2940/10000 [11:29:42<27:19:16, 13.93s/it] 29%|██▉ | 2941/10000 [11:29:56<27:16:51, 13.91s/it] {'loss': 0.1583, 'learning_rate': 3.5315e-05, 'epoch': 3.85} 29%|██▉ | 2941/10000 [11:29:56<27:16:51, 13.91s/it] 29%|██▉ | 2942/10000 [11:30:10<27:19:05, 13.93s/it] {'loss': 0.1368, 'learning_rate': 3.5310000000000006e-05, 'epoch': 3.85} 29%|██▉ | 2942/10000 [11:30:10<27:19:05, 13.93s/it] 29%|██▉ | 2943/10000 [11:30:24<27:21:55, 13.96s/it] {'loss': 0.1829, 'learning_rate': 3.5305e-05, 'epoch': 3.85} 29%|██▉ | 2943/10000 [11:30:24<27:21:55, 13.96s/it] 29%|██▉ | 
2944/10000 [11:30:38<27:21:46, 13.96s/it] {'loss': 0.1694, 'learning_rate': 3.53e-05, 'epoch': 3.85} 29%|██▉ | 2944/10000 [11:30:38<27:21:46, 13.96s/it] 29%|██▉ | 2945/10000 [11:30:52<27:22:59, 13.97s/it] {'loss': 0.1611, 'learning_rate': 3.5295e-05, 'epoch': 3.85} 29%|██▉ | 2945/10000 [11:30:52<27:22:59, 13.97s/it] 29%|██▉ | 2946/10000 [11:31:06<27:18:32, 13.94s/it] {'loss': 0.1673, 'learning_rate': 3.529e-05, 'epoch': 3.86} 29%|██▉ | 2946/10000 [11:31:06<27:18:32, 13.94s/it] 29%|██▉ | 2947/10000 [11:31:20<27:17:50, 13.93s/it] {'loss': 0.1791, 'learning_rate': 3.5285e-05, 'epoch': 3.86} 29%|██▉ | 2947/10000 [11:31:20<27:17:50, 13.93s/it] 29%|██▉ | 2948/10000 [11:31:34<27:17:50, 13.94s/it] {'loss': 0.1727, 'learning_rate': 3.528e-05, 'epoch': 3.86} 29%|██▉ | 2948/10000 [11:31:34<27:17:50, 13.94s/it] 29%|██▉ | 2949/10000 [11:31:48<27:18:31, 13.94s/it] {'loss': 0.1568, 'learning_rate': 3.5275000000000004e-05, 'epoch': 3.86} 29%|██▉ | 2949/10000 [11:31:48<27:18:31, 13.94s/it] 30%|██▉ | 2950/10000 [11:32:02<27:19:46, 13.96s/it] {'loss': 0.1721, 'learning_rate': 3.5270000000000006e-05, 'epoch': 3.86} 30%|██▉ | 2950/10000 [11:32:02<27:19:46, 13.96s/it] 30%|██▉ | 2951/10000 [11:32:15<27:16:42, 13.93s/it] {'loss': 0.1319, 'learning_rate': 3.5265e-05, 'epoch': 3.86} 30%|██▉ | 2951/10000 [11:32:15<27:16:42, 13.93s/it] 30%|██▉ | 2952/10000 [11:32:29<27:14:48, 13.92s/it] {'loss': 0.1673, 'learning_rate': 3.5260000000000005e-05, 'epoch': 3.86} 30%|██▉ | 2952/10000 [11:32:29<27:14:48, 13.92s/it] 30%|██▉ | 2953/10000 [11:32:43<27:18:03, 13.95s/it] {'loss': 0.1355, 'learning_rate': 3.5255e-05, 'epoch': 3.87} 30%|██▉ | 2953/10000 [11:32:43<27:18:03, 13.95s/it] 30%|██▉ | 2954/10000 [11:32:57<27:15:36, 13.93s/it] {'loss': 0.1756, 'learning_rate': 3.525e-05, 'epoch': 3.87} 30%|██▉ | 2954/10000 [11:32:57<27:15:36, 13.93s/it] 30%|██▉ | 2955/10000 [11:33:11<27:15:14, 13.93s/it] {'loss': 0.1438, 'learning_rate': 3.5245e-05, 'epoch': 3.87} 30%|██▉ | 2955/10000 [11:33:11<27:15:14, 
13.93s/it] 30%|██▉ | 2956/10000 [11:33:25<27:16:01, 13.94s/it] {'loss': 0.1358, 'learning_rate': 3.524e-05, 'epoch': 3.87} 30%|██▉ | 2956/10000 [11:33:25<27:16:01, 13.94s/it] 30%|██▉ | 2957/10000 [11:33:39<27:13:40, 13.92s/it] {'loss': 0.1811, 'learning_rate': 3.5235000000000004e-05, 'epoch': 3.87} 30%|██▉ | 2957/10000 [11:33:39<27:13:40, 13.92s/it] 30%|██▉ | 2958/10000 [11:33:53<27:11:47, 13.90s/it] {'loss': 0.1541, 'learning_rate': 3.523e-05, 'epoch': 3.87} 30%|██▉ | 2958/10000 [11:33:53<27:11:47, 13.90s/it] 30%|██▉ | 2959/10000 [11:34:07<27:09:31, 13.89s/it] {'loss': 0.19, 'learning_rate': 3.5225e-05, 'epoch': 3.87} 30%|██▉ | 2959/10000 [11:34:07<27:09:31, 13.89s/it] 30%|██▉ | 2960/10000 [11:34:20<27:05:11, 13.85s/it] {'loss': 0.1535, 'learning_rate': 3.5220000000000005e-05, 'epoch': 3.87} 30%|██▉ | 2960/10000 [11:34:21<27:05:11, 13.85s/it] 30%|██▉ | 2961/10000 [11:34:35<27:13:17, 13.92s/it] {'loss': 0.1808, 'learning_rate': 3.5215e-05, 'epoch': 3.88} 30%|██▉ | 2961/10000 [11:34:35<27:13:17, 13.92s/it] 30%|██▉ | 2962/10000 [11:34:49<27:17:36, 13.96s/it] {'loss': 0.1665, 'learning_rate': 3.5210000000000003e-05, 'epoch': 3.88} 30%|██▉ | 2962/10000 [11:34:49<27:17:36, 13.96s/it] 30%|██▉ | 2963/10000 [11:35:03<27:17:12, 13.96s/it] {'loss': 0.1518, 'learning_rate': 3.5205e-05, 'epoch': 3.88} 30%|██▉ | 2963/10000 [11:35:03<27:17:12, 13.96s/it] 30%|██▉ | 2964/10000 [11:35:16<27:16:51, 13.96s/it] {'loss': 0.1647, 'learning_rate': 3.52e-05, 'epoch': 3.88} 30%|██▉ | 2964/10000 [11:35:17<27:16:51, 13.96s/it] 30%|██▉ | 2965/10000 [11:35:30<27:14:19, 13.94s/it] {'loss': 0.1842, 'learning_rate': 3.5195e-05, 'epoch': 3.88} 30%|██▉ | 2965/10000 [11:35:30<27:14:19, 13.94s/it] 30%|██▉ | 2966/10000 [11:35:44<27:14:56, 13.95s/it] {'loss': 0.1377, 'learning_rate': 3.519e-05, 'epoch': 3.88} 30%|██▉ | 2966/10000 [11:35:44<27:14:56, 13.95s/it] 30%|██▉ | 2967/10000 [11:35:58<27:13:05, 13.93s/it] {'loss': 0.1456, 'learning_rate': 3.5185e-05, 'epoch': 3.88} 30%|██▉ | 2967/10000 
[11:35:58<27:13:05, 13.93s/it] 30%|██▉ | 2968/10000 [11:36:12<27:20:35, 14.00s/it] {'loss': 0.1777, 'learning_rate': 3.518e-05, 'epoch': 3.88} 30%|██▉ | 2968/10000 [11:36:12<27:20:35, 14.00s/it] 30%|██▉ | 2969/10000 [11:36:26<27:15:51, 13.96s/it] {'loss': 0.1739, 'learning_rate': 3.5175e-05, 'epoch': 3.89} 30%|██▉ | 2969/10000 [11:36:26<27:15:51, 13.96s/it] 30%|██▉ | 2970/10000 [11:36:40<27:16:23, 13.97s/it] {'loss': 0.1904, 'learning_rate': 3.5170000000000004e-05, 'epoch': 3.89} 30%|██▉ | 2970/10000 [11:36:40<27:16:23, 13.97s/it] 30%|██▉ | 2971/10000 [11:36:54<27:15:48, 13.96s/it] {'loss': 0.1565, 'learning_rate': 3.5165000000000006e-05, 'epoch': 3.89} 30%|██▉ | 2971/10000 [11:36:54<27:15:48, 13.96s/it] 30%|██▉ | 2972/10000 [11:37:08<27:14:19, 13.95s/it] {'loss': 0.1705, 'learning_rate': 3.516e-05, 'epoch': 3.89} 30%|██▉ | 2972/10000 [11:37:08<27:14:19, 13.95s/it] 30%|██▉ | 2973/10000 [11:37:22<27:11:41, 13.93s/it] {'loss': 0.1392, 'learning_rate': 3.5155e-05, 'epoch': 3.89} 30%|██▉ | 2973/10000 [11:37:22<27:11:41, 13.93s/it] 30%|██▉ | 2974/10000 [11:37:36<27:16:29, 13.98s/it] {'loss': 0.1498, 'learning_rate': 3.515e-05, 'epoch': 3.89} 30%|██▉ | 2974/10000 [11:37:36<27:16:29, 13.98s/it] 30%|██▉ | 2975/10000 [11:37:50<27:23:10, 14.03s/it] {'loss': 0.1731, 'learning_rate': 3.5145e-05, 'epoch': 3.89} 30%|██▉ | 2975/10000 [11:37:50<27:23:10, 14.03s/it] 30%|██▉ | 2976/10000 [11:38:04<27:19:29, 14.00s/it] {'loss': 0.1664, 'learning_rate': 3.514e-05, 'epoch': 3.9} 30%|██▉ | 2976/10000 [11:38:04<27:19:29, 14.00s/it] 30%|██▉ | 2977/10000 [11:38:18<27:14:44, 13.97s/it] {'loss': 0.1604, 'learning_rate': 3.5135e-05, 'epoch': 3.9} 30%|██▉ | 2977/10000 [11:38:18<27:14:44, 13.97s/it] 30%|██▉ | 2978/10000 [11:38:32<27:12:02, 13.95s/it] {'loss': 0.1615, 'learning_rate': 3.5130000000000004e-05, 'epoch': 3.9} 30%|██▉ | 2978/10000 [11:38:32<27:12:02, 13.95s/it] 30%|██▉ | 2979/10000 [11:38:46<27:10:22, 13.93s/it] {'loss': 0.1501, 'learning_rate': 3.5125e-05, 'epoch': 3.9} 30%|██▉ | 
2979/10000 [11:38:46<27:10:22, 13.93s/it] 30%|██▉ | 2980/10000 [11:39:00<27:08:08, 13.92s/it] {'loss': 0.1852, 'learning_rate': 3.512e-05, 'epoch': 3.9} 30%|██▉ | 2980/10000 [11:39:00<27:08:08, 13.92s/it] 30%|██▉ | 2981/10000 [11:39:14<27:05:23, 13.89s/it] {'loss': 0.1478, 'learning_rate': 3.5115000000000005e-05, 'epoch': 3.9} 30%|██▉ | 2981/10000 [11:39:14<27:05:23, 13.89s/it] 30%|██▉ | 2982/10000 [11:39:27<27:04:22, 13.89s/it] {'loss': 0.174, 'learning_rate': 3.511e-05, 'epoch': 3.9} 30%|██▉ | 2982/10000 [11:39:28<27:04:22, 13.89s/it] 30%|██▉ | 2983/10000 [11:39:41<27:05:48, 13.90s/it] {'loss': 0.1498, 'learning_rate': 3.5105e-05, 'epoch': 3.9} 30%|██▉ | 2983/10000 [11:39:41<27:05:48, 13.90s/it] 30%|██▉ | 2984/10000 [11:39:55<27:04:40, 13.89s/it] {'loss': 0.1664, 'learning_rate': 3.51e-05, 'epoch': 3.91} 30%|██▉ | 2984/10000 [11:39:55<27:04:40, 13.89s/it] 30%|██▉ | 2985/10000 [11:40:09<27:00:41, 13.86s/it] {'loss': 0.1923, 'learning_rate': 3.5095e-05, 'epoch': 3.91} 30%|██▉ | 2985/10000 [11:40:09<27:00:41, 13.86s/it] 30%|██▉ | 2986/10000 [11:40:23<27:08:10, 13.93s/it] {'loss': 0.1812, 'learning_rate': 3.509e-05, 'epoch': 3.91} 30%|██▉ | 2986/10000 [11:40:23<27:08:10, 13.93s/it] 30%|██▉ | 2987/10000 [11:40:37<27:05:08, 13.90s/it] {'loss': 0.1495, 'learning_rate': 3.5085e-05, 'epoch': 3.91} 30%|██▉ | 2987/10000 [11:40:37<27:05:08, 13.90s/it] 30%|██▉ | 2988/10000 [11:40:51<27:10:37, 13.95s/it] {'loss': 0.1531, 'learning_rate': 3.508e-05, 'epoch': 3.91} 30%|██▉ | 2988/10000 [11:40:51<27:10:37, 13.95s/it] 30%|██▉ | 2989/10000 [11:41:05<27:11:25, 13.96s/it] {'loss': 0.1884, 'learning_rate': 3.5075000000000006e-05, 'epoch': 3.91} 30%|██▉ | 2989/10000 [11:41:05<27:11:25, 13.96s/it] 30%|██▉ | 2990/10000 [11:41:19<27:07:01, 13.93s/it] {'loss': 0.1633, 'learning_rate': 3.507e-05, 'epoch': 3.91} 30%|██▉ | 2990/10000 [11:41:19<27:07:01, 13.93s/it] 30%|██▉ | 2991/10000 [11:41:33<27:05:55, 13.92s/it] {'loss': 0.161, 'learning_rate': 3.5065000000000004e-05, 'epoch': 3.91} 
30%|██▉ | 2991/10000 [11:41:33<27:05:55, 13.92s/it] 30%|██▉ | 2992/10000 [11:41:47<27:06:23, 13.92s/it] {'loss': 0.1333, 'learning_rate': 3.5060000000000007e-05, 'epoch': 3.92} 30%|██▉ | 2992/10000 [11:41:47<27:06:23, 13.92s/it] 30%|██▉ | 2993/10000 [11:42:01<27:11:45, 13.97s/it] {'loss': 0.1845, 'learning_rate': 3.5055e-05, 'epoch': 3.92} 30%|██▉ | 2993/10000 [11:42:01<27:11:45, 13.97s/it] 30%|██▉ | 2994/10000 [11:42:15<27:13:52, 13.99s/it] {'loss': 0.1776, 'learning_rate': 3.505e-05, 'epoch': 3.92} 30%|██▉ | 2994/10000 [11:42:15<27:13:52, 13.99s/it] 30%|██▉ | 2995/10000 [11:42:29<27:09:33, 13.96s/it] {'loss': 0.1757, 'learning_rate': 3.5045e-05, 'epoch': 3.92} 30%|██▉ | 2995/10000 [11:42:29<27:09:33, 13.96s/it] 30%|██▉ | 2996/10000 [11:42:43<27:09:00, 13.95s/it] {'loss': 0.188, 'learning_rate': 3.504e-05, 'epoch': 3.92} 30%|██▉ | 2996/10000 [11:42:43<27:09:00, 13.95s/it] 30%|██▉ | 2997/10000 [11:42:57<27:06:49, 13.94s/it] {'loss': 0.1647, 'learning_rate': 3.5035e-05, 'epoch': 3.92} 30%|██▉ | 2997/10000 [11:42:57<27:06:49, 13.94s/it] 30%|██▉ | 2998/10000 [11:43:11<27:09:28, 13.96s/it] {'loss': 0.1804, 'learning_rate': 3.503e-05, 'epoch': 3.92} 30%|██▉ | 2998/10000 [11:43:11<27:09:28, 13.96s/it] 30%|██▉ | 2999/10000 [11:43:24<27:06:11, 13.94s/it] {'loss': 0.1851, 'learning_rate': 3.5025000000000004e-05, 'epoch': 3.93} 30%|██▉ | 2999/10000 [11:43:25<27:06:11, 13.94s/it] 30%|███ | 3000/10000 [11:43:38<27:07:09, 13.95s/it] {'loss': 0.1656, 'learning_rate': 3.502e-05, 'epoch': 3.93} 30%|███ | 3000/10000 [11:43:39<27:07:09, 13.95s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-04 08:01:46,803 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-3000/config.json [INFO|configuration_utils.py:364] 2024-11-04 08:01:46,805 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-3000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-04 08:02:46,413 >> Model weights saved in 
output/echo28-20241103-201128-1e-4/checkpoint-3000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-04 08:02:46,415 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-3000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-04 08:02:46,417 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-3000/special_tokens_map.json [2024-11-04 08:02:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is about to be saved! [2024-11-04 08:02:46,464] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt [2024-11-04 08:02:46,464] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt... [2024-11-04 08:03:58,515] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/mp_rank_00_model_states.pt. [2024-11-04 08:03:58,713] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-04 08:05:48,014] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-04 08:05:48,017] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-04 08:05:48,018] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step3000 is ready now! 
30%|███ | 3001/10000 [11:47:53<167:44:08, 86.28s/it] {'loss': 0.1415, 'learning_rate': 3.5015e-05, 'epoch': 3.93} 30%|███ | 3001/10000 [11:47:54<167:44:08, 86.28s/it] 30%|███ | 3002/10000 [11:48:07<125:21:14, 64.49s/it] {'loss': 0.1469, 'learning_rate': 3.5010000000000005e-05, 'epoch': 3.93} 30%|███ | 3002/10000 [11:48:07<125:21:14, 64.49s/it] 30%|███ | 3003/10000 [11:48:21<95:45:50, 49.27s/it] {'loss': 0.1665, 'learning_rate': 3.5005e-05, 'epoch': 3.93} 30%|███ | 3003/10000 [11:48:21<95:45:50, 49.27s/it] 30%|███ | 3004/10000 [11:48:35<75:02:41, 38.62s/it] {'loss': 0.1273, 'learning_rate': 3.5e-05, 'epoch': 3.93} 30%|███ | 3004/10000 [11:48:35<75:02:41, 38.62s/it] 30%|███ | 3005/10000 [11:48:48<60:32:53, 31.16s/it] {'loss': 0.1593, 'learning_rate': 3.4995e-05, 'epoch': 3.93} 30%|███ | 3005/10000 [11:48:48<60:32:53, 31.16s/it] 30%|███ | 3006/10000 [11:49:02<50:26:01, 25.96s/it] {'loss': 0.1519, 'learning_rate': 3.499e-05, 'epoch': 3.93} 30%|███ | 3006/10000 [11:49:02<50:26:01, 25.96s/it] 30%|███ | 3007/10000 [11:49:16<43:25:11, 22.35s/it] {'loss': 0.1719, 'learning_rate': 3.4985e-05, 'epoch': 3.94} 30%|███ | 3007/10000 [11:49:16<43:25:11, 22.35s/it] 30%|███ | 3008/10000 [11:49:30<38:29:03, 19.81s/it] {'loss': 0.1685, 'learning_rate': 3.498e-05, 'epoch': 3.94} 30%|███ | 3008/10000 [11:49:30<38:29:03, 19.81s/it] 30%|███ | 3009/10000 [11:49:44<35:01:08, 18.03s/it] {'loss': 0.1522, 'learning_rate': 3.4975e-05, 'epoch': 3.94} 30%|███ | 3009/10000 [11:49:44<35:01:08, 18.03s/it] 30%|███ | 3010/10000 [11:49:58<32:38:03, 16.81s/it] {'loss': 0.1634, 'learning_rate': 3.4970000000000006e-05, 'epoch': 3.94} 30%|███ | 3010/10000 [11:49:58<32:38:03, 16.81s/it] 30%|███ | 3011/10000 [11:50:12<30:52:04, 15.90s/it] {'loss': 0.1465, 'learning_rate': 3.4965e-05, 'epoch': 3.94} 30%|███ | 3011/10000 [11:50:12<30:52:04, 15.90s/it] 30%|███ | 3012/10000 [11:50:26<29:41:25, 15.30s/it] {'loss': 0.2149, 'learning_rate': 3.4960000000000004e-05, 'epoch': 3.94} 30%|███ | 3012/10000 
[11:50:26<29:41:25, 15.30s/it] 30%|███ | 3013/10000 [11:50:40<29:01:21, 14.95s/it] {'loss': 0.1657, 'learning_rate': 3.495500000000001e-05, 'epoch': 3.94} 30%|███ | 3013/10000 [11:50:40<29:01:21, 14.95s/it] 30%|███ | 3014/10000 [11:50:54<28:23:52, 14.63s/it] {'loss': 0.2161, 'learning_rate': 3.495e-05, 'epoch': 3.95} 30%|███ | 3014/10000 [11:50:54<28:23:52, 14.63s/it] 30%|███ | 3015/10000 [11:51:07<27:55:37, 14.39s/it] {'loss': 0.1357, 'learning_rate': 3.4945e-05, 'epoch': 3.95} 30%|███ | 3015/10000 [11:51:07<27:55:37, 14.39s/it] 30%|███ | 3016/10000 [11:51:21<27:40:38, 14.27s/it] {'loss': 0.1735, 'learning_rate': 3.494e-05, 'epoch': 3.95} 30%|███ | 3016/10000 [11:51:21<27:40:38, 14.27s/it] 30%|███ | 3017/10000 [11:51:35<27:26:05, 14.14s/it] {'loss': 0.1407, 'learning_rate': 3.4935000000000003e-05, 'epoch': 3.95} 30%|███ | 3017/10000 [11:51:35<27:26:05, 14.14s/it] 30%|███ | 3018/10000 [11:51:49<27:18:00, 14.08s/it] {'loss': 0.1693, 'learning_rate': 3.493e-05, 'epoch': 3.95} 30%|███ | 3018/10000 [11:51:49<27:18:00, 14.08s/it] 30%|███ | 3019/10000 [11:52:03<27:07:37, 13.99s/it] {'loss': 0.1494, 'learning_rate': 3.4925e-05, 'epoch': 3.95} 30%|███ | 3019/10000 [11:52:03<27:07:37, 13.99s/it] 30%|███ | 3020/10000 [11:52:17<27:08:35, 14.00s/it] {'loss': 0.1528, 'learning_rate': 3.4920000000000004e-05, 'epoch': 3.95} 30%|███ | 3020/10000 [11:52:17<27:08:35, 14.00s/it] 30%|███ | 3021/10000 [11:52:31<27:01:49, 13.94s/it] {'loss': 0.1664, 'learning_rate': 3.4915e-05, 'epoch': 3.95} 30%|███ | 3021/10000 [11:52:31<27:01:49, 13.94s/it] 30%|███ | 3022/10000 [11:52:45<26:55:20, 13.89s/it] {'loss': 0.1871, 'learning_rate': 3.491e-05, 'epoch': 3.96} 30%|███ | 3022/10000 [11:52:45<26:55:20, 13.89s/it] 30%|███ | 3023/10000 [11:52:58<26:51:45, 13.86s/it] {'loss': 0.1834, 'learning_rate': 3.4905000000000005e-05, 'epoch': 3.96} 30%|███ | 3023/10000 [11:52:58<26:51:45, 13.86s/it] 30%|███ | 3024/10000 [11:53:12<26:57:28, 13.91s/it] {'loss': 0.1962, 'learning_rate': 3.49e-05, 'epoch': 3.96} 
30%|███ | 3024/10000 [11:53:12<26:57:28, 13.91s/it] 30%|███ | 3025/10000 [11:53:26<26:56:09, 13.90s/it] {'loss': 0.1791, 'learning_rate': 3.4895e-05, 'epoch': 3.96} 30%|███ | 3025/10000 [11:53:26<26:56:09, 13.90s/it] 30%|███ | 3026/10000 [11:53:40<26:59:50, 13.94s/it] {'loss': 0.1625, 'learning_rate': 3.489e-05, 'epoch': 3.96} 30%|███ | 3026/10000 [11:53:40<26:59:50, 13.94s/it] 30%|███ | 3027/10000 [11:53:54<26:59:47, 13.94s/it] {'loss': 0.1495, 'learning_rate': 3.4885e-05, 'epoch': 3.96} 30%|███ | 3027/10000 [11:53:54<26:59:47, 13.94s/it] 30%|███ | 3028/10000 [11:54:08<26:59:18, 13.94s/it] {'loss': 0.1725, 'learning_rate': 3.4880000000000005e-05, 'epoch': 3.96} 30%|███ | 3028/10000 [11:54:08<26:59:18, 13.94s/it] 30%|███ | 3029/10000 [11:54:22<26:57:16, 13.92s/it] {'loss': 0.1603, 'learning_rate': 3.4875e-05, 'epoch': 3.96} 30%|███ | 3029/10000 [11:54:22<26:57:16, 13.92s/it] 30%|███ | 3030/10000 [11:54:36<27:00:58, 13.95s/it] {'loss': 0.1696, 'learning_rate': 3.487e-05, 'epoch': 3.97} 30%|███ | 3030/10000 [11:54:36<27:00:58, 13.95s/it] 30%|███ | 3031/10000 [11:54:50<26:58:42, 13.94s/it] {'loss': 0.1568, 'learning_rate': 3.4865000000000006e-05, 'epoch': 3.97} 30%|███ | 3031/10000 [11:54:50<26:58:42, 13.94s/it] 30%|███ | 3032/10000 [11:55:04<27:00:58, 13.96s/it] {'loss': 0.1897, 'learning_rate': 3.486e-05, 'epoch': 3.97} 30%|███ | 3032/10000 [11:55:04<27:00:58, 13.96s/it] 30%|███ | 3033/10000 [11:55:18<26:56:04, 13.92s/it] {'loss': 0.1545, 'learning_rate': 3.4855000000000004e-05, 'epoch': 3.97} 30%|███ | 3033/10000 [11:55:18<26:56:04, 13.92s/it] 30%|███ | 3034/10000 [11:55:32<26:54:37, 13.91s/it] {'loss': 0.1372, 'learning_rate': 3.485e-05, 'epoch': 3.97} 30%|███ | 3034/10000 [11:55:32<26:54:37, 13.91s/it] 30%|███ | 3035/10000 [11:55:46<27:00:43, 13.96s/it] {'loss': 0.1899, 'learning_rate': 3.4845e-05, 'epoch': 3.97} 30%|███ | 3035/10000 [11:55:46<27:00:43, 13.96s/it] 30%|███ | 3036/10000 [11:56:00<26:55:42, 13.92s/it] {'loss': 0.1642, 'learning_rate': 3.484e-05, 
'epoch': 3.97} 30%|███ | 3036/10000 [11:56:00<26:55:42, 13.92s/it] 30%|███ | 3037/10000 [11:56:14<26:55:18, 13.92s/it] {'loss': 0.1736, 'learning_rate': 3.4835e-05, 'epoch': 3.98} 30%|███ | 3037/10000 [11:56:14<26:55:18, 13.92s/it] 30%|███ | 3038/10000 [11:56:27<26:54:41, 13.92s/it] {'loss': 0.1559, 'learning_rate': 3.4830000000000004e-05, 'epoch': 3.98} 30%|███ | 3038/10000 [11:56:27<26:54:41, 13.92s/it] 30%|███ | 3039/10000 [11:56:41<26:54:28, 13.92s/it] {'loss': 0.1736, 'learning_rate': 3.4825e-05, 'epoch': 3.98} 30%|███ | 3039/10000 [11:56:41<26:54:28, 13.92s/it] 30%|███ | 3040/10000 [11:56:55<26:54:06, 13.91s/it] {'loss': 0.1598, 'learning_rate': 3.482e-05, 'epoch': 3.98} 30%|███ | 3040/10000 [11:56:55<26:54:06, 13.91s/it] 30%|███ | 3041/10000 [11:57:09<26:54:01, 13.92s/it] {'loss': 0.1881, 'learning_rate': 3.4815000000000005e-05, 'epoch': 3.98} 30%|███ | 3041/10000 [11:57:09<26:54:01, 13.92s/it] 30%|███ | 3042/10000 [11:57:23<26:55:13, 13.93s/it] {'loss': 0.1523, 'learning_rate': 3.481e-05, 'epoch': 3.98} 30%|███ | 3042/10000 [11:57:23<26:55:13, 13.93s/it] 30%|███ | 3043/10000 [11:57:37<26:56:46, 13.94s/it] {'loss': 0.1524, 'learning_rate': 3.4805e-05, 'epoch': 3.98} 30%|███ | 3043/10000 [11:57:37<26:56:46, 13.94s/it] 30%|███ | 3044/10000 [11:57:51<26:51:32, 13.90s/it] {'loss': 0.1946, 'learning_rate': 3.48e-05, 'epoch': 3.98} 30%|███ | 3044/10000 [11:57:51<26:51:32, 13.90s/it] 30%|███ | 3045/10000 [11:58:05<26:50:19, 13.89s/it] {'loss': 0.2023, 'learning_rate': 3.4795e-05, 'epoch': 3.99} 30%|███ | 3045/10000 [11:58:05<26:50:19, 13.89s/it] 30%|███ | 3046/10000 [11:58:19<26:50:21, 13.89s/it] {'loss': 0.1609, 'learning_rate': 3.479e-05, 'epoch': 3.99} 30%|███ | 3046/10000 [11:58:19<26:50:21, 13.89s/it] 30%|███ | 3047/10000 [11:58:33<26:48:56, 13.88s/it] {'loss': 0.1568, 'learning_rate': 3.4785e-05, 'epoch': 3.99} 30%|███ | 3047/10000 [11:58:33<26:48:56, 13.88s/it] 30%|███ | 3048/10000 [11:58:47<26:55:47, 13.95s/it] {'loss': 0.1505, 'learning_rate': 3.478e-05, 
'epoch': 3.99} 30%|███ | 3048/10000 [11:58:47<26:55:47, 13.95s/it] 30%|███ | 3049/10000 [11:59:01<26:58:51, 13.97s/it] {'loss': 0.1895, 'learning_rate': 3.4775000000000005e-05, 'epoch': 3.99} 30%|███ | 3049/10000 [11:59:01<26:58:51, 13.97s/it] 30%|███ | 3050/10000 [11:59:15<26:55:30, 13.95s/it] {'loss': 0.1535, 'learning_rate': 3.477e-05, 'epoch': 3.99} 30%|███ | 3050/10000 [11:59:15<26:55:30, 13.95s/it] 31%|███ | 3051/10000 [11:59:28<26:53:07, 13.93s/it] {'loss': 0.1766, 'learning_rate': 3.4765000000000003e-05, 'epoch': 3.99} 31%|███ | 3051/10000 [11:59:29<26:53:07, 13.93s/it] 31%|███ | 3052/10000 [11:59:43<26:57:59, 13.97s/it] {'loss': 0.185, 'learning_rate': 3.4760000000000006e-05, 'epoch': 3.99} 31%|███ | 3052/10000 [11:59:43<26:57:59, 13.97s/it] 31%|███ | 3053/10000 [11:59:56<26:55:07, 13.95s/it] {'loss': 0.1922, 'learning_rate': 3.4755e-05, 'epoch': 4.0} 31%|███ | 3053/10000 [11:59:56<26:55:07, 13.95s/it] 31%|███ | 3054/10000 [12:00:10<26:50:24, 13.91s/it] {'loss': 0.1722, 'learning_rate': 3.475e-05, 'epoch': 4.0} 31%|███ | 3054/10000 [12:00:10<26:50:24, 13.91s/it] 31%|███ | 3055/10000 [12:00:24<26:50:52, 13.92s/it] {'loss': 0.1659, 'learning_rate': 3.4745e-05, 'epoch': 4.0} 31%|███ | 3055/10000 [12:00:24<26:50:52, 13.92s/it] 31%|███ | 3056/10000 [12:00:37<26:05:20, 13.53s/it] {'loss': 0.1645, 'learning_rate': 3.474e-05, 'epoch': 4.0} 31%|███ | 3056/10000 [12:00:37<26:05:20, 13.53s/it] 31%|███ | 3057/10000 [12:00:51<26:18:49, 13.64s/it] {'loss': 0.0775, 'learning_rate': 3.4735e-05, 'epoch': 4.0} 31%|███ | 3057/10000 [12:00:51<26:18:49, 13.64s/it] 31%|███ | 3058/10000 [12:01:05<26:33:18, 13.77s/it] {'loss': 0.0726, 'learning_rate': 3.473e-05, 'epoch': 4.0} 31%|███ | 3058/10000 [12:01:05<26:33:18, 13.77s/it] 31%|███ | 3059/10000 [12:01:19<26:37:41, 13.81s/it] {'loss': 0.0734, 'learning_rate': 3.4725000000000004e-05, 'epoch': 4.0} 31%|███ | 3059/10000 [12:01:19<26:37:41, 13.81s/it] 31%|███ | 3060/10000 [12:01:33<26:44:34, 13.87s/it] {'loss': 0.0812, 
'learning_rate': 3.472e-05, 'epoch': 4.01} 31%|███ | 3060/10000 [12:01:33<26:44:34, 13.87s/it] 31%|███ | 3061/10000 [12:01:47<26:42:39, 13.86s/it] {'loss': 0.0678, 'learning_rate': 3.4715e-05, 'epoch': 4.01} 31%|███ | 3061/10000 [12:01:47<26:42:39, 13.86s/it] 31%|███ | 3062/10000 [12:02:01<26:49:09, 13.92s/it] {'loss': 0.0583, 'learning_rate': 3.4710000000000005e-05, 'epoch': 4.01} 31%|███ | 3062/10000 [12:02:01<26:49:09, 13.92s/it] 31%|███ | 3063/10000 [12:02:14<26:48:14, 13.91s/it] {'loss': 0.0584, 'learning_rate': 3.470500000000001e-05, 'epoch': 4.01} 31%|███ | 3063/10000 [12:02:15<26:48:14, 13.91s/it] 31%|███ | 3064/10000 [12:02:28<26:45:59, 13.89s/it] {'loss': 0.0675, 'learning_rate': 3.4699999999999996e-05, 'epoch': 4.01} 31%|███ | 3064/10000 [12:02:28<26:45:59, 13.89s/it] 31%|███ | 3065/10000 [12:02:42<26:48:05, 13.91s/it] {'loss': 0.0739, 'learning_rate': 3.4695e-05, 'epoch': 4.01} 31%|███ | 3065/10000 [12:02:42<26:48:05, 13.91s/it] 31%|███ | 3066/10000 [12:02:56<26:44:38, 13.89s/it] {'loss': 0.0741, 'learning_rate': 3.469e-05, 'epoch': 4.01} 31%|███ | 3066/10000 [12:02:56<26:44:38, 13.89s/it] 31%|███ | 3067/10000 [12:03:10<26:48:07, 13.92s/it] {'loss': 0.0841, 'learning_rate': 3.4685000000000004e-05, 'epoch': 4.01} 31%|███ | 3067/10000 [12:03:10<26:48:07, 13.92s/it] 31%|███ | 3068/10000 [12:03:24<26:46:53, 13.91s/it] {'loss': 0.0783, 'learning_rate': 3.468e-05, 'epoch': 4.02} 31%|███ | 3068/10000 [12:03:24<26:46:53, 13.91s/it] 31%|███ | 3069/10000 [12:03:38<26:46:29, 13.91s/it] {'loss': 0.0887, 'learning_rate': 3.4675e-05, 'epoch': 4.02} 31%|███ | 3069/10000 [12:03:38<26:46:29, 13.91s/it] 31%|███ | 3070/10000 [12:03:52<26:43:22, 13.88s/it] {'loss': 0.0635, 'learning_rate': 3.4670000000000005e-05, 'epoch': 4.02} 31%|███ | 3070/10000 [12:03:52<26:43:22, 13.88s/it] 31%|███ | 3071/10000 [12:04:06<26:46:07, 13.91s/it] {'loss': 0.0784, 'learning_rate': 3.4665e-05, 'epoch': 4.02} 31%|███ | 3071/10000 [12:04:06<26:46:07, 13.91s/it] 31%|███ | 3072/10000 
[12:04:19<26:40:31, 13.86s/it] {'loss': 0.074, 'learning_rate': 3.4660000000000004e-05, 'epoch': 4.02} 31%|███ | 3072/10000 [12:04:19<26:40:31, 13.86s/it] 31%|███ | 3073/10000 [12:04:33<26:39:39, 13.86s/it] {'loss': 0.0593, 'learning_rate': 3.4655000000000006e-05, 'epoch': 4.02} 31%|███ | 3073/10000 [12:04:33<26:39:39, 13.86s/it] 31%|███ | 3074/10000 [12:04:47<26:44:50, 13.90s/it] {'loss': 0.0669, 'learning_rate': 3.465e-05, 'epoch': 4.02} 31%|███ | 3074/10000 [12:04:47<26:44:50, 13.90s/it] 31%|███ | 3075/10000 [12:05:01<26:44:22, 13.90s/it] {'loss': 0.0785, 'learning_rate': 3.4645e-05, 'epoch': 4.02} 31%|███ | 3075/10000 [12:05:01<26:44:22, 13.90s/it] 31%|███ | 3076/10000 [12:05:15<26:45:05, 13.91s/it] {'loss': 0.0786, 'learning_rate': 3.464e-05, 'epoch': 4.03} 31%|███ | 3076/10000 [12:05:15<26:45:05, 13.91s/it] 31%|███ | 3077/10000 [12:05:29<26:44:08, 13.90s/it] {'loss': 0.0785, 'learning_rate': 3.4635e-05, 'epoch': 4.03} 31%|███ | 3077/10000 [12:05:29<26:44:08, 13.90s/it] 31%|███ | 3078/10000 [12:05:43<26:42:50, 13.89s/it] {'loss': 0.0547, 'learning_rate': 3.463e-05, 'epoch': 4.03} 31%|███ | 3078/10000 [12:05:43<26:42:50, 13.89s/it] 31%|███ | 3079/10000 [12:05:57<26:42:49, 13.90s/it] {'loss': 0.0771, 'learning_rate': 3.4625e-05, 'epoch': 4.03} 31%|███ | 3079/10000 [12:05:57<26:42:49, 13.90s/it] 31%|███ | 3080/10000 [12:06:11<26:47:53, 13.94s/it] {'loss': 0.0667, 'learning_rate': 3.4620000000000004e-05, 'epoch': 4.03} 31%|███ | 3080/10000 [12:06:11<26:47:53, 13.94s/it] 31%|███ | 3081/10000 [12:06:25<26:44:08, 13.91s/it] {'loss': 0.0583, 'learning_rate': 3.4615e-05, 'epoch': 4.03} 31%|███ | 3081/10000 [12:06:25<26:44:08, 13.91s/it] 31%|███ | 3082/10000 [12:06:38<26:41:19, 13.89s/it] {'loss': 0.0688, 'learning_rate': 3.461e-05, 'epoch': 4.03} 31%|███ | 3082/10000 [12:06:39<26:41:19, 13.89s/it] 31%|███ | 3083/10000 [12:06:53<26:46:22, 13.93s/it] {'loss': 0.0645, 'learning_rate': 3.4605000000000005e-05, 'epoch': 4.04} 31%|███ | 3083/10000 [12:06:53<26:46:22, 
13.93s/it] 31%|███ | 3084/10000 [12:07:06<26:42:58, 13.91s/it] {'loss': 0.0651, 'learning_rate': 3.46e-05, 'epoch': 4.04} 31%|███ | 3084/10000 [12:07:06<26:42:58, 13.91s/it] 31%|███ | 3085/10000 [12:07:20<26:40:18, 13.89s/it] {'loss': 0.0684, 'learning_rate': 3.4594999999999997e-05, 'epoch': 4.04} 31%|███ | 3085/10000 [12:07:20<26:40:18, 13.89s/it] 31%|███ | 3086/10000 [12:07:34<26:44:02, 13.92s/it] {'loss': 0.0663, 'learning_rate': 3.459e-05, 'epoch': 4.04} 31%|███ | 3086/10000 [12:07:34<26:44:02, 13.92s/it] 31%|███ | 3087/10000 [12:07:48<26:43:29, 13.92s/it] {'loss': 0.0488, 'learning_rate': 3.4585e-05, 'epoch': 4.04} 31%|███ | 3087/10000 [12:07:48<26:43:29, 13.92s/it] 31%|███ | 3088/10000 [12:08:02<26:46:01, 13.94s/it] {'loss': 0.0643, 'learning_rate': 3.4580000000000004e-05, 'epoch': 4.04} 31%|███ | 3088/10000 [12:08:02<26:46:01, 13.94s/it] 31%|███ | 3089/10000 [12:08:16<26:45:11, 13.94s/it] {'loss': 0.0636, 'learning_rate': 3.4575e-05, 'epoch': 4.04} 31%|███ | 3089/10000 [12:08:16<26:45:11, 13.94s/it] 31%|███ | 3090/10000 [12:08:30<26:44:16, 13.93s/it] {'loss': 0.0623, 'learning_rate': 3.457e-05, 'epoch': 4.04} 31%|███ | 3090/10000 [12:08:30<26:44:16, 13.93s/it] 31%|███ | 3091/10000 [12:08:44<26:42:02, 13.91s/it] {'loss': 0.0858, 'learning_rate': 3.4565000000000005e-05, 'epoch': 4.05} 31%|███ | 3091/10000 [12:08:44<26:42:02, 13.91s/it] 31%|███ | 3092/10000 [12:08:58<26:44:16, 13.93s/it] {'loss': 0.0679, 'learning_rate': 3.456e-05, 'epoch': 4.05} 31%|███ | 3092/10000 [12:08:58<26:44:16, 13.93s/it] 31%|███ | 3093/10000 [12:09:12<26:42:39, 13.92s/it] {'loss': 0.0628, 'learning_rate': 3.4555000000000004e-05, 'epoch': 4.05} 31%|███ | 3093/10000 [12:09:12<26:42:39, 13.92s/it] 31%|███ | 3094/10000 [12:09:26<26:41:27, 13.91s/it] {'loss': 0.0679, 'learning_rate': 3.455e-05, 'epoch': 4.05} 31%|███ | 3094/10000 [12:09:26<26:41:27, 13.91s/it] 31%|███ | 3095/10000 [12:09:39<26:39:30, 13.90s/it] {'loss': 0.0543, 'learning_rate': 3.4545e-05, 'epoch': 4.05} 31%|███ | 
3095/10000 [12:09:40<26:39:30, 13.90s/it] 31%|███ | 3096/10000 [12:09:54<26:44:26, 13.94s/it] {'loss': 0.0667, 'learning_rate': 3.454e-05, 'epoch': 4.05} 31%|███ | 3096/10000 [12:09:54<26:44:26, 13.94s/it] 31%|███ | 3097/10000 [12:10:07<26:42:41, 13.93s/it] {'loss': 0.0521, 'learning_rate': 3.4535e-05, 'epoch': 4.05} 31%|███ | 3097/10000 [12:10:07<26:42:41, 13.93s/it] 31%|███ | 3098/10000 [12:10:21<26:41:08, 13.92s/it] {'loss': 0.0664, 'learning_rate': 3.453e-05, 'epoch': 4.05} 31%|███ | 3098/10000 [12:10:21<26:41:08, 13.92s/it] 31%|███ | 3099/10000 [12:10:35<26:41:29, 13.92s/it] {'loss': 0.0513, 'learning_rate': 3.4525e-05, 'epoch': 4.06} 31%|███ | 3099/10000 [12:10:35<26:41:29, 13.92s/it] 31%|███ | 3100/10000 [12:10:49<26:42:46, 13.94s/it] {'loss': 0.0706, 'learning_rate': 3.452e-05, 'epoch': 4.06} 31%|███ | 3100/10000 [12:10:49<26:42:46, 13.94s/it] 31%|███ | 3101/10000 [12:11:03<26:38:30, 13.90s/it] {'loss': 0.0524, 'learning_rate': 3.4515000000000004e-05, 'epoch': 4.06} 31%|███ | 3101/10000 [12:11:03<26:38:30, 13.90s/it] 31%|███ | 3102/10000 [12:11:17<26:39:10, 13.91s/it] {'loss': 0.0566, 'learning_rate': 3.451000000000001e-05, 'epoch': 4.06} 31%|███ | 3102/10000 [12:11:17<26:39:10, 13.91s/it] 31%|███ | 3103/10000 [12:11:31<26:38:44, 13.91s/it] {'loss': 0.0711, 'learning_rate': 3.4505e-05, 'epoch': 4.06} 31%|███ | 3103/10000 [12:11:31<26:38:44, 13.91s/it] 31%|███ | 3104/10000 [12:11:45<26:42:05, 13.94s/it] {'loss': 0.0733, 'learning_rate': 3.45e-05, 'epoch': 4.06} 31%|███ | 3104/10000 [12:11:45<26:42:05, 13.94s/it] 31%|███ | 3105/10000 [12:11:59<26:40:24, 13.93s/it] {'loss': 0.0663, 'learning_rate': 3.4495e-05, 'epoch': 4.06} 31%|███ | 3105/10000 [12:11:59<26:40:24, 13.93s/it] 31%|███ | 3106/10000 [12:12:13<26:46:06, 13.98s/it] {'loss': 0.0533, 'learning_rate': 3.449e-05, 'epoch': 4.07} 31%|███ | 3106/10000 [12:12:13<26:46:06, 13.98s/it] 31%|███ | 3107/10000 [12:12:27<26:41:23, 13.94s/it] {'loss': 0.0506, 'learning_rate': 3.4485e-05, 'epoch': 4.07} 31%|███ | 
3107/10000 [12:12:27<26:41:23, 13.94s/it] 31%|███ | 3108/10000 [12:12:41<26:38:48, 13.92s/it] {'loss': 0.0709, 'learning_rate': 3.448e-05, 'epoch': 4.07} 31%|███ | 3108/10000 [12:12:41<26:38:48, 13.92s/it] 31%|███ | 3109/10000 [12:12:54<26:34:47, 13.89s/it] {'loss': 0.0726, 'learning_rate': 3.4475000000000005e-05, 'epoch': 4.07} 31%|███ | 3109/10000 [12:12:54<26:34:47, 13.89s/it] 31%|███ | 3110/10000 [12:13:08<26:36:25, 13.90s/it] {'loss': 0.0646, 'learning_rate': 3.447e-05, 'epoch': 4.07} 31%|███ | 3110/10000 [12:13:08<26:36:25, 13.90s/it] 31%|███ | 3111/10000 [12:13:22<26:35:25, 13.90s/it] {'loss': 0.0585, 'learning_rate': 3.4465e-05, 'epoch': 4.07} 31%|███ | 3111/10000 [12:13:22<26:35:25, 13.90s/it] 31%|███ | 3112/10000 [12:13:36<26:38:06, 13.92s/it] {'loss': 0.0661, 'learning_rate': 3.4460000000000005e-05, 'epoch': 4.07} 31%|███ | 3112/10000 [12:13:36<26:38:06, 13.92s/it] 31%|███ | 3113/10000 [12:13:50<26:34:49, 13.89s/it] {'loss': 0.0539, 'learning_rate': 3.4455e-05, 'epoch': 4.07} 31%|███ | 3113/10000 [12:13:50<26:34:49, 13.89s/it] 31%|███ | 3114/10000 [12:14:04<26:31:15, 13.87s/it] {'loss': 0.0573, 'learning_rate': 3.445e-05, 'epoch': 4.08} 31%|███ | 3114/10000 [12:14:04<26:31:15, 13.87s/it] 31%|███ | 3115/10000 [12:14:18<26:35:17, 13.90s/it] {'loss': 0.0767, 'learning_rate': 3.4445e-05, 'epoch': 4.08} 31%|███ | 3115/10000 [12:14:18<26:35:17, 13.90s/it] 31%|███ | 3116/10000 [12:14:32<26:32:32, 13.88s/it] {'loss': 0.0558, 'learning_rate': 3.444e-05, 'epoch': 4.08} 31%|███ | 3116/10000 [12:14:32<26:32:32, 13.88s/it] 31%|███ | 3117/10000 [12:14:46<26:32:26, 13.88s/it] {'loss': 0.0558, 'learning_rate': 3.4435e-05, 'epoch': 4.08} 31%|███ | 3117/10000 [12:14:46<26:32:26, 13.88s/it] 31%|███ | 3118/10000 [12:14:59<26:34:30, 13.90s/it] {'loss': 0.0652, 'learning_rate': 3.443e-05, 'epoch': 4.08} 31%|███ | 3118/10000 [12:15:00<26:34:30, 13.90s/it] 31%|███ | 3119/10000 [12:15:13<26:36:33, 13.92s/it] {'loss': 0.0586, 'learning_rate': 3.4425e-05, 'epoch': 4.08} 31%|███ | 
3119/10000 [12:15:13<26:36:33, 13.92s/it] 31%|███ | 3120/10000 [12:15:27<26:33:39, 13.90s/it] {'loss': 0.0649, 'learning_rate': 3.442e-05, 'epoch': 4.08} 31%|███ | 3120/10000 [12:15:27<26:33:39, 13.90s/it] 31%|███ | 3121/10000 [12:15:41<26:31:00, 13.88s/it] {'loss': 0.0554, 'learning_rate': 3.4415e-05, 'epoch': 4.09} 31%|███ | 3121/10000 [12:15:41<26:31:00, 13.88s/it] 31%|███ | 3122/10000 [12:15:55<26:32:39, 13.89s/it] {'loss': 0.0767, 'learning_rate': 3.4410000000000004e-05, 'epoch': 4.09} 31%|███ | 3122/10000 [12:15:55<26:32:39, 13.89s/it] 31%|███ | 3123/10000 [12:16:09<26:30:10, 13.87s/it] {'loss': 0.0704, 'learning_rate': 3.440500000000001e-05, 'epoch': 4.09} 31%|███ | 3123/10000 [12:16:09<26:30:10, 13.87s/it] 31%|███ | 3124/10000 [12:16:23<26:29:20, 13.87s/it] {'loss': 0.0773, 'learning_rate': 3.4399999999999996e-05, 'epoch': 4.09} 31%|███ | 3124/10000 [12:16:23<26:29:20, 13.87s/it] 31%|███▏ | 3125/10000 [12:16:37<26:26:28, 13.85s/it] {'loss': 0.0845, 'learning_rate': 3.4395e-05, 'epoch': 4.09} 31%|███▏ | 3125/10000 [12:16:37<26:26:28, 13.85s/it] 31%|███▏ | 3126/10000 [12:16:50<26:26:10, 13.84s/it] {'loss': 0.0513, 'learning_rate': 3.439e-05, 'epoch': 4.09} 31%|███▏ | 3126/10000 [12:16:50<26:26:10, 13.84s/it] 31%|███▏ | 3127/10000 [12:17:04<26:25:41, 13.84s/it] {'loss': 0.0545, 'learning_rate': 3.4385000000000004e-05, 'epoch': 4.09} 31%|███▏ | 3127/10000 [12:17:04<26:25:41, 13.84s/it] 31%|███▏ | 3128/10000 [12:17:18<26:29:43, 13.88s/it] {'loss': 0.0615, 'learning_rate': 3.438e-05, 'epoch': 4.09} 31%|███▏ | 3128/10000 [12:17:18<26:29:43, 13.88s/it] 31%|███▏ | 3129/10000 [12:17:32<26:29:12, 13.88s/it] {'loss': 0.0584, 'learning_rate': 3.4375e-05, 'epoch': 4.1} 31%|███▏ | 3129/10000 [12:17:32<26:29:12, 13.88s/it] 31%|███▏ | 3130/10000 [12:17:46<26:29:12, 13.88s/it] {'loss': 0.0565, 'learning_rate': 3.4370000000000005e-05, 'epoch': 4.1} 31%|███▏ | 3130/10000 [12:17:46<26:29:12, 13.88s/it] 31%|███▏ | 3131/10000 [12:18:00<26:35:43, 13.94s/it] {'loss': 0.0634, 
'learning_rate': 3.4365e-05, 'epoch': 4.1} 31%|███▏ | 3131/10000 [12:18:00<26:35:43, 13.94s/it] 31%|███▏ | 3132/10000 [12:18:14<26:35:15, 13.94s/it] {'loss': 0.0718, 'learning_rate': 3.436e-05, 'epoch': 4.1} 31%|███▏ | 3132/10000 [12:18:14<26:35:15, 13.94s/it] 31%|███▏ | 3133/10000 [12:18:28<26:36:22, 13.95s/it] {'loss': 0.062, 'learning_rate': 3.4355000000000006e-05, 'epoch': 4.1} 31%|███▏ | 3133/10000 [12:18:28<26:36:22, 13.95s/it] 31%|███▏ | 3134/10000 [12:18:42<26:33:53, 13.93s/it] {'loss': 0.0759, 'learning_rate': 3.435e-05, 'epoch': 4.1} 31%|███▏ | 3134/10000 [12:18:42<26:33:53, 13.93s/it] 31%|███▏ | 3135/10000 [12:18:56<26:32:27, 13.92s/it] {'loss': 0.0616, 'learning_rate': 3.4345e-05, 'epoch': 4.1} 31%|███▏ | 3135/10000 [12:18:56<26:32:27, 13.92s/it] 31%|███▏ | 3136/10000 [12:19:10<26:32:01, 13.92s/it] {'loss': 0.0587, 'learning_rate': 3.434e-05, 'epoch': 4.1} 31%|███▏ | 3136/10000 [12:19:10<26:32:01, 13.92s/it] 31%|███▏ | 3137/10000 [12:19:24<26:36:18, 13.96s/it] {'loss': 0.0639, 'learning_rate': 3.4335e-05, 'epoch': 4.11} 31%|███▏ | 3137/10000 [12:19:24<26:36:18, 13.96s/it] 31%|███▏ | 3138/10000 [12:19:38<26:32:56, 13.93s/it] {'loss': 0.0708, 'learning_rate': 3.433e-05, 'epoch': 4.11} 31%|███▏ | 3138/10000 [12:19:38<26:32:56, 13.93s/it] 31%|███▏ | 3139/10000 [12:19:51<26:29:56, 13.90s/it] {'loss': 0.0912, 'learning_rate': 3.4325e-05, 'epoch': 4.11} 31%|███▏ | 3139/10000 [12:19:51<26:29:56, 13.90s/it] 31%|███▏ | 3140/10000 [12:20:05<26:29:42, 13.90s/it] {'loss': 0.0571, 'learning_rate': 3.4320000000000003e-05, 'epoch': 4.11} 31%|███▏ | 3140/10000 [12:20:05<26:29:42, 13.90s/it] 31%|███▏ | 3141/10000 [12:20:19<26:30:58, 13.92s/it] {'loss': 0.0643, 'learning_rate': 3.4315000000000006e-05, 'epoch': 4.11} 31%|███▏ | 3141/10000 [12:20:19<26:30:58, 13.92s/it] 31%|███▏ | 3142/10000 [12:20:33<26:34:48, 13.95s/it] {'loss': 0.0564, 'learning_rate': 3.431e-05, 'epoch': 4.11} 31%|███▏ | 3142/10000 [12:20:33<26:34:48, 13.95s/it] 31%|███▏ | 3143/10000 [12:20:47<26:35:14, 
13.96s/it] {'loss': 0.076, 'learning_rate': 3.4305000000000004e-05, 'epoch': 4.11} 31%|███▏ | 3143/10000 [12:20:47<26:35:14, 13.96s/it] 31%|███▏ | 3144/10000 [12:21:01<26:33:32, 13.95s/it] {'loss': 0.0652, 'learning_rate': 3.430000000000001e-05, 'epoch': 4.12} 31%|███▏ | 3144/10000 [12:21:01<26:33:32, 13.95s/it] 31%|███▏ | 3145/10000 [12:21:15<26:28:57, 13.91s/it] {'loss': 0.0692, 'learning_rate': 3.4294999999999996e-05, 'epoch': 4.12} 31%|███▏ | 3145/10000 [12:21:15<26:28:57, 13.91s/it] 31%|███▏ | 3146/10000 [12:21:29<26:27:32, 13.90s/it] {'loss': 0.0599, 'learning_rate': 3.429e-05, 'epoch': 4.12} 31%|███▏ | 3146/10000 [12:21:29<26:27:32, 13.90s/it] 31%|███▏ | 3147/10000 [12:21:43<26:28:30, 13.91s/it] {'loss': 0.0776, 'learning_rate': 3.4285e-05, 'epoch': 4.12} 31%|███▏ | 3147/10000 [12:21:43<26:28:30, 13.91s/it] 31%|███▏ | 3148/10000 [12:21:57<26:27:18, 13.90s/it] {'loss': 0.0618, 'learning_rate': 3.4280000000000004e-05, 'epoch': 4.12} 31%|███▏ | 3148/10000 [12:21:57<26:27:18, 13.90s/it] 31%|███▏ | 3149/10000 [12:22:11<26:29:35, 13.92s/it] {'loss': 0.0598, 'learning_rate': 3.4275e-05, 'epoch': 4.12} 31%|███▏ | 3149/10000 [12:22:11<26:29:35, 13.92s/it] 32%|███▏ | 3150/10000 [12:22:25<26:34:11, 13.96s/it] {'loss': 0.0606, 'learning_rate': 3.427e-05, 'epoch': 4.12} 32%|███▏ | 3150/10000 [12:22:25<26:34:11, 13.96s/it] 32%|███▏ | 3151/10000 [12:22:38<26:28:05, 13.91s/it] {'loss': 0.0627, 'learning_rate': 3.4265000000000005e-05, 'epoch': 4.12} 32%|███▏ | 3151/10000 [12:22:39<26:28:05, 13.91s/it] 32%|███▏ | 3152/10000 [12:22:52<26:28:54, 13.92s/it] {'loss': 0.0636, 'learning_rate': 3.426e-05, 'epoch': 4.13} 32%|███▏ | 3152/10000 [12:22:52<26:28:54, 13.92s/it] 32%|███▏ | 3153/10000 [12:23:06<26:29:21, 13.93s/it] {'loss': 0.0623, 'learning_rate': 3.4255e-05, 'epoch': 4.13} 32%|███▏ | 3153/10000 [12:23:06<26:29:21, 13.93s/it] 32%|███▏ | 3154/10000 [12:23:20<26:28:04, 13.92s/it] {'loss': 0.0498, 'learning_rate': 3.4250000000000006e-05, 'epoch': 4.13} 32%|███▏ | 3154/10000 
[12:23:20<26:28:04, 13.92s/it] 32%|███▏ | 3155/10000 [12:23:34<26:26:55, 13.91s/it] {'loss': 0.0603, 'learning_rate': 3.4245e-05, 'epoch': 4.13} 32%|███▏ | 3155/10000 [12:23:34<26:26:55, 13.91s/it] 32%|███▏ | 3156/10000 [12:23:48<26:24:38, 13.89s/it] {'loss': 0.0851, 'learning_rate': 3.424e-05, 'epoch': 4.13} 32%|███▏ | 3156/10000 [12:23:48<26:24:38, 13.89s/it] 32%|███▏ | 3157/10000 [12:24:02<26:25:57, 13.91s/it] {'loss': 0.0575, 'learning_rate': 3.4235e-05, 'epoch': 4.13} 32%|███▏ | 3157/10000 [12:24:02<26:25:57, 13.91s/it] 32%|███▏ | 3158/10000 [12:24:16<26:23:45, 13.89s/it] {'loss': 0.0544, 'learning_rate': 3.423e-05, 'epoch': 4.13} 32%|███▏ | 3158/10000 [12:24:16<26:23:45, 13.89s/it] 32%|███▏ | 3159/10000 [12:24:30<26:25:44, 13.91s/it] {'loss': 0.0658, 'learning_rate': 3.4225e-05, 'epoch': 4.13} 32%|███▏ | 3159/10000 [12:24:30<26:25:44, 13.91s/it] 32%|███▏ | 3160/10000 [12:24:44<26:23:31, 13.89s/it] {'loss': 0.0551, 'learning_rate': 3.422e-05, 'epoch': 4.14} 32%|███▏ | 3160/10000 [12:24:44<26:23:31, 13.89s/it] 32%|███▏ | 3161/10000 [12:24:57<26:23:29, 13.89s/it] {'loss': 0.0628, 'learning_rate': 3.4215000000000004e-05, 'epoch': 4.14} 32%|███▏ | 3161/10000 [12:24:58<26:23:29, 13.89s/it] 32%|███▏ | 3162/10000 [12:25:11<26:23:15, 13.89s/it] {'loss': 0.0805, 'learning_rate': 3.4210000000000006e-05, 'epoch': 4.14} 32%|███▏ | 3162/10000 [12:25:11<26:23:15, 13.89s/it] 32%|███▏ | 3163/10000 [12:25:25<26:24:22, 13.90s/it] {'loss': 0.0698, 'learning_rate': 3.4205e-05, 'epoch': 4.14} 32%|███▏ | 3163/10000 [12:25:25<26:24:22, 13.90s/it] 32%|███▏ | 3164/10000 [12:25:39<26:24:08, 13.90s/it] {'loss': 0.0388, 'learning_rate': 3.4200000000000005e-05, 'epoch': 4.14} 32%|███▏ | 3164/10000 [12:25:39<26:24:08, 13.90s/it] 32%|███▏ | 3165/10000 [12:25:53<26:25:24, 13.92s/it] {'loss': 0.071, 'learning_rate': 3.4195e-05, 'epoch': 4.14} 32%|███▏ | 3165/10000 [12:25:53<26:25:24, 13.92s/it] 32%|███▏ | 3166/10000 [12:26:07<26:25:33, 13.92s/it] {'loss': 0.0588, 'learning_rate': 3.419e-05, 
'epoch': 4.14} 32%|███▏ | 3166/10000 [12:26:07<26:25:33, 13.92s/it] 32%|███▏ | 3167/10000 [12:26:21<26:22:59, 13.90s/it] {'loss': 0.0578, 'learning_rate': 3.4185e-05, 'epoch': 4.15} 32%|███▏ | 3167/10000 [12:26:21<26:22:59, 13.90s/it] 32%|███▏ | 3168/10000 [12:26:35<26:22:27, 13.90s/it] {'loss': 0.0749, 'learning_rate': 3.418e-05, 'epoch': 4.15} 32%|███▏ | 3168/10000 [12:26:35<26:22:27, 13.90s/it] 32%|███▏ | 3169/10000 [12:26:49<26:26:00, 13.93s/it] {'loss': 0.0649, 'learning_rate': 3.4175000000000004e-05, 'epoch': 4.15} 32%|███▏ | 3169/10000 [12:26:49<26:26:00, 13.93s/it] 32%|███▏ | 3170/10000 [12:27:03<26:24:20, 13.92s/it] {'loss': 0.0759, 'learning_rate': 3.417e-05, 'epoch': 4.15} 32%|███▏ | 3170/10000 [12:27:03<26:24:20, 13.92s/it] 32%|███▏ | 3171/10000 [12:27:17<26:25:18, 13.93s/it] {'loss': 0.0627, 'learning_rate': 3.4165e-05, 'epoch': 4.15} 32%|███▏ | 3171/10000 [12:27:17<26:25:18, 13.93s/it] 32%|███▏ | 3172/10000 [12:27:31<26:25:41, 13.93s/it] {'loss': 0.0572, 'learning_rate': 3.4160000000000005e-05, 'epoch': 4.15} 32%|███▏ | 3172/10000 [12:27:31<26:25:41, 13.93s/it] 32%|███▏ | 3173/10000 [12:27:45<26:27:17, 13.95s/it] {'loss': 0.0743, 'learning_rate': 3.4155e-05, 'epoch': 4.15} 32%|███▏ | 3173/10000 [12:27:45<26:27:17, 13.95s/it] 32%|███▏ | 3174/10000 [12:27:58<26:24:39, 13.93s/it] {'loss': 0.0668, 'learning_rate': 3.415e-05, 'epoch': 4.15} 32%|███▏ | 3174/10000 [12:27:59<26:24:39, 13.93s/it] 32%|███▏ | 3175/10000 [12:28:12<26:24:07, 13.93s/it] {'loss': 0.0596, 'learning_rate': 3.4145e-05, 'epoch': 4.16} 32%|███▏ | 3175/10000 [12:28:12<26:24:07, 13.93s/it] 32%|███▏ | 3176/10000 [12:28:26<26:27:05, 13.95s/it] {'loss': 0.0774, 'learning_rate': 3.414e-05, 'epoch': 4.16} 32%|███▏ | 3176/10000 [12:28:26<26:27:05, 13.95s/it] 32%|███▏ | 3177/10000 [12:28:40<26:23:29, 13.92s/it] {'loss': 0.0683, 'learning_rate': 3.4135e-05, 'epoch': 4.16} 32%|███▏ | 3177/10000 [12:28:40<26:23:29, 13.92s/it] 32%|███▏ | 3178/10000 [12:28:54<26:20:21, 13.90s/it] {'loss': 0.0602, 
'learning_rate': 3.413e-05, 'epoch': 4.16} 32%|███▏ | 3178/10000 [12:28:54<26:20:21, 13.90s/it] 32%|███▏ | 3179/10000 [12:29:08<26:25:03, 13.94s/it] {'loss': 0.0524, 'learning_rate': 3.4125e-05, 'epoch': 4.16} 32%|███▏ | 3179/10000 [12:29:08<26:25:03, 13.94s/it] 32%|███▏ | 3180/10000 [12:29:22<26:24:52, 13.94s/it] {'loss': 0.0667, 'learning_rate': 3.412e-05, 'epoch': 4.16} 32%|███▏ | 3180/10000 [12:29:22<26:24:52, 13.94s/it] 32%|███▏ | 3181/10000 [12:29:36<26:25:04, 13.95s/it] {'loss': 0.0539, 'learning_rate': 3.4115e-05, 'epoch': 4.16} 32%|███▏ | 3181/10000 [12:29:36<26:25:04, 13.95s/it] 32%|███▏ | 3182/10000 [12:29:50<26:21:46, 13.92s/it] {'loss': 0.068, 'learning_rate': 3.4110000000000004e-05, 'epoch': 4.16} 32%|███▏ | 3182/10000 [12:29:50<26:21:46, 13.92s/it] 32%|███▏ | 3183/10000 [12:30:04<26:19:36, 13.90s/it] {'loss': 0.0614, 'learning_rate': 3.4105000000000006e-05, 'epoch': 4.17} 32%|███▏ | 3183/10000 [12:30:04<26:19:36, 13.90s/it] 32%|███▏ | 3184/10000 [12:30:18<26:19:08, 13.90s/it] {'loss': 0.0737, 'learning_rate': 3.41e-05, 'epoch': 4.17} 32%|███▏ | 3184/10000 [12:30:18<26:19:08, 13.90s/it] 32%|███▏ | 3185/10000 [12:30:32<26:16:42, 13.88s/it] {'loss': 0.0668, 'learning_rate': 3.4095e-05, 'epoch': 4.17} 32%|███▏ | 3185/10000 [12:30:32<26:16:42, 13.88s/it] 32%|███▏ | 3186/10000 [12:30:45<26:17:47, 13.89s/it] {'loss': 0.0651, 'learning_rate': 3.409e-05, 'epoch': 4.17} 32%|███▏ | 3186/10000 [12:30:45<26:17:47, 13.89s/it] 32%|███▏ | 3187/10000 [12:30:59<26:16:16, 13.88s/it] {'loss': 0.0557, 'learning_rate': 3.4085e-05, 'epoch': 4.17} 32%|███▏ | 3187/10000 [12:30:59<26:16:16, 13.88s/it] 32%|███▏ | 3188/10000 [12:31:13<26:20:11, 13.92s/it] {'loss': 0.0641, 'learning_rate': 3.408e-05, 'epoch': 4.17} 32%|███▏ | 3188/10000 [12:31:13<26:20:11, 13.92s/it] 32%|███▏ | 3189/10000 [12:31:27<26:15:32, 13.88s/it] {'loss': 0.0429, 'learning_rate': 3.4075e-05, 'epoch': 4.17} 32%|███▏ | 3189/10000 [12:31:27<26:15:32, 13.88s/it] 32%|███▏ | 3190/10000 [12:31:41<26:16:50, 
13.89s/it] {'loss': 0.0534, 'learning_rate': 3.4070000000000004e-05, 'epoch': 4.18} 32%|███▏ | 3190/10000 [12:31:41<26:16:50, 13.89s/it] 32%|███▏ | 3191/10000 [12:31:55<26:16:48, 13.89s/it] {'loss': 0.0726, 'learning_rate': 3.4065e-05, 'epoch': 4.18} 32%|███▏ | 3191/10000 [12:31:55<26:16:48, 13.89s/it] 32%|███▏ | 3192/10000 [12:32:09<26:21:24, 13.94s/it] {'loss': 0.0645, 'learning_rate': 3.406e-05, 'epoch': 4.18} 32%|███▏ | 3192/10000 [12:32:09<26:21:24, 13.94s/it] 32%|███▏ | 3193/10000 [12:32:23<26:20:45, 13.93s/it] {'loss': 0.0604, 'learning_rate': 3.4055000000000005e-05, 'epoch': 4.18} 32%|███▏ | 3193/10000 [12:32:23<26:20:45, 13.93s/it] 32%|███▏ | 3194/10000 [12:32:37<26:16:21, 13.90s/it] {'loss': 0.0744, 'learning_rate': 3.405e-05, 'epoch': 4.18} 32%|███▏ | 3194/10000 [12:32:37<26:16:21, 13.90s/it] 32%|███▏ | 3195/10000 [12:32:51<26:18:41, 13.92s/it] {'loss': 0.0628, 'learning_rate': 3.4045e-05, 'epoch': 4.18} 32%|███▏ | 3195/10000 [12:32:51<26:18:41, 13.92s/it] 32%|███▏ | 3196/10000 [12:33:05<26:19:25, 13.93s/it] {'loss': 0.0592, 'learning_rate': 3.404e-05, 'epoch': 4.18} 32%|███▏ | 3196/10000 [12:33:05<26:19:25, 13.93s/it] 32%|███▏ | 3197/10000 [12:33:18<26:17:26, 13.91s/it] {'loss': 0.0728, 'learning_rate': 3.4035e-05, 'epoch': 4.18} 32%|███▏ | 3197/10000 [12:33:19<26:17:26, 13.91s/it] 32%|███▏ | 3198/10000 [12:33:33<26:23:50, 13.97s/it] {'loss': 0.0555, 'learning_rate': 3.403e-05, 'epoch': 4.19} 32%|███▏ | 3198/10000 [12:33:33<26:23:50, 13.97s/it] 32%|███▏ | 3199/10000 [12:33:46<26:21:12, 13.95s/it] {'loss': 0.0645, 'learning_rate': 3.4025e-05, 'epoch': 4.19} 32%|███▏ | 3199/10000 [12:33:47<26:21:12, 13.95s/it] 32%|███▏ | 3200/10000 [12:34:00<26:19:56, 13.94s/it] {'loss': 0.0696, 'learning_rate': 3.402e-05, 'epoch': 4.19} 32%|███▏ | 3200/10000 [12:34:00<26:19:56, 13.94s/it] 32%|███▏ | 3201/10000 [12:34:15<26:26:34, 14.00s/it] {'loss': 0.0636, 'learning_rate': 3.4015000000000006e-05, 'epoch': 4.19} 32%|███▏ | 3201/10000 [12:34:15<26:26:34, 14.00s/it] 
32%|███▏ | 3202/10000 [12:34:28<26:24:27, 13.98s/it] {'loss': 0.0602, 'learning_rate': 3.401e-05, 'epoch': 4.19} 32%|███▏ | 3202/10000 [12:34:29<26:24:27, 13.98s/it] 32%|███▏ | 3203/10000 [12:34:42<26:21:04, 13.96s/it] {'loss': 0.0634, 'learning_rate': 3.4005000000000004e-05, 'epoch': 4.19} 32%|███▏ | 3203/10000 [12:34:42<26:21:04, 13.96s/it] 32%|███▏ | 3204/10000 [12:34:56<26:20:40, 13.96s/it] {'loss': 0.0777, 'learning_rate': 3.4000000000000007e-05, 'epoch': 4.19} 32%|███▏ | 3204/10000 [12:34:56<26:20:40, 13.96s/it] 32%|███▏ | 3205/10000 [12:35:10<26:20:11, 13.95s/it] {'loss': 0.0619, 'learning_rate': 3.3995e-05, 'epoch': 4.2} 32%|███▏ | 3205/10000 [12:35:10<26:20:11, 13.95s/it] 32%|███▏ | 3206/10000 [12:35:24<26:21:17, 13.96s/it] {'loss': 0.0689, 'learning_rate': 3.399e-05, 'epoch': 4.2} 32%|███▏ | 3206/10000 [12:35:24<26:21:17, 13.96s/it] 32%|███▏ | 3207/10000 [12:35:38<26:16:56, 13.93s/it] {'loss': 0.0668, 'learning_rate': 3.3985e-05, 'epoch': 4.2} 32%|███▏ | 3207/10000 [12:35:38<26:16:56, 13.93s/it] 32%|███▏ | 3208/10000 [12:35:52<26:14:17, 13.91s/it] {'loss': 0.054, 'learning_rate': 3.398e-05, 'epoch': 4.2} 32%|███▏ | 3208/10000 [12:35:52<26:14:17, 13.91s/it] 32%|███▏ | 3209/10000 [12:36:06<26:19:45, 13.96s/it] {'loss': 0.0553, 'learning_rate': 3.3975e-05, 'epoch': 4.2} 32%|███▏ | 3209/10000 [12:36:06<26:19:45, 13.96s/it] 32%|███▏ | 3210/10000 [12:36:20<26:18:01, 13.94s/it] {'loss': 0.0771, 'learning_rate': 3.397e-05, 'epoch': 4.2} 32%|███▏ | 3210/10000 [12:36:20<26:18:01, 13.94s/it] 32%|███▏ | 3211/10000 [12:36:34<26:20:32, 13.97s/it] {'loss': 0.0689, 'learning_rate': 3.3965000000000004e-05, 'epoch': 4.2} 32%|███▏ | 3211/10000 [12:36:34<26:20:32, 13.97s/it] 32%|███▏ | 3212/10000 [12:36:48<26:21:08, 13.98s/it] {'loss': 0.0637, 'learning_rate': 3.396e-05, 'epoch': 4.2} 32%|███▏ | 3212/10000 [12:36:48<26:21:08, 13.98s/it] 32%|███▏ | 3213/10000 [12:37:02<26:22:13, 13.99s/it] {'loss': 0.075, 'learning_rate': 3.3955e-05, 'epoch': 4.21} 32%|███▏ | 3213/10000 
[12:37:02<26:22:13, 13.99s/it] 32%|███▏ | 3214/10000 [12:37:16<26:18:55, 13.96s/it] {'loss': 0.0644, 'learning_rate': 3.3950000000000005e-05, 'epoch': 4.21} 32%|███▏ | 3214/10000 [12:37:16<26:18:55, 13.96s/it] 32%|███▏ | 3215/10000 [12:37:30<26:21:34, 13.99s/it] {'loss': 0.059, 'learning_rate': 3.3945e-05, 'epoch': 4.21} 32%|███▏ | 3215/10000 [12:37:30<26:21:34, 13.99s/it] 32%|███▏ | 3216/10000 [12:37:44<26:23:31, 14.01s/it] {'loss': 0.0621, 'learning_rate': 3.394e-05, 'epoch': 4.21} 32%|███▏ | 3216/10000 [12:37:44<26:23:31, 14.01s/it] 32%|███▏ | 3217/10000 [12:37:58<26:19:04, 13.97s/it] {'loss': 0.0646, 'learning_rate': 3.3935e-05, 'epoch': 4.21} 32%|███▏ | 3217/10000 [12:37:58<26:19:04, 13.97s/it] 32%|███▏ | 3218/10000 [12:38:12<26:20:27, 13.98s/it] {'loss': 0.0639, 'learning_rate': 3.393e-05, 'epoch': 4.21} 32%|███▏ | 3218/10000 [12:38:12<26:20:27, 13.98s/it] 32%|███▏ | 3219/10000 [12:38:26<26:18:22, 13.97s/it] {'loss': 0.0668, 'learning_rate': 3.3925e-05, 'epoch': 4.21} 32%|███▏ | 3219/10000 [12:38:26<26:18:22, 13.97s/it] 32%|███▏ | 3220/10000 [12:38:40<26:17:27, 13.96s/it] {'loss': 0.0652, 'learning_rate': 3.392e-05, 'epoch': 4.21} 32%|███▏ | 3220/10000 [12:38:40<26:17:27, 13.96s/it] 32%|███▏ | 3221/10000 [12:38:54<26:17:59, 13.97s/it] {'loss': 0.0707, 'learning_rate': 3.3915e-05, 'epoch': 4.22} 32%|███▏ | 3221/10000 [12:38:54<26:17:59, 13.97s/it] 32%|███▏ | 3222/10000 [12:39:08<26:11:42, 13.91s/it] {'loss': 0.0563, 'learning_rate': 3.3910000000000006e-05, 'epoch': 4.22} 32%|███▏ | 3222/10000 [12:39:08<26:11:42, 13.91s/it] 32%|███▏ | 3223/10000 [12:39:21<26:11:18, 13.91s/it] {'loss': 0.0704, 'learning_rate': 3.3905e-05, 'epoch': 4.22} 32%|███▏ | 3223/10000 [12:39:21<26:11:18, 13.91s/it] 32%|███▏ | 3224/10000 [12:39:35<26:10:42, 13.91s/it] {'loss': 0.0543, 'learning_rate': 3.3900000000000004e-05, 'epoch': 4.22} 32%|███▏ | 3224/10000 [12:39:35<26:10:42, 13.91s/it] 32%|███▏ | 3225/10000 [12:39:49<26:07:37, 13.88s/it] {'loss': 0.0689, 'learning_rate': 3.3895e-05, 
'epoch': 4.22} 32%|███▏ | 3225/10000 [12:39:49<26:07:37, 13.88s/it] 32%|███▏ | 3226/10000 [12:40:03<26:06:26, 13.87s/it] {'loss': 0.059, 'learning_rate': 3.389e-05, 'epoch': 4.22} 32%|███▏ | 3226/10000 [12:40:03<26:06:26, 13.87s/it] 32%|███▏ | 3227/10000 [12:40:17<26:07:12, 13.88s/it] {'loss': 0.0578, 'learning_rate': 3.3885e-05, 'epoch': 4.22} 32%|███▏ | 3227/10000 [12:40:17<26:07:12, 13.88s/it] 32%|███▏ | 3228/10000 [12:40:31<26:09:42, 13.91s/it] {'loss': 0.0607, 'learning_rate': 3.388e-05, 'epoch': 4.23} 32%|███▏ | 3228/10000 [12:40:31<26:09:42, 13.91s/it] 32%|███▏ | 3229/10000 [12:40:45<26:11:36, 13.93s/it] {'loss': 0.056, 'learning_rate': 3.3875000000000003e-05, 'epoch': 4.23} 32%|███▏ | 3229/10000 [12:40:45<26:11:36, 13.93s/it] 32%|███▏ | 3230/10000 [12:40:59<26:13:05, 13.94s/it] {'loss': 0.0763, 'learning_rate': 3.387e-05, 'epoch': 4.23} 32%|███▏ | 3230/10000 [12:40:59<26:13:05, 13.94s/it] 32%|███▏ | 3231/10000 [12:41:13<26:11:18, 13.93s/it] {'loss': 0.0537, 'learning_rate': 3.3865e-05, 'epoch': 4.23} 32%|███▏ | 3231/10000 [12:41:13<26:11:18, 13.93s/it] 32%|███▏ | 3232/10000 [12:41:27<26:08:24, 13.90s/it] {'loss': 0.058, 'learning_rate': 3.3860000000000004e-05, 'epoch': 4.23} 32%|███▏ | 3232/10000 [12:41:27<26:08:24, 13.90s/it] 32%|███▏ | 3233/10000 [12:41:40<26:07:02, 13.89s/it] {'loss': 0.0592, 'learning_rate': 3.3855e-05, 'epoch': 4.23} 32%|███▏ | 3233/10000 [12:41:41<26:07:02, 13.89s/it] 32%|███▏ | 3234/10000 [12:41:55<26:14:11, 13.96s/it] {'loss': 0.0523, 'learning_rate': 3.385e-05, 'epoch': 4.23} 32%|███▏ | 3234/10000 [12:41:55<26:14:11, 13.96s/it] 32%|███▏ | 3235/10000 [12:42:08<26:09:29, 13.92s/it] {'loss': 0.0657, 'learning_rate': 3.3845e-05, 'epoch': 4.23} 32%|███▏ | 3235/10000 [12:42:08<26:09:29, 13.92s/it] 32%|███▏ | 3236/10000 [12:42:22<26:11:22, 13.94s/it] {'loss': 0.0533, 'learning_rate': 3.384e-05, 'epoch': 4.24} 32%|███▏ | 3236/10000 [12:42:22<26:11:22, 13.94s/it] 32%|███▏ | 3237/10000 [12:42:36<26:16:36, 13.99s/it] {'loss': 0.0709, 
'learning_rate': 3.3835e-05, 'epoch': 4.24} 32%|███▏ | 3237/10000 [12:42:37<26:16:36, 13.99s/it] 32%|███▏ | 3238/10000 [12:42:50<26:16:09, 13.99s/it] {'loss': 0.051, 'learning_rate': 3.383e-05, 'epoch': 4.24} 32%|███▏ | 3238/10000 [12:42:51<26:16:09, 13.99s/it] 32%|███▏ | 3239/10000 [12:43:04<26:12:13, 13.95s/it] {'loss': 0.0668, 'learning_rate': 3.3825e-05, 'epoch': 4.24} 32%|███▏ | 3239/10000 [12:43:04<26:12:13, 13.95s/it] 32%|███▏ | 3240/10000 [12:43:18<26:12:06, 13.95s/it] {'loss': 0.0657, 'learning_rate': 3.3820000000000005e-05, 'epoch': 4.24} 32%|███▏ | 3240/10000 [12:43:18<26:12:06, 13.95s/it] 32%|███▏ | 3241/10000 [12:43:32<26:09:23, 13.93s/it] {'loss': 0.0509, 'learning_rate': 3.3815e-05, 'epoch': 4.24} 32%|███▏ | 3241/10000 [12:43:32<26:09:23, 13.93s/it] 32%|███▏ | 3242/10000 [12:43:46<26:07:08, 13.91s/it] {'loss': 0.0755, 'learning_rate': 3.381e-05, 'epoch': 4.24} 32%|███▏ | 3242/10000 [12:43:46<26:07:08, 13.91s/it] 32%|███▏ | 3243/10000 [12:44:00<26:04:25, 13.89s/it] {'loss': 0.053, 'learning_rate': 3.3805000000000006e-05, 'epoch': 4.24} 32%|███▏ | 3243/10000 [12:44:00<26:04:25, 13.89s/it] 32%|███▏ | 3244/10000 [12:44:14<26:03:05, 13.88s/it] {'loss': 0.0674, 'learning_rate': 3.38e-05, 'epoch': 4.25} 32%|███▏ | 3244/10000 [12:44:14<26:03:05, 13.88s/it] 32%|███▏ | 3245/10000 [12:44:28<26:05:23, 13.90s/it] {'loss': 0.0561, 'learning_rate': 3.3795e-05, 'epoch': 4.25} 32%|███▏ | 3245/10000 [12:44:28<26:05:23, 13.90s/it] 32%|███▏ | 3246/10000 [12:44:42<26:03:15, 13.89s/it] {'loss': 0.068, 'learning_rate': 3.379e-05, 'epoch': 4.25} 32%|███▏ | 3246/10000 [12:44:42<26:03:15, 13.89s/it] 32%|███▏ | 3247/10000 [12:44:56<26:06:58, 13.92s/it] {'loss': 0.0566, 'learning_rate': 3.3785e-05, 'epoch': 4.25} 32%|███▏ | 3247/10000 [12:44:56<26:06:58, 13.92s/it] 32%|███▏ | 3248/10000 [12:45:09<26:03:30, 13.89s/it] {'loss': 0.0598, 'learning_rate': 3.378e-05, 'epoch': 4.25} 32%|███▏ | 3248/10000 [12:45:09<26:03:30, 13.89s/it] 32%|███▏ | 3249/10000 [12:45:23<26:03:23, 
13.89s/it] {'loss': 0.0648, 'learning_rate': 3.3775e-05, 'epoch': 4.25} 32%|███▏ | 3249/10000 [12:45:23<26:03:23, 13.89s/it] 32%|███▎ | 3250/10000 [12:45:37<26:05:45, 13.92s/it] {'loss': 0.0666, 'learning_rate': 3.3770000000000004e-05, 'epoch': 4.25} 32%|███▎ | 3250/10000 [12:45:37<26:05:45, 13.92s/it] 33%|███▎ | 3251/10000 [12:45:51<26:04:06, 13.91s/it] {'loss': 0.0639, 'learning_rate': 3.3765e-05, 'epoch': 4.26} 33%|███▎ | 3251/10000 [12:45:51<26:04:06, 13.91s/it] 33%|███▎ | 3252/10000 [12:46:05<26:02:25, 13.89s/it] {'loss': 0.0617, 'learning_rate': 3.376e-05, 'epoch': 4.26} 33%|███▎ | 3252/10000 [12:46:05<26:02:25, 13.89s/it] 33%|███▎ | 3253/10000 [12:46:19<26:00:37, 13.88s/it] {'loss': 0.0646, 'learning_rate': 3.3755000000000005e-05, 'epoch': 4.26} 33%|███▎ | 3253/10000 [12:46:19<26:00:37, 13.88s/it] 33%|███▎ | 3254/10000 [12:46:33<26:06:23, 13.93s/it] {'loss': 0.0649, 'learning_rate': 3.375000000000001e-05, 'epoch': 4.26} 33%|███▎ | 3254/10000 [12:46:33<26:06:23, 13.93s/it] 33%|███▎ | 3255/10000 [12:46:47<26:07:37, 13.94s/it] {'loss': 0.0617, 'learning_rate': 3.3745e-05, 'epoch': 4.26} 33%|███▎ | 3255/10000 [12:46:47<26:07:37, 13.94s/it] 33%|███▎ | 3256/10000 [12:47:01<26:05:21, 13.93s/it] {'loss': 0.0468, 'learning_rate': 3.374e-05, 'epoch': 4.26} 33%|███▎ | 3256/10000 [12:47:01<26:05:21, 13.93s/it] 33%|███▎ | 3257/10000 [12:47:14<25:59:14, 13.87s/it] {'loss': 0.0653, 'learning_rate': 3.3735e-05, 'epoch': 4.26} 33%|███▎ | 3257/10000 [12:47:15<25:59:14, 13.87s/it] 33%|███▎ | 3258/10000 [12:47:28<25:58:21, 13.87s/it] {'loss': 0.0642, 'learning_rate': 3.373e-05, 'epoch': 4.26} 33%|███▎ | 3258/10000 [12:47:28<25:58:21, 13.87s/it] 33%|███▎ | 3259/10000 [12:47:42<25:54:30, 13.84s/it] {'loss': 0.0569, 'learning_rate': 3.3725e-05, 'epoch': 4.27} 33%|███▎ | 3259/10000 [12:47:42<25:54:30, 13.84s/it] 33%|███▎ | 3260/10000 [12:47:56<25:58:18, 13.87s/it] {'loss': 0.0673, 'learning_rate': 3.372e-05, 'epoch': 4.27} 33%|███▎ | 3260/10000 [12:47:56<25:58:18, 13.87s/it] 
33%|███▎ | 3261/10000 [12:48:10<26:02:12, 13.91s/it] {'loss': 0.0666, 'learning_rate': 3.3715000000000005e-05, 'epoch': 4.27} 33%|███▎ | 3261/10000 [12:48:10<26:02:12, 13.91s/it] 33%|███▎ | 3262/10000 [12:48:24<25:58:30, 13.88s/it] {'loss': 0.0547, 'learning_rate': 3.371e-05, 'epoch': 4.27} 33%|███▎ | 3262/10000 [12:48:24<25:58:30, 13.88s/it] 33%|███▎ | 3263/10000 [12:48:38<25:59:07, 13.89s/it] {'loss': 0.0782, 'learning_rate': 3.3705000000000003e-05, 'epoch': 4.27} 33%|███▎ | 3263/10000 [12:48:38<25:59:07, 13.89s/it] 33%|███▎ | 3264/10000 [12:48:52<26:02:52, 13.92s/it] {'loss': 0.0637, 'learning_rate': 3.3700000000000006e-05, 'epoch': 4.27} 33%|███▎ | 3264/10000 [12:48:52<26:02:52, 13.92s/it] 33%|███▎ | 3265/10000 [12:49:06<26:02:12, 13.92s/it] {'loss': 0.067, 'learning_rate': 3.3695e-05, 'epoch': 4.27} 33%|███▎ | 3265/10000 [12:49:06<26:02:12, 13.92s/it] 33%|███▎ | 3266/10000 [12:49:20<26:00:02, 13.90s/it] {'loss': 0.0668, 'learning_rate': 3.369e-05, 'epoch': 4.27} 33%|███▎ | 3266/10000 [12:49:20<26:00:02, 13.90s/it] 33%|███▎ | 3267/10000 [12:49:34<26:02:14, 13.92s/it] {'loss': 0.0585, 'learning_rate': 3.3685e-05, 'epoch': 4.28} 33%|███▎ | 3267/10000 [12:49:34<26:02:14, 13.92s/it] 33%|███▎ | 3268/10000 [12:49:48<26:11:14, 14.00s/it] {'loss': 0.075, 'learning_rate': 3.368e-05, 'epoch': 4.28} 33%|███▎ | 3268/10000 [12:49:48<26:11:14, 14.00s/it] 33%|███▎ | 3269/10000 [12:50:02<26:12:05, 14.01s/it] {'loss': 0.0538, 'learning_rate': 3.3675e-05, 'epoch': 4.28} 33%|███▎ | 3269/10000 [12:50:02<26:12:05, 14.01s/it] 33%|███▎ | 3270/10000 [12:50:16<26:09:15, 13.99s/it] {'loss': 0.0591, 'learning_rate': 3.367e-05, 'epoch': 4.28} 33%|███▎ | 3270/10000 [12:50:16<26:09:15, 13.99s/it] 33%|███▎ | 3271/10000 [12:50:30<26:07:59, 13.98s/it] {'loss': 0.0645, 'learning_rate': 3.3665000000000004e-05, 'epoch': 4.28} 33%|███▎ | 3271/10000 [12:50:30<26:07:59, 13.98s/it] 33%|███▎ | 3272/10000 [12:50:44<26:12:33, 14.02s/it] {'loss': 0.069, 'learning_rate': 3.366e-05, 'epoch': 4.28} 33%|███▎ 
| 3272/10000 [12:50:44<26:12:33, 14.02s/it] 33%|███▎ | 3273/10000 [12:50:58<26:06:56, 13.98s/it] {'loss': 0.0701, 'learning_rate': 3.3655e-05, 'epoch': 4.28} 33%|███▎ | 3273/10000 [12:50:58<26:06:56, 13.98s/it] 33%|███▎ | 3274/10000 [12:51:12<26:06:50, 13.98s/it] {'loss': 0.0682, 'learning_rate': 3.3650000000000005e-05, 'epoch': 4.29} 33%|███▎ | 3274/10000 [12:51:12<26:06:50, 13.98s/it] 33%|███▎ | 3275/10000 [12:51:25<26:03:36, 13.95s/it] {'loss': 0.0669, 'learning_rate': 3.364500000000001e-05, 'epoch': 4.29} 33%|███▎ | 3275/10000 [12:51:26<26:03:36, 13.95s/it] 33%|███▎ | 3276/10000 [12:51:39<26:03:14, 13.95s/it] {'loss': 0.068, 'learning_rate': 3.3639999999999996e-05, 'epoch': 4.29} 33%|███▎ | 3276/10000 [12:51:39<26:03:14, 13.95s/it] 33%|███▎ | 3277/10000 [12:51:53<26:03:09, 13.95s/it] {'loss': 0.0699, 'learning_rate': 3.3635e-05, 'epoch': 4.29} 33%|███▎ | 3277/10000 [12:51:53<26:03:09, 13.95s/it] 33%|███▎ | 3278/10000 [12:52:07<26:05:15, 13.97s/it] {'loss': 0.0557, 'learning_rate': 3.363e-05, 'epoch': 4.29} 33%|███▎ | 3278/10000 [12:52:07<26:05:15, 13.97s/it] 33%|███▎ | 3279/10000 [12:52:21<26:03:33, 13.96s/it] {'loss': 0.0508, 'learning_rate': 3.3625000000000004e-05, 'epoch': 4.29} 33%|███▎ | 3279/10000 [12:52:21<26:03:33, 13.96s/it] 33%|███▎ | 3280/10000 [12:52:35<26:00:02, 13.93s/it] {'loss': 0.0569, 'learning_rate': 3.362e-05, 'epoch': 4.29} 33%|███▎ | 3280/10000 [12:52:35<26:00:02, 13.93s/it] 33%|███▎ | 3281/10000 [12:52:49<25:59:54, 13.93s/it] {'loss': 0.0668, 'learning_rate': 3.3615e-05, 'epoch': 4.29} 33%|███▎ | 3281/10000 [12:52:49<25:59:54, 13.93s/it] 33%|███▎ | 3282/10000 [12:53:03<26:02:01, 13.95s/it] {'loss': 0.0778, 'learning_rate': 3.3610000000000005e-05, 'epoch': 4.3} 33%|███▎ | 3282/10000 [12:53:03<26:02:01, 13.95s/it] 33%|███▎ | 3283/10000 [12:53:17<26:00:45, 13.94s/it] {'loss': 0.0653, 'learning_rate': 3.3605e-05, 'epoch': 4.3} 33%|███▎ | 3283/10000 [12:53:17<26:00:45, 13.94s/it] 33%|███▎ | 3284/10000 [12:53:31<26:00:12, 13.94s/it] {'loss': 
0.0664, 'learning_rate': 3.3600000000000004e-05, 'epoch': 4.3} 33%|███▎ | 3284/10000 [12:53:31<26:00:12, 13.94s/it] 33%|███▎ | 3285/10000 [12:53:45<25:59:31, 13.93s/it] {'loss': 0.059, 'learning_rate': 3.3595000000000006e-05, 'epoch': 4.3} 33%|███▎ | 3285/10000 [12:53:45<25:59:31, 13.93s/it] 33%|███▎ | 3286/10000 [12:53:59<26:02:30, 13.96s/it] {'loss': 0.0717, 'learning_rate': 3.359e-05, 'epoch': 4.3} 33%|███▎ | 3286/10000 [12:53:59<26:02:30, 13.96s/it] 33%|███▎ | 3287/10000 [12:54:13<26:05:15, 13.99s/it] {'loss': 0.0806, 'learning_rate': 3.3585e-05, 'epoch': 4.3} 33%|███▎ | 3287/10000 [12:54:13<26:05:15, 13.99s/it] 33%|███▎ | 3288/10000 [12:54:27<26:06:33, 14.00s/it] {'loss': 0.0548, 'learning_rate': 3.358e-05, 'epoch': 4.3} 33%|███▎ | 3288/10000 [12:54:27<26:06:33, 14.00s/it] 33%|███▎ | 3289/10000 [12:54:41<26:03:06, 13.98s/it] {'loss': 0.0671, 'learning_rate': 3.3575e-05, 'epoch': 4.3} 33%|███▎ | 3289/10000 [12:54:41<26:03:06, 13.98s/it] 33%|███▎ | 3290/10000 [12:54:55<26:03:57, 13.98s/it] {'loss': 0.0674, 'learning_rate': 3.357e-05, 'epoch': 4.31} 33%|███▎ | 3290/10000 [12:54:55<26:03:57, 13.98s/it] 33%|███▎ | 3291/10000 [12:55:09<25:59:30, 13.95s/it] {'loss': 0.0628, 'learning_rate': 3.3565e-05, 'epoch': 4.31} 33%|███▎ | 3291/10000 [12:55:09<25:59:30, 13.95s/it] 33%|███▎ | 3292/10000 [12:55:23<25:59:21, 13.95s/it] {'loss': 0.0729, 'learning_rate': 3.3560000000000004e-05, 'epoch': 4.31} 33%|███▎ | 3292/10000 [12:55:23<25:59:21, 13.95s/it] 33%|███▎ | 3293/10000 [12:55:37<25:55:54, 13.92s/it] {'loss': 0.0737, 'learning_rate': 3.3555e-05, 'epoch': 4.31} 33%|███▎ | 3293/10000 [12:55:37<25:55:54, 13.92s/it] 33%|███▎ | 3294/10000 [12:55:51<25:59:46, 13.96s/it] {'loss': 0.0492, 'learning_rate': 3.355e-05, 'epoch': 4.31} 33%|███▎ | 3294/10000 [12:55:51<25:59:46, 13.96s/it] 33%|███▎ | 3295/10000 [12:56:05<25:58:22, 13.95s/it] {'loss': 0.0589, 'learning_rate': 3.3545000000000005e-05, 'epoch': 4.31} 33%|███▎ | 3295/10000 [12:56:05<25:58:22, 13.95s/it] 33%|███▎ | 
3296/10000 [12:56:18<25:57:06, 13.94s/it] {'loss': 0.0548, 'learning_rate': 3.354e-05, 'epoch': 4.31} 33%|███▎ | 3296/10000 [12:56:19<25:57:06, 13.94s/it] 33%|███▎ | 3297/10000 [12:56:32<25:52:11, 13.89s/it] {'loss': 0.0596, 'learning_rate': 3.3534999999999997e-05, 'epoch': 4.32} 33%|███▎ | 3297/10000 [12:56:32<25:52:11, 13.89s/it] 33%|███▎ | 3298/10000 [12:56:46<25:57:17, 13.94s/it] {'loss': 0.0609, 'learning_rate': 3.353e-05, 'epoch': 4.32} 33%|███▎ | 3298/10000 [12:56:46<25:57:17, 13.94s/it] 33%|███▎ | 3299/10000 [12:57:00<26:00:50, 13.98s/it] {'loss': 0.0706, 'learning_rate': 3.3525e-05, 'epoch': 4.32} 33%|███▎ | 3299/10000 [12:57:00<26:00:50, 13.98s/it] 33%|███▎ | 3300/10000 [12:57:14<26:01:33, 13.98s/it] {'loss': 0.0653, 'learning_rate': 3.3520000000000004e-05, 'epoch': 4.32} 33%|███▎ | 3300/10000 [12:57:14<26:01:33, 13.98s/it] 33%|███▎ | 3301/10000 [12:57:28<26:02:55, 14.00s/it] {'loss': 0.0699, 'learning_rate': 3.3515e-05, 'epoch': 4.32} 33%|███▎ | 3301/10000 [12:57:28<26:02:55, 14.00s/it] 33%|███▎ | 3302/10000 [12:57:42<25:57:55, 13.96s/it] {'loss': 0.0632, 'learning_rate': 3.351e-05, 'epoch': 4.32} 33%|███▎ | 3302/10000 [12:57:42<25:57:55, 13.96s/it] 33%|███▎ | 3303/10000 [12:57:56<25:59:26, 13.97s/it] {'loss': 0.0832, 'learning_rate': 3.3505000000000005e-05, 'epoch': 4.32} 33%|███▎ | 3303/10000 [12:57:56<25:59:26, 13.97s/it] 33%|███▎ | 3304/10000 [12:58:10<25:57:29, 13.96s/it] {'loss': 0.0864, 'learning_rate': 3.35e-05, 'epoch': 4.32} 33%|███▎ | 3304/10000 [12:58:10<25:57:29, 13.96s/it] 33%|███▎ | 3305/10000 [12:58:24<25:53:14, 13.92s/it] {'loss': 0.0545, 'learning_rate': 3.3495000000000004e-05, 'epoch': 4.33} 33%|███▎ | 3305/10000 [12:58:24<25:53:14, 13.92s/it] 33%|███▎ | 3306/10000 [12:58:38<26:00:51, 13.99s/it] {'loss': 0.0685, 'learning_rate': 3.349e-05, 'epoch': 4.33} 33%|███▎ | 3306/10000 [12:58:38<26:00:51, 13.99s/it] 33%|███▎ | 3307/10000 [12:58:52<25:55:54, 13.95s/it] {'loss': 0.055, 'learning_rate': 3.3485e-05, 'epoch': 4.33} 33%|███▎ | 
3307/10000 [12:58:52<25:55:54, 13.95s/it] 33%|███▎ | 3308/10000 [12:59:06<25:54:12, 13.93s/it] {'loss': 0.0665, 'learning_rate': 3.348e-05, 'epoch': 4.33} 33%|███▎ | 3308/10000 [12:59:06<25:54:12, 13.93s/it] 33%|███▎ | 3309/10000 [12:59:20<25:53:19, 13.93s/it] {'loss': 0.0626, 'learning_rate': 3.3475e-05, 'epoch': 4.33} 33%|███▎ | 3309/10000 [12:59:20<25:53:19, 13.93s/it] 33%|███▎ | 3310/10000 [12:59:34<25:52:42, 13.93s/it] {'loss': 0.0612, 'learning_rate': 3.347e-05, 'epoch': 4.33} 33%|███▎ | 3310/10000 [12:59:34<25:52:42, 13.93s/it] 33%|███▎ | 3311/10000 [12:59:48<25:51:51, 13.92s/it] {'loss': 0.0624, 'learning_rate': 3.3465e-05, 'epoch': 4.33} 33%|███▎ | 3311/10000 [12:59:48<25:51:51, 13.92s/it] 33%|███▎ | 3312/10000 [13:00:02<25:48:56, 13.90s/it] {'loss': 0.0581, 'learning_rate': 3.346e-05, 'epoch': 4.34} 33%|███▎ | 3312/10000 [13:00:02<25:48:56, 13.90s/it] 33%|███▎ | 3313/10000 [13:00:15<25:48:35, 13.89s/it] {'loss': 0.0601, 'learning_rate': 3.3455000000000004e-05, 'epoch': 4.34} 33%|███▎ | 3313/10000 [13:00:15<25:48:35, 13.89s/it] 33%|███▎ | 3314/10000 [13:00:29<25:50:55, 13.92s/it] {'loss': 0.0721, 'learning_rate': 3.345000000000001e-05, 'epoch': 4.34} 33%|███▎ | 3314/10000 [13:00:29<25:50:55, 13.92s/it] 33%|███▎ | 3315/10000 [13:00:43<25:47:00, 13.88s/it] {'loss': 0.0678, 'learning_rate': 3.3445e-05, 'epoch': 4.34} 33%|███▎ | 3315/10000 [13:00:43<25:47:00, 13.88s/it] 33%|███▎ | 3316/10000 [13:00:57<25:47:57, 13.90s/it] {'loss': 0.0646, 'learning_rate': 3.344e-05, 'epoch': 4.34} 33%|███▎ | 3316/10000 [13:00:57<25:47:57, 13.90s/it] 33%|███▎ | 3317/10000 [13:01:11<25:51:21, 13.93s/it] {'loss': 0.0372, 'learning_rate': 3.3435e-05, 'epoch': 4.34} 33%|███▎ | 3317/10000 [13:01:11<25:51:21, 13.93s/it] 33%|███▎ | 3318/10000 [13:01:25<25:51:18, 13.93s/it] {'loss': 0.0702, 'learning_rate': 3.3430000000000003e-05, 'epoch': 4.34} 33%|███▎ | 3318/10000 [13:01:25<25:51:18, 13.93s/it] 33%|███▎ | 3319/10000 [13:01:39<25:50:00, 13.92s/it] {'loss': 0.055, 'learning_rate': 
3.3425e-05, 'epoch': 4.34} 33%|███▎ | 3319/10000 [13:01:39<25:50:00, 13.92s/it] 33%|███▎ | 3320/10000 [13:01:53<25:51:31, 13.94s/it] {'loss': 0.0675, 'learning_rate': 3.342e-05, 'epoch': 4.35} 33%|███▎ | 3320/10000 [13:01:53<25:51:31, 13.94s/it] 33%|███▎ | 3321/10000 [13:02:07<25:51:57, 13.94s/it] {'loss': 0.0672, 'learning_rate': 3.3415000000000004e-05, 'epoch': 4.35} 33%|███▎ | 3321/10000 [13:02:07<25:51:57, 13.94s/it] 33%|███▎ | 3322/10000 [13:02:21<25:49:23, 13.92s/it] {'loss': 0.0841, 'learning_rate': 3.341e-05, 'epoch': 4.35} 33%|███▎ | 3322/10000 [13:02:21<25:49:23, 13.92s/it] 33%|███▎ | 3323/10000 [13:02:35<25:55:40, 13.98s/it] {'loss': 0.0591, 'learning_rate': 3.3405e-05, 'epoch': 4.35} 33%|███▎ | 3323/10000 [13:02:35<25:55:40, 13.98s/it] 33%|███▎ | 3324/10000 [13:02:49<25:55:03, 13.98s/it] {'loss': 0.0862, 'learning_rate': 3.3400000000000005e-05, 'epoch': 4.35} 33%|███▎ | 3324/10000 [13:02:49<25:55:03, 13.98s/it] 33%|███▎ | 3325/10000 [13:03:03<25:49:07, 13.92s/it] {'loss': 0.065, 'learning_rate': 3.3395e-05, 'epoch': 4.35} 33%|███▎ | 3325/10000 [13:03:03<25:49:07, 13.92s/it] 33%|███▎ | 3326/10000 [13:03:17<25:47:29, 13.91s/it] {'loss': 0.0694, 'learning_rate': 3.339e-05, 'epoch': 4.35} 33%|███▎ | 3326/10000 [13:03:17<25:47:29, 13.91s/it] 33%|███▎ | 3327/10000 [13:03:30<25:48:43, 13.93s/it] {'loss': 0.0624, 'learning_rate': 3.3385e-05, 'epoch': 4.35} 33%|███▎ | 3327/10000 [13:03:31<25:48:43, 13.93s/it] 33%|███▎ | 3328/10000 [13:03:44<25:49:02, 13.93s/it] {'loss': 0.0729, 'learning_rate': 3.338e-05, 'epoch': 4.36} 33%|███▎ | 3328/10000 [13:03:44<25:49:02, 13.93s/it] 33%|███▎ | 3329/10000 [13:03:58<25:48:24, 13.93s/it] {'loss': 0.0625, 'learning_rate': 3.3375e-05, 'epoch': 4.36} 33%|███▎ | 3329/10000 [13:03:58<25:48:24, 13.93s/it] 33%|███▎ | 3330/10000 [13:04:12<25:45:57, 13.91s/it] {'loss': 0.0534, 'learning_rate': 3.337e-05, 'epoch': 4.36} 33%|███▎ | 3330/10000 [13:04:12<25:45:57, 13.91s/it] 33%|███▎ | 3331/10000 [13:04:26<25:45:30, 13.90s/it] {'loss': 
0.0677, 'learning_rate': 3.3365e-05, 'epoch': 4.36} 33%|███▎ | 3331/10000 [13:04:26<25:45:30, 13.90s/it] 33%|███▎ | 3332/10000 [13:04:40<25:46:03, 13.91s/it] {'loss': 0.0693, 'learning_rate': 3.336e-05, 'epoch': 4.36} 33%|███▎ | 3332/10000 [13:04:40<25:46:03, 13.91s/it] 33%|███▎ | 3333/10000 [13:04:54<25:46:29, 13.92s/it] {'loss': 0.0663, 'learning_rate': 3.3355e-05, 'epoch': 4.36} 33%|███▎ | 3333/10000 [13:04:54<25:46:29, 13.92s/it] 33%|███▎ | 3334/10000 [13:05:08<25:42:36, 13.88s/it] {'loss': 0.0628, 'learning_rate': 3.3350000000000004e-05, 'epoch': 4.36} 33%|███▎ | 3334/10000 [13:05:08<25:42:36, 13.88s/it] 33%|███▎ | 3335/10000 [13:05:22<25:43:41, 13.90s/it] {'loss': 0.0585, 'learning_rate': 3.334500000000001e-05, 'epoch': 4.37} 33%|███▎ | 3335/10000 [13:05:22<25:43:41, 13.90s/it] 33%|███▎ | 3336/10000 [13:05:36<25:48:13, 13.94s/it] {'loss': 0.0536, 'learning_rate': 3.3339999999999996e-05, 'epoch': 4.37} 33%|███▎ | 3336/10000 [13:05:36<25:48:13, 13.94s/it] 33%|███▎ | 3337/10000 [13:05:50<25:45:52, 13.92s/it] {'loss': 0.0671, 'learning_rate': 3.3335e-05, 'epoch': 4.37} 33%|███▎ | 3337/10000 [13:05:50<25:45:52, 13.92s/it] 33%|███▎ | 3338/10000 [13:06:03<25:44:15, 13.91s/it] {'loss': 0.0757, 'learning_rate': 3.333e-05, 'epoch': 4.37} 33%|███▎ | 3338/10000 [13:06:04<25:44:15, 13.91s/it] 33%|███▎ | 3339/10000 [13:06:17<25:45:23, 13.92s/it] {'loss': 0.0663, 'learning_rate': 3.3325000000000004e-05, 'epoch': 4.37} 33%|███▎ | 3339/10000 [13:06:17<25:45:23, 13.92s/it] 33%|███▎ | 3340/10000 [13:06:31<25:42:04, 13.89s/it] {'loss': 0.0626, 'learning_rate': 3.332e-05, 'epoch': 4.37} 33%|███▎ | 3340/10000 [13:06:31<25:42:04, 13.89s/it] 33%|███▎ | 3341/10000 [13:06:45<25:46:42, 13.94s/it] {'loss': 0.0654, 'learning_rate': 3.3315e-05, 'epoch': 4.37} 33%|███▎ | 3341/10000 [13:06:45<25:46:42, 13.94s/it] 33%|███▎ | 3342/10000 [13:06:59<25:49:06, 13.96s/it] {'loss': 0.0743, 'learning_rate': 3.3310000000000005e-05, 'epoch': 4.37} 33%|███▎ | 3342/10000 [13:06:59<25:49:06, 13.96s/it] 
33%|███▎ | 3343/10000 [13:07:13<25:46:00, 13.93s/it] {'loss': 0.0601, 'learning_rate': 3.3305e-05, 'epoch': 4.38} 33%|███▎ | 3343/10000 [13:07:13<25:46:00, 13.93s/it] 33%|███▎ | 3344/10000 [13:07:27<25:42:53, 13.91s/it] {'loss': 0.0803, 'learning_rate': 3.33e-05, 'epoch': 4.38} 33%|███▎ | 3344/10000 [13:07:27<25:42:53, 13.91s/it] 33%|███▎ | 3345/10000 [13:07:41<25:39:28, 13.88s/it] {'loss': 0.0759, 'learning_rate': 3.3295000000000006e-05, 'epoch': 4.38} 33%|███▎ | 3345/10000 [13:07:41<25:39:28, 13.88s/it] 33%|███▎ | 3346/10000 [13:07:55<25:41:55, 13.90s/it] {'loss': 0.0771, 'learning_rate': 3.329e-05, 'epoch': 4.38} 33%|███▎ | 3346/10000 [13:07:55<25:41:55, 13.90s/it] 33%|███▎ | 3347/10000 [13:08:09<25:38:13, 13.87s/it] {'loss': 0.0707, 'learning_rate': 3.3285e-05, 'epoch': 4.38} 33%|███▎ | 3347/10000 [13:08:09<25:38:13, 13.87s/it] 33%|███▎ | 3348/10000 [13:08:22<25:33:36, 13.83s/it] {'loss': 0.0674, 'learning_rate': 3.328e-05, 'epoch': 4.38} 33%|███▎ | 3348/10000 [13:08:22<25:33:36, 13.83s/it] 33%|███▎ | 3349/10000 [13:08:36<25:37:47, 13.87s/it] {'loss': 0.0623, 'learning_rate': 3.3275e-05, 'epoch': 4.38} 33%|███▎ | 3349/10000 [13:08:36<25:37:47, 13.87s/it] 34%|███▎ | 3350/10000 [13:08:50<25:35:53, 13.86s/it] {'loss': 0.0571, 'learning_rate': 3.327e-05, 'epoch': 4.38} 34%|███▎ | 3350/10000 [13:08:50<25:35:53, 13.86s/it] 34%|███▎ | 3351/10000 [13:09:04<25:39:39, 13.89s/it] {'loss': 0.066, 'learning_rate': 3.3265e-05, 'epoch': 4.39} 34%|███▎ | 3351/10000 [13:09:04<25:39:39, 13.89s/it] 34%|███▎ | 3352/10000 [13:09:18<25:39:01, 13.89s/it] {'loss': 0.0603, 'learning_rate': 3.3260000000000003e-05, 'epoch': 4.39} 34%|███▎ | 3352/10000 [13:09:18<25:39:01, 13.89s/it] 34%|███▎ | 3353/10000 [13:09:32<25:40:59, 13.91s/it] {'loss': 0.076, 'learning_rate': 3.3255000000000006e-05, 'epoch': 4.39} 34%|███▎ | 3353/10000 [13:09:32<25:40:59, 13.91s/it] 34%|███▎ | 3354/10000 [13:09:46<25:39:53, 13.90s/it] {'loss': 0.0594, 'learning_rate': 3.325e-05, 'epoch': 4.39} 34%|███▎ | 
3354/10000 [13:09:46<25:39:53, 13.90s/it] 34%|███▎ | 3355/10000 [13:10:00<25:37:59, 13.89s/it] {'loss': 0.0739, 'learning_rate': 3.3245000000000004e-05, 'epoch': 4.39} 34%|███▎ | 3355/10000 [13:10:00<25:37:59, 13.89s/it] 34%|███▎ | 3356/10000 [13:10:14<25:39:55, 13.91s/it] {'loss': 0.0814, 'learning_rate': 3.324e-05, 'epoch': 4.39} 34%|███▎ | 3356/10000 [13:10:14<25:39:55, 13.91s/it] 34%|███▎ | 3357/10000 [13:10:28<25:41:43, 13.92s/it] {'loss': 0.0657, 'learning_rate': 3.3235e-05, 'epoch': 4.39} 34%|███▎ | 3357/10000 [13:10:28<25:41:43, 13.92s/it] 34%|███▎ | 3358/10000 [13:10:42<25:42:39, 13.94s/it] {'loss': 0.0596, 'learning_rate': 3.323e-05, 'epoch': 4.4} 34%|███▎ | 3358/10000 [13:10:42<25:42:39, 13.94s/it] 34%|███▎ | 3359/10000 [13:10:55<25:40:43, 13.92s/it] {'loss': 0.0591, 'learning_rate': 3.3225e-05, 'epoch': 4.4} 34%|███▎ | 3359/10000 [13:10:56<25:40:43, 13.92s/it] 34%|███▎ | 3360/10000 [13:11:09<25:39:41, 13.91s/it] {'loss': 0.0801, 'learning_rate': 3.3220000000000004e-05, 'epoch': 4.4} 34%|███▎ | 3360/10000 [13:11:09<25:39:41, 13.91s/it] 34%|███▎ | 3361/10000 [13:11:23<25:36:09, 13.88s/it] {'loss': 0.0506, 'learning_rate': 3.3215e-05, 'epoch': 4.4} 34%|███▎ | 3361/10000 [13:11:23<25:36:09, 13.88s/it] 34%|███▎ | 3362/10000 [13:11:37<25:37:47, 13.90s/it] {'loss': 0.061, 'learning_rate': 3.321e-05, 'epoch': 4.4} 34%|███▎ | 3362/10000 [13:11:37<25:37:47, 13.90s/it] 34%|███▎ | 3363/10000 [13:11:51<25:40:08, 13.92s/it] {'loss': 0.0852, 'learning_rate': 3.3205000000000005e-05, 'epoch': 4.4} 34%|███▎ | 3363/10000 [13:11:51<25:40:08, 13.92s/it] 34%|███▎ | 3364/10000 [13:12:05<25:40:07, 13.93s/it] {'loss': 0.0805, 'learning_rate': 3.32e-05, 'epoch': 4.4} 34%|███▎ | 3364/10000 [13:12:05<25:40:07, 13.93s/it] 34%|███▎ | 3365/10000 [13:12:19<25:40:54, 13.93s/it] {'loss': 0.0731, 'learning_rate': 3.3195e-05, 'epoch': 4.4} 34%|███▎ | 3365/10000 [13:12:19<25:40:54, 13.93s/it] 34%|███▎ | 3366/10000 [13:12:33<25:40:12, 13.93s/it] {'loss': 0.0581, 'learning_rate': 3.319e-05, 
'epoch': 4.41} 34%|███▎ | 3366/10000 [13:12:33<25:40:12, 13.93s/it] 34%|███▎ | 3367/10000 [13:12:47<25:38:26, 13.92s/it] {'loss': 0.0775, 'learning_rate': 3.3185e-05, 'epoch': 4.41} 34%|███▎ | 3367/10000 [13:12:47<25:38:26, 13.92s/it] 34%|███▎ | 3368/10000 [13:13:01<25:40:41, 13.94s/it] {'loss': 0.071, 'learning_rate': 3.318e-05, 'epoch': 4.41} 34%|███▎ | 3368/10000 [13:13:01<25:40:41, 13.94s/it] 34%|███▎ | 3369/10000 [13:13:15<25:37:45, 13.91s/it] {'loss': 0.0805, 'learning_rate': 3.3175e-05, 'epoch': 4.41} 34%|███▎ | 3369/10000 [13:13:15<25:37:45, 13.91s/it] 34%|███▎ | 3370/10000 [13:13:29<25:37:58, 13.92s/it] {'loss': 0.0603, 'learning_rate': 3.317e-05, 'epoch': 4.41} 34%|███▎ | 3370/10000 [13:13:29<25:37:58, 13.92s/it] 34%|███▎ | 3371/10000 [13:13:42<25:38:11, 13.92s/it] {'loss': 0.0724, 'learning_rate': 3.3165e-05, 'epoch': 4.41} 34%|███▎ | 3371/10000 [13:13:43<25:38:11, 13.92s/it] 34%|███▎ | 3372/10000 [13:13:56<25:37:52, 13.92s/it] {'loss': 0.0737, 'learning_rate': 3.316e-05, 'epoch': 4.41} 34%|███▎ | 3372/10000 [13:13:56<25:37:52, 13.92s/it] 34%|███▎ | 3373/10000 [13:14:10<25:41:50, 13.96s/it] {'loss': 0.0792, 'learning_rate': 3.3155000000000004e-05, 'epoch': 4.41} 34%|███▎ | 3373/10000 [13:14:11<25:41:50, 13.96s/it] 34%|███▎ | 3374/10000 [13:14:24<25:41:06, 13.96s/it] {'loss': 0.0558, 'learning_rate': 3.3150000000000006e-05, 'epoch': 4.42} 34%|███▎ | 3374/10000 [13:14:24<25:41:06, 13.96s/it] 34%|███▍ | 3375/10000 [13:14:38<25:36:06, 13.91s/it] {'loss': 0.0729, 'learning_rate': 3.3145e-05, 'epoch': 4.42} 34%|███▍ | 3375/10000 [13:14:38<25:36:06, 13.91s/it] 34%|███▍ | 3376/10000 [13:14:52<25:39:11, 13.94s/it] {'loss': 0.0775, 'learning_rate': 3.314e-05, 'epoch': 4.42} 34%|███▍ | 3376/10000 [13:14:52<25:39:11, 13.94s/it] 34%|███▍ | 3377/10000 [13:15:06<25:34:35, 13.90s/it] {'loss': 0.0627, 'learning_rate': 3.3135e-05, 'epoch': 4.42} 34%|███▍ | 3377/10000 [13:15:06<25:34:35, 13.90s/it] 34%|███▍ | 3378/10000 [13:15:20<25:35:50, 13.92s/it] {'loss': 0.065, 
'learning_rate': 3.313e-05, 'epoch': 4.42} 34%|███▍ | 3378/10000 [13:15:20<25:35:50, 13.92s/it] 34%|███▍ | 3379/10000 [13:15:34<25:32:00, 13.88s/it] {'loss': 0.0624, 'learning_rate': 3.3125e-05, 'epoch': 4.42} 34%|███▍ | 3379/10000 [13:15:34<25:32:00, 13.88s/it] 34%|███▍ | 3380/10000 [13:15:48<25:34:17, 13.91s/it] {'loss': 0.0518, 'learning_rate': 3.312e-05, 'epoch': 4.42} 34%|███▍ | 3380/10000 [13:15:48<25:34:17, 13.91s/it] 34%|███▍ | 3381/10000 [13:16:02<25:34:24, 13.91s/it] {'loss': 0.0617, 'learning_rate': 3.3115000000000004e-05, 'epoch': 4.43} 34%|███▍ | 3381/10000 [13:16:02<25:34:24, 13.91s/it] 34%|███▍ | 3382/10000 [13:16:16<25:34:00, 13.91s/it] {'loss': 0.0603, 'learning_rate': 3.311e-05, 'epoch': 4.43} 34%|███▍ | 3382/10000 [13:16:16<25:34:00, 13.91s/it] 34%|███▍ | 3383/10000 [13:16:30<25:38:09, 13.95s/it] {'loss': 0.0585, 'learning_rate': 3.3105e-05, 'epoch': 4.43} 34%|███▍ | 3383/10000 [13:16:30<25:38:09, 13.95s/it] 34%|███▍ | 3384/10000 [13:16:44<25:37:01, 13.94s/it] {'loss': 0.0776, 'learning_rate': 3.3100000000000005e-05, 'epoch': 4.43} 34%|███▍ | 3384/10000 [13:16:44<25:37:01, 13.94s/it] 34%|███▍ | 3385/10000 [13:16:58<25:41:28, 13.98s/it] {'loss': 0.0822, 'learning_rate': 3.3095e-05, 'epoch': 4.43} 34%|███▍ | 3385/10000 [13:16:58<25:41:28, 13.98s/it] 34%|███▍ | 3386/10000 [13:17:12<25:39:07, 13.96s/it] {'loss': 0.0693, 'learning_rate': 3.309e-05, 'epoch': 4.43} 34%|███▍ | 3386/10000 [13:17:12<25:39:07, 13.96s/it] 34%|███▍ | 3387/10000 [13:17:25<25:34:07, 13.92s/it] {'loss': 0.0568, 'learning_rate': 3.3085e-05, 'epoch': 4.43} 34%|███▍ | 3387/10000 [13:17:25<25:34:07, 13.92s/it] 34%|███▍ | 3388/10000 [13:17:39<25:39:43, 13.97s/it] {'loss': 0.057, 'learning_rate': 3.308e-05, 'epoch': 4.43} 34%|███▍ | 3388/10000 [13:17:39<25:39:43, 13.97s/it] 34%|███▍ | 3389/10000 [13:17:53<25:40:48, 13.98s/it] {'loss': 0.0812, 'learning_rate': 3.3075e-05, 'epoch': 4.44} 34%|███▍ | 3389/10000 [13:17:54<25:40:48, 13.98s/it] 34%|███▍ | 3390/10000 [13:18:07<25:39:06, 
13.97s/it] {'loss': 0.0615, 'learning_rate': 3.307e-05, 'epoch': 4.44} 34%|███▍ | 3390/10000 [13:18:07<25:39:06, 13.97s/it] 34%|███▍ | 3391/10000 [13:18:21<25:39:35, 13.98s/it] {'loss': 0.0659, 'learning_rate': 3.3065e-05, 'epoch': 4.44} 34%|███▍ | 3391/10000 [13:18:21<25:39:35, 13.98s/it] 34%|███▍ | 3392/10000 [13:18:35<25:41:25, 14.00s/it] {'loss': 0.0735, 'learning_rate': 3.3060000000000005e-05, 'epoch': 4.44} 34%|███▍ | 3392/10000 [13:18:35<25:41:25, 14.00s/it] 34%|███▍ | 3393/10000 [13:18:49<25:39:55, 13.98s/it] {'loss': 0.0718, 'learning_rate': 3.3055e-05, 'epoch': 4.44} 34%|███▍ | 3393/10000 [13:18:49<25:39:55, 13.98s/it] 34%|███▍ | 3394/10000 [13:19:03<25:36:05, 13.95s/it] {'loss': 0.06, 'learning_rate': 3.3050000000000004e-05, 'epoch': 4.44} 34%|███▍ | 3394/10000 [13:19:03<25:36:05, 13.95s/it] 34%|███▍ | 3395/10000 [13:19:17<25:34:44, 13.94s/it] {'loss': 0.0744, 'learning_rate': 3.3045000000000006e-05, 'epoch': 4.44} 34%|███▍ | 3395/10000 [13:19:17<25:34:44, 13.94s/it] 34%|███▍ | 3396/10000 [13:19:31<25:37:07, 13.97s/it] {'loss': 0.0739, 'learning_rate': 3.304e-05, 'epoch': 4.45} 34%|███▍ | 3396/10000 [13:19:31<25:37:07, 13.97s/it] 34%|███▍ | 3397/10000 [13:19:45<25:35:33, 13.95s/it] {'loss': 0.0785, 'learning_rate': 3.3035e-05, 'epoch': 4.45} 34%|███▍ | 3397/10000 [13:19:45<25:35:33, 13.95s/it] 34%|███▍ | 3398/10000 [13:19:59<25:31:41, 13.92s/it] {'loss': 0.0738, 'learning_rate': 3.303e-05, 'epoch': 4.45} 34%|███▍ | 3398/10000 [13:19:59<25:31:41, 13.92s/it] 34%|███▍ | 3399/10000 [13:20:13<25:30:58, 13.92s/it] {'loss': 0.0589, 'learning_rate': 3.3025e-05, 'epoch': 4.45} 34%|███▍ | 3399/10000 [13:20:13<25:30:58, 13.92s/it] 34%|███▍ | 3400/10000 [13:20:27<25:31:36, 13.92s/it] {'loss': 0.0718, 'learning_rate': 3.302e-05, 'epoch': 4.45} 34%|███▍ | 3400/10000 [13:20:27<25:31:36, 13.92s/it] 34%|███▍ | 3401/10000 [13:20:41<25:32:27, 13.93s/it] {'loss': 0.0781, 'learning_rate': 3.3015e-05, 'epoch': 4.45} 34%|███▍ | 3401/10000 [13:20:41<25:32:27, 13.93s/it] 
34%|███▍ | 3402/10000 [13:20:55<25:38:51, 13.99s/it] {'loss': 0.0702, 'learning_rate': 3.3010000000000004e-05, 'epoch': 4.45} 34%|███▍ | 3402/10000 [13:20:55<25:38:51, 13.99s/it] 34%|███▍ | 3403/10000 [13:21:09<25:33:21, 13.95s/it] {'loss': 0.0695, 'learning_rate': 3.3005e-05, 'epoch': 4.45} 34%|███▍ | 3403/10000 [13:21:09<25:33:21, 13.95s/it] 34%|███▍ | 3404/10000 [13:21:23<25:31:38, 13.93s/it] {'loss': 0.0645, 'learning_rate': 3.3e-05, 'epoch': 4.46} 34%|███▍ | 3404/10000 [13:21:23<25:31:38, 13.93s/it] 34%|███▍ | 3405/10000 [13:21:37<25:30:32, 13.92s/it] {'loss': 0.0631, 'learning_rate': 3.2995000000000005e-05, 'epoch': 4.46} 34%|███▍ | 3405/10000 [13:21:37<25:30:32, 13.92s/it] 34%|███▍ | 3406/10000 [13:21:51<25:32:47, 13.95s/it] {'loss': 0.0757, 'learning_rate': 3.299e-05, 'epoch': 4.46} 34%|███▍ | 3406/10000 [13:21:51<25:32:47, 13.95s/it] 34%|███▍ | 3407/10000 [13:22:05<25:32:51, 13.95s/it] {'loss': 0.0553, 'learning_rate': 3.2985e-05, 'epoch': 4.46} 34%|███▍ | 3407/10000 [13:22:05<25:32:51, 13.95s/it] 34%|███▍ | 3408/10000 [13:22:18<25:28:07, 13.91s/it] {'loss': 0.0866, 'learning_rate': 3.298e-05, 'epoch': 4.46} 34%|███▍ | 3408/10000 [13:22:18<25:28:07, 13.91s/it] 34%|███▍ | 3409/10000 [13:22:32<25:26:04, 13.89s/it] {'loss': 0.0816, 'learning_rate': 3.2975e-05, 'epoch': 4.46} 34%|███▍ | 3409/10000 [13:22:32<25:26:04, 13.89s/it] 34%|███▍ | 3410/10000 [13:22:46<25:25:05, 13.89s/it] {'loss': 0.0747, 'learning_rate': 3.297e-05, 'epoch': 4.46} 34%|███▍ | 3410/10000 [13:22:46<25:25:05, 13.89s/it] 34%|███▍ | 3411/10000 [13:23:00<25:27:49, 13.91s/it] {'loss': 0.0743, 'learning_rate': 3.2965e-05, 'epoch': 4.46} 34%|███▍ | 3411/10000 [13:23:00<25:27:49, 13.91s/it] 34%|███▍ | 3412/10000 [13:23:14<25:30:07, 13.94s/it] {'loss': 0.0797, 'learning_rate': 3.296e-05, 'epoch': 4.47} 34%|███▍ | 3412/10000 [13:23:14<25:30:07, 13.94s/it] 34%|███▍ | 3413/10000 [13:23:28<25:28:16, 13.92s/it] {'loss': 0.0665, 'learning_rate': 3.2955000000000006e-05, 'epoch': 4.47} 34%|███▍ | 
3413/10000 [13:23:28<25:28:16, 13.92s/it] 34%|███▍ | 3414/10000 [13:23:42<25:28:09, 13.92s/it] {'loss': 0.06, 'learning_rate': 3.295e-05, 'epoch': 4.47} 34%|███▍ | 3414/10000 [13:23:42<25:28:09, 13.92s/it] 34%|███▍ | 3415/10000 [13:23:56<25:29:58, 13.94s/it] {'loss': 0.0608, 'learning_rate': 3.2945000000000004e-05, 'epoch': 4.47} 34%|███▍ | 3415/10000 [13:23:56<25:29:58, 13.94s/it] 34%|███▍ | 3416/10000 [13:24:10<25:30:21, 13.95s/it] {'loss': 0.0639, 'learning_rate': 3.2940000000000006e-05, 'epoch': 4.47} 34%|███▍ | 3416/10000 [13:24:10<25:30:21, 13.95s/it] 34%|███▍ | 3417/10000 [13:24:24<25:34:08, 13.98s/it] {'loss': 0.0646, 'learning_rate': 3.2935e-05, 'epoch': 4.47} 34%|███▍ | 3417/10000 [13:24:24<25:34:08, 13.98s/it] 34%|███▍ | 3418/10000 [13:24:38<25:31:06, 13.96s/it] {'loss': 0.0573, 'learning_rate': 3.293e-05, 'epoch': 4.47} 34%|███▍ | 3418/10000 [13:24:38<25:31:06, 13.96s/it] 34%|███▍ | 3419/10000 [13:24:52<25:30:15, 13.95s/it] {'loss': 0.0846, 'learning_rate': 3.2925e-05, 'epoch': 4.48} 34%|███▍ | 3419/10000 [13:24:52<25:30:15, 13.95s/it] 34%|███▍ | 3420/10000 [13:25:06<25:27:08, 13.93s/it] {'loss': 0.0682, 'learning_rate': 3.292e-05, 'epoch': 4.48} 34%|███▍ | 3420/10000 [13:25:06<25:27:08, 13.93s/it] 34%|███▍ | 3421/10000 [13:25:19<25:27:07, 13.93s/it] {'loss': 0.0733, 'learning_rate': 3.2915e-05, 'epoch': 4.48} 34%|███▍ | 3421/10000 [13:25:19<25:27:07, 13.93s/it] 34%|███▍ | 3422/10000 [13:25:33<25:27:02, 13.93s/it] {'loss': 0.0665, 'learning_rate': 3.291e-05, 'epoch': 4.48} 34%|███▍ | 3422/10000 [13:25:33<25:27:02, 13.93s/it] 34%|███▍ | 3423/10000 [13:25:47<25:28:10, 13.94s/it] {'loss': 0.0815, 'learning_rate': 3.2905000000000004e-05, 'epoch': 4.48} 34%|███▍ | 3423/10000 [13:25:47<25:28:10, 13.94s/it] 34%|███▍ | 3424/10000 [13:26:01<25:25:13, 13.92s/it] {'loss': 0.0649, 'learning_rate': 3.29e-05, 'epoch': 4.48} 34%|███▍ | 3424/10000 [13:26:01<25:25:13, 13.92s/it] 34%|███▍ | 3425/10000 [13:26:15<25:24:12, 13.91s/it] {'loss': 0.066, 'learning_rate': 
3.2895e-05, 'epoch': 4.48} 34%|███▍ | 3425/10000 [13:26:15<25:24:12, 13.91s/it] 34%|███▍ | 3426/10000 [13:26:29<25:24:50, 13.92s/it] {'loss': 0.0686, 'learning_rate': 3.2890000000000005e-05, 'epoch': 4.48} 34%|███▍ | 3426/10000 [13:26:29<25:24:50, 13.92s/it] 34%|███▍ | 3427/10000 [13:26:43<25:24:02, 13.91s/it] {'loss': 0.0714, 'learning_rate': 3.2885e-05, 'epoch': 4.49} 34%|███▍ | 3427/10000 [13:26:43<25:24:02, 13.91s/it] 34%|███▍ | 3428/10000 [13:26:57<25:26:58, 13.94s/it] {'loss': 0.073, 'learning_rate': 3.288e-05, 'epoch': 4.49} 34%|███▍ | 3428/10000 [13:26:57<25:26:58, 13.94s/it] 34%|███▍ | 3429/10000 [13:27:11<25:28:49, 13.96s/it] {'loss': 0.064, 'learning_rate': 3.2875e-05, 'epoch': 4.49} 34%|███▍ | 3429/10000 [13:27:11<25:28:49, 13.96s/it] 34%|███▍ | 3430/10000 [13:27:25<25:28:35, 13.96s/it] {'loss': 0.0688, 'learning_rate': 3.287e-05, 'epoch': 4.49} 34%|███▍ | 3430/10000 [13:27:25<25:28:35, 13.96s/it] 34%|███▍ | 3431/10000 [13:27:39<25:29:09, 13.97s/it] {'loss': 0.0703, 'learning_rate': 3.2865000000000005e-05, 'epoch': 4.49} 34%|███▍ | 3431/10000 [13:27:39<25:29:09, 13.97s/it] 34%|███▍ | 3432/10000 [13:27:53<25:27:04, 13.95s/it] {'loss': 0.0835, 'learning_rate': 3.286e-05, 'epoch': 4.49} 34%|███▍ | 3432/10000 [13:27:53<25:27:04, 13.95s/it] 34%|███▍ | 3433/10000 [13:28:07<25:25:37, 13.94s/it] {'loss': 0.0587, 'learning_rate': 3.2855e-05, 'epoch': 4.49} 34%|███▍ | 3433/10000 [13:28:07<25:25:37, 13.94s/it] 34%|███▍ | 3434/10000 [13:28:21<25:26:36, 13.95s/it] {'loss': 0.065, 'learning_rate': 3.2850000000000006e-05, 'epoch': 4.49} 34%|███▍ | 3434/10000 [13:28:21<25:26:36, 13.95s/it] 34%|███▍ | 3435/10000 [13:28:35<25:24:05, 13.93s/it] {'loss': 0.0637, 'learning_rate': 3.2845e-05, 'epoch': 4.5} 34%|███▍ | 3435/10000 [13:28:35<25:24:05, 13.93s/it] 34%|███▍ | 3436/10000 [13:28:49<25:25:15, 13.94s/it] {'loss': 0.0589, 'learning_rate': 3.2840000000000004e-05, 'epoch': 4.5} 34%|███▍ | 3436/10000 [13:28:49<25:25:15, 13.94s/it] 34%|███▍ | 3437/10000 [13:29:03<25:30:32, 
13.99s/it] {'loss': 0.0702, 'learning_rate': 3.2835e-05, 'epoch': 4.5} 34%|███▍ | 3437/10000 [13:29:03<25:30:32, 13.99s/it] 34%|███▍ | 3438/10000 [13:29:17<25:30:12, 13.99s/it] {'loss': 0.082, 'learning_rate': 3.283e-05, 'epoch': 4.5} 34%|███▍ | 3438/10000 [13:29:17<25:30:12, 13.99s/it] 34%|███▍ | 3439/10000 [13:29:31<25:27:38, 13.97s/it] {'loss': 0.0742, 'learning_rate': 3.2825e-05, 'epoch': 4.5} 34%|███▍ | 3439/10000 [13:29:31<25:27:38, 13.97s/it] 34%|███▍ | 3440/10000 [13:29:45<25:28:17, 13.98s/it] {'loss': 0.0573, 'learning_rate': 3.282e-05, 'epoch': 4.5} 34%|███▍ | 3440/10000 [13:29:45<25:28:17, 13.98s/it] 34%|███▍ | 3441/10000 [13:29:59<25:28:27, 13.98s/it] {'loss': 0.0705, 'learning_rate': 3.2815000000000003e-05, 'epoch': 4.5} 34%|███▍ | 3441/10000 [13:29:59<25:28:27, 13.98s/it] 34%|███▍ | 3442/10000 [13:30:13<25:28:12, 13.98s/it] {'loss': 0.0642, 'learning_rate': 3.281e-05, 'epoch': 4.51} 34%|███▍ | 3442/10000 [13:30:13<25:28:12, 13.98s/it] 34%|███▍ | 3443/10000 [13:30:26<25:24:19, 13.95s/it] {'loss': 0.0738, 'learning_rate': 3.2805e-05, 'epoch': 4.51} 34%|███▍ | 3443/10000 [13:30:26<25:24:19, 13.95s/it] 34%|███▍ | 3444/10000 [13:30:40<25:20:04, 13.91s/it] {'loss': 0.0847, 'learning_rate': 3.2800000000000004e-05, 'epoch': 4.51} 34%|███▍ | 3444/10000 [13:30:40<25:20:04, 13.91s/it] 34%|███▍ | 3445/10000 [13:30:54<25:24:55, 13.96s/it] {'loss': 0.0824, 'learning_rate': 3.2795e-05, 'epoch': 4.51} 34%|███▍ | 3445/10000 [13:30:54<25:24:55, 13.96s/it] 34%|███▍ | 3446/10000 [13:31:08<25:22:41, 13.94s/it] {'loss': 0.0605, 'learning_rate': 3.279e-05, 'epoch': 4.51} 34%|███▍ | 3446/10000 [13:31:08<25:22:41, 13.94s/it] 34%|███▍ | 3447/10000 [13:31:22<25:21:30, 13.93s/it] {'loss': 0.0603, 'learning_rate': 3.2785e-05, 'epoch': 4.51} 34%|███▍ | 3447/10000 [13:31:22<25:21:30, 13.93s/it] 34%|███▍ | 3448/10000 [13:31:36<25:20:39, 13.93s/it] {'loss': 0.0626, 'learning_rate': 3.278e-05, 'epoch': 4.51} 34%|███▍ | 3448/10000 [13:31:36<25:20:39, 13.93s/it] 34%|███▍ | 3449/10000 
[13:31:50<25:21:56, 13.94s/it] {'loss': 0.0745, 'learning_rate': 3.2775e-05, 'epoch': 4.51} 34%|███▍ | 3449/10000 [13:31:50<25:21:56, 13.94s/it] 34%|███▍ | 3450/10000 [13:32:04<25:24:13, 13.96s/it] {'loss': 0.0735, 'learning_rate': 3.277e-05, 'epoch': 4.52} 34%|███▍ | 3450/10000 [13:32:04<25:24:13, 13.96s/it] 35%|███▍ | 3451/10000 [13:32:18<25:23:04, 13.95s/it] {'loss': 0.0587, 'learning_rate': 3.2765e-05, 'epoch': 4.52} 35%|███▍ | 3451/10000 [13:32:18<25:23:04, 13.95s/it] 35%|███▍ | 3452/10000 [13:32:32<25:22:41, 13.95s/it] {'loss': 0.0673, 'learning_rate': 3.2760000000000005e-05, 'epoch': 4.52} 35%|███▍ | 3452/10000 [13:32:32<25:22:41, 13.95s/it] 35%|███▍ | 3453/10000 [13:32:46<25:19:49, 13.93s/it] {'loss': 0.068, 'learning_rate': 3.2755e-05, 'epoch': 4.52} 35%|███▍ | 3453/10000 [13:32:46<25:19:49, 13.93s/it] 35%|███▍ | 3454/10000 [13:33:00<25:18:48, 13.92s/it] {'loss': 0.0719, 'learning_rate': 3.275e-05, 'epoch': 4.52} 35%|███▍ | 3454/10000 [13:33:00<25:18:48, 13.92s/it] 35%|███▍ | 3455/10000 [13:33:13<25:14:41, 13.89s/it] {'loss': 0.0536, 'learning_rate': 3.2745000000000006e-05, 'epoch': 4.52} 35%|███▍ | 3455/10000 [13:33:14<25:14:41, 13.89s/it] 35%|███▍ | 3456/10000 [13:33:27<25:15:05, 13.89s/it] {'loss': 0.0727, 'learning_rate': 3.274e-05, 'epoch': 4.52} 35%|███▍ | 3456/10000 [13:33:27<25:15:05, 13.89s/it] 35%|███▍ | 3457/10000 [13:33:41<25:11:28, 13.86s/it] {'loss': 0.0634, 'learning_rate': 3.2735e-05, 'epoch': 4.52} 35%|███▍ | 3457/10000 [13:33:41<25:11:28, 13.86s/it] 35%|███▍ | 3458/10000 [13:33:55<25:08:47, 13.84s/it] {'loss': 0.0786, 'learning_rate': 3.273e-05, 'epoch': 4.53} 35%|███▍ | 3458/10000 [13:33:55<25:08:47, 13.84s/it] 35%|███▍ | 3459/10000 [13:34:09<25:11:19, 13.86s/it] {'loss': 0.0787, 'learning_rate': 3.2725e-05, 'epoch': 4.53} 35%|███▍ | 3459/10000 [13:34:09<25:11:19, 13.86s/it] 35%|███▍ | 3460/10000 [13:34:23<25:12:58, 13.88s/it] {'loss': 0.057, 'learning_rate': 3.272e-05, 'epoch': 4.53} 35%|███▍ | 3460/10000 [13:34:23<25:12:58, 13.88s/it] 
35%|███▍ | 3461/10000 [13:34:37<25:18:19, 13.93s/it] {'loss': 0.0716, 'learning_rate': 3.2715e-05, 'epoch': 4.53} 35%|███▍ | 3461/10000 [13:34:37<25:18:19, 13.93s/it] 35%|███▍ | 3462/10000 [13:34:51<25:18:21, 13.93s/it] {'loss': 0.0718, 'learning_rate': 3.2710000000000004e-05, 'epoch': 4.53} 35%|███▍ | 3462/10000 [13:34:51<25:18:21, 13.93s/it] 35%|███▍ | 3463/10000 [13:35:05<25:16:26, 13.92s/it] {'loss': 0.0778, 'learning_rate': 3.2705e-05, 'epoch': 4.53} 35%|███▍ | 3463/10000 [13:35:05<25:16:26, 13.92s/it] 35%|███▍ | 3464/10000 [13:35:19<25:15:08, 13.91s/it] {'loss': 0.0745, 'learning_rate': 3.27e-05, 'epoch': 4.53} 35%|███▍ | 3464/10000 [13:35:19<25:15:08, 13.91s/it] 35%|███▍ | 3465/10000 [13:35:33<25:18:23, 13.94s/it] {'loss': 0.0644, 'learning_rate': 3.2695000000000005e-05, 'epoch': 4.54} 35%|███▍ | 3465/10000 [13:35:33<25:18:23, 13.94s/it] 35%|███▍ | 3466/10000 [13:35:46<25:16:11, 13.92s/it] {'loss': 0.0684, 'learning_rate': 3.269000000000001e-05, 'epoch': 4.54} 35%|███▍ | 3466/10000 [13:35:47<25:16:11, 13.92s/it] 35%|███▍ | 3467/10000 [13:36:00<25:17:00, 13.93s/it] {'loss': 0.0761, 'learning_rate': 3.2684999999999996e-05, 'epoch': 4.54} 35%|███▍ | 3467/10000 [13:36:00<25:17:00, 13.93s/it] 35%|███▍ | 3468/10000 [13:36:14<25:14:56, 13.92s/it] {'loss': 0.0727, 'learning_rate': 3.268e-05, 'epoch': 4.54} 35%|███▍ | 3468/10000 [13:36:14<25:14:56, 13.92s/it] 35%|███▍ | 3469/10000 [13:36:28<25:11:43, 13.89s/it] {'loss': 0.0669, 'learning_rate': 3.2675e-05, 'epoch': 4.54} 35%|███▍ | 3469/10000 [13:36:28<25:11:43, 13.89s/it] 35%|███▍ | 3470/10000 [13:36:42<25:09:47, 13.87s/it] {'loss': 0.0817, 'learning_rate': 3.267e-05, 'epoch': 4.54} 35%|███▍ | 3470/10000 [13:36:42<25:09:47, 13.87s/it] 35%|███▍ | 3471/10000 [13:36:56<25:11:11, 13.89s/it] {'loss': 0.0743, 'learning_rate': 3.2665e-05, 'epoch': 4.54} 35%|███▍ | 3471/10000 [13:36:56<25:11:11, 13.89s/it] 35%|███▍ | 3472/10000 [13:37:10<25:11:17, 13.89s/it] {'loss': 0.071, 'learning_rate': 3.266e-05, 'epoch': 4.54} 
35%|███▍ | 3472/10000 [13:37:10<25:11:17, 13.89s/it] 35%|███▍ | 3473/10000 [13:37:24<25:15:40, 13.93s/it] {'loss': 0.0711, 'learning_rate': 3.2655000000000005e-05, 'epoch': 4.55} 35%|███▍ | 3473/10000 [13:37:24<25:15:40, 13.93s/it] 35%|███▍ | 3474/10000 [13:37:38<25:15:16, 13.93s/it] {'loss': 0.0674, 'learning_rate': 3.265e-05, 'epoch': 4.55} 35%|███▍ | 3474/10000 [13:37:38<25:15:16, 13.93s/it] 35%|███▍ | 3475/10000 [13:37:52<25:13:06, 13.91s/it] {'loss': 0.0516, 'learning_rate': 3.2645e-05, 'epoch': 4.55} 35%|███▍ | 3475/10000 [13:37:52<25:13:06, 13.91s/it] 35%|███▍ | 3476/10000 [13:38:05<25:10:15, 13.89s/it] {'loss': 0.0708, 'learning_rate': 3.2640000000000006e-05, 'epoch': 4.55} 35%|███▍ | 3476/10000 [13:38:06<25:10:15, 13.89s/it] 35%|███▍ | 3477/10000 [13:38:19<25:10:04, 13.89s/it] {'loss': 0.0651, 'learning_rate': 3.2635e-05, 'epoch': 4.55} 35%|███▍ | 3477/10000 [13:38:19<25:10:04, 13.89s/it] 35%|███▍ | 3478/10000 [13:38:33<25:11:19, 13.90s/it] {'loss': 0.0713, 'learning_rate': 3.263e-05, 'epoch': 4.55} 35%|███▍ | 3478/10000 [13:38:33<25:11:19, 13.90s/it] 35%|███▍ | 3479/10000 [13:38:47<25:16:17, 13.95s/it] {'loss': 0.0658, 'learning_rate': 3.2625e-05, 'epoch': 4.55} 35%|███▍ | 3479/10000 [13:38:47<25:16:17, 13.95s/it] 35%|███▍ | 3480/10000 [13:39:01<25:17:42, 13.97s/it] {'loss': 0.072, 'learning_rate': 3.262e-05, 'epoch': 4.55} 35%|███▍ | 3480/10000 [13:39:01<25:17:42, 13.97s/it] 35%|███▍ | 3481/10000 [13:39:15<25:15:16, 13.95s/it] {'loss': 0.0722, 'learning_rate': 3.2615e-05, 'epoch': 4.56} 35%|███▍ | 3481/10000 [13:39:15<25:15:16, 13.95s/it] 35%|███▍ | 3482/10000 [13:39:29<25:10:08, 13.90s/it] {'loss': 0.0754, 'learning_rate': 3.261e-05, 'epoch': 4.56} 35%|███▍ | 3482/10000 [13:39:29<25:10:08, 13.90s/it] 35%|███▍ | 3483/10000 [13:39:43<25:14:09, 13.94s/it] {'loss': 0.0785, 'learning_rate': 3.2605000000000004e-05, 'epoch': 4.56} 35%|███▍ | 3483/10000 [13:39:43<25:14:09, 13.94s/it] 35%|███▍ | 3484/10000 [13:39:57<25:12:09, 13.92s/it] {'loss': 0.0625, 
'learning_rate': 3.26e-05, 'epoch': 4.56} 35%|███▍ | 3484/10000 [13:39:57<25:12:09, 13.92s/it] 35%|███▍ | 3485/10000 [13:40:11<25:12:38, 13.93s/it] {'loss': 0.0635, 'learning_rate': 3.2595e-05, 'epoch': 4.56} 35%|███▍ | 3485/10000 [13:40:11<25:12:38, 13.93s/it] 35%|███▍ | 3486/10000 [13:40:25<25:12:27, 13.93s/it] {'loss': 0.0586, 'learning_rate': 3.2590000000000005e-05, 'epoch': 4.56} 35%|███▍ | 3486/10000 [13:40:25<25:12:27, 13.93s/it] 35%|███▍ | 3487/10000 [13:40:39<25:15:41, 13.96s/it] {'loss': 0.0664, 'learning_rate': 3.2585e-05, 'epoch': 4.56} 35%|███▍ | 3487/10000 [13:40:39<25:15:41, 13.96s/it] 35%|███▍ | 3488/10000 [13:40:53<25:18:26, 13.99s/it] {'loss': 0.0786, 'learning_rate': 3.2579999999999996e-05, 'epoch': 4.57} 35%|███▍ | 3488/10000 [13:40:53<25:18:26, 13.99s/it] 35%|███▍ | 3489/10000 [13:41:07<25:13:56, 13.95s/it] {'loss': 0.0606, 'learning_rate': 3.2575e-05, 'epoch': 4.57} 35%|███▍ | 3489/10000 [13:41:07<25:13:56, 13.95s/it] 35%|███▍ | 3490/10000 [13:41:21<25:11:39, 13.93s/it] {'loss': 0.0786, 'learning_rate': 3.257e-05, 'epoch': 4.57} 35%|███▍ | 3490/10000 [13:41:21<25:11:39, 13.93s/it] 35%|███▍ | 3491/10000 [13:41:35<25:08:56, 13.91s/it] {'loss': 0.0637, 'learning_rate': 3.2565000000000004e-05, 'epoch': 4.57} 35%|███▍ | 3491/10000 [13:41:35<25:08:56, 13.91s/it] 35%|███▍ | 3492/10000 [13:41:48<25:10:06, 13.92s/it] {'loss': 0.0656, 'learning_rate': 3.256e-05, 'epoch': 4.57} 35%|███▍ | 3492/10000 [13:41:49<25:10:06, 13.92s/it] 35%|███▍ | 3493/10000 [13:42:02<25:08:36, 13.91s/it] {'loss': 0.0767, 'learning_rate': 3.2555e-05, 'epoch': 4.57} 35%|███▍ | 3493/10000 [13:42:02<25:08:36, 13.91s/it] 35%|███▍ | 3494/10000 [13:42:16<25:06:14, 13.89s/it] {'loss': 0.0612, 'learning_rate': 3.2550000000000005e-05, 'epoch': 4.57} 35%|███▍ | 3494/10000 [13:42:16<25:06:14, 13.89s/it] 35%|███▍ | 3495/10000 [13:42:30<25:09:08, 13.92s/it] {'loss': 0.071, 'learning_rate': 3.2545e-05, 'epoch': 4.57} 35%|███▍ | 3495/10000 [13:42:30<25:09:08, 13.92s/it] 35%|███▍ | 3496/10000 
[13:42:44<25:10:33, 13.94s/it] {'loss': 0.0756, 'learning_rate': 3.2540000000000004e-05, 'epoch': 4.58} 35%|███▍ | 3496/10000 [13:42:44<25:10:33, 13.94s/it] 35%|███▍ | 3497/10000 [13:42:58<25:09:59, 13.93s/it] {'loss': 0.0701, 'learning_rate': 3.2535e-05, 'epoch': 4.58} 35%|███▍ | 3497/10000 [13:42:58<25:09:59, 13.93s/it] 35%|███▍ | 3498/10000 [13:43:12<25:09:15, 13.93s/it] {'loss': 0.0616, 'learning_rate': 3.253e-05, 'epoch': 4.58} 35%|███▍ | 3498/10000 [13:43:12<25:09:15, 13.93s/it] 35%|███▍ | 3499/10000 [13:43:26<25:07:02, 13.91s/it] {'loss': 0.0612, 'learning_rate': 3.2525e-05, 'epoch': 4.58} 35%|███▍ | 3499/10000 [13:43:26<25:07:02, 13.91s/it] 35%|███▌ | 3500/10000 [13:43:40<25:07:52, 13.92s/it] {'loss': 0.0854, 'learning_rate': 3.252e-05, 'epoch': 4.58} 35%|███▌ | 3500/10000 [13:43:40<25:07:52, 13.92s/it] 35%|███▌ | 3501/10000 [13:43:54<25:07:25, 13.92s/it] {'loss': 0.0814, 'learning_rate': 3.2515e-05, 'epoch': 4.58} 35%|███▌ | 3501/10000 [13:43:54<25:07:25, 13.92s/it] 35%|███▌ | 3502/10000 [13:44:08<25:04:49, 13.89s/it] {'loss': 0.0843, 'learning_rate': 3.251e-05, 'epoch': 4.58} 35%|███▌ | 3502/10000 [13:44:08<25:04:49, 13.89s/it] 35%|███▌ | 3503/10000 [13:44:22<25:07:53, 13.93s/it] {'loss': 0.0656, 'learning_rate': 3.2505e-05, 'epoch': 4.59} 35%|███▌ | 3503/10000 [13:44:22<25:07:53, 13.93s/it] 35%|███▌ | 3504/10000 [13:44:35<25:07:04, 13.92s/it] {'loss': 0.0755, 'learning_rate': 3.2500000000000004e-05, 'epoch': 4.59} 35%|███▌ | 3504/10000 [13:44:35<25:07:04, 13.92s/it] 35%|███▌ | 3505/10000 [13:44:49<25:03:58, 13.89s/it] {'loss': 0.0698, 'learning_rate': 3.2495000000000007e-05, 'epoch': 4.59} 35%|███▌ | 3505/10000 [13:44:49<25:03:58, 13.89s/it] 35%|███▌ | 3506/10000 [13:45:03<25:06:09, 13.92s/it] {'loss': 0.076, 'learning_rate': 3.249e-05, 'epoch': 4.59} 35%|███▌ | 3506/10000 [13:45:03<25:06:09, 13.92s/it] 35%|███▌ | 3507/10000 [13:45:17<25:02:56, 13.89s/it] {'loss': 0.0686, 'learning_rate': 3.2485000000000005e-05, 'epoch': 4.59} 35%|███▌ | 3507/10000 
[13:45:17<25:02:56, 13.89s/it] 35%|███▌ | 3508/10000 [13:45:31<25:05:45, 13.92s/it] {'loss': 0.0701, 'learning_rate': 3.248e-05, 'epoch': 4.59} 35%|███▌ | 3508/10000 [13:45:31<25:05:45, 13.92s/it] 35%|███▌ | 3509/10000 [13:45:45<25:07:41, 13.94s/it] {'loss': 0.0719, 'learning_rate': 3.2474999999999997e-05, 'epoch': 4.59} 35%|███▌ | 3509/10000 [13:45:45<25:07:41, 13.94s/it] 35%|███▌ | 3510/10000 [13:45:59<25:10:54, 13.97s/it] {'loss': 0.0645, 'learning_rate': 3.247e-05, 'epoch': 4.59} 35%|███▌ | 3510/10000 [13:45:59<25:10:54, 13.97s/it] 35%|███▌ | 3511/10000 [13:46:13<25:08:23, 13.95s/it] {'loss': 0.0598, 'learning_rate': 3.2465e-05, 'epoch': 4.6} 35%|███▌ | 3511/10000 [13:46:13<25:08:23, 13.95s/it] 35%|███▌ | 3512/10000 [13:46:27<25:06:49, 13.93s/it] {'loss': 0.0613, 'learning_rate': 3.2460000000000004e-05, 'epoch': 4.6} 35%|███▌ | 3512/10000 [13:46:27<25:06:49, 13.93s/it] 35%|███▌ | 3513/10000 [13:46:41<25:04:17, 13.91s/it] {'loss': 0.0813, 'learning_rate': 3.2455e-05, 'epoch': 4.6} 35%|███▌ | 3513/10000 [13:46:41<25:04:17, 13.91s/it] 35%|███▌ | 3514/10000 [13:46:55<25:03:29, 13.91s/it] {'loss': 0.0718, 'learning_rate': 3.245e-05, 'epoch': 4.6} 35%|███▌ | 3514/10000 [13:46:55<25:03:29, 13.91s/it] 35%|███▌ | 3515/10000 [13:47:09<25:01:41, 13.89s/it] {'loss': 0.0737, 'learning_rate': 3.2445000000000005e-05, 'epoch': 4.6} 35%|███▌ | 3515/10000 [13:47:09<25:01:41, 13.89s/it] 35%|███▌ | 3516/10000 [13:47:22<25:01:35, 13.89s/it] {'loss': 0.0551, 'learning_rate': 3.244e-05, 'epoch': 4.6} 35%|███▌ | 3516/10000 [13:47:22<25:01:35, 13.89s/it] 35%|███▌ | 3517/10000 [13:47:36<25:02:13, 13.90s/it] {'loss': 0.0743, 'learning_rate': 3.2435000000000004e-05, 'epoch': 4.6} 35%|███▌ | 3517/10000 [13:47:36<25:02:13, 13.90s/it] 35%|███▌ | 3518/10000 [13:47:50<25:05:35, 13.94s/it] {'loss': 0.0585, 'learning_rate': 3.243e-05, 'epoch': 4.6} 35%|███▌ | 3518/10000 [13:47:50<25:05:35, 13.94s/it] 35%|███▌ | 3519/10000 [13:48:04<25:02:11, 13.91s/it] {'loss': 0.0638, 'learning_rate': 
3.2425e-05, 'epoch': 4.61} 35%|███▌ | 3519/10000 [13:48:04<25:02:11, 13.91s/it] 35%|███▌ | 3520/10000 [13:48:18<25:07:33, 13.96s/it] {'loss': 0.0678, 'learning_rate': 3.242e-05, 'epoch': 4.61} 35%|███▌ | 3520/10000 [13:48:18<25:07:33, 13.96s/it] 35%|███▌ | 3521/10000 [13:48:32<25:08:17, 13.97s/it] {'loss': 0.0742, 'learning_rate': 3.2415e-05, 'epoch': 4.61} 35%|███▌ | 3521/10000 [13:48:32<25:08:17, 13.97s/it] 35%|███▌ | 3522/10000 [13:48:46<25:06:21, 13.95s/it] {'loss': 0.0658, 'learning_rate': 3.241e-05, 'epoch': 4.61} 35%|███▌ | 3522/10000 [13:48:46<25:06:21, 13.95s/it] 35%|███▌ | 3523/10000 [13:49:00<25:04:26, 13.94s/it] {'loss': 0.0577, 'learning_rate': 3.2405e-05, 'epoch': 4.61} 35%|███▌ | 3523/10000 [13:49:00<25:04:26, 13.94s/it] 35%|███▌ | 3524/10000 [13:49:14<25:04:22, 13.94s/it] {'loss': 0.0752, 'learning_rate': 3.24e-05, 'epoch': 4.61} 35%|███▌ | 3524/10000 [13:49:14<25:04:22, 13.94s/it] 35%|███▌ | 3525/10000 [13:49:28<25:12:03, 14.01s/it] {'loss': 0.0714, 'learning_rate': 3.2395000000000004e-05, 'epoch': 4.61} 35%|███▌ | 3525/10000 [13:49:28<25:12:03, 14.01s/it] 35%|███▌ | 3526/10000 [13:49:42<25:03:41, 13.94s/it] {'loss': 0.0662, 'learning_rate': 3.239000000000001e-05, 'epoch': 4.62} 35%|███▌ | 3526/10000 [13:49:42<25:03:41, 13.94s/it] 35%|███▌ | 3527/10000 [13:49:56<25:05:26, 13.95s/it] {'loss': 0.0624, 'learning_rate': 3.2385e-05, 'epoch': 4.62} 35%|███▌ | 3527/10000 [13:49:56<25:05:26, 13.95s/it] 35%|███▌ | 3528/10000 [13:50:10<25:05:20, 13.96s/it] {'loss': 0.0677, 'learning_rate': 3.238e-05, 'epoch': 4.62} 35%|███▌ | 3528/10000 [13:50:10<25:05:20, 13.96s/it] 35%|███▌ | 3529/10000 [13:50:24<25:04:36, 13.95s/it] {'loss': 0.0856, 'learning_rate': 3.2375e-05, 'epoch': 4.62} 35%|███▌ | 3529/10000 [13:50:24<25:04:36, 13.95s/it] 35%|███▌ | 3530/10000 [13:50:38<25:04:32, 13.95s/it] {'loss': 0.0672, 'learning_rate': 3.2370000000000003e-05, 'epoch': 4.62} 35%|███▌ | 3530/10000 [13:50:38<25:04:32, 13.95s/it] 35%|███▌ | 3531/10000 [13:50:52<25:06:37, 13.97s/it] 
{'loss': 0.0671, 'learning_rate': 3.2365e-05, 'epoch': 4.62} 35%|███▌ | 3531/10000 [13:50:52<25:06:37, 13.97s/it] 35%|███▌ | 3532/10000 [13:51:06<25:04:53, 13.96s/it] {'loss': 0.0612, 'learning_rate': 3.236e-05, 'epoch': 4.62} 35%|███▌ | 3532/10000 [13:51:06<25:04:53, 13.96s/it] 35%|███▌ | 3533/10000 [13:51:20<25:02:35, 13.94s/it] {'loss': 0.0718, 'learning_rate': 3.2355000000000004e-05, 'epoch': 4.62} 35%|███▌ | 3533/10000 [13:51:20<25:02:35, 13.94s/it] 35%|███▌ | 3534/10000 [13:51:33<24:58:58, 13.91s/it] {'loss': 0.0785, 'learning_rate': 3.235e-05, 'epoch': 4.63} 35%|███▌ | 3534/10000 [13:51:34<24:58:58, 13.91s/it] 35%|███▌ | 3535/10000 [13:51:47<24:58:51, 13.91s/it] {'loss': 0.0727, 'learning_rate': 3.2345e-05, 'epoch': 4.63} 35%|███▌ | 3535/10000 [13:51:47<24:58:51, 13.91s/it] 35%|███▌ | 3536/10000 [13:52:01<25:02:23, 13.95s/it] {'loss': 0.0624, 'learning_rate': 3.2340000000000005e-05, 'epoch': 4.63} 35%|███▌ | 3536/10000 [13:52:01<25:02:23, 13.95s/it] 35%|███▌ | 3537/10000 [13:52:15<25:01:40, 13.94s/it] {'loss': 0.0769, 'learning_rate': 3.2335e-05, 'epoch': 4.63} 35%|███▌ | 3537/10000 [13:52:15<25:01:40, 13.94s/it] 35%|███▌ | 3538/10000 [13:52:29<24:57:34, 13.91s/it] {'loss': 0.0691, 'learning_rate': 3.233e-05, 'epoch': 4.63} 35%|███▌ | 3538/10000 [13:52:29<24:57:34, 13.91s/it] 35%|███▌ | 3539/10000 [13:52:43<24:58:51, 13.92s/it] {'loss': 0.0858, 'learning_rate': 3.2325e-05, 'epoch': 4.63} 35%|███▌ | 3539/10000 [13:52:43<24:58:51, 13.92s/it] 35%|███▌ | 3540/10000 [13:52:57<24:53:37, 13.87s/it] {'loss': 0.0674, 'learning_rate': 3.232e-05, 'epoch': 4.63} 35%|███▌ | 3540/10000 [13:52:57<24:53:37, 13.87s/it] 35%|███▌ | 3541/10000 [13:53:11<24:52:53, 13.87s/it] {'loss': 0.0785, 'learning_rate': 3.2315e-05, 'epoch': 4.63} 35%|███▌ | 3541/10000 [13:53:11<24:52:53, 13.87s/it] 35%|███▌ | 3542/10000 [13:53:25<24:50:50, 13.85s/it] {'loss': 0.0757, 'learning_rate': 3.231e-05, 'epoch': 4.64} 35%|███▌ | 3542/10000 [13:53:25<24:50:50, 13.85s/it] 35%|███▌ | 3543/10000 
[13:53:39<24:54:37, 13.89s/it] {'loss': 0.065, 'learning_rate': 3.2305e-05, 'epoch': 4.64} 35%|███▌ | 3543/10000 [13:53:39<24:54:37, 13.89s/it] 35%|███▌ | 3544/10000 [13:53:53<24:57:07, 13.91s/it] {'loss': 0.0745, 'learning_rate': 3.2300000000000006e-05, 'epoch': 4.64} 35%|███▌ | 3544/10000 [13:53:53<24:57:07, 13.91s/it] 35%|███▌ | 3545/10000 [13:54:07<25:00:55, 13.95s/it] {'loss': 0.0723, 'learning_rate': 3.2295e-05, 'epoch': 4.64} 35%|███▌ | 3545/10000 [13:54:07<25:00:55, 13.95s/it] 35%|███▌ | 3546/10000 [13:54:21<25:02:38, 13.97s/it] {'loss': 0.0701, 'learning_rate': 3.2290000000000004e-05, 'epoch': 4.64} 35%|███▌ | 3546/10000 [13:54:21<25:02:38, 13.97s/it] 35%|███▌ | 3547/10000 [13:54:35<25:01:58, 13.97s/it] {'loss': 0.0725, 'learning_rate': 3.228500000000001e-05, 'epoch': 4.64} 35%|███▌ | 3547/10000 [13:54:35<25:01:58, 13.97s/it] 35%|███▌ | 3548/10000 [13:54:48<25:00:17, 13.95s/it] {'loss': 0.0621, 'learning_rate': 3.2279999999999996e-05, 'epoch': 4.64} 35%|███▌ | 3548/10000 [13:54:48<25:00:17, 13.95s/it] 35%|███▌ | 3549/10000 [13:55:02<24:57:20, 13.93s/it] {'loss': 0.0792, 'learning_rate': 3.2275e-05, 'epoch': 4.65} 35%|███▌ | 3549/10000 [13:55:02<24:57:20, 13.93s/it] 36%|███▌ | 3550/10000 [13:55:16<24:55:05, 13.91s/it] {'loss': 0.0743, 'learning_rate': 3.227e-05, 'epoch': 4.65} 36%|███▌ | 3550/10000 [13:55:16<24:55:05, 13.91s/it] 36%|███▌ | 3551/10000 [13:55:30<24:56:01, 13.92s/it] {'loss': 0.0743, 'learning_rate': 3.2265000000000004e-05, 'epoch': 4.65} 36%|███▌ | 3551/10000 [13:55:30<24:56:01, 13.92s/it] 36%|███▌ | 3552/10000 [13:55:44<24:54:17, 13.90s/it] {'loss': 0.0667, 'learning_rate': 3.226e-05, 'epoch': 4.65} 36%|███▌ | 3552/10000 [13:55:44<24:54:17, 13.90s/it] 36%|███▌ | 3553/10000 [13:55:58<24:52:39, 13.89s/it] {'loss': 0.0606, 'learning_rate': 3.2255e-05, 'epoch': 4.65} 36%|███▌ | 3553/10000 [13:55:58<24:52:39, 13.89s/it] 36%|███▌ | 3554/10000 [13:56:12<24:54:18, 13.91s/it] {'loss': 0.0605, 'learning_rate': 3.2250000000000005e-05, 'epoch': 4.65} 
36%|███▌ | 3554/10000 [13:56:12<24:54:18, 13.91s/it] 36%|███▌ | 3555/10000 [13:56:26<24:54:14, 13.91s/it] {'loss': 0.073, 'learning_rate': 3.2245e-05, 'epoch': 4.65} 36%|███▌ | 3555/10000 [13:56:26<24:54:14, 13.91s/it] 36%|███▌ | 3556/10000 [13:56:40<24:56:28, 13.93s/it] {'loss': 0.0761, 'learning_rate': 3.224e-05, 'epoch': 4.65} 36%|███▌ | 3556/10000 [13:56:40<24:56:28, 13.93s/it] 36%|███▌ | 3557/10000 [13:56:54<24:54:52, 13.92s/it] {'loss': 0.0855, 'learning_rate': 3.2235000000000006e-05, 'epoch': 4.66} 36%|███▌ | 3557/10000 [13:56:54<24:54:52, 13.92s/it] 36%|███▌ | 3558/10000 [13:57:08<24:57:28, 13.95s/it] {'loss': 0.064, 'learning_rate': 3.223e-05, 'epoch': 4.66} 36%|███▌ | 3558/10000 [13:57:08<24:57:28, 13.95s/it] 36%|███▌ | 3559/10000 [13:57:21<24:54:41, 13.92s/it] {'loss': 0.0721, 'learning_rate': 3.2225e-05, 'epoch': 4.66} 36%|███▌ | 3559/10000 [13:57:22<24:54:41, 13.92s/it] 36%|███▌ | 3560/10000 [13:57:35<24:53:34, 13.92s/it] {'loss': 0.0684, 'learning_rate': 3.222e-05, 'epoch': 4.66} 36%|███▌ | 3560/10000 [13:57:35<24:53:34, 13.92s/it] 36%|███▌ | 3561/10000 [13:57:49<24:52:57, 13.91s/it] {'loss': 0.0808, 'learning_rate': 3.2215e-05, 'epoch': 4.66} 36%|███▌ | 3561/10000 [13:57:49<24:52:57, 13.91s/it] 36%|███▌ | 3562/10000 [13:58:03<24:52:59, 13.91s/it] {'loss': 0.0785, 'learning_rate': 3.221e-05, 'epoch': 4.66} 36%|███▌ | 3562/10000 [13:58:03<24:52:59, 13.91s/it] 36%|███▌ | 3563/10000 [13:58:17<24:53:23, 13.92s/it] {'loss': 0.0738, 'learning_rate': 3.2205e-05, 'epoch': 4.66} 36%|███▌ | 3563/10000 [13:58:17<24:53:23, 13.92s/it] 36%|███▌ | 3564/10000 [13:58:31<24:52:17, 13.91s/it] {'loss': 0.0704, 'learning_rate': 3.2200000000000003e-05, 'epoch': 4.66} 36%|███▌ | 3564/10000 [13:58:31<24:52:17, 13.91s/it] 36%|███▌ | 3565/10000 [13:58:45<24:52:59, 13.92s/it] {'loss': 0.074, 'learning_rate': 3.2195000000000006e-05, 'epoch': 4.67} 36%|███▌ | 3565/10000 [13:58:45<24:52:59, 13.92s/it] 36%|███▌ | 3566/10000 [13:58:59<24:52:52, 13.92s/it] {'loss': 0.059, 
'learning_rate': 3.219e-05, 'epoch': 4.67} 36%|███▌ | 3566/10000 [13:58:59<24:52:52, 13.92s/it] 36%|███▌ | 3567/10000 [13:59:13<24:52:43, 13.92s/it] {'loss': 0.0914, 'learning_rate': 3.2185000000000004e-05, 'epoch': 4.67} 36%|███▌ | 3567/10000 [13:59:13<24:52:43, 13.92s/it] 36%|███▌ | 3568/10000 [13:59:27<24:49:11, 13.89s/it] {'loss': 0.0643, 'learning_rate': 3.218e-05, 'epoch': 4.67} 36%|███▌ | 3568/10000 [13:59:27<24:49:11, 13.89s/it] 36%|███▌ | 3569/10000 [13:59:41<24:56:25, 13.96s/it] {'loss': 0.0628, 'learning_rate': 3.2175e-05, 'epoch': 4.67} 36%|███▌ | 3569/10000 [13:59:41<24:56:25, 13.96s/it] 36%|███▌ | 3570/10000 [13:59:55<24:55:37, 13.96s/it] {'loss': 0.0763, 'learning_rate': 3.217e-05, 'epoch': 4.67} 36%|███▌ | 3570/10000 [13:59:55<24:55:37, 13.96s/it] 36%|███▌ | 3571/10000 [14:00:09<24:54:37, 13.95s/it] {'loss': 0.0793, 'learning_rate': 3.2165e-05, 'epoch': 4.67} 36%|███▌ | 3571/10000 [14:00:09<24:54:37, 13.95s/it] 36%|███▌ | 3572/10000 [14:00:23<24:54:14, 13.95s/it] {'loss': 0.08, 'learning_rate': 3.2160000000000004e-05, 'epoch': 4.68} 36%|███▌ | 3572/10000 [14:00:23<24:54:14, 13.95s/it] 36%|███▌ | 3573/10000 [14:00:37<24:56:29, 13.97s/it] {'loss': 0.0795, 'learning_rate': 3.2155e-05, 'epoch': 4.68} 36%|███▌ | 3573/10000 [14:00:37<24:56:29, 13.97s/it] 36%|███▌ | 3574/10000 [14:00:51<24:56:31, 13.97s/it] {'loss': 0.0748, 'learning_rate': 3.215e-05, 'epoch': 4.68} 36%|███▌ | 3574/10000 [14:00:51<24:56:31, 13.97s/it] 36%|███▌ | 3575/10000 [14:01:05<24:55:21, 13.96s/it] {'loss': 0.0603, 'learning_rate': 3.2145000000000005e-05, 'epoch': 4.68} 36%|███▌ | 3575/10000 [14:01:05<24:55:21, 13.96s/it] 36%|███▌ | 3576/10000 [14:01:19<24:56:46, 13.98s/it] {'loss': 0.0838, 'learning_rate': 3.214e-05, 'epoch': 4.68} 36%|███▌ | 3576/10000 [14:01:19<24:56:46, 13.98s/it] 36%|███▌ | 3577/10000 [14:01:33<24:57:42, 13.99s/it] {'loss': 0.0741, 'learning_rate': 3.2135e-05, 'epoch': 4.68} 36%|███▌ | 3577/10000 [14:01:33<24:57:42, 13.99s/it] 36%|███▌ | 3578/10000 
[14:01:46<24:55:07, 13.97s/it] {'loss': 0.0788, 'learning_rate': 3.213e-05, 'epoch': 4.68} 36%|███▌ | 3578/10000 [14:01:47<24:55:07, 13.97s/it] 36%|███▌ | 3579/10000 [14:02:00<24:53:00, 13.95s/it] {'loss': 0.0623, 'learning_rate': 3.2125e-05, 'epoch': 4.68} 36%|███▌ | 3579/10000 [14:02:00<24:53:00, 13.95s/it] 36%|███▌ | 3580/10000 [14:02:14<24:56:47, 13.99s/it] {'loss': 0.0831, 'learning_rate': 3.212e-05, 'epoch': 4.69} 36%|███▌ | 3580/10000 [14:02:14<24:56:47, 13.99s/it] 36%|███▌ | 3581/10000 [14:02:28<24:52:23, 13.95s/it] {'loss': 0.0517, 'learning_rate': 3.2115e-05, 'epoch': 4.69} 36%|███▌ | 3581/10000 [14:02:28<24:52:23, 13.95s/it] 36%|███▌ | 3582/10000 [14:02:42<24:50:12, 13.93s/it] {'loss': 0.0693, 'learning_rate': 3.211e-05, 'epoch': 4.69} 36%|███▌ | 3582/10000 [14:02:42<24:50:12, 13.93s/it] 36%|███▌ | 3583/10000 [14:02:56<24:53:27, 13.96s/it] {'loss': 0.0675, 'learning_rate': 3.2105e-05, 'epoch': 4.69} 36%|███▌ | 3583/10000 [14:02:56<24:53:27, 13.96s/it] 36%|███▌ | 3584/10000 [14:03:10<24:47:03, 13.91s/it] {'loss': 0.0743, 'learning_rate': 3.21e-05, 'epoch': 4.69} 36%|███▌ | 3584/10000 [14:03:10<24:47:03, 13.91s/it] 36%|███▌ | 3585/10000 [14:03:24<24:51:53, 13.95s/it] {'loss': 0.0685, 'learning_rate': 3.2095000000000004e-05, 'epoch': 4.69} 36%|███▌ | 3585/10000 [14:03:24<24:51:53, 13.95s/it] 36%|███▌ | 3586/10000 [14:03:38<24:48:00, 13.92s/it] {'loss': 0.0818, 'learning_rate': 3.2090000000000006e-05, 'epoch': 4.69} 36%|███▌ | 3586/10000 [14:03:38<24:48:00, 13.92s/it] 36%|███▌ | 3587/10000 [14:03:52<24:48:17, 13.92s/it] {'loss': 0.066, 'learning_rate': 3.2085e-05, 'epoch': 4.7} 36%|███▌ | 3587/10000 [14:03:52<24:48:17, 13.92s/it] 36%|███▌ | 3588/10000 [14:04:06<24:46:53, 13.91s/it] {'loss': 0.0686, 'learning_rate': 3.208e-05, 'epoch': 4.7} 36%|███▌ | 3588/10000 [14:04:06<24:46:53, 13.91s/it] 36%|███▌ | 3589/10000 [14:04:20<24:42:58, 13.88s/it] {'loss': 0.0632, 'learning_rate': 3.2075e-05, 'epoch': 4.7} 36%|███▌ | 3589/10000 [14:04:20<24:42:58, 13.88s/it] 
36%|███▌ | 3590/10000 [14:04:33<24:43:04, 13.88s/it] {'loss': 0.0619, 'learning_rate': 3.207e-05, 'epoch': 4.7} 36%|███▌ | 3590/10000 [14:04:33<24:43:04, 13.88s/it] 36%|███▌ | 3591/10000 [14:04:48<24:51:40, 13.96s/it] {'loss': 0.0774, 'learning_rate': 3.2065e-05, 'epoch': 4.7} 36%|███▌ | 3591/10000 [14:04:48<24:51:40, 13.96s/it] 36%|███▌ | 3592/10000 [14:05:02<24:50:14, 13.95s/it] {'loss': 0.06, 'learning_rate': 3.206e-05, 'epoch': 4.7} 36%|███▌ | 3592/10000 [14:05:02<24:50:14, 13.95s/it] 36%|███▌ | 3593/10000 [14:05:15<24:49:43, 13.95s/it] {'loss': 0.0636, 'learning_rate': 3.2055000000000004e-05, 'epoch': 4.7} 36%|███▌ | 3593/10000 [14:05:16<24:49:43, 13.95s/it] 36%|███▌ | 3594/10000 [14:05:29<24:47:19, 13.93s/it] {'loss': 0.0691, 'learning_rate': 3.205e-05, 'epoch': 4.7} 36%|███▌ | 3594/10000 [14:05:29<24:47:19, 13.93s/it] 36%|███▌ | 3595/10000 [14:05:43<24:45:49, 13.92s/it] {'loss': 0.0695, 'learning_rate': 3.2045e-05, 'epoch': 4.71} 36%|███▌ | 3595/10000 [14:05:43<24:45:49, 13.92s/it] 36%|███▌ | 3596/10000 [14:05:57<24:45:45, 13.92s/it] {'loss': 0.08, 'learning_rate': 3.2040000000000005e-05, 'epoch': 4.71} 36%|███▌ | 3596/10000 [14:05:57<24:45:45, 13.92s/it] 36%|███▌ | 3597/10000 [14:06:11<24:45:03, 13.92s/it] {'loss': 0.0711, 'learning_rate': 3.2035e-05, 'epoch': 4.71} 36%|███▌ | 3597/10000 [14:06:11<24:45:03, 13.92s/it] 36%|███▌ | 3598/10000 [14:06:25<24:39:16, 13.86s/it] {'loss': 0.088, 'learning_rate': 3.2029999999999997e-05, 'epoch': 4.71} 36%|███▌ | 3598/10000 [14:06:25<24:39:16, 13.86s/it] 36%|███▌ | 3599/10000 [14:06:39<24:37:18, 13.85s/it] {'loss': 0.0747, 'learning_rate': 3.2025e-05, 'epoch': 4.71} 36%|███▌ | 3599/10000 [14:06:39<24:37:18, 13.85s/it] 36%|███▌ | 3600/10000 [14:06:53<24:38:57, 13.87s/it] {'loss': 0.1029, 'learning_rate': 3.202e-05, 'epoch': 4.71} 36%|███▌ | 3600/10000 [14:06:53<24:38:57, 13.87s/it] 36%|███▌ | 3601/10000 [14:07:06<24:40:20, 13.88s/it] {'loss': 0.0847, 'learning_rate': 3.2015e-05, 'epoch': 4.71} 36%|███▌ | 3601/10000 
[14:07:06<24:40:20, 13.88s/it] 36%|███▌ | 3602/10000 [14:07:20<24:40:26, 13.88s/it] {'loss': 0.0761, 'learning_rate': 3.201e-05, 'epoch': 4.71} 36%|███▌ | 3602/10000 [14:07:20<24:40:26, 13.88s/it] 36%|███▌ | 3603/10000 [14:07:34<24:45:08, 13.93s/it] {'loss': 0.0703, 'learning_rate': 3.2005e-05, 'epoch': 4.72} 36%|███▌ | 3603/10000 [14:07:34<24:45:08, 13.93s/it] 36%|███▌ | 3604/10000 [14:07:48<24:48:56, 13.97s/it] {'loss': 0.0845, 'learning_rate': 3.2000000000000005e-05, 'epoch': 4.72} 36%|███▌ | 3604/10000 [14:07:48<24:48:56, 13.97s/it] 36%|███▌ | 3605/10000 [14:08:02<24:47:08, 13.95s/it] {'loss': 0.0832, 'learning_rate': 3.1995e-05, 'epoch': 4.72} 36%|███▌ | 3605/10000 [14:08:02<24:47:08, 13.95s/it] 36%|███▌ | 3606/10000 [14:08:16<24:50:53, 13.99s/it] {'loss': 0.0668, 'learning_rate': 3.1990000000000004e-05, 'epoch': 4.72} 36%|███▌ | 3606/10000 [14:08:16<24:50:53, 13.99s/it] 36%|███▌ | 3607/10000 [14:08:30<24:46:52, 13.95s/it] {'loss': 0.0809, 'learning_rate': 3.1985000000000006e-05, 'epoch': 4.72} 36%|███▌ | 3607/10000 [14:08:30<24:46:52, 13.95s/it] 36%|███▌ | 3608/10000 [14:08:44<24:41:34, 13.91s/it] {'loss': 0.0883, 'learning_rate': 3.198e-05, 'epoch': 4.72} 36%|███▌ | 3608/10000 [14:08:44<24:41:34, 13.91s/it] 36%|███▌ | 3609/10000 [14:08:58<24:47:22, 13.96s/it] {'loss': 0.0592, 'learning_rate': 3.1975e-05, 'epoch': 4.72} 36%|███▌ | 3609/10000 [14:08:58<24:47:22, 13.96s/it] 36%|███▌ | 3610/10000 [14:09:12<24:44:20, 13.94s/it] {'loss': 0.0601, 'learning_rate': 3.197e-05, 'epoch': 4.73} 36%|███▌ | 3610/10000 [14:09:12<24:44:20, 13.94s/it] 36%|███▌ | 3611/10000 [14:09:26<24:40:51, 13.91s/it] {'loss': 0.058, 'learning_rate': 3.1965e-05, 'epoch': 4.73} 36%|███▌ | 3611/10000 [14:09:26<24:40:51, 13.91s/it] 36%|███▌ | 3612/10000 [14:09:40<24:40:09, 13.90s/it] {'loss': 0.063, 'learning_rate': 3.196e-05, 'epoch': 4.73} 36%|███▌ | 3612/10000 [14:09:40<24:40:09, 13.90s/it] 36%|███▌ | 3613/10000 [14:09:54<24:39:12, 13.90s/it] {'loss': 0.0718, 'learning_rate': 3.1955e-05, 
'epoch': 4.73} 36%|███▌ | 3613/10000 [14:09:54<24:39:12, 13.90s/it] 36%|███▌ | 3614/10000 [14:10:07<24:35:23, 13.86s/it] {'loss': 0.0703, 'learning_rate': 3.1950000000000004e-05, 'epoch': 4.73} 36%|███▌ | 3614/10000 [14:10:07<24:35:23, 13.86s/it] 36%|███▌ | 3615/10000 [14:10:21<24:39:34, 13.90s/it] {'loss': 0.087, 'learning_rate': 3.1945e-05, 'epoch': 4.73} 36%|███▌ | 3615/10000 [14:10:22<24:39:34, 13.90s/it] 36%|███▌ | 3616/10000 [14:10:35<24:38:24, 13.89s/it] {'loss': 0.0815, 'learning_rate': 3.194e-05, 'epoch': 4.73} 36%|███▌ | 3616/10000 [14:10:35<24:38:24, 13.89s/it] 36%|███▌ | 3617/10000 [14:10:49<24:37:44, 13.89s/it] {'loss': 0.0814, 'learning_rate': 3.1935000000000005e-05, 'epoch': 4.73} 36%|███▌ | 3617/10000 [14:10:49<24:37:44, 13.89s/it] 36%|███▌ | 3618/10000 [14:11:03<24:42:39, 13.94s/it] {'loss': 0.0732, 'learning_rate': 3.193e-05, 'epoch': 4.74} 36%|███▌ | 3618/10000 [14:11:03<24:42:39, 13.94s/it] 36%|███▌ | 3619/10000 [14:11:17<24:41:26, 13.93s/it] {'loss': 0.0662, 'learning_rate': 3.1925e-05, 'epoch': 4.74} 36%|███▌ | 3619/10000 [14:11:17<24:41:26, 13.93s/it] 36%|███▌ | 3620/10000 [14:11:31<24:39:21, 13.91s/it] {'loss': 0.0786, 'learning_rate': 3.192e-05, 'epoch': 4.74} 36%|███▌ | 3620/10000 [14:11:31<24:39:21, 13.91s/it] 36%|███▌ | 3621/10000 [14:11:45<24:38:13, 13.90s/it] {'loss': 0.0841, 'learning_rate': 3.1915e-05, 'epoch': 4.74} 36%|███▌ | 3621/10000 [14:11:45<24:38:13, 13.90s/it] 36%|███▌ | 3622/10000 [14:11:59<24:38:52, 13.91s/it] {'loss': 0.077, 'learning_rate': 3.191e-05, 'epoch': 4.74} 36%|███▌ | 3622/10000 [14:11:59<24:38:52, 13.91s/it] 36%|███▌ | 3623/10000 [14:12:13<24:38:27, 13.91s/it] {'loss': 0.0831, 'learning_rate': 3.1905e-05, 'epoch': 4.74} 36%|███▌ | 3623/10000 [14:12:13<24:38:27, 13.91s/it] 36%|███▌ | 3624/10000 [14:12:27<24:36:18, 13.89s/it] {'loss': 0.0802, 'learning_rate': 3.19e-05, 'epoch': 4.74} 36%|███▌ | 3624/10000 [14:12:27<24:36:18, 13.89s/it] 36%|███▋ | 3625/10000 [14:12:40<24:35:20, 13.89s/it] {'loss': 0.0701, 
'learning_rate': 3.1895000000000005e-05, 'epoch': 4.74} 36%|███▋ | 3625/10000 [14:12:41<24:35:20, 13.89s/it] 36%|███▋ | 3626/10000 [14:12:54<24:34:45, 13.88s/it] {'loss': 0.0783, 'learning_rate': 3.189e-05, 'epoch': 4.75} 36%|███▋ | 3626/10000 [14:12:54<24:34:45, 13.88s/it] 36%|███▋ | 3627/10000 [14:13:08<24:38:08, 13.92s/it] {'loss': 0.0719, 'learning_rate': 3.1885000000000004e-05, 'epoch': 4.75} 36%|███▋ | 3627/10000 [14:13:08<24:38:08, 13.92s/it] 36%|███▋ | 3628/10000 [14:13:22<24:39:10, 13.93s/it] {'loss': 0.0741, 'learning_rate': 3.188e-05, 'epoch': 4.75} 36%|███▋ | 3628/10000 [14:13:22<24:39:10, 13.93s/it] 36%|███▋ | 3629/10000 [14:13:36<24:40:59, 13.95s/it] {'loss': 0.0798, 'learning_rate': 3.1875e-05, 'epoch': 4.75} 36%|███▋ | 3629/10000 [14:13:36<24:40:59, 13.95s/it] 36%|███▋ | 3630/10000 [14:13:50<24:39:08, 13.93s/it] {'loss': 0.083, 'learning_rate': 3.187e-05, 'epoch': 4.75} 36%|███▋ | 3630/10000 [14:13:50<24:39:08, 13.93s/it] 36%|███▋ | 3631/10000 [14:14:04<24:40:07, 13.94s/it] {'loss': 0.0691, 'learning_rate': 3.1865e-05, 'epoch': 4.75} 36%|███▋ | 3631/10000 [14:14:04<24:40:07, 13.94s/it] 36%|███▋ | 3632/10000 [14:14:18<24:38:17, 13.93s/it] {'loss': 0.0769, 'learning_rate': 3.186e-05, 'epoch': 4.75} 36%|███▋ | 3632/10000 [14:14:18<24:38:17, 13.93s/it] 36%|███▋ | 3633/10000 [14:14:32<24:43:13, 13.98s/it] {'loss': 0.0803, 'learning_rate': 3.1855e-05, 'epoch': 4.76} 36%|███▋ | 3633/10000 [14:14:32<24:43:13, 13.98s/it] 36%|███▋ | 3634/10000 [14:14:46<24:43:21, 13.98s/it] {'loss': 0.0808, 'learning_rate': 3.185e-05, 'epoch': 4.76} 36%|███▋ | 3634/10000 [14:14:46<24:43:21, 13.98s/it] 36%|███▋ | 3635/10000 [14:15:00<24:37:36, 13.93s/it] {'loss': 0.0678, 'learning_rate': 3.1845000000000004e-05, 'epoch': 4.76} 36%|███▋ | 3635/10000 [14:15:00<24:37:36, 13.93s/it] 36%|███▋ | 3636/10000 [14:15:14<24:40:24, 13.96s/it] {'loss': 0.0844, 'learning_rate': 3.184e-05, 'epoch': 4.76} 36%|███▋ | 3636/10000 [14:15:14<24:40:24, 13.96s/it] 36%|███▋ | 3637/10000 
[14:15:28<24:36:56, 13.93s/it] {'loss': 0.0896, 'learning_rate': 3.1835e-05, 'epoch': 4.76} 36%|███▋ | 3637/10000 [14:15:28<24:36:56, 13.93s/it] 36%|███▋ | 3638/10000 [14:15:42<24:37:15, 13.93s/it] {'loss': 0.0679, 'learning_rate': 3.1830000000000005e-05, 'epoch': 4.76} 36%|███▋ | 3638/10000 [14:15:42<24:37:15, 13.93s/it] 36%|███▋ | 3639/10000 [14:15:56<24:32:49, 13.89s/it] {'loss': 0.0674, 'learning_rate': 3.1825e-05, 'epoch': 4.76} 36%|███▋ | 3639/10000 [14:15:56<24:32:49, 13.89s/it] 36%|███▋ | 3640/10000 [14:16:09<24:29:39, 13.86s/it] {'loss': 0.0707, 'learning_rate': 3.182e-05, 'epoch': 4.76} 36%|███▋ | 3640/10000 [14:16:09<24:29:39, 13.86s/it] 36%|███▋ | 3641/10000 [14:16:23<24:34:14, 13.91s/it] {'loss': 0.076, 'learning_rate': 3.1815e-05, 'epoch': 4.77} 36%|███▋ | 3641/10000 [14:16:23<24:34:14, 13.91s/it] 36%|███▋ | 3642/10000 [14:16:37<24:32:24, 13.90s/it] {'loss': 0.0868, 'learning_rate': 3.181e-05, 'epoch': 4.77} 36%|███▋ | 3642/10000 [14:16:37<24:32:24, 13.90s/it] 36%|███▋ | 3643/10000 [14:16:51<24:32:27, 13.90s/it] {'loss': 0.0755, 'learning_rate': 3.1805000000000005e-05, 'epoch': 4.77} 36%|███▋ | 3643/10000 [14:16:51<24:32:27, 13.90s/it] 36%|███▋ | 3644/10000 [14:17:05<24:30:43, 13.88s/it] {'loss': 0.0607, 'learning_rate': 3.18e-05, 'epoch': 4.77} 36%|███▋ | 3644/10000 [14:17:05<24:30:43, 13.88s/it] 36%|███▋ | 3645/10000 [14:17:19<24:36:17, 13.94s/it] {'loss': 0.0737, 'learning_rate': 3.1795e-05, 'epoch': 4.77} 36%|███▋ | 3645/10000 [14:17:19<24:36:17, 13.94s/it] 36%|███▋ | 3646/10000 [14:17:33<24:39:48, 13.97s/it] {'loss': 0.0741, 'learning_rate': 3.1790000000000006e-05, 'epoch': 4.77} 36%|███▋ | 3646/10000 [14:17:33<24:39:48, 13.97s/it] 36%|███▋ | 3647/10000 [14:17:47<24:39:07, 13.97s/it] {'loss': 0.0819, 'learning_rate': 3.1785e-05, 'epoch': 4.77} 36%|███▋ | 3647/10000 [14:17:47<24:39:07, 13.97s/it] 36%|███▋ | 3648/10000 [14:18:01<24:38:28, 13.97s/it] {'loss': 0.0732, 'learning_rate': 3.1780000000000004e-05, 'epoch': 4.77} 36%|███▋ | 3648/10000 
[14:18:01<24:38:28, 13.97s/it] 36%|███▋ | 3649/10000 [14:18:15<24:36:01, 13.94s/it] {'loss': 0.0803, 'learning_rate': 3.1775e-05, 'epoch': 4.78} 36%|███▋ | 3649/10000 [14:18:15<24:36:01, 13.94s/it] 36%|███▋ | 3650/10000 [14:18:29<24:29:18, 13.88s/it] {'loss': 0.0633, 'learning_rate': 3.177e-05, 'epoch': 4.78} 36%|███▋ | 3650/10000 [14:18:29<24:29:18, 13.88s/it] 37%|███▋ | 3651/10000 [14:18:43<24:32:54, 13.92s/it] {'loss': 0.063, 'learning_rate': 3.1765e-05, 'epoch': 4.78} 37%|███▋ | 3651/10000 [14:18:43<24:32:54, 13.92s/it] 37%|███▋ | 3652/10000 [14:18:57<24:32:41, 13.92s/it] {'loss': 0.0895, 'learning_rate': 3.176e-05, 'epoch': 4.78} 37%|███▋ | 3652/10000 [14:18:57<24:32:41, 13.92s/it] 37%|███▋ | 3653/10000 [14:19:11<24:31:51, 13.91s/it] {'loss': 0.0633, 'learning_rate': 3.1755000000000003e-05, 'epoch': 4.78} 37%|███▋ | 3653/10000 [14:19:11<24:31:51, 13.91s/it] 37%|███▋ | 3654/10000 [14:19:24<24:32:49, 13.93s/it] {'loss': 0.0783, 'learning_rate': 3.175e-05, 'epoch': 4.78} 37%|███▋ | 3654/10000 [14:19:24<24:32:49, 13.93s/it] 37%|███▋ | 3655/10000 [14:19:38<24:30:00, 13.90s/it] {'loss': 0.0662, 'learning_rate': 3.1745e-05, 'epoch': 4.78} 37%|███▋ | 3655/10000 [14:19:38<24:30:00, 13.90s/it] 37%|███▋ | 3656/10000 [14:19:52<24:33:56, 13.94s/it] {'loss': 0.0575, 'learning_rate': 3.1740000000000004e-05, 'epoch': 4.79} 37%|███▋ | 3656/10000 [14:19:52<24:33:56, 13.94s/it] 37%|███▋ | 3657/10000 [14:20:06<24:32:12, 13.93s/it] {'loss': 0.0901, 'learning_rate': 3.1735e-05, 'epoch': 4.79} 37%|███▋ | 3657/10000 [14:20:06<24:32:12, 13.93s/it] 37%|███▋ | 3658/10000 [14:20:20<24:30:29, 13.91s/it] {'loss': 0.0766, 'learning_rate': 3.173e-05, 'epoch': 4.79} 37%|███▋ | 3658/10000 [14:20:20<24:30:29, 13.91s/it] 37%|███▋ | 3659/10000 [14:20:34<24:28:59, 13.90s/it] {'loss': 0.0662, 'learning_rate': 3.1725e-05, 'epoch': 4.79} 37%|███▋ | 3659/10000 [14:20:34<24:28:59, 13.90s/it] 37%|███▋ | 3660/10000 [14:20:48<24:26:51, 13.88s/it] {'loss': 0.077, 'learning_rate': 3.172e-05, 'epoch': 4.79} 
37%|███▋ | 3660/10000 [14:20:48<24:26:51, 13.88s/it] 37%|███▋ | 3661/10000 [14:21:02<24:27:50, 13.89s/it] {'loss': 0.0771, 'learning_rate': 3.1715e-05, 'epoch': 4.79} 37%|███▋ | 3661/10000 [14:21:02<24:27:50, 13.89s/it] 37%|███▋ | 3662/10000 [14:21:16<24:39:05, 14.00s/it] {'loss': 0.0704, 'learning_rate': 3.171e-05, 'epoch': 4.79} 37%|███▋ | 3662/10000 [14:21:16<24:39:05, 14.00s/it] 37%|███▋ | 3663/10000 [14:21:30<24:34:44, 13.96s/it] {'loss': 0.0622, 'learning_rate': 3.1705e-05, 'epoch': 4.79} 37%|███▋ | 3663/10000 [14:21:30<24:34:44, 13.96s/it] 37%|███▋ | 3664/10000 [14:21:44<24:34:17, 13.96s/it] {'loss': 0.0698, 'learning_rate': 3.1700000000000005e-05, 'epoch': 4.8} 37%|███▋ | 3664/10000 [14:21:44<24:34:17, 13.96s/it] 37%|███▋ | 3665/10000 [14:21:58<24:30:20, 13.93s/it] {'loss': 0.0741, 'learning_rate': 3.1695e-05, 'epoch': 4.8} 37%|███▋ | 3665/10000 [14:21:58<24:30:20, 13.93s/it] 37%|███▋ | 3666/10000 [14:22:12<24:31:03, 13.93s/it] {'loss': 0.0657, 'learning_rate': 3.169e-05, 'epoch': 4.8} 37%|███▋ | 3666/10000 [14:22:12<24:31:03, 13.93s/it] 37%|███▋ | 3667/10000 [14:22:25<24:25:02, 13.88s/it] {'loss': 0.0586, 'learning_rate': 3.1685000000000006e-05, 'epoch': 4.8} 37%|███▋ | 3667/10000 [14:22:25<24:25:02, 13.88s/it] 37%|███▋ | 3668/10000 [14:22:39<24:28:13, 13.91s/it] {'loss': 0.0642, 'learning_rate': 3.168e-05, 'epoch': 4.8} 37%|███▋ | 3668/10000 [14:22:39<24:28:13, 13.91s/it] 37%|███▋ | 3669/10000 [14:22:53<24:27:51, 13.91s/it] {'loss': 0.0719, 'learning_rate': 3.1675e-05, 'epoch': 4.8} 37%|███▋ | 3669/10000 [14:22:53<24:27:51, 13.91s/it] 37%|███▋ | 3670/10000 [14:23:07<24:28:09, 13.92s/it] {'loss': 0.0842, 'learning_rate': 3.167e-05, 'epoch': 4.8} 37%|███▋ | 3670/10000 [14:23:07<24:28:09, 13.92s/it] 37%|███▋ | 3671/10000 [14:23:21<24:25:58, 13.90s/it] {'loss': 0.0692, 'learning_rate': 3.1665e-05, 'epoch': 4.8} 37%|███▋ | 3671/10000 [14:23:21<24:25:58, 13.90s/it] 37%|███▋ | 3672/10000 [14:23:35<24:28:24, 13.92s/it] {'loss': 0.0873, 'learning_rate': 3.166e-05, 
'epoch': 4.81} 37%|███▋ | 3672/10000 [14:23:35<24:28:24, 13.92s/it] 37%|███▋ | 3673/10000 [14:23:49<24:24:39, 13.89s/it] {'loss': 0.0863, 'learning_rate': 3.1655e-05, 'epoch': 4.81} 37%|███▋ | 3673/10000 [14:23:49<24:24:39, 13.89s/it] 37%|███▋ | 3674/10000 [14:24:03<24:29:57, 13.94s/it] {'loss': 0.0926, 'learning_rate': 3.1650000000000004e-05, 'epoch': 4.81} 37%|███▋ | 3674/10000 [14:24:03<24:29:57, 13.94s/it] 37%|███▋ | 3675/10000 [14:24:17<24:29:00, 13.94s/it] {'loss': 0.0952, 'learning_rate': 3.1645e-05, 'epoch': 4.81} 37%|███▋ | 3675/10000 [14:24:17<24:29:00, 13.94s/it] 37%|███▋ | 3676/10000 [14:24:31<24:30:49, 13.95s/it] {'loss': 0.0627, 'learning_rate': 3.164e-05, 'epoch': 4.81} 37%|███▋ | 3676/10000 [14:24:31<24:30:49, 13.95s/it] 37%|███▋ | 3677/10000 [14:24:45<24:31:50, 13.97s/it] {'loss': 0.0812, 'learning_rate': 3.1635000000000005e-05, 'epoch': 4.81} 37%|███▋ | 3677/10000 [14:24:45<24:31:50, 13.97s/it] 37%|███▋ | 3678/10000 [14:24:59<24:29:41, 13.95s/it] {'loss': 0.0678, 'learning_rate': 3.163000000000001e-05, 'epoch': 4.81} 37%|███▋ | 3678/10000 [14:24:59<24:29:41, 13.95s/it] 37%|███▋ | 3679/10000 [14:25:13<24:26:45, 13.92s/it] {'loss': 0.0723, 'learning_rate': 3.1624999999999996e-05, 'epoch': 4.82} 37%|███▋ | 3679/10000 [14:25:13<24:26:45, 13.92s/it] 37%|███▋ | 3680/10000 [14:25:26<24:25:47, 13.92s/it] {'loss': 0.0653, 'learning_rate': 3.162e-05, 'epoch': 4.82} 37%|███▋ | 3680/10000 [14:25:27<24:25:47, 13.92s/it] 37%|███▋ | 3681/10000 [14:25:40<24:23:35, 13.90s/it] {'loss': 0.0732, 'learning_rate': 3.1615e-05, 'epoch': 4.82} 37%|███▋ | 3681/10000 [14:25:40<24:23:35, 13.90s/it] 37%|███▋ | 3682/10000 [14:25:54<24:23:34, 13.90s/it] {'loss': 0.0605, 'learning_rate': 3.1610000000000004e-05, 'epoch': 4.82} 37%|███▋ | 3682/10000 [14:25:54<24:23:34, 13.90s/it] 37%|███▋ | 3683/10000 [14:26:08<24:24:30, 13.91s/it] {'loss': 0.0714, 'learning_rate': 3.1605e-05, 'epoch': 4.82} 37%|███▋ | 3683/10000 [14:26:08<24:24:30, 13.91s/it] 37%|███▋ | 3684/10000 
[14:26:22<24:26:48, 13.93s/it] {'loss': 0.0714, 'learning_rate': 3.16e-05, 'epoch': 4.82} 37%|███▋ | 3684/10000 [14:26:22<24:26:48, 13.93s/it] 37%|███▋ | 3685/10000 [14:26:36<24:23:01, 13.90s/it] {'loss': 0.0724, 'learning_rate': 3.1595000000000005e-05, 'epoch': 4.82} 37%|███▋ | 3685/10000 [14:26:36<24:23:01, 13.90s/it] 37%|███▋ | 3686/10000 [14:26:50<24:23:01, 13.90s/it] {'loss': 0.0784, 'learning_rate': 3.159e-05, 'epoch': 4.82} 37%|███▋ | 3686/10000 [14:26:50<24:23:01, 13.90s/it] 37%|███▋ | 3687/10000 [14:27:04<24:22:14, 13.90s/it] {'loss': 0.0713, 'learning_rate': 3.1585e-05, 'epoch': 4.83} 37%|███▋ | 3687/10000 [14:27:04<24:22:14, 13.90s/it] 37%|███▋ | 3688/10000 [14:27:18<24:20:56, 13.89s/it] {'loss': 0.0785, 'learning_rate': 3.1580000000000006e-05, 'epoch': 4.83} 37%|███▋ | 3688/10000 [14:27:18<24:20:56, 13.89s/it] 37%|███▋ | 3689/10000 [14:27:32<24:23:51, 13.92s/it] {'loss': 0.0783, 'learning_rate': 3.1575e-05, 'epoch': 4.83} 37%|███▋ | 3689/10000 [14:27:32<24:23:51, 13.92s/it] 37%|███▋ | 3690/10000 [14:27:46<24:22:42, 13.91s/it] {'loss': 0.0812, 'learning_rate': 3.157e-05, 'epoch': 4.83} 37%|███▋ | 3690/10000 [14:27:46<24:22:42, 13.91s/it] 37%|███▋ | 3691/10000 [14:27:59<24:23:40, 13.92s/it] {'loss': 0.0788, 'learning_rate': 3.1565e-05, 'epoch': 4.83} 37%|███▋ | 3691/10000 [14:28:00<24:23:40, 13.92s/it] 37%|███▋ | 3692/10000 [14:28:13<24:26:01, 13.94s/it] {'loss': 0.0697, 'learning_rate': 3.156e-05, 'epoch': 4.83} 37%|███▋ | 3692/10000 [14:28:14<24:26:01, 13.94s/it] 37%|███▋ | 3693/10000 [14:28:27<24:23:30, 13.92s/it] {'loss': 0.0779, 'learning_rate': 3.1555e-05, 'epoch': 4.83} 37%|███▋ | 3693/10000 [14:28:27<24:23:30, 13.92s/it] 37%|███▋ | 3694/10000 [14:28:41<24:26:14, 13.95s/it] {'loss': 0.0746, 'learning_rate': 3.155e-05, 'epoch': 4.84} 37%|███▋ | 3694/10000 [14:28:41<24:26:14, 13.95s/it] 37%|███▋ | 3695/10000 [14:28:55<24:26:08, 13.95s/it] {'loss': 0.0878, 'learning_rate': 3.1545000000000004e-05, 'epoch': 4.84} 37%|███▋ | 3695/10000 
[14:28:55<24:26:08, 13.95s/it] 37%|███▋ | 3696/10000 [14:29:09<24:24:30, 13.94s/it] {'loss': 0.0661, 'learning_rate': 3.154e-05, 'epoch': 4.84} 37%|███▋ | 3696/10000 [14:29:09<24:24:30, 13.94s/it] 37%|███▋ | 3697/10000 [14:29:23<24:28:14, 13.98s/it] {'loss': 0.0825, 'learning_rate': 3.1535e-05, 'epoch': 4.84} 37%|███▋ | 3697/10000 [14:29:23<24:28:14, 13.98s/it] 37%|███▋ | 3698/10000 [14:29:37<24:26:05, 13.96s/it] {'loss': 0.0612, 'learning_rate': 3.1530000000000005e-05, 'epoch': 4.84} 37%|███▋ | 3698/10000 [14:29:37<24:26:05, 13.96s/it] 37%|███▋ | 3699/10000 [14:29:51<24:25:09, 13.95s/it] {'loss': 0.0667, 'learning_rate': 3.1525e-05, 'epoch': 4.84} 37%|███▋ | 3699/10000 [14:29:51<24:25:09, 13.95s/it] 37%|███▋ | 3700/10000 [14:30:05<24:23:47, 13.94s/it] {'loss': 0.0767, 'learning_rate': 3.1519999999999996e-05, 'epoch': 4.84} 37%|███▋ | 3700/10000 [14:30:05<24:23:47, 13.94s/it] 37%|███▋ | 3701/10000 [14:30:19<24:24:53, 13.95s/it] {'loss': 0.0866, 'learning_rate': 3.1515e-05, 'epoch': 4.84} 37%|███▋ | 3701/10000 [14:30:19<24:24:53, 13.95s/it] 37%|███▋ | 3702/10000 [14:30:33<24:24:54, 13.96s/it] {'loss': 0.0836, 'learning_rate': 3.151e-05, 'epoch': 4.85} 37%|███▋ | 3702/10000 [14:30:33<24:24:54, 13.96s/it] 37%|███▋ | 3703/10000 [14:30:47<24:23:29, 13.94s/it] {'loss': 0.0721, 'learning_rate': 3.1505000000000004e-05, 'epoch': 4.85} 37%|███▋ | 3703/10000 [14:30:47<24:23:29, 13.94s/it] 37%|███▋ | 3704/10000 [14:31:01<24:24:25, 13.96s/it] {'loss': 0.0735, 'learning_rate': 3.15e-05, 'epoch': 4.85} 37%|███▋ | 3704/10000 [14:31:01<24:24:25, 13.96s/it] 37%|███▋ | 3705/10000 [14:31:15<24:25:30, 13.97s/it] {'loss': 0.0801, 'learning_rate': 3.1495e-05, 'epoch': 4.85} 37%|███▋ | 3705/10000 [14:31:15<24:25:30, 13.97s/it] 37%|███▋ | 3706/10000 [14:31:29<24:27:12, 13.99s/it] {'loss': 0.0711, 'learning_rate': 3.1490000000000005e-05, 'epoch': 4.85} 37%|███▋ | 3706/10000 [14:31:29<24:27:12, 13.99s/it] 37%|███▋ | 3707/10000 [14:31:43<24:27:25, 13.99s/it] {'loss': 0.0729, 'learning_rate': 
3.1485e-05, 'epoch': 4.85} 37%|███▋ | 3707/10000 [14:31:43<24:27:25, 13.99s/it] 37%|███▋ | 3708/10000 [14:31:57<24:26:20, 13.98s/it] {'loss': 0.0774, 'learning_rate': 3.1480000000000004e-05, 'epoch': 4.85} 37%|███▋ | 3708/10000 [14:31:57<24:26:20, 13.98s/it] 37%|███▋ | 3709/10000 [14:32:11<24:24:20, 13.97s/it] {'loss': 0.0841, 'learning_rate': 3.1475e-05, 'epoch': 4.85} 37%|███▋ | 3709/10000 [14:32:11<24:24:20, 13.97s/it] 37%|███▋ | 3710/10000 [14:32:25<24:25:18, 13.98s/it] {'loss': 0.0672, 'learning_rate': 3.147e-05, 'epoch': 4.86} 37%|███▋ | 3710/10000 [14:32:25<24:25:18, 13.98s/it] 37%|███▋ | 3711/10000 [14:32:39<24:23:47, 13.97s/it] {'loss': 0.0794, 'learning_rate': 3.1465e-05, 'epoch': 4.86} 37%|███▋ | 3711/10000 [14:32:39<24:23:47, 13.97s/it] 37%|███▋ | 3712/10000 [14:32:53<24:24:15, 13.97s/it] {'loss': 0.0717, 'learning_rate': 3.146e-05, 'epoch': 4.86} 37%|███▋ | 3712/10000 [14:32:53<24:24:15, 13.97s/it] 37%|███▋ | 3713/10000 [14:33:07<24:19:22, 13.93s/it] {'loss': 0.0855, 'learning_rate': 3.1455e-05, 'epoch': 4.86} 37%|███▋ | 3713/10000 [14:33:07<24:19:22, 13.93s/it] 37%|███▋ | 3714/10000 [14:33:20<24:18:27, 13.92s/it] {'loss': 0.0797, 'learning_rate': 3.145e-05, 'epoch': 4.86} 37%|███▋ | 3714/10000 [14:33:21<24:18:27, 13.92s/it] 37%|███▋ | 3715/10000 [14:33:34<24:18:45, 13.93s/it] {'loss': 0.0695, 'learning_rate': 3.1445e-05, 'epoch': 4.86} 37%|███▋ | 3715/10000 [14:33:34<24:18:45, 13.93s/it] 37%|███▋ | 3716/10000 [14:33:48<24:18:05, 13.92s/it] {'loss': 0.0772, 'learning_rate': 3.1440000000000004e-05, 'epoch': 4.86} 37%|███▋ | 3716/10000 [14:33:48<24:18:05, 13.92s/it] 37%|███▋ | 3717/10000 [14:34:02<24:21:09, 13.95s/it] {'loss': 0.0865, 'learning_rate': 3.1435000000000007e-05, 'epoch': 4.87} 37%|███▋ | 3717/10000 [14:34:02<24:21:09, 13.95s/it] 37%|███▋ | 3718/10000 [14:34:16<24:21:31, 13.96s/it] {'loss': 0.0812, 'learning_rate': 3.143e-05, 'epoch': 4.87} 37%|███▋ | 3718/10000 [14:34:16<24:21:31, 13.96s/it] 37%|███▋ | 3719/10000 [14:34:30<24:19:01, 
13.94s/it] {'loss': 0.0738, 'learning_rate': 3.1425e-05, 'epoch': 4.87} 37%|███▋ | 3719/10000 [14:34:30<24:19:01, 13.94s/it] 37%|███▋ | 3720/10000 [14:34:44<24:21:26, 13.96s/it] {'loss': 0.0856, 'learning_rate': 3.142e-05, 'epoch': 4.87} 37%|███▋ | 3720/10000 [14:34:44<24:21:26, 13.96s/it] 37%|███▋ | 3721/10000 [14:34:58<24:22:29, 13.98s/it] {'loss': 0.0794, 'learning_rate': 3.1415e-05, 'epoch': 4.87} 37%|███▋ | 3721/10000 [14:34:58<24:22:29, 13.98s/it] 37%|███▋ | 3722/10000 [14:35:12<24:26:18, 14.01s/it] {'loss': 0.0756, 'learning_rate': 3.141e-05, 'epoch': 4.87} 37%|███▋ | 3722/10000 [14:35:12<24:26:18, 14.01s/it] 37%|███▋ | 3723/10000 [14:35:26<24:21:37, 13.97s/it] {'loss': 0.0729, 'learning_rate': 3.1405e-05, 'epoch': 4.87} 37%|███▋ | 3723/10000 [14:35:26<24:21:37, 13.97s/it] 37%|███▋ | 3724/10000 [14:35:40<24:24:03, 14.00s/it] {'loss': 0.0946, 'learning_rate': 3.1400000000000004e-05, 'epoch': 4.87} 37%|███▋ | 3724/10000 [14:35:40<24:24:03, 14.00s/it] 37%|███▋ | 3725/10000 [14:35:54<24:20:50, 13.97s/it] {'loss': 0.0778, 'learning_rate': 3.1395e-05, 'epoch': 4.88} 37%|███▋ | 3725/10000 [14:35:54<24:20:50, 13.97s/it] 37%|███▋ | 3726/10000 [14:36:08<24:15:08, 13.92s/it] {'loss': 0.0861, 'learning_rate': 3.139e-05, 'epoch': 4.88} 37%|███▋ | 3726/10000 [14:36:08<24:15:08, 13.92s/it] 37%|███▋ | 3727/10000 [14:36:22<24:17:10, 13.94s/it] {'loss': 0.0719, 'learning_rate': 3.1385000000000005e-05, 'epoch': 4.88} 37%|███▋ | 3727/10000 [14:36:22<24:17:10, 13.94s/it] 37%|███▋ | 3728/10000 [14:36:36<24:18:01, 13.95s/it] {'loss': 0.0701, 'learning_rate': 3.138e-05, 'epoch': 4.88} 37%|███▋ | 3728/10000 [14:36:36<24:18:01, 13.95s/it] 37%|███▋ | 3729/10000 [14:36:50<24:16:02, 13.93s/it] {'loss': 0.0701, 'learning_rate': 3.1375e-05, 'epoch': 4.88} 37%|███▋ | 3729/10000 [14:36:50<24:16:02, 13.93s/it] 37%|███▋ | 3730/10000 [14:37:04<24:15:49, 13.93s/it] {'loss': 0.0628, 'learning_rate': 3.137e-05, 'epoch': 4.88} 37%|███▋ | 3730/10000 [14:37:04<24:15:49, 13.93s/it] 37%|███▋ | 
3731/10000 [14:37:18<24:15:00, 13.93s/it] {'loss': 0.093, 'learning_rate': 3.1365e-05, 'epoch': 4.88} 37%|███▋ | 3731/10000 [14:37:18<24:15:00, 13.93s/it] 37%|███▋ | 3732/10000 [14:37:31<24:10:26, 13.88s/it] {'loss': 0.0629, 'learning_rate': 3.136e-05, 'epoch': 4.88} 37%|███▋ | 3732/10000 [14:37:31<24:10:26, 13.88s/it] 37%|███▋ | 3733/10000 [14:37:46<24:15:35, 13.94s/it] {'loss': 0.0759, 'learning_rate': 3.1355e-05, 'epoch': 4.89} 37%|███▋ | 3733/10000 [14:37:46<24:15:35, 13.94s/it] 37%|███▋ | 3734/10000 [14:37:59<24:13:10, 13.91s/it] {'loss': 0.081, 'learning_rate': 3.135e-05, 'epoch': 4.89} 37%|███▋ | 3734/10000 [14:37:59<24:13:10, 13.91s/it] 37%|███▋ | 3735/10000 [14:38:13<24:11:52, 13.90s/it] {'loss': 0.0767, 'learning_rate': 3.1345e-05, 'epoch': 4.89} 37%|███▋ | 3735/10000 [14:38:13<24:11:52, 13.90s/it] 37%|███▋ | 3736/10000 [14:38:27<24:09:07, 13.88s/it] {'loss': 0.07, 'learning_rate': 3.134e-05, 'epoch': 4.89} 37%|███▋ | 3736/10000 [14:38:27<24:09:07, 13.88s/it] 37%|███▋ | 3737/10000 [14:38:41<24:07:39, 13.87s/it] {'loss': 0.0799, 'learning_rate': 3.1335000000000004e-05, 'epoch': 4.89} 37%|███▋ | 3737/10000 [14:38:41<24:07:39, 13.87s/it] 37%|███▋ | 3738/10000 [14:38:55<24:08:39, 13.88s/it] {'loss': 0.0683, 'learning_rate': 3.133000000000001e-05, 'epoch': 4.89} 37%|███▋ | 3738/10000 [14:38:55<24:08:39, 13.88s/it] 37%|███▋ | 3739/10000 [14:39:09<24:10:39, 13.90s/it] {'loss': 0.0806, 'learning_rate': 3.1324999999999996e-05, 'epoch': 4.89} 37%|███▋ | 3739/10000 [14:39:09<24:10:39, 13.90s/it] 37%|███▋ | 3740/10000 [14:39:23<24:17:31, 13.97s/it] {'loss': 0.0763, 'learning_rate': 3.132e-05, 'epoch': 4.9} 37%|███▋ | 3740/10000 [14:39:23<24:17:31, 13.97s/it] 37%|███▋ | 3741/10000 [14:39:37<24:20:13, 14.00s/it] {'loss': 0.0716, 'learning_rate': 3.1315e-05, 'epoch': 4.9} 37%|███▋ | 3741/10000 [14:39:37<24:20:13, 14.00s/it] 37%|███▋ | 3742/10000 [14:39:51<24:15:17, 13.95s/it] {'loss': 0.0835, 'learning_rate': 3.1310000000000003e-05, 'epoch': 4.9} 37%|███▋ | 3742/10000 
[14:39:51<24:15:17, 13.95s/it] 37%|███▋ | 3743/10000 [14:40:05<24:17:17, 13.97s/it] {'loss': 0.0842, 'learning_rate': 3.1305e-05, 'epoch': 4.9} 37%|███▋ | 3743/10000 [14:40:05<24:17:17, 13.97s/it] 37%|███▋ | 3744/10000 [14:40:19<24:15:23, 13.96s/it] {'loss': 0.0857, 'learning_rate': 3.13e-05, 'epoch': 4.9} 37%|███▋ | 3744/10000 [14:40:19<24:15:23, 13.96s/it] 37%|███▋ | 3745/10000 [14:40:33<24:11:33, 13.92s/it] {'loss': 0.08, 'learning_rate': 3.1295000000000004e-05, 'epoch': 4.9} 37%|███▋ | 3745/10000 [14:40:33<24:11:33, 13.92s/it] 37%|███▋ | 3746/10000 [14:40:47<24:13:11, 13.94s/it] {'loss': 0.0841, 'learning_rate': 3.129e-05, 'epoch': 4.9} 37%|███▋ | 3746/10000 [14:40:47<24:13:11, 13.94s/it] 37%|███▋ | 3747/10000 [14:41:00<24:09:39, 13.91s/it] {'loss': 0.0712, 'learning_rate': 3.1285e-05, 'epoch': 4.9} 37%|███▋ | 3747/10000 [14:41:00<24:09:39, 13.91s/it] 37%|███▋ | 3748/10000 [14:41:14<24:13:29, 13.95s/it] {'loss': 0.0917, 'learning_rate': 3.1280000000000005e-05, 'epoch': 4.91} 37%|███▋ | 3748/10000 [14:41:15<24:13:29, 13.95s/it] 37%|███▋ | 3749/10000 [14:41:28<24:09:44, 13.92s/it] {'loss': 0.0746, 'learning_rate': 3.1275e-05, 'epoch': 4.91} 37%|███▋ | 3749/10000 [14:41:28<24:09:44, 13.92s/it] 38%|███▊ | 3750/10000 [14:41:42<24:09:30, 13.92s/it] {'loss': 0.0773, 'learning_rate': 3.127e-05, 'epoch': 4.91} 38%|███▊ | 3750/10000 [14:41:42<24:09:30, 13.92s/it] 38%|███▊ | 3751/10000 [14:41:56<24:11:18, 13.93s/it] {'loss': 0.0742, 'learning_rate': 3.1265e-05, 'epoch': 4.91} 38%|███▊ | 3751/10000 [14:41:56<24:11:18, 13.93s/it] 38%|███▊ | 3752/10000 [14:42:10<24:14:36, 13.97s/it] {'loss': 0.0687, 'learning_rate': 3.126e-05, 'epoch': 4.91} 38%|███▊ | 3752/10000 [14:42:10<24:14:36, 13.97s/it] 38%|███▊ | 3753/10000 [14:42:24<24:12:51, 13.95s/it] {'loss': 0.072, 'learning_rate': 3.1255e-05, 'epoch': 4.91} 38%|███▊ | 3753/10000 [14:42:24<24:12:51, 13.95s/it] 38%|███▊ | 3754/10000 [14:42:38<24:12:19, 13.95s/it] {'loss': 0.0765, 'learning_rate': 3.125e-05, 'epoch': 4.91} 
38%|███▊ | 3754/10000 [14:42:38<24:12:19, 13.95s/it] 38%|███▊ | 3755/10000 [14:42:52<24:14:43, 13.98s/it] {'loss': 0.0879, 'learning_rate': 3.1245e-05, 'epoch': 4.91} 38%|███▊ | 3755/10000 [14:42:52<24:14:43, 13.98s/it] 38%|███▊ | 3756/10000 [14:43:06<24:11:21, 13.95s/it] {'loss': 0.0622, 'learning_rate': 3.1240000000000006e-05, 'epoch': 4.92} 38%|███▊ | 3756/10000 [14:43:06<24:11:21, 13.95s/it] 38%|███▊ | 3757/10000 [14:43:20<24:12:49, 13.96s/it] {'loss': 0.0659, 'learning_rate': 3.1235e-05, 'epoch': 4.92} 38%|███▊ | 3757/10000 [14:43:20<24:12:49, 13.96s/it] 38%|███▊ | 3758/10000 [14:43:34<24:12:35, 13.96s/it] {'loss': 0.0885, 'learning_rate': 3.1230000000000004e-05, 'epoch': 4.92} 38%|███▊ | 3758/10000 [14:43:34<24:12:35, 13.96s/it] 38%|███▊ | 3759/10000 [14:43:48<24:11:01, 13.95s/it] {'loss': 0.0872, 'learning_rate': 3.122500000000001e-05, 'epoch': 4.92} 38%|███▊ | 3759/10000 [14:43:48<24:11:01, 13.95s/it] 38%|███▊ | 3760/10000 [14:44:02<24:15:06, 13.99s/it] {'loss': 0.0782, 'learning_rate': 3.122e-05, 'epoch': 4.92} 38%|███▊ | 3760/10000 [14:44:02<24:15:06, 13.99s/it] 38%|███▊ | 3761/10000 [14:44:16<24:16:30, 14.01s/it] {'loss': 0.0643, 'learning_rate': 3.1215e-05, 'epoch': 4.92} 38%|███▊ | 3761/10000 [14:44:16<24:16:30, 14.01s/it] 38%|███▊ | 3762/10000 [14:44:30<24:13:00, 13.98s/it] {'loss': 0.0635, 'learning_rate': 3.121e-05, 'epoch': 4.92} 38%|███▊ | 3762/10000 [14:44:30<24:13:00, 13.98s/it] 38%|███▊ | 3763/10000 [14:44:44<24:07:59, 13.93s/it] {'loss': 0.0785, 'learning_rate': 3.1205000000000004e-05, 'epoch': 4.93} 38%|███▊ | 3763/10000 [14:44:44<24:07:59, 13.93s/it] 38%|███▊ | 3764/10000 [14:44:58<24:07:43, 13.93s/it] {'loss': 0.0635, 'learning_rate': 3.12e-05, 'epoch': 4.93} 38%|███▊ | 3764/10000 [14:44:58<24:07:43, 13.93s/it] 38%|███▊ | 3765/10000 [14:45:12<24:08:19, 13.94s/it] {'loss': 0.0863, 'learning_rate': 3.1195e-05, 'epoch': 4.93} 38%|███▊ | 3765/10000 [14:45:12<24:08:19, 13.94s/it] 38%|███▊ | 3766/10000 [14:45:26<24:07:59, 13.94s/it] {'loss': 
0.0879, 'learning_rate': 3.1190000000000005e-05, 'epoch': 4.93} 38%|███▊ | 3766/10000 [14:45:26<24:07:59, 13.94s/it] 38%|███▊ | 3767/10000 [14:45:39<24:05:53, 13.92s/it] {'loss': 0.0771, 'learning_rate': 3.1185e-05, 'epoch': 4.93} 38%|███▊ | 3767/10000 [14:45:39<24:05:53, 13.92s/it] 38%|███▊ | 3768/10000 [14:45:53<24:06:15, 13.92s/it] {'loss': 0.078, 'learning_rate': 3.118e-05, 'epoch': 4.93} 38%|███▊ | 3768/10000 [14:45:53<24:06:15, 13.92s/it] 38%|███▊ | 3769/10000 [14:46:07<24:09:57, 13.96s/it] {'loss': 0.0756, 'learning_rate': 3.1175000000000006e-05, 'epoch': 4.93} 38%|███▊ | 3769/10000 [14:46:08<24:09:57, 13.96s/it] 38%|███▊ | 3770/10000 [14:46:21<24:09:17, 13.96s/it] {'loss': 0.0703, 'learning_rate': 3.117e-05, 'epoch': 4.93} 38%|███▊ | 3770/10000 [14:46:21<24:09:17, 13.96s/it] 38%|███▊ | 3771/10000 [14:46:35<24:09:37, 13.96s/it] {'loss': 0.0786, 'learning_rate': 3.1165e-05, 'epoch': 4.94} 38%|███▊ | 3771/10000 [14:46:35<24:09:37, 13.96s/it] 38%|███▊ | 3772/10000 [14:46:49<24:06:47, 13.94s/it] {'loss': 0.0778, 'learning_rate': 3.116e-05, 'epoch': 4.94} 38%|███▊ | 3772/10000 [14:46:49<24:06:47, 13.94s/it] 38%|███▊ | 3773/10000 [14:47:03<24:05:59, 13.93s/it] {'loss': 0.0759, 'learning_rate': 3.1155e-05, 'epoch': 4.94} 38%|███▊ | 3773/10000 [14:47:03<24:05:59, 13.93s/it] 38%|███▊ | 3774/10000 [14:47:17<24:10:06, 13.97s/it] {'loss': 0.0656, 'learning_rate': 3.115e-05, 'epoch': 4.94} 38%|███▊ | 3774/10000 [14:47:17<24:10:06, 13.97s/it] 38%|███▊ | 3775/10000 [14:47:31<24:07:47, 13.95s/it] {'loss': 0.0871, 'learning_rate': 3.1145e-05, 'epoch': 4.94} 38%|███▊ | 3775/10000 [14:47:31<24:07:47, 13.95s/it] 38%|███▊ | 3776/10000 [14:47:45<24:02:29, 13.91s/it] {'loss': 0.0653, 'learning_rate': 3.1140000000000003e-05, 'epoch': 4.94} 38%|███▊ | 3776/10000 [14:47:45<24:02:29, 13.91s/it] 38%|███▊ | 3777/10000 [14:47:59<24:01:27, 13.90s/it] {'loss': 0.0818, 'learning_rate': 3.1135000000000006e-05, 'epoch': 4.94} 38%|███▊ | 3777/10000 [14:47:59<24:01:27, 13.90s/it] 38%|███▊ | 
3778/10000 [14:48:13<24:03:01, 13.92s/it] {'loss': 0.0711, 'learning_rate': 3.113e-05, 'epoch': 4.95} 38%|███▊ | 3778/10000 [14:48:13<24:03:01, 13.92s/it] 38%|███▊ | 3779/10000 [14:48:27<23:59:27, 13.88s/it] {'loss': 0.0705, 'learning_rate': 3.1125000000000004e-05, 'epoch': 4.95} 38%|███▊ | 3779/10000 [14:48:27<23:59:27, 13.88s/it] 38%|███▊ | 3780/10000 [14:48:41<24:03:02, 13.92s/it] {'loss': 0.0734, 'learning_rate': 3.112e-05, 'epoch': 4.95} 38%|███▊ | 3780/10000 [14:48:41<24:03:02, 13.92s/it] 38%|███▊ | 3781/10000 [14:48:54<24:01:38, 13.91s/it] {'loss': 0.0811, 'learning_rate': 3.1115e-05, 'epoch': 4.95} 38%|███▊ | 3781/10000 [14:48:55<24:01:38, 13.91s/it] 38%|███▊ | 3782/10000 [14:49:09<24:11:07, 14.00s/it] {'loss': 0.0735, 'learning_rate': 3.111e-05, 'epoch': 4.95} 38%|███▊ | 3782/10000 [14:49:09<24:11:07, 14.00s/it] 38%|███▊ | 3783/10000 [14:49:23<24:08:38, 13.98s/it] {'loss': 0.0704, 'learning_rate': 3.1105e-05, 'epoch': 4.95} 38%|███▊ | 3783/10000 [14:49:23<24:08:38, 13.98s/it] 38%|███▊ | 3784/10000 [14:49:36<24:05:01, 13.95s/it] {'loss': 0.0664, 'learning_rate': 3.1100000000000004e-05, 'epoch': 4.95} 38%|███▊ | 3784/10000 [14:49:37<24:05:01, 13.95s/it] 38%|███▊ | 3785/10000 [14:49:50<24:03:33, 13.94s/it] {'loss': 0.0775, 'learning_rate': 3.1095e-05, 'epoch': 4.95} 38%|███▊ | 3785/10000 [14:49:50<24:03:33, 13.94s/it] 38%|███▊ | 3786/10000 [14:50:04<24:04:05, 13.94s/it] {'loss': 0.0657, 'learning_rate': 3.109e-05, 'epoch': 4.96} 38%|███▊ | 3786/10000 [14:50:04<24:04:05, 13.94s/it] 38%|███▊ | 3787/10000 [14:50:18<24:03:30, 13.94s/it] {'loss': 0.0791, 'learning_rate': 3.1085000000000005e-05, 'epoch': 4.96} 38%|███▊ | 3787/10000 [14:50:18<24:03:30, 13.94s/it] 38%|███▊ | 3788/10000 [14:50:32<24:01:52, 13.93s/it] {'loss': 0.0774, 'learning_rate': 3.108e-05, 'epoch': 4.96} 38%|███▊ | 3788/10000 [14:50:32<24:01:52, 13.93s/it] 38%|███▊ | 3789/10000 [14:50:46<23:57:19, 13.88s/it] {'loss': 0.0688, 'learning_rate': 3.1075e-05, 'epoch': 4.96} 38%|███▊ | 3789/10000 
[14:50:46<23:57:19, 13.88s/it] 38%|███▊ | 3790/10000 [14:51:00<23:56:04, 13.88s/it] {'loss': 0.0842, 'learning_rate': 3.107e-05, 'epoch': 4.96} 38%|███▊ | 3790/10000 [14:51:00<23:56:04, 13.88s/it] 38%|███▊ | 3791/10000 [14:51:14<23:57:10, 13.89s/it] {'loss': 0.0836, 'learning_rate': 3.1065e-05, 'epoch': 4.96} 38%|███▊ | 3791/10000 [14:51:14<23:57:10, 13.89s/it] 38%|███▊ | 3792/10000 [14:51:28<23:55:45, 13.88s/it] {'loss': 0.0846, 'learning_rate': 3.106e-05, 'epoch': 4.96} 38%|███▊ | 3792/10000 [14:51:28<23:55:45, 13.88s/it] 38%|███▊ | 3793/10000 [14:51:42<24:00:10, 13.92s/it] {'loss': 0.0676, 'learning_rate': 3.1055e-05, 'epoch': 4.96} 38%|███▊ | 3793/10000 [14:51:42<24:00:10, 13.92s/it] 38%|███▊ | 3794/10000 [14:51:55<23:57:00, 13.89s/it] {'loss': 0.0741, 'learning_rate': 3.105e-05, 'epoch': 4.97} 38%|███▊ | 3794/10000 [14:51:56<23:57:00, 13.89s/it] 38%|███▊ | 3795/10000 [14:52:09<23:57:19, 13.90s/it] {'loss': 0.0843, 'learning_rate': 3.1045000000000005e-05, 'epoch': 4.97} 38%|███▊ | 3795/10000 [14:52:09<23:57:19, 13.90s/it] 38%|███▊ | 3796/10000 [14:52:23<23:53:40, 13.87s/it] {'loss': 0.0622, 'learning_rate': 3.104e-05, 'epoch': 4.97} 38%|███▊ | 3796/10000 [14:52:23<23:53:40, 13.87s/it] 38%|███▊ | 3797/10000 [14:52:37<23:51:02, 13.84s/it] {'loss': 0.0583, 'learning_rate': 3.1035000000000004e-05, 'epoch': 4.97} 38%|███▊ | 3797/10000 [14:52:37<23:51:02, 13.84s/it] 38%|███▊ | 3798/10000 [14:52:51<23:55:58, 13.89s/it] {'loss': 0.079, 'learning_rate': 3.1030000000000006e-05, 'epoch': 4.97} 38%|███▊ | 3798/10000 [14:52:51<23:55:58, 13.89s/it] 38%|███▊ | 3799/10000 [14:53:05<23:53:46, 13.87s/it] {'loss': 0.0685, 'learning_rate': 3.1025e-05, 'epoch': 4.97} 38%|███▊ | 3799/10000 [14:53:05<23:53:46, 13.87s/it] 38%|███▊ | 3800/10000 [14:53:19<23:57:34, 13.91s/it] {'loss': 0.0772, 'learning_rate': 3.102e-05, 'epoch': 4.97} 38%|███▊ | 3800/10000 [14:53:19<23:57:34, 13.91s/it] 38%|███▊ | 3801/10000 [14:53:33<23:56:28, 13.90s/it] {'loss': 0.074, 'learning_rate': 3.1015e-05, 
'epoch': 4.98} 38%|███▊ | 3801/10000 [14:53:33<23:56:28, 13.90s/it] 38%|███▊ | 3802/10000 [14:53:47<23:56:44, 13.91s/it] {'loss': 0.0661, 'learning_rate': 3.101e-05, 'epoch': 4.98} 38%|███▊ | 3802/10000 [14:53:47<23:56:44, 13.91s/it] 38%|███▊ | 3803/10000 [14:54:00<23:55:46, 13.90s/it] {'loss': 0.0817, 'learning_rate': 3.1005e-05, 'epoch': 4.98} 38%|███▊ | 3803/10000 [14:54:01<23:55:46, 13.90s/it] 38%|███▊ | 3804/10000 [14:54:14<23:56:07, 13.91s/it] {'loss': 0.0665, 'learning_rate': 3.1e-05, 'epoch': 4.98} 38%|███▊ | 3804/10000 [14:54:14<23:56:07, 13.91s/it] 38%|███▊ | 3805/10000 [14:54:28<23:51:56, 13.87s/it] {'loss': 0.0614, 'learning_rate': 3.0995000000000004e-05, 'epoch': 4.98} 38%|███▊ | 3805/10000 [14:54:28<23:51:56, 13.87s/it] 38%|███▊ | 3806/10000 [14:54:42<23:54:17, 13.89s/it] {'loss': 0.0659, 'learning_rate': 3.099e-05, 'epoch': 4.98} 38%|███▊ | 3806/10000 [14:54:42<23:54:17, 13.89s/it] 38%|███▊ | 3807/10000 [14:54:56<23:51:56, 13.87s/it] {'loss': 0.0672, 'learning_rate': 3.0985e-05, 'epoch': 4.98} 38%|███▊ | 3807/10000 [14:54:56<23:51:56, 13.87s/it] 38%|███▊ | 3808/10000 [14:55:10<23:55:13, 13.91s/it] {'loss': 0.0933, 'learning_rate': 3.0980000000000005e-05, 'epoch': 4.98} 38%|███▊ | 3808/10000 [14:55:10<23:55:13, 13.91s/it] 38%|███▊ | 3809/10000 [14:55:24<23:56:27, 13.92s/it] {'loss': 0.0803, 'learning_rate': 3.0975e-05, 'epoch': 4.99} 38%|███▊ | 3809/10000 [14:55:24<23:56:27, 13.92s/it] 38%|███▊ | 3810/10000 [14:55:38<23:53:45, 13.90s/it] {'loss': 0.0597, 'learning_rate': 3.0969999999999997e-05, 'epoch': 4.99} 38%|███▊ | 3810/10000 [14:55:38<23:53:45, 13.90s/it] 38%|███▊ | 3811/10000 [14:55:52<23:55:49, 13.92s/it] {'loss': 0.0753, 'learning_rate': 3.0965e-05, 'epoch': 4.99} 38%|███▊ | 3811/10000 [14:55:52<23:55:49, 13.92s/it] 38%|███▊ | 3812/10000 [14:56:06<23:52:30, 13.89s/it] {'loss': 0.0729, 'learning_rate': 3.096e-05, 'epoch': 4.99} 38%|███▊ | 3812/10000 [14:56:06<23:52:30, 13.89s/it] 38%|███▊ | 3813/10000 [14:56:19<23:51:58, 13.89s/it] {'loss': 
0.0702, 'learning_rate': 3.0955e-05, 'epoch': 4.99} 38%|███▊ | 3813/10000 [14:56:19<23:51:58, 13.89s/it] 38%|███▊ | 3814/10000 [14:56:33<23:53:15, 13.90s/it] {'loss': 0.0802, 'learning_rate': 3.095e-05, 'epoch': 4.99} 38%|███▊ | 3814/10000 [14:56:33<23:53:15, 13.90s/it] 38%|███▊ | 3815/10000 [14:56:47<23:54:44, 13.92s/it] {'loss': 0.0854, 'learning_rate': 3.0945e-05, 'epoch': 4.99} 38%|███▊ | 3815/10000 [14:56:47<23:54:44, 13.92s/it] 38%|███▊ | 3816/10000 [14:57:01<23:50:12, 13.88s/it] {'loss': 0.0809, 'learning_rate': 3.0940000000000005e-05, 'epoch': 4.99} 38%|███▊ | 3816/10000 [14:57:01<23:50:12, 13.88s/it] 38%|███▊ | 3817/10000 [14:57:15<23:50:23, 13.88s/it] {'loss': 0.0737, 'learning_rate': 3.0935e-05, 'epoch': 5.0} 38%|███▊ | 3817/10000 [14:57:15<23:50:23, 13.88s/it] 38%|███▊ | 3818/10000 [14:57:29<23:51:03, 13.89s/it] {'loss': 0.077, 'learning_rate': 3.0930000000000004e-05, 'epoch': 5.0} 38%|███▊ | 3818/10000 [14:57:29<23:51:03, 13.89s/it] 38%|███▊ | 3819/10000 [14:57:43<23:57:42, 13.96s/it] {'loss': 0.0919, 'learning_rate': 3.0925000000000006e-05, 'epoch': 5.0} 38%|███▊ | 3819/10000 [14:57:43<23:57:42, 13.96s/it] 38%|███▊ | 3820/10000 [14:57:56<23:19:02, 13.58s/it] {'loss': 0.0639, 'learning_rate': 3.092e-05, 'epoch': 5.0} 38%|███▊ | 3820/10000 [14:57:56<23:19:02, 13.58s/it] 38%|███▊ | 3821/10000 [14:58:10<23:32:31, 13.72s/it] {'loss': 0.0326, 'learning_rate': 3.0915e-05, 'epoch': 5.0} 38%|███▊ | 3821/10000 [14:58:10<23:32:31, 13.72s/it] 38%|███▊ | 3822/10000 [14:58:24<23:39:18, 13.78s/it] {'loss': 0.0304, 'learning_rate': 3.091e-05, 'epoch': 5.0} 38%|███▊ | 3822/10000 [14:58:24<23:39:18, 13.78s/it] 38%|███▊ | 3823/10000 [14:58:38<23:49:38, 13.89s/it] {'loss': 0.0392, 'learning_rate': 3.0905e-05, 'epoch': 5.0} 38%|███▊ | 3823/10000 [14:58:38<23:49:38, 13.89s/it] 38%|███▊ | 3824/10000 [14:58:52<23:53:46, 13.93s/it] {'loss': 0.0344, 'learning_rate': 3.09e-05, 'epoch': 5.01} 38%|███▊ | 3824/10000 [14:58:52<23:53:46, 13.93s/it] 38%|███▊ | 3825/10000 
[14:59:06<23:54:07, 13.93s/it] {'loss': 0.0332, 'learning_rate': 3.0895e-05, 'epoch': 5.01} 38%|███▊ | 3825/10000 [14:59:06<23:54:07, 13.93s/it] 38%|███▊ | 3826/10000 [14:59:20<23:50:58, 13.91s/it] {'loss': 0.0266, 'learning_rate': 3.0890000000000004e-05, 'epoch': 5.01} 38%|███▊ | 3826/10000 [14:59:20<23:50:58, 13.91s/it] 38%|███▊ | 3827/10000 [14:59:34<23:54:30, 13.94s/it] {'loss': 0.0378, 'learning_rate': 3.0885e-05, 'epoch': 5.01} 38%|███▊ | 3827/10000 [14:59:34<23:54:30, 13.94s/it] 38%|███▊ | 3828/10000 [14:59:48<23:57:07, 13.97s/it] {'loss': 0.0355, 'learning_rate': 3.088e-05, 'epoch': 5.01} 38%|███▊ | 3828/10000 [14:59:48<23:57:07, 13.97s/it] 38%|███▊ | 3829/10000 [15:00:02<23:58:39, 13.99s/it] {'loss': 0.0418, 'learning_rate': 3.0875000000000005e-05, 'epoch': 5.01} 38%|███▊ | 3829/10000 [15:00:02<23:58:39, 13.99s/it] 38%|███▊ | 3830/10000 [15:00:16<23:55:27, 13.96s/it] {'loss': 0.0298, 'learning_rate': 3.087e-05, 'epoch': 5.01} 38%|███▊ | 3830/10000 [15:00:16<23:55:27, 13.96s/it] 38%|███▊ | 3831/10000 [15:00:29<23:50:56, 13.92s/it] {'loss': 0.0303, 'learning_rate': 3.0865e-05, 'epoch': 5.01} 38%|███▊ | 3831/10000 [15:00:29<23:50:56, 13.92s/it] 38%|███▊ | 3832/10000 [15:00:43<23:48:12, 13.89s/it] {'loss': 0.0371, 'learning_rate': 3.086e-05, 'epoch': 5.02} 38%|███▊ | 3832/10000 [15:00:43<23:48:12, 13.89s/it] 38%|███▊ | 3833/10000 [15:00:57<23:46:20, 13.88s/it] {'loss': 0.0372, 'learning_rate': 3.0855e-05, 'epoch': 5.02} 38%|███▊ | 3833/10000 [15:00:57<23:46:20, 13.88s/it] 38%|███▊ | 3834/10000 [15:01:11<23:46:11, 13.88s/it] {'loss': 0.0316, 'learning_rate': 3.0850000000000004e-05, 'epoch': 5.02} 38%|███▊ | 3834/10000 [15:01:11<23:46:11, 13.88s/it] 38%|███▊ | 3835/10000 [15:01:25<23:43:33, 13.85s/it] {'loss': 0.0277, 'learning_rate': 3.0845e-05, 'epoch': 5.02} 38%|███▊ | 3835/10000 [15:01:25<23:43:33, 13.85s/it] 38%|███▊ | 3836/10000 [15:01:39<23:46:13, 13.88s/it] {'loss': 0.0336, 'learning_rate': 3.084e-05, 'epoch': 5.02} 38%|███▊ | 3836/10000 
[15:01:39<23:46:13, 13.88s/it] 38%|███▊ | 3837/10000 [15:01:53<23:47:29, 13.90s/it] {'loss': 0.0462, 'learning_rate': 3.0835000000000005e-05, 'epoch': 5.02} 38%|███▊ | 3837/10000 [15:01:53<23:47:29, 13.90s/it] 38%|███▊ | 3838/10000 [15:02:07<23:51:16, 13.94s/it] {'loss': 0.0298, 'learning_rate': 3.083e-05, 'epoch': 5.02} 38%|███▊ | 3838/10000 [15:02:07<23:51:16, 13.94s/it] 38%|███▊ | 3839/10000 [15:02:21<23:49:03, 13.92s/it] {'loss': 0.0313, 'learning_rate': 3.0825000000000004e-05, 'epoch': 5.02} 38%|███▊ | 3839/10000 [15:02:21<23:49:03, 13.92s/it] 38%|███▊ | 3840/10000 [15:02:34<23:47:30, 13.90s/it] {'loss': 0.035, 'learning_rate': 3.082e-05, 'epoch': 5.03} 38%|███▊ | 3840/10000 [15:02:34<23:47:30, 13.90s/it] 38%|███▊ | 3841/10000 [15:02:48<23:48:55, 13.92s/it] {'loss': 0.0363, 'learning_rate': 3.0815e-05, 'epoch': 5.03} 38%|███▊ | 3841/10000 [15:02:48<23:48:55, 13.92s/it] 38%|███▊ | 3842/10000 [15:03:02<23:49:40, 13.93s/it] {'loss': 0.0387, 'learning_rate': 3.081e-05, 'epoch': 5.03} 38%|███▊ | 3842/10000 [15:03:02<23:49:40, 13.93s/it] 38%|███▊ | 3843/10000 [15:03:16<23:48:52, 13.92s/it] {'loss': 0.0361, 'learning_rate': 3.0805e-05, 'epoch': 5.03} 38%|███▊ | 3843/10000 [15:03:16<23:48:52, 13.92s/it] 38%|███▊ | 3844/10000 [15:03:30<23:46:18, 13.90s/it] {'loss': 0.032, 'learning_rate': 3.08e-05, 'epoch': 5.03} 38%|███▊ | 3844/10000 [15:03:30<23:46:18, 13.90s/it] 38%|███▊ | 3845/10000 [15:03:44<23:49:19, 13.93s/it] {'loss': 0.0333, 'learning_rate': 3.0795e-05, 'epoch': 5.03} 38%|███▊ | 3845/10000 [15:03:44<23:49:19, 13.93s/it] 38%|███▊ | 3846/10000 [15:03:58<23:46:14, 13.91s/it] {'loss': 0.0387, 'learning_rate': 3.079e-05, 'epoch': 5.03} 38%|███▊ | 3846/10000 [15:03:58<23:46:14, 13.91s/it] 38%|███▊ | 3847/10000 [15:04:12<23:49:35, 13.94s/it] {'loss': 0.0374, 'learning_rate': 3.0785000000000004e-05, 'epoch': 5.04} 38%|███▊ | 3847/10000 [15:04:12<23:49:35, 13.94s/it] 38%|███▊ | 3848/10000 [15:04:26<23:47:33, 13.92s/it] {'loss': 0.033, 'learning_rate': 3.078e-05, 
'epoch': 5.04} 38%|███▊ | 3848/10000 [15:04:26<23:47:33, 13.92s/it] 38%|███▊ | 3849/10000 [15:04:40<23:47:36, 13.93s/it] {'loss': 0.0322, 'learning_rate': 3.0775e-05, 'epoch': 5.04} 38%|███▊ | 3849/10000 [15:04:40<23:47:36, 13.93s/it] 38%|███▊ | 3850/10000 [15:04:54<23:45:44, 13.91s/it] {'loss': 0.0294, 'learning_rate': 3.077e-05, 'epoch': 5.04} 38%|███▊ | 3850/10000 [15:04:54<23:45:44, 13.91s/it] 39%|███▊ | 3851/10000 [15:05:08<23:44:27, 13.90s/it] {'loss': 0.0299, 'learning_rate': 3.0765e-05, 'epoch': 5.04} 39%|███▊ | 3851/10000 [15:05:08<23:44:27, 13.90s/it] 39%|███▊ | 3852/10000 [15:05:22<23:47:26, 13.93s/it] {'loss': 0.0326, 'learning_rate': 3.076e-05, 'epoch': 5.04} 39%|███▊ | 3852/10000 [15:05:22<23:47:26, 13.93s/it] 39%|███▊ | 3853/10000 [15:05:36<23:49:12, 13.95s/it] {'loss': 0.0322, 'learning_rate': 3.0755e-05, 'epoch': 5.04} 39%|███▊ | 3853/10000 [15:05:36<23:49:12, 13.95s/it] 39%|███▊ | 3854/10000 [15:05:50<23:50:16, 13.96s/it] {'loss': 0.0324, 'learning_rate': 3.075e-05, 'epoch': 5.04} 39%|███▊ | 3854/10000 [15:05:50<23:50:16, 13.96s/it] 39%|███▊ | 3855/10000 [15:06:04<23:52:45, 13.99s/it] {'loss': 0.0365, 'learning_rate': 3.0745000000000005e-05, 'epoch': 5.05} 39%|███▊ | 3855/10000 [15:06:04<23:52:45, 13.99s/it] 39%|███▊ | 3856/10000 [15:06:17<23:46:42, 13.93s/it] {'loss': 0.041, 'learning_rate': 3.074e-05, 'epoch': 5.05} 39%|███▊ | 3856/10000 [15:06:17<23:46:42, 13.93s/it] 39%|███▊ | 3857/10000 [15:06:31<23:50:52, 13.98s/it] {'loss': 0.026, 'learning_rate': 3.0735e-05, 'epoch': 5.05} 39%|███▊ | 3857/10000 [15:06:31<23:50:52, 13.98s/it] 39%|███▊ | 3858/10000 [15:06:46<23:53:50, 14.01s/it] {'loss': 0.0261, 'learning_rate': 3.0730000000000006e-05, 'epoch': 5.05} 39%|███▊ | 3858/10000 [15:06:46<23:53:50, 14.01s/it] 39%|███▊ | 3859/10000 [15:06:59<23:51:56, 13.99s/it] {'loss': 0.038, 'learning_rate': 3.0725e-05, 'epoch': 5.05} 39%|███▊ | 3859/10000 [15:07:00<23:51:56, 13.99s/it] 39%|███▊ | 3860/10000 [15:07:13<23:45:36, 13.93s/it] {'loss': 0.0311, 
'learning_rate': 3.072e-05, 'epoch': 5.05} 39%|███▊ | 3860/10000 [15:07:13<23:45:36, 13.93s/it] 39%|███▊ | 3861/10000 [15:07:27<23:46:52, 13.95s/it] {'loss': 0.0304, 'learning_rate': 3.0715e-05, 'epoch': 5.05} 39%|███▊ | 3861/10000 [15:07:27<23:46:52, 13.95s/it] 39%|███▊ | 3862/10000 [15:07:41<23:47:26, 13.95s/it] {'loss': 0.0332, 'learning_rate': 3.071e-05, 'epoch': 5.05} 39%|███▊ | 3862/10000 [15:07:41<23:47:26, 13.95s/it] 39%|███▊ | 3863/10000 [15:07:55<23:48:16, 13.96s/it] {'loss': 0.0333, 'learning_rate': 3.0705e-05, 'epoch': 5.06} 39%|███▊ | 3863/10000 [15:07:55<23:48:16, 13.96s/it] 39%|███▊ | 3864/10000 [15:08:09<23:49:01, 13.97s/it] {'loss': 0.0393, 'learning_rate': 3.07e-05, 'epoch': 5.06} 39%|███▊ | 3864/10000 [15:08:09<23:49:01, 13.97s/it] 39%|███▊ | 3865/10000 [15:08:23<23:49:21, 13.98s/it] {'loss': 0.0302, 'learning_rate': 3.0695000000000003e-05, 'epoch': 5.06} 39%|███▊ | 3865/10000 [15:08:23<23:49:21, 13.98s/it] 39%|███▊ | 3866/10000 [15:08:37<23:49:31, 13.98s/it] {'loss': 0.0341, 'learning_rate': 3.069e-05, 'epoch': 5.06} 39%|███▊ | 3866/10000 [15:08:37<23:49:31, 13.98s/it] 39%|███▊ | 3867/10000 [15:08:51<23:45:53, 13.95s/it] {'loss': 0.0245, 'learning_rate': 3.0685e-05, 'epoch': 5.06} 39%|███▊ | 3867/10000 [15:08:51<23:45:53, 13.95s/it] 39%|███▊ | 3868/10000 [15:09:05<23:43:45, 13.93s/it] {'loss': 0.0345, 'learning_rate': 3.0680000000000004e-05, 'epoch': 5.06} 39%|███▊ | 3868/10000 [15:09:05<23:43:45, 13.93s/it] 39%|███▊ | 3869/10000 [15:09:19<23:45:02, 13.95s/it] {'loss': 0.0373, 'learning_rate': 3.067500000000001e-05, 'epoch': 5.06} 39%|███▊ | 3869/10000 [15:09:19<23:45:02, 13.95s/it] 39%|███▊ | 3870/10000 [15:09:33<23:44:44, 13.95s/it] {'loss': 0.0303, 'learning_rate': 3.0669999999999996e-05, 'epoch': 5.07} 39%|███▊ | 3870/10000 [15:09:33<23:44:44, 13.95s/it] 39%|███▊ | 3871/10000 [15:09:47<23:41:31, 13.92s/it] {'loss': 0.0363, 'learning_rate': 3.0665e-05, 'epoch': 5.07} 39%|███▊ | 3871/10000 [15:09:47<23:41:31, 13.92s/it] 39%|███▊ | 3872/10000 
[15:10:01<23:38:14, 13.89s/it] {'loss': 0.0324, 'learning_rate': 3.066e-05, 'epoch': 5.07} 39%|███▊ | 3872/10000 [15:10:01<23:38:14, 13.89s/it] 39%|███▊ | 3873/10000 [15:10:14<23:40:02, 13.91s/it] {'loss': 0.0305, 'learning_rate': 3.0655e-05, 'epoch': 5.07} 39%|███▊ | 3873/10000 [15:10:15<23:40:02, 13.91s/it] 39%|███▊ | 3874/10000 [15:10:28<23:38:36, 13.89s/it] {'loss': 0.0285, 'learning_rate': 3.065e-05, 'epoch': 5.07} 39%|███▊ | 3874/10000 [15:10:28<23:38:36, 13.89s/it] 39%|███▉ | 3875/10000 [15:10:42<23:43:23, 13.94s/it] {'loss': 0.0285, 'learning_rate': 3.0645e-05, 'epoch': 5.07} 39%|███▉ | 3875/10000 [15:10:42<23:43:23, 13.94s/it] 39%|███▉ | 3876/10000 [15:10:56<23:42:08, 13.93s/it] {'loss': 0.029, 'learning_rate': 3.0640000000000005e-05, 'epoch': 5.07} 39%|███▉ | 3876/10000 [15:10:56<23:42:08, 13.93s/it] 39%|███▉ | 3877/10000 [15:11:10<23:41:51, 13.93s/it] {'loss': 0.0382, 'learning_rate': 3.0635e-05, 'epoch': 5.07} 39%|███▉ | 3877/10000 [15:11:10<23:41:51, 13.93s/it] 39%|███▉ | 3878/10000 [15:11:24<23:41:13, 13.93s/it] {'loss': 0.0283, 'learning_rate': 3.063e-05, 'epoch': 5.08} 39%|███▉ | 3878/10000 [15:11:24<23:41:13, 13.93s/it] 39%|███▉ | 3879/10000 [15:11:38<23:38:53, 13.91s/it] {'loss': 0.0307, 'learning_rate': 3.0625000000000006e-05, 'epoch': 5.08} 39%|███▉ | 3879/10000 [15:11:38<23:38:53, 13.91s/it] 39%|███▉ | 3880/10000 [15:11:52<23:42:43, 13.95s/it] {'loss': 0.0372, 'learning_rate': 3.062e-05, 'epoch': 5.08} 39%|███▉ | 3880/10000 [15:11:52<23:42:43, 13.95s/it] 39%|███▉ | 3881/10000 [15:12:06<23:45:44, 13.98s/it] {'loss': 0.0301, 'learning_rate': 3.0615e-05, 'epoch': 5.08} 39%|███▉ | 3881/10000 [15:12:06<23:45:44, 13.98s/it] 39%|███▉ | 3882/10000 [15:12:20<23:43:32, 13.96s/it] {'loss': 0.0302, 'learning_rate': 3.061e-05, 'epoch': 5.08} 39%|███▉ | 3882/10000 [15:12:20<23:43:32, 13.96s/it] 39%|███▉ | 3883/10000 [15:12:34<23:44:38, 13.97s/it] {'loss': 0.0421, 'learning_rate': 3.0605e-05, 'epoch': 5.08} 39%|███▉ | 3883/10000 [15:12:34<23:44:38, 13.97s/it] 
39%|███▉ | 3884/10000 [15:12:48<23:43:21, 13.96s/it] {'loss': 0.0294, 'learning_rate': 3.06e-05, 'epoch': 5.08} 39%|███▉ | 3884/10000 [15:12:48<23:43:21, 13.96s/it] 39%|███▉ | 3885/10000 [15:13:02<23:38:36, 13.92s/it] {'loss': 0.0335, 'learning_rate': 3.0595e-05, 'epoch': 5.09} 39%|███▉ | 3885/10000 [15:13:02<23:38:36, 13.92s/it] 39%|███▉ | 3886/10000 [15:13:16<23:36:48, 13.90s/it] {'loss': 0.0364, 'learning_rate': 3.0590000000000004e-05, 'epoch': 5.09} 39%|███▉ | 3886/10000 [15:13:16<23:36:48, 13.90s/it] 39%|███▉ | 3887/10000 [15:13:30<23:37:01, 13.91s/it] {'loss': 0.0309, 'learning_rate': 3.0585e-05, 'epoch': 5.09} 39%|███▉ | 3887/10000 [15:13:30<23:37:01, 13.91s/it] 39%|███▉ | 3888/10000 [15:13:44<23:38:39, 13.93s/it] {'loss': 0.0338, 'learning_rate': 3.058e-05, 'epoch': 5.09} 39%|███▉ | 3888/10000 [15:13:44<23:38:39, 13.93s/it] 39%|███▉ | 3889/10000 [15:13:57<23:36:01, 13.90s/it] {'loss': 0.028, 'learning_rate': 3.0575000000000005e-05, 'epoch': 5.09} 39%|███▉ | 3889/10000 [15:13:57<23:36:01, 13.90s/it] 39%|███▉ | 3890/10000 [15:14:11<23:33:44, 13.88s/it] {'loss': 0.034, 'learning_rate': 3.057000000000001e-05, 'epoch': 5.09} 39%|███▉ | 3890/10000 [15:14:11<23:33:44, 13.88s/it] 39%|███▉ | 3891/10000 [15:14:25<23:32:31, 13.87s/it] {'loss': 0.0346, 'learning_rate': 3.0564999999999996e-05, 'epoch': 5.09} 39%|███▉ | 3891/10000 [15:14:25<23:32:31, 13.87s/it] 39%|███▉ | 3892/10000 [15:14:39<23:33:53, 13.89s/it] {'loss': 0.0309, 'learning_rate': 3.056e-05, 'epoch': 5.09} 39%|███▉ | 3892/10000 [15:14:39<23:33:53, 13.89s/it] 39%|███▉ | 3893/10000 [15:14:53<23:35:28, 13.91s/it] {'loss': 0.0288, 'learning_rate': 3.0555e-05, 'epoch': 5.1} 39%|███▉ | 3893/10000 [15:14:53<23:35:28, 13.91s/it] 39%|███▉ | 3894/10000 [15:15:07<23:33:25, 13.89s/it] {'loss': 0.0366, 'learning_rate': 3.0550000000000004e-05, 'epoch': 5.1} 39%|███▉ | 3894/10000 [15:15:07<23:33:25, 13.89s/it] 39%|███▉ | 3895/10000 [15:15:21<23:35:30, 13.91s/it] {'loss': 0.0391, 'learning_rate': 3.0545e-05, 'epoch': 
5.1} 39%|███▉ | 3895/10000 [15:15:21<23:35:30, 13.91s/it] 39%|███▉ | 3896/10000 [15:15:35<23:39:40, 13.95s/it] {'loss': 0.0387, 'learning_rate': 3.054e-05, 'epoch': 5.1} 39%|███▉ | 3896/10000 [15:15:35<23:39:40, 13.95s/it] 39%|███▉ | 3897/10000 [15:15:49<23:39:06, 13.95s/it] {'loss': 0.0366, 'learning_rate': 3.0535000000000005e-05, 'epoch': 5.1} 39%|███▉ | 3897/10000 [15:15:49<23:39:06, 13.95s/it] 39%|███▉ | 3898/10000 [15:16:03<23:36:49, 13.93s/it] {'loss': 0.0308, 'learning_rate': 3.053e-05, 'epoch': 5.1} 39%|███▉ | 3898/10000 [15:16:03<23:36:49, 13.93s/it] 39%|███▉ | 3899/10000 [15:16:17<23:36:36, 13.93s/it] {'loss': 0.0367, 'learning_rate': 3.0525e-05, 'epoch': 5.1} 39%|███▉ | 3899/10000 [15:16:17<23:36:36, 13.93s/it] 39%|███▉ | 3900/10000 [15:16:30<23:33:45, 13.91s/it] {'loss': 0.0335, 'learning_rate': 3.0520000000000006e-05, 'epoch': 5.1} 39%|███▉ | 3900/10000 [15:16:30<23:33:45, 13.91s/it] 39%|███▉ | 3901/10000 [15:16:44<23:37:40, 13.95s/it] {'loss': 0.0369, 'learning_rate': 3.0515e-05, 'epoch': 5.11} 39%|███▉ | 3901/10000 [15:16:45<23:37:40, 13.95s/it] 39%|███▉ | 3902/10000 [15:16:58<23:33:39, 13.91s/it] {'loss': 0.0297, 'learning_rate': 3.051e-05, 'epoch': 5.11} 39%|███▉ | 3902/10000 [15:16:58<23:33:39, 13.91s/it] 39%|███▉ | 3903/10000 [15:17:12<23:32:42, 13.90s/it] {'loss': 0.0285, 'learning_rate': 3.0505e-05, 'epoch': 5.11} 39%|███▉ | 3903/10000 [15:17:12<23:32:42, 13.90s/it] 39%|███▉ | 3904/10000 [15:17:26<23:32:24, 13.90s/it] {'loss': 0.0275, 'learning_rate': 3.05e-05, 'epoch': 5.11} 39%|███▉ | 3904/10000 [15:17:26<23:32:24, 13.90s/it] 39%|███▉ | 3905/10000 [15:17:40<23:36:22, 13.94s/it] {'loss': 0.0405, 'learning_rate': 3.0495000000000002e-05, 'epoch': 5.11} 39%|███▉ | 3905/10000 [15:17:40<23:36:22, 13.94s/it] 39%|███▉ | 3906/10000 [15:17:54<23:30:05, 13.88s/it] {'loss': 0.0299, 'learning_rate': 3.049e-05, 'epoch': 5.11} 39%|███▉ | 3906/10000 [15:17:54<23:30:05, 13.88s/it] 39%|███▉ | 3907/10000 [15:18:08<23:31:34, 13.90s/it] {'loss': 0.0282, 
'learning_rate': 3.0485000000000004e-05, 'epoch': 5.11} 39%|███▉ | 3907/10000 [15:18:08<23:31:34, 13.90s/it] 39%|███▉ | 3908/10000 [15:18:22<23:33:09, 13.92s/it] {'loss': 0.035, 'learning_rate': 3.0480000000000003e-05, 'epoch': 5.12} 39%|███▉ | 3908/10000 [15:18:22<23:33:09, 13.92s/it] 39%|███▉ | 3909/10000 [15:18:36<23:35:30, 13.94s/it] {'loss': 0.0302, 'learning_rate': 3.0475000000000002e-05, 'epoch': 5.12} 39%|███▉ | 3909/10000 [15:18:36<23:35:30, 13.94s/it] 39%|███▉ | 3910/10000 [15:18:50<23:33:15, 13.92s/it] {'loss': 0.0322, 'learning_rate': 3.0470000000000005e-05, 'epoch': 5.12} 39%|███▉ | 3910/10000 [15:18:50<23:33:15, 13.92s/it] 39%|███▉ | 3911/10000 [15:19:04<23:30:35, 13.90s/it] {'loss': 0.0473, 'learning_rate': 3.0465e-05, 'epoch': 5.12} 39%|███▉ | 3911/10000 [15:19:04<23:30:35, 13.90s/it] 39%|███▉ | 3912/10000 [15:19:17<23:29:49, 13.89s/it] {'loss': 0.028, 'learning_rate': 3.046e-05, 'epoch': 5.12} 39%|███▉ | 3912/10000 [15:19:17<23:29:49, 13.89s/it] 39%|███▉ | 3913/10000 [15:19:31<23:34:19, 13.94s/it] {'loss': 0.0325, 'learning_rate': 3.0455e-05, 'epoch': 5.12} 39%|███▉ | 3913/10000 [15:19:31<23:34:19, 13.94s/it] 39%|███▉ | 3914/10000 [15:19:46<23:39:14, 13.99s/it] {'loss': 0.0305, 'learning_rate': 3.045e-05, 'epoch': 5.12} 39%|███▉ | 3914/10000 [15:19:46<23:39:14, 13.99s/it] 39%|███▉ | 3915/10000 [15:19:59<23:36:52, 13.97s/it] {'loss': 0.0313, 'learning_rate': 3.0445e-05, 'epoch': 5.12} 39%|███▉ | 3915/10000 [15:19:59<23:36:52, 13.97s/it] 39%|███▉ | 3916/10000 [15:20:13<23:33:38, 13.94s/it] {'loss': 0.0324, 'learning_rate': 3.0440000000000003e-05, 'epoch': 5.13} 39%|███▉ | 3916/10000 [15:20:13<23:33:38, 13.94s/it] 39%|███▉ | 3917/10000 [15:20:27<23:33:21, 13.94s/it] {'loss': 0.0383, 'learning_rate': 3.0435000000000003e-05, 'epoch': 5.13} 39%|███▉ | 3917/10000 [15:20:27<23:33:21, 13.94s/it] 39%|███▉ | 3918/10000 [15:20:41<23:35:54, 13.97s/it] {'loss': 0.0317, 'learning_rate': 3.0430000000000002e-05, 'epoch': 5.13} 39%|███▉ | 3918/10000 
[15:20:41<23:35:54, 13.97s/it] 39%|███▉ | 3919/10000 [15:20:55<23:38:20, 13.99s/it] {'loss': 0.0341, 'learning_rate': 3.0425000000000004e-05, 'epoch': 5.13} 39%|███▉ | 3919/10000 [15:20:55<23:38:20, 13.99s/it] 39%|███▉ | 3920/10000 [15:21:09<23:37:38, 13.99s/it] {'loss': 0.0336, 'learning_rate': 3.0420000000000004e-05, 'epoch': 5.13} 39%|███▉ | 3920/10000 [15:21:09<23:37:38, 13.99s/it] 39%|███▉ | 3921/10000 [15:21:23<23:37:09, 13.99s/it] {'loss': 0.034, 'learning_rate': 3.0415e-05, 'epoch': 5.13} 39%|███▉ | 3921/10000 [15:21:23<23:37:09, 13.99s/it] 39%|███▉ | 3922/10000 [15:21:37<23:36:40, 13.98s/it] {'loss': 0.0247, 'learning_rate': 3.041e-05, 'epoch': 5.13} 39%|███▉ | 3922/10000 [15:21:37<23:36:40, 13.98s/it][2024-11-04 11:39:58,747] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1 39%|███▉ | 3923/10000 [15:21:50<23:12:04, 13.74s/it] {'loss': 0.1777, 'learning_rate': 3.041e-05, 'epoch': 5.13} 39%|███▉ | 3923/10000 [15:21:51<23:12:04, 13.74s/it] 39%|███▉ | 3924/10000 [15:22:05<23:19:53, 13.82s/it] {'loss': 0.0256, 'learning_rate': 3.0405e-05, 'epoch': 5.14} 39%|███▉ | 3924/10000 [15:22:05<23:19:53, 13.82s/it] 39%|███▉ | 3925/10000 [15:22:18<23:23:03, 13.86s/it] {'loss': 0.0359, 'learning_rate': 3.04e-05, 'epoch': 5.14} 39%|███▉ | 3925/10000 [15:22:18<23:23:03, 13.86s/it] 39%|███▉ | 3926/10000 [15:22:32<23:27:46, 13.91s/it] {'loss': 0.0384, 'learning_rate': 3.0395000000000003e-05, 'epoch': 5.14} 39%|███▉ | 3926/10000 [15:22:32<23:27:46, 13.91s/it] 39%|███▉ | 3927/10000 [15:22:46<23:24:26, 13.88s/it] {'loss': 0.0313, 'learning_rate': 3.0390000000000002e-05, 'epoch': 5.14} 39%|███▉ | 3927/10000 [15:22:46<23:24:26, 13.88s/it] 39%|███▉ | 3928/10000 [15:23:00<23:25:53, 13.89s/it] {'loss': 0.0339, 'learning_rate': 3.0385e-05, 'epoch': 5.14} 39%|███▉ | 3928/10000 [15:23:00<23:25:53, 13.89s/it] 39%|███▉ | 3929/10000 [15:23:14<23:29:04, 13.93s/it] {'loss': 0.0405, 
'learning_rate': 3.0380000000000004e-05, 'epoch': 5.14} 39%|███▉ | 3929/10000 [15:23:14<23:29:04, 13.93s/it] 39%|███▉ | 3930/10000 [15:23:28<23:27:12, 13.91s/it] {'loss': 0.0303, 'learning_rate': 3.0375000000000003e-05, 'epoch': 5.14} 39%|███▉ | 3930/10000 [15:23:28<23:27:12, 13.91s/it] 39%|███▉ | 3931/10000 [15:23:42<23:28:46, 13.93s/it] {'loss': 0.0386, 'learning_rate': 3.0370000000000006e-05, 'epoch': 5.15} 39%|███▉ | 3931/10000 [15:23:42<23:28:46, 13.93s/it] 39%|███▉ | 3932/10000 [15:23:56<23:27:26, 13.92s/it] {'loss': 0.0363, 'learning_rate': 3.0364999999999998e-05, 'epoch': 5.15} 39%|███▉ | 3932/10000 [15:23:56<23:27:26, 13.92s/it] 39%|███▉ | 3933/10000 [15:24:10<23:31:32, 13.96s/it] {'loss': 0.0279, 'learning_rate': 3.036e-05, 'epoch': 5.15} 39%|███▉ | 3933/10000 [15:24:10<23:31:32, 13.96s/it] 39%|███▉ | 3934/10000 [15:24:24<23:31:02, 13.96s/it] {'loss': 0.0328, 'learning_rate': 3.0355e-05, 'epoch': 5.15} 39%|███▉ | 3934/10000 [15:24:24<23:31:02, 13.96s/it] 39%|███▉ | 3935/10000 [15:24:38<23:38:28, 14.03s/it] {'loss': 0.0374, 'learning_rate': 3.035e-05, 'epoch': 5.15} 39%|███▉ | 3935/10000 [15:24:38<23:38:28, 14.03s/it] 39%|███▉ | 3936/10000 [15:24:52<23:35:40, 14.01s/it] {'loss': 0.0372, 'learning_rate': 3.0345e-05, 'epoch': 5.15} 39%|███▉ | 3936/10000 [15:24:52<23:35:40, 14.01s/it] 39%|███▉ | 3937/10000 [15:25:06<23:34:07, 13.99s/it] {'loss': 0.0269, 'learning_rate': 3.034e-05, 'epoch': 5.15} 39%|███▉ | 3937/10000 [15:25:06<23:34:07, 13.99s/it] 39%|███▉ | 3938/10000 [15:25:20<23:33:34, 13.99s/it] {'loss': 0.0276, 'learning_rate': 3.0335000000000003e-05, 'epoch': 5.15} 39%|███▉ | 3938/10000 [15:25:20<23:33:34, 13.99s/it] 39%|███▉ | 3939/10000 [15:25:34<23:29:02, 13.95s/it] {'loss': 0.0312, 'learning_rate': 3.0330000000000003e-05, 'epoch': 5.16} 39%|███▉ | 3939/10000 [15:25:34<23:29:02, 13.95s/it] 39%|███▉ | 3940/10000 [15:25:48<23:27:32, 13.94s/it] {'loss': 0.035, 'learning_rate': 3.0325000000000002e-05, 'epoch': 5.16} 39%|███▉ | 3940/10000 
[15:25:48<23:27:32, 13.94s/it] 39%|███▉ | 3941/10000 [15:26:02<23:29:44, 13.96s/it] {'loss': 0.0426, 'learning_rate': 3.0320000000000004e-05, 'epoch': 5.16} 39%|███▉ | 3941/10000 [15:26:02<23:29:44, 13.96s/it] 39%|███▉ | 3942/10000 [15:26:16<23:29:22, 13.96s/it] {'loss': 0.025, 'learning_rate': 3.0315e-05, 'epoch': 5.16} 39%|███▉ | 3942/10000 [15:26:16<23:29:22, 13.96s/it] 39%|███▉ | 3943/10000 [15:26:30<23:25:18, 13.92s/it] {'loss': 0.0238, 'learning_rate': 3.031e-05, 'epoch': 5.16} 39%|███▉ | 3943/10000 [15:26:30<23:25:18, 13.92s/it] 39%|███▉ | 3944/10000 [15:26:43<23:23:16, 13.90s/it] {'loss': 0.0355, 'learning_rate': 3.0305e-05, 'epoch': 5.16} 39%|███▉ | 3944/10000 [15:26:44<23:23:16, 13.90s/it] 39%|███▉ | 3945/10000 [15:26:57<23:22:25, 13.90s/it] {'loss': 0.0329, 'learning_rate': 3.03e-05, 'epoch': 5.16} 39%|███▉ | 3945/10000 [15:26:57<23:22:25, 13.90s/it] 39%|███▉ | 3946/10000 [15:27:11<23:23:13, 13.91s/it] {'loss': 0.0291, 'learning_rate': 3.0295e-05, 'epoch': 5.16} 39%|███▉ | 3946/10000 [15:27:11<23:23:13, 13.91s/it] 39%|███▉ | 3947/10000 [15:27:25<23:21:13, 13.89s/it] {'loss': 0.0328, 'learning_rate': 3.0290000000000003e-05, 'epoch': 5.17} 39%|███▉ | 3947/10000 [15:27:25<23:21:13, 13.89s/it] 39%|███▉ | 3948/10000 [15:27:39<23:21:19, 13.89s/it] {'loss': 0.0343, 'learning_rate': 3.0285000000000002e-05, 'epoch': 5.17} 39%|███▉ | 3948/10000 [15:27:39<23:21:19, 13.89s/it] 39%|███▉ | 3949/10000 [15:27:53<23:22:08, 13.90s/it] {'loss': 0.0285, 'learning_rate': 3.028e-05, 'epoch': 5.17} 39%|███▉ | 3949/10000 [15:27:53<23:22:08, 13.90s/it] 40%|███▉ | 3950/10000 [15:28:07<23:22:54, 13.91s/it] {'loss': 0.0331, 'learning_rate': 3.0275000000000004e-05, 'epoch': 5.17} 40%|███▉ | 3950/10000 [15:28:07<23:22:54, 13.91s/it] 40%|███▉ | 3951/10000 [15:28:21<23:21:40, 13.90s/it] {'loss': 0.0279, 'learning_rate': 3.0270000000000003e-05, 'epoch': 5.17} 40%|███▉ | 3951/10000 [15:28:21<23:21:40, 13.90s/it] 40%|███▉ | 3952/10000 [15:28:35<23:18:48, 13.88s/it] {'loss': 0.0309, 
'learning_rate': 3.0265e-05, 'epoch': 5.17} 40%|███▉ | 3952/10000 [15:28:35<23:18:48, 13.88s/it] 40%|███▉ | 3953/10000 [15:28:49<23:23:50, 13.93s/it] {'loss': 0.0344, 'learning_rate': 3.0259999999999998e-05, 'epoch': 5.17} 40%|███▉ | 3953/10000 [15:28:49<23:23:50, 13.93s/it] 40%|███▉ | 3954/10000 [15:29:03<23:25:46, 13.95s/it] {'loss': 0.0361, 'learning_rate': 3.0255e-05, 'epoch': 5.18} 40%|███▉ | 3954/10000 [15:29:03<23:25:46, 13.95s/it] 40%|███▉ | 3955/10000 [15:29:17<23:27:01, 13.97s/it] {'loss': 0.0311, 'learning_rate': 3.025e-05, 'epoch': 5.18} 40%|███▉ | 3955/10000 [15:29:17<23:27:01, 13.97s/it] 40%|███▉ | 3956/10000 [15:29:31<23:26:00, 13.96s/it] {'loss': 0.0329, 'learning_rate': 3.0245000000000003e-05, 'epoch': 5.18} 40%|███▉ | 3956/10000 [15:29:31<23:26:00, 13.96s/it] 40%|███▉ | 3957/10000 [15:29:45<23:28:45, 13.99s/it] {'loss': 0.031, 'learning_rate': 3.0240000000000002e-05, 'epoch': 5.18} 40%|███▉ | 3957/10000 [15:29:45<23:28:45, 13.99s/it] 40%|███▉ | 3958/10000 [15:29:59<23:25:09, 13.95s/it] {'loss': 0.0273, 'learning_rate': 3.0235e-05, 'epoch': 5.18} 40%|███▉ | 3958/10000 [15:29:59<23:25:09, 13.95s/it] 40%|███▉ | 3959/10000 [15:30:12<23:25:35, 13.96s/it] {'loss': 0.0367, 'learning_rate': 3.0230000000000004e-05, 'epoch': 5.18} 40%|███▉ | 3959/10000 [15:30:13<23:25:35, 13.96s/it] 40%|███▉ | 3960/10000 [15:30:26<23:26:47, 13.97s/it] {'loss': 0.0333, 'learning_rate': 3.0225000000000003e-05, 'epoch': 5.18} 40%|███▉ | 3960/10000 [15:30:27<23:26:47, 13.97s/it] 40%|███▉ | 3961/10000 [15:30:40<23:25:59, 13.97s/it] {'loss': 0.0316, 'learning_rate': 3.0220000000000005e-05, 'epoch': 5.18} 40%|███▉ | 3961/10000 [15:30:41<23:25:59, 13.97s/it] 40%|███▉ | 3962/10000 [15:30:55<23:30:55, 14.02s/it] {'loss': 0.0313, 'learning_rate': 3.0214999999999998e-05, 'epoch': 5.19} 40%|███▉ | 3962/10000 [15:30:55<23:30:55, 14.02s/it] 40%|███▉ | 3963/10000 [15:31:09<23:30:55, 14.02s/it] {'loss': 0.0378, 'learning_rate': 3.021e-05, 'epoch': 5.19} 40%|███▉ | 3963/10000 
[15:31:09<23:30:55, 14.02s/it] 40%|███▉ | 3964/10000 [15:31:23<23:30:44, 14.02s/it] {'loss': 0.0385, 'learning_rate': 3.0205e-05, 'epoch': 5.19} 40%|███▉ | 3964/10000 [15:31:23<23:30:44, 14.02s/it] 40%|███▉ | 3965/10000 [15:31:37<23:29:11, 14.01s/it] {'loss': 0.0333, 'learning_rate': 3.02e-05, 'epoch': 5.19} 40%|███▉ | 3965/10000 [15:31:37<23:29:11, 14.01s/it] 40%|███▉ | 3966/10000 [15:31:51<23:31:03, 14.03s/it] {'loss': 0.028, 'learning_rate': 3.0195e-05, 'epoch': 5.19} 40%|███▉ | 3966/10000 [15:31:51<23:31:03, 14.03s/it] 40%|███▉ | 3967/10000 [15:32:05<23:28:55, 14.01s/it] {'loss': 0.0388, 'learning_rate': 3.019e-05, 'epoch': 5.19} 40%|███▉ | 3967/10000 [15:32:05<23:28:55, 14.01s/it] 40%|███▉ | 3968/10000 [15:32:19<23:23:44, 13.96s/it] {'loss': 0.0428, 'learning_rate': 3.0185000000000003e-05, 'epoch': 5.19} 40%|███▉ | 3968/10000 [15:32:19<23:23:44, 13.96s/it] 40%|███▉ | 3969/10000 [15:32:32<23:21:00, 13.94s/it] {'loss': 0.0228, 'learning_rate': 3.0180000000000002e-05, 'epoch': 5.2} 40%|███▉ | 3969/10000 [15:32:32<23:21:00, 13.94s/it] 40%|███▉ | 3970/10000 [15:32:46<23:19:08, 13.92s/it] {'loss': 0.0365, 'learning_rate': 3.0175e-05, 'epoch': 5.2} 40%|███▉ | 3970/10000 [15:32:46<23:19:08, 13.92s/it] 40%|███▉ | 3971/10000 [15:33:00<23:16:02, 13.89s/it] {'loss': 0.0356, 'learning_rate': 3.0170000000000004e-05, 'epoch': 5.2} 40%|███▉ | 3971/10000 [15:33:00<23:16:02, 13.89s/it] 40%|███▉ | 3972/10000 [15:33:14<23:18:20, 13.92s/it] {'loss': 0.0359, 'learning_rate': 3.0165e-05, 'epoch': 5.2} 40%|███▉ | 3972/10000 [15:33:14<23:18:20, 13.92s/it] 40%|███▉ | 3973/10000 [15:33:28<23:19:43, 13.93s/it] {'loss': 0.0334, 'learning_rate': 3.016e-05, 'epoch': 5.2} 40%|███▉ | 3973/10000 [15:33:28<23:19:43, 13.93s/it] 40%|███▉ | 3974/10000 [15:33:42<23:22:54, 13.97s/it] {'loss': 0.0297, 'learning_rate': 3.0155e-05, 'epoch': 5.2} 40%|███▉ | 3974/10000 [15:33:42<23:22:54, 13.97s/it] 40%|███▉ | 3975/10000 [15:33:56<23:21:49, 13.96s/it] {'loss': 0.034, 'learning_rate': 3.015e-05, 'epoch': 
5.2} 40%|███▉ | 3975/10000 [15:33:56<23:21:49, 13.96s/it] 40%|███▉ | 3976/10000 [15:34:10<23:21:51, 13.96s/it] {'loss': 0.0313, 'learning_rate': 3.0145e-05, 'epoch': 5.2} 40%|███▉ | 3976/10000 [15:34:10<23:21:51, 13.96s/it] 40%|███▉ | 3977/10000 [15:34:24<23:23:19, 13.98s/it] {'loss': 0.0211, 'learning_rate': 3.0140000000000003e-05, 'epoch': 5.21} 40%|███▉ | 3977/10000 [15:34:24<23:23:19, 13.98s/it] 40%|███▉ | 3978/10000 [15:34:38<23:18:44, 13.94s/it] {'loss': 0.0285, 'learning_rate': 3.0135000000000002e-05, 'epoch': 5.21} 40%|███▉ | 3978/10000 [15:34:38<23:18:44, 13.94s/it] 40%|███▉ | 3979/10000 [15:34:52<23:11:46, 13.87s/it] {'loss': 0.0323, 'learning_rate': 3.013e-05, 'epoch': 5.21} 40%|███▉ | 3979/10000 [15:34:52<23:11:46, 13.87s/it] 40%|███▉ | 3980/10000 [15:35:05<23:12:21, 13.88s/it] {'loss': 0.032, 'learning_rate': 3.0125000000000004e-05, 'epoch': 5.21} 40%|███▉ | 3980/10000 [15:35:06<23:12:21, 13.88s/it] 40%|███▉ | 3981/10000 [15:35:19<23:14:22, 13.90s/it] {'loss': 0.0337, 'learning_rate': 3.0120000000000003e-05, 'epoch': 5.21} 40%|███▉ | 3981/10000 [15:35:19<23:14:22, 13.90s/it] 40%|███▉ | 3982/10000 [15:35:33<23:17:03, 13.93s/it] {'loss': 0.0319, 'learning_rate': 3.0115e-05, 'epoch': 5.21} 40%|███▉ | 3982/10000 [15:35:33<23:17:03, 13.93s/it] 40%|███▉ | 3983/10000 [15:35:47<23:18:27, 13.95s/it] {'loss': 0.0305, 'learning_rate': 3.0109999999999998e-05, 'epoch': 5.21} 40%|███▉ | 3983/10000 [15:35:47<23:18:27, 13.95s/it] 40%|███▉ | 3984/10000 [15:36:01<23:19:01, 13.95s/it] {'loss': 0.0259, 'learning_rate': 3.0105e-05, 'epoch': 5.21} 40%|███▉ | 3984/10000 [15:36:01<23:19:01, 13.95s/it] 40%|███▉ | 3985/10000 [15:36:15<23:18:30, 13.95s/it] {'loss': 0.0389, 'learning_rate': 3.01e-05, 'epoch': 5.22} 40%|███▉ | 3985/10000 [15:36:15<23:18:30, 13.95s/it] 40%|███▉ | 3986/10000 [15:36:29<23:17:22, 13.94s/it] {'loss': 0.0304, 'learning_rate': 3.0095000000000002e-05, 'epoch': 5.22} 40%|███▉ | 3986/10000 [15:36:29<23:17:22, 13.94s/it] 40%|███▉ | 3987/10000 
[15:36:43<23:15:08, 13.92s/it] {'loss': 0.0304, 'learning_rate': 3.009e-05, 'epoch': 5.22} 40%|███▉ | 3987/10000 [15:36:43<23:15:08, 13.92s/it] 40%|███▉ | 3988/10000 [15:36:57<23:14:20, 13.92s/it] {'loss': 0.0288, 'learning_rate': 3.0085e-05, 'epoch': 5.22} 40%|███▉ | 3988/10000 [15:36:57<23:14:20, 13.92s/it] 40%|███▉ | 3989/10000 [15:37:11<23:14:03, 13.92s/it] {'loss': 0.0287, 'learning_rate': 3.0080000000000003e-05, 'epoch': 5.22} 40%|███▉ | 3989/10000 [15:37:11<23:14:03, 13.92s/it] 40%|███▉ | 3990/10000 [15:37:25<23:14:51, 13.93s/it] {'loss': 0.0362, 'learning_rate': 3.0075000000000003e-05, 'epoch': 5.22} 40%|███▉ | 3990/10000 [15:37:25<23:14:51, 13.93s/it] 40%|███▉ | 3991/10000 [15:37:39<23:15:44, 13.94s/it] {'loss': 0.0304, 'learning_rate': 3.0070000000000005e-05, 'epoch': 5.22} 40%|███▉ | 3991/10000 [15:37:39<23:15:44, 13.94s/it] 40%|███▉ | 3992/10000 [15:37:53<23:16:23, 13.95s/it] {'loss': 0.0293, 'learning_rate': 3.0064999999999998e-05, 'epoch': 5.23} 40%|███▉ | 3992/10000 [15:37:53<23:16:23, 13.95s/it] 40%|███▉ | 3993/10000 [15:38:07<23:15:38, 13.94s/it] {'loss': 0.0271, 'learning_rate': 3.006e-05, 'epoch': 5.23} 40%|███▉ | 3993/10000 [15:38:07<23:15:38, 13.94s/it] 40%|███▉ | 3994/10000 [15:38:21<23:15:59, 13.95s/it] {'loss': 0.0277, 'learning_rate': 3.0055e-05, 'epoch': 5.23} 40%|███▉ | 3994/10000 [15:38:21<23:15:59, 13.95s/it] 40%|███▉ | 3995/10000 [15:38:35<23:16:32, 13.95s/it] {'loss': 0.0395, 'learning_rate': 3.0050000000000002e-05, 'epoch': 5.23} 40%|███▉ | 3995/10000 [15:38:35<23:16:32, 13.95s/it] 40%|███▉ | 3996/10000 [15:38:49<23:13:29, 13.93s/it] {'loss': 0.0311, 'learning_rate': 3.0045e-05, 'epoch': 5.23} 40%|███▉ | 3996/10000 [15:38:49<23:13:29, 13.93s/it] 40%|███▉ | 3997/10000 [15:39:02<23:13:48, 13.93s/it] {'loss': 0.0306, 'learning_rate': 3.004e-05, 'epoch': 5.23} 40%|███▉ | 3997/10000 [15:39:03<23:13:48, 13.93s/it] 40%|███▉ | 3998/10000 [15:39:16<23:13:32, 13.93s/it] {'loss': 0.0348, 'learning_rate': 3.0035000000000003e-05, 'epoch': 5.23} 
40%|███▉ | 3998/10000 [15:39:16<23:13:32, 13.93s/it]
40%|███▉ | 3999/10000 [15:39:30<23:11:40, 13.91s/it]
{'loss': 0.0312, 'learning_rate': 3.0030000000000002e-05, 'epoch': 5.23}
40%|███▉ | 3999/10000 [15:39:30<23:11:40, 13.91s/it]
40%|████ | 4000/10000 [15:39:44<23:08:12, 13.88s/it]
{'loss': 0.0378, 'learning_rate': 3.0025000000000005e-05, 'epoch': 5.24}
40%|████ | 4000/10000 [15:39:44<23:08:12, 13.88s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-04 11:57:52,419 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-4000/config.json
[INFO|configuration_utils.py:364] 2024-11-04 11:57:52,421 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-4000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-04 11:58:50,327 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-4000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-04 11:58:50,330 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-4000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-04 11:58:50,332 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-4000/special_tokens_map.json
[2024-11-04 11:58:50,348] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step4000 is about to be saved!
[2024-11-04 11:58:50,396] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt
[2024-11-04 11:58:50,397] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt...
[2024-11-04 12:00:05,572] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/mp_rank_00_model_states.pt.
[2024-11-04 12:00:05,688] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-11-04 12:02:09,280] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-04 12:02:09,284] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-04 12:02:09,284] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step4000 is ready now! 40%|████ | 4001/10000 [15:44:15<151:26:02, 90.88s/it] {'loss': 0.0359, 'learning_rate': 3.0020000000000004e-05, 'epoch': 5.24} 40%|████ | 4001/10000 [15:44:15<151:26:02, 90.88s/it] 40%|████ | 4002/10000 [15:44:28<112:46:06, 67.68s/it] {'loss': 0.029, 'learning_rate': 3.0015e-05, 'epoch': 5.24} 40%|████ | 4002/10000 [15:44:28<112:46:06, 67.68s/it] 40%|████ | 4003/10000 [15:44:42<85:51:10, 51.54s/it] {'loss': 0.0344, 'learning_rate': 3.001e-05, 'epoch': 5.24} 40%|████ | 4003/10000 [15:44:42<85:51:10, 51.54s/it] 40%|████ | 4004/10000 [15:44:56<67:01:26, 40.24s/it] {'loss': 0.0363, 'learning_rate': 3.0004999999999998e-05, 'epoch': 5.24} 40%|████ | 4004/10000 [15:44:56<67:01:26, 40.24s/it] 40%|████ | 4005/10000 [15:45:10<53:46:46, 32.29s/it] {'loss': 0.0331, 'learning_rate': 3e-05, 'epoch': 5.24} 40%|████ | 4005/10000 [15:45:10<53:46:46, 32.29s/it] 40%|████ | 4006/10000 [15:45:24<44:34:44, 26.77s/it] {'loss': 0.0425, 'learning_rate': 2.9995e-05, 'epoch': 5.24} 40%|████ | 4006/10000 [15:45:24<44:34:44, 26.77s/it] 40%|████ | 4007/10000 [15:45:38<38:15:44, 22.98s/it] {'loss': 0.0308, 'learning_rate': 2.9990000000000003e-05, 'epoch': 5.24} 40%|████ | 4007/10000 [15:45:38<38:15:44, 22.98s/it] 40%|████ | 4008/10000 [15:45:52<33:45:00, 20.28s/it] {'loss': 0.031, 
'learning_rate': 2.9985000000000002e-05, 'epoch': 5.25} 40%|████ | 4008/10000 [15:45:52<33:45:00, 20.28s/it] 40%|████ | 4009/10000 [15:46:06<30:36:16, 18.39s/it] {'loss': 0.034, 'learning_rate': 2.998e-05, 'epoch': 5.25} 40%|████ | 4009/10000 [15:46:06<30:36:16, 18.39s/it] 40%|████ | 4010/10000 [15:46:20<28:23:52, 17.07s/it] {'loss': 0.0233, 'learning_rate': 2.9975000000000004e-05, 'epoch': 5.25} 40%|████ | 4010/10000 [15:46:20<28:23:52, 17.07s/it] 40%|████ | 4011/10000 [15:46:34<26:54:06, 16.17s/it] {'loss': 0.0351, 'learning_rate': 2.9970000000000003e-05, 'epoch': 5.25} 40%|████ | 4011/10000 [15:46:34<26:54:06, 16.17s/it] 40%|████ | 4012/10000 [15:46:48<25:46:19, 15.49s/it] {'loss': 0.033, 'learning_rate': 2.9965000000000005e-05, 'epoch': 5.25} 40%|████ | 4012/10000 [15:46:48<25:46:19, 15.49s/it] 40%|████ | 4013/10000 [15:47:02<25:04:50, 15.08s/it] {'loss': 0.0357, 'learning_rate': 2.9959999999999998e-05, 'epoch': 5.25} 40%|████ | 4013/10000 [15:47:02<25:04:50, 15.08s/it] 40%|████ | 4014/10000 [15:47:16<24:31:11, 14.75s/it] {'loss': 0.033, 'learning_rate': 2.9955e-05, 'epoch': 5.25} 40%|████ | 4014/10000 [15:47:16<24:31:11, 14.75s/it] 40%|████ | 4015/10000 [15:47:30<24:12:32, 14.56s/it] {'loss': 0.0342, 'learning_rate': 2.995e-05, 'epoch': 5.26} 40%|████ | 4015/10000 [15:47:30<24:12:32, 14.56s/it] 40%|████ | 4016/10000 [15:47:44<23:51:28, 14.35s/it] {'loss': 0.0272, 'learning_rate': 2.9945000000000002e-05, 'epoch': 5.26} 40%|████ | 4016/10000 [15:47:44<23:51:28, 14.35s/it] 40%|████ | 4017/10000 [15:47:58<23:36:43, 14.21s/it] {'loss': 0.032, 'learning_rate': 2.994e-05, 'epoch': 5.26} 40%|████ | 4017/10000 [15:47:58<23:36:43, 14.21s/it] 40%|████ | 4018/10000 [15:48:12<23:29:57, 14.14s/it] {'loss': 0.0333, 'learning_rate': 2.9935e-05, 'epoch': 5.26} 40%|████ | 4018/10000 [15:48:12<23:29:57, 14.14s/it] 40%|████ | 4019/10000 [15:48:25<23:22:31, 14.07s/it] {'loss': 0.0347, 'learning_rate': 2.9930000000000003e-05, 'epoch': 5.26} 40%|████ | 4019/10000 [15:48:26<23:22:31, 
14.07s/it] 40%|████ | 4020/10000 [15:48:39<23:19:22, 14.04s/it] {'loss': 0.0291, 'learning_rate': 2.9925000000000002e-05, 'epoch': 5.26} 40%|████ | 4020/10000 [15:48:40<23:19:22, 14.04s/it] 40%|████ | 4021/10000 [15:48:53<23:12:42, 13.98s/it] {'loss': 0.0356, 'learning_rate': 2.9920000000000005e-05, 'epoch': 5.26} 40%|████ | 4021/10000 [15:48:53<23:12:42, 13.98s/it] 40%|████ | 4022/10000 [15:49:07<23:07:21, 13.92s/it] {'loss': 0.0361, 'learning_rate': 2.9915000000000004e-05, 'epoch': 5.26} 40%|████ | 4022/10000 [15:49:07<23:07:21, 13.92s/it] 40%|████ | 4023/10000 [15:49:21<23:06:41, 13.92s/it] {'loss': 0.0321, 'learning_rate': 2.991e-05, 'epoch': 5.27} 40%|████ | 4023/10000 [15:49:21<23:06:41, 13.92s/it] 40%|████ | 4024/10000 [15:49:35<23:03:56, 13.89s/it] {'loss': 0.0344, 'learning_rate': 2.9905e-05, 'epoch': 5.27} 40%|████ | 4024/10000 [15:49:35<23:03:56, 13.89s/it] 40%|████ | 4025/10000 [15:49:49<23:04:36, 13.90s/it] {'loss': 0.0275, 'learning_rate': 2.9900000000000002e-05, 'epoch': 5.27} 40%|████ | 4025/10000 [15:49:49<23:04:36, 13.90s/it] 40%|████ | 4026/10000 [15:50:03<23:05:21, 13.91s/it] {'loss': 0.0313, 'learning_rate': 2.9895e-05, 'epoch': 5.27} 40%|████ | 4026/10000 [15:50:03<23:05:21, 13.91s/it] 40%|████ | 4027/10000 [15:50:17<23:05:32, 13.92s/it] {'loss': 0.0324, 'learning_rate': 2.989e-05, 'epoch': 5.27} 40%|████ | 4027/10000 [15:50:17<23:05:32, 13.92s/it] 40%|████ | 4028/10000 [15:50:31<23:09:38, 13.96s/it] {'loss': 0.0306, 'learning_rate': 2.9885000000000003e-05, 'epoch': 5.27} 40%|████ | 4028/10000 [15:50:31<23:09:38, 13.96s/it] 40%|████ | 4029/10000 [15:50:45<23:07:42, 13.94s/it] {'loss': 0.0294, 'learning_rate': 2.9880000000000002e-05, 'epoch': 5.27} 40%|████ | 4029/10000 [15:50:45<23:07:42, 13.94s/it] 40%|████ | 4030/10000 [15:50:58<23:05:38, 13.93s/it] {'loss': 0.0317, 'learning_rate': 2.9875000000000004e-05, 'epoch': 5.27} 40%|████ | 4030/10000 [15:50:59<23:05:38, 13.93s/it] 40%|████ | 4031/10000 [15:51:12<23:04:52, 13.92s/it] {'loss': 0.0315, 
'learning_rate': 2.9870000000000004e-05, 'epoch': 5.28} 40%|████ | 4031/10000 [15:51:12<23:04:52, 13.92s/it] 40%|████ | 4032/10000 [15:51:26<23:02:58, 13.90s/it] {'loss': 0.0276, 'learning_rate': 2.9865000000000003e-05, 'epoch': 5.28} 40%|████ | 4032/10000 [15:51:26<23:02:58, 13.90s/it] 40%|████ | 4033/10000 [15:51:40<23:04:10, 13.92s/it] {'loss': 0.034, 'learning_rate': 2.986e-05, 'epoch': 5.28} 40%|████ | 4033/10000 [15:51:40<23:04:10, 13.92s/it] 40%|████ | 4034/10000 [15:51:54<23:00:59, 13.89s/it] {'loss': 0.0399, 'learning_rate': 2.9855e-05, 'epoch': 5.28} 40%|████ | 4034/10000 [15:51:54<23:00:59, 13.89s/it] 40%|████ | 4035/10000 [15:52:08<23:00:58, 13.89s/it] {'loss': 0.0364, 'learning_rate': 2.985e-05, 'epoch': 5.28} 40%|████ | 4035/10000 [15:52:08<23:00:58, 13.89s/it] 40%|████ | 4036/10000 [15:52:22<23:00:51, 13.89s/it] {'loss': 0.0344, 'learning_rate': 2.9845e-05, 'epoch': 5.28} 40%|████ | 4036/10000 [15:52:22<23:00:51, 13.89s/it] 40%|████ | 4037/10000 [15:52:36<22:59:32, 13.88s/it] {'loss': 0.0289, 'learning_rate': 2.9840000000000002e-05, 'epoch': 5.28} 40%|████ | 4037/10000 [15:52:36<22:59:32, 13.88s/it] 40%|████ | 4038/10000 [15:52:49<22:54:43, 13.83s/it] {'loss': 0.0345, 'learning_rate': 2.9835e-05, 'epoch': 5.29} 40%|████ | 4038/10000 [15:52:49<22:54:43, 13.83s/it] 40%|████ | 4039/10000 [15:53:03<22:53:18, 13.82s/it] {'loss': 0.029, 'learning_rate': 2.9830000000000004e-05, 'epoch': 5.29} 40%|████ | 4039/10000 [15:53:03<22:53:18, 13.82s/it] 40%|████ | 4040/10000 [15:53:17<22:55:07, 13.84s/it] {'loss': 0.031, 'learning_rate': 2.9825000000000003e-05, 'epoch': 5.29} 40%|████ | 4040/10000 [15:53:17<22:55:07, 13.84s/it] 40%|████ | 4041/10000 [15:53:31<22:53:54, 13.83s/it] {'loss': 0.0323, 'learning_rate': 2.9820000000000002e-05, 'epoch': 5.29} 40%|████ | 4041/10000 [15:53:31<22:53:54, 13.83s/it] 40%|████ | 4042/10000 [15:53:45<22:58:00, 13.88s/it] {'loss': 0.0371, 'learning_rate': 2.9815000000000005e-05, 'epoch': 5.29} 40%|████ | 4042/10000 
[15:53:45<22:58:00, 13.88s/it] 40%|████ | 4043/10000 [15:53:59<22:56:01, 13.86s/it] {'loss': 0.0293, 'learning_rate': 2.9809999999999997e-05, 'epoch': 5.29} 40%|████ | 4043/10000 [15:53:59<22:56:01, 13.86s/it] 40%|████ | 4044/10000 [15:54:13<22:56:08, 13.86s/it] {'loss': 0.026, 'learning_rate': 2.9805e-05, 'epoch': 5.29} 40%|████ | 4044/10000 [15:54:13<22:56:08, 13.86s/it] 40%|████ | 4045/10000 [15:54:27<23:01:52, 13.92s/it] {'loss': 0.0396, 'learning_rate': 2.98e-05, 'epoch': 5.29} 40%|████ | 4045/10000 [15:54:27<23:01:52, 13.92s/it] 40%|████ | 4046/10000 [15:54:41<23:07:02, 13.98s/it] {'loss': 0.0381, 'learning_rate': 2.9795000000000002e-05, 'epoch': 5.3} 40%|████ | 4046/10000 [15:54:41<23:07:02, 13.98s/it] 40%|████ | 4047/10000 [15:54:55<23:10:02, 14.01s/it] {'loss': 0.0383, 'learning_rate': 2.979e-05, 'epoch': 5.3} 40%|████ | 4047/10000 [15:54:55<23:10:02, 14.01s/it] 40%|████ | 4048/10000 [15:55:09<23:08:13, 13.99s/it] {'loss': 0.0324, 'learning_rate': 2.9785e-05, 'epoch': 5.3} 40%|████ | 4048/10000 [15:55:09<23:08:13, 13.99s/it] 40%|████ | 4049/10000 [15:55:23<23:12:23, 14.04s/it] {'loss': 0.038, 'learning_rate': 2.9780000000000003e-05, 'epoch': 5.3} 40%|████ | 4049/10000 [15:55:23<23:12:23, 14.04s/it] 40%|████ | 4050/10000 [15:55:37<23:12:35, 14.04s/it] {'loss': 0.0336, 'learning_rate': 2.9775000000000002e-05, 'epoch': 5.3} 40%|████ | 4050/10000 [15:55:37<23:12:35, 14.04s/it] 41%|████ | 4051/10000 [15:55:51<23:05:02, 13.97s/it] {'loss': 0.0335, 'learning_rate': 2.9770000000000005e-05, 'epoch': 5.3} 41%|████ | 4051/10000 [15:55:51<23:05:02, 13.97s/it] 41%|████ | 4052/10000 [15:56:05<23:04:26, 13.97s/it] {'loss': 0.0355, 'learning_rate': 2.9765000000000004e-05, 'epoch': 5.3} 41%|████ | 4052/10000 [15:56:05<23:04:26, 13.97s/it] 41%|████ | 4053/10000 [15:56:19<23:07:05, 13.99s/it] {'loss': 0.033, 'learning_rate': 2.976e-05, 'epoch': 5.3} 41%|████ | 4053/10000 [15:56:19<23:07:05, 13.99s/it] 41%|████ | 4054/10000 [15:56:33<23:02:33, 13.95s/it] {'loss': 0.0289, 
'learning_rate': 2.9755e-05, 'epoch': 5.31} 41%|████ | 4054/10000 [15:56:33<23:02:33, 13.95s/it] 41%|████ | 4055/10000 [15:56:46<22:59:22, 13.92s/it] {'loss': 0.0403, 'learning_rate': 2.975e-05, 'epoch': 5.31} 41%|████ | 4055/10000 [15:56:47<22:59:22, 13.92s/it] 41%|████ | 4056/10000 [15:57:00<22:58:55, 13.92s/it] {'loss': 0.0289, 'learning_rate': 2.9745e-05, 'epoch': 5.31} 41%|████ | 4056/10000 [15:57:00<22:58:55, 13.92s/it] 41%|████ | 4057/10000 [15:57:14<22:55:37, 13.89s/it] {'loss': 0.0364, 'learning_rate': 2.974e-05, 'epoch': 5.31} 41%|████ | 4057/10000 [15:57:14<22:55:37, 13.89s/it] 41%|████ | 4058/10000 [15:57:28<22:55:12, 13.89s/it] {'loss': 0.0334, 'learning_rate': 2.9735000000000002e-05, 'epoch': 5.31} 41%|████ | 4058/10000 [15:57:28<22:55:12, 13.89s/it] 41%|████ | 4059/10000 [15:57:42<22:55:51, 13.90s/it] {'loss': 0.0317, 'learning_rate': 2.973e-05, 'epoch': 5.31} 41%|████ | 4059/10000 [15:57:42<22:55:51, 13.90s/it] 41%|████ | 4060/10000 [15:57:56<22:54:40, 13.89s/it] {'loss': 0.0369, 'learning_rate': 2.9725000000000004e-05, 'epoch': 5.31} 41%|████ | 4060/10000 [15:57:56<22:54:40, 13.89s/it] 41%|████ | 4061/10000 [15:58:10<22:54:21, 13.88s/it] {'loss': 0.0332, 'learning_rate': 2.9720000000000003e-05, 'epoch': 5.32} 41%|████ | 4061/10000 [15:58:10<22:54:21, 13.88s/it] 41%|████ | 4062/10000 [15:58:24<22:54:20, 13.89s/it] {'loss': 0.0259, 'learning_rate': 2.9715000000000003e-05, 'epoch': 5.32} 41%|████ | 4062/10000 [15:58:24<22:54:20, 13.89s/it] 41%|████ | 4063/10000 [15:58:37<22:52:35, 13.87s/it] {'loss': 0.0311, 'learning_rate': 2.971e-05, 'epoch': 5.32} 41%|████ | 4063/10000 [15:58:38<22:52:35, 13.87s/it] 41%|████ | 4064/10000 [15:58:51<22:53:54, 13.89s/it] {'loss': 0.0323, 'learning_rate': 2.9705e-05, 'epoch': 5.32} 41%|████ | 4064/10000 [15:58:51<22:53:54, 13.89s/it] 41%|████ | 4065/10000 [15:59:05<22:53:06, 13.88s/it] {'loss': 0.0263, 'learning_rate': 2.97e-05, 'epoch': 5.32} 41%|████ | 4065/10000 [15:59:05<22:53:06, 13.88s/it] 41%|████ | 4066/10000 
[15:59:19<22:54:03, 13.89s/it] {'loss': 0.0356, 'learning_rate': 2.9695e-05, 'epoch': 5.32} 41%|████ | 4066/10000 [15:59:19<22:54:03, 13.89s/it] 41%|████ | 4067/10000 [15:59:33<23:00:09, 13.96s/it] {'loss': 0.0354, 'learning_rate': 2.9690000000000002e-05, 'epoch': 5.32} 41%|████ | 4067/10000 [15:59:33<23:00:09, 13.96s/it] 41%|████ | 4068/10000 [15:59:47<22:58:58, 13.95s/it] {'loss': 0.0364, 'learning_rate': 2.9685e-05, 'epoch': 5.32} 41%|████ | 4068/10000 [15:59:47<22:58:58, 13.95s/it] 41%|████ | 4069/10000 [16:00:01<22:55:55, 13.92s/it] {'loss': 0.0362, 'learning_rate': 2.9680000000000004e-05, 'epoch': 5.33} 41%|████ | 4069/10000 [16:00:01<22:55:55, 13.92s/it] 41%|████ | 4070/10000 [16:00:15<23:00:00, 13.96s/it] {'loss': 0.0379, 'learning_rate': 2.9675000000000003e-05, 'epoch': 5.33} 41%|████ | 4070/10000 [16:00:15<23:00:00, 13.96s/it] 41%|████ | 4071/10000 [16:00:29<22:56:03, 13.93s/it] {'loss': 0.0278, 'learning_rate': 2.9670000000000002e-05, 'epoch': 5.33} 41%|████ | 4071/10000 [16:00:29<22:56:03, 13.93s/it] 41%|████ | 4072/10000 [16:00:43<22:54:55, 13.92s/it] {'loss': 0.0261, 'learning_rate': 2.9665000000000005e-05, 'epoch': 5.33} 41%|████ | 4072/10000 [16:00:43<22:54:55, 13.92s/it] 41%|████ | 4073/10000 [16:00:57<22:59:03, 13.96s/it] {'loss': 0.0299, 'learning_rate': 2.9659999999999997e-05, 'epoch': 5.33} 41%|████ | 4073/10000 [16:00:57<22:59:03, 13.96s/it] 41%|████ | 4074/10000 [16:01:11<22:57:46, 13.95s/it] {'loss': 0.0275, 'learning_rate': 2.9655e-05, 'epoch': 5.33} 41%|████ | 4074/10000 [16:01:11<22:57:46, 13.95s/it] 41%|████ | 4075/10000 [16:01:25<22:55:52, 13.93s/it] {'loss': 0.0327, 'learning_rate': 2.965e-05, 'epoch': 5.33} 41%|████ | 4075/10000 [16:01:25<22:55:52, 13.93s/it] 41%|████ | 4076/10000 [16:01:39<22:55:29, 13.93s/it] {'loss': 0.0323, 'learning_rate': 2.9645e-05, 'epoch': 5.34} 41%|████ | 4076/10000 [16:01:39<22:55:29, 13.93s/it] 41%|████ | 4077/10000 [16:01:53<22:55:14, 13.93s/it] {'loss': 0.032, 'learning_rate': 2.964e-05, 'epoch': 5.34} 
41%|████ | 4077/10000 [16:01:53<22:55:14, 13.93s/it] 41%|████ | 4078/10000 [16:02:07<22:54:44, 13.93s/it] {'loss': 0.0288, 'learning_rate': 2.9635e-05, 'epoch': 5.34} 41%|████ | 4078/10000 [16:02:07<22:54:44, 13.93s/it] 41%|████ | 4079/10000 [16:02:21<22:59:03, 13.97s/it] {'loss': 0.033, 'learning_rate': 2.9630000000000003e-05, 'epoch': 5.34} 41%|████ | 4079/10000 [16:02:21<22:59:03, 13.97s/it] 41%|████ | 4080/10000 [16:02:35<22:58:49, 13.97s/it] {'loss': 0.0349, 'learning_rate': 2.9625000000000002e-05, 'epoch': 5.34} 41%|████ | 4080/10000 [16:02:35<22:58:49, 13.97s/it] 41%|████ | 4081/10000 [16:02:49<22:58:10, 13.97s/it] {'loss': 0.0324, 'learning_rate': 2.9620000000000004e-05, 'epoch': 5.34} 41%|████ | 4081/10000 [16:02:49<22:58:10, 13.97s/it] 41%|████ | 4082/10000 [16:03:02<22:53:57, 13.93s/it] {'loss': 0.0344, 'learning_rate': 2.9615000000000004e-05, 'epoch': 5.34} 41%|████ | 4082/10000 [16:03:02<22:53:57, 13.93s/it] 41%|████ | 4083/10000 [16:03:16<22:57:33, 13.97s/it] {'loss': 0.0424, 'learning_rate': 2.961e-05, 'epoch': 5.34} 41%|████ | 4083/10000 [16:03:16<22:57:33, 13.97s/it] 41%|████ | 4084/10000 [16:03:30<22:55:56, 13.95s/it] {'loss': 0.0359, 'learning_rate': 2.9605e-05, 'epoch': 5.35} 41%|████ | 4084/10000 [16:03:30<22:55:56, 13.95s/it] 41%|████ | 4085/10000 [16:03:44<22:49:55, 13.90s/it] {'loss': 0.0304, 'learning_rate': 2.96e-05, 'epoch': 5.35} 41%|████ | 4085/10000 [16:03:44<22:49:55, 13.90s/it] 41%|████ | 4086/10000 [16:03:58<22:49:11, 13.89s/it] {'loss': 0.0322, 'learning_rate': 2.9595e-05, 'epoch': 5.35} 41%|████ | 4086/10000 [16:03:58<22:49:11, 13.89s/it] 41%|████ | 4087/10000 [16:04:12<22:55:03, 13.95s/it] {'loss': 0.0331, 'learning_rate': 2.959e-05, 'epoch': 5.35} 41%|████ | 4087/10000 [16:04:12<22:55:03, 13.95s/it] 41%|████ | 4088/10000 [16:04:26<22:56:08, 13.97s/it] {'loss': 0.0292, 'learning_rate': 2.9585000000000002e-05, 'epoch': 5.35} 41%|████ | 4088/10000 [16:04:26<22:56:08, 13.97s/it] 41%|████ | 4089/10000 [16:04:40<22:54:18, 13.95s/it] 
{'loss': 0.037, 'learning_rate': 2.958e-05, 'epoch': 5.35} 41%|████ | 4089/10000 [16:04:40<22:54:18, 13.95s/it] 41%|████ | 4090/10000 [16:04:54<22:53:43, 13.95s/it] {'loss': 0.0308, 'learning_rate': 2.9575000000000004e-05, 'epoch': 5.35} 41%|████ | 4090/10000 [16:04:54<22:53:43, 13.95s/it] 41%|████ | 4091/10000 [16:05:08<22:51:43, 13.93s/it] {'loss': 0.0301, 'learning_rate': 2.9570000000000003e-05, 'epoch': 5.35} 41%|████ | 4091/10000 [16:05:08<22:51:43, 13.93s/it] 41%|████ | 4092/10000 [16:05:22<22:49:23, 13.91s/it] {'loss': 0.0325, 'learning_rate': 2.9565000000000002e-05, 'epoch': 5.36} 41%|████ | 4092/10000 [16:05:22<22:49:23, 13.91s/it] 41%|████ | 4093/10000 [16:05:36<22:48:39, 13.90s/it] {'loss': 0.032, 'learning_rate': 2.9559999999999998e-05, 'epoch': 5.36} 41%|████ | 4093/10000 [16:05:36<22:48:39, 13.90s/it] 41%|████ | 4094/10000 [16:05:50<22:55:46, 13.98s/it] {'loss': 0.0302, 'learning_rate': 2.9555e-05, 'epoch': 5.36} 41%|████ | 4094/10000 [16:05:50<22:55:46, 13.98s/it] 41%|████ | 4095/10000 [16:06:04<22:58:17, 14.00s/it] {'loss': 0.0354, 'learning_rate': 2.955e-05, 'epoch': 5.36} 41%|████ | 4095/10000 [16:06:04<22:58:17, 14.00s/it] 41%|████ | 4096/10000 [16:06:18<22:55:46, 13.98s/it] {'loss': 0.0293, 'learning_rate': 2.9545e-05, 'epoch': 5.36} 41%|████ | 4096/10000 [16:06:18<22:55:46, 13.98s/it] 41%|████ | 4097/10000 [16:06:32<22:54:37, 13.97s/it] {'loss': 0.0288, 'learning_rate': 2.9540000000000002e-05, 'epoch': 5.36} 41%|████ | 4097/10000 [16:06:32<22:54:37, 13.97s/it] 41%|████ | 4098/10000 [16:06:46<22:51:47, 13.95s/it] {'loss': 0.0249, 'learning_rate': 2.9535e-05, 'epoch': 5.36} 41%|████ | 4098/10000 [16:06:46<22:51:47, 13.95s/it] 41%|████ | 4099/10000 [16:07:00<22:51:23, 13.94s/it] {'loss': 0.0398, 'learning_rate': 2.9530000000000004e-05, 'epoch': 5.37} 41%|████ | 4099/10000 [16:07:00<22:51:23, 13.94s/it] 41%|████ | 4100/10000 [16:07:14<22:54:38, 13.98s/it] {'loss': 0.032, 'learning_rate': 2.9525000000000003e-05, 'epoch': 5.37} 41%|████ | 4100/10000 
[16:07:14<22:54:38, 13.98s/it] 41%|████ | 4101/10000 [16:07:28<22:53:56, 13.97s/it] {'loss': 0.034, 'learning_rate': 2.9520000000000002e-05, 'epoch': 5.37} 41%|████ | 4101/10000 [16:07:28<22:53:56, 13.97s/it] 41%|████ | 4102/10000 [16:07:42<22:53:18, 13.97s/it] {'loss': 0.0335, 'learning_rate': 2.9515000000000005e-05, 'epoch': 5.37} 41%|████ | 4102/10000 [16:07:42<22:53:18, 13.97s/it] 41%|████ | 4103/10000 [16:07:55<22:49:40, 13.94s/it] {'loss': 0.0407, 'learning_rate': 2.951e-05, 'epoch': 5.37} 41%|████ | 4103/10000 [16:07:55<22:49:40, 13.94s/it] 41%|████ | 4104/10000 [16:08:09<22:52:11, 13.96s/it] {'loss': 0.032, 'learning_rate': 2.9505e-05, 'epoch': 5.37} 41%|████ | 4104/10000 [16:08:09<22:52:11, 13.96s/it] 41%|████ | 4105/10000 [16:08:23<22:50:53, 13.95s/it] {'loss': 0.037, 'learning_rate': 2.95e-05, 'epoch': 5.37} 41%|████ | 4105/10000 [16:08:23<22:50:53, 13.95s/it] 41%|████ | 4106/10000 [16:08:37<22:49:37, 13.94s/it] {'loss': 0.0284, 'learning_rate': 2.9495e-05, 'epoch': 5.37} 41%|████ | 4106/10000 [16:08:37<22:49:37, 13.94s/it] 41%|████ | 4107/10000 [16:08:51<22:45:38, 13.90s/it] {'loss': 0.0305, 'learning_rate': 2.949e-05, 'epoch': 5.38} 41%|████ | 4107/10000 [16:08:51<22:45:38, 13.90s/it] 41%|████ | 4108/10000 [16:09:05<22:51:00, 13.96s/it] {'loss': 0.0311, 'learning_rate': 2.9485000000000003e-05, 'epoch': 5.38} 41%|████ | 4108/10000 [16:09:05<22:51:00, 13.96s/it] 41%|████ | 4109/10000 [16:09:19<22:49:04, 13.94s/it] {'loss': 0.0386, 'learning_rate': 2.9480000000000002e-05, 'epoch': 5.38} 41%|████ | 4109/10000 [16:09:19<22:49:04, 13.94s/it] 41%|████ | 4110/10000 [16:09:33<22:45:19, 13.91s/it] {'loss': 0.036, 'learning_rate': 2.9475e-05, 'epoch': 5.38} 41%|████ | 4110/10000 [16:09:33<22:45:19, 13.91s/it] 41%|████ | 4111/10000 [16:09:47<22:44:40, 13.90s/it] {'loss': 0.0356, 'learning_rate': 2.9470000000000004e-05, 'epoch': 5.38} 41%|████ | 4111/10000 [16:09:47<22:44:40, 13.90s/it] 41%|████ | 4112/10000 [16:10:01<22:48:08, 13.94s/it] {'loss': 0.038, 
'learning_rate': 2.9465000000000003e-05, 'epoch': 5.38} 41%|████ | 4112/10000 [16:10:01<22:48:08, 13.94s/it] 41%|████ | 4113/10000 [16:10:15<22:48:46, 13.95s/it] {'loss': 0.029, 'learning_rate': 2.946e-05, 'epoch': 5.38} 41%|████ | 4113/10000 [16:10:15<22:48:46, 13.95s/it] 41%|████ | 4114/10000 [16:10:29<22:57:28, 14.04s/it] {'loss': 0.0296, 'learning_rate': 2.9455e-05, 'epoch': 5.38} 41%|████ | 4114/10000 [16:10:29<22:57:28, 14.04s/it] 41%|████ | 4115/10000 [16:10:43<22:50:42, 13.97s/it] {'loss': 0.0324, 'learning_rate': 2.945e-05, 'epoch': 5.39} 41%|████ | 4115/10000 [16:10:43<22:50:42, 13.97s/it] 41%|████ | 4116/10000 [16:10:57<22:48:06, 13.95s/it] {'loss': 0.0347, 'learning_rate': 2.9445e-05, 'epoch': 5.39} 41%|████ | 4116/10000 [16:10:57<22:48:06, 13.95s/it] 41%|████ | 4117/10000 [16:11:11<22:45:31, 13.93s/it] {'loss': 0.0378, 'learning_rate': 2.944e-05, 'epoch': 5.39} 41%|████ | 4117/10000 [16:11:11<22:45:31, 13.93s/it] 41%|████ | 4118/10000 [16:11:25<22:46:01, 13.93s/it] {'loss': 0.0354, 'learning_rate': 2.9435000000000002e-05, 'epoch': 5.39} 41%|████ | 4118/10000 [16:11:25<22:46:01, 13.93s/it] 41%|████ | 4119/10000 [16:11:38<22:42:36, 13.90s/it] {'loss': 0.0338, 'learning_rate': 2.943e-05, 'epoch': 5.39} 41%|████ | 4119/10000 [16:11:38<22:42:36, 13.90s/it] 41%|████ | 4120/10000 [16:11:52<22:43:53, 13.92s/it] {'loss': 0.0352, 'learning_rate': 2.9425000000000004e-05, 'epoch': 5.39} 41%|████ | 4120/10000 [16:11:52<22:43:53, 13.92s/it] 41%|████ | 4121/10000 [16:12:06<22:43:59, 13.92s/it] {'loss': 0.0351, 'learning_rate': 2.9420000000000003e-05, 'epoch': 5.39} 41%|████ | 4121/10000 [16:12:06<22:43:59, 13.92s/it] 41%|████ | 4122/10000 [16:12:20<22:42:18, 13.91s/it] {'loss': 0.0313, 'learning_rate': 2.9415000000000002e-05, 'epoch': 5.4} 41%|████ | 4122/10000 [16:12:20<22:42:18, 13.91s/it] 41%|████ | 4123/10000 [16:12:34<22:39:22, 13.88s/it] {'loss': 0.0261, 'learning_rate': 2.9409999999999998e-05, 'epoch': 5.4} 41%|████ | 4123/10000 [16:12:34<22:39:22, 13.88s/it] 
41%|████ | 4124/10000 [16:12:48<22:40:25, 13.89s/it] {'loss': 0.0332, 'learning_rate': 2.9405e-05, 'epoch': 5.4} 41%|████ | 4124/10000 [16:12:48<22:40:25, 13.89s/it] 41%|████▏ | 4125/10000 [16:13:02<22:45:08, 13.94s/it] {'loss': 0.0332, 'learning_rate': 2.94e-05, 'epoch': 5.4} 41%|████▏ | 4125/10000 [16:13:02<22:45:08, 13.94s/it] 41%|████▏ | 4126/10000 [16:13:16<22:44:04, 13.93s/it] {'loss': 0.0406, 'learning_rate': 2.9395e-05, 'epoch': 5.4} 41%|████▏ | 4126/10000 [16:13:16<22:44:04, 13.93s/it] 41%|████▏ | 4127/10000 [16:13:30<22:38:36, 13.88s/it] {'loss': 0.0393, 'learning_rate': 2.939e-05, 'epoch': 5.4} 41%|████▏ | 4127/10000 [16:13:30<22:38:36, 13.88s/it] 41%|████▏ | 4128/10000 [16:13:44<22:42:38, 13.92s/it] {'loss': 0.036, 'learning_rate': 2.9385e-05, 'epoch': 5.4} 41%|████▏ | 4128/10000 [16:13:44<22:42:38, 13.92s/it] 41%|████▏ | 4129/10000 [16:13:58<22:43:34, 13.94s/it] {'loss': 0.0286, 'learning_rate': 2.9380000000000003e-05, 'epoch': 5.4} 41%|████▏ | 4129/10000 [16:13:58<22:43:34, 13.94s/it] 41%|████▏ | 4130/10000 [16:14:11<22:39:39, 13.90s/it] {'loss': 0.0339, 'learning_rate': 2.9375000000000003e-05, 'epoch': 5.41} 41%|████▏ | 4130/10000 [16:14:11<22:39:39, 13.90s/it] 41%|████▏ | 4131/10000 [16:14:25<22:42:02, 13.92s/it] {'loss': 0.038, 'learning_rate': 2.9370000000000002e-05, 'epoch': 5.41} 41%|████▏ | 4131/10000 [16:14:25<22:42:02, 13.92s/it] 41%|████▏ | 4132/10000 [16:14:39<22:42:32, 13.93s/it] {'loss': 0.0343, 'learning_rate': 2.9365000000000004e-05, 'epoch': 5.41} 41%|████▏ | 4132/10000 [16:14:39<22:42:32, 13.93s/it] 41%|████▏ | 4133/10000 [16:14:53<22:40:28, 13.91s/it] {'loss': 0.0299, 'learning_rate': 2.9360000000000003e-05, 'epoch': 5.41} 41%|████▏ | 4133/10000 [16:14:53<22:40:28, 13.91s/it] 41%|████▏ | 4134/10000 [16:15:07<22:42:51, 13.94s/it] {'loss': 0.0367, 'learning_rate': 2.9355e-05, 'epoch': 5.41} 41%|████▏ | 4134/10000 [16:15:07<22:42:51, 13.94s/it] 41%|████▏ | 4135/10000 [16:15:21<22:40:40, 13.92s/it] {'loss': 0.0405, 'learning_rate': 
2.935e-05, 'epoch': 5.41} 41%|████▏ | 4135/10000 [16:15:21<22:40:40, 13.92s/it] 41%|████▏ | 4136/10000 [16:15:35<22:44:03, 13.96s/it] {'loss': 0.031, 'learning_rate': 2.9345e-05, 'epoch': 5.41} 41%|████▏ | 4136/10000 [16:15:35<22:44:03, 13.96s/it] 41%|████▏ | 4137/10000 [16:15:49<22:41:36, 13.93s/it] {'loss': 0.0284, 'learning_rate': 2.934e-05, 'epoch': 5.41} 41%|████▏ | 4137/10000 [16:15:49<22:41:36, 13.93s/it] 41%|████▏ | 4138/10000 [16:16:03<22:40:55, 13.93s/it] {'loss': 0.0345, 'learning_rate': 2.9335000000000003e-05, 'epoch': 5.42} 41%|████▏ | 4138/10000 [16:16:03<22:40:55, 13.93s/it] 41%|████▏ | 4139/10000 [16:16:17<22:40:16, 13.93s/it] {'loss': 0.0343, 'learning_rate': 2.9330000000000002e-05, 'epoch': 5.42} 41%|████▏ | 4139/10000 [16:16:17<22:40:16, 13.93s/it] 41%|████▏ | 4140/10000 [16:16:31<22:40:00, 13.93s/it] {'loss': 0.0343, 'learning_rate': 2.9325e-05, 'epoch': 5.42} 41%|████▏ | 4140/10000 [16:16:31<22:40:00, 13.93s/it] 41%|████▏ | 4141/10000 [16:16:45<22:35:47, 13.88s/it] {'loss': 0.0292, 'learning_rate': 2.9320000000000004e-05, 'epoch': 5.42} 41%|████▏ | 4141/10000 [16:16:45<22:35:47, 13.88s/it] 41%|████▏ | 4142/10000 [16:16:59<22:39:54, 13.93s/it] {'loss': 0.032, 'learning_rate': 2.9315000000000003e-05, 'epoch': 5.42} 41%|████▏ | 4142/10000 [16:16:59<22:39:54, 13.93s/it] 41%|████▏ | 4143/10000 [16:17:12<22:38:56, 13.92s/it] {'loss': 0.0372, 'learning_rate': 2.9310000000000006e-05, 'epoch': 5.42} 41%|████▏ | 4143/10000 [16:17:13<22:38:56, 13.92s/it] 41%|████▏ | 4144/10000 [16:17:26<22:38:11, 13.92s/it] {'loss': 0.033, 'learning_rate': 2.9304999999999998e-05, 'epoch': 5.42} 41%|████▏ | 4144/10000 [16:17:26<22:38:11, 13.92s/it] 41%|████▏ | 4145/10000 [16:17:40<22:36:08, 13.90s/it] {'loss': 0.0331, 'learning_rate': 2.93e-05, 'epoch': 5.43} 41%|████▏ | 4145/10000 [16:17:40<22:36:08, 13.90s/it] 41%|████▏ | 4146/10000 [16:17:54<22:33:58, 13.88s/it] {'loss': 0.0334, 'learning_rate': 2.9295e-05, 'epoch': 5.43} 41%|████▏ | 4146/10000 [16:17:54<22:33:58, 
13.88s/it] 41%|████▏ | 4147/10000 [16:18:08<22:33:25, 13.87s/it] {'loss': 0.0388, 'learning_rate': 2.929e-05, 'epoch': 5.43} 41%|████▏ | 4147/10000 [16:18:08<22:33:25, 13.87s/it] 41%|████▏ | 4148/10000 [16:18:22<22:35:36, 13.90s/it] {'loss': 0.0282, 'learning_rate': 2.9285e-05, 'epoch': 5.43} 41%|████▏ | 4148/10000 [16:18:22<22:35:36, 13.90s/it] 41%|████▏ | 4149/10000 [16:18:36<22:36:58, 13.92s/it] {'loss': 0.0364, 'learning_rate': 2.928e-05, 'epoch': 5.43} 41%|████▏ | 4149/10000 [16:18:36<22:36:58, 13.92s/it] 42%|████▏ | 4150/10000 [16:18:50<22:32:58, 13.88s/it] {'loss': 0.0309, 'learning_rate': 2.9275000000000003e-05, 'epoch': 5.43} 42%|████▏ | 4150/10000 [16:18:50<22:32:58, 13.88s/it] 42%|████▏ | 4151/10000 [16:19:04<22:38:19, 13.93s/it] {'loss': 0.0321, 'learning_rate': 2.9270000000000003e-05, 'epoch': 5.43} 42%|████▏ | 4151/10000 [16:19:04<22:38:19, 13.93s/it] 42%|████▏ | 4152/10000 [16:19:18<22:38:31, 13.94s/it] {'loss': 0.0391, 'learning_rate': 2.9265000000000002e-05, 'epoch': 5.43} 42%|████▏ | 4152/10000 [16:19:18<22:38:31, 13.94s/it] 42%|████▏ | 4153/10000 [16:19:32<22:37:24, 13.93s/it] {'loss': 0.0404, 'learning_rate': 2.9260000000000004e-05, 'epoch': 5.44} 42%|████▏ | 4153/10000 [16:19:32<22:37:24, 13.93s/it] 42%|████▏ | 4154/10000 [16:19:46<22:41:41, 13.98s/it] {'loss': 0.0426, 'learning_rate': 2.9255e-05, 'epoch': 5.44} 42%|████▏ | 4154/10000 [16:19:46<22:41:41, 13.98s/it] 42%|████▏ | 4155/10000 [16:20:00<22:38:25, 13.94s/it] {'loss': 0.0299, 'learning_rate': 2.925e-05, 'epoch': 5.44} 42%|████▏ | 4155/10000 [16:20:00<22:38:25, 13.94s/it] 42%|████▏ | 4156/10000 [16:20:13<22:38:30, 13.95s/it] {'loss': 0.025, 'learning_rate': 2.9245e-05, 'epoch': 5.44} 42%|████▏ | 4156/10000 [16:20:14<22:38:30, 13.95s/it] 42%|████▏ | 4157/10000 [16:20:27<22:37:10, 13.94s/it] {'loss': 0.0367, 'learning_rate': 2.924e-05, 'epoch': 5.44} 42%|████▏ | 4157/10000 [16:20:27<22:37:10, 13.94s/it] 42%|████▏ | 4158/10000 [16:20:41<22:39:13, 13.96s/it] {'loss': 0.0342, 
'learning_rate': 2.9235e-05, 'epoch': 5.44} 42%|████▏ | 4158/10000 [16:20:41<22:39:13, 13.96s/it] 42%|████▏ | 4159/10000 [16:20:55<22:42:29, 14.00s/it] {'loss': 0.0302, 'learning_rate': 2.9230000000000003e-05, 'epoch': 5.44} 42%|████▏ | 4159/10000 [16:20:56<22:42:29, 14.00s/it] 42%|████▏ | 4160/10000 [16:21:10<22:43:35, 14.01s/it] {'loss': 0.0392, 'learning_rate': 2.9225000000000002e-05, 'epoch': 5.45} 42%|████▏ | 4160/10000 [16:21:10<22:43:35, 14.01s/it] 42%|████▏ | 4161/10000 [16:21:23<22:38:59, 13.96s/it] {'loss': 0.0365, 'learning_rate': 2.922e-05, 'epoch': 5.45} 42%|████▏ | 4161/10000 [16:21:23<22:38:59, 13.96s/it] 42%|████▏ | 4162/10000 [16:21:37<22:37:42, 13.95s/it] {'loss': 0.0337, 'learning_rate': 2.9215000000000004e-05, 'epoch': 5.45} 42%|████▏ | 4162/10000 [16:21:37<22:37:42, 13.95s/it] 42%|████▏ | 4163/10000 [16:21:51<22:39:06, 13.97s/it] {'loss': 0.0335, 'learning_rate': 2.9210000000000003e-05, 'epoch': 5.45} 42%|████▏ | 4163/10000 [16:21:51<22:39:06, 13.97s/it] 42%|████▏ | 4164/10000 [16:22:05<22:34:41, 13.93s/it] {'loss': 0.0205, 'learning_rate': 2.9205e-05, 'epoch': 5.45} 42%|████▏ | 4164/10000 [16:22:05<22:34:41, 13.93s/it] 42%|████▏ | 4165/10000 [16:22:19<22:30:29, 13.89s/it] {'loss': 0.0417, 'learning_rate': 2.9199999999999998e-05, 'epoch': 5.45} 42%|████▏ | 4165/10000 [16:22:19<22:30:29, 13.89s/it] 42%|████▏ | 4166/10000 [16:22:33<22:31:17, 13.90s/it] {'loss': 0.0354, 'learning_rate': 2.9195e-05, 'epoch': 5.45} 42%|████▏ | 4166/10000 [16:22:33<22:31:17, 13.90s/it] 42%|████▏ | 4167/10000 [16:22:47<22:34:57, 13.94s/it] {'loss': 0.0378, 'learning_rate': 2.919e-05, 'epoch': 5.45} 42%|████▏ | 4167/10000 [16:22:47<22:34:57, 13.94s/it] 42%|████▏ | 4168/10000 [16:23:01<22:35:52, 13.95s/it] {'loss': 0.0341, 'learning_rate': 2.9185000000000003e-05, 'epoch': 5.46} 42%|████▏ | 4168/10000 [16:23:01<22:35:52, 13.95s/it] 42%|████▏ | 4169/10000 [16:23:15<22:34:22, 13.94s/it] {'loss': 0.0397, 'learning_rate': 2.9180000000000002e-05, 'epoch': 5.46} 42%|████▏ | 
4169/10000 [16:23:15<22:34:22, 13.94s/it] 42%|████▏ | 4170/10000 [16:23:29<22:36:36, 13.96s/it] {'loss': 0.0315, 'learning_rate': 2.9175e-05, 'epoch': 5.46} 42%|████▏ | 4170/10000 [16:23:29<22:36:36, 13.96s/it] 42%|████▏ | 4171/10000 [16:23:43<22:30:33, 13.90s/it] {'loss': 0.0335, 'learning_rate': 2.9170000000000004e-05, 'epoch': 5.46} 42%|████▏ | 4171/10000 [16:23:43<22:30:33, 13.90s/it] 42%|████▏ | 4172/10000 [16:23:56<22:29:05, 13.89s/it] {'loss': 0.0279, 'learning_rate': 2.9165000000000003e-05, 'epoch': 5.46} 42%|████▏ | 4172/10000 [16:23:56<22:29:05, 13.89s/it] 42%|████▏ | 4173/10000 [16:24:10<22:28:35, 13.89s/it] {'loss': 0.0332, 'learning_rate': 2.9160000000000005e-05, 'epoch': 5.46} 42%|████▏ | 4173/10000 [16:24:10<22:28:35, 13.89s/it] 42%|████▏ | 4174/10000 [16:24:24<22:30:29, 13.91s/it] {'loss': 0.027, 'learning_rate': 2.9154999999999998e-05, 'epoch': 5.46} 42%|████▏ | 4174/10000 [16:24:24<22:30:29, 13.91s/it] 42%|████▏ | 4175/10000 [16:24:38<22:31:44, 13.92s/it] {'loss': 0.0234, 'learning_rate': 2.915e-05, 'epoch': 5.46} 42%|████▏ | 4175/10000 [16:24:38<22:31:44, 13.92s/it] 42%|████▏ | 4176/10000 [16:24:52<22:33:31, 13.94s/it] {'loss': 0.0349, 'learning_rate': 2.9145e-05, 'epoch': 5.47} 42%|████▏ | 4176/10000 [16:24:52<22:33:31, 13.94s/it] 42%|████▏ | 4177/10000 [16:25:06<22:31:13, 13.92s/it] {'loss': 0.0342, 'learning_rate': 2.9140000000000002e-05, 'epoch': 5.47} 42%|████▏ | 4177/10000 [16:25:06<22:31:13, 13.92s/it] 42%|████▏ | 4178/10000 [16:25:20<22:32:35, 13.94s/it] {'loss': 0.0284, 'learning_rate': 2.9135e-05, 'epoch': 5.47} 42%|████▏ | 4178/10000 [16:25:20<22:32:35, 13.94s/it] 42%|████▏ | 4179/10000 [16:25:34<22:30:00, 13.92s/it] {'loss': 0.033, 'learning_rate': 2.913e-05, 'epoch': 5.47} 42%|████▏ | 4179/10000 [16:25:34<22:30:00, 13.92s/it] 42%|████▏ | 4180/10000 [16:25:48<22:29:12, 13.91s/it] {'loss': 0.029, 'learning_rate': 2.9125000000000003e-05, 'epoch': 5.47} 42%|████▏ | 4180/10000 [16:25:48<22:29:12, 13.91s/it] 42%|████▏ | 4181/10000 
[16:26:02<22:25:19, 13.87s/it] {'loss': 0.0326, 'learning_rate': 2.9120000000000002e-05, 'epoch': 5.47} 42%|████▏ | 4181/10000 [16:26:02<22:25:19, 13.87s/it] 42%|████▏ | 4182/10000 [16:26:16<22:26:23, 13.89s/it] {'loss': 0.034, 'learning_rate': 2.9115000000000005e-05, 'epoch': 5.47} 42%|████▏ | 4182/10000 [16:26:16<22:26:23, 13.89s/it] 42%|████▏ | 4183/10000 [16:26:29<22:26:10, 13.89s/it] {'loss': 0.036, 'learning_rate': 2.9110000000000004e-05, 'epoch': 5.48} 42%|████▏ | 4183/10000 [16:26:29<22:26:10, 13.89s/it] 42%|████▏ | 4184/10000 [16:26:43<22:21:49, 13.84s/it] {'loss': 0.0321, 'learning_rate': 2.9105e-05, 'epoch': 5.48} 42%|████▏ | 4184/10000 [16:26:43<22:21:49, 13.84s/it] 42%|████▏ | 4185/10000 [16:26:57<22:26:23, 13.89s/it] {'loss': 0.0353, 'learning_rate': 2.91e-05, 'epoch': 5.48} 42%|████▏ | 4185/10000 [16:26:57<22:26:23, 13.89s/it] 42%|████▏ | 4186/10000 [16:27:11<22:33:23, 13.97s/it] {'loss': 0.0292, 'learning_rate': 2.9095e-05, 'epoch': 5.48} 42%|████▏ | 4186/10000 [16:27:11<22:33:23, 13.97s/it] 42%|████▏ | 4187/10000 [16:27:25<22:31:31, 13.95s/it] {'loss': 0.0317, 'learning_rate': 2.909e-05, 'epoch': 5.48} 42%|████▏ | 4187/10000 [16:27:25<22:31:31, 13.95s/it] 42%|████▏ | 4188/10000 [16:27:39<22:28:31, 13.92s/it] {'loss': 0.0342, 'learning_rate': 2.9085e-05, 'epoch': 5.48} 42%|████▏ | 4188/10000 [16:27:39<22:28:31, 13.92s/it] 42%|████▏ | 4189/10000 [16:27:53<22:26:24, 13.90s/it] {'loss': 0.0277, 'learning_rate': 2.9080000000000003e-05, 'epoch': 5.48} 42%|████▏ | 4189/10000 [16:27:53<22:26:24, 13.90s/it] 42%|████▏ | 4190/10000 [16:28:07<22:22:05, 13.86s/it] {'loss': 0.0332, 'learning_rate': 2.9075000000000002e-05, 'epoch': 5.48} 42%|████▏ | 4190/10000 [16:28:07<22:22:05, 13.86s/it] 42%|████▏ | 4191/10000 [16:28:21<22:22:01, 13.86s/it] {'loss': 0.0387, 'learning_rate': 2.907e-05, 'epoch': 5.49} 42%|████▏ | 4191/10000 [16:28:21<22:22:01, 13.86s/it] 42%|████▏ | 4192/10000 [16:28:35<22:28:33, 13.93s/it] {'loss': 0.0285, 'learning_rate': 
2.9065000000000004e-05, 'epoch': 5.49} 42%|████▏ | 4192/10000 [16:28:35<22:28:33, 13.93s/it] 42%|████▏ | 4193/10000 [16:28:49<22:28:43, 13.94s/it] {'loss': 0.0348, 'learning_rate': 2.9060000000000003e-05, 'epoch': 5.49} 42%|████▏ | 4193/10000 [16:28:49<22:28:43, 13.94s/it] 42%|████▏ | 4194/10000 [16:29:03<22:32:33, 13.98s/it] {'loss': 0.0294, 'learning_rate': 2.9055e-05, 'epoch': 5.49} 42%|████▏ | 4194/10000 [16:29:03<22:32:33, 13.98s/it] 42%|████▏ | 4195/10000 [16:29:17<22:32:44, 13.98s/it] {'loss': 0.0413, 'learning_rate': 2.9049999999999998e-05, 'epoch': 5.49} 42%|████▏ | 4195/10000 [16:29:17<22:32:44, 13.98s/it] 42%|████▏ | 4196/10000 [16:29:31<22:31:01, 13.97s/it] {'loss': 0.0341, 'learning_rate': 2.9045e-05, 'epoch': 5.49} 42%|████▏ | 4196/10000 [16:29:31<22:31:01, 13.97s/it] 42%|████▏ | 4197/10000 [16:29:45<22:31:02, 13.97s/it] {'loss': 0.0399, 'learning_rate': 2.904e-05, 'epoch': 5.49} 42%|████▏ | 4197/10000 [16:29:45<22:31:02, 13.97s/it] 42%|████▏ | 4198/10000 [16:29:58<22:29:20, 13.95s/it] {'loss': 0.0373, 'learning_rate': 2.9035000000000002e-05, 'epoch': 5.49} 42%|████▏ | 4198/10000 [16:29:59<22:29:20, 13.95s/it] 42%|████▏ | 4199/10000 [16:30:12<22:27:09, 13.93s/it] {'loss': 0.0307, 'learning_rate': 2.903e-05, 'epoch': 5.5} 42%|████▏ | 4199/10000 [16:30:12<22:27:09, 13.93s/it] 42%|████▏ | 4200/10000 [16:30:26<22:22:01, 13.88s/it] {'loss': 0.038, 'learning_rate': 2.9025e-05, 'epoch': 5.5} 42%|████▏ | 4200/10000 [16:30:26<22:22:01, 13.88s/it] 42%|████▏ | 4201/10000 [16:30:40<22:24:45, 13.91s/it] {'loss': 0.0322, 'learning_rate': 2.9020000000000003e-05, 'epoch': 5.5} 42%|████▏ | 4201/10000 [16:30:40<22:24:45, 13.91s/it] 42%|████▏ | 4202/10000 [16:30:54<22:21:56, 13.89s/it] {'loss': 0.0334, 'learning_rate': 2.9015000000000003e-05, 'epoch': 5.5} 42%|████▏ | 4202/10000 [16:30:54<22:21:56, 13.89s/it] 42%|████▏ | 4203/10000 [16:31:08<22:22:40, 13.90s/it] {'loss': 0.0406, 'learning_rate': 2.9010000000000005e-05, 'epoch': 5.5} 42%|████▏ | 4203/10000 
[16:31:08<22:22:40, 13.90s/it] 42%|████▏ | 4204/10000 [16:31:22<22:26:55, 13.94s/it] {'loss': 0.0344, 'learning_rate': 2.9004999999999998e-05, 'epoch': 5.5} 42%|████▏ | 4204/10000 [16:31:22<22:26:55, 13.94s/it] 42%|████▏ | 4205/10000 [16:31:36<22:25:05, 13.93s/it] {'loss': 0.0359, 'learning_rate': 2.9e-05, 'epoch': 5.5} 42%|████▏ | 4205/10000 [16:31:36<22:25:05, 13.93s/it] 42%|████▏ | 4206/10000 [16:31:50<22:23:34, 13.91s/it] {'loss': 0.0307, 'learning_rate': 2.8995e-05, 'epoch': 5.51} 42%|████▏ | 4206/10000 [16:31:50<22:23:34, 13.91s/it] 42%|████▏ | 4207/10000 [16:32:04<22:23:30, 13.92s/it] {'loss': 0.0357, 'learning_rate': 2.8990000000000002e-05, 'epoch': 5.51} 42%|████▏ | 4207/10000 [16:32:04<22:23:30, 13.92s/it] 42%|████▏ | 4208/10000 [16:32:17<22:20:40, 13.89s/it] {'loss': 0.0345, 'learning_rate': 2.8985e-05, 'epoch': 5.51} 42%|████▏ | 4208/10000 [16:32:17<22:20:40, 13.89s/it] 42%|████▏ | 4209/10000 [16:32:31<22:18:48, 13.87s/it] {'loss': 0.0321, 'learning_rate': 2.898e-05, 'epoch': 5.51} 42%|████▏ | 4209/10000 [16:32:31<22:18:48, 13.87s/it] 42%|████▏ | 4210/10000 [16:32:45<22:17:40, 13.86s/it] {'loss': 0.033, 'learning_rate': 2.8975000000000003e-05, 'epoch': 5.51} 42%|████▏ | 4210/10000 [16:32:45<22:17:40, 13.86s/it] 42%|████▏ | 4211/10000 [16:32:59<22:19:45, 13.89s/it] {'loss': 0.0428, 'learning_rate': 2.8970000000000002e-05, 'epoch': 5.51} 42%|████▏ | 4211/10000 [16:32:59<22:19:45, 13.89s/it] 42%|████▏ | 4212/10000 [16:33:13<22:21:44, 13.91s/it] {'loss': 0.0353, 'learning_rate': 2.8965000000000005e-05, 'epoch': 5.51} 42%|████▏ | 4212/10000 [16:33:13<22:21:44, 13.91s/it] 42%|████▏ | 4213/10000 [16:33:27<22:23:14, 13.93s/it] {'loss': 0.0305, 'learning_rate': 2.8960000000000004e-05, 'epoch': 5.51} 42%|████▏ | 4213/10000 [16:33:27<22:23:14, 13.93s/it] 42%|████▏ | 4214/10000 [16:33:41<22:25:42, 13.95s/it] {'loss': 0.0307, 'learning_rate': 2.8955e-05, 'epoch': 5.52} 42%|████▏ | 4214/10000 [16:33:41<22:25:42, 13.95s/it] 42%|████▏ | 4215/10000 [16:33:55<22:31:36, 
14.02s/it] {'loss': 0.0343, 'learning_rate': 2.895e-05, 'epoch': 5.52} 42%|████▏ | 4215/10000 [16:33:55<22:31:36, 14.02s/it] 42%|████▏ | 4216/10000 [16:34:09<22:29:14, 14.00s/it] {'loss': 0.0344, 'learning_rate': 2.8945e-05, 'epoch': 5.52} 42%|████▏ | 4216/10000 [16:34:09<22:29:14, 14.00s/it] 42%|████▏ | 4217/10000 [16:34:23<22:27:00, 13.98s/it] {'loss': 0.0312, 'learning_rate': 2.894e-05, 'epoch': 5.52} 42%|████▏ | 4217/10000 [16:34:23<22:27:00, 13.98s/it] 42%|████▏ | 4218/10000 [16:34:37<22:21:16, 13.92s/it] {'loss': 0.0355, 'learning_rate': 2.8935e-05, 'epoch': 5.52} 42%|████▏ | 4218/10000 [16:34:37<22:21:16, 13.92s/it] 42%|████▏ | 4219/10000 [16:34:51<22:19:01, 13.90s/it] {'loss': 0.0362, 'learning_rate': 2.8930000000000003e-05, 'epoch': 5.52} 42%|████▏ | 4219/10000 [16:34:51<22:19:01, 13.90s/it] 42%|████▏ | 4220/10000 [16:35:05<22:17:40, 13.89s/it] {'loss': 0.0346, 'learning_rate': 2.8925000000000002e-05, 'epoch': 5.52} 42%|████▏ | 4220/10000 [16:35:05<22:17:40, 13.89s/it] 42%|████▏ | 4221/10000 [16:35:18<22:15:54, 13.87s/it] {'loss': 0.0334, 'learning_rate': 2.8920000000000004e-05, 'epoch': 5.52} 42%|████▏ | 4221/10000 [16:35:18<22:15:54, 13.87s/it] 42%|████▏ | 4222/10000 [16:35:32<22:16:36, 13.88s/it] {'loss': 0.0346, 'learning_rate': 2.8915000000000004e-05, 'epoch': 5.53} 42%|████▏ | 4222/10000 [16:35:32<22:16:36, 13.88s/it] 42%|████▏ | 4223/10000 [16:35:46<22:16:57, 13.89s/it] {'loss': 0.0375, 'learning_rate': 2.8910000000000003e-05, 'epoch': 5.53} 42%|████▏ | 4223/10000 [16:35:46<22:16:57, 13.89s/it] 42%|████▏ | 4224/10000 [16:36:00<22:16:11, 13.88s/it] {'loss': 0.0358, 'learning_rate': 2.8905e-05, 'epoch': 5.53} 42%|████▏ | 4224/10000 [16:36:00<22:16:11, 13.88s/it] 42%|████▏ | 4225/10000 [16:36:14<22:11:22, 13.83s/it] {'loss': 0.035, 'learning_rate': 2.8899999999999998e-05, 'epoch': 5.53} 42%|████▏ | 4225/10000 [16:36:14<22:11:22, 13.83s/it] 42%|████▏ | 4226/10000 [16:36:28<22:11:06, 13.83s/it] {'loss': 0.0407, 'learning_rate': 2.8895e-05, 'epoch': 5.53} 
42%|████▏ | 4226/10000 [16:36:28<22:11:06, 13.83s/it] 42%|████▏ | 4227/10000 [16:36:42<22:14:01, 13.86s/it] {'loss': 0.0304, 'learning_rate': 2.889e-05, 'epoch': 5.53} 42%|████▏ | 4227/10000 [16:36:42<22:14:01, 13.86s/it] 42%|████▏ | 4228/10000 [16:36:56<22:20:57, 13.94s/it] {'loss': 0.034, 'learning_rate': 2.8885000000000002e-05, 'epoch': 5.53} 42%|████▏ | 4228/10000 [16:36:56<22:20:57, 13.94s/it] 42%|████▏ | 4229/10000 [16:37:10<22:20:31, 13.94s/it] {'loss': 0.0342, 'learning_rate': 2.888e-05, 'epoch': 5.54} 42%|████▏ | 4229/10000 [16:37:10<22:20:31, 13.94s/it] 42%|████▏ | 4230/10000 [16:37:24<22:21:11, 13.95s/it] {'loss': 0.0364, 'learning_rate': 2.8875e-05, 'epoch': 5.54} 42%|████▏ | 4230/10000 [16:37:24<22:21:11, 13.95s/it] 42%|████▏ | 4231/10000 [16:37:37<22:19:14, 13.93s/it] {'loss': 0.0393, 'learning_rate': 2.8870000000000003e-05, 'epoch': 5.54} 42%|████▏ | 4231/10000 [16:37:37<22:19:14, 13.93s/it] 42%|████▏ | 4232/10000 [16:37:51<22:18:25, 13.92s/it] {'loss': 0.0336, 'learning_rate': 2.8865000000000002e-05, 'epoch': 5.54} 42%|████▏ | 4232/10000 [16:37:51<22:18:25, 13.92s/it] 42%|████▏ | 4233/10000 [16:38:05<22:22:47, 13.97s/it] {'loss': 0.0329, 'learning_rate': 2.8860000000000005e-05, 'epoch': 5.54} 42%|████▏ | 4233/10000 [16:38:05<22:22:47, 13.97s/it] 42%|████▏ | 4234/10000 [16:38:19<22:21:38, 13.96s/it] {'loss': 0.0316, 'learning_rate': 2.8854999999999997e-05, 'epoch': 5.54} 42%|████▏ | 4234/10000 [16:38:19<22:21:38, 13.96s/it] 42%|████▏ | 4235/10000 [16:38:33<22:22:03, 13.97s/it] {'loss': 0.0326, 'learning_rate': 2.885e-05, 'epoch': 5.54} 42%|████▏ | 4235/10000 [16:38:33<22:22:03, 13.97s/it] 42%|████▏ | 4236/10000 [16:38:47<22:18:43, 13.94s/it] {'loss': 0.0387, 'learning_rate': 2.8845e-05, 'epoch': 5.54} 42%|████▏ | 4236/10000 [16:38:47<22:18:43, 13.94s/it] 42%|████▏ | 4237/10000 [16:39:01<22:17:14, 13.92s/it] {'loss': 0.0353, 'learning_rate': 2.8840000000000002e-05, 'epoch': 5.55} 42%|████▏ | 4237/10000 [16:39:01<22:17:14, 13.92s/it] 42%|████▏ | 
4238/10000 [16:39:15<22:15:33, 13.91s/it] {'loss': 0.036, 'learning_rate': 2.8835e-05, 'epoch': 5.55} 42%|████▏ | 4238/10000 [16:39:15<22:15:33, 13.91s/it] 42%|████▏ | 4239/10000 [16:39:29<22:14:39, 13.90s/it] {'loss': 0.039, 'learning_rate': 2.883e-05, 'epoch': 5.55} 42%|████▏ | 4239/10000 [16:39:29<22:14:39, 13.90s/it] 42%|████▏ | 4240/10000 [16:39:43<22:16:01, 13.92s/it] {'loss': 0.0299, 'learning_rate': 2.8825000000000003e-05, 'epoch': 5.55} 42%|████▏ | 4240/10000 [16:39:43<22:16:01, 13.92s/it] 42%|████▏ | 4241/10000 [16:39:57<22:15:41, 13.92s/it] {'loss': 0.0413, 'learning_rate': 2.8820000000000002e-05, 'epoch': 5.55} 42%|████▏ | 4241/10000 [16:39:57<22:15:41, 13.92s/it] 42%|████▏ | 4242/10000 [16:40:11<22:14:29, 13.91s/it] {'loss': 0.0401, 'learning_rate': 2.8815000000000004e-05, 'epoch': 5.55} 42%|████▏ | 4242/10000 [16:40:11<22:14:29, 13.91s/it] 42%|████▏ | 4243/10000 [16:40:25<22:15:57, 13.92s/it] {'loss': 0.0356, 'learning_rate': 2.8810000000000004e-05, 'epoch': 5.55} 42%|████▏ | 4243/10000 [16:40:25<22:15:57, 13.92s/it] 42%|████▏ | 4244/10000 [16:40:38<22:12:44, 13.89s/it] {'loss': 0.0277, 'learning_rate': 2.8805e-05, 'epoch': 5.55} 42%|████▏ | 4244/10000 [16:40:38<22:12:44, 13.89s/it] 42%|████▏ | 4245/10000 [16:40:52<22:13:49, 13.91s/it] {'loss': 0.0328, 'learning_rate': 2.88e-05, 'epoch': 5.56} 42%|████▏ | 4245/10000 [16:40:52<22:13:49, 13.91s/it] 42%|████▏ | 4246/10000 [16:41:06<22:11:58, 13.89s/it] {'loss': 0.0359, 'learning_rate': 2.8795e-05, 'epoch': 5.56} 42%|████▏ | 4246/10000 [16:41:06<22:11:58, 13.89s/it] 42%|████▏ | 4247/10000 [16:41:20<22:13:33, 13.91s/it] {'loss': 0.0288, 'learning_rate': 2.879e-05, 'epoch': 5.56} 42%|████▏ | 4247/10000 [16:41:20<22:13:33, 13.91s/it] 42%|████▏ | 4248/10000 [16:41:34<22:13:39, 13.91s/it] {'loss': 0.0394, 'learning_rate': 2.8785e-05, 'epoch': 5.56} 42%|████▏ | 4248/10000 [16:41:34<22:13:39, 13.91s/it] 42%|████▏ | 4249/10000 [16:41:48<22:12:47, 13.91s/it] {'loss': 0.0385, 'learning_rate': 
2.8780000000000002e-05, 'epoch': 5.56} 42%|████▏ | 4249/10000 [16:41:48<22:12:47, 13.91s/it] 42%|████▎ | 4250/10000 [16:42:02<22:10:11, 13.88s/it] {'loss': 0.0327, 'learning_rate': 2.8775e-05, 'epoch': 5.56} 42%|████▎ | 4250/10000 [16:42:02<22:10:11, 13.88s/it] 43%|████▎ | 4251/10000 [16:42:16<22:10:31, 13.89s/it] {'loss': 0.0317, 'learning_rate': 2.8770000000000004e-05, 'epoch': 5.56} 43%|████▎ | 4251/10000 [16:42:16<22:10:31, 13.89s/it] 43%|████▎ | 4252/10000 [16:42:29<22:05:10, 13.83s/it] {'loss': 0.0352, 'learning_rate': 2.8765000000000003e-05, 'epoch': 5.57} 43%|████▎ | 4252/10000 [16:42:29<22:05:10, 13.83s/it] 43%|████▎ | 4253/10000 [16:42:43<22:05:36, 13.84s/it] {'loss': 0.0358, 'learning_rate': 2.8760000000000002e-05, 'epoch': 5.57} 43%|████▎ | 4253/10000 [16:42:43<22:05:36, 13.84s/it] 43%|████▎ | 4254/10000 [16:42:57<22:12:03, 13.91s/it] {'loss': 0.031, 'learning_rate': 2.8754999999999998e-05, 'epoch': 5.57} 43%|████▎ | 4254/10000 [16:42:57<22:12:03, 13.91s/it] 43%|████▎ | 4255/10000 [16:43:11<22:13:23, 13.93s/it] {'loss': 0.037, 'learning_rate': 2.8749999999999997e-05, 'epoch': 5.57} 43%|████▎ | 4255/10000 [16:43:11<22:13:23, 13.93s/it] 43%|████▎ | 4256/10000 [16:43:25<22:13:13, 13.93s/it] {'loss': 0.0286, 'learning_rate': 2.8745e-05, 'epoch': 5.57} 43%|████▎ | 4256/10000 [16:43:25<22:13:13, 13.93s/it] 43%|████▎ | 4257/10000 [16:43:39<22:13:22, 13.93s/it] {'loss': 0.0281, 'learning_rate': 2.874e-05, 'epoch': 5.57} 43%|████▎ | 4257/10000 [16:43:39<22:13:22, 13.93s/it] 43%|████▎ | 4258/10000 [16:43:53<22:10:54, 13.91s/it] {'loss': 0.0336, 'learning_rate': 2.8735000000000002e-05, 'epoch': 5.57} 43%|████▎ | 4258/10000 [16:43:53<22:10:54, 13.91s/it] 43%|████▎ | 4259/10000 [16:44:07<22:14:49, 13.95s/it] {'loss': 0.043, 'learning_rate': 2.873e-05, 'epoch': 5.57} 43%|████▎ | 4259/10000 [16:44:07<22:14:49, 13.95s/it] 43%|████▎ | 4260/10000 [16:44:21<22:17:48, 13.98s/it] {'loss': 0.0465, 'learning_rate': 2.8725e-05, 'epoch': 5.58} 43%|████▎ | 4260/10000 
[16:44:21<22:17:48, 13.98s/it] 43%|████▎ | 4261/10000 [16:44:35<22:12:09, 13.93s/it] {'loss': 0.0329, 'learning_rate': 2.8720000000000003e-05, 'epoch': 5.58} 43%|████▎ | 4261/10000 [16:44:35<22:12:09, 13.93s/it] 43%|████▎ | 4262/10000 [16:44:49<22:12:07, 13.93s/it] {'loss': 0.0303, 'learning_rate': 2.8715000000000002e-05, 'epoch': 5.58} 43%|████▎ | 4262/10000 [16:44:49<22:12:07, 13.93s/it] 43%|████▎ | 4263/10000 [16:45:03<22:16:12, 13.97s/it] {'loss': 0.0317, 'learning_rate': 2.8710000000000005e-05, 'epoch': 5.58} 43%|████▎ | 4263/10000 [16:45:03<22:16:12, 13.97s/it] 43%|████▎ | 4264/10000 [16:45:17<22:14:16, 13.96s/it] {'loss': 0.0367, 'learning_rate': 2.8705000000000004e-05, 'epoch': 5.58} 43%|████▎ | 4264/10000 [16:45:17<22:14:16, 13.96s/it] 43%|████▎ | 4265/10000 [16:45:31<22:11:03, 13.93s/it] {'loss': 0.0319, 'learning_rate': 2.87e-05, 'epoch': 5.58} 43%|████▎ | 4265/10000 [16:45:31<22:11:03, 13.93s/it] 43%|████▎ | 4266/10000 [16:45:45<22:12:18, 13.94s/it] {'loss': 0.0357, 'learning_rate': 2.8695e-05, 'epoch': 5.58} 43%|████▎ | 4266/10000 [16:45:45<22:12:18, 13.94s/it] 43%|████▎ | 4267/10000 [16:45:59<22:14:44, 13.97s/it] {'loss': 0.032, 'learning_rate': 2.869e-05, 'epoch': 5.59} 43%|████▎ | 4267/10000 [16:45:59<22:14:44, 13.97s/it] 43%|████▎ | 4268/10000 [16:46:13<22:10:56, 13.93s/it] {'loss': 0.0362, 'learning_rate': 2.8685e-05, 'epoch': 5.59} 43%|████▎ | 4268/10000 [16:46:13<22:10:56, 13.93s/it] 43%|████▎ | 4269/10000 [16:46:27<22:13:29, 13.96s/it] {'loss': 0.0404, 'learning_rate': 2.868e-05, 'epoch': 5.59} 43%|████▎ | 4269/10000 [16:46:27<22:13:29, 13.96s/it] 43%|████▎ | 4270/10000 [16:46:40<22:12:24, 13.95s/it] {'loss': 0.0378, 'learning_rate': 2.8675000000000002e-05, 'epoch': 5.59} 43%|████▎ | 4270/10000 [16:46:41<22:12:24, 13.95s/it] 43%|████▎ | 4271/10000 [16:46:54<22:08:01, 13.91s/it] {'loss': 0.0372, 'learning_rate': 2.867e-05, 'epoch': 5.59} 43%|████▎ | 4271/10000 [16:46:54<22:08:01, 13.91s/it] 43%|████▎ | 4272/10000 [16:47:08<22:07:16, 13.90s/it] 
{'loss': 0.0409, 'learning_rate': 2.8665000000000004e-05, 'epoch': 5.59} 43%|████▎ | 4272/10000 [16:47:08<22:07:16, 13.90s/it] 43%|████▎ | 4273/10000 [16:47:22<22:09:11, 13.93s/it] {'loss': 0.0411, 'learning_rate': 2.8660000000000003e-05, 'epoch': 5.59} 43%|████▎ | 4273/10000 [16:47:22<22:09:11, 13.93s/it] 43%|████▎ | 4274/10000 [16:47:36<22:10:38, 13.94s/it] {'loss': 0.0369, 'learning_rate': 2.8655000000000003e-05, 'epoch': 5.59} 43%|████▎ | 4274/10000 [16:47:36<22:10:38, 13.94s/it] 43%|████▎ | 4275/10000 [16:47:50<22:10:42, 13.95s/it] {'loss': 0.0318, 'learning_rate': 2.865e-05, 'epoch': 5.6} 43%|████▎ | 4275/10000 [16:47:50<22:10:42, 13.95s/it] 43%|████▎ | 4276/10000 [16:48:04<22:09:00, 13.93s/it] {'loss': 0.0324, 'learning_rate': 2.8645e-05, 'epoch': 5.6} 43%|████▎ | 4276/10000 [16:48:04<22:09:00, 13.93s/it] 43%|████▎ | 4277/10000 [16:48:18<22:06:23, 13.91s/it] {'loss': 0.0362, 'learning_rate': 2.864e-05, 'epoch': 5.6} 43%|████▎ | 4277/10000 [16:48:18<22:06:23, 13.91s/it] 43%|████▎ | 4278/10000 [16:48:32<22:07:07, 13.92s/it] {'loss': 0.0354, 'learning_rate': 2.8635e-05, 'epoch': 5.6} 43%|████▎ | 4278/10000 [16:48:32<22:07:07, 13.92s/it] 43%|████▎ | 4279/10000 [16:48:46<22:10:33, 13.95s/it] {'loss': 0.0293, 'learning_rate': 2.8630000000000002e-05, 'epoch': 5.6} 43%|████▎ | 4279/10000 [16:48:46<22:10:33, 13.95s/it] 43%|████▎ | 4280/10000 [16:49:00<22:08:18, 13.93s/it] {'loss': 0.0363, 'learning_rate': 2.8625e-05, 'epoch': 5.6} 43%|████▎ | 4280/10000 [16:49:00<22:08:18, 13.93s/it] 43%|████▎ | 4281/10000 [16:49:14<22:09:52, 13.95s/it] {'loss': 0.0366, 'learning_rate': 2.8620000000000004e-05, 'epoch': 5.6} 43%|████▎ | 4281/10000 [16:49:14<22:09:52, 13.95s/it] 43%|████▎ | 4282/10000 [16:49:28<22:11:14, 13.97s/it] {'loss': 0.036, 'learning_rate': 2.8615000000000003e-05, 'epoch': 5.6} 43%|████▎ | 4282/10000 [16:49:28<22:11:14, 13.97s/it] 43%|████▎ | 4283/10000 [16:49:42<22:09:09, 13.95s/it] {'loss': 0.0347, 'learning_rate': 2.8610000000000002e-05, 'epoch': 5.61} 
43%|████▎ | 4283/10000 [16:49:42<22:09:09, 13.95s/it] 43%|████▎ | 4284/10000 [16:49:55<22:07:29, 13.93s/it] {'loss': 0.0381, 'learning_rate': 2.8605000000000005e-05, 'epoch': 5.61} 43%|████▎ | 4284/10000 [16:49:56<22:07:29, 13.93s/it] 43%|████▎ | 4285/10000 [16:50:09<22:06:14, 13.92s/it] {'loss': 0.0327, 'learning_rate': 2.86e-05, 'epoch': 5.61} 43%|████▎ | 4285/10000 [16:50:09<22:06:14, 13.92s/it] 43%|████▎ | 4286/10000 [16:50:23<22:06:24, 13.93s/it] {'loss': 0.037, 'learning_rate': 2.8595e-05, 'epoch': 5.61} 43%|████▎ | 4286/10000 [16:50:23<22:06:24, 13.93s/it] 43%|████▎ | 4287/10000 [16:50:37<22:05:18, 13.92s/it] {'loss': 0.0253, 'learning_rate': 2.859e-05, 'epoch': 5.61} 43%|████▎ | 4287/10000 [16:50:37<22:05:18, 13.92s/it] 43%|████▎ | 4288/10000 [16:50:51<22:05:10, 13.92s/it] {'loss': 0.0393, 'learning_rate': 2.8585e-05, 'epoch': 5.61} 43%|████▎ | 4288/10000 [16:50:51<22:05:10, 13.92s/it] 43%|████▎ | 4289/10000 [16:51:05<22:05:50, 13.93s/it] {'loss': 0.0318, 'learning_rate': 2.858e-05, 'epoch': 5.61} 43%|████▎ | 4289/10000 [16:51:05<22:05:50, 13.93s/it] 43%|████▎ | 4290/10000 [16:51:19<22:04:05, 13.91s/it] {'loss': 0.0411, 'learning_rate': 2.8575000000000003e-05, 'epoch': 5.62} 43%|████▎ | 4290/10000 [16:51:19<22:04:05, 13.91s/it] 43%|████▎ | 4291/10000 [16:51:33<22:03:08, 13.91s/it] {'loss': 0.0357, 'learning_rate': 2.8570000000000003e-05, 'epoch': 5.62} 43%|████▎ | 4291/10000 [16:51:33<22:03:08, 13.91s/it] 43%|████▎ | 4292/10000 [16:51:47<22:07:11, 13.95s/it] {'loss': 0.029, 'learning_rate': 2.8565000000000002e-05, 'epoch': 5.62} 43%|████▎ | 4292/10000 [16:51:47<22:07:11, 13.95s/it] 43%|████▎ | 4293/10000 [16:52:01<22:06:10, 13.94s/it] {'loss': 0.0413, 'learning_rate': 2.8560000000000004e-05, 'epoch': 5.62} 43%|████▎ | 4293/10000 [16:52:01<22:06:10, 13.94s/it] 43%|████▎ | 4294/10000 [16:52:15<22:06:12, 13.95s/it] {'loss': 0.034, 'learning_rate': 2.8555000000000004e-05, 'epoch': 5.62} 43%|████▎ | 4294/10000 [16:52:15<22:06:12, 13.95s/it] 43%|████▎ | 
4295/10000 [16:52:29<22:07:54, 13.97s/it] {'loss': 0.0344, 'learning_rate': 2.855e-05, 'epoch': 5.62} 43%|████▎ | 4295/10000 [16:52:29<22:07:54, 13.97s/it] 43%|████▎ | 4296/10000 [16:52:43<22:07:57, 13.97s/it] {'loss': 0.0287, 'learning_rate': 2.8545e-05, 'epoch': 5.62} 43%|████▎ | 4296/10000 [16:52:43<22:07:57, 13.97s/it] 43%|████▎ | 4297/10000 [16:52:57<22:05:04, 13.94s/it] {'loss': 0.034, 'learning_rate': 2.854e-05, 'epoch': 5.62} 43%|████▎ | 4297/10000 [16:52:57<22:05:04, 13.94s/it] 43%|████▎ | 4298/10000 [16:53:11<22:02:05, 13.91s/it] {'loss': 0.0391, 'learning_rate': 2.8535e-05, 'epoch': 5.63} 43%|████▎ | 4298/10000 [16:53:11<22:02:05, 13.91s/it] 43%|████▎ | 4299/10000 [16:53:24<22:02:12, 13.92s/it] {'loss': 0.0314, 'learning_rate': 2.853e-05, 'epoch': 5.63} 43%|████▎ | 4299/10000 [16:53:24<22:02:12, 13.92s/it] 43%|████▎ | 4300/10000 [16:53:38<22:04:02, 13.94s/it] {'loss': 0.0391, 'learning_rate': 2.8525000000000002e-05, 'epoch': 5.63} 43%|████▎ | 4300/10000 [16:53:38<22:04:02, 13.94s/it] 43%|████▎ | 4301/10000 [16:53:52<22:01:06, 13.91s/it] {'loss': 0.0301, 'learning_rate': 2.852e-05, 'epoch': 5.63} 43%|████▎ | 4301/10000 [16:53:52<22:01:06, 13.91s/it] 43%|████▎ | 4302/10000 [16:54:06<21:59:58, 13.90s/it] {'loss': 0.0369, 'learning_rate': 2.8515000000000004e-05, 'epoch': 5.63} 43%|████▎ | 4302/10000 [16:54:06<21:59:58, 13.90s/it] 43%|████▎ | 4303/10000 [16:54:20<22:01:08, 13.91s/it] {'loss': 0.03, 'learning_rate': 2.8510000000000003e-05, 'epoch': 5.63} 43%|████▎ | 4303/10000 [16:54:20<22:01:08, 13.91s/it] 43%|████▎ | 4304/10000 [16:54:34<22:00:51, 13.91s/it] {'loss': 0.0359, 'learning_rate': 2.8505000000000002e-05, 'epoch': 5.63} 43%|████▎ | 4304/10000 [16:54:34<22:00:51, 13.91s/it] 43%|████▎ | 4305/10000 [16:54:48<21:58:44, 13.89s/it] {'loss': 0.0374, 'learning_rate': 2.8499999999999998e-05, 'epoch': 5.63} 43%|████▎ | 4305/10000 [16:54:48<21:58:44, 13.89s/it] 43%|████▎ | 4306/10000 [16:55:02<21:59:15, 13.90s/it] {'loss': 0.0454, 'learning_rate': 2.8495e-05, 
'epoch': 5.64} 43%|████▎ | 4306/10000 [16:55:02<21:59:15, 13.90s/it] 43%|████▎ | 4307/10000 [16:55:16<21:59:33, 13.91s/it] {'loss': 0.03, 'learning_rate': 2.849e-05, 'epoch': 5.64} 43%|████▎ | 4307/10000 [16:55:16<21:59:33, 13.91s/it] 43%|████▎ | 4308/10000 [16:55:30<21:59:28, 13.91s/it] {'loss': 0.0321, 'learning_rate': 2.8485e-05, 'epoch': 5.64} 43%|████▎ | 4308/10000 [16:55:30<21:59:28, 13.91s/it] 43%|████▎ | 4309/10000 [16:55:43<21:54:52, 13.86s/it] {'loss': 0.0303, 'learning_rate': 2.8480000000000002e-05, 'epoch': 5.64} 43%|████▎ | 4309/10000 [16:55:43<21:54:52, 13.86s/it] 43%|████▎ | 4310/10000 [16:55:57<21:56:25, 13.88s/it] {'loss': 0.0333, 'learning_rate': 2.8475e-05, 'epoch': 5.64} 43%|████▎ | 4310/10000 [16:55:57<21:56:25, 13.88s/it] 43%|████▎ | 4311/10000 [16:56:11<21:55:53, 13.88s/it] {'loss': 0.031, 'learning_rate': 2.8470000000000004e-05, 'epoch': 5.64} 43%|████▎ | 4311/10000 [16:56:11<21:55:53, 13.88s/it] 43%|████▎ | 4312/10000 [16:56:25<21:59:51, 13.92s/it] {'loss': 0.037, 'learning_rate': 2.8465000000000003e-05, 'epoch': 5.64} 43%|████▎ | 4312/10000 [16:56:25<21:59:51, 13.92s/it] 43%|████▎ | 4313/10000 [16:56:39<22:02:02, 13.95s/it] {'loss': 0.0321, 'learning_rate': 2.8460000000000002e-05, 'epoch': 5.65} 43%|████▎ | 4313/10000 [16:56:39<22:02:02, 13.95s/it] 43%|████▎ | 4314/10000 [16:56:53<22:01:32, 13.95s/it] {'loss': 0.0368, 'learning_rate': 2.8455000000000005e-05, 'epoch': 5.65} 43%|████▎ | 4314/10000 [16:56:53<22:01:32, 13.95s/it] 43%|████▎ | 4315/10000 [16:57:07<22:02:58, 13.96s/it] {'loss': 0.0347, 'learning_rate': 2.845e-05, 'epoch': 5.65} 43%|████▎ | 4315/10000 [16:57:07<22:02:58, 13.96s/it] 43%|████▎ | 4316/10000 [16:57:21<21:59:35, 13.93s/it] {'loss': 0.0386, 'learning_rate': 2.8445e-05, 'epoch': 5.65} 43%|████▎ | 4316/10000 [16:57:21<21:59:35, 13.93s/it] 43%|████▎ | 4317/10000 [16:57:35<22:00:44, 13.94s/it] {'loss': 0.0474, 'learning_rate': 2.844e-05, 'epoch': 5.65} 43%|████▎ | 4317/10000 [16:57:35<22:00:44, 13.94s/it] 43%|████▎ | 
4318/10000 [16:57:49<21:57:50, 13.92s/it] {'loss': 0.035, 'learning_rate': 2.8435e-05, 'epoch': 5.65} 43%|████▎ | 4318/10000 [16:57:49<21:57:50, 13.92s/it] 43%|████▎ | 4319/10000 [16:58:03<21:57:43, 13.92s/it] {'loss': 0.0376, 'learning_rate': 2.843e-05, 'epoch': 5.65} 43%|████▎ | 4319/10000 [16:58:03<21:57:43, 13.92s/it] 43%|████▎ | 4320/10000 [16:58:17<21:57:47, 13.92s/it] {'loss': 0.0295, 'learning_rate': 2.8425000000000003e-05, 'epoch': 5.65} 43%|████▎ | 4320/10000 [16:58:17<21:57:47, 13.92s/it] 43%|████▎ | 4321/10000 [16:58:31<22:01:35, 13.96s/it] {'loss': 0.0356, 'learning_rate': 2.8420000000000002e-05, 'epoch': 5.66} 43%|████▎ | 4321/10000 [16:58:31<22:01:35, 13.96s/it] 43%|████▎ | 4322/10000 [16:58:45<21:56:40, 13.91s/it] {'loss': 0.0456, 'learning_rate': 2.8415e-05, 'epoch': 5.66} 43%|████▎ | 4322/10000 [16:58:45<21:56:40, 13.91s/it] 43%|████▎ | 4323/10000 [16:58:58<21:55:43, 13.91s/it] {'loss': 0.0373, 'learning_rate': 2.8410000000000004e-05, 'epoch': 5.66} 43%|████▎ | 4323/10000 [16:58:58<21:55:43, 13.91s/it] 43%|████▎ | 4324/10000 [16:59:12<21:56:20, 13.91s/it] {'loss': 0.0303, 'learning_rate': 2.8405000000000003e-05, 'epoch': 5.66} 43%|████▎ | 4324/10000 [16:59:12<21:56:20, 13.91s/it] 43%|████▎ | 4325/10000 [16:59:26<21:57:30, 13.93s/it] {'loss': 0.0296, 'learning_rate': 2.84e-05, 'epoch': 5.66} 43%|████▎ | 4325/10000 [16:59:26<21:57:30, 13.93s/it] 43%|████▎ | 4326/10000 [16:59:40<21:54:24, 13.90s/it] {'loss': 0.0305, 'learning_rate': 2.8395e-05, 'epoch': 5.66} 43%|████▎ | 4326/10000 [16:59:40<21:54:24, 13.90s/it] 43%|████▎ | 4327/10000 [16:59:54<21:52:06, 13.88s/it] {'loss': 0.0294, 'learning_rate': 2.839e-05, 'epoch': 5.66} 43%|████▎ | 4327/10000 [16:59:54<21:52:06, 13.88s/it] 43%|████▎ | 4328/10000 [17:00:08<21:53:23, 13.89s/it] {'loss': 0.0328, 'learning_rate': 2.8385e-05, 'epoch': 5.66} 43%|████▎ | 4328/10000 [17:00:08<21:53:23, 13.89s/it] 43%|████▎ | 4329/10000 [17:00:22<21:49:26, 13.85s/it] {'loss': 0.0418, 'learning_rate': 
2.8380000000000003e-05, 'epoch': 5.67} 43%|████▎ | 4329/10000 [17:00:22<21:49:26, 13.85s/it] 43%|████▎ | 4330/10000 [17:00:35<21:48:35, 13.85s/it] {'loss': 0.035, 'learning_rate': 2.8375000000000002e-05, 'epoch': 5.67} 43%|████▎ | 4330/10000 [17:00:36<21:48:35, 13.85s/it] 43%|████▎ | 4331/10000 [17:00:49<21:51:49, 13.88s/it] {'loss': 0.029, 'learning_rate': 2.837e-05, 'epoch': 5.67} 43%|████▎ | 4331/10000 [17:00:49<21:51:49, 13.88s/it] 43%|████▎ | 4332/10000 [17:01:03<21:49:57, 13.87s/it] {'loss': 0.0401, 'learning_rate': 2.8365000000000004e-05, 'epoch': 5.67} 43%|████▎ | 4332/10000 [17:01:03<21:49:57, 13.87s/it] 43%|████▎ | 4333/10000 [17:01:17<21:49:44, 13.87s/it] {'loss': 0.0372, 'learning_rate': 2.8360000000000003e-05, 'epoch': 5.67} 43%|████▎ | 4333/10000 [17:01:17<21:49:44, 13.87s/it] 43%|████▎ | 4334/10000 [17:01:31<21:49:43, 13.87s/it] {'loss': 0.042, 'learning_rate': 2.8355000000000002e-05, 'epoch': 5.67} 43%|████▎ | 4334/10000 [17:01:31<21:49:43, 13.87s/it] 43%|████▎ | 4335/10000 [17:01:45<21:49:28, 13.87s/it] {'loss': 0.0319, 'learning_rate': 2.8349999999999998e-05, 'epoch': 5.67} 43%|████▎ | 4335/10000 [17:01:45<21:49:28, 13.87s/it] 43%|████▎ | 4336/10000 [17:01:59<21:51:02, 13.89s/it] {'loss': 0.0316, 'learning_rate': 2.8345e-05, 'epoch': 5.68} 43%|████▎ | 4336/10000 [17:01:59<21:51:02, 13.89s/it] 43%|████▎ | 4337/10000 [17:02:13<21:52:03, 13.90s/it] {'loss': 0.0373, 'learning_rate': 2.834e-05, 'epoch': 5.68} 43%|████▎ | 4337/10000 [17:02:13<21:52:03, 13.90s/it] 43%|████▎ | 4338/10000 [17:02:27<21:53:01, 13.91s/it] {'loss': 0.0374, 'learning_rate': 2.8335e-05, 'epoch': 5.68} 43%|████▎ | 4338/10000 [17:02:27<21:53:01, 13.91s/it] 43%|████▎ | 4339/10000 [17:02:41<21:50:55, 13.89s/it] {'loss': 0.0363, 'learning_rate': 2.833e-05, 'epoch': 5.68} 43%|████▎ | 4339/10000 [17:02:41<21:50:55, 13.89s/it] 43%|████▎ | 4340/10000 [17:02:54<21:49:23, 13.88s/it] {'loss': 0.0365, 'learning_rate': 2.8325e-05, 'epoch': 5.68} 43%|████▎ | 4340/10000 [17:02:54<21:49:23, 
13.88s/it] 43%|████▎ | 4341/10000 [17:03:08<21:49:29, 13.88s/it] {'loss': 0.0406, 'learning_rate': 2.8320000000000003e-05, 'epoch': 5.68} 43%|████▎ | 4341/10000 [17:03:08<21:49:29, 13.88s/it] 43%|████▎ | 4342/10000 [17:03:22<21:51:57, 13.91s/it] {'loss': 0.0433, 'learning_rate': 2.8315000000000002e-05, 'epoch': 5.68} 43%|████▎ | 4342/10000 [17:03:22<21:51:57, 13.91s/it] 43%|████▎ | 4343/10000 [17:03:36<21:50:41, 13.90s/it] {'loss': 0.038, 'learning_rate': 2.8310000000000002e-05, 'epoch': 5.68} 43%|████▎ | 4343/10000 [17:03:36<21:50:41, 13.90s/it] 43%|████▎ | 4344/10000 [17:03:50<21:55:10, 13.95s/it] {'loss': 0.0368, 'learning_rate': 2.8305000000000004e-05, 'epoch': 5.69} 43%|████▎ | 4344/10000 [17:03:50<21:55:10, 13.95s/it] 43%|████▎ | 4345/10000 [17:04:04<21:54:36, 13.95s/it] {'loss': 0.0322, 'learning_rate': 2.83e-05, 'epoch': 5.69} 43%|████▎ | 4345/10000 [17:04:04<21:54:36, 13.95s/it] 43%|████▎ | 4346/10000 [17:04:18<21:49:51, 13.90s/it] {'loss': 0.0394, 'learning_rate': 2.8295e-05, 'epoch': 5.69} 43%|████▎ | 4346/10000 [17:04:18<21:49:51, 13.90s/it] 43%|████▎ | 4347/10000 [17:04:32<21:50:17, 13.91s/it] {'loss': 0.0364, 'learning_rate': 2.829e-05, 'epoch': 5.69} 43%|████▎ | 4347/10000 [17:04:32<21:50:17, 13.91s/it] 43%|████▎ | 4348/10000 [17:04:46<21:52:48, 13.94s/it] {'loss': 0.0385, 'learning_rate': 2.8285e-05, 'epoch': 5.69} 43%|████▎ | 4348/10000 [17:04:46<21:52:48, 13.94s/it] 43%|████▎ | 4349/10000 [17:05:00<21:49:02, 13.90s/it] {'loss': 0.0403, 'learning_rate': 2.828e-05, 'epoch': 5.69} 43%|████▎ | 4349/10000 [17:05:00<21:49:02, 13.90s/it] 44%|████▎ | 4350/10000 [17:05:14<21:52:02, 13.93s/it] {'loss': 0.0353, 'learning_rate': 2.8275000000000003e-05, 'epoch': 5.69} 44%|████▎ | 4350/10000 [17:05:14<21:52:02, 13.93s/it] 44%|████▎ | 4351/10000 [17:05:28<21:50:02, 13.91s/it] {'loss': 0.0378, 'learning_rate': 2.8270000000000002e-05, 'epoch': 5.7} 44%|████▎ | 4351/10000 [17:05:28<21:50:02, 13.91s/it] 44%|████▎ | 4352/10000 [17:05:42<21:50:44, 13.92s/it] {'loss': 
0.0355, 'learning_rate': 2.8265e-05, 'epoch': 5.7} 44%|████▎ | 4352/10000 [17:05:42<21:50:44, 13.92s/it] 44%|████▎ | 4353/10000 [17:05:56<21:52:47, 13.95s/it] {'loss': 0.0275, 'learning_rate': 2.8260000000000004e-05, 'epoch': 5.7} 44%|████▎ | 4353/10000 [17:05:56<21:52:47, 13.95s/it] 44%|████▎ | 4354/10000 [17:06:09<21:51:21, 13.94s/it] {'loss': 0.0382, 'learning_rate': 2.8255000000000003e-05, 'epoch': 5.7} 44%|████▎ | 4354/10000 [17:06:09<21:51:21, 13.94s/it] 44%|████▎ | 4355/10000 [17:06:23<21:51:44, 13.94s/it] {'loss': 0.0342, 'learning_rate': 2.825e-05, 'epoch': 5.7} 44%|████▎ | 4355/10000 [17:06:23<21:51:44, 13.94s/it] 44%|████▎ | 4356/10000 [17:06:37<21:49:29, 13.92s/it] {'loss': 0.0414, 'learning_rate': 2.8244999999999998e-05, 'epoch': 5.7} 44%|████▎ | 4356/10000 [17:06:37<21:49:29, 13.92s/it] 44%|████▎ | 4357/10000 [17:06:51<21:48:01, 13.91s/it] {'loss': 0.0412, 'learning_rate': 2.824e-05, 'epoch': 5.7} 44%|████▎ | 4357/10000 [17:06:51<21:48:01, 13.91s/it] 44%|████▎ | 4358/10000 [17:07:05<21:45:59, 13.89s/it] {'loss': 0.0374, 'learning_rate': 2.8235e-05, 'epoch': 5.7} 44%|████▎ | 4358/10000 [17:07:05<21:45:59, 13.89s/it] 44%|████▎ | 4359/10000 [17:07:19<21:49:55, 13.93s/it] {'loss': 0.0424, 'learning_rate': 2.8230000000000002e-05, 'epoch': 5.71} 44%|████▎ | 4359/10000 [17:07:19<21:49:55, 13.93s/it] 44%|████▎ | 4360/10000 [17:07:33<21:49:39, 13.93s/it] {'loss': 0.0361, 'learning_rate': 2.8225e-05, 'epoch': 5.71} 44%|████▎ | 4360/10000 [17:07:33<21:49:39, 13.93s/it] 44%|████▎ | 4361/10000 [17:07:47<21:48:04, 13.92s/it] {'loss': 0.0306, 'learning_rate': 2.822e-05, 'epoch': 5.71} 44%|████▎ | 4361/10000 [17:07:47<21:48:04, 13.92s/it] 44%|████▎ | 4362/10000 [17:08:01<21:49:00, 13.93s/it] {'loss': 0.0332, 'learning_rate': 2.8215000000000003e-05, 'epoch': 5.71} 44%|████▎ | 4362/10000 [17:08:01<21:49:00, 13.93s/it] 44%|████▎ | 4363/10000 [17:08:15<21:48:21, 13.93s/it] {'loss': 0.0421, 'learning_rate': 2.8210000000000003e-05, 'epoch': 5.71} 44%|████▎ | 4363/10000 
[17:08:15<21:48:21, 13.93s/it] 44%|████▎ | 4364/10000 [17:08:29<21:49:36, 13.94s/it] {'loss': 0.0373, 'learning_rate': 2.8205000000000005e-05, 'epoch': 5.71} 44%|████▎ | 4364/10000 [17:08:29<21:49:36, 13.94s/it] 44%|████▎ | 4365/10000 [17:08:43<21:49:56, 13.95s/it] {'loss': 0.0372, 'learning_rate': 2.8199999999999998e-05, 'epoch': 5.71} 44%|████▎ | 4365/10000 [17:08:43<21:49:56, 13.95s/it] 44%|████▎ | 4366/10000 [17:08:57<21:51:21, 13.97s/it] {'loss': 0.0363, 'learning_rate': 2.8195e-05, 'epoch': 5.71} 44%|████▎ | 4366/10000 [17:08:57<21:51:21, 13.97s/it] 44%|████▎ | 4367/10000 [17:09:11<21:54:52, 14.01s/it] {'loss': 0.041, 'learning_rate': 2.819e-05, 'epoch': 5.72} 44%|████▎ | 4367/10000 [17:09:11<21:54:52, 14.01s/it] 44%|████▎ | 4368/10000 [17:09:25<21:51:44, 13.97s/it] {'loss': 0.0359, 'learning_rate': 2.8185e-05, 'epoch': 5.72} 44%|████▎ | 4368/10000 [17:09:25<21:51:44, 13.97s/it] 44%|████▎ | 4369/10000 [17:09:38<21:46:28, 13.92s/it] {'loss': 0.0364, 'learning_rate': 2.818e-05, 'epoch': 5.72} 44%|████▎ | 4369/10000 [17:09:38<21:46:28, 13.92s/it] 44%|████▎ | 4370/10000 [17:09:52<21:47:20, 13.93s/it] {'loss': 0.0384, 'learning_rate': 2.8175e-05, 'epoch': 5.72} 44%|████▎ | 4370/10000 [17:09:52<21:47:20, 13.93s/it] 44%|████▎ | 4371/10000 [17:10:06<21:43:16, 13.89s/it] {'loss': 0.0322, 'learning_rate': 2.8170000000000003e-05, 'epoch': 5.72} 44%|████▎ | 4371/10000 [17:10:06<21:43:16, 13.89s/it] 44%|████▎ | 4372/10000 [17:10:20<21:46:14, 13.93s/it] {'loss': 0.0407, 'learning_rate': 2.8165000000000002e-05, 'epoch': 5.72} 44%|████▎ | 4372/10000 [17:10:20<21:46:14, 13.93s/it] 44%|████▎ | 4373/10000 [17:10:34<21:46:21, 13.93s/it] {'loss': 0.04, 'learning_rate': 2.816e-05, 'epoch': 5.72} 44%|████▎ | 4373/10000 [17:10:34<21:46:21, 13.93s/it] 44%|████▎ | 4374/10000 [17:10:48<21:47:47, 13.95s/it] {'loss': 0.0272, 'learning_rate': 2.8155000000000004e-05, 'epoch': 5.73} 44%|████▎ | 4374/10000 [17:10:48<21:47:47, 13.95s/it] 44%|████▍ | 4375/10000 [17:11:02<21:43:44, 13.91s/it] 
{'loss': 0.0344, 'learning_rate': 2.815e-05, 'epoch': 5.73} 44%|████▍ | 4375/10000 [17:11:02<21:43:44, 13.91s/it] 44%|████▍ | 4376/10000 [17:11:16<21:40:57, 13.88s/it] {'loss': 0.0335, 'learning_rate': 2.8145e-05, 'epoch': 5.73} 44%|████▍ | 4376/10000 [17:11:16<21:40:57, 13.88s/it] 44%|████▍ | 4377/10000 [17:11:30<21:41:32, 13.89s/it] {'loss': 0.0347, 'learning_rate': 2.8139999999999998e-05, 'epoch': 5.73} 44%|████▍ | 4377/10000 [17:11:30<21:41:32, 13.89s/it] 44%|████▍ | 4378/10000 [17:11:44<21:47:22, 13.95s/it] {'loss': 0.0395, 'learning_rate': 2.8135e-05, 'epoch': 5.73} 44%|████▍ | 4378/10000 [17:11:44<21:47:22, 13.95s/it] 44%|████▍ | 4379/10000 [17:11:58<21:44:11, 13.92s/it] {'loss': 0.0398, 'learning_rate': 2.813e-05, 'epoch': 5.73} 44%|████▍ | 4379/10000 [17:11:58<21:44:11, 13.92s/it] 44%|████▍ | 4380/10000 [17:12:11<21:40:14, 13.88s/it] {'loss': 0.0339, 'learning_rate': 2.8125000000000003e-05, 'epoch': 5.73} 44%|████▍ | 4380/10000 [17:12:11<21:40:14, 13.88s/it] 44%|████▍ | 4381/10000 [17:12:25<21:41:03, 13.89s/it] {'loss': 0.0387, 'learning_rate': 2.8120000000000002e-05, 'epoch': 5.73} 44%|████▍ | 4381/10000 [17:12:25<21:41:03, 13.89s/it] 44%|████▍ | 4382/10000 [17:12:39<21:44:56, 13.94s/it] {'loss': 0.0327, 'learning_rate': 2.8115e-05, 'epoch': 5.74} 44%|████▍ | 4382/10000 [17:12:39<21:44:56, 13.94s/it] 44%|████▍ | 4383/10000 [17:12:53<21:43:22, 13.92s/it] {'loss': 0.0355, 'learning_rate': 2.8110000000000004e-05, 'epoch': 5.74} 44%|████▍ | 4383/10000 [17:12:53<21:43:22, 13.92s/it] 44%|████▍ | 4384/10000 [17:13:07<21:43:19, 13.92s/it] {'loss': 0.0327, 'learning_rate': 2.8105000000000003e-05, 'epoch': 5.74} 44%|████▍ | 4384/10000 [17:13:07<21:43:19, 13.92s/it] 44%|████▍ | 4385/10000 [17:13:21<21:41:17, 13.91s/it] {'loss': 0.0339, 'learning_rate': 2.8100000000000005e-05, 'epoch': 5.74} 44%|████▍ | 4385/10000 [17:13:21<21:41:17, 13.91s/it] 44%|████▍ | 4386/10000 [17:13:35<21:44:35, 13.94s/it] {'loss': 0.039, 'learning_rate': 2.8094999999999998e-05, 'epoch': 
5.74} 44%|████▍ | 4386/10000 [17:13:35<21:44:35, 13.94s/it] 44%|████▍ | 4387/10000 [17:13:49<21:41:07, 13.91s/it] {'loss': 0.0341, 'learning_rate': 2.809e-05, 'epoch': 5.74} 44%|████▍ | 4387/10000 [17:13:49<21:41:07, 13.91s/it] 44%|████▍ | 4388/10000 [17:14:03<21:43:22, 13.93s/it] {'loss': 0.0375, 'learning_rate': 2.8085e-05, 'epoch': 5.74} 44%|████▍ | 4388/10000 [17:14:03<21:43:22, 13.93s/it] 44%|████▍ | 4389/10000 [17:14:17<21:46:38, 13.97s/it] {'loss': 0.0391, 'learning_rate': 2.8080000000000002e-05, 'epoch': 5.74} 44%|████▍ | 4389/10000 [17:14:17<21:46:38, 13.97s/it] 44%|████▍ | 4390/10000 [17:14:31<21:43:08, 13.94s/it] {'loss': 0.0409, 'learning_rate': 2.8075e-05, 'epoch': 5.75} 44%|████▍ | 4390/10000 [17:14:31<21:43:08, 13.94s/it] 44%|████▍ | 4391/10000 [17:14:45<21:41:05, 13.92s/it] {'loss': 0.0342, 'learning_rate': 2.807e-05, 'epoch': 5.75} 44%|████▍ | 4391/10000 [17:14:45<21:41:05, 13.92s/it] 44%|████▍ | 4392/10000 [17:14:59<21:40:59, 13.92s/it] {'loss': 0.0411, 'learning_rate': 2.8065000000000003e-05, 'epoch': 5.75} 44%|████▍ | 4392/10000 [17:14:59<21:40:59, 13.92s/it] 44%|████▍ | 4393/10000 [17:15:12<21:39:14, 13.90s/it] {'loss': 0.0338, 'learning_rate': 2.8060000000000002e-05, 'epoch': 5.75} 44%|████▍ | 4393/10000 [17:15:13<21:39:14, 13.90s/it] 44%|████▍ | 4394/10000 [17:15:26<21:38:30, 13.90s/it] {'loss': 0.0371, 'learning_rate': 2.8055000000000005e-05, 'epoch': 5.75} 44%|████▍ | 4394/10000 [17:15:26<21:38:30, 13.90s/it] 44%|████▍ | 4395/10000 [17:15:40<21:40:25, 13.92s/it] {'loss': 0.0297, 'learning_rate': 2.8050000000000004e-05, 'epoch': 5.75} 44%|████▍ | 4395/10000 [17:15:40<21:40:25, 13.92s/it] 44%|████▍ | 4396/10000 [17:15:54<21:36:34, 13.88s/it] {'loss': 0.0398, 'learning_rate': 2.8045e-05, 'epoch': 5.75} 44%|████▍ | 4396/10000 [17:15:54<21:36:34, 13.88s/it] 44%|████▍ | 4397/10000 [17:16:08<21:37:20, 13.89s/it] {'loss': 0.0404, 'learning_rate': 2.804e-05, 'epoch': 5.76} 44%|████▍ | 4397/10000 [17:16:08<21:37:20, 13.89s/it] 44%|████▍ | 4398/10000 
[17:16:22<21:36:36, 13.89s/it] {'loss': 0.038, 'learning_rate': 2.8035000000000002e-05, 'epoch': 5.76} 44%|████▍ | 4398/10000 [17:16:22<21:36:36, 13.89s/it] 44%|████▍ | 4399/10000 [17:16:36<21:37:17, 13.90s/it] {'loss': 0.0365, 'learning_rate': 2.803e-05, 'epoch': 5.76} 44%|████▍ | 4399/10000 [17:16:36<21:37:17, 13.90s/it] 44%|████▍ | 4400/10000 [17:16:50<21:39:03, 13.92s/it] {'loss': 0.0366, 'learning_rate': 2.8025e-05, 'epoch': 5.76} 44%|████▍ | 4400/10000 [17:16:50<21:39:03, 13.92s/it] 44%|████▍ | 4401/10000 [17:17:04<21:37:23, 13.90s/it] {'loss': 0.0373, 'learning_rate': 2.8020000000000003e-05, 'epoch': 5.76} 44%|████▍ | 4401/10000 [17:17:04<21:37:23, 13.90s/it] 44%|████▍ | 4402/10000 [17:17:18<21:38:25, 13.92s/it] {'loss': 0.0337, 'learning_rate': 2.8015000000000002e-05, 'epoch': 5.76} 44%|████▍ | 4402/10000 [17:17:18<21:38:25, 13.92s/it] 44%|████▍ | 4403/10000 [17:17:32<21:39:01, 13.93s/it] {'loss': 0.0344, 'learning_rate': 2.8010000000000005e-05, 'epoch': 5.76} 44%|████▍ | 4403/10000 [17:17:32<21:39:01, 13.93s/it] 44%|████▍ | 4404/10000 [17:17:46<21:39:38, 13.93s/it] {'loss': 0.036, 'learning_rate': 2.8005000000000004e-05, 'epoch': 5.76} 44%|████▍ | 4404/10000 [17:17:46<21:39:38, 13.93s/it] 44%|████▍ | 4405/10000 [17:17:59<21:38:36, 13.93s/it] {'loss': 0.0302, 'learning_rate': 2.8000000000000003e-05, 'epoch': 5.77} 44%|████▍ | 4405/10000 [17:17:59<21:38:36, 13.93s/it] 44%|████▍ | 4406/10000 [17:18:13<21:36:38, 13.91s/it] {'loss': 0.0401, 'learning_rate': 2.7995e-05, 'epoch': 5.77} 44%|████▍ | 4406/10000 [17:18:13<21:36:38, 13.91s/it] 44%|████▍ | 4407/10000 [17:18:27<21:37:39, 13.92s/it] {'loss': 0.0326, 'learning_rate': 2.7989999999999998e-05, 'epoch': 5.77} 44%|████▍ | 4407/10000 [17:18:27<21:37:39, 13.92s/it] 44%|████▍ | 4408/10000 [17:18:41<21:35:09, 13.90s/it] {'loss': 0.034, 'learning_rate': 2.7985e-05, 'epoch': 5.77} 44%|████▍ | 4408/10000 [17:18:41<21:35:09, 13.90s/it] 44%|████▍ | 4409/10000 [17:18:55<21:33:20, 13.88s/it] {'loss': 0.0355, 
'learning_rate': 2.798e-05, 'epoch': 5.77} 44%|████▍ | 4409/10000 [17:18:55<21:33:20, 13.88s/it] 44%|████▍ | 4410/10000 [17:19:09<21:34:58, 13.90s/it] {'loss': 0.0339, 'learning_rate': 2.7975000000000002e-05, 'epoch': 5.77} 44%|████▍ | 4410/10000 [17:19:09<21:34:58, 13.90s/it] 44%|████▍ | 4411/10000 [17:19:23<21:33:41, 13.89s/it] {'loss': 0.0394, 'learning_rate': 2.797e-05, 'epoch': 5.77} 44%|████▍ | 4411/10000 [17:19:23<21:33:41, 13.89s/it] 44%|████▍ | 4412/10000 [17:19:37<21:32:55, 13.88s/it] {'loss': 0.0332, 'learning_rate': 2.7965e-05, 'epoch': 5.77} 44%|████▍ | 4412/10000 [17:19:37<21:32:55, 13.88s/it] 44%|████▍ | 4413/10000 [17:19:50<21:29:53, 13.85s/it] {'loss': 0.0385, 'learning_rate': 2.7960000000000003e-05, 'epoch': 5.78} 44%|████▍ | 4413/10000 [17:19:50<21:29:53, 13.85s/it] 44%|████▍ | 4414/10000 [17:20:04<21:33:14, 13.89s/it] {'loss': 0.0306, 'learning_rate': 2.7955000000000003e-05, 'epoch': 5.78} 44%|████▍ | 4414/10000 [17:20:04<21:33:14, 13.89s/it] 44%|████▍ | 4415/10000 [17:20:18<21:33:16, 13.89s/it] {'loss': 0.04, 'learning_rate': 2.7950000000000005e-05, 'epoch': 5.78} 44%|████▍ | 4415/10000 [17:20:18<21:33:16, 13.89s/it] 44%|████▍ | 4416/10000 [17:20:32<21:34:13, 13.91s/it] {'loss': 0.0411, 'learning_rate': 2.7944999999999998e-05, 'epoch': 5.78} 44%|████▍ | 4416/10000 [17:20:32<21:34:13, 13.91s/it] 44%|████▍ | 4417/10000 [17:20:46<21:34:15, 13.91s/it] {'loss': 0.0376, 'learning_rate': 2.794e-05, 'epoch': 5.78} 44%|████▍ | 4417/10000 [17:20:46<21:34:15, 13.91s/it] 44%|████▍ | 4418/10000 [17:21:00<21:33:59, 13.91s/it] {'loss': 0.0371, 'learning_rate': 2.7935e-05, 'epoch': 5.78} 44%|████▍ | 4418/10000 [17:21:00<21:33:59, 13.91s/it] 44%|████▍ | 4419/10000 [17:21:14<21:32:36, 13.90s/it] {'loss': 0.0438, 'learning_rate': 2.7930000000000002e-05, 'epoch': 5.78} 44%|████▍ | 4419/10000 [17:21:14<21:32:36, 13.90s/it] 44%|████▍ | 4420/10000 [17:21:28<21:30:56, 13.88s/it] {'loss': 0.0352, 'learning_rate': 2.7925e-05, 'epoch': 5.79} 44%|████▍ | 4420/10000 
[17:21:28<21:30:56, 13.88s/it] 44%|████▍ | 4421/10000 [17:21:42<21:29:45, 13.87s/it] {'loss': 0.0289, 'learning_rate': 2.792e-05, 'epoch': 5.79} 44%|████▍ | 4421/10000 [17:21:42<21:29:45, 13.87s/it] 44%|████▍ | 4422/10000 [17:21:56<21:32:27, 13.90s/it] {'loss': 0.0315, 'learning_rate': 2.7915000000000003e-05, 'epoch': 5.79} 44%|████▍ | 4422/10000 [17:21:56<21:32:27, 13.90s/it] 44%|████▍ | 4423/10000 [17:22:09<21:31:29, 13.89s/it] {'loss': 0.0415, 'learning_rate': 2.7910000000000002e-05, 'epoch': 5.79} 44%|████▍ | 4423/10000 [17:22:09<21:31:29, 13.89s/it] 44%|████▍ | 4424/10000 [17:22:23<21:35:12, 13.94s/it] {'loss': 0.0374, 'learning_rate': 2.7905000000000005e-05, 'epoch': 5.79} 44%|████▍ | 4424/10000 [17:22:24<21:35:12, 13.94s/it] 44%|████▍ | 4425/10000 [17:22:37<21:33:43, 13.92s/it] {'loss': 0.0391, 'learning_rate': 2.7900000000000004e-05, 'epoch': 5.79} 44%|████▍ | 4425/10000 [17:22:37<21:33:43, 13.92s/it] 44%|████▍ | 4426/10000 [17:22:51<21:37:18, 13.96s/it] {'loss': 0.0405, 'learning_rate': 2.7895e-05, 'epoch': 5.79} 44%|████▍ | 4426/10000 [17:22:51<21:37:18, 13.96s/it] 44%|████▍ | 4427/10000 [17:23:05<21:36:49, 13.96s/it] {'loss': 0.034, 'learning_rate': 2.789e-05, 'epoch': 5.79} 44%|████▍ | 4427/10000 [17:23:05<21:36:49, 13.96s/it] 44%|████▍ | 4428/10000 [17:23:19<21:33:02, 13.92s/it] {'loss': 0.0327, 'learning_rate': 2.7885e-05, 'epoch': 5.8} 44%|████▍ | 4428/10000 [17:23:19<21:33:02, 13.92s/it] 44%|████▍ | 4429/10000 [17:23:33<21:33:04, 13.93s/it] {'loss': 0.0379, 'learning_rate': 2.788e-05, 'epoch': 5.8} 44%|████▍ | 4429/10000 [17:23:33<21:33:04, 13.93s/it] 44%|████▍ | 4430/10000 [17:23:47<21:35:56, 13.96s/it] {'loss': 0.0413, 'learning_rate': 2.7875e-05, 'epoch': 5.8} 44%|████▍ | 4430/10000 [17:23:47<21:35:56, 13.96s/it] 44%|████▍ | 4431/10000 [17:24:01<21:35:29, 13.96s/it] {'loss': 0.0264, 'learning_rate': 2.7870000000000003e-05, 'epoch': 5.8} 44%|████▍ | 4431/10000 [17:24:01<21:35:29, 13.96s/it] 44%|████▍ | 4432/10000 [17:24:15<21:39:05, 14.00s/it] 
{'loss': 0.0368, 'learning_rate': 2.7865000000000002e-05, 'epoch': 5.8} 44%|████▍ | 4432/10000 [17:24:15<21:39:05, 14.00s/it] 44%|████▍ | 4433/10000 [17:24:29<21:35:44, 13.97s/it] {'loss': 0.0342, 'learning_rate': 2.7860000000000004e-05, 'epoch': 5.8} 44%|████▍ | 4433/10000 [17:24:29<21:35:44, 13.97s/it] 44%|████▍ | 4434/10000 [17:24:43<21:33:55, 13.95s/it] {'loss': 0.0297, 'learning_rate': 2.7855000000000004e-05, 'epoch': 5.8} 44%|████▍ | 4434/10000 [17:24:43<21:33:55, 13.95s/it] 44%|████▍ | 4435/10000 [17:24:57<21:31:40, 13.93s/it] {'loss': 0.0374, 'learning_rate': 2.7850000000000003e-05, 'epoch': 5.8} 44%|████▍ | 4435/10000 [17:24:57<21:31:40, 13.93s/it] 44%|████▍ | 4436/10000 [17:25:11<21:32:02, 13.93s/it] {'loss': 0.0416, 'learning_rate': 2.7845e-05, 'epoch': 5.81} 44%|████▍ | 4436/10000 [17:25:11<21:32:02, 13.93s/it] 44%|████▍ | 4437/10000 [17:25:25<21:33:07, 13.95s/it] {'loss': 0.0517, 'learning_rate': 2.7839999999999998e-05, 'epoch': 5.81} 44%|████▍ | 4437/10000 [17:25:25<21:33:07, 13.95s/it] 44%|████▍ | 4438/10000 [17:25:39<21:34:39, 13.97s/it] {'loss': 0.0358, 'learning_rate': 2.7835e-05, 'epoch': 5.81} 44%|████▍ | 4438/10000 [17:25:39<21:34:39, 13.97s/it] 44%|████▍ | 4439/10000 [17:25:53<21:33:44, 13.96s/it] {'loss': 0.04, 'learning_rate': 2.783e-05, 'epoch': 5.81} 44%|████▍ | 4439/10000 [17:25:53<21:33:44, 13.96s/it] 44%|████▍ | 4440/10000 [17:26:07<21:34:39, 13.97s/it] {'loss': 0.0341, 'learning_rate': 2.7825000000000002e-05, 'epoch': 5.81} 44%|████▍ | 4440/10000 [17:26:07<21:34:39, 13.97s/it] 44%|████▍ | 4441/10000 [17:26:21<21:29:46, 13.92s/it] {'loss': 0.0334, 'learning_rate': 2.782e-05, 'epoch': 5.81} 44%|████▍ | 4441/10000 [17:26:21<21:29:46, 13.92s/it] 44%|████▍ | 4442/10000 [17:26:35<21:30:37, 13.93s/it] {'loss': 0.0441, 'learning_rate': 2.7815e-05, 'epoch': 5.81} 44%|████▍ | 4442/10000 [17:26:35<21:30:37, 13.93s/it] 44%|████▍ | 4443/10000 [17:26:48<21:28:49, 13.92s/it] {'loss': 0.045, 'learning_rate': 2.7810000000000003e-05, 'epoch': 5.82} 
44%|████▍ | 4443/10000 [17:26:48<21:28:49, 13.92s/it] 44%|████▍ | 4444/10000 [17:27:02<21:26:17, 13.89s/it] {'loss': 0.0428, 'learning_rate': 2.7805000000000002e-05, 'epoch': 5.82} 44%|████▍ | 4444/10000 [17:27:02<21:26:17, 13.89s/it] 44%|████▍ | 4445/10000 [17:27:16<21:25:48, 13.89s/it] {'loss': 0.035, 'learning_rate': 2.7800000000000005e-05, 'epoch': 5.82} 44%|████▍ | 4445/10000 [17:27:16<21:25:48, 13.89s/it] 44%|████▍ | 4446/10000 [17:27:30<21:26:48, 13.90s/it] {'loss': 0.0445, 'learning_rate': 2.7794999999999997e-05, 'epoch': 5.82} 44%|████▍ | 4446/10000 [17:27:30<21:26:48, 13.90s/it] 44%|████▍ | 4447/10000 [17:27:44<21:28:01, 13.92s/it] {'loss': 0.035, 'learning_rate': 2.779e-05, 'epoch': 5.82} 44%|████▍ | 4447/10000 [17:27:44<21:28:01, 13.92s/it] 44%|████▍ | 4448/10000 [17:27:58<21:28:09, 13.92s/it] {'loss': 0.0395, 'learning_rate': 2.7785e-05, 'epoch': 5.82} 44%|████▍ | 4448/10000 [17:27:58<21:28:09, 13.92s/it] 44%|████▍ | 4449/10000 [17:28:12<21:25:51, 13.90s/it] {'loss': 0.0352, 'learning_rate': 2.778e-05, 'epoch': 5.82} 44%|████▍ | 4449/10000 [17:28:12<21:25:51, 13.90s/it] 44%|████▍ | 4450/10000 [17:28:26<21:26:30, 13.91s/it] {'loss': 0.036, 'learning_rate': 2.7775e-05, 'epoch': 5.82} 44%|████▍ | 4450/10000 [17:28:26<21:26:30, 13.91s/it] 45%|████▍ | 4451/10000 [17:28:40<21:26:36, 13.91s/it] {'loss': 0.0406, 'learning_rate': 2.777e-05, 'epoch': 5.83} 45%|████▍ | 4451/10000 [17:28:40<21:26:36, 13.91s/it] 45%|████▍ | 4452/10000 [17:28:54<21:28:24, 13.93s/it] {'loss': 0.0385, 'learning_rate': 2.7765000000000003e-05, 'epoch': 5.83} 45%|████▍ | 4452/10000 [17:28:54<21:28:24, 13.93s/it] 45%|████▍ | 4453/10000 [17:29:08<21:30:12, 13.96s/it] {'loss': 0.0369, 'learning_rate': 2.7760000000000002e-05, 'epoch': 5.83} 45%|████▍ | 4453/10000 [17:29:08<21:30:12, 13.96s/it] 45%|████▍ | 4454/10000 [17:29:22<21:27:10, 13.93s/it] {'loss': 0.0333, 'learning_rate': 2.7755000000000004e-05, 'epoch': 5.83} 45%|████▍ | 4454/10000 [17:29:22<21:27:10, 13.93s/it] 45%|████▍ | 
4455/10000 [17:29:36<21:29:42, 13.96s/it] {'loss': 0.0433, 'learning_rate': 2.7750000000000004e-05, 'epoch': 5.83} 45%|████▍ | 4455/10000 [17:29:36<21:29:42, 13.96s/it] 45%|████▍ | 4456/10000 [17:29:50<21:31:23, 13.98s/it] {'loss': 0.0365, 'learning_rate': 2.7745e-05, 'epoch': 5.83} 45%|████▍ | 4456/10000 [17:29:50<21:31:23, 13.98s/it] 45%|████▍ | 4457/10000 [17:30:03<21:29:30, 13.96s/it] {'loss': 0.028, 'learning_rate': 2.774e-05, 'epoch': 5.83} 45%|████▍ | 4457/10000 [17:30:04<21:29:30, 13.96s/it] 45%|████▍ | 4458/10000 [17:30:17<21:26:18, 13.93s/it] {'loss': 0.0383, 'learning_rate': 2.7735e-05, 'epoch': 5.84} 45%|████▍ | 4458/10000 [17:30:17<21:26:18, 13.93s/it] 45%|████▍ | 4459/10000 [17:30:31<21:27:59, 13.95s/it] {'loss': 0.0399, 'learning_rate': 2.773e-05, 'epoch': 5.84} 45%|████▍ | 4459/10000 [17:30:31<21:27:59, 13.95s/it] 45%|████▍ | 4460/10000 [17:30:45<21:28:57, 13.96s/it] {'loss': 0.0279, 'learning_rate': 2.7725e-05, 'epoch': 5.84} 45%|████▍ | 4460/10000 [17:30:45<21:28:57, 13.96s/it] 45%|████▍ | 4461/10000 [17:30:59<21:27:01, 13.94s/it] {'loss': 0.0369, 'learning_rate': 2.7720000000000002e-05, 'epoch': 5.84} 45%|████▍ | 4461/10000 [17:30:59<21:27:01, 13.94s/it] 45%|████▍ | 4462/10000 [17:31:13<21:29:11, 13.97s/it] {'loss': 0.0313, 'learning_rate': 2.7715e-05, 'epoch': 5.84} 45%|████▍ | 4462/10000 [17:31:13<21:29:11, 13.97s/it] 45%|████▍ | 4463/10000 [17:31:27<21:23:57, 13.91s/it] {'loss': 0.0393, 'learning_rate': 2.7710000000000004e-05, 'epoch': 5.84} 45%|████▍ | 4463/10000 [17:31:27<21:23:57, 13.91s/it] 45%|████▍ | 4464/10000 [17:31:41<21:23:30, 13.91s/it] {'loss': 0.0412, 'learning_rate': 2.7705000000000003e-05, 'epoch': 5.84} 45%|████▍ | 4464/10000 [17:31:41<21:23:30, 13.91s/it] 45%|████▍ | 4465/10000 [17:31:55<21:21:12, 13.89s/it] {'loss': 0.0406, 'learning_rate': 2.7700000000000002e-05, 'epoch': 5.84} 45%|████▍ | 4465/10000 [17:31:55<21:21:12, 13.89s/it] 45%|████▍ | 4466/10000 [17:32:08<21:16:12, 13.84s/it] {'loss': 0.0343, 'learning_rate': 
2.7694999999999998e-05, 'epoch': 5.85} 45%|████▍ | 4466/10000 [17:32:09<21:16:12, 13.84s/it] 45%|████▍ | 4467/10000 [17:32:22<21:18:26, 13.86s/it] {'loss': 0.0375, 'learning_rate': 2.769e-05, 'epoch': 5.85} 45%|████▍ | 4467/10000 [17:32:22<21:18:26, 13.86s/it] 45%|████▍ | 4468/10000 [17:32:36<21:24:21, 13.93s/it] {'loss': 0.0423, 'learning_rate': 2.7685e-05, 'epoch': 5.85} 45%|████▍ | 4468/10000 [17:32:37<21:24:21, 13.93s/it] 45%|████▍ | 4469/10000 [17:32:50<21:19:48, 13.88s/it] {'loss': 0.039, 'learning_rate': 2.768e-05, 'epoch': 5.85} 45%|████▍ | 4469/10000 [17:32:50<21:19:48, 13.88s/it] 45%|████▍ | 4470/10000 [17:33:04<21:17:18, 13.86s/it] {'loss': 0.0284, 'learning_rate': 2.7675000000000002e-05, 'epoch': 5.85} 45%|████▍ | 4470/10000 [17:33:04<21:17:18, 13.86s/it] 45%|████▍ | 4471/10000 [17:33:18<21:18:13, 13.87s/it] {'loss': 0.0379, 'learning_rate': 2.767e-05, 'epoch': 5.85} 45%|████▍ | 4471/10000 [17:33:18<21:18:13, 13.87s/it] 45%|████▍ | 4472/10000 [17:33:32<21:17:22, 13.86s/it] {'loss': 0.0381, 'learning_rate': 2.7665000000000004e-05, 'epoch': 5.85} 45%|████▍ | 4472/10000 [17:33:32<21:17:22, 13.86s/it] 45%|████▍ | 4473/10000 [17:33:46<21:18:21, 13.88s/it] {'loss': 0.0322, 'learning_rate': 2.7660000000000003e-05, 'epoch': 5.85} 45%|████▍ | 4473/10000 [17:33:46<21:18:21, 13.88s/it] 45%|████▍ | 4474/10000 [17:34:00<21:17:46, 13.87s/it] {'loss': 0.0364, 'learning_rate': 2.7655000000000002e-05, 'epoch': 5.86} 45%|████▍ | 4474/10000 [17:34:00<21:17:46, 13.87s/it] 45%|████▍ | 4475/10000 [17:34:13<21:17:26, 13.87s/it] {'loss': 0.0423, 'learning_rate': 2.7650000000000005e-05, 'epoch': 5.86} 45%|████▍ | 4475/10000 [17:34:14<21:17:26, 13.87s/it] 45%|████▍ | 4476/10000 [17:34:27<21:17:39, 13.88s/it] {'loss': 0.0393, 'learning_rate': 2.7644999999999997e-05, 'epoch': 5.86} 45%|████▍ | 4476/10000 [17:34:27<21:17:39, 13.88s/it] 45%|████▍ | 4477/10000 [17:34:41<21:17:11, 13.88s/it] {'loss': 0.0394, 'learning_rate': 2.764e-05, 'epoch': 5.86} 45%|████▍ | 4477/10000 
[17:34:41<21:17:11, 13.88s/it] 45%|████▍ | 4478/10000 [17:34:55<21:15:33, 13.86s/it] {'loss': 0.0349, 'learning_rate': 2.7635e-05, 'epoch': 5.86} 45%|████▍ | 4478/10000 [17:34:55<21:15:33, 13.86s/it] 45%|████▍ | 4479/10000 [17:35:09<21:16:40, 13.87s/it] {'loss': 0.0374, 'learning_rate': 2.763e-05, 'epoch': 5.86} 45%|████▍ | 4479/10000 [17:35:09<21:16:40, 13.87s/it] 45%|████▍ | 4480/10000 [17:35:23<21:19:08, 13.90s/it] {'loss': 0.0347, 'learning_rate': 2.7625e-05, 'epoch': 5.86} 45%|████▍ | 4480/10000 [17:35:23<21:19:08, 13.90s/it] 45%|████▍ | 4481/10000 [17:35:37<21:15:55, 13.87s/it] {'loss': 0.0494, 'learning_rate': 2.762e-05, 'epoch': 5.87} 45%|████▍ | 4481/10000 [17:35:37<21:15:55, 13.87s/it] 45%|████▍ | 4482/10000 [17:35:51<21:18:32, 13.90s/it] {'loss': 0.038, 'learning_rate': 2.7615000000000002e-05, 'epoch': 5.87} 45%|████▍ | 4482/10000 [17:35:51<21:18:32, 13.90s/it] 45%|████▍ | 4483/10000 [17:36:05<21:19:01, 13.91s/it] {'loss': 0.044, 'learning_rate': 2.761e-05, 'epoch': 5.87} 45%|████▍ | 4483/10000 [17:36:05<21:19:01, 13.91s/it] 45%|████▍ | 4484/10000 [17:36:18<21:16:54, 13.89s/it] {'loss': 0.0334, 'learning_rate': 2.7605000000000004e-05, 'epoch': 5.87} 45%|████▍ | 4484/10000 [17:36:19<21:16:54, 13.89s/it] 45%|████▍ | 4485/10000 [17:36:32<21:16:28, 13.89s/it] {'loss': 0.0382, 'learning_rate': 2.7600000000000003e-05, 'epoch': 5.87} 45%|████▍ | 4485/10000 [17:36:32<21:16:28, 13.89s/it] 45%|████▍ | 4486/10000 [17:36:46<21:16:08, 13.89s/it] {'loss': 0.0364, 'learning_rate': 2.7595e-05, 'epoch': 5.87} 45%|████▍ | 4486/10000 [17:36:46<21:16:08, 13.89s/it] 45%|████▍ | 4487/10000 [17:37:00<21:13:57, 13.86s/it] {'loss': 0.0405, 'learning_rate': 2.759e-05, 'epoch': 5.87} 45%|████▍ | 4487/10000 [17:37:00<21:13:57, 13.86s/it] 45%|████▍ | 4488/10000 [17:37:14<21:17:21, 13.90s/it] {'loss': 0.0398, 'learning_rate': 2.7585e-05, 'epoch': 5.87} 45%|████▍ | 4488/10000 [17:37:14<21:17:21, 13.90s/it] 45%|████▍ | 4489/10000 [17:37:28<21:17:20, 13.91s/it] {'loss': 0.0378, 
'learning_rate': 2.758e-05, 'epoch': 5.88} 45%|████▍ | 4489/10000 [17:37:28<21:17:20, 13.91s/it] 45%|████▍ | 4490/10000 [17:37:42<21:16:32, 13.90s/it] {'loss': 0.0418, 'learning_rate': 2.7575e-05, 'epoch': 5.88} 45%|████▍ | 4490/10000 [17:37:42<21:16:32, 13.90s/it] 45%|████▍ | 4491/10000 [17:37:56<21:10:29, 13.84s/it] {'loss': 0.041, 'learning_rate': 2.7570000000000002e-05, 'epoch': 5.88} 45%|████▍ | 4491/10000 [17:37:56<21:10:29, 13.84s/it] 45%|████▍ | 4492/10000 [17:38:09<21:11:39, 13.85s/it] {'loss': 0.0447, 'learning_rate': 2.7565e-05, 'epoch': 5.88} 45%|████▍ | 4492/10000 [17:38:09<21:11:39, 13.85s/it] 45%|████▍ | 4493/10000 [17:38:23<21:15:32, 13.90s/it] {'loss': 0.0393, 'learning_rate': 2.7560000000000004e-05, 'epoch': 5.88} 45%|████▍ | 4493/10000 [17:38:23<21:15:32, 13.90s/it] 45%|████▍ | 4494/10000 [17:38:37<21:14:27, 13.89s/it] {'loss': 0.0407, 'learning_rate': 2.7555000000000003e-05, 'epoch': 5.88} 45%|████▍ | 4494/10000 [17:38:37<21:14:27, 13.89s/it] 45%|████▍ | 4495/10000 [17:38:51<21:14:10, 13.89s/it] {'loss': 0.0437, 'learning_rate': 2.7550000000000002e-05, 'epoch': 5.88} 45%|████▍ | 4495/10000 [17:38:51<21:14:10, 13.89s/it] 45%|████▍ | 4496/10000 [17:39:05<21:15:43, 13.91s/it] {'loss': 0.0343, 'learning_rate': 2.7544999999999998e-05, 'epoch': 5.88} 45%|████▍ | 4496/10000 [17:39:05<21:15:43, 13.91s/it] 45%|████▍ | 4497/10000 [17:39:19<21:16:13, 13.91s/it] {'loss': 0.0413, 'learning_rate': 2.754e-05, 'epoch': 5.89} 45%|████▍ | 4497/10000 [17:39:19<21:16:13, 13.91s/it] 45%|████▍ | 4498/10000 [17:39:33<21:16:14, 13.92s/it] {'loss': 0.0353, 'learning_rate': 2.7535e-05, 'epoch': 5.89} 45%|████▍ | 4498/10000 [17:39:33<21:16:14, 13.92s/it] 45%|████▍ | 4499/10000 [17:39:47<21:16:23, 13.92s/it] {'loss': 0.0444, 'learning_rate': 2.753e-05, 'epoch': 5.89} 45%|████▍ | 4499/10000 [17:39:47<21:16:23, 13.92s/it] 45%|████▌ | 4500/10000 [17:40:01<21:15:58, 13.92s/it] {'loss': 0.0326, 'learning_rate': 2.7525e-05, 'epoch': 5.89} 45%|████▌ | 4500/10000 
[17:40:01<21:15:58, 13.92s/it] 45%|████▌ | 4501/10000 [17:40:15<21:15:10, 13.91s/it] {'loss': 0.0365, 'learning_rate': 2.752e-05, 'epoch': 5.89} 45%|████▌ | 4501/10000 [17:40:15<21:15:10, 13.91s/it] 45%|████▌ | 4502/10000 [17:40:29<21:15:04, 13.92s/it] {'loss': 0.0312, 'learning_rate': 2.7515000000000003e-05, 'epoch': 5.89} 45%|████▌ | 4502/10000 [17:40:29<21:15:04, 13.92s/it] 45%|████▌ | 4503/10000 [17:40:43<21:14:12, 13.91s/it] {'loss': 0.0346, 'learning_rate': 2.7510000000000003e-05, 'epoch': 5.89} 45%|████▌ | 4503/10000 [17:40:43<21:14:12, 13.91s/it] 45%|████▌ | 4504/10000 [17:40:56<21:15:01, 13.92s/it] {'loss': 0.0325, 'learning_rate': 2.7505000000000002e-05, 'epoch': 5.9} 45%|████▌ | 4504/10000 [17:40:57<21:15:01, 13.92s/it] 45%|████▌ | 4505/10000 [17:41:10<21:15:04, 13.92s/it] {'loss': 0.0413, 'learning_rate': 2.7500000000000004e-05, 'epoch': 5.9} 45%|████▌ | 4505/10000 [17:41:10<21:15:04, 13.92s/it] 45%|████▌ | 4506/10000 [17:41:24<21:14:34, 13.92s/it] {'loss': 0.0427, 'learning_rate': 2.7495000000000004e-05, 'epoch': 5.9} 45%|████▌ | 4506/10000 [17:41:24<21:14:34, 13.92s/it] 45%|████▌ | 4507/10000 [17:41:38<21:14:33, 13.92s/it] {'loss': 0.0478, 'learning_rate': 2.749e-05, 'epoch': 5.9} 45%|████▌ | 4507/10000 [17:41:38<21:14:33, 13.92s/it] 45%|████▌ | 4508/10000 [17:41:52<21:16:54, 13.95s/it] {'loss': 0.0357, 'learning_rate': 2.7485e-05, 'epoch': 5.9} 45%|████▌ | 4508/10000 [17:41:52<21:16:54, 13.95s/it] 45%|████▌ | 4509/10000 [17:42:06<21:14:45, 13.93s/it] {'loss': 0.0415, 'learning_rate': 2.748e-05, 'epoch': 5.9} 45%|████▌ | 4509/10000 [17:42:06<21:14:45, 13.93s/it] 45%|████▌ | 4510/10000 [17:42:20<21:18:20, 13.97s/it] {'loss': 0.0376, 'learning_rate': 2.7475e-05, 'epoch': 5.9} 45%|████▌ | 4510/10000 [17:42:20<21:18:20, 13.97s/it] 45%|████▌ | 4511/10000 [17:42:34<21:17:14, 13.96s/it] {'loss': 0.034, 'learning_rate': 2.7470000000000003e-05, 'epoch': 5.9} 45%|████▌ | 4511/10000 [17:42:34<21:17:14, 13.96s/it] 45%|████▌ | 4512/10000 [17:42:48<21:18:23, 
13.98s/it] {'loss': 0.0384, 'learning_rate': 2.7465000000000002e-05, 'epoch': 5.91} 45%|████▌ | 4512/10000 [17:42:48<21:18:23, 13.98s/it] 45%|████▌ | 4513/10000 [17:43:02<21:15:56, 13.95s/it] {'loss': 0.0328, 'learning_rate': 2.746e-05, 'epoch': 5.91} 45%|████▌ | 4513/10000 [17:43:02<21:15:56, 13.95s/it] 45%|████▌ | 4514/10000 [17:43:16<21:16:21, 13.96s/it] {'loss': 0.0388, 'learning_rate': 2.7455000000000004e-05, 'epoch': 5.91} 45%|████▌ | 4514/10000 [17:43:16<21:16:21, 13.96s/it] 45%|████▌ | 4515/10000 [17:43:30<21:17:40, 13.98s/it] {'loss': 0.0397, 'learning_rate': 2.7450000000000003e-05, 'epoch': 5.91} 45%|████▌ | 4515/10000 [17:43:30<21:17:40, 13.98s/it] 45%|████▌ | 4516/10000 [17:43:44<21:15:01, 13.95s/it] {'loss': 0.0414, 'learning_rate': 2.7445000000000002e-05, 'epoch': 5.91} 45%|████▌ | 4516/10000 [17:43:44<21:15:01, 13.95s/it] 45%|████▌ | 4517/10000 [17:43:58<21:12:52, 13.93s/it] {'loss': 0.035, 'learning_rate': 2.7439999999999998e-05, 'epoch': 5.91} 45%|████▌ | 4517/10000 [17:43:58<21:12:52, 13.93s/it] 45%|████▌ | 4518/10000 [17:44:12<21:14:16, 13.95s/it] {'loss': 0.0331, 'learning_rate': 2.7435e-05, 'epoch': 5.91} 45%|████▌ | 4518/10000 [17:44:12<21:14:16, 13.95s/it] 45%|████▌ | 4519/10000 [17:44:26<21:12:05, 13.93s/it] {'loss': 0.0357, 'learning_rate': 2.743e-05, 'epoch': 5.91} 45%|████▌ | 4519/10000 [17:44:26<21:12:05, 13.93s/it] 45%|████▌ | 4520/10000 [17:44:40<21:12:47, 13.94s/it] {'loss': 0.0377, 'learning_rate': 2.7425e-05, 'epoch': 5.92} 45%|████▌ | 4520/10000 [17:44:40<21:12:47, 13.94s/it] 45%|████▌ | 4521/10000 [17:44:54<21:11:21, 13.92s/it] {'loss': 0.0269, 'learning_rate': 2.7420000000000002e-05, 'epoch': 5.92} 45%|████▌ | 4521/10000 [17:44:54<21:11:21, 13.92s/it] 45%|████▌ | 4522/10000 [17:45:07<21:09:32, 13.91s/it] {'loss': 0.0439, 'learning_rate': 2.7415e-05, 'epoch': 5.92} 45%|████▌ | 4522/10000 [17:45:07<21:09:32, 13.91s/it] 45%|████▌ | 4523/10000 [17:45:21<21:07:12, 13.88s/it] {'loss': 0.0313, 'learning_rate': 2.7410000000000004e-05, 
'epoch': 5.92} 45%|████▌ | 4523/10000 [17:45:21<21:07:12, 13.88s/it] 45%|████▌ | 4524/10000 [17:45:35<21:10:34, 13.92s/it] {'loss': 0.0389, 'learning_rate': 2.7405000000000003e-05, 'epoch': 5.92} 45%|████▌ | 4524/10000 [17:45:35<21:10:34, 13.92s/it] 45%|████▌ | 4525/10000 [17:45:49<21:09:21, 13.91s/it] {'loss': 0.043, 'learning_rate': 2.7400000000000002e-05, 'epoch': 5.92} 45%|████▌ | 4525/10000 [17:45:49<21:09:21, 13.91s/it] 45%|████▌ | 4526/10000 [17:46:03<21:09:17, 13.91s/it] {'loss': 0.0433, 'learning_rate': 2.7395000000000005e-05, 'epoch': 5.92} 45%|████▌ | 4526/10000 [17:46:03<21:09:17, 13.91s/it] 45%|████▌ | 4527/10000 [17:46:17<21:03:41, 13.85s/it] {'loss': 0.0307, 'learning_rate': 2.739e-05, 'epoch': 5.93} 45%|████▌ | 4527/10000 [17:46:17<21:03:41, 13.85s/it] 45%|████▌ | 4528/10000 [17:46:31<21:04:02, 13.86s/it] {'loss': 0.033, 'learning_rate': 2.7385e-05, 'epoch': 5.93} 45%|████▌ | 4528/10000 [17:46:31<21:04:02, 13.86s/it] 45%|████▌ | 4529/10000 [17:46:45<21:06:25, 13.89s/it] {'loss': 0.0446, 'learning_rate': 2.738e-05, 'epoch': 5.93} 45%|████▌ | 4529/10000 [17:46:45<21:06:25, 13.89s/it] 45%|████▌ | 4530/10000 [17:46:59<21:06:49, 13.90s/it] {'loss': 0.034, 'learning_rate': 2.7375e-05, 'epoch': 5.93} 45%|████▌ | 4530/10000 [17:46:59<21:06:49, 13.90s/it] 45%|████▌ | 4531/10000 [17:47:13<21:10:26, 13.94s/it] {'loss': 0.0425, 'learning_rate': 2.737e-05, 'epoch': 5.93} 45%|████▌ | 4531/10000 [17:47:13<21:10:26, 13.94s/it] 45%|████▌ | 4532/10000 [17:47:26<21:07:54, 13.91s/it] {'loss': 0.0373, 'learning_rate': 2.7365000000000003e-05, 'epoch': 5.93} 45%|████▌ | 4532/10000 [17:47:26<21:07:54, 13.91s/it] 45%|████▌ | 4533/10000 [17:47:40<21:05:25, 13.89s/it] {'loss': 0.0393, 'learning_rate': 2.7360000000000002e-05, 'epoch': 5.93} 45%|████▌ | 4533/10000 [17:47:40<21:05:25, 13.89s/it] 45%|████▌ | 4534/10000 [17:47:54<21:09:27, 13.93s/it] {'loss': 0.0433, 'learning_rate': 2.7355e-05, 'epoch': 5.93} 45%|████▌ | 4534/10000 [17:47:54<21:09:27, 13.93s/it] 45%|████▌ | 
4535/10000 [17:48:08<21:11:53, 13.96s/it] {'loss': 0.0424, 'learning_rate': 2.7350000000000004e-05, 'epoch': 5.94} 45%|████▌ | 4535/10000 [17:48:08<21:11:53, 13.96s/it] 45%|████▌ | 4536/10000 [17:48:22<21:09:43, 13.94s/it] {'loss': 0.0339, 'learning_rate': 2.7345000000000003e-05, 'epoch': 5.94} 45%|████▌ | 4536/10000 [17:48:22<21:09:43, 13.94s/it] 45%|████▌ | 4537/10000 [17:48:36<21:06:59, 13.92s/it] {'loss': 0.0445, 'learning_rate': 2.734e-05, 'epoch': 5.94} 45%|████▌ | 4537/10000 [17:48:36<21:06:59, 13.92s/it] 45%|████▌ | 4538/10000 [17:48:50<21:07:30, 13.92s/it] {'loss': 0.0356, 'learning_rate': 2.7335e-05, 'epoch': 5.94} 45%|████▌ | 4538/10000 [17:48:50<21:07:30, 13.92s/it] 45%|████▌ | 4539/10000 [17:49:04<21:07:37, 13.93s/it] {'loss': 0.0385, 'learning_rate': 2.733e-05, 'epoch': 5.94} 45%|████▌ | 4539/10000 [17:49:04<21:07:37, 13.93s/it] 45%|████▌ | 4540/10000 [17:49:18<21:08:46, 13.94s/it] {'loss': 0.0388, 'learning_rate': 2.7325e-05, 'epoch': 5.94} 45%|████▌ | 4540/10000 [17:49:18<21:08:46, 13.94s/it] 45%|████▌ | 4541/10000 [17:49:32<21:06:46, 13.92s/it] {'loss': 0.0448, 'learning_rate': 2.7320000000000003e-05, 'epoch': 5.94} 45%|████▌ | 4541/10000 [17:49:32<21:06:46, 13.92s/it] 45%|████▌ | 4542/10000 [17:49:46<21:05:08, 13.91s/it] {'loss': 0.0354, 'learning_rate': 2.7315000000000002e-05, 'epoch': 5.95} 45%|████▌ | 4542/10000 [17:49:46<21:05:08, 13.91s/it] 45%|████▌ | 4543/10000 [17:50:00<21:08:24, 13.95s/it] {'loss': 0.0389, 'learning_rate': 2.731e-05, 'epoch': 5.95} 45%|████▌ | 4543/10000 [17:50:00<21:08:24, 13.95s/it] 45%|████▌ | 4544/10000 [17:50:14<21:07:05, 13.93s/it] {'loss': 0.0342, 'learning_rate': 2.7305000000000004e-05, 'epoch': 5.95} 45%|████▌ | 4544/10000 [17:50:14<21:07:05, 13.93s/it] 45%|████▌ | 4545/10000 [17:50:27<21:05:51, 13.92s/it] {'loss': 0.0403, 'learning_rate': 2.7300000000000003e-05, 'epoch': 5.95} 45%|████▌ | 4545/10000 [17:50:28<21:05:51, 13.92s/it] 45%|████▌ | 4546/10000 [17:50:42<21:09:12, 13.96s/it] {'loss': 0.0378, 
'learning_rate': 2.7295000000000005e-05, 'epoch': 5.95} 45%|████▌ | 4546/10000 [17:50:42<21:09:12, 13.96s/it] 45%|████▌ | 4547/10000 [17:50:55<21:07:56, 13.95s/it] {'loss': 0.0347, 'learning_rate': 2.7289999999999998e-05, 'epoch': 5.95} 45%|████▌ | 4547/10000 [17:50:56<21:07:56, 13.95s/it] 45%|████▌ | 4548/10000 [17:51:09<21:08:17, 13.96s/it] {'loss': 0.0395, 'learning_rate': 2.7285e-05, 'epoch': 5.95} 45%|████▌ | 4548/10000 [17:51:09<21:08:17, 13.96s/it] 45%|████▌ | 4549/10000 [17:51:23<21:05:43, 13.93s/it] {'loss': 0.0381, 'learning_rate': 2.728e-05, 'epoch': 5.95} 45%|████▌ | 4549/10000 [17:51:23<21:05:43, 13.93s/it] 46%|████▌ | 4550/10000 [17:51:37<21:06:43, 13.95s/it] {'loss': 0.0371, 'learning_rate': 2.7275e-05, 'epoch': 5.96} 46%|████▌ | 4550/10000 [17:51:37<21:06:43, 13.95s/it] 46%|████▌ | 4551/10000 [17:51:51<21:08:23, 13.97s/it] {'loss': 0.041, 'learning_rate': 2.727e-05, 'epoch': 5.96} 46%|████▌ | 4551/10000 [17:51:51<21:08:23, 13.97s/it] 46%|████▌ | 4552/10000 [17:52:05<21:07:25, 13.96s/it] {'loss': 0.0488, 'learning_rate': 2.7265e-05, 'epoch': 5.96} 46%|████▌ | 4552/10000 [17:52:05<21:07:25, 13.96s/it] 46%|████▌ | 4553/10000 [17:52:19<21:07:00, 13.96s/it] {'loss': 0.0416, 'learning_rate': 2.7260000000000003e-05, 'epoch': 5.96} 46%|████▌ | 4553/10000 [17:52:19<21:07:00, 13.96s/it] 46%|████▌ | 4554/10000 [17:52:33<21:04:44, 13.93s/it] {'loss': 0.0399, 'learning_rate': 2.7255000000000002e-05, 'epoch': 5.96} 46%|████▌ | 4554/10000 [17:52:33<21:04:44, 13.93s/it] 46%|████▌ | 4555/10000 [17:52:47<21:00:35, 13.89s/it] {'loss': 0.0442, 'learning_rate': 2.725e-05, 'epoch': 5.96} 46%|████▌ | 4555/10000 [17:52:47<21:00:35, 13.89s/it] 46%|████▌ | 4556/10000 [17:53:01<20:59:56, 13.89s/it] {'loss': 0.0355, 'learning_rate': 2.7245000000000004e-05, 'epoch': 5.96} 46%|████▌ | 4556/10000 [17:53:01<20:59:56, 13.89s/it] 46%|████▌ | 4557/10000 [17:53:15<21:00:52, 13.90s/it] {'loss': 0.0367, 'learning_rate': 2.724e-05, 'epoch': 5.96} 46%|████▌ | 4557/10000 
[17:53:15<21:00:52, 13.90s/it] 46%|████▌ | 4558/10000 [17:53:29<21:00:33, 13.90s/it] {'loss': 0.0377, 'learning_rate': 2.7235e-05, 'epoch': 5.97} 46%|████▌ | 4558/10000 [17:53:29<21:00:33, 13.90s/it] 46%|████▌ | 4559/10000 [17:53:43<21:01:54, 13.92s/it] {'loss': 0.0385, 'learning_rate': 2.723e-05, 'epoch': 5.97} 46%|████▌ | 4559/10000 [17:53:43<21:01:54, 13.92s/it] 46%|████▌ | 4560/10000 [17:53:56<21:01:22, 13.91s/it] {'loss': 0.0447, 'learning_rate': 2.7225e-05, 'epoch': 5.97} 46%|████▌ | 4560/10000 [17:53:56<21:01:22, 13.91s/it] 46%|████▌ | 4561/10000 [17:54:10<21:01:33, 13.92s/it] {'loss': 0.0341, 'learning_rate': 2.722e-05, 'epoch': 5.97} 46%|████▌ | 4561/10000 [17:54:10<21:01:33, 13.92s/it] 46%|████▌ | 4562/10000 [17:54:24<21:03:02, 13.94s/it] {'loss': 0.0389, 'learning_rate': 2.7215000000000003e-05, 'epoch': 5.97} 46%|████▌ | 4562/10000 [17:54:24<21:03:02, 13.94s/it] 46%|████▌ | 4563/10000 [17:54:38<21:00:20, 13.91s/it] {'loss': 0.0357, 'learning_rate': 2.7210000000000002e-05, 'epoch': 5.97} 46%|████▌ | 4563/10000 [17:54:38<21:00:20, 13.91s/it] 46%|████▌ | 4564/10000 [17:54:52<21:00:31, 13.91s/it] {'loss': 0.0434, 'learning_rate': 2.7205e-05, 'epoch': 5.97} 46%|████▌ | 4564/10000 [17:54:52<21:00:31, 13.91s/it] 46%|████▌ | 4565/10000 [17:55:06<20:58:30, 13.89s/it] {'loss': 0.0355, 'learning_rate': 2.7200000000000004e-05, 'epoch': 5.98} 46%|████▌ | 4565/10000 [17:55:06<20:58:30, 13.89s/it] 46%|████▌ | 4566/10000 [17:55:20<20:59:24, 13.91s/it] {'loss': 0.0402, 'learning_rate': 2.7195000000000003e-05, 'epoch': 5.98} 46%|████▌ | 4566/10000 [17:55:20<20:59:24, 13.91s/it] 46%|████▌ | 4567/10000 [17:55:34<21:03:10, 13.95s/it] {'loss': 0.0351, 'learning_rate': 2.719e-05, 'epoch': 5.98} 46%|████▌ | 4567/10000 [17:55:34<21:03:10, 13.95s/it] 46%|████▌ | 4568/10000 [17:55:48<20:58:32, 13.90s/it] {'loss': 0.0354, 'learning_rate': 2.7184999999999998e-05, 'epoch': 5.98} 46%|████▌ | 4568/10000 [17:55:48<20:58:32, 13.90s/it] 46%|████▌ | 4569/10000 [17:56:02<20:55:09, 
13.87s/it] {'loss': 0.0312, 'learning_rate': 2.718e-05, 'epoch': 5.98} 46%|████▌ | 4569/10000 [17:56:02<20:55:09, 13.87s/it] 46%|████▌ | 4570/10000 [17:56:16<20:59:26, 13.92s/it] {'loss': 0.0379, 'learning_rate': 2.7175e-05, 'epoch': 5.98} 46%|████▌ | 4570/10000 [17:56:16<20:59:26, 13.92s/it] 46%|████▌ | 4571/10000 [17:56:30<21:01:11, 13.94s/it] {'loss': 0.0366, 'learning_rate': 2.7170000000000002e-05, 'epoch': 5.98} 46%|████▌ | 4571/10000 [17:56:30<21:01:11, 13.94s/it] 46%|████▌ | 4572/10000 [17:56:43<21:01:19, 13.94s/it] {'loss': 0.038, 'learning_rate': 2.7165e-05, 'epoch': 5.98} 46%|████▌ | 4572/10000 [17:56:44<21:01:19, 13.94s/it] 46%|████▌ | 4573/10000 [17:56:57<21:02:42, 13.96s/it] {'loss': 0.0436, 'learning_rate': 2.716e-05, 'epoch': 5.99} 46%|████▌ | 4573/10000 [17:56:58<21:02:42, 13.96s/it] 46%|████▌ | 4574/10000 [17:57:11<20:58:56, 13.92s/it] {'loss': 0.0393, 'learning_rate': 2.7155000000000003e-05, 'epoch': 5.99} 46%|████▌ | 4574/10000 [17:57:11<20:58:56, 13.92s/it] 46%|████▌ | 4575/10000 [17:57:25<20:58:40, 13.92s/it] {'loss': 0.0341, 'learning_rate': 2.7150000000000003e-05, 'epoch': 5.99} 46%|████▌ | 4575/10000 [17:57:25<20:58:40, 13.92s/it] 46%|████▌ | 4576/10000 [17:57:39<20:57:46, 13.91s/it] {'loss': 0.0316, 'learning_rate': 2.7145000000000005e-05, 'epoch': 5.99} 46%|████▌ | 4576/10000 [17:57:39<20:57:46, 13.91s/it] 46%|████▌ | 4577/10000 [17:57:53<20:59:15, 13.93s/it] {'loss': 0.0294, 'learning_rate': 2.7139999999999998e-05, 'epoch': 5.99} 46%|████▌ | 4577/10000 [17:57:53<20:59:15, 13.93s/it] 46%|████▌ | 4578/10000 [17:58:07<20:57:48, 13.92s/it] {'loss': 0.0814, 'learning_rate': 2.7135e-05, 'epoch': 5.99} 46%|████▌ | 4578/10000 [17:58:07<20:57:48, 13.92s/it] 46%|████▌ | 4579/10000 [17:58:21<20:57:23, 13.92s/it] {'loss': 0.047, 'learning_rate': 2.713e-05, 'epoch': 5.99} 46%|████▌ | 4579/10000 [17:58:21<20:57:23, 13.92s/it] 46%|████▌ | 4580/10000 [17:58:35<20:57:46, 13.92s/it] {'loss': 0.0384, 'learning_rate': 2.7125000000000002e-05, 'epoch': 5.99} 
46%|████▌ | 4580/10000 [17:58:35<20:57:46, 13.92s/it] 46%|████▌ | 4581/10000 [17:58:49<20:54:13, 13.89s/it] {'loss': 0.0269, 'learning_rate': 2.712e-05, 'epoch': 6.0} 46%|████▌ | 4581/10000 [17:58:49<20:54:13, 13.89s/it] 46%|████▌ | 4582/10000 [17:59:03<20:53:46, 13.88s/it] {'loss': 0.0371, 'learning_rate': 2.7115e-05, 'epoch': 6.0} 46%|████▌ | 4582/10000 [17:59:03<20:53:46, 13.88s/it] 46%|████▌ | 4583/10000 [17:59:16<20:54:23, 13.89s/it] {'loss': 0.0374, 'learning_rate': 2.7110000000000003e-05, 'epoch': 6.0} 46%|████▌ | 4583/10000 [17:59:17<20:54:23, 13.89s/it] 46%|████▌ | 4584/10000 [17:59:29<20:21:31, 13.53s/it] {'loss': 0.0397, 'learning_rate': 2.7105000000000002e-05, 'epoch': 6.0} 46%|████▌ | 4584/10000 [17:59:29<20:21:31, 13.53s/it] 46%|████▌ | 4585/10000 [17:59:43<20:36:32, 13.70s/it] {'loss': 0.0238, 'learning_rate': 2.7100000000000005e-05, 'epoch': 6.0} 46%|████▌ | 4585/10000 [17:59:43<20:36:32, 13.70s/it] 46%|████▌ | 4586/10000 [17:59:57<20:42:44, 13.77s/it] {'loss': 0.0227, 'learning_rate': 2.7095000000000004e-05, 'epoch': 6.0} 46%|████▌ | 4586/10000 [17:59:57<20:42:44, 13.77s/it] 46%|████▌ | 4587/10000 [18:00:11<20:44:49, 13.80s/it] {'loss': 0.0193, 'learning_rate': 2.709e-05, 'epoch': 6.0} 46%|████▌ | 4587/10000 [18:00:11<20:44:49, 13.80s/it] 46%|████▌ | 4588/10000 [18:00:25<20:49:20, 13.85s/it] {'loss': 0.0243, 'learning_rate': 2.7085e-05, 'epoch': 6.01} 46%|████▌ | 4588/10000 [18:00:25<20:49:20, 13.85s/it] 46%|████▌ | 4589/10000 [18:00:39<20:49:43, 13.86s/it] {'loss': 0.0221, 'learning_rate': 2.7079999999999998e-05, 'epoch': 6.01} 46%|████▌ | 4589/10000 [18:00:39<20:49:43, 13.86s/it] 46%|████▌ | 4590/10000 [18:00:53<20:52:30, 13.89s/it] {'loss': 0.0243, 'learning_rate': 2.7075e-05, 'epoch': 6.01} 46%|████▌ | 4590/10000 [18:00:53<20:52:30, 13.89s/it] 46%|████▌ | 4591/10000 [18:01:07<20:53:49, 13.91s/it] {'loss': 0.0216, 'learning_rate': 2.707e-05, 'epoch': 6.01} 46%|████▌ | 4591/10000 [18:01:07<20:53:49, 13.91s/it] 46%|████▌ | 4592/10000 
[18:01:21<20:58:59, 13.97s/it] {'loss': 0.0234, 'learning_rate': 2.7065000000000003e-05, 'epoch': 6.01} 46%|████▌ | 4592/10000 [18:01:21<20:58:59, 13.97s/it] 46%|████▌ | 4593/10000 [18:01:35<20:58:24, 13.96s/it] {'loss': 0.0243, 'learning_rate': 2.7060000000000002e-05, 'epoch': 6.01} 46%|████▌ | 4593/10000 [18:01:35<20:58:24, 13.96s/it] 46%|████▌ | 4594/10000 [18:01:49<20:55:34, 13.94s/it] {'loss': 0.0193, 'learning_rate': 2.7055e-05, 'epoch': 6.01} 46%|████▌ | 4594/10000 [18:01:49<20:55:34, 13.94s/it] 46%|████▌ | 4595/10000 [18:02:03<20:54:49, 13.93s/it] {'loss': 0.0178, 'learning_rate': 2.7050000000000004e-05, 'epoch': 6.01} 46%|████▌ | 4595/10000 [18:02:03<20:54:49, 13.93s/it] 46%|████▌ | 4596/10000 [18:02:17<20:55:53, 13.94s/it] {'loss': 0.0215, 'learning_rate': 2.7045000000000003e-05, 'epoch': 6.02} 46%|████▌ | 4596/10000 [18:02:17<20:55:53, 13.94s/it] 46%|████▌ | 4597/10000 [18:02:30<20:52:32, 13.91s/it] {'loss': 0.0283, 'learning_rate': 2.704e-05, 'epoch': 6.02} 46%|████▌ | 4597/10000 [18:02:30<20:52:32, 13.91s/it] 46%|████▌ | 4598/10000 [18:02:44<20:51:46, 13.90s/it] {'loss': 0.0218, 'learning_rate': 2.7034999999999998e-05, 'epoch': 6.02} 46%|████▌ | 4598/10000 [18:02:44<20:51:46, 13.90s/it] 46%|████▌ | 4599/10000 [18:02:58<20:53:29, 13.93s/it] {'loss': 0.0211, 'learning_rate': 2.703e-05, 'epoch': 6.02} 46%|████▌ | 4599/10000 [18:02:58<20:53:29, 13.93s/it] 46%|████▌ | 4600/10000 [18:03:12<20:56:41, 13.96s/it] {'loss': 0.0211, 'learning_rate': 2.7025e-05, 'epoch': 6.02} 46%|████▌ | 4600/10000 [18:03:12<20:56:41, 13.96s/it] 46%|████▌ | 4601/10000 [18:03:26<20:57:54, 13.98s/it] {'loss': 0.0191, 'learning_rate': 2.7020000000000002e-05, 'epoch': 6.02} 46%|████▌ | 4601/10000 [18:03:26<20:57:54, 13.98s/it] 46%|████▌ | 4602/10000 [18:03:40<20:58:03, 13.98s/it] {'loss': 0.0222, 'learning_rate': 2.7015e-05, 'epoch': 6.02} 46%|████▌ | 4602/10000 [18:03:40<20:58:03, 13.98s/it] 46%|████▌ | 4603/10000 [18:03:54<20:59:29, 14.00s/it] {'loss': 0.0236, 'learning_rate': 
2.701e-05, 'epoch': 6.02} 46%|████▌ | 4603/10000 [18:03:54<20:59:29, 14.00s/it] 46%|████▌ | 4604/10000 [18:04:08<20:54:49, 13.95s/it] {'loss': 0.0179, 'learning_rate': 2.7005000000000003e-05, 'epoch': 6.03} 46%|████▌ | 4604/10000 [18:04:08<20:54:49, 13.95s/it] 46%|████▌ | 4605/10000 [18:04:22<20:55:10, 13.96s/it] {'loss': 0.023, 'learning_rate': 2.7000000000000002e-05, 'epoch': 6.03} 46%|████▌ | 4605/10000 [18:04:22<20:55:10, 13.96s/it] 46%|████▌ | 4606/10000 [18:04:36<20:52:36, 13.93s/it] {'loss': 0.0197, 'learning_rate': 2.6995000000000005e-05, 'epoch': 6.03} 46%|████▌ | 4606/10000 [18:04:36<20:52:36, 13.93s/it] 46%|████▌ | 4607/10000 [18:04:50<20:51:09, 13.92s/it] {'loss': 0.0185, 'learning_rate': 2.6989999999999997e-05, 'epoch': 6.03} 46%|████▌ | 4607/10000 [18:04:50<20:51:09, 13.92s/it] 46%|████▌ | 4608/10000 [18:05:04<20:48:57, 13.90s/it] {'loss': 0.0175, 'learning_rate': 2.6985e-05, 'epoch': 6.03} 46%|████▌ | 4608/10000 [18:05:04<20:48:57, 13.90s/it] 46%|████▌ | 4609/10000 [18:05:18<20:48:37, 13.90s/it] {'loss': 0.0266, 'learning_rate': 2.698e-05, 'epoch': 6.03} 46%|████▌ | 4609/10000 [18:05:18<20:48:37, 13.90s/it] 46%|████▌ | 4610/10000 [18:05:32<20:52:05, 13.94s/it] {'loss': 0.0173, 'learning_rate': 2.6975000000000002e-05, 'epoch': 6.03} 46%|████▌ | 4610/10000 [18:05:32<20:52:05, 13.94s/it] 46%|████▌ | 4611/10000 [18:05:46<20:49:25, 13.91s/it] {'loss': 0.0224, 'learning_rate': 2.697e-05, 'epoch': 6.04} 46%|████▌ | 4611/10000 [18:05:46<20:49:25, 13.91s/it] 46%|████▌ | 4612/10000 [18:06:00<20:51:23, 13.94s/it] {'loss': 0.0198, 'learning_rate': 2.6965e-05, 'epoch': 6.04} 46%|████▌ | 4612/10000 [18:06:00<20:51:23, 13.94s/it] 46%|████▌ | 4613/10000 [18:06:14<20:51:46, 13.94s/it] {'loss': 0.0215, 'learning_rate': 2.6960000000000003e-05, 'epoch': 6.04} 46%|████▌ | 4613/10000 [18:06:14<20:51:46, 13.94s/it] 46%|████▌ | 4614/10000 [18:06:28<20:51:38, 13.94s/it] {'loss': 0.0229, 'learning_rate': 2.6955000000000002e-05, 'epoch': 6.04} 46%|████▌ | 4614/10000 
[18:06:28<20:51:38, 13.94s/it] 46%|████▌ | 4615/10000 [18:06:42<20:55:18, 13.99s/it] {'loss': 0.0225, 'learning_rate': 2.6950000000000005e-05, 'epoch': 6.04} 46%|████▌ | 4615/10000 [18:06:42<20:55:18, 13.99s/it] 46%|████▌ | 4616/10000 [18:06:56<20:55:06, 13.99s/it] {'loss': 0.0207, 'learning_rate': 2.6945000000000004e-05, 'epoch': 6.04} 46%|████▌ | 4616/10000 [18:06:56<20:55:06, 13.99s/it] 46%|████▌ | 4617/10000 [18:07:10<20:55:32, 13.99s/it] {'loss': 0.0178, 'learning_rate': 2.694e-05, 'epoch': 6.04} 46%|████▌ | 4617/10000 [18:07:10<20:55:32, 13.99s/it] 46%|████▌ | 4618/10000 [18:07:24<20:52:46, 13.97s/it] {'loss': 0.0189, 'learning_rate': 2.6935e-05, 'epoch': 6.04} 46%|████▌ | 4618/10000 [18:07:24<20:52:46, 13.97s/it] 46%|████▌ | 4619/10000 [18:07:37<20:53:07, 13.97s/it] {'loss': 0.0221, 'learning_rate': 2.693e-05, 'epoch': 6.05} 46%|████▌ | 4619/10000 [18:07:38<20:53:07, 13.97s/it] 46%|████▌ | 4620/10000 [18:07:51<20:51:25, 13.96s/it] {'loss': 0.0215, 'learning_rate': 2.6925e-05, 'epoch': 6.05} 46%|████▌ | 4620/10000 [18:07:51<20:51:25, 13.96s/it] 46%|████▌ | 4621/10000 [18:08:05<20:48:00, 13.92s/it] {'loss': 0.0201, 'learning_rate': 2.692e-05, 'epoch': 6.05} 46%|████▌ | 4621/10000 [18:08:05<20:48:00, 13.92s/it] 46%|████▌ | 4622/10000 [18:08:19<20:48:10, 13.93s/it] {'loss': 0.0181, 'learning_rate': 2.6915000000000002e-05, 'epoch': 6.05} 46%|████▌ | 4622/10000 [18:08:19<20:48:10, 13.93s/it] 46%|████▌ | 4623/10000 [18:08:33<20:47:16, 13.92s/it] {'loss': 0.019, 'learning_rate': 2.691e-05, 'epoch': 6.05} 46%|████▌ | 4623/10000 [18:08:33<20:47:16, 13.92s/it] 46%|████▌ | 4624/10000 [18:08:47<20:44:26, 13.89s/it] {'loss': 0.0186, 'learning_rate': 2.6905e-05, 'epoch': 6.05} 46%|████▌ | 4624/10000 [18:08:47<20:44:26, 13.89s/it] 46%|████▋ | 4625/10000 [18:09:01<20:43:20, 13.88s/it] {'loss': 0.0194, 'learning_rate': 2.6900000000000003e-05, 'epoch': 6.05} 46%|████▋ | 4625/10000 [18:09:01<20:43:20, 13.88s/it] 46%|████▋ | 4626/10000 [18:09:15<20:43:04, 13.88s/it] {'loss': 
0.019, 'learning_rate': 2.6895000000000003e-05, 'epoch': 6.05} 46%|████▋ | 4626/10000 [18:09:15<20:43:04, 13.88s/it] 46%|████▋ | 4627/10000 [18:09:29<20:43:37, 13.89s/it] {'loss': 0.023, 'learning_rate': 2.689e-05, 'epoch': 6.06} 46%|████▋ | 4627/10000 [18:09:29<20:43:37, 13.89s/it] 46%|████▋ | 4628/10000 [18:09:43<20:45:26, 13.91s/it] {'loss': 0.0253, 'learning_rate': 2.6884999999999998e-05, 'epoch': 6.06} 46%|████▋ | 4628/10000 [18:09:43<20:45:26, 13.91s/it] 46%|████▋ | 4629/10000 [18:09:56<20:46:45, 13.93s/it] {'loss': 0.0166, 'learning_rate': 2.688e-05, 'epoch': 6.06} 46%|████▋ | 4629/10000 [18:09:57<20:46:45, 13.93s/it] 46%|████▋ | 4630/10000 [18:10:10<20:43:57, 13.90s/it] {'loss': 0.0191, 'learning_rate': 2.6875e-05, 'epoch': 6.06} 46%|████▋ | 4630/10000 [18:10:10<20:43:57, 13.90s/it] 46%|████▋ | 4631/10000 [18:10:24<20:45:56, 13.92s/it] {'loss': 0.0173, 'learning_rate': 2.6870000000000002e-05, 'epoch': 6.06} 46%|████▋ | 4631/10000 [18:10:24<20:45:56, 13.92s/it] 46%|████▋ | 4632/10000 [18:10:38<20:45:28, 13.92s/it] {'loss': 0.0198, 'learning_rate': 2.6865e-05, 'epoch': 6.06} 46%|████▋ | 4632/10000 [18:10:38<20:45:28, 13.92s/it] 46%|████▋ | 4633/10000 [18:10:52<20:43:31, 13.90s/it] {'loss': 0.0133, 'learning_rate': 2.686e-05, 'epoch': 6.06} 46%|████▋ | 4633/10000 [18:10:52<20:43:31, 13.90s/it] 46%|████▋ | 4634/10000 [18:11:06<20:45:05, 13.92s/it] {'loss': 0.0166, 'learning_rate': 2.6855000000000003e-05, 'epoch': 6.07} 46%|████▋ | 4634/10000 [18:11:06<20:45:05, 13.92s/it] 46%|████▋ | 4635/10000 [18:11:20<20:41:44, 13.89s/it] {'loss': 0.0166, 'learning_rate': 2.6850000000000002e-05, 'epoch': 6.07} 46%|████▋ | 4635/10000 [18:11:20<20:41:44, 13.89s/it] 46%|████▋ | 4636/10000 [18:11:34<20:46:32, 13.94s/it] {'loss': 0.0177, 'learning_rate': 2.6845000000000005e-05, 'epoch': 6.07} 46%|████▋ | 4636/10000 [18:11:34<20:46:32, 13.94s/it] 46%|████▋ | 4637/10000 [18:11:48<20:44:33, 13.92s/it] {'loss': 0.0212, 'learning_rate': 2.6840000000000004e-05, 'epoch': 6.07} 46%|████▋ 
| 4637/10000 [18:11:48<20:44:33, 13.92s/it] 46%|████▋ | 4638/10000 [18:12:02<20:46:52, 13.95s/it] {'loss': 0.017, 'learning_rate': 2.6835e-05, 'epoch': 6.07} 46%|████▋ | 4638/10000 [18:12:02<20:46:52, 13.95s/it] 46%|████▋ | 4639/10000 [18:12:16<20:50:30, 14.00s/it] {'loss': 0.0179, 'learning_rate': 2.683e-05, 'epoch': 6.07} 46%|████▋ | 4639/10000 [18:12:16<20:50:30, 14.00s/it] 46%|████▋ | 4640/10000 [18:12:30<20:48:11, 13.97s/it] {'loss': 0.0142, 'learning_rate': 2.6825e-05, 'epoch': 6.07} 46%|████▋ | 4640/10000 [18:12:30<20:48:11, 13.97s/it] 46%|████▋ | 4641/10000 [18:12:44<20:42:52, 13.92s/it] {'loss': 0.0249, 'learning_rate': 2.682e-05, 'epoch': 6.07} 46%|████▋ | 4641/10000 [18:12:44<20:42:52, 13.92s/it] 46%|████▋ | 4642/10000 [18:12:57<20:41:14, 13.90s/it] {'loss': 0.0191, 'learning_rate': 2.6815e-05, 'epoch': 6.08} 46%|████▋ | 4642/10000 [18:12:58<20:41:14, 13.90s/it] 46%|████▋ | 4643/10000 [18:13:11<20:41:00, 13.90s/it] {'loss': 0.0178, 'learning_rate': 2.6810000000000003e-05, 'epoch': 6.08} 46%|████▋ | 4643/10000 [18:13:11<20:41:00, 13.90s/it] 46%|████▋ | 4644/10000 [18:13:25<20:38:12, 13.87s/it] {'loss': 0.0203, 'learning_rate': 2.6805000000000002e-05, 'epoch': 6.08} 46%|████▋ | 4644/10000 [18:13:25<20:38:12, 13.87s/it] 46%|████▋ | 4645/10000 [18:13:39<20:39:04, 13.88s/it] {'loss': 0.0181, 'learning_rate': 2.6800000000000004e-05, 'epoch': 6.08} 46%|████▋ | 4645/10000 [18:13:39<20:39:04, 13.88s/it] 46%|████▋ | 4646/10000 [18:13:53<20:39:37, 13.89s/it] {'loss': 0.0185, 'learning_rate': 2.6795000000000003e-05, 'epoch': 6.08} 46%|████▋ | 4646/10000 [18:13:53<20:39:37, 13.89s/it] 46%|████▋ | 4647/10000 [18:14:07<20:43:11, 13.93s/it] {'loss': 0.0223, 'learning_rate': 2.6790000000000003e-05, 'epoch': 6.08} 46%|████▋ | 4647/10000 [18:14:07<20:43:11, 13.93s/it] 46%|████▋ | 4648/10000 [18:14:21<20:39:43, 13.90s/it] {'loss': 0.0177, 'learning_rate': 2.6785e-05, 'epoch': 6.08} 46%|████▋ | 4648/10000 [18:14:21<20:39:43, 13.90s/it] 46%|████▋ | 4649/10000 
[18:14:35<20:40:25, 13.91s/it] {'loss': 0.0237, 'learning_rate': 2.678e-05, 'epoch': 6.09} 46%|████▋ | 4649/10000 [18:14:35<20:40:25, 13.91s/it] 46%|████▋ | 4650/10000 [18:14:49<20:36:19, 13.87s/it] {'loss': 0.0192, 'learning_rate': 2.6775e-05, 'epoch': 6.09} 46%|████▋ | 4650/10000 [18:14:49<20:36:19, 13.87s/it] 47%|████▋ | 4651/10000 [18:15:02<20:36:39, 13.87s/it] {'loss': 0.0219, 'learning_rate': 2.677e-05, 'epoch': 6.09} 47%|████▋ | 4651/10000 [18:15:02<20:36:39, 13.87s/it] 47%|████▋ | 4652/10000 [18:15:16<20:37:44, 13.89s/it] {'loss': 0.0202, 'learning_rate': 2.6765000000000002e-05, 'epoch': 6.09} 47%|████▋ | 4652/10000 [18:15:16<20:37:44, 13.89s/it] 47%|████▋ | 4653/10000 [18:15:30<20:37:45, 13.89s/it] {'loss': 0.0191, 'learning_rate': 2.676e-05, 'epoch': 6.09} 47%|████▋ | 4653/10000 [18:15:30<20:37:45, 13.89s/it] 47%|████▋ | 4654/10000 [18:15:44<20:40:34, 13.92s/it] {'loss': 0.0264, 'learning_rate': 2.6755000000000004e-05, 'epoch': 6.09} 47%|████▋ | 4654/10000 [18:15:44<20:40:34, 13.92s/it] 47%|████▋ | 4655/10000 [18:15:58<20:37:13, 13.89s/it] {'loss': 0.0177, 'learning_rate': 2.6750000000000003e-05, 'epoch': 6.09} 47%|████▋ | 4655/10000 [18:15:58<20:37:13, 13.89s/it] 47%|████▋ | 4656/10000 [18:16:12<20:33:56, 13.85s/it] {'loss': 0.0171, 'learning_rate': 2.6745000000000002e-05, 'epoch': 6.09} 47%|████▋ | 4656/10000 [18:16:12<20:33:56, 13.85s/it] 47%|████▋ | 4657/10000 [18:16:26<20:34:32, 13.86s/it] {'loss': 0.0157, 'learning_rate': 2.6740000000000005e-05, 'epoch': 6.1} 47%|████▋ | 4657/10000 [18:16:26<20:34:32, 13.86s/it] 47%|████▋ | 4658/10000 [18:16:40<20:33:15, 13.85s/it] {'loss': 0.0203, 'learning_rate': 2.6734999999999997e-05, 'epoch': 6.1} 47%|████▋ | 4658/10000 [18:16:40<20:33:15, 13.85s/it] 47%|████▋ | 4659/10000 [18:16:53<20:34:31, 13.87s/it] {'loss': 0.0205, 'learning_rate': 2.673e-05, 'epoch': 6.1} 47%|████▋ | 4659/10000 [18:16:54<20:34:31, 13.87s/it] 47%|████▋ | 4660/10000 [18:17:07<20:34:35, 13.87s/it] {'loss': 0.0207, 'learning_rate': 
2.6725e-05, 'epoch': 6.1} 47%|████▋ | 4660/10000 [18:17:07<20:34:35, 13.87s/it] 47%|████▋ | 4661/10000 [18:17:21<20:37:07, 13.90s/it] {'loss': 0.021, 'learning_rate': 2.672e-05, 'epoch': 6.1} 47%|████▋ | 4661/10000 [18:17:21<20:37:07, 13.90s/it] 47%|████▋ | 4662/10000 [18:17:35<20:38:31, 13.92s/it] {'loss': 0.0209, 'learning_rate': 2.6715e-05, 'epoch': 6.1} 47%|████▋ | 4662/10000 [18:17:35<20:38:31, 13.92s/it] 47%|████▋ | 4663/10000 [18:17:49<20:35:30, 13.89s/it] {'loss': 0.0204, 'learning_rate': 2.671e-05, 'epoch': 6.1} 47%|████▋ | 4663/10000 [18:17:49<20:35:30, 13.89s/it] 47%|████▋ | 4664/10000 [18:18:03<20:36:27, 13.90s/it] {'loss': 0.0183, 'learning_rate': 2.6705000000000003e-05, 'epoch': 6.1} 47%|████▋ | 4664/10000 [18:18:03<20:36:27, 13.90s/it] 47%|████▋ | 4665/10000 [18:18:17<20:37:01, 13.91s/it] {'loss': 0.0221, 'learning_rate': 2.6700000000000002e-05, 'epoch': 6.11} 47%|████▋ | 4665/10000 [18:18:17<20:37:01, 13.91s/it] 47%|████▋ | 4666/10000 [18:18:31<20:38:03, 13.93s/it] {'loss': 0.0178, 'learning_rate': 2.6695000000000004e-05, 'epoch': 6.11} 47%|████▋ | 4666/10000 [18:18:31<20:38:03, 13.93s/it] 47%|████▋ | 4667/10000 [18:18:45<20:36:10, 13.91s/it] {'loss': 0.019, 'learning_rate': 2.6690000000000004e-05, 'epoch': 6.11} 47%|████▋ | 4667/10000 [18:18:45<20:36:10, 13.91s/it] 47%|████▋ | 4668/10000 [18:18:59<20:37:12, 13.92s/it] {'loss': 0.0209, 'learning_rate': 2.6685e-05, 'epoch': 6.11} 47%|████▋ | 4668/10000 [18:18:59<20:37:12, 13.92s/it] 47%|████▋ | 4669/10000 [18:19:13<20:36:12, 13.91s/it] {'loss': 0.0199, 'learning_rate': 2.668e-05, 'epoch': 6.11} 47%|████▋ | 4669/10000 [18:19:13<20:36:12, 13.91s/it] 47%|████▋ | 4670/10000 [18:19:27<20:37:42, 13.93s/it] {'loss': 0.0232, 'learning_rate': 2.6675e-05, 'epoch': 6.11} 47%|████▋ | 4670/10000 [18:19:27<20:37:42, 13.93s/it] 47%|████▋ | 4671/10000 [18:19:41<20:39:10, 13.95s/it] {'loss': 0.0204, 'learning_rate': 2.667e-05, 'epoch': 6.11} 47%|████▋ | 4671/10000 [18:19:41<20:39:10, 13.95s/it] 47%|████▋ | 4672/10000 
[18:19:54<20:34:08, 13.90s/it] {'loss': 0.0181, 'learning_rate': 2.6665e-05, 'epoch': 6.12} 47%|████▋ | 4672/10000 [18:19:54<20:34:08, 13.90s/it] 47%|████▋ | 4673/10000 [18:20:08<20:37:09, 13.93s/it] {'loss': 0.0211, 'learning_rate': 2.6660000000000002e-05, 'epoch': 6.12} 47%|████▋ | 4673/10000 [18:20:08<20:37:09, 13.93s/it] 47%|████▋ | 4674/10000 [18:20:22<20:40:15, 13.97s/it] {'loss': 0.0191, 'learning_rate': 2.6655e-05, 'epoch': 6.12} 47%|████▋ | 4674/10000 [18:20:22<20:40:15, 13.97s/it] 47%|████▋ | 4675/10000 [18:20:36<20:38:09, 13.95s/it] {'loss': 0.0225, 'learning_rate': 2.6650000000000004e-05, 'epoch': 6.12} 47%|████▋ | 4675/10000 [18:20:36<20:38:09, 13.95s/it] 47%|████▋ | 4676/10000 [18:20:50<20:35:53, 13.93s/it] {'loss': 0.0224, 'learning_rate': 2.6645000000000003e-05, 'epoch': 6.12} 47%|████▋ | 4676/10000 [18:20:50<20:35:53, 13.93s/it] 47%|████▋ | 4677/10000 [18:21:04<20:34:16, 13.91s/it] {'loss': 0.0199, 'learning_rate': 2.6640000000000002e-05, 'epoch': 6.12} 47%|████▋ | 4677/10000 [18:21:04<20:34:16, 13.91s/it] 47%|████▋ | 4678/10000 [18:21:18<20:31:39, 13.89s/it] {'loss': 0.015, 'learning_rate': 2.6634999999999998e-05, 'epoch': 6.12} 47%|████▋ | 4678/10000 [18:21:18<20:31:39, 13.89s/it] 47%|████▋ | 4679/10000 [18:21:32<20:32:47, 13.90s/it] {'loss': 0.0192, 'learning_rate': 2.663e-05, 'epoch': 6.12} 47%|████▋ | 4679/10000 [18:21:32<20:32:47, 13.90s/it] 47%|████▋ | 4680/10000 [18:21:46<20:33:17, 13.91s/it] {'loss': 0.0227, 'learning_rate': 2.6625e-05, 'epoch': 6.13} 47%|████▋ | 4680/10000 [18:21:46<20:33:17, 13.91s/it] 47%|████▋ | 4681/10000 [18:22:00<20:31:30, 13.89s/it] {'loss': 0.0175, 'learning_rate': 2.662e-05, 'epoch': 6.13} 47%|████▋ | 4681/10000 [18:22:00<20:31:30, 13.89s/it] 47%|████▋ | 4682/10000 [18:22:14<20:31:42, 13.90s/it] {'loss': 0.0225, 'learning_rate': 2.6615000000000002e-05, 'epoch': 6.13} 47%|████▋ | 4682/10000 [18:22:14<20:31:42, 13.90s/it] 47%|████▋ | 4683/10000 [18:22:27<20:30:30, 13.89s/it] {'loss': 0.0232, 'learning_rate': 
2.661e-05, 'epoch': 6.13} 47%|████▋ | 4683/10000 [18:22:27<20:30:30, 13.89s/it] 47%|████▋ | 4684/10000 [18:22:41<20:31:23, 13.90s/it] {'loss': 0.0222, 'learning_rate': 2.6605000000000004e-05, 'epoch': 6.13} 47%|████▋ | 4684/10000 [18:22:41<20:31:23, 13.90s/it] 47%|████▋ | 4685/10000 [18:22:55<20:30:59, 13.90s/it] {'loss': 0.0214, 'learning_rate': 2.6600000000000003e-05, 'epoch': 6.13} 47%|████▋ | 4685/10000 [18:22:55<20:30:59, 13.90s/it] 47%|████▋ | 4686/10000 [18:23:09<20:33:11, 13.92s/it] {'loss': 0.0253, 'learning_rate': 2.6595000000000002e-05, 'epoch': 6.13} 47%|████▋ | 4686/10000 [18:23:09<20:33:11, 13.92s/it] 47%|████▋ | 4687/10000 [18:23:23<20:35:18, 13.95s/it] {'loss': 0.0189, 'learning_rate': 2.6590000000000005e-05, 'epoch': 6.13} 47%|████▋ | 4687/10000 [18:23:23<20:35:18, 13.95s/it] 47%|████▋ | 4688/10000 [18:23:37<20:33:05, 13.93s/it] {'loss': 0.0166, 'learning_rate': 2.6585e-05, 'epoch': 6.14} 47%|████▋ | 4688/10000 [18:23:37<20:33:05, 13.93s/it] 47%|████▋ | 4689/10000 [18:23:51<20:33:30, 13.94s/it] {'loss': 0.0186, 'learning_rate': 2.658e-05, 'epoch': 6.14} 47%|████▋ | 4689/10000 [18:23:51<20:33:30, 13.94s/it] 47%|████▋ | 4690/10000 [18:24:05<20:31:09, 13.91s/it] {'loss': 0.0177, 'learning_rate': 2.6575e-05, 'epoch': 6.14} 47%|████▋ | 4690/10000 [18:24:05<20:31:09, 13.91s/it] 47%|████▋ | 4691/10000 [18:24:19<20:34:07, 13.95s/it] {'loss': 0.0253, 'learning_rate': 2.657e-05, 'epoch': 6.14} 47%|████▋ | 4691/10000 [18:24:19<20:34:07, 13.95s/it] 47%|████▋ | 4692/10000 [18:24:33<20:35:01, 13.96s/it] {'loss': 0.019, 'learning_rate': 2.6565e-05, 'epoch': 6.14} 47%|████▋ | 4692/10000 [18:24:33<20:35:01, 13.96s/it] 47%|████▋ | 4693/10000 [18:24:47<20:31:59, 13.93s/it] {'loss': 0.0176, 'learning_rate': 2.6560000000000003e-05, 'epoch': 6.14} 47%|████▋ | 4693/10000 [18:24:47<20:31:59, 13.93s/it] 47%|████▋ | 4694/10000 [18:25:01<20:31:05, 13.92s/it] {'loss': 0.0206, 'learning_rate': 2.6555000000000002e-05, 'epoch': 6.14} 47%|████▋ | 4694/10000 [18:25:01<20:31:05, 
13.92s/it] 47%|████▋ | 4695/10000 [18:25:15<20:32:50, 13.94s/it] {'loss': 0.0186, 'learning_rate': 2.655e-05, 'epoch': 6.15} 47%|████▋ | 4695/10000 [18:25:15<20:32:50, 13.94s/it] 47%|████▋ | 4696/10000 [18:25:29<20:32:18, 13.94s/it] {'loss': 0.0227, 'learning_rate': 2.6545000000000004e-05, 'epoch': 6.15} 47%|████▋ | 4696/10000 [18:25:29<20:32:18, 13.94s/it] 47%|████▋ | 4697/10000 [18:25:43<20:32:53, 13.95s/it] {'loss': 0.0176, 'learning_rate': 2.6540000000000003e-05, 'epoch': 6.15} 47%|████▋ | 4697/10000 [18:25:43<20:32:53, 13.95s/it] 47%|████▋ | 4698/10000 [18:25:56<20:29:53, 13.92s/it] {'loss': 0.0235, 'learning_rate': 2.6535e-05, 'epoch': 6.15} 47%|████▋ | 4698/10000 [18:25:56<20:29:53, 13.92s/it] 47%|████▋ | 4699/10000 [18:26:10<20:27:17, 13.89s/it] {'loss': 0.0268, 'learning_rate': 2.653e-05, 'epoch': 6.15} 47%|████▋ | 4699/10000 [18:26:10<20:27:17, 13.89s/it] 47%|████▋ | 4700/10000 [18:26:24<20:27:35, 13.90s/it] {'loss': 0.0184, 'learning_rate': 2.6525e-05, 'epoch': 6.15} 47%|████▋ | 4700/10000 [18:26:24<20:27:35, 13.90s/it] 47%|████▋ | 4701/10000 [18:26:38<20:24:10, 13.86s/it] {'loss': 0.0206, 'learning_rate': 2.652e-05, 'epoch': 6.15} 47%|████▋ | 4701/10000 [18:26:38<20:24:10, 13.86s/it] 47%|████▋ | 4702/10000 [18:26:52<20:25:28, 13.88s/it] {'loss': 0.0213, 'learning_rate': 2.6515e-05, 'epoch': 6.15} 47%|████▋ | 4702/10000 [18:26:52<20:25:28, 13.88s/it] 47%|████▋ | 4703/10000 [18:27:06<20:30:00, 13.93s/it] {'loss': 0.0238, 'learning_rate': 2.6510000000000002e-05, 'epoch': 6.16} 47%|████▋ | 4703/10000 [18:27:06<20:30:00, 13.93s/it] 47%|████▋ | 4704/10000 [18:27:20<20:29:48, 13.93s/it] {'loss': 0.0216, 'learning_rate': 2.6505e-05, 'epoch': 6.16} 47%|████▋ | 4704/10000 [18:27:20<20:29:48, 13.93s/it] 47%|████▋ | 4705/10000 [18:27:34<20:32:24, 13.96s/it] {'loss': 0.0194, 'learning_rate': 2.6500000000000004e-05, 'epoch': 6.16} 47%|████▋ | 4705/10000 [18:27:34<20:32:24, 13.96s/it] 47%|████▋ | 4706/10000 [18:27:48<20:27:29, 13.91s/it] {'loss': 0.0293, 
'learning_rate': 2.6495000000000003e-05, 'epoch': 6.16} 47%|████▋ | 4706/10000 [18:27:48<20:27:29, 13.91s/it] 47%|████▋ | 4707/10000 [18:28:02<20:25:46, 13.90s/it] {'loss': 0.0169, 'learning_rate': 2.6490000000000002e-05, 'epoch': 6.16} 47%|████▋ | 4707/10000 [18:28:02<20:25:46, 13.90s/it] 47%|████▋ | 4708/10000 [18:28:16<20:26:57, 13.91s/it] {'loss': 0.0203, 'learning_rate': 2.6484999999999998e-05, 'epoch': 6.16} 47%|████▋ | 4708/10000 [18:28:16<20:26:57, 13.91s/it] 47%|████▋ | 4709/10000 [18:28:29<20:28:56, 13.94s/it] {'loss': 0.0195, 'learning_rate': 2.648e-05, 'epoch': 6.16} 47%|████▋ | 4709/10000 [18:28:30<20:28:56, 13.94s/it] 47%|████▋ | 4710/10000 [18:28:43<20:30:18, 13.95s/it] {'loss': 0.0193, 'learning_rate': 2.6475e-05, 'epoch': 6.16} 47%|████▋ | 4710/10000 [18:28:44<20:30:18, 13.95s/it] 47%|████▋ | 4711/10000 [18:28:57<20:29:55, 13.95s/it] {'loss': 0.0205, 'learning_rate': 2.647e-05, 'epoch': 6.17} 47%|████▋ | 4711/10000 [18:28:57<20:29:55, 13.95s/it] 47%|████▋ | 4712/10000 [18:29:11<20:28:10, 13.94s/it] {'loss': 0.0219, 'learning_rate': 2.6465e-05, 'epoch': 6.17} 47%|████▋ | 4712/10000 [18:29:11<20:28:10, 13.94s/it] 47%|████▋ | 4713/10000 [18:29:25<20:28:04, 13.94s/it] {'loss': 0.0183, 'learning_rate': 2.646e-05, 'epoch': 6.17} 47%|████▋ | 4713/10000 [18:29:25<20:28:04, 13.94s/it] 47%|████▋ | 4714/10000 [18:29:39<20:29:19, 13.95s/it] {'loss': 0.0214, 'learning_rate': 2.6455000000000003e-05, 'epoch': 6.17} 47%|████▋ | 4714/10000 [18:29:39<20:29:19, 13.95s/it] 47%|████▋ | 4715/10000 [18:29:53<20:28:21, 13.95s/it] {'loss': 0.0252, 'learning_rate': 2.6450000000000003e-05, 'epoch': 6.17} 47%|████▋ | 4715/10000 [18:29:53<20:28:21, 13.95s/it] 47%|████▋ | 4716/10000 [18:30:07<20:26:18, 13.92s/it] {'loss': 0.0167, 'learning_rate': 2.6445000000000002e-05, 'epoch': 6.17} 47%|████▋ | 4716/10000 [18:30:07<20:26:18, 13.92s/it] 47%|████▋ | 4717/10000 [18:30:21<20:27:15, 13.94s/it] {'loss': 0.0188, 'learning_rate': 2.6440000000000004e-05, 'epoch': 6.17} 47%|████▋ | 
4717/10000 [18:30:21<20:27:15, 13.94s/it] 47%|████▋ | 4718/10000 [18:30:35<20:28:31, 13.96s/it] {'loss': 0.0236, 'learning_rate': 2.6435e-05, 'epoch': 6.18} 47%|████▋ | 4718/10000 [18:30:35<20:28:31, 13.96s/it] 47%|████▋ | 4719/10000 [18:30:49<20:29:51, 13.97s/it] {'loss': 0.0211, 'learning_rate': 2.643e-05, 'epoch': 6.18} 47%|████▋ | 4719/10000 [18:30:49<20:29:51, 13.97s/it] 47%|████▋ | 4720/10000 [18:31:03<20:23:43, 13.91s/it] {'loss': 0.0177, 'learning_rate': 2.6425e-05, 'epoch': 6.18} 47%|████▋ | 4720/10000 [18:31:03<20:23:43, 13.91s/it] 47%|████▋ | 4721/10000 [18:31:17<20:23:54, 13.91s/it] {'loss': 0.0266, 'learning_rate': 2.642e-05, 'epoch': 6.18} 47%|████▋ | 4721/10000 [18:31:17<20:23:54, 13.91s/it] 47%|████▋ | 4722/10000 [18:31:31<20:20:42, 13.88s/it] {'loss': 0.0217, 'learning_rate': 2.6415e-05, 'epoch': 6.18} 47%|████▋ | 4722/10000 [18:31:31<20:20:42, 13.88s/it] 47%|████▋ | 4723/10000 [18:31:44<20:17:17, 13.84s/it] {'loss': 0.0171, 'learning_rate': 2.6410000000000003e-05, 'epoch': 6.18} 47%|████▋ | 4723/10000 [18:31:44<20:17:17, 13.84s/it] 47%|████▋ | 4724/10000 [18:31:58<20:19:40, 13.87s/it] {'loss': 0.0153, 'learning_rate': 2.6405000000000002e-05, 'epoch': 6.18} 47%|████▋ | 4724/10000 [18:31:58<20:19:40, 13.87s/it] 47%|████▋ | 4725/10000 [18:32:12<20:21:25, 13.89s/it] {'loss': 0.0199, 'learning_rate': 2.64e-05, 'epoch': 6.18} 47%|████▋ | 4725/10000 [18:32:12<20:21:25, 13.89s/it] 47%|████▋ | 4726/10000 [18:32:26<20:22:02, 13.90s/it] {'loss': 0.0156, 'learning_rate': 2.6395000000000004e-05, 'epoch': 6.19} 47%|████▋ | 4726/10000 [18:32:26<20:22:02, 13.90s/it] 47%|████▋ | 4727/10000 [18:32:40<20:23:59, 13.93s/it] {'loss': 0.0199, 'learning_rate': 2.6390000000000003e-05, 'epoch': 6.19} 47%|████▋ | 4727/10000 [18:32:40<20:23:59, 13.93s/it] 47%|████▋ | 4728/10000 [18:32:54<20:22:43, 13.92s/it] {'loss': 0.0271, 'learning_rate': 2.6385e-05, 'epoch': 6.19} 47%|████▋ | 4728/10000 [18:32:54<20:22:43, 13.92s/it] 47%|████▋ | 4729/10000 [18:33:08<20:25:21, 13.95s/it] 
{'loss': 0.0195, 'learning_rate': 2.6379999999999998e-05, 'epoch': 6.19} 47%|████▋ | 4729/10000 [18:33:08<20:25:21, 13.95s/it] 47%|████▋ | 4730/10000 [18:33:22<20:26:53, 13.97s/it] {'loss': 0.0202, 'learning_rate': 2.6375e-05, 'epoch': 6.19} 47%|████▋ | 4730/10000 [18:33:22<20:26:53, 13.97s/it] 47%|████▋ | 4731/10000 [18:33:36<20:26:07, 13.96s/it] {'loss': 0.021, 'learning_rate': 2.637e-05, 'epoch': 6.19} 47%|████▋ | 4731/10000 [18:33:36<20:26:07, 13.96s/it] 47%|████▋ | 4732/10000 [18:33:50<20:23:11, 13.93s/it] {'loss': 0.0279, 'learning_rate': 2.6365e-05, 'epoch': 6.19} 47%|████▋ | 4732/10000 [18:33:50<20:23:11, 13.93s/it] 47%|████▋ | 4733/10000 [18:34:04<20:22:20, 13.92s/it] {'loss': 0.0187, 'learning_rate': 2.6360000000000002e-05, 'epoch': 6.2} 47%|████▋ | 4733/10000 [18:34:04<20:22:20, 13.92s/it] 47%|████▋ | 4734/10000 [18:34:18<20:20:56, 13.91s/it] {'loss': 0.0174, 'learning_rate': 2.6355e-05, 'epoch': 6.2} 47%|████▋ | 4734/10000 [18:34:18<20:20:56, 13.91s/it] 47%|████▋ | 4735/10000 [18:34:32<20:21:23, 13.92s/it] {'loss': 0.0203, 'learning_rate': 2.6350000000000004e-05, 'epoch': 6.2} 47%|████▋ | 4735/10000 [18:34:32<20:21:23, 13.92s/it] 47%|████▋ | 4736/10000 [18:34:46<20:23:15, 13.94s/it] {'loss': 0.0209, 'learning_rate': 2.6345000000000003e-05, 'epoch': 6.2} 47%|████▋ | 4736/10000 [18:34:46<20:23:15, 13.94s/it] 47%|████▋ | 4737/10000 [18:34:59<20:21:41, 13.93s/it] {'loss': 0.0207, 'learning_rate': 2.6340000000000002e-05, 'epoch': 6.2} 47%|████▋ | 4737/10000 [18:34:59<20:21:41, 13.93s/it] 47%|████▋ | 4738/10000 [18:35:13<20:19:51, 13.91s/it] {'loss': 0.0216, 'learning_rate': 2.6334999999999998e-05, 'epoch': 6.2} 47%|████▋ | 4738/10000 [18:35:13<20:19:51, 13.91s/it] 47%|████▋ | 4739/10000 [18:35:27<20:19:11, 13.90s/it] {'loss': 0.017, 'learning_rate': 2.633e-05, 'epoch': 6.2} 47%|████▋ | 4739/10000 [18:35:27<20:19:11, 13.90s/it] 47%|████▋ | 4740/10000 [18:35:41<20:22:32, 13.95s/it] {'loss': 0.0175, 'learning_rate': 2.6325e-05, 'epoch': 6.2} 47%|████▋ | 
4740/10000 [18:35:41<20:22:32, 13.95s/it] 47%|████▋ | 4741/10000 [18:35:55<20:21:51, 13.94s/it] {'loss': 0.0183, 'learning_rate': 2.632e-05, 'epoch': 6.21} 47%|████▋ | 4741/10000 [18:35:55<20:21:51, 13.94s/it] 47%|████▋ | 4742/10000 [18:36:09<20:20:55, 13.93s/it] {'loss': 0.0208, 'learning_rate': 2.6315e-05, 'epoch': 6.21} 47%|████▋ | 4742/10000 [18:36:09<20:20:55, 13.93s/it] 47%|████▋ | 4743/10000 [18:36:23<20:23:52, 13.97s/it] {'loss': 0.0217, 'learning_rate': 2.631e-05, 'epoch': 6.21} 47%|████▋ | 4743/10000 [18:36:23<20:23:52, 13.97s/it] 47%|████▋ | 4744/10000 [18:36:37<20:22:36, 13.96s/it] {'loss': 0.0139, 'learning_rate': 2.6305000000000003e-05, 'epoch': 6.21} 47%|████▋ | 4744/10000 [18:36:37<20:22:36, 13.96s/it] 47%|████▋ | 4745/10000 [18:36:51<20:20:13, 13.93s/it] {'loss': 0.0217, 'learning_rate': 2.6300000000000002e-05, 'epoch': 6.21} 47%|████▋ | 4745/10000 [18:36:51<20:20:13, 13.93s/it] 47%|████▋ | 4746/10000 [18:37:05<20:20:06, 13.93s/it] {'loss': 0.0191, 'learning_rate': 2.6295e-05, 'epoch': 6.21} 47%|████▋ | 4746/10000 [18:37:05<20:20:06, 13.93s/it] 47%|████▋ | 4747/10000 [18:37:19<20:18:21, 13.92s/it] {'loss': 0.0215, 'learning_rate': 2.6290000000000004e-05, 'epoch': 6.21} 47%|████▋ | 4747/10000 [18:37:19<20:18:21, 13.92s/it] 47%|████▋ | 4748/10000 [18:37:33<20:19:24, 13.93s/it] {'loss': 0.0266, 'learning_rate': 2.6285e-05, 'epoch': 6.21} 47%|████▋ | 4748/10000 [18:37:33<20:19:24, 13.93s/it] 47%|████▋ | 4749/10000 [18:37:47<20:19:48, 13.94s/it] {'loss': 0.0188, 'learning_rate': 2.628e-05, 'epoch': 6.22} 47%|████▋ | 4749/10000 [18:37:47<20:19:48, 13.94s/it] 48%|████▊ | 4750/10000 [18:38:01<20:17:19, 13.91s/it] {'loss': 0.0235, 'learning_rate': 2.6275e-05, 'epoch': 6.22} 48%|████▊ | 4750/10000 [18:38:01<20:17:19, 13.91s/it] 48%|████▊ | 4751/10000 [18:38:14<20:17:04, 13.91s/it] {'loss': 0.0165, 'learning_rate': 2.627e-05, 'epoch': 6.22} 48%|████▊ | 4751/10000 [18:38:14<20:17:04, 13.91s/it] 48%|████▊ | 4752/10000 [18:38:28<20:14:44, 13.89s/it] {'loss': 
0.0195, 'learning_rate': 2.6265e-05, 'epoch': 6.22} 48%|████▊ | 4752/10000 [18:38:28<20:14:44, 13.89s/it] 48%|████▊ | 4753/10000 [18:38:42<20:14:05, 13.88s/it] {'loss': 0.0158, 'learning_rate': 2.6260000000000003e-05, 'epoch': 6.22} 48%|████▊ | 4753/10000 [18:38:42<20:14:05, 13.88s/it] 48%|████▊ | 4754/10000 [18:38:56<20:20:30, 13.96s/it] {'loss': 0.0213, 'learning_rate': 2.6255000000000002e-05, 'epoch': 6.22} 48%|████▊ | 4754/10000 [18:38:56<20:20:30, 13.96s/it] 48%|████▊ | 4755/10000 [18:39:10<20:18:29, 13.94s/it] {'loss': 0.0183, 'learning_rate': 2.625e-05, 'epoch': 6.22} 48%|████▊ | 4755/10000 [18:39:10<20:18:29, 13.94s/it] 48%|████▊ | 4756/10000 [18:39:24<20:18:55, 13.95s/it] {'loss': 0.0197, 'learning_rate': 2.6245000000000004e-05, 'epoch': 6.23} 48%|████▊ | 4756/10000 [18:39:24<20:18:55, 13.95s/it] 48%|████▊ | 4757/10000 [18:39:38<20:17:46, 13.94s/it] {'loss': 0.0158, 'learning_rate': 2.6240000000000003e-05, 'epoch': 6.23} 48%|████▊ | 4757/10000 [18:39:38<20:17:46, 13.94s/it] 48%|████▊ | 4758/10000 [18:39:52<20:14:48, 13.90s/it] {'loss': 0.0163, 'learning_rate': 2.6235000000000005e-05, 'epoch': 6.23} 48%|████▊ | 4758/10000 [18:39:52<20:14:48, 13.90s/it] 48%|████▊ | 4759/10000 [18:40:06<20:13:37, 13.89s/it] {'loss': 0.0194, 'learning_rate': 2.6229999999999998e-05, 'epoch': 6.23} 48%|████▊ | 4759/10000 [18:40:06<20:13:37, 13.89s/it] 48%|████▊ | 4760/10000 [18:40:20<20:13:46, 13.90s/it] {'loss': 0.021, 'learning_rate': 2.6225e-05, 'epoch': 6.23} 48%|████▊ | 4760/10000 [18:40:20<20:13:46, 13.90s/it] 48%|████▊ | 4761/10000 [18:40:34<20:12:54, 13.89s/it] {'loss': 0.022, 'learning_rate': 2.622e-05, 'epoch': 6.23} 48%|████▊ | 4761/10000 [18:40:34<20:12:54, 13.89s/it] 48%|████▊ | 4762/10000 [18:40:48<20:15:29, 13.92s/it] {'loss': 0.0173, 'learning_rate': 2.6215000000000002e-05, 'epoch': 6.23} 48%|████▊ | 4762/10000 [18:40:48<20:15:29, 13.92s/it] 48%|████▊ | 4763/10000 [18:41:01<20:15:48, 13.93s/it] {'loss': 0.0213, 'learning_rate': 2.621e-05, 'epoch': 6.23} 48%|████▊ 
| 4763/10000 [18:41:01<20:15:48, 13.93s/it] 48%|████▊ | 4764/10000 [18:41:15<20:12:12, 13.89s/it] {'loss': 0.0177, 'learning_rate': 2.6205e-05, 'epoch': 6.24} 48%|████▊ | 4764/10000 [18:41:15<20:12:12, 13.89s/it] 48%|████▊ | 4765/10000 [18:41:29<20:13:07, 13.90s/it] {'loss': 0.0211, 'learning_rate': 2.6200000000000003e-05, 'epoch': 6.24} 48%|████▊ | 4765/10000 [18:41:29<20:13:07, 13.90s/it] 48%|████▊ | 4766/10000 [18:41:43<20:13:43, 13.91s/it] {'loss': 0.0289, 'learning_rate': 2.6195000000000002e-05, 'epoch': 6.24} 48%|████▊ | 4766/10000 [18:41:43<20:13:43, 13.91s/it] 48%|████▊ | 4767/10000 [18:41:57<20:13:08, 13.91s/it] {'loss': 0.0222, 'learning_rate': 2.6190000000000005e-05, 'epoch': 6.24} 48%|████▊ | 4767/10000 [18:41:57<20:13:08, 13.91s/it] 48%|████▊ | 4768/10000 [18:42:11<20:12:08, 13.90s/it] {'loss': 0.0165, 'learning_rate': 2.6185000000000004e-05, 'epoch': 6.24} 48%|████▊ | 4768/10000 [18:42:11<20:12:08, 13.90s/it] 48%|████▊ | 4769/10000 [18:42:25<20:11:41, 13.90s/it] {'loss': 0.0207, 'learning_rate': 2.618e-05, 'epoch': 6.24} 48%|████▊ | 4769/10000 [18:42:25<20:11:41, 13.90s/it] 48%|████▊ | 4770/10000 [18:42:39<20:11:30, 13.90s/it] {'loss': 0.018, 'learning_rate': 2.6175e-05, 'epoch': 6.24} 48%|████▊ | 4770/10000 [18:42:39<20:11:30, 13.90s/it] 48%|████▊ | 4771/10000 [18:42:53<20:17:41, 13.97s/it] {'loss': 0.0172, 'learning_rate': 2.617e-05, 'epoch': 6.24} 48%|████▊ | 4771/10000 [18:42:53<20:17:41, 13.97s/it] 48%|████▊ | 4772/10000 [18:43:07<20:18:41, 13.99s/it] {'loss': 0.0135, 'learning_rate': 2.6165e-05, 'epoch': 6.25} 48%|████▊ | 4772/10000 [18:43:07<20:18:41, 13.99s/it] 48%|████▊ | 4773/10000 [18:43:21<20:17:21, 13.97s/it] {'loss': 0.0193, 'learning_rate': 2.616e-05, 'epoch': 6.25} 48%|████▊ | 4773/10000 [18:43:21<20:17:21, 13.97s/it] 48%|████▊ | 4774/10000 [18:43:35<20:17:16, 13.98s/it] {'loss': 0.0172, 'learning_rate': 2.6155000000000003e-05, 'epoch': 6.25} 48%|████▊ | 4774/10000 [18:43:35<20:17:16, 13.98s/it] 48%|████▊ | 4775/10000 
[18:43:49<20:16:00, 13.96s/it] {'loss': 0.0243, 'learning_rate': 2.6150000000000002e-05, 'epoch': 6.25} 48%|████▊ | 4775/10000 [18:43:49<20:16:00, 13.96s/it] 48%|████▊ | 4776/10000 [18:44:03<20:12:00, 13.92s/it] {'loss': 0.0184, 'learning_rate': 2.6145e-05, 'epoch': 6.25} 48%|████▊ | 4776/10000 [18:44:03<20:12:00, 13.92s/it] 48%|████▊ | 4777/10000 [18:44:16<20:11:56, 13.92s/it] {'loss': 0.0199, 'learning_rate': 2.6140000000000004e-05, 'epoch': 6.25} 48%|████▊ | 4777/10000 [18:44:17<20:11:56, 13.92s/it] 48%|████▊ | 4778/10000 [18:44:30<20:09:37, 13.90s/it] {'loss': 0.023, 'learning_rate': 2.6135000000000003e-05, 'epoch': 6.25} 48%|████▊ | 4778/10000 [18:44:30<20:09:37, 13.90s/it] 48%|████▊ | 4779/10000 [18:44:44<20:08:28, 13.89s/it] {'loss': 0.0201, 'learning_rate': 2.613e-05, 'epoch': 6.26} 48%|████▊ | 4779/10000 [18:44:44<20:08:28, 13.89s/it] 48%|████▊ | 4780/10000 [18:44:58<20:06:59, 13.87s/it] {'loss': 0.0198, 'learning_rate': 2.6124999999999998e-05, 'epoch': 6.26} 48%|████▊ | 4780/10000 [18:44:58<20:06:59, 13.87s/it] 48%|████▊ | 4781/10000 [18:45:12<20:08:41, 13.90s/it] {'loss': 0.017, 'learning_rate': 2.612e-05, 'epoch': 6.26} 48%|████▊ | 4781/10000 [18:45:12<20:08:41, 13.90s/it] 48%|████▊ | 4782/10000 [18:45:26<20:09:34, 13.91s/it] {'loss': 0.0184, 'learning_rate': 2.6115e-05, 'epoch': 6.26} 48%|████▊ | 4782/10000 [18:45:26<20:09:34, 13.91s/it] 48%|████▊ | 4783/10000 [18:45:40<20:10:24, 13.92s/it] {'loss': 0.0256, 'learning_rate': 2.6110000000000002e-05, 'epoch': 6.26} 48%|████▊ | 4783/10000 [18:45:40<20:10:24, 13.92s/it] 48%|████▊ | 4784/10000 [18:45:54<20:10:11, 13.92s/it] {'loss': 0.0211, 'learning_rate': 2.6105e-05, 'epoch': 6.26} 48%|████▊ | 4784/10000 [18:45:54<20:10:11, 13.92s/it] 48%|████▊ | 4785/10000 [18:46:08<20:10:14, 13.92s/it] {'loss': 0.0218, 'learning_rate': 2.61e-05, 'epoch': 6.26} 48%|████▊ | 4785/10000 [18:46:08<20:10:14, 13.92s/it] 48%|████▊ | 4786/10000 [18:46:22<20:13:07, 13.96s/it] {'loss': 0.0191, 'learning_rate': 
2.6095000000000003e-05, 'epoch': 6.26} 48%|████▊ | 4786/10000 [18:46:22<20:13:07, 13.96s/it] 48%|████▊ | 4787/10000 [18:46:36<20:12:41, 13.96s/it] {'loss': 0.0211, 'learning_rate': 2.6090000000000003e-05, 'epoch': 6.27} 48%|████▊ | 4787/10000 [18:46:36<20:12:41, 13.96s/it] 48%|████▊ | 4788/10000 [18:46:50<20:10:16, 13.93s/it] {'loss': 0.0221, 'learning_rate': 2.6085000000000005e-05, 'epoch': 6.27} 48%|████▊ | 4788/10000 [18:46:50<20:10:16, 13.93s/it] 48%|████▊ | 4789/10000 [18:47:04<20:09:58, 13.93s/it] {'loss': 0.0185, 'learning_rate': 2.6079999999999998e-05, 'epoch': 6.27} 48%|████▊ | 4789/10000 [18:47:04<20:09:58, 13.93s/it] 48%|████▊ | 4790/10000 [18:47:18<20:12:28, 13.96s/it] {'loss': 0.0155, 'learning_rate': 2.6075e-05, 'epoch': 6.27} 48%|████▊ | 4790/10000 [18:47:18<20:12:28, 13.96s/it] 48%|████▊ | 4791/10000 [18:47:31<20:10:42, 13.95s/it] {'loss': 0.0173, 'learning_rate': 2.607e-05, 'epoch': 6.27} 48%|████▊ | 4791/10000 [18:47:31<20:10:42, 13.95s/it] 48%|████▊ | 4792/10000 [18:47:45<20:08:32, 13.92s/it] {'loss': 0.0186, 'learning_rate': 2.6065000000000002e-05, 'epoch': 6.27} 48%|████▊ | 4792/10000 [18:47:45<20:08:32, 13.92s/it] 48%|████▊ | 4793/10000 [18:47:59<20:08:50, 13.93s/it] {'loss': 0.0187, 'learning_rate': 2.606e-05, 'epoch': 6.27} 48%|████▊ | 4793/10000 [18:47:59<20:08:50, 13.93s/it] 48%|████▊ | 4794/10000 [18:48:13<20:08:32, 13.93s/it] {'loss': 0.0192, 'learning_rate': 2.6055e-05, 'epoch': 6.27} 48%|████▊ | 4794/10000 [18:48:13<20:08:32, 13.93s/it] 48%|████▊ | 4795/10000 [18:48:27<20:07:22, 13.92s/it] {'loss': 0.017, 'learning_rate': 2.6050000000000003e-05, 'epoch': 6.28} 48%|████▊ | 4795/10000 [18:48:27<20:07:22, 13.92s/it] 48%|████▊ | 4796/10000 [18:48:41<20:03:15, 13.87s/it] {'loss': 0.0192, 'learning_rate': 2.6045000000000002e-05, 'epoch': 6.28} 48%|████▊ | 4796/10000 [18:48:41<20:03:15, 13.87s/it] 48%|████▊ | 4797/10000 [18:48:55<20:07:09, 13.92s/it] {'loss': 0.0173, 'learning_rate': 2.6040000000000005e-05, 'epoch': 6.28} 48%|████▊ | 
4797/10000 [18:48:55<20:07:09, 13.92s/it] 48%|████▊ | 4798/10000 [18:49:09<20:05:15, 13.90s/it] {'loss': 0.0206, 'learning_rate': 2.6035000000000004e-05, 'epoch': 6.28} 48%|████▊ | 4798/10000 [18:49:09<20:05:15, 13.90s/it] 48%|████▊ | 4799/10000 [18:49:23<20:05:11, 13.90s/it] {'loss': 0.0204, 'learning_rate': 2.603e-05, 'epoch': 6.28} 48%|████▊ | 4799/10000 [18:49:23<20:05:11, 13.90s/it] 48%|████▊ | 4800/10000 [18:49:36<20:01:16, 13.86s/it] {'loss': 0.0145, 'learning_rate': 2.6025e-05, 'epoch': 6.28} 48%|████▊ | 4800/10000 [18:49:36<20:01:16, 13.86s/it] 48%|████▊ | 4801/10000 [18:49:50<20:01:04, 13.86s/it] {'loss': 0.0177, 'learning_rate': 2.602e-05, 'epoch': 6.28} 48%|████▊ | 4801/10000 [18:49:50<20:01:04, 13.86s/it] 48%|████▊ | 4802/10000 [18:50:04<19:59:43, 13.85s/it] {'loss': 0.0226, 'learning_rate': 2.6015e-05, 'epoch': 6.29} 48%|████▊ | 4802/10000 [18:50:04<19:59:43, 13.85s/it] 48%|████▊ | 4803/10000 [18:50:18<20:00:26, 13.86s/it] {'loss': 0.0198, 'learning_rate': 2.601e-05, 'epoch': 6.29} 48%|████▊ | 4803/10000 [18:50:18<20:00:26, 13.86s/it] 48%|████▊ | 4804/10000 [18:50:32<20:00:22, 13.86s/it] {'loss': 0.0171, 'learning_rate': 2.6005000000000003e-05, 'epoch': 6.29} 48%|████▊ | 4804/10000 [18:50:32<20:00:22, 13.86s/it] 48%|████▊ | 4805/10000 [18:50:46<20:01:45, 13.88s/it] {'loss': 0.0275, 'learning_rate': 2.6000000000000002e-05, 'epoch': 6.29} 48%|████▊ | 4805/10000 [18:50:46<20:01:45, 13.88s/it] 48%|████▊ | 4806/10000 [18:51:00<20:01:50, 13.88s/it] {'loss': 0.0223, 'learning_rate': 2.5995000000000004e-05, 'epoch': 6.29} 48%|████▊ | 4806/10000 [18:51:00<20:01:50, 13.88s/it] 48%|████▊ | 4807/10000 [18:51:14<20:04:27, 13.92s/it] {'loss': 0.0212, 'learning_rate': 2.5990000000000004e-05, 'epoch': 6.29} 48%|████▊ | 4807/10000 [18:51:14<20:04:27, 13.92s/it] 48%|████▊ | 4808/10000 [18:51:28<20:04:00, 13.91s/it] {'loss': 0.018, 'learning_rate': 2.5985000000000003e-05, 'epoch': 6.29} 48%|████▊ | 4808/10000 [18:51:28<20:04:00, 13.91s/it] 48%|████▊ | 4809/10000 
[18:51:41<20:04:38, 13.92s/it] {'loss': 0.0168, 'learning_rate': 2.598e-05, 'epoch': 6.29} 48%|████▊ | 4809/10000 [18:51:42<20:04:38, 13.92s/it] 48%|████▊ | 4810/10000 [18:51:55<20:03:04, 13.91s/it] {'loss': 0.0169, 'learning_rate': 2.5974999999999998e-05, 'epoch': 6.3} 48%|████▊ | 4810/10000 [18:51:55<20:03:04, 13.91s/it] 48%|████▊ | 4811/10000 [18:52:09<20:01:30, 13.89s/it] {'loss': 0.0214, 'learning_rate': 2.597e-05, 'epoch': 6.3} 48%|████▊ | 4811/10000 [18:52:09<20:01:30, 13.89s/it] 48%|████▊ | 4812/10000 [18:52:23<19:58:40, 13.86s/it] {'loss': 0.0163, 'learning_rate': 2.5965e-05, 'epoch': 6.3} 48%|████▊ | 4812/10000 [18:52:23<19:58:40, 13.86s/it] 48%|████▊ | 4813/10000 [18:52:37<19:59:33, 13.88s/it] {'loss': 0.0222, 'learning_rate': 2.5960000000000002e-05, 'epoch': 6.3} 48%|████▊ | 4813/10000 [18:52:37<19:59:33, 13.88s/it] 48%|████▊ | 4814/10000 [18:52:51<20:01:00, 13.90s/it] {'loss': 0.0208, 'learning_rate': 2.5955e-05, 'epoch': 6.3} 48%|████▊ | 4814/10000 [18:52:51<20:01:00, 13.90s/it] 48%|████▊ | 4815/10000 [18:53:05<19:58:49, 13.87s/it] {'loss': 0.0184, 'learning_rate': 2.595e-05, 'epoch': 6.3} 48%|████▊ | 4815/10000 [18:53:05<19:58:49, 13.87s/it] 48%|████▊ | 4816/10000 [18:53:19<20:01:53, 13.91s/it] {'loss': 0.0168, 'learning_rate': 2.5945000000000003e-05, 'epoch': 6.3} 48%|████▊ | 4816/10000 [18:53:19<20:01:53, 13.91s/it] 48%|████▊ | 4817/10000 [18:53:33<19:59:13, 13.88s/it] {'loss': 0.0249, 'learning_rate': 2.5940000000000002e-05, 'epoch': 6.3} 48%|████▊ | 4817/10000 [18:53:33<19:59:13, 13.88s/it] 48%|████▊ | 4818/10000 [18:53:46<20:00:36, 13.90s/it] {'loss': 0.0241, 'learning_rate': 2.5935000000000005e-05, 'epoch': 6.31} 48%|████▊ | 4818/10000 [18:53:46<20:00:36, 13.90s/it] 48%|████▊ | 4819/10000 [18:54:00<20:01:38, 13.92s/it] {'loss': 0.0263, 'learning_rate': 2.5929999999999997e-05, 'epoch': 6.31} 48%|████▊ | 4819/10000 [18:54:00<20:01:38, 13.92s/it] 48%|████▊ | 4820/10000 [18:54:14<19:59:45, 13.90s/it] {'loss': 0.0173, 'learning_rate': 2.5925e-05, 
'epoch': 6.31} 48%|████▊ | 4820/10000 [18:54:14<19:59:45, 13.90s/it] 48%|████▊ | 4821/10000 [18:54:28<20:01:34, 13.92s/it] {'loss': 0.0218, 'learning_rate': 2.592e-05, 'epoch': 6.31} 48%|████▊ | 4821/10000 [18:54:28<20:01:34, 13.92s/it] 48%|████▊ | 4822/10000 [18:54:42<20:00:42, 13.91s/it] {'loss': 0.0213, 'learning_rate': 2.5915000000000002e-05, 'epoch': 6.31} 48%|████▊ | 4822/10000 [18:54:42<20:00:42, 13.91s/it] 48%|████▊ | 4823/10000 [18:54:56<19:56:06, 13.86s/it] {'loss': 0.023, 'learning_rate': 2.591e-05, 'epoch': 6.31} 48%|████▊ | 4823/10000 [18:54:56<19:56:06, 13.86s/it] 48%|████▊ | 4824/10000 [18:55:10<19:59:00, 13.90s/it] {'loss': 0.0166, 'learning_rate': 2.5905e-05, 'epoch': 6.31} 48%|████▊ | 4824/10000 [18:55:10<19:59:00, 13.90s/it] 48%|████▊ | 4825/10000 [18:55:24<20:00:29, 13.92s/it] {'loss': 0.022, 'learning_rate': 2.5900000000000003e-05, 'epoch': 6.32} 48%|████▊ | 4825/10000 [18:55:24<20:00:29, 13.92s/it] 48%|████▊ | 4826/10000 [18:55:38<20:03:29, 13.96s/it] {'loss': 0.0208, 'learning_rate': 2.5895000000000002e-05, 'epoch': 6.32} 48%|████▊ | 4826/10000 [18:55:38<20:03:29, 13.96s/it] 48%|████▊ | 4827/10000 [18:55:52<20:01:45, 13.94s/it] {'loss': 0.0201, 'learning_rate': 2.5890000000000005e-05, 'epoch': 6.32} 48%|████▊ | 4827/10000 [18:55:52<20:01:45, 13.94s/it] 48%|████▊ | 4828/10000 [18:56:06<19:59:49, 13.92s/it] {'loss': 0.02, 'learning_rate': 2.5885000000000004e-05, 'epoch': 6.32} 48%|████▊ | 4828/10000 [18:56:06<19:59:49, 13.92s/it] 48%|████▊ | 4829/10000 [18:56:20<20:00:28, 13.93s/it] {'loss': 0.0205, 'learning_rate': 2.588e-05, 'epoch': 6.32} 48%|████▊ | 4829/10000 [18:56:20<20:00:28, 13.93s/it] 48%|████▊ | 4830/10000 [18:56:33<19:58:12, 13.91s/it] {'loss': 0.0198, 'learning_rate': 2.5875e-05, 'epoch': 6.32} 48%|████▊ | 4830/10000 [18:56:33<19:58:12, 13.91s/it] 48%|████▊ | 4831/10000 [18:56:47<19:59:36, 13.92s/it] {'loss': 0.0171, 'learning_rate': 2.587e-05, 'epoch': 6.32} 48%|████▊ | 4831/10000 [18:56:47<19:59:36, 13.92s/it] 48%|████▊ | 
4832/10000 [18:57:01<19:56:02, 13.89s/it] {'loss': 0.0223, 'learning_rate': 2.5865e-05, 'epoch': 6.32} 48%|████▊ | 4832/10000 [18:57:01<19:56:02, 13.89s/it] 48%|████▊ | 4833/10000 [18:57:15<19:58:01, 13.91s/it] {'loss': 0.0189, 'learning_rate': 2.586e-05, 'epoch': 6.33} 48%|████▊ | 4833/10000 [18:57:15<19:58:01, 13.91s/it] 48%|████▊ | 4834/10000 [18:57:29<19:55:28, 13.88s/it] {'loss': 0.0195, 'learning_rate': 2.5855000000000002e-05, 'epoch': 6.33} 48%|████▊ | 4834/10000 [18:57:29<19:55:28, 13.88s/it] 48%|████▊ | 4835/10000 [18:57:43<19:56:09, 13.90s/it] {'loss': 0.0245, 'learning_rate': 2.585e-05, 'epoch': 6.33} 48%|████▊ | 4835/10000 [18:57:43<19:56:09, 13.90s/it] 48%|████▊ | 4836/10000 [18:57:57<19:57:45, 13.92s/it] {'loss': 0.0205, 'learning_rate': 2.5845000000000004e-05, 'epoch': 6.33} 48%|████▊ | 4836/10000 [18:57:57<19:57:45, 13.92s/it] 48%|████▊ | 4837/10000 [18:58:11<19:57:49, 13.92s/it] {'loss': 0.0165, 'learning_rate': 2.5840000000000003e-05, 'epoch': 6.33} 48%|████▊ | 4837/10000 [18:58:11<19:57:49, 13.92s/it] 48%|████▊ | 4838/10000 [18:58:25<19:58:42, 13.93s/it] {'loss': 0.0175, 'learning_rate': 2.5835000000000003e-05, 'epoch': 6.33} 48%|████▊ | 4838/10000 [18:58:25<19:58:42, 13.93s/it] 48%|████▊ | 4839/10000 [18:58:39<19:54:44, 13.89s/it] {'loss': 0.0199, 'learning_rate': 2.583e-05, 'epoch': 6.33} 48%|████▊ | 4839/10000 [18:58:39<19:54:44, 13.89s/it] 48%|████▊ | 4840/10000 [18:58:52<19:55:24, 13.90s/it] {'loss': 0.0206, 'learning_rate': 2.5824999999999998e-05, 'epoch': 6.34} 48%|████▊ | 4840/10000 [18:58:53<19:55:24, 13.90s/it] 48%|████▊ | 4841/10000 [18:59:06<19:51:39, 13.86s/it] {'loss': 0.0216, 'learning_rate': 2.582e-05, 'epoch': 6.34} 48%|████▊ | 4841/10000 [18:59:06<19:51:39, 13.86s/it] 48%|████▊ | 4842/10000 [18:59:20<19:53:18, 13.88s/it] {'loss': 0.0224, 'learning_rate': 2.5815e-05, 'epoch': 6.34} 48%|████▊ | 4842/10000 [18:59:20<19:53:18, 13.88s/it] 48%|████▊ | 4843/10000 [18:59:34<19:55:18, 13.91s/it] {'loss': 0.024, 'learning_rate': 
2.5810000000000002e-05, 'epoch': 6.34} 48%|████▊ | 4843/10000 [18:59:34<19:55:18, 13.91s/it] 48%|████▊ | 4844/10000 [18:59:48<19:54:23, 13.90s/it] {'loss': 0.0202, 'learning_rate': 2.5805e-05, 'epoch': 6.34} 48%|████▊ | 4844/10000 [18:59:48<19:54:23, 13.90s/it] 48%|████▊ | 4845/10000 [19:00:02<19:57:41, 13.94s/it] {'loss': 0.0186, 'learning_rate': 2.58e-05, 'epoch': 6.34} 48%|████▊ | 4845/10000 [19:00:02<19:57:41, 13.94s/it] 48%|████▊ | 4846/10000 [19:00:16<19:56:31, 13.93s/it] {'loss': 0.0241, 'learning_rate': 2.5795000000000003e-05, 'epoch': 6.34} 48%|████▊ | 4846/10000 [19:00:16<19:56:31, 13.93s/it] 48%|████▊ | 4847/10000 [19:00:30<19:56:40, 13.93s/it] {'loss': 0.0208, 'learning_rate': 2.5790000000000002e-05, 'epoch': 6.34} 48%|████▊ | 4847/10000 [19:00:30<19:56:40, 13.93s/it] 48%|████▊ | 4848/10000 [19:00:44<19:58:08, 13.95s/it] {'loss': 0.0214, 'learning_rate': 2.5785000000000005e-05, 'epoch': 6.35} 48%|████▊ | 4848/10000 [19:00:44<19:58:08, 13.95s/it] 48%|████▊ | 4849/10000 [19:00:58<19:54:26, 13.91s/it] {'loss': 0.0161, 'learning_rate': 2.5779999999999997e-05, 'epoch': 6.35} 48%|████▊ | 4849/10000 [19:00:58<19:54:26, 13.91s/it] 48%|████▊ | 4850/10000 [19:01:12<19:54:31, 13.92s/it] {'loss': 0.0206, 'learning_rate': 2.5775e-05, 'epoch': 6.35} 48%|████▊ | 4850/10000 [19:01:12<19:54:31, 13.92s/it] 49%|████▊ | 4851/10000 [19:01:26<19:53:00, 13.90s/it] {'loss': 0.0218, 'learning_rate': 2.577e-05, 'epoch': 6.35} 49%|████▊ | 4851/10000 [19:01:26<19:53:00, 13.90s/it] 49%|████▊ | 4852/10000 [19:01:39<19:53:34, 13.91s/it] {'loss': 0.0195, 'learning_rate': 2.5765e-05, 'epoch': 6.35} 49%|████▊ | 4852/10000 [19:01:40<19:53:34, 13.91s/it] 49%|████▊ | 4853/10000 [19:01:53<19:56:45, 13.95s/it] {'loss': 0.0215, 'learning_rate': 2.576e-05, 'epoch': 6.35} 49%|████▊ | 4853/10000 [19:01:54<19:56:45, 13.95s/it] 49%|████▊ | 4854/10000 [19:02:07<19:55:13, 13.94s/it] {'loss': 0.0187, 'learning_rate': 2.5755e-05, 'epoch': 6.35} 49%|████▊ | 4854/10000 [19:02:07<19:55:13, 13.94s/it] 
49%|████▊ | 4855/10000 [19:02:21<19:55:12, 13.94s/it] {'loss': 0.0184, 'learning_rate': 2.5750000000000002e-05, 'epoch': 6.35} 49%|████▊ | 4855/10000 [19:02:21<19:55:12, 13.94s/it] 49%|████▊ | 4856/10000 [19:02:35<19:54:39, 13.93s/it] {'loss': 0.0193, 'learning_rate': 2.5745e-05, 'epoch': 6.36} 49%|████▊ | 4856/10000 [19:02:35<19:54:39, 13.93s/it] 49%|████▊ | 4857/10000 [19:02:49<19:59:01, 13.99s/it] {'loss': 0.0207, 'learning_rate': 2.5740000000000004e-05, 'epoch': 6.36} 49%|████▊ | 4857/10000 [19:02:49<19:59:01, 13.99s/it] 49%|████▊ | 4858/10000 [19:03:03<19:59:24, 14.00s/it] {'loss': 0.0208, 'learning_rate': 2.5735000000000003e-05, 'epoch': 6.36} 49%|████▊ | 4858/10000 [19:03:03<19:59:24, 14.00s/it] 49%|████▊ | 4859/10000 [19:03:17<19:58:43, 13.99s/it] {'loss': 0.021, 'learning_rate': 2.573e-05, 'epoch': 6.36} 49%|████▊ | 4859/10000 [19:03:17<19:58:43, 13.99s/it] 49%|████▊ | 4860/10000 [19:03:31<19:56:22, 13.97s/it] {'loss': 0.0173, 'learning_rate': 2.5725e-05, 'epoch': 6.36} 49%|████▊ | 4860/10000 [19:03:31<19:56:22, 13.97s/it] 49%|████▊ | 4861/10000 [19:03:45<19:54:19, 13.94s/it] {'loss': 0.0165, 'learning_rate': 2.572e-05, 'epoch': 6.36} 49%|████▊ | 4861/10000 [19:03:45<19:54:19, 13.94s/it] 49%|████▊ | 4862/10000 [19:03:59<19:54:23, 13.95s/it] {'loss': 0.0203, 'learning_rate': 2.5715e-05, 'epoch': 6.36} 49%|████▊ | 4862/10000 [19:03:59<19:54:23, 13.95s/it] 49%|████▊ | 4863/10000 [19:04:13<19:52:22, 13.93s/it] {'loss': 0.0215, 'learning_rate': 2.571e-05, 'epoch': 6.37} 49%|████▊ | 4863/10000 [19:04:13<19:52:22, 13.93s/it] 49%|████▊ | 4864/10000 [19:04:27<19:51:04, 13.91s/it] {'loss': 0.0211, 'learning_rate': 2.5705000000000002e-05, 'epoch': 6.37} 49%|████▊ | 4864/10000 [19:04:27<19:51:04, 13.91s/it] 49%|████▊ | 4865/10000 [19:04:41<19:54:59, 13.96s/it] {'loss': 0.0191, 'learning_rate': 2.57e-05, 'epoch': 6.37} 49%|████▊ | 4865/10000 [19:04:41<19:54:59, 13.96s/it] 49%|████▊ | 4866/10000 [19:04:55<19:53:21, 13.95s/it] {'loss': 0.0202, 'learning_rate': 
2.5695000000000004e-05, 'epoch': 6.37} 49%|████▊ | 4866/10000 [19:04:55<19:53:21, 13.95s/it] 49%|████▊ | 4867/10000 [19:05:09<19:51:01, 13.92s/it] {'loss': 0.0201, 'learning_rate': 2.5690000000000003e-05, 'epoch': 6.37} 49%|████▊ | 4867/10000 [19:05:09<19:51:01, 13.92s/it] 49%|████▊ | 4868/10000 [19:05:23<19:52:31, 13.94s/it] {'loss': 0.0192, 'learning_rate': 2.5685000000000002e-05, 'epoch': 6.37} 49%|████▊ | 4868/10000 [19:05:23<19:52:31, 13.94s/it] 49%|████▊ | 4869/10000 [19:05:37<19:52:54, 13.95s/it] {'loss': 0.0182, 'learning_rate': 2.5679999999999998e-05, 'epoch': 6.37} 49%|████▊ | 4869/10000 [19:05:37<19:52:54, 13.95s/it] 49%|████▊ | 4870/10000 [19:05:51<19:50:27, 13.92s/it] {'loss': 0.0183, 'learning_rate': 2.5675e-05, 'epoch': 6.37} 49%|████▊ | 4870/10000 [19:05:51<19:50:27, 13.92s/it] 49%|████▊ | 4871/10000 [19:06:04<19:48:54, 13.91s/it] {'loss': 0.0227, 'learning_rate': 2.567e-05, 'epoch': 6.38} 49%|████▊ | 4871/10000 [19:06:04<19:48:54, 13.91s/it] 49%|████▊ | 4872/10000 [19:06:18<19:45:04, 13.87s/it] {'loss': 0.0209, 'learning_rate': 2.5665e-05, 'epoch': 6.38} 49%|████▊ | 4872/10000 [19:06:18<19:45:04, 13.87s/it] 49%|████▊ | 4873/10000 [19:06:32<19:43:34, 13.85s/it] {'loss': 0.0411, 'learning_rate': 2.566e-05, 'epoch': 6.38} 49%|████▊ | 4873/10000 [19:06:32<19:43:34, 13.85s/it] 49%|████▊ | 4874/10000 [19:06:46<19:44:19, 13.86s/it] {'loss': 0.0186, 'learning_rate': 2.5655e-05, 'epoch': 6.38} 49%|████▊ | 4874/10000 [19:06:46<19:44:19, 13.86s/it] 49%|████▉ | 4875/10000 [19:07:00<19:44:46, 13.87s/it] {'loss': 0.0193, 'learning_rate': 2.5650000000000003e-05, 'epoch': 6.38} 49%|████▉ | 4875/10000 [19:07:00<19:44:46, 13.87s/it] 49%|████▉ | 4876/10000 [19:07:14<19:44:01, 13.86s/it] {'loss': 0.0205, 'learning_rate': 2.5645000000000003e-05, 'epoch': 6.38} 49%|████▉ | 4876/10000 [19:07:14<19:44:01, 13.86s/it] 49%|████▉ | 4877/10000 [19:07:28<19:44:10, 13.87s/it] {'loss': 0.0188, 'learning_rate': 2.5640000000000002e-05, 'epoch': 6.38} 49%|████▉ | 4877/10000 
[19:07:28<19:44:10, 13.87s/it] 49%|████▉ | 4878/10000 [19:07:41<19:46:04, 13.89s/it] {'loss': 0.019, 'learning_rate': 2.5635000000000004e-05, 'epoch': 6.38} 49%|████▉ | 4878/10000 [19:07:42<19:46:04, 13.89s/it] 49%|████▉ | 4879/10000 [19:07:55<19:47:38, 13.91s/it] {'loss': 0.0185, 'learning_rate': 2.5629999999999997e-05, 'epoch': 6.39} 49%|████▉ | 4879/10000 [19:07:55<19:47:38, 13.91s/it] 49%|████▉ | 4880/10000 [19:08:09<19:46:55, 13.91s/it] {'loss': 0.0189, 'learning_rate': 2.5625e-05, 'epoch': 6.39} 49%|████▉ | 4880/10000 [19:08:09<19:46:55, 13.91s/it] 49%|████▉ | 4881/10000 [19:08:23<19:45:44, 13.90s/it] {'loss': 0.0201, 'learning_rate': 2.562e-05, 'epoch': 6.39} 49%|████▉ | 4881/10000 [19:08:23<19:45:44, 13.90s/it] 49%|████▉ | 4882/10000 [19:08:37<19:47:27, 13.92s/it] {'loss': 0.0165, 'learning_rate': 2.5615e-05, 'epoch': 6.39} 49%|████▉ | 4882/10000 [19:08:37<19:47:27, 13.92s/it] 49%|████▉ | 4883/10000 [19:08:51<19:45:54, 13.91s/it] {'loss': 0.0182, 'learning_rate': 2.561e-05, 'epoch': 6.39} 49%|████▉ | 4883/10000 [19:08:51<19:45:54, 13.91s/it] 49%|████▉ | 4884/10000 [19:09:05<19:43:36, 13.88s/it] {'loss': 0.0227, 'learning_rate': 2.5605e-05, 'epoch': 6.39} 49%|████▉ | 4884/10000 [19:09:05<19:43:36, 13.88s/it] 49%|████▉ | 4885/10000 [19:09:19<19:43:57, 13.89s/it] {'loss': 0.0204, 'learning_rate': 2.5600000000000002e-05, 'epoch': 6.39} 49%|████▉ | 4885/10000 [19:09:19<19:43:57, 13.89s/it] 49%|████▉ | 4886/10000 [19:09:33<19:43:54, 13.89s/it] {'loss': 0.0183, 'learning_rate': 2.5595e-05, 'epoch': 6.4} 49%|████▉ | 4886/10000 [19:09:33<19:43:54, 13.89s/it] 49%|████▉ | 4887/10000 [19:09:47<19:42:11, 13.87s/it] {'loss': 0.0189, 'learning_rate': 2.5590000000000004e-05, 'epoch': 6.4} 49%|████▉ | 4887/10000 [19:09:47<19:42:11, 13.87s/it] 49%|████▉ | 4888/10000 [19:10:00<19:42:33, 13.88s/it] {'loss': 0.0152, 'learning_rate': 2.5585000000000003e-05, 'epoch': 6.4} 49%|████▉ | 4888/10000 [19:10:00<19:42:33, 13.88s/it] 49%|████▉ | 4889/10000 [19:10:14<19:42:57, 13.89s/it] 
{'loss': 0.0198, 'learning_rate': 2.5580000000000002e-05, 'epoch': 6.4} 49%|████▉ | 4889/10000 [19:10:14<19:42:57, 13.89s/it] 49%|████▉ | 4890/10000 [19:10:28<19:47:24, 13.94s/it] {'loss': 0.0207, 'learning_rate': 2.5574999999999998e-05, 'epoch': 6.4} 49%|████▉ | 4890/10000 [19:10:28<19:47:24, 13.94s/it] 49%|████▉ | 4891/10000 [19:10:42<19:46:56, 13.94s/it] {'loss': 0.0196, 'learning_rate': 2.557e-05, 'epoch': 6.4} 49%|████▉ | 4891/10000 [19:10:42<19:46:56, 13.94s/it] 49%|████▉ | 4892/10000 [19:10:56<19:50:30, 13.98s/it] {'loss': 0.0188, 'learning_rate': 2.5565e-05, 'epoch': 6.4} 49%|████▉ | 4892/10000 [19:10:56<19:50:30, 13.98s/it] 49%|████▉ | 4893/10000 [19:11:10<19:47:33, 13.95s/it] {'loss': 0.0171, 'learning_rate': 2.556e-05, 'epoch': 6.4} 49%|████▉ | 4893/10000 [19:11:10<19:47:33, 13.95s/it] 49%|████▉ | 4894/10000 [19:11:24<19:48:28, 13.97s/it] {'loss': 0.0233, 'learning_rate': 2.5555000000000002e-05, 'epoch': 6.41} 49%|████▉ | 4894/10000 [19:11:24<19:48:28, 13.97s/it] 49%|████▉ | 4895/10000 [19:11:38<19:48:03, 13.96s/it] {'loss': 0.0215, 'learning_rate': 2.555e-05, 'epoch': 6.41} 49%|████▉ | 4895/10000 [19:11:38<19:48:03, 13.96s/it] 49%|████▉ | 4896/10000 [19:11:52<19:45:38, 13.94s/it] {'loss': 0.0246, 'learning_rate': 2.5545000000000004e-05, 'epoch': 6.41} 49%|████▉ | 4896/10000 [19:11:52<19:45:38, 13.94s/it] 49%|████▉ | 4897/10000 [19:12:06<19:49:50, 13.99s/it] {'loss': 0.0176, 'learning_rate': 2.5540000000000003e-05, 'epoch': 6.41} 49%|████▉ | 4897/10000 [19:12:06<19:49:50, 13.99s/it] 49%|████▉ | 4898/10000 [19:12:20<19:47:31, 13.97s/it] {'loss': 0.0221, 'learning_rate': 2.5535000000000002e-05, 'epoch': 6.41} 49%|████▉ | 4898/10000 [19:12:20<19:47:31, 13.97s/it] 49%|████▉ | 4899/10000 [19:12:34<19:46:24, 13.96s/it] {'loss': 0.0265, 'learning_rate': 2.5530000000000005e-05, 'epoch': 6.41} 49%|████▉ | 4899/10000 [19:12:34<19:46:24, 13.96s/it] 49%|████▉ | 4900/10000 [19:12:48<19:40:57, 13.89s/it] {'loss': 0.0179, 'learning_rate': 2.5525e-05, 'epoch': 6.41} 
49%|████▉ | 4900/10000 [19:12:48<19:40:57, 13.89s/it] 49%|████▉ | 4901/10000 [19:13:02<19:46:35, 13.96s/it] {'loss': 0.0221, 'learning_rate': 2.552e-05, 'epoch': 6.41} 49%|████▉ | 4901/10000 [19:13:02<19:46:35, 13.96s/it] 49%|████▉ | 4902/10000 [19:13:16<19:45:10, 13.95s/it] {'loss': 0.0218, 'learning_rate': 2.5515e-05, 'epoch': 6.42} 49%|████▉ | 4902/10000 [19:13:16<19:45:10, 13.95s/it] 49%|████▉ | 4903/10000 [19:13:30<19:41:30, 13.91s/it] {'loss': 0.0169, 'learning_rate': 2.551e-05, 'epoch': 6.42} 49%|████▉ | 4903/10000 [19:13:30<19:41:30, 13.91s/it] 49%|████▉ | 4904/10000 [19:13:44<19:41:19, 13.91s/it] {'loss': 0.0231, 'learning_rate': 2.5505e-05, 'epoch': 6.42} 49%|████▉ | 4904/10000 [19:13:44<19:41:19, 13.91s/it] 49%|████▉ | 4905/10000 [19:13:58<19:45:14, 13.96s/it] {'loss': 0.0199, 'learning_rate': 2.5500000000000003e-05, 'epoch': 6.42} 49%|████▉ | 4905/10000 [19:13:58<19:45:14, 13.96s/it] 49%|████▉ | 4906/10000 [19:14:12<19:47:43, 13.99s/it] {'loss': 0.0233, 'learning_rate': 2.5495000000000002e-05, 'epoch': 6.42} 49%|████▉ | 4906/10000 [19:14:12<19:47:43, 13.99s/it] 49%|████▉ | 4907/10000 [19:14:26<19:46:19, 13.98s/it] {'loss': 0.0236, 'learning_rate': 2.549e-05, 'epoch': 6.42} 49%|████▉ | 4907/10000 [19:14:26<19:46:19, 13.98s/it] 49%|████▉ | 4908/10000 [19:14:40<19:46:11, 13.98s/it] {'loss': 0.0191, 'learning_rate': 2.5485000000000004e-05, 'epoch': 6.42} 49%|████▉ | 4908/10000 [19:14:40<19:46:11, 13.98s/it] 49%|████▉ | 4909/10000 [19:14:53<19:42:53, 13.94s/it] {'loss': 0.0193, 'learning_rate': 2.5480000000000003e-05, 'epoch': 6.43} 49%|████▉ | 4909/10000 [19:14:54<19:42:53, 13.94s/it] 49%|████▉ | 4910/10000 [19:15:07<19:42:30, 13.94s/it] {'loss': 0.0247, 'learning_rate': 2.5475e-05, 'epoch': 6.43} 49%|████▉ | 4910/10000 [19:15:07<19:42:30, 13.94s/it] 49%|████▉ | 4911/10000 [19:15:21<19:40:46, 13.92s/it] {'loss': 0.0215, 'learning_rate': 2.547e-05, 'epoch': 6.43} 49%|████▉ | 4911/10000 [19:15:21<19:40:46, 13.92s/it] 49%|████▉ | 4912/10000 [19:15:35<19:41:05, 
13.93s/it] {'loss': 0.0198, 'learning_rate': 2.5465e-05, 'epoch': 6.43} 49%|████▉ | 4912/10000 [19:15:35<19:41:05, 13.93s/it] 49%|████▉ | 4913/10000 [19:15:49<19:42:49, 13.95s/it] {'loss': 0.0203, 'learning_rate': 2.546e-05, 'epoch': 6.43} 49%|████▉ | 4913/10000 [19:15:49<19:42:49, 13.95s/it] 49%|████▉ | 4914/10000 [19:16:03<19:42:43, 13.95s/it] {'loss': 0.0178, 'learning_rate': 2.5455e-05, 'epoch': 6.43} 49%|████▉ | 4914/10000 [19:16:03<19:42:43, 13.95s/it] 49%|████▉ | 4915/10000 [19:16:17<19:43:33, 13.97s/it] {'loss': 0.0179, 'learning_rate': 2.5450000000000002e-05, 'epoch': 6.43} 49%|████▉ | 4915/10000 [19:16:17<19:43:33, 13.97s/it] 49%|████▉ | 4916/10000 [19:16:31<19:40:46, 13.94s/it] {'loss': 0.0225, 'learning_rate': 2.5445e-05, 'epoch': 6.43} 49%|████▉ | 4916/10000 [19:16:31<19:40:46, 13.94s/it] 49%|████▉ | 4917/10000 [19:16:45<19:42:15, 13.96s/it] {'loss': 0.0231, 'learning_rate': 2.5440000000000004e-05, 'epoch': 6.44} 49%|████▉ | 4917/10000 [19:16:45<19:42:15, 13.96s/it] 49%|████▉ | 4918/10000 [19:16:59<19:40:08, 13.93s/it] {'loss': 0.0273, 'learning_rate': 2.5435000000000003e-05, 'epoch': 6.44} 49%|████▉ | 4918/10000 [19:16:59<19:40:08, 13.93s/it] 49%|████▉ | 4919/10000 [19:17:13<19:42:42, 13.97s/it] {'loss': 0.0197, 'learning_rate': 2.5430000000000002e-05, 'epoch': 6.44} 49%|████▉ | 4919/10000 [19:17:13<19:42:42, 13.97s/it] 49%|████▉ | 4920/10000 [19:17:27<19:38:24, 13.92s/it] {'loss': 0.0217, 'learning_rate': 2.5424999999999998e-05, 'epoch': 6.44} 49%|████▉ | 4920/10000 [19:17:27<19:38:24, 13.92s/it] 49%|████▉ | 4921/10000 [19:17:41<19:40:00, 13.94s/it] {'loss': 0.0184, 'learning_rate': 2.542e-05, 'epoch': 6.44} 49%|████▉ | 4921/10000 [19:17:41<19:40:00, 13.94s/it] 49%|████▉ | 4922/10000 [19:17:55<19:43:32, 13.98s/it] {'loss': 0.0252, 'learning_rate': 2.5415e-05, 'epoch': 6.44} 49%|████▉ | 4922/10000 [19:17:55<19:43:32, 13.98s/it] 49%|████▉ | 4923/10000 [19:18:09<19:41:03, 13.96s/it] {'loss': 0.0178, 'learning_rate': 2.541e-05, 'epoch': 6.44} 49%|████▉ | 
4923/10000 [19:18:09<19:41:03, 13.96s/it][2024-11-04 15:36:29,924] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 49%|████▉ | 4924/10000 [19:18:22<19:13:41, 13.64s/it] {'loss': 0.0253, 'learning_rate': 2.541e-05, 'epoch': 6.45} 49%|████▉ | 4924/10000 [19:18:22<19:13:41, 13.64s/it][2024-11-04 15:36:42,863] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 49%|████▉ | 4925/10000 [19:18:35<18:55:58, 13.43s/it] {'loss': 0.0183, 'learning_rate': 2.541e-05, 'epoch': 6.45} 49%|████▉ | 4925/10000 [19:18:35<18:55:58, 13.43s/it] 49%|████▉ | 4926/10000 [19:18:49<19:08:34, 13.58s/it] {'loss': 0.0275, 'learning_rate': 2.5405e-05, 'epoch': 6.45} 49%|████▉ | 4926/10000 [19:18:49<19:08:34, 13.58s/it] 49%|████▉ | 4927/10000 [19:19:02<19:14:35, 13.66s/it] {'loss': 0.017, 'learning_rate': 2.54e-05, 'epoch': 6.45} 49%|████▉ | 4927/10000 [19:19:02<19:14:35, 13.66s/it] 49%|████▉ | 4928/10000 [19:19:16<19:19:08, 13.71s/it] {'loss': 0.0196, 'learning_rate': 2.5395000000000003e-05, 'epoch': 6.45} 49%|████▉ | 4928/10000 [19:19:16<19:19:08, 13.71s/it] 49%|████▉ | 4929/10000 [19:19:30<19:22:56, 13.76s/it] {'loss': 0.0282, 'learning_rate': 2.5390000000000003e-05, 'epoch': 6.45} 49%|████▉ | 4929/10000 [19:19:30<19:22:56, 13.76s/it] 49%|████▉ | 4930/10000 [19:19:44<19:28:12, 13.82s/it] {'loss': 0.0217, 'learning_rate': 2.5385000000000002e-05, 'epoch': 6.45} 49%|████▉ | 4930/10000 [19:19:44<19:28:12, 13.82s/it] 49%|████▉ | 4931/10000 [19:19:58<19:29:36, 13.84s/it] {'loss': 0.0234, 'learning_rate': 2.5380000000000004e-05, 'epoch': 6.45} 49%|████▉ | 4931/10000 [19:19:58<19:29:36, 13.84s/it] 49%|████▉ | 4932/10000 [19:20:12<19:30:33, 13.86s/it] {'loss': 0.0182, 'learning_rate': 2.5375e-05, 'epoch': 6.46} 49%|████▉ | 4932/10000 [19:20:12<19:30:33, 13.86s/it] 49%|████▉ | 4933/10000 [19:20:26<19:30:34, 
13.86s/it] {'loss': 0.0186, 'learning_rate': 2.537e-05, 'epoch': 6.46} 49%|████▉ | 4933/10000 [19:20:26<19:30:34, 13.86s/it] 49%|████▉ | 4934/10000 [19:20:40<19:30:57, 13.87s/it] {'loss': 0.0218, 'learning_rate': 2.5365e-05, 'epoch': 6.46} 49%|████▉ | 4934/10000 [19:20:40<19:30:57, 13.87s/it] 49%|████▉ | 4935/10000 [19:20:54<19:32:12, 13.89s/it] {'loss': 0.022, 'learning_rate': 2.536e-05, 'epoch': 6.46} 49%|████▉ | 4935/10000 [19:20:54<19:32:12, 13.89s/it] 49%|████▉ | 4936/10000 [19:21:07<19:27:52, 13.84s/it] {'loss': 0.0217, 'learning_rate': 2.5355e-05, 'epoch': 6.46} 49%|████▉ | 4936/10000 [19:21:07<19:27:52, 13.84s/it] 49%|████▉ | 4937/10000 [19:21:21<19:30:50, 13.88s/it] {'loss': 0.0202, 'learning_rate': 2.5350000000000003e-05, 'epoch': 6.46} 49%|████▉ | 4937/10000 [19:21:21<19:30:50, 13.88s/it] 49%|████▉ | 4938/10000 [19:21:35<19:32:15, 13.89s/it] {'loss': 0.0187, 'learning_rate': 2.5345000000000002e-05, 'epoch': 6.46} 49%|████▉ | 4938/10000 [19:21:35<19:32:15, 13.89s/it] 49%|████▉ | 4939/10000 [19:21:49<19:32:03, 13.90s/it] {'loss': 0.0207, 'learning_rate': 2.534e-05, 'epoch': 6.46} 49%|████▉ | 4939/10000 [19:21:49<19:32:03, 13.90s/it] 49%|████▉ | 4940/10000 [19:22:03<19:32:42, 13.91s/it] {'loss': 0.0248, 'learning_rate': 2.5335000000000004e-05, 'epoch': 6.47} 49%|████▉ | 4940/10000 [19:22:03<19:32:42, 13.91s/it] 49%|████▉ | 4941/10000 [19:22:17<19:29:56, 13.88s/it] {'loss': 0.0217, 'learning_rate': 2.5330000000000003e-05, 'epoch': 6.47} 49%|████▉ | 4941/10000 [19:22:17<19:29:56, 13.88s/it] 49%|████▉ | 4942/10000 [19:22:31<19:30:31, 13.89s/it] {'loss': 0.0218, 'learning_rate': 2.5325e-05, 'epoch': 6.47} 49%|████▉ | 4942/10000 [19:22:31<19:30:31, 13.89s/it] 49%|████▉ | 4943/10000 [19:22:44<19:27:30, 13.85s/it] {'loss': 0.0177, 'learning_rate': 2.5319999999999998e-05, 'epoch': 6.47} 49%|████▉ | 4943/10000 [19:22:45<19:27:30, 13.85s/it] 49%|████▉ | 4944/10000 [19:22:58<19:31:29, 13.90s/it] {'loss': 0.0236, 'learning_rate': 2.5315e-05, 'epoch': 6.47} 49%|████▉ | 
4944/10000 [19:22:59<19:31:29, 13.90s/it] 49%|████▉ | 4945/10000 [19:23:12<19:31:52, 13.91s/it] {'loss': 0.0211, 'learning_rate': 2.531e-05, 'epoch': 6.47} 49%|████▉ | 4945/10000 [19:23:12<19:31:52, 13.91s/it] 49%|████▉ | 4946/10000 [19:23:26<19:33:07, 13.93s/it] {'loss': 0.0167, 'learning_rate': 2.5305000000000003e-05, 'epoch': 6.47} 49%|████▉ | 4946/10000 [19:23:26<19:33:07, 13.93s/it] 49%|████▉ | 4947/10000 [19:23:40<19:32:55, 13.93s/it] {'loss': 0.0205, 'learning_rate': 2.5300000000000002e-05, 'epoch': 6.48} 49%|████▉ | 4947/10000 [19:23:40<19:32:55, 13.93s/it] 49%|████▉ | 4948/10000 [19:23:54<19:32:35, 13.93s/it] {'loss': 0.0183, 'learning_rate': 2.5295e-05, 'epoch': 6.48} 49%|████▉ | 4948/10000 [19:23:54<19:32:35, 13.93s/it] 49%|████▉ | 4949/10000 [19:24:08<19:31:51, 13.92s/it] {'loss': 0.0196, 'learning_rate': 2.5290000000000004e-05, 'epoch': 6.48} 49%|████▉ | 4949/10000 [19:24:08<19:31:51, 13.92s/it] 50%|████▉ | 4950/10000 [19:24:22<19:31:33, 13.92s/it] {'loss': 0.0231, 'learning_rate': 2.5285000000000003e-05, 'epoch': 6.48} 50%|████▉ | 4950/10000 [19:24:22<19:31:33, 13.92s/it] 50%|████▉ | 4951/10000 [19:24:36<19:34:08, 13.95s/it] {'loss': 0.0227, 'learning_rate': 2.5280000000000005e-05, 'epoch': 6.48} 50%|████▉ | 4951/10000 [19:24:36<19:34:08, 13.95s/it] 50%|████▉ | 4952/10000 [19:24:50<19:34:30, 13.96s/it] {'loss': 0.0163, 'learning_rate': 2.5274999999999998e-05, 'epoch': 6.48} 50%|████▉ | 4952/10000 [19:24:50<19:34:30, 13.96s/it] 50%|████▉ | 4953/10000 [19:25:04<19:32:39, 13.94s/it] {'loss': 0.0181, 'learning_rate': 2.527e-05, 'epoch': 6.48} 50%|████▉ | 4953/10000 [19:25:04<19:32:39, 13.94s/it] 50%|████▉ | 4954/10000 [19:25:18<19:35:02, 13.97s/it] {'loss': 0.0181, 'learning_rate': 2.5265e-05, 'epoch': 6.48} 50%|████▉ | 4954/10000 [19:25:18<19:35:02, 13.97s/it] 50%|████▉ | 4955/10000 [19:25:32<19:35:48, 13.98s/it] {'loss': 0.0227, 'learning_rate': 2.526e-05, 'epoch': 6.49} 50%|████▉ | 4955/10000 [19:25:32<19:35:48, 13.98s/it] 50%|████▉ | 4956/10000 
[19:25:46<19:31:27, 13.93s/it] {'loss': 0.0231, 'learning_rate': 2.5255e-05, 'epoch': 6.49} 50%|████▉ | 4956/10000 [19:25:46<19:31:27, 13.93s/it] 50%|████▉ | 4957/10000 [19:26:00<19:28:43, 13.91s/it] {'loss': 0.0204, 'learning_rate': 2.525e-05, 'epoch': 6.49} 50%|████▉ | 4957/10000 [19:26:00<19:28:43, 13.91s/it] 50%|████▉ | 4958/10000 [19:26:14<19:31:17, 13.94s/it] {'loss': 0.0196, 'learning_rate': 2.5245000000000003e-05, 'epoch': 6.49} 50%|████▉ | 4958/10000 [19:26:14<19:31:17, 13.94s/it] 50%|████▉ | 4959/10000 [19:26:28<19:31:17, 13.94s/it] {'loss': 0.0223, 'learning_rate': 2.5240000000000002e-05, 'epoch': 6.49} 50%|████▉ | 4959/10000 [19:26:28<19:31:17, 13.94s/it] 50%|████▉ | 4960/10000 [19:26:42<19:29:31, 13.92s/it] {'loss': 0.0281, 'learning_rate': 2.5235e-05, 'epoch': 6.49} 50%|████▉ | 4960/10000 [19:26:42<19:29:31, 13.92s/it] 50%|████▉ | 4961/10000 [19:26:55<19:28:56, 13.92s/it] {'loss': 0.0186, 'learning_rate': 2.5230000000000004e-05, 'epoch': 6.49} 50%|████▉ | 4961/10000 [19:26:55<19:28:56, 13.92s/it] 50%|████▉ | 4962/10000 [19:27:09<19:28:01, 13.91s/it] {'loss': 0.017, 'learning_rate': 2.5225e-05, 'epoch': 6.49} 50%|████▉ | 4962/10000 [19:27:09<19:28:01, 13.91s/it] 50%|████▉ | 4963/10000 [19:27:23<19:28:13, 13.92s/it] {'loss': 0.0194, 'learning_rate': 2.522e-05, 'epoch': 6.5} 50%|████▉ | 4963/10000 [19:27:23<19:28:13, 13.92s/it] 50%|████▉ | 4964/10000 [19:27:37<19:27:05, 13.90s/it] {'loss': 0.017, 'learning_rate': 2.5214999999999998e-05, 'epoch': 6.5} 50%|████▉ | 4964/10000 [19:27:37<19:27:05, 13.90s/it] 50%|████▉ | 4965/10000 [19:27:51<19:30:39, 13.95s/it] {'loss': 0.0215, 'learning_rate': 2.521e-05, 'epoch': 6.5} 50%|████▉ | 4965/10000 [19:27:51<19:30:39, 13.95s/it] 50%|████▉ | 4966/10000 [19:28:05<19:27:57, 13.92s/it] {'loss': 0.0178, 'learning_rate': 2.5205e-05, 'epoch': 6.5} 50%|████▉ | 4966/10000 [19:28:05<19:27:57, 13.92s/it] 50%|████▉ | 4967/10000 [19:28:19<19:31:47, 13.97s/it] {'loss': 0.0182, 'learning_rate': 2.5200000000000003e-05, 'epoch': 
6.5} 50%|████▉ | 4967/10000 [19:28:19<19:31:47, 13.97s/it] 50%|████▉ | 4968/10000 [19:28:33<19:35:31, 14.02s/it] {'loss': 0.0189, 'learning_rate': 2.5195000000000002e-05, 'epoch': 6.5} 50%|████▉ | 4968/10000 [19:28:33<19:35:31, 14.02s/it] 50%|████▉ | 4969/10000 [19:28:47<19:32:52, 13.99s/it] {'loss': 0.023, 'learning_rate': 2.519e-05, 'epoch': 6.5} 50%|████▉ | 4969/10000 [19:28:47<19:32:52, 13.99s/it] 50%|████▉ | 4970/10000 [19:29:01<19:31:41, 13.98s/it] {'loss': 0.0163, 'learning_rate': 2.5185000000000004e-05, 'epoch': 6.51} 50%|████▉ | 4970/10000 [19:29:01<19:31:41, 13.98s/it] 50%|████▉ | 4971/10000 [19:29:15<19:29:13, 13.95s/it] {'loss': 0.0228, 'learning_rate': 2.5180000000000003e-05, 'epoch': 6.51} 50%|████▉ | 4971/10000 [19:29:15<19:29:13, 13.95s/it] 50%|████▉ | 4972/10000 [19:29:29<19:29:24, 13.95s/it] {'loss': 0.0207, 'learning_rate': 2.5175e-05, 'epoch': 6.51} 50%|████▉ | 4972/10000 [19:29:29<19:29:24, 13.95s/it] 50%|████▉ | 4973/10000 [19:29:43<19:34:09, 14.01s/it] {'loss': 0.0199, 'learning_rate': 2.5169999999999998e-05, 'epoch': 6.51} 50%|████▉ | 4973/10000 [19:29:43<19:34:09, 14.01s/it] 50%|████▉ | 4974/10000 [19:29:57<19:31:25, 13.98s/it] {'loss': 0.0221, 'learning_rate': 2.5165e-05, 'epoch': 6.51} 50%|████▉ | 4974/10000 [19:29:57<19:31:25, 13.98s/it] 50%|████▉ | 4975/10000 [19:30:11<19:33:04, 14.01s/it] {'loss': 0.0223, 'learning_rate': 2.516e-05, 'epoch': 6.51} 50%|████▉ | 4975/10000 [19:30:11<19:33:04, 14.01s/it] 50%|████▉ | 4976/10000 [19:30:25<19:29:25, 13.97s/it] {'loss': 0.0212, 'learning_rate': 2.5155000000000002e-05, 'epoch': 6.51} 50%|████▉ | 4976/10000 [19:30:25<19:29:25, 13.97s/it] 50%|████▉ | 4977/10000 [19:30:39<19:27:33, 13.95s/it] {'loss': 0.0223, 'learning_rate': 2.515e-05, 'epoch': 6.51} 50%|████▉ | 4977/10000 [19:30:39<19:27:33, 13.95s/it] 50%|████▉ | 4978/10000 [19:30:53<19:26:25, 13.94s/it] {'loss': 0.0196, 'learning_rate': 2.5145e-05, 'epoch': 6.52} 50%|████▉ | 4978/10000 [19:30:53<19:26:25, 13.94s/it] 50%|████▉ | 4979/10000 
[19:31:07<19:25:48, 13.93s/it] {'loss': 0.0181, 'learning_rate': 2.5140000000000003e-05, 'epoch': 6.52} 50%|████▉ | 4979/10000 [19:31:07<19:25:48, 13.93s/it] 50%|████▉ | 4980/10000 [19:31:21<19:25:26, 13.93s/it] {'loss': 0.0189, 'learning_rate': 2.5135000000000002e-05, 'epoch': 6.52} 50%|████▉ | 4980/10000 [19:31:21<19:25:26, 13.93s/it] 50%|████▉ | 4981/10000 [19:31:35<19:28:52, 13.97s/it] {'loss': 0.0219, 'learning_rate': 2.5130000000000005e-05, 'epoch': 6.52} 50%|████▉ | 4981/10000 [19:31:35<19:28:52, 13.97s/it] 50%|████▉ | 4982/10000 [19:31:49<19:26:22, 13.95s/it] {'loss': 0.0187, 'learning_rate': 2.5124999999999997e-05, 'epoch': 6.52} 50%|████▉ | 4982/10000 [19:31:49<19:26:22, 13.95s/it] 50%|████▉ | 4983/10000 [19:32:02<19:24:28, 13.93s/it] {'loss': 0.0222, 'learning_rate': 2.512e-05, 'epoch': 6.52} 50%|████▉ | 4983/10000 [19:32:02<19:24:28, 13.93s/it] 50%|████▉ | 4984/10000 [19:32:16<19:25:50, 13.95s/it] {'loss': 0.0176, 'learning_rate': 2.5115e-05, 'epoch': 6.52} 50%|████▉ | 4984/10000 [19:32:17<19:25:50, 13.95s/it] 50%|████▉ | 4985/10000 [19:32:30<19:25:07, 13.94s/it] {'loss': 0.0224, 'learning_rate': 2.5110000000000002e-05, 'epoch': 6.52} 50%|████▉ | 4985/10000 [19:32:30<19:25:07, 13.94s/it] 50%|████▉ | 4986/10000 [19:32:44<19:24:08, 13.93s/it] {'loss': 0.0246, 'learning_rate': 2.5105e-05, 'epoch': 6.53} 50%|████▉ | 4986/10000 [19:32:44<19:24:08, 13.93s/it] 50%|████▉ | 4987/10000 [19:32:58<19:22:31, 13.91s/it] {'loss': 0.0241, 'learning_rate': 2.51e-05, 'epoch': 6.53} 50%|████▉ | 4987/10000 [19:32:58<19:22:31, 13.91s/it] 50%|████▉ | 4988/10000 [19:33:12<19:22:42, 13.92s/it] {'loss': 0.0191, 'learning_rate': 2.5095000000000003e-05, 'epoch': 6.53} 50%|████▉ | 4988/10000 [19:33:12<19:22:42, 13.92s/it] 50%|████▉ | 4989/10000 [19:33:26<19:23:24, 13.93s/it] {'loss': 0.0178, 'learning_rate': 2.5090000000000002e-05, 'epoch': 6.53} 50%|████▉ | 4989/10000 [19:33:26<19:23:24, 13.93s/it] 50%|████▉ | 4990/10000 [19:33:40<19:20:44, 13.90s/it] {'loss': 0.0173, 
'learning_rate': 2.5085000000000005e-05, 'epoch': 6.53} 50%|████▉ | 4990/10000 [19:33:40<19:20:44, 13.90s/it] 50%|████▉ | 4991/10000 [19:33:54<19:19:20, 13.89s/it] {'loss': 0.0187, 'learning_rate': 2.5080000000000004e-05, 'epoch': 6.53} 50%|████▉ | 4991/10000 [19:33:54<19:19:20, 13.89s/it] 50%|████▉ | 4992/10000 [19:34:08<19:24:14, 13.95s/it] {'loss': 0.0222, 'learning_rate': 2.5075e-05, 'epoch': 6.53} 50%|████▉ | 4992/10000 [19:34:08<19:24:14, 13.95s/it] 50%|████▉ | 4993/10000 [19:34:22<19:21:06, 13.91s/it] {'loss': 0.0219, 'learning_rate': 2.507e-05, 'epoch': 6.54} 50%|████▉ | 4993/10000 [19:34:22<19:21:06, 13.91s/it] 50%|████▉ | 4994/10000 [19:34:36<19:25:35, 13.97s/it] {'loss': 0.0174, 'learning_rate': 2.5064999999999998e-05, 'epoch': 6.54} 50%|████▉ | 4994/10000 [19:34:36<19:25:35, 13.97s/it] 50%|████▉ | 4995/10000 [19:34:50<19:23:59, 13.95s/it] {'loss': 0.029, 'learning_rate': 2.506e-05, 'epoch': 6.54} 50%|████▉ | 4995/10000 [19:34:50<19:23:59, 13.95s/it] 50%|████▉ | 4996/10000 [19:35:04<19:21:40, 13.93s/it] {'loss': 0.021, 'learning_rate': 2.5055e-05, 'epoch': 6.54} 50%|████▉ | 4996/10000 [19:35:04<19:21:40, 13.93s/it] 50%|████▉ | 4997/10000 [19:35:18<19:21:47, 13.93s/it] {'loss': 0.017, 'learning_rate': 2.5050000000000002e-05, 'epoch': 6.54} 50%|████▉ | 4997/10000 [19:35:18<19:21:47, 13.93s/it] 50%|████▉ | 4998/10000 [19:35:31<19:18:19, 13.89s/it] {'loss': 0.0204, 'learning_rate': 2.5045e-05, 'epoch': 6.54} 50%|████▉ | 4998/10000 [19:35:31<19:18:19, 13.89s/it] 50%|████▉ | 4999/10000 [19:35:45<19:16:50, 13.88s/it] {'loss': 0.0205, 'learning_rate': 2.504e-05, 'epoch': 6.54} 50%|████▉ | 4999/10000 [19:35:45<19:16:50, 13.88s/it] 50%|█████ | 5000/10000 [19:35:59<19:14:07, 13.85s/it] {'loss': 0.0223, 'learning_rate': 2.5035000000000003e-05, 'epoch': 6.54} 50%|█████ | 5000/10000 [19:35:59<19:14:07, 13.85s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-04 15:54:07,267 >> Configuration saved in 
output/echo28-20241103-201128-1e-4/checkpoint-5000/config.json
[INFO|configuration_utils.py:364] 2024-11-04 15:54:07,270 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-5000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-04 15:55:06,432 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-5000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-04 15:55:06,435 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-5000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-04 15:55:06,437 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-5000/special_tokens_map.json
[2024-11-04 15:55:06,457] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is about to be saved!
[2024-11-04 15:55:06,512] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt
[2024-11-04 15:55:06,512] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt...
[2024-11-04 15:56:05,416] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/mp_rank_00_model_states.pt.
[2024-11-04 15:56:05,521] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-04 15:57:59,735] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-04 15:57:59,740] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-04 15:57:59,740] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step5000 is ready now! 50%|█████ | 5001/10000 [19:40:10<118:11:04, 85.11s/it] {'loss': 0.0199, 'learning_rate': 2.5030000000000003e-05, 'epoch': 6.55} 50%|█████ | 5001/10000 [19:40:10<118:11:04, 85.11s/it] 50%|█████ | 5002/10000 [19:40:24<88:26:24, 63.70s/it] {'loss': 0.0194, 'learning_rate': 2.5025e-05, 'epoch': 6.55} 50%|█████ | 5002/10000 [19:40:24<88:26:24, 63.70s/it] 50%|█████ | 5003/10000 [19:40:38<67:36:56, 48.71s/it] {'loss': 0.0197, 'learning_rate': 2.5019999999999998e-05, 'epoch': 6.55} 50%|█████ | 5003/10000 [19:40:38<67:36:56, 48.71s/it] 50%|█████ | 5004/10000 [19:40:52<53:02:19, 38.22s/it] {'loss': 0.025, 'learning_rate': 2.5015e-05, 'epoch': 6.55} 50%|█████ | 5004/10000 [19:40:52<53:02:19, 38.22s/it] 50%|█████ | 5005/10000 [19:41:05<42:50:58, 30.88s/it] {'loss': 0.0217, 'learning_rate': 2.501e-05, 'epoch': 6.55} 50%|█████ | 5005/10000 [19:41:05<42:50:58, 30.88s/it] 50%|█████ | 5006/10000 [19:41:19<35:44:53, 25.77s/it] {'loss': 0.0165, 'learning_rate': 2.5005000000000002e-05, 'epoch': 6.55} 50%|█████ | 5006/10000 [19:41:19<35:44:53, 25.77s/it] 50%|█████ | 5007/10000 [19:41:33<30:49:30, 22.23s/it] {'loss': 0.0244, 'learning_rate': 2.5e-05, 'epoch': 6.55} 50%|█████ | 5007/10000 [19:41:33<30:49:30, 22.23s/it] 50%|█████ | 5008/10000 [19:41:47<27:22:22, 19.74s/it] {'loss': 0.0367, 'learning_rate': 2.4995e-05, 'epoch': 6.55} 50%|█████ | 5008/10000 [19:41:47<27:22:22, 19.74s/it] 50%|█████ | 5009/10000 [19:42:01<24:57:28, 18.00s/it] {'loss': 0.0207, 'learning_rate': 2.4990000000000003e-05, 'epoch': 6.56} 50%|█████ | 5009/10000 [19:42:01<24:57:28, 18.00s/it] 50%|█████ | 5010/10000 [19:42:15<23:17:43, 16.81s/it] {'loss': 0.0193, 'learning_rate': 2.4985e-05, 'epoch': 
6.56} 50%|█████ | 5010/10000 [19:42:15<23:17:43, 16.81s/it] 50%|█████ | 5011/10000 [19:42:29<22:11:26, 16.01s/it] {'loss': 0.0197, 'learning_rate': 2.498e-05, 'epoch': 6.56} 50%|█████ | 5011/10000 [19:42:29<22:11:26, 16.01s/it] 50%|█████ | 5012/10000 [19:42:43<21:18:52, 15.38s/it] {'loss': 0.0233, 'learning_rate': 2.4975e-05, 'epoch': 6.56} 50%|█████ | 5012/10000 [19:42:43<21:18:52, 15.38s/it] 50%|█████ | 5013/10000 [19:42:57<20:41:49, 14.94s/it] {'loss': 0.0227, 'learning_rate': 2.4970000000000003e-05, 'epoch': 6.56} 50%|█████ | 5013/10000 [19:42:57<20:41:49, 14.94s/it] 50%|█████ | 5014/10000 [19:43:11<20:19:08, 14.67s/it] {'loss': 0.024, 'learning_rate': 2.4965000000000002e-05, 'epoch': 6.56} 50%|█████ | 5014/10000 [19:43:11<20:19:08, 14.67s/it] 50%|█████ | 5015/10000 [19:43:25<20:02:15, 14.47s/it] {'loss': 0.0222, 'learning_rate': 2.496e-05, 'epoch': 6.56} 50%|█████ | 5015/10000 [19:43:25<20:02:15, 14.47s/it] 50%|█████ | 5016/10000 [19:43:39<19:48:39, 14.31s/it] {'loss': 0.0221, 'learning_rate': 2.4955e-05, 'epoch': 6.57} 50%|█████ | 5016/10000 [19:43:39<19:48:39, 14.31s/it] 50%|█████ | 5017/10000 [19:43:53<19:41:26, 14.23s/it] {'loss': 0.0215, 'learning_rate': 2.495e-05, 'epoch': 6.57} 50%|█████ | 5017/10000 [19:43:53<19:41:26, 14.23s/it] 50%|█████ | 5018/10000 [19:44:07<19:37:36, 14.18s/it] {'loss': 0.0284, 'learning_rate': 2.4945000000000003e-05, 'epoch': 6.57} 50%|█████ | 5018/10000 [19:44:07<19:37:36, 14.18s/it] 50%|█████ | 5019/10000 [19:44:21<19:29:29, 14.09s/it] {'loss': 0.0249, 'learning_rate': 2.4940000000000002e-05, 'epoch': 6.57} 50%|█████ | 5019/10000 [19:44:21<19:29:29, 14.09s/it] 50%|█████ | 5020/10000 [19:44:35<19:24:41, 14.03s/it] {'loss': 0.0187, 'learning_rate': 2.4935e-05, 'epoch': 6.57} 50%|█████ | 5020/10000 [19:44:35<19:24:41, 14.03s/it] 50%|█████ | 5021/10000 [19:44:49<19:24:10, 14.03s/it] {'loss': 0.0281, 'learning_rate': 2.493e-05, 'epoch': 6.57} 50%|█████ | 5021/10000 [19:44:49<19:24:10, 14.03s/it] 50%|█████ | 5022/10000 
[19:45:03<19:23:18, 14.02s/it] {'loss': 0.0226, 'learning_rate': 2.4925000000000003e-05, 'epoch': 6.57} 50%|█████ | 5022/10000 [19:45:03<19:23:18, 14.02s/it] 50%|█████ | 5023/10000 [19:45:17<19:24:26, 14.04s/it] {'loss': 0.0234, 'learning_rate': 2.4920000000000002e-05, 'epoch': 6.57} 50%|█████ | 5023/10000 [19:45:17<19:24:26, 14.04s/it] 50%|█████ | 5024/10000 [19:45:31<19:21:52, 14.01s/it] {'loss': 0.0222, 'learning_rate': 2.4915e-05, 'epoch': 6.58} 50%|█████ | 5024/10000 [19:45:31<19:21:52, 14.01s/it] 50%|█████ | 5025/10000 [19:45:45<19:20:54, 14.00s/it] {'loss': 0.022, 'learning_rate': 2.491e-05, 'epoch': 6.58} 50%|█████ | 5025/10000 [19:45:45<19:20:54, 14.00s/it] 50%|█████ | 5026/10000 [19:45:59<19:23:07, 14.03s/it] {'loss': 0.0338, 'learning_rate': 2.4905e-05, 'epoch': 6.58} 50%|█████ | 5026/10000 [19:45:59<19:23:07, 14.03s/it] 50%|█████ | 5027/10000 [19:46:13<19:20:30, 14.00s/it] {'loss': 0.0203, 'learning_rate': 2.4900000000000002e-05, 'epoch': 6.58} 50%|█████ | 5027/10000 [19:46:13<19:20:30, 14.00s/it] 50%|█████ | 5028/10000 [19:46:27<19:19:16, 13.99s/it] {'loss': 0.0231, 'learning_rate': 2.4895e-05, 'epoch': 6.58} 50%|█████ | 5028/10000 [19:46:27<19:19:16, 13.99s/it] 50%|█████ | 5029/10000 [19:46:41<19:19:54, 14.00s/it] {'loss': 0.0176, 'learning_rate': 2.489e-05, 'epoch': 6.58} 50%|█████ | 5029/10000 [19:46:41<19:19:54, 14.00s/it] 50%|█████ | 5030/10000 [19:46:55<19:22:32, 14.03s/it] {'loss': 0.0204, 'learning_rate': 2.4885e-05, 'epoch': 6.58} 50%|█████ | 5030/10000 [19:46:55<19:22:32, 14.03s/it] 50%|█████ | 5031/10000 [19:47:09<19:20:21, 14.01s/it] {'loss': 0.0233, 'learning_rate': 2.488e-05, 'epoch': 6.59} 50%|█████ | 5031/10000 [19:47:09<19:20:21, 14.01s/it] 50%|█████ | 5032/10000 [19:47:23<19:15:20, 13.95s/it] {'loss': 0.0199, 'learning_rate': 2.4875e-05, 'epoch': 6.59} 50%|█████ | 5032/10000 [19:47:23<19:15:20, 13.95s/it] 50%|█████ | 5033/10000 [19:47:37<19:15:54, 13.96s/it] {'loss': 0.0234, 'learning_rate': 2.487e-05, 'epoch': 6.59} 50%|█████ | 
5033/10000 [19:47:37<19:15:54, 13.96s/it] 50%|█████ | 5034/10000 [19:47:51<19:14:12, 13.95s/it] {'loss': 0.0267, 'learning_rate': 2.4865000000000003e-05, 'epoch': 6.59} 50%|█████ | 5034/10000 [19:47:51<19:14:12, 13.95s/it] 50%|█████ | 5035/10000 [19:48:05<19:15:43, 13.97s/it] {'loss': 0.0194, 'learning_rate': 2.486e-05, 'epoch': 6.59} 50%|█████ | 5035/10000 [19:48:05<19:15:43, 13.97s/it] 50%|█████ | 5036/10000 [19:48:19<19:13:06, 13.94s/it] {'loss': 0.0211, 'learning_rate': 2.4855000000000002e-05, 'epoch': 6.59} 50%|█████ | 5036/10000 [19:48:19<19:13:06, 13.94s/it] 50%|█████ | 5037/10000 [19:48:32<19:11:52, 13.93s/it] {'loss': 0.0217, 'learning_rate': 2.485e-05, 'epoch': 6.59} 50%|█████ | 5037/10000 [19:48:32<19:11:52, 13.93s/it] 50%|█████ | 5038/10000 [19:48:46<19:10:59, 13.92s/it] {'loss': 0.0241, 'learning_rate': 2.4845e-05, 'epoch': 6.59} 50%|█████ | 5038/10000 [19:48:46<19:10:59, 13.92s/it] 50%|█████ | 5039/10000 [19:49:00<19:10:38, 13.92s/it] {'loss': 0.025, 'learning_rate': 2.4840000000000003e-05, 'epoch': 6.6} 50%|█████ | 5039/10000 [19:49:00<19:10:38, 13.92s/it] 50%|█████ | 5040/10000 [19:49:14<19:09:01, 13.90s/it] {'loss': 0.0234, 'learning_rate': 2.4835e-05, 'epoch': 6.6} 50%|█████ | 5040/10000 [19:49:14<19:09:01, 13.90s/it] 50%|█████ | 5041/10000 [19:49:28<19:10:05, 13.92s/it] {'loss': 0.0199, 'learning_rate': 2.483e-05, 'epoch': 6.6} 50%|█████ | 5041/10000 [19:49:28<19:10:05, 13.92s/it] 50%|█████ | 5042/10000 [19:49:42<19:09:05, 13.91s/it] {'loss': 0.0205, 'learning_rate': 2.4825e-05, 'epoch': 6.6} 50%|█████ | 5042/10000 [19:49:42<19:09:05, 13.91s/it] 50%|█████ | 5043/10000 [19:49:56<19:11:35, 13.94s/it] {'loss': 0.0247, 'learning_rate': 2.4820000000000003e-05, 'epoch': 6.6} 50%|█████ | 5043/10000 [19:49:56<19:11:35, 13.94s/it] 50%|█████ | 5044/10000 [19:50:10<19:13:46, 13.97s/it] {'loss': 0.0262, 'learning_rate': 2.4815000000000002e-05, 'epoch': 6.6} 50%|█████ | 5044/10000 [19:50:10<19:13:46, 13.97s/it] 50%|█████ | 5045/10000 [19:50:24<19:10:56, 
13.94s/it] {'loss': 0.0252, 'learning_rate': 2.481e-05, 'epoch': 6.6} 50%|█████ | 5045/10000 [19:50:24<19:10:56, 13.94s/it] 50%|█████ | 5046/10000 [19:50:38<19:13:57, 13.98s/it] {'loss': 0.0229, 'learning_rate': 2.4805e-05, 'epoch': 6.6} 50%|█████ | 5046/10000 [19:50:38<19:13:57, 13.98s/it] 50%|█████ | 5047/10000 [19:50:52<19:09:57, 13.93s/it] {'loss': 0.02, 'learning_rate': 2.48e-05, 'epoch': 6.61} 50%|█████ | 5047/10000 [19:50:52<19:09:57, 13.93s/it] 50%|█████ | 5048/10000 [19:51:06<19:08:17, 13.91s/it] {'loss': 0.022, 'learning_rate': 2.4795000000000002e-05, 'epoch': 6.61} 50%|█████ | 5048/10000 [19:51:06<19:08:17, 13.91s/it] 50%|█████ | 5049/10000 [19:51:20<19:09:30, 13.93s/it] {'loss': 0.0201, 'learning_rate': 2.479e-05, 'epoch': 6.61} 50%|█████ | 5049/10000 [19:51:20<19:09:30, 13.93s/it] 50%|█████ | 5050/10000 [19:51:34<19:10:29, 13.95s/it] {'loss': 0.0215, 'learning_rate': 2.4785e-05, 'epoch': 6.61} 50%|█████ | 5050/10000 [19:51:34<19:10:29, 13.95s/it] 51%|█████ | 5051/10000 [19:51:48<19:12:01, 13.97s/it] {'loss': 0.0216, 'learning_rate': 2.478e-05, 'epoch': 6.61} 51%|█████ | 5051/10000 [19:51:48<19:12:01, 13.97s/it] 51%|█████ | 5052/10000 [19:52:02<19:14:10, 14.00s/it] {'loss': 0.0151, 'learning_rate': 2.4775000000000003e-05, 'epoch': 6.61} 51%|█████ | 5052/10000 [19:52:02<19:14:10, 14.00s/it] 51%|█████ | 5053/10000 [19:52:16<19:11:46, 13.97s/it] {'loss': 0.0219, 'learning_rate': 2.4770000000000002e-05, 'epoch': 6.61} 51%|█████ | 5053/10000 [19:52:16<19:11:46, 13.97s/it] 51%|█████ | 5054/10000 [19:52:30<19:10:43, 13.96s/it] {'loss': 0.0252, 'learning_rate': 2.4765e-05, 'epoch': 6.62} 51%|█████ | 5054/10000 [19:52:30<19:10:43, 13.96s/it] 51%|█████ | 5055/10000 [19:52:43<19:10:24, 13.96s/it] {'loss': 0.0219, 'learning_rate': 2.476e-05, 'epoch': 6.62} 51%|█████ | 5055/10000 [19:52:44<19:10:24, 13.96s/it] 51%|█████ | 5056/10000 [19:52:57<19:10:06, 13.96s/it] {'loss': 0.0194, 'learning_rate': 2.4755e-05, 'epoch': 6.62} 51%|█████ | 5056/10000 [19:52:57<19:10:06, 
13.96s/it] 51%|█████ | 5057/10000 [19:53:11<19:10:35, 13.97s/it] {'loss': 0.0233, 'learning_rate': 2.4750000000000002e-05, 'epoch': 6.62} 51%|█████ | 5057/10000 [19:53:11<19:10:35, 13.97s/it] 51%|█████ | 5058/10000 [19:53:25<19:11:55, 13.99s/it] {'loss': 0.0253, 'learning_rate': 2.4745e-05, 'epoch': 6.62} 51%|█████ | 5058/10000 [19:53:25<19:11:55, 13.99s/it] 51%|█████ | 5059/10000 [19:53:39<19:12:08, 13.99s/it] {'loss': 0.0214, 'learning_rate': 2.4740000000000004e-05, 'epoch': 6.62} 51%|█████ | 5059/10000 [19:53:39<19:12:08, 13.99s/it] 51%|█████ | 5060/10000 [19:53:53<19:07:47, 13.94s/it] {'loss': 0.0246, 'learning_rate': 2.4735e-05, 'epoch': 6.62} 51%|█████ | 5060/10000 [19:53:53<19:07:47, 13.94s/it] 51%|█████ | 5061/10000 [19:54:07<19:05:45, 13.92s/it] {'loss': 0.024, 'learning_rate': 2.473e-05, 'epoch': 6.62} 51%|█████ | 5061/10000 [19:54:07<19:05:45, 13.92s/it] 51%|█████ | 5062/10000 [19:54:21<19:04:04, 13.90s/it] {'loss': 0.0235, 'learning_rate': 2.4725e-05, 'epoch': 6.63} 51%|█████ | 5062/10000 [19:54:21<19:04:04, 13.90s/it] 51%|█████ | 5063/10000 [19:54:35<19:04:30, 13.91s/it] {'loss': 0.0223, 'learning_rate': 2.472e-05, 'epoch': 6.63} 51%|█████ | 5063/10000 [19:54:35<19:04:30, 13.91s/it] 51%|█████ | 5064/10000 [19:54:49<19:04:38, 13.91s/it] {'loss': 0.0199, 'learning_rate': 2.4715000000000003e-05, 'epoch': 6.63} 51%|█████ | 5064/10000 [19:54:49<19:04:38, 13.91s/it] 51%|█████ | 5065/10000 [19:55:03<19:05:44, 13.93s/it] {'loss': 0.0265, 'learning_rate': 2.471e-05, 'epoch': 6.63} 51%|█████ | 5065/10000 [19:55:03<19:05:44, 13.93s/it] 51%|█████ | 5066/10000 [19:55:17<19:03:43, 13.91s/it] {'loss': 0.0209, 'learning_rate': 2.4705e-05, 'epoch': 6.63} 51%|█████ | 5066/10000 [19:55:17<19:03:43, 13.91s/it] 51%|█████ | 5067/10000 [19:55:31<19:06:36, 13.95s/it] {'loss': 0.0279, 'learning_rate': 2.47e-05, 'epoch': 6.63} 51%|█████ | 5067/10000 [19:55:31<19:06:36, 13.95s/it] 51%|█████ | 5068/10000 [19:55:45<19:08:52, 13.98s/it] {'loss': 0.0233, 'learning_rate': 2.4695e-05, 
'epoch': 6.63} 51%|█████ | 5068/10000 [19:55:45<19:08:52, 13.98s/it] 51%|█████ | 5069/10000 [19:55:59<19:07:47, 13.97s/it] {'loss': 0.0173, 'learning_rate': 2.4690000000000002e-05, 'epoch': 6.63} 51%|█████ | 5069/10000 [19:55:59<19:07:47, 13.97s/it] 51%|█████ | 5070/10000 [19:56:13<19:05:52, 13.95s/it] {'loss': 0.0189, 'learning_rate': 2.4685e-05, 'epoch': 6.64} 51%|█████ | 5070/10000 [19:56:13<19:05:52, 13.95s/it] 51%|█████ | 5071/10000 [19:56:26<19:04:13, 13.93s/it] {'loss': 0.0234, 'learning_rate': 2.468e-05, 'epoch': 6.64} 51%|█████ | 5071/10000 [19:56:27<19:04:13, 13.93s/it] 51%|█████ | 5072/10000 [19:56:41<19:06:08, 13.95s/it] {'loss': 0.0212, 'learning_rate': 2.4675e-05, 'epoch': 6.64} 51%|█████ | 5072/10000 [19:56:41<19:06:08, 13.95s/it] 51%|█████ | 5073/10000 [19:56:54<19:03:24, 13.92s/it] {'loss': 0.0276, 'learning_rate': 2.4670000000000003e-05, 'epoch': 6.64} 51%|█████ | 5073/10000 [19:56:54<19:03:24, 13.92s/it] 51%|█████ | 5074/10000 [19:57:08<19:04:04, 13.94s/it] {'loss': 0.0172, 'learning_rate': 2.4665000000000002e-05, 'epoch': 6.64} 51%|█████ | 5074/10000 [19:57:08<19:04:04, 13.94s/it] 51%|█████ | 5075/10000 [19:57:22<19:03:43, 13.93s/it] {'loss': 0.0231, 'learning_rate': 2.466e-05, 'epoch': 6.64} 51%|█████ | 5075/10000 [19:57:22<19:03:43, 13.93s/it] 51%|█████ | 5076/10000 [19:57:36<19:02:51, 13.93s/it] {'loss': 0.0179, 'learning_rate': 2.4655e-05, 'epoch': 6.64} 51%|█████ | 5076/10000 [19:57:36<19:02:51, 13.93s/it] 51%|█████ | 5077/10000 [19:57:50<19:06:18, 13.97s/it] {'loss': 0.0185, 'learning_rate': 2.465e-05, 'epoch': 6.65} 51%|█████ | 5077/10000 [19:57:50<19:06:18, 13.97s/it] 51%|█████ | 5078/10000 [19:58:04<19:03:56, 13.94s/it] {'loss': 0.0218, 'learning_rate': 2.4645000000000002e-05, 'epoch': 6.65} 51%|█████ | 5078/10000 [19:58:04<19:03:56, 13.94s/it] 51%|█████ | 5079/10000 [19:58:18<19:04:34, 13.96s/it] {'loss': 0.0209, 'learning_rate': 2.464e-05, 'epoch': 6.65} 51%|█████ | 5079/10000 [19:58:18<19:04:34, 13.96s/it] 51%|█████ | 5080/10000 
[19:58:32<19:00:14, 13.91s/it] {'loss': 0.0212, 'learning_rate': 2.4635000000000004e-05, 'epoch': 6.65} 51%|█████ | 5080/10000 [19:58:32<19:00:14, 13.91s/it] 51%|█████ | 5081/10000 [19:58:46<19:02:26, 13.93s/it] {'loss': 0.0195, 'learning_rate': 2.463e-05, 'epoch': 6.65} 51%|█████ | 5081/10000 [19:58:46<19:02:26, 13.93s/it] 51%|█████ | 5082/10000 [19:59:00<19:01:02, 13.92s/it] {'loss': 0.0269, 'learning_rate': 2.4625000000000002e-05, 'epoch': 6.65} 51%|█████ | 5082/10000 [19:59:00<19:01:02, 13.92s/it] 51%|█████ | 5083/10000 [19:59:14<19:02:15, 13.94s/it] {'loss': 0.0201, 'learning_rate': 2.462e-05, 'epoch': 6.65} 51%|█████ | 5083/10000 [19:59:14<19:02:15, 13.94s/it] 51%|█████ | 5084/10000 [19:59:28<19:01:20, 13.93s/it] {'loss': 0.0203, 'learning_rate': 2.4615e-05, 'epoch': 6.65} 51%|█████ | 5084/10000 [19:59:28<19:01:20, 13.93s/it] 51%|█████ | 5085/10000 [19:59:42<19:00:01, 13.92s/it] {'loss': 0.0211, 'learning_rate': 2.4610000000000003e-05, 'epoch': 6.66} 51%|█████ | 5085/10000 [19:59:42<19:00:01, 13.92s/it] 51%|█████ | 5086/10000 [19:59:56<19:01:49, 13.94s/it] {'loss': 0.0225, 'learning_rate': 2.4605e-05, 'epoch': 6.66} 51%|█████ | 5086/10000 [19:59:56<19:01:49, 13.94s/it] 51%|█████ | 5087/10000 [20:00:09<19:00:23, 13.93s/it] {'loss': 0.0203, 'learning_rate': 2.46e-05, 'epoch': 6.66} 51%|█████ | 5087/10000 [20:00:09<19:00:23, 13.93s/it] 51%|█████ | 5088/10000 [20:00:23<19:00:50, 13.94s/it] {'loss': 0.0224, 'learning_rate': 2.4595e-05, 'epoch': 6.66} 51%|█████ | 5088/10000 [20:00:23<19:00:50, 13.94s/it] 51%|█████ | 5089/10000 [20:00:37<19:00:36, 13.94s/it] {'loss': 0.0231, 'learning_rate': 2.4590000000000003e-05, 'epoch': 6.66} 51%|█████ | 5089/10000 [20:00:37<19:00:36, 13.94s/it] 51%|█████ | 5090/10000 [20:00:51<18:59:41, 13.93s/it] {'loss': 0.0212, 'learning_rate': 2.4585000000000003e-05, 'epoch': 6.66} 51%|█████ | 5090/10000 [20:00:51<18:59:41, 13.93s/it] 51%|█████ | 5091/10000 [20:01:05<19:02:27, 13.96s/it] {'loss': 0.0204, 'learning_rate': 
2.4580000000000002e-05, 'epoch': 6.66} 51%|█████ | 5091/10000 [20:01:05<19:02:27, 13.96s/it] 51%|█████ | 5092/10000 [20:01:19<19:06:33, 14.02s/it] {'loss': 0.0236, 'learning_rate': 2.4575e-05, 'epoch': 6.66} 51%|█████ | 5092/10000 [20:01:19<19:06:33, 14.02s/it] 51%|█████ | 5093/10000 [20:01:33<19:01:53, 13.96s/it] {'loss': 0.0211, 'learning_rate': 2.457e-05, 'epoch': 6.67} 51%|█████ | 5093/10000 [20:01:33<19:01:53, 13.96s/it] 51%|█████ | 5094/10000 [20:01:47<18:58:33, 13.92s/it] {'loss': 0.0243, 'learning_rate': 2.4565000000000003e-05, 'epoch': 6.67} 51%|█████ | 5094/10000 [20:01:47<18:58:33, 13.92s/it] 51%|█████ | 5095/10000 [20:02:01<18:59:36, 13.94s/it] {'loss': 0.0244, 'learning_rate': 2.4560000000000002e-05, 'epoch': 6.67} 51%|█████ | 5095/10000 [20:02:01<18:59:36, 13.94s/it] 51%|█████ | 5096/10000 [20:02:15<19:00:49, 13.96s/it] {'loss': 0.0172, 'learning_rate': 2.4555e-05, 'epoch': 6.67} 51%|█████ | 5096/10000 [20:02:15<19:00:49, 13.96s/it] 51%|█████ | 5097/10000 [20:02:29<19:01:27, 13.97s/it] {'loss': 0.0196, 'learning_rate': 2.455e-05, 'epoch': 6.67} 51%|█████ | 5097/10000 [20:02:29<19:01:27, 13.97s/it] 51%|█████ | 5098/10000 [20:02:43<19:01:46, 13.98s/it] {'loss': 0.0233, 'learning_rate': 2.4545000000000003e-05, 'epoch': 6.67} 51%|█████ | 5098/10000 [20:02:43<19:01:46, 13.98s/it] 51%|█████ | 5099/10000 [20:02:57<18:59:49, 13.95s/it] {'loss': 0.0177, 'learning_rate': 2.4540000000000002e-05, 'epoch': 6.67} 51%|█████ | 5099/10000 [20:02:57<18:59:49, 13.95s/it] 51%|█████ | 5100/10000 [20:03:11<18:57:33, 13.93s/it] {'loss': 0.0198, 'learning_rate': 2.4535e-05, 'epoch': 6.68} 51%|█████ | 5100/10000 [20:03:11<18:57:33, 13.93s/it] 51%|█████ | 5101/10000 [20:03:25<18:59:32, 13.96s/it] {'loss': 0.021, 'learning_rate': 2.453e-05, 'epoch': 6.68} 51%|█████ | 5101/10000 [20:03:25<18:59:32, 13.96s/it] 51%|█████ | 5102/10000 [20:03:39<19:01:49, 13.99s/it] {'loss': 0.0205, 'learning_rate': 2.4525e-05, 'epoch': 6.68} 51%|█████ | 5102/10000 [20:03:39<19:01:49, 13.99s/it] 
51%|█████ | 5103/10000 [20:03:53<19:01:49, 13.99s/it] {'loss': 0.0171, 'learning_rate': 2.4520000000000002e-05, 'epoch': 6.68} 51%|█████ | 5103/10000 [20:03:53<19:01:49, 13.99s/it] 51%|█████ | 5104/10000 [20:04:07<19:00:51, 13.98s/it] {'loss': 0.024, 'learning_rate': 2.4515e-05, 'epoch': 6.68} 51%|█████ | 5104/10000 [20:04:07<19:00:51, 13.98s/it] 51%|█████ | 5105/10000 [20:04:21<18:58:32, 13.96s/it] {'loss': 0.0236, 'learning_rate': 2.451e-05, 'epoch': 6.68} 51%|█████ | 5105/10000 [20:04:21<18:58:32, 13.96s/it] 51%|█████ | 5106/10000 [20:04:35<19:00:34, 13.98s/it] {'loss': 0.022, 'learning_rate': 2.4505e-05, 'epoch': 6.68} 51%|█████ | 5106/10000 [20:04:35<19:00:34, 13.98s/it] 51%|█████ | 5107/10000 [20:04:49<18:58:40, 13.96s/it] {'loss': 0.0217, 'learning_rate': 2.45e-05, 'epoch': 6.68} 51%|█████ | 5107/10000 [20:04:49<18:58:40, 13.96s/it] 51%|█████ | 5108/10000 [20:05:02<18:53:29, 13.90s/it] {'loss': 0.0233, 'learning_rate': 2.4495000000000002e-05, 'epoch': 6.69} 51%|█████ | 5108/10000 [20:05:03<18:53:29, 13.90s/it] 51%|█████ | 5109/10000 [20:05:16<18:54:36, 13.92s/it] {'loss': 0.0225, 'learning_rate': 2.449e-05, 'epoch': 6.69} 51%|█████ | 5109/10000 [20:05:16<18:54:36, 13.92s/it] 51%|█████ | 5110/10000 [20:05:30<18:56:20, 13.94s/it] {'loss': 0.0213, 'learning_rate': 2.4485000000000004e-05, 'epoch': 6.69} 51%|█████ | 5110/10000 [20:05:30<18:56:20, 13.94s/it] 51%|█████ | 5111/10000 [20:05:44<18:53:00, 13.90s/it] {'loss': 0.0206, 'learning_rate': 2.448e-05, 'epoch': 6.69} 51%|█████ | 5111/10000 [20:05:44<18:53:00, 13.90s/it] 51%|█████ | 5112/10000 [20:05:58<18:52:00, 13.90s/it] {'loss': 0.023, 'learning_rate': 2.4475000000000002e-05, 'epoch': 6.69} 51%|█████ | 5112/10000 [20:05:58<18:52:00, 13.90s/it] 51%|█████ | 5113/10000 [20:06:12<18:55:51, 13.95s/it] {'loss': 0.0225, 'learning_rate': 2.447e-05, 'epoch': 6.69} 51%|█████ | 5113/10000 [20:06:12<18:55:51, 13.95s/it] 51%|█████ | 5114/10000 [20:06:26<18:55:55, 13.95s/it] {'loss': 0.0231, 'learning_rate': 2.4465e-05, 
'epoch': 6.69} 51%|█████ | 5114/10000 [20:06:26<18:55:55, 13.95s/it] 51%|█████ | 5115/10000 [20:06:40<18:57:20, 13.97s/it] {'loss': 0.0239, 'learning_rate': 2.4460000000000003e-05, 'epoch': 6.7} 51%|█████ | 5115/10000 [20:06:40<18:57:20, 13.97s/it] 51%|█████ | 5116/10000 [20:06:54<18:57:18, 13.97s/it] {'loss': 0.0219, 'learning_rate': 2.4455e-05, 'epoch': 6.7} 51%|█████ | 5116/10000 [20:06:54<18:57:18, 13.97s/it] 51%|█████ | 5117/10000 [20:07:08<18:55:40, 13.95s/it] {'loss': 0.0219, 'learning_rate': 2.445e-05, 'epoch': 6.7} 51%|█████ | 5117/10000 [20:07:08<18:55:40, 13.95s/it] 51%|█████ | 5118/10000 [20:07:22<18:53:45, 13.93s/it] {'loss': 0.0222, 'learning_rate': 2.4445e-05, 'epoch': 6.7} 51%|█████ | 5118/10000 [20:07:22<18:53:45, 13.93s/it] 51%|█████ | 5119/10000 [20:07:36<18:54:56, 13.95s/it] {'loss': 0.0251, 'learning_rate': 2.4440000000000003e-05, 'epoch': 6.7} 51%|█████ | 5119/10000 [20:07:36<18:54:56, 13.95s/it] 51%|█████ | 5120/10000 [20:07:50<18:53:03, 13.93s/it] {'loss': 0.0216, 'learning_rate': 2.4435000000000002e-05, 'epoch': 6.7} 51%|█████ | 5120/10000 [20:07:50<18:53:03, 13.93s/it] 51%|█████ | 5121/10000 [20:08:04<18:55:06, 13.96s/it] {'loss': 0.0194, 'learning_rate': 2.443e-05, 'epoch': 6.7} 51%|█████ | 5121/10000 [20:08:04<18:55:06, 13.96s/it] 51%|█████ | 5122/10000 [20:08:18<18:51:53, 13.92s/it] {'loss': 0.0219, 'learning_rate': 2.4425e-05, 'epoch': 6.7} 51%|█████ | 5122/10000 [20:08:18<18:51:53, 13.92s/it] 51%|█████ | 5123/10000 [20:08:32<18:53:58, 13.95s/it] {'loss': 0.0253, 'learning_rate': 2.442e-05, 'epoch': 6.71} 51%|█████ | 5123/10000 [20:08:32<18:53:58, 13.95s/it] 51%|█████ | 5124/10000 [20:08:46<18:51:41, 13.93s/it] {'loss': 0.0202, 'learning_rate': 2.4415000000000003e-05, 'epoch': 6.71} 51%|█████ | 5124/10000 [20:08:46<18:51:41, 13.93s/it] 51%|█████▏ | 5125/10000 [20:09:00<18:52:34, 13.94s/it] {'loss': 0.0182, 'learning_rate': 2.4410000000000002e-05, 'epoch': 6.71} 51%|█████▏ | 5125/10000 [20:09:00<18:52:34, 13.94s/it] 51%|█████▏ | 
5126/10000 [20:09:13<18:52:07, 13.94s/it] {'loss': 0.0195, 'learning_rate': 2.4405e-05, 'epoch': 6.71} 51%|█████▏ | 5126/10000 [20:09:14<18:52:07, 13.94s/it] 51%|█████▏ | 5127/10000 [20:09:27<18:50:26, 13.92s/it] {'loss': 0.0258, 'learning_rate': 2.44e-05, 'epoch': 6.71} 51%|█████▏ | 5127/10000 [20:09:27<18:50:26, 13.92s/it] 51%|█████▏ | 5128/10000 [20:09:41<18:53:19, 13.96s/it] {'loss': 0.0231, 'learning_rate': 2.4395000000000003e-05, 'epoch': 6.71} 51%|█████▏ | 5128/10000 [20:09:41<18:53:19, 13.96s/it] 51%|█████▏ | 5129/10000 [20:09:55<18:53:28, 13.96s/it] {'loss': 0.0178, 'learning_rate': 2.4390000000000002e-05, 'epoch': 6.71} 51%|█████▏ | 5129/10000 [20:09:55<18:53:28, 13.96s/it] 51%|█████▏ | 5130/10000 [20:10:09<18:50:46, 13.93s/it] {'loss': 0.0238, 'learning_rate': 2.4385e-05, 'epoch': 6.71} 51%|█████▏ | 5130/10000 [20:10:09<18:50:46, 13.93s/it] 51%|█████▏ | 5131/10000 [20:10:23<18:50:40, 13.93s/it] {'loss': 0.0244, 'learning_rate': 2.438e-05, 'epoch': 6.72} 51%|█████▏ | 5131/10000 [20:10:23<18:50:40, 13.93s/it] 51%|█████▏ | 5132/10000 [20:10:37<18:50:00, 13.93s/it] {'loss': 0.0232, 'learning_rate': 2.4375e-05, 'epoch': 6.72} 51%|█████▏ | 5132/10000 [20:10:37<18:50:00, 13.93s/it] 51%|█████▏ | 5133/10000 [20:10:51<18:52:40, 13.96s/it] {'loss': 0.0222, 'learning_rate': 2.4370000000000002e-05, 'epoch': 6.72} 51%|█████▏ | 5133/10000 [20:10:51<18:52:40, 13.96s/it] 51%|█████▏ | 5134/10000 [20:11:05<18:48:47, 13.92s/it] {'loss': 0.0228, 'learning_rate': 2.4365e-05, 'epoch': 6.72} 51%|█████▏ | 5134/10000 [20:11:05<18:48:47, 13.92s/it] 51%|█████▏ | 5135/10000 [20:11:19<18:51:05, 13.95s/it] {'loss': 0.0231, 'learning_rate': 2.4360000000000004e-05, 'epoch': 6.72} 51%|█████▏ | 5135/10000 [20:11:19<18:51:05, 13.95s/it] 51%|█████▏ | 5136/10000 [20:11:33<18:52:10, 13.97s/it] {'loss': 0.0169, 'learning_rate': 2.4355e-05, 'epoch': 6.72} 51%|█████▏ | 5136/10000 [20:11:33<18:52:10, 13.97s/it] 51%|█████▏ | 5137/10000 [20:11:47<18:49:59, 13.94s/it] {'loss': 0.0228, 
'learning_rate': 2.435e-05, 'epoch': 6.72} 51%|█████▏ | 5137/10000 [20:11:47<18:49:59, 13.94s/it] 51%|█████▏ | 5138/10000 [20:12:01<18:50:25, 13.95s/it] {'loss': 0.0177, 'learning_rate': 2.4345e-05, 'epoch': 6.73} 51%|█████▏ | 5138/10000 [20:12:01<18:50:25, 13.95s/it] 51%|█████▏ | 5139/10000 [20:12:15<18:48:41, 13.93s/it] {'loss': 0.022, 'learning_rate': 2.434e-05, 'epoch': 6.73} 51%|█████▏ | 5139/10000 [20:12:15<18:48:41, 13.93s/it] 51%|█████▏ | 5140/10000 [20:12:29<18:51:09, 13.97s/it] {'loss': 0.0209, 'learning_rate': 2.4335000000000003e-05, 'epoch': 6.73} 51%|█████▏ | 5140/10000 [20:12:29<18:51:09, 13.97s/it] 51%|█████▏ | 5141/10000 [20:12:43<18:50:28, 13.96s/it] {'loss': 0.0262, 'learning_rate': 2.433e-05, 'epoch': 6.73} 51%|█████▏ | 5141/10000 [20:12:43<18:50:28, 13.96s/it] 51%|█████▏ | 5142/10000 [20:12:57<18:51:19, 13.97s/it] {'loss': 0.0198, 'learning_rate': 2.4325000000000002e-05, 'epoch': 6.73} 51%|█████▏ | 5142/10000 [20:12:57<18:51:19, 13.97s/it] 51%|█████▏ | 5143/10000 [20:13:11<18:55:02, 14.02s/it] {'loss': 0.0233, 'learning_rate': 2.432e-05, 'epoch': 6.73} 51%|█████▏ | 5143/10000 [20:13:11<18:55:02, 14.02s/it] 51%|█████▏ | 5144/10000 [20:13:25<18:54:54, 14.02s/it] {'loss': 0.0209, 'learning_rate': 2.4315e-05, 'epoch': 6.73} 51%|█████▏ | 5144/10000 [20:13:25<18:54:54, 14.02s/it] 51%|█████▏ | 5145/10000 [20:13:39<18:52:16, 13.99s/it] {'loss': 0.018, 'learning_rate': 2.4310000000000003e-05, 'epoch': 6.73} 51%|█████▏ | 5145/10000 [20:13:39<18:52:16, 13.99s/it] 51%|█████▏ | 5146/10000 [20:13:53<18:54:24, 14.02s/it] {'loss': 0.0233, 'learning_rate': 2.4305e-05, 'epoch': 6.74} 51%|█████▏ | 5146/10000 [20:13:53<18:54:24, 14.02s/it] 51%|█████▏ | 5147/10000 [20:14:07<18:57:01, 14.06s/it] {'loss': 0.0281, 'learning_rate': 2.43e-05, 'epoch': 6.74} 51%|█████▏ | 5147/10000 [20:14:07<18:57:01, 14.06s/it] 51%|█████▏ | 5148/10000 [20:14:21<18:54:20, 14.03s/it] {'loss': 0.0246, 'learning_rate': 2.4295e-05, 'epoch': 6.74} 51%|█████▏ | 5148/10000 [20:14:21<18:54:20, 
14.03s/it] 51%|█████▏ | 5149/10000 [20:14:35<18:55:45, 14.05s/it] {'loss': 0.0253, 'learning_rate': 2.4290000000000003e-05, 'epoch': 6.74} 51%|█████▏ | 5149/10000 [20:14:35<18:55:45, 14.05s/it] 52%|█████▏ | 5150/10000 [20:14:49<18:54:22, 14.03s/it] {'loss': 0.0198, 'learning_rate': 2.4285000000000002e-05, 'epoch': 6.74} 52%|█████▏ | 5150/10000 [20:14:49<18:54:22, 14.03s/it] 52%|█████▏ | 5151/10000 [20:15:03<18:54:18, 14.04s/it] {'loss': 0.0244, 'learning_rate': 2.428e-05, 'epoch': 6.74} 52%|█████▏ | 5151/10000 [20:15:03<18:54:18, 14.04s/it] 52%|█████▏ | 5152/10000 [20:15:17<18:51:28, 14.00s/it] {'loss': 0.0228, 'learning_rate': 2.4275e-05, 'epoch': 6.74} 52%|█████▏ | 5152/10000 [20:15:17<18:51:28, 14.00s/it] 52%|█████▏ | 5153/10000 [20:15:31<18:47:47, 13.96s/it] {'loss': 0.0202, 'learning_rate': 2.427e-05, 'epoch': 6.74} 52%|█████▏ | 5153/10000 [20:15:31<18:47:47, 13.96s/it] 52%|█████▏ | 5154/10000 [20:15:45<18:46:05, 13.94s/it] {'loss': 0.0196, 'learning_rate': 2.4265000000000002e-05, 'epoch': 6.75} 52%|█████▏ | 5154/10000 [20:15:45<18:46:05, 13.94s/it] 52%|█████▏ | 5155/10000 [20:15:59<18:44:23, 13.92s/it] {'loss': 0.021, 'learning_rate': 2.426e-05, 'epoch': 6.75} 52%|█████▏ | 5155/10000 [20:15:59<18:44:23, 13.92s/it] 52%|█████▏ | 5156/10000 [20:16:13<18:41:21, 13.89s/it] {'loss': 0.0254, 'learning_rate': 2.4255e-05, 'epoch': 6.75} 52%|█████▏ | 5156/10000 [20:16:13<18:41:21, 13.89s/it] 52%|█████▏ | 5157/10000 [20:16:26<18:43:23, 13.92s/it] {'loss': 0.0251, 'learning_rate': 2.425e-05, 'epoch': 6.75} 52%|█████▏ | 5157/10000 [20:16:27<18:43:23, 13.92s/it] 52%|█████▏ | 5158/10000 [20:16:40<18:43:42, 13.92s/it] {'loss': 0.0224, 'learning_rate': 2.4245000000000002e-05, 'epoch': 6.75} 52%|█████▏ | 5158/10000 [20:16:40<18:43:42, 13.92s/it] 52%|█████▏ | 5159/10000 [20:16:54<18:44:14, 13.93s/it] {'loss': 0.0247, 'learning_rate': 2.4240000000000002e-05, 'epoch': 6.75} 52%|█████▏ | 5159/10000 [20:16:54<18:44:14, 13.93s/it] 52%|█████▏ | 5160/10000 [20:17:08<18:45:15, 
13.95s/it] {'loss': 0.0216, 'learning_rate': 2.4235e-05, 'epoch': 6.75} 52%|█████▏ | 5160/10000 [20:17:08<18:45:15, 13.95s/it] 52%|█████▏ | 5161/10000 [20:17:22<18:46:23, 13.97s/it] {'loss': 0.0202, 'learning_rate': 2.423e-05, 'epoch': 6.76} 52%|█████▏ | 5161/10000 [20:17:22<18:46:23, 13.97s/it] 52%|█████▏ | 5162/10000 [20:17:36<18:48:37, 14.00s/it] {'loss': 0.0291, 'learning_rate': 2.4225e-05, 'epoch': 6.76} 52%|█████▏ | 5162/10000 [20:17:36<18:48:37, 14.00s/it] 52%|█████▏ | 5163/10000 [20:17:50<18:45:04, 13.96s/it] {'loss': 0.0246, 'learning_rate': 2.4220000000000002e-05, 'epoch': 6.76} 52%|█████▏ | 5163/10000 [20:17:50<18:45:04, 13.96s/it] 52%|█████▏ | 5164/10000 [20:18:04<18:44:38, 13.95s/it] {'loss': 0.0213, 'learning_rate': 2.4215e-05, 'epoch': 6.76} 52%|█████▏ | 5164/10000 [20:18:04<18:44:38, 13.95s/it] 52%|█████▏ | 5165/10000 [20:18:18<18:44:25, 13.95s/it] {'loss': 0.0216, 'learning_rate': 2.4210000000000004e-05, 'epoch': 6.76} 52%|█████▏ | 5165/10000 [20:18:18<18:44:25, 13.95s/it] 52%|█████▏ | 5166/10000 [20:18:32<18:47:01, 13.99s/it] {'loss': 0.019, 'learning_rate': 2.4205e-05, 'epoch': 6.76} 52%|█████▏ | 5166/10000 [20:18:32<18:47:01, 13.99s/it] 52%|█████▏ | 5167/10000 [20:18:46<18:45:48, 13.98s/it] {'loss': 0.023, 'learning_rate': 2.4200000000000002e-05, 'epoch': 6.76} 52%|█████▏ | 5167/10000 [20:18:46<18:45:48, 13.98s/it] 52%|█████▏ | 5168/10000 [20:19:00<18:47:00, 13.99s/it] {'loss': 0.0237, 'learning_rate': 2.4195e-05, 'epoch': 6.76} 52%|█████▏ | 5168/10000 [20:19:00<18:47:00, 13.99s/it] 52%|█████▏ | 5169/10000 [20:19:14<18:44:57, 13.97s/it] {'loss': 0.0248, 'learning_rate': 2.419e-05, 'epoch': 6.77} 52%|█████▏ | 5169/10000 [20:19:14<18:44:57, 13.97s/it] 52%|█████▏ | 5170/10000 [20:19:28<18:45:07, 13.98s/it] {'loss': 0.023, 'learning_rate': 2.4185000000000003e-05, 'epoch': 6.77} 52%|█████▏ | 5170/10000 [20:19:28<18:45:07, 13.98s/it] 52%|█████▏ | 5171/10000 [20:19:42<18:41:47, 13.94s/it] {'loss': 0.021, 'learning_rate': 2.418e-05, 'epoch': 6.77} 
52%|█████▏ | 5171/10000 [20:19:42<18:41:47, 13.94s/it] 52%|█████▏ | 5172/10000 [20:19:56<18:38:59, 13.91s/it] {'loss': 0.0166, 'learning_rate': 2.4175e-05, 'epoch': 6.77} 52%|█████▏ | 5172/10000 [20:19:56<18:38:59, 13.91s/it] 52%|█████▏ | 5173/10000 [20:20:10<18:43:08, 13.96s/it] {'loss': 0.0206, 'learning_rate': 2.417e-05, 'epoch': 6.77} 52%|█████▏ | 5173/10000 [20:20:10<18:43:08, 13.96s/it] 52%|█████▏ | 5174/10000 [20:20:24<18:45:19, 13.99s/it] {'loss': 0.0179, 'learning_rate': 2.4165e-05, 'epoch': 6.77} 52%|█████▏ | 5174/10000 [20:20:24<18:45:19, 13.99s/it] 52%|█████▏ | 5175/10000 [20:20:38<18:43:22, 13.97s/it] {'loss': 0.0184, 'learning_rate': 2.4160000000000002e-05, 'epoch': 6.77} 52%|█████▏ | 5175/10000 [20:20:38<18:43:22, 13.97s/it] 52%|█████▏ | 5176/10000 [20:20:52<18:42:14, 13.96s/it] {'loss': 0.0192, 'learning_rate': 2.4154999999999998e-05, 'epoch': 6.77} 52%|█████▏ | 5176/10000 [20:20:52<18:42:14, 13.96s/it] 52%|█████▏ | 5177/10000 [20:21:06<18:40:31, 13.94s/it] {'loss': 0.1087, 'learning_rate': 2.415e-05, 'epoch': 6.78} 52%|█████▏ | 5177/10000 [20:21:06<18:40:31, 13.94s/it] 52%|█████▏ | 5178/10000 [20:21:20<18:42:24, 13.97s/it] {'loss': 0.0215, 'learning_rate': 2.4145e-05, 'epoch': 6.78} 52%|█████▏ | 5178/10000 [20:21:20<18:42:24, 13.97s/it] 52%|█████▏ | 5179/10000 [20:21:34<18:41:55, 13.96s/it] {'loss': 0.0249, 'learning_rate': 2.4140000000000003e-05, 'epoch': 6.78} 52%|█████▏ | 5179/10000 [20:21:34<18:41:55, 13.96s/it] 52%|█████▏ | 5180/10000 [20:21:48<18:42:23, 13.97s/it] {'loss': 0.0188, 'learning_rate': 2.4135000000000002e-05, 'epoch': 6.78} 52%|█████▏ | 5180/10000 [20:21:48<18:42:23, 13.97s/it] 52%|█████▏ | 5181/10000 [20:22:02<18:40:19, 13.95s/it] {'loss': 0.023, 'learning_rate': 2.413e-05, 'epoch': 6.78} 52%|█████▏ | 5181/10000 [20:22:02<18:40:19, 13.95s/it][2024-11-04 16:40:22,980] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 32768, reducing to 16384 52%|█████▏ | 5182/10000 [20:22:15<18:20:02, 13.70s/it] {'loss': 0.0987, 'learning_rate': 2.413e-05, 'epoch': 6.78} 52%|█████▏ | 5182/10000 [20:22:15<18:20:02, 13.70s/it] 52%|█████▏ | 5183/10000 [20:22:29<18:25:44, 13.77s/it] {'loss': 0.0207, 'learning_rate': 2.4125e-05, 'epoch': 6.78} 52%|█████▏ | 5183/10000 [20:22:29<18:25:44, 13.77s/it] 52%|█████▏ | 5184/10000 [20:22:43<18:28:33, 13.81s/it] {'loss': 0.0231, 'learning_rate': 2.412e-05, 'epoch': 6.79} 52%|█████▏ | 5184/10000 [20:22:43<18:28:33, 13.81s/it] 52%|█████▏ | 5185/10000 [20:22:57<18:31:47, 13.85s/it] {'loss': 0.026, 'learning_rate': 2.4115000000000002e-05, 'epoch': 6.79} 52%|█████▏ | 5185/10000 [20:22:57<18:31:47, 13.85s/it] 52%|█████▏ | 5186/10000 [20:23:10<18:33:26, 13.88s/it] {'loss': 0.0209, 'learning_rate': 2.411e-05, 'epoch': 6.79} 52%|█████▏ | 5186/10000 [20:23:10<18:33:26, 13.88s/it] 52%|█████▏ | 5187/10000 [20:23:24<18:35:57, 13.91s/it] {'loss': 0.0235, 'learning_rate': 2.4105e-05, 'epoch': 6.79} 52%|█████▏ | 5187/10000 [20:23:25<18:35:57, 13.91s/it] 52%|█████▏ | 5188/10000 [20:23:38<18:37:39, 13.94s/it] {'loss': 0.0205, 'learning_rate': 2.41e-05, 'epoch': 6.79} 52%|█████▏ | 5188/10000 [20:23:38<18:37:39, 13.94s/it] 52%|█████▏ | 5189/10000 [20:23:52<18:36:45, 13.93s/it] {'loss': 0.0233, 'learning_rate': 2.4095000000000002e-05, 'epoch': 6.79} 52%|█████▏ | 5189/10000 [20:23:52<18:36:45, 13.93s/it] 52%|█████▏ | 5190/10000 [20:24:06<18:36:20, 13.93s/it] {'loss': 0.0219, 'learning_rate': 2.409e-05, 'epoch': 6.79} 52%|█████▏ | 5190/10000 [20:24:06<18:36:20, 13.93s/it] 52%|█████▏ | 5191/10000 [20:24:20<18:35:48, 13.92s/it] {'loss': 0.0249, 'learning_rate': 2.4085e-05, 'epoch': 6.79} 52%|█████▏ | 5191/10000 [20:24:20<18:35:48, 13.92s/it] 52%|█████▏ | 5192/10000 [20:24:34<18:38:14, 13.95s/it] {'loss': 0.0224, 'learning_rate': 2.408e-05, 'epoch': 6.8} 52%|█████▏ | 5192/10000 [20:24:34<18:38:14, 13.95s/it] 52%|█████▏ | 5193/10000 [20:24:48<18:41:17, 14.00s/it] 
{'loss': 0.0227, 'learning_rate': 2.4075e-05, 'epoch': 6.8} 52%|█████▏ | 5193/10000 [20:24:48<18:41:17, 14.00s/it] 52%|█████▏ | 5194/10000 [20:25:02<18:40:31, 13.99s/it] {'loss': 0.022, 'learning_rate': 2.407e-05, 'epoch': 6.8} 52%|█████▏ | 5194/10000 [20:25:02<18:40:31, 13.99s/it] 52%|█████▏ | 5195/10000 [20:25:16<18:37:31, 13.95s/it] {'loss': 0.0201, 'learning_rate': 2.4065e-05, 'epoch': 6.8} 52%|█████▏ | 5195/10000 [20:25:16<18:37:31, 13.95s/it] 52%|█████▏ | 5196/10000 [20:25:30<18:36:46, 13.95s/it] {'loss': 0.0194, 'learning_rate': 2.4060000000000003e-05, 'epoch': 6.8} 52%|█████▏ | 5196/10000 [20:25:30<18:36:46, 13.95s/it] 52%|█████▏ | 5197/10000 [20:25:44<18:37:57, 13.97s/it] {'loss': 0.0195, 'learning_rate': 2.4055000000000003e-05, 'epoch': 6.8} 52%|█████▏ | 5197/10000 [20:25:44<18:37:57, 13.97s/it] 52%|█████▏ | 5198/10000 [20:25:58<18:37:35, 13.96s/it] {'loss': 0.0221, 'learning_rate': 2.4050000000000002e-05, 'epoch': 6.8} 52%|█████▏ | 5198/10000 [20:25:58<18:37:35, 13.96s/it] 52%|█████▏ | 5199/10000 [20:26:12<18:34:58, 13.93s/it] {'loss': 0.0208, 'learning_rate': 2.4045e-05, 'epoch': 6.8} 52%|█████▏ | 5199/10000 [20:26:12<18:34:58, 13.93s/it] 52%|█████▏ | 5200/10000 [20:26:26<18:33:36, 13.92s/it] {'loss': 0.0208, 'learning_rate': 2.404e-05, 'epoch': 6.81} 52%|█████▏ | 5200/10000 [20:26:26<18:33:36, 13.92s/it] 52%|█████▏ | 5201/10000 [20:26:40<18:38:10, 13.98s/it] {'loss': 0.0225, 'learning_rate': 2.4035000000000003e-05, 'epoch': 6.81} 52%|█████▏ | 5201/10000 [20:26:40<18:38:10, 13.98s/it] 52%|█████▏ | 5202/10000 [20:26:54<18:35:26, 13.95s/it] {'loss': 0.0223, 'learning_rate': 2.4030000000000002e-05, 'epoch': 6.81} 52%|█████▏ | 5202/10000 [20:26:54<18:35:26, 13.95s/it] 52%|█████▏ | 5203/10000 [20:27:08<18:32:41, 13.92s/it] {'loss': 0.0218, 'learning_rate': 2.4025e-05, 'epoch': 6.81} 52%|█████▏ | 5203/10000 [20:27:08<18:32:41, 13.92s/it] 52%|█████▏ | 5204/10000 [20:27:22<18:31:18, 13.90s/it] {'loss': 0.0293, 'learning_rate': 2.402e-05, 'epoch': 6.81} 
52%|█████▏ | 5204/10000 [20:27:22<18:31:18, 13.90s/it] 52%|█████▏ | 5205/10000 [20:27:35<18:29:38, 13.89s/it] {'loss': 0.0208, 'learning_rate': 2.4015000000000003e-05, 'epoch': 6.81} 52%|█████▏ | 5205/10000 [20:27:35<18:29:38, 13.89s/it] 52%|█████▏ | 5206/10000 [20:27:49<18:31:23, 13.91s/it] {'loss': 0.0235, 'learning_rate': 2.4010000000000002e-05, 'epoch': 6.81} 52%|█████▏ | 5206/10000 [20:27:49<18:31:23, 13.91s/it] 52%|█████▏ | 5207/10000 [20:28:03<18:36:02, 13.97s/it] {'loss': 0.0244, 'learning_rate': 2.4005e-05, 'epoch': 6.82} 52%|█████▏ | 5207/10000 [20:28:03<18:36:02, 13.97s/it] 52%|█████▏ | 5208/10000 [20:28:17<18:36:22, 13.98s/it] {'loss': 0.0262, 'learning_rate': 2.4e-05, 'epoch': 6.82} 52%|█████▏ | 5208/10000 [20:28:17<18:36:22, 13.98s/it] 52%|█████▏ | 5209/10000 [20:28:31<18:34:33, 13.96s/it] {'loss': 0.0222, 'learning_rate': 2.3995e-05, 'epoch': 6.82} 52%|█████▏ | 5209/10000 [20:28:31<18:34:33, 13.96s/it] 52%|█████▏ | 5210/10000 [20:28:45<18:29:57, 13.90s/it] {'loss': 0.0224, 'learning_rate': 2.3990000000000002e-05, 'epoch': 6.82} 52%|█████▏ | 5210/10000 [20:28:45<18:29:57, 13.90s/it] 52%|█████▏ | 5211/10000 [20:28:59<18:31:16, 13.92s/it] {'loss': 0.0226, 'learning_rate': 2.3985e-05, 'epoch': 6.82} 52%|█████▏ | 5211/10000 [20:28:59<18:31:16, 13.92s/it] 52%|█████▏ | 5212/10000 [20:29:13<18:30:25, 13.92s/it] {'loss': 0.0239, 'learning_rate': 2.398e-05, 'epoch': 6.82} 52%|█████▏ | 5212/10000 [20:29:13<18:30:25, 13.92s/it] 52%|█████▏ | 5213/10000 [20:29:27<18:31:52, 13.94s/it] {'loss': 0.0204, 'learning_rate': 2.3975e-05, 'epoch': 6.82} 52%|█████▏ | 5213/10000 [20:29:27<18:31:52, 13.94s/it] 52%|█████▏ | 5214/10000 [20:29:41<18:33:33, 13.96s/it] {'loss': 0.0269, 'learning_rate': 2.397e-05, 'epoch': 6.82} 52%|█████▏ | 5214/10000 [20:29:41<18:33:33, 13.96s/it] 52%|█████▏ | 5215/10000 [20:29:55<18:34:24, 13.97s/it] {'loss': 0.0208, 'learning_rate': 2.3965000000000002e-05, 'epoch': 6.83} 52%|█████▏ | 5215/10000 [20:29:55<18:34:24, 13.97s/it] 52%|█████▏ | 
5216/10000 [20:30:09<18:32:45, 13.96s/it] {'loss': 0.0244, 'learning_rate': 2.396e-05, 'epoch': 6.83} 52%|█████▏ | 5216/10000 [20:30:09<18:32:45, 13.96s/it] 52%|█████▏ | 5217/10000 [20:30:23<18:33:25, 13.97s/it] {'loss': 0.0217, 'learning_rate': 2.3955000000000004e-05, 'epoch': 6.83} 52%|█████▏ | 5217/10000 [20:30:23<18:33:25, 13.97s/it] 52%|█████▏ | 5218/10000 [20:30:37<18:32:45, 13.96s/it] {'loss': 0.0204, 'learning_rate': 2.395e-05, 'epoch': 6.83} 52%|█████▏ | 5218/10000 [20:30:37<18:32:45, 13.96s/it] 52%|█████▏ | 5219/10000 [20:30:51<18:31:58, 13.95s/it] {'loss': 0.0226, 'learning_rate': 2.3945000000000002e-05, 'epoch': 6.83} 52%|█████▏ | 5219/10000 [20:30:51<18:31:58, 13.95s/it] 52%|█████▏ | 5220/10000 [20:31:05<18:30:20, 13.94s/it] {'loss': 0.0208, 'learning_rate': 2.394e-05, 'epoch': 6.83} 52%|█████▏ | 5220/10000 [20:31:05<18:30:20, 13.94s/it] 52%|█████▏ | 5221/10000 [20:31:19<18:29:59, 13.94s/it] {'loss': 0.0205, 'learning_rate': 2.3935e-05, 'epoch': 6.83} 52%|█████▏ | 5221/10000 [20:31:19<18:29:59, 13.94s/it] 52%|█████▏ | 5222/10000 [20:31:33<18:33:37, 13.98s/it] {'loss': 0.0195, 'learning_rate': 2.3930000000000003e-05, 'epoch': 6.84} 52%|█████▏ | 5222/10000 [20:31:33<18:33:37, 13.98s/it] 52%|█████▏ | 5223/10000 [20:31:47<18:33:13, 13.98s/it] {'loss': 0.024, 'learning_rate': 2.3925e-05, 'epoch': 6.84} 52%|█████▏ | 5223/10000 [20:31:47<18:33:13, 13.98s/it] 52%|█████▏ | 5224/10000 [20:32:01<18:32:52, 13.98s/it] {'loss': 0.024, 'learning_rate': 2.392e-05, 'epoch': 6.84} 52%|█████▏ | 5224/10000 [20:32:01<18:32:52, 13.98s/it] 52%|█████▏ | 5225/10000 [20:32:14<18:27:55, 13.92s/it] {'loss': 0.0196, 'learning_rate': 2.3915e-05, 'epoch': 6.84} 52%|█████▏ | 5225/10000 [20:32:14<18:27:55, 13.92s/it] 52%|█████▏ | 5226/10000 [20:32:28<18:26:13, 13.90s/it] {'loss': 0.0208, 'learning_rate': 2.3910000000000003e-05, 'epoch': 6.84} 52%|█████▏ | 5226/10000 [20:32:28<18:26:13, 13.90s/it] 52%|█████▏ | 5227/10000 [20:32:42<18:28:20, 13.93s/it] {'loss': 0.0204, 'learning_rate': 
2.3905000000000002e-05, 'epoch': 6.84} 52%|█████▏ | 5227/10000 [20:32:42<18:28:20, 13.93s/it] 52%|█████▏ | 5228/10000 [20:32:56<18:28:03, 13.93s/it] {'loss': 0.0207, 'learning_rate': 2.39e-05, 'epoch': 6.84} 52%|█████▏ | 5228/10000 [20:32:56<18:28:03, 13.93s/it] 52%|█████▏ | 5229/10000 [20:33:10<18:27:32, 13.93s/it] {'loss': 0.0218, 'learning_rate': 2.3895e-05, 'epoch': 6.84} 52%|█████▏ | 5229/10000 [20:33:10<18:27:32, 13.93s/it] 52%|█████▏ | 5230/10000 [20:33:24<18:28:20, 13.94s/it] {'loss': 0.0209, 'learning_rate': 2.389e-05, 'epoch': 6.85} 52%|█████▏ | 5230/10000 [20:33:24<18:28:20, 13.94s/it] 52%|█████▏ | 5231/10000 [20:33:38<18:27:53, 13.94s/it] {'loss': 0.0192, 'learning_rate': 2.3885000000000003e-05, 'epoch': 6.85} 52%|█████▏ | 5231/10000 [20:33:38<18:27:53, 13.94s/it] 52%|█████▏ | 5232/10000 [20:33:52<18:29:41, 13.96s/it] {'loss': 0.0173, 'learning_rate': 2.3880000000000002e-05, 'epoch': 6.85} 52%|█████▏ | 5232/10000 [20:33:52<18:29:41, 13.96s/it] 52%|█████▏ | 5233/10000 [20:34:06<18:25:40, 13.92s/it] {'loss': 0.0278, 'learning_rate': 2.3875e-05, 'epoch': 6.85} 52%|█████▏ | 5233/10000 [20:34:06<18:25:40, 13.92s/it] 52%|█████▏ | 5234/10000 [20:34:20<18:24:43, 13.91s/it] {'loss': 0.0192, 'learning_rate': 2.387e-05, 'epoch': 6.85} 52%|█████▏ | 5234/10000 [20:34:20<18:24:43, 13.91s/it] 52%|█████▏ | 5235/10000 [20:34:34<18:24:46, 13.91s/it] {'loss': 0.0215, 'learning_rate': 2.3865000000000003e-05, 'epoch': 6.85} 52%|█████▏ | 5235/10000 [20:34:34<18:24:46, 13.91s/it] 52%|█████▏ | 5236/10000 [20:34:48<18:26:45, 13.94s/it] {'loss': 0.0206, 'learning_rate': 2.3860000000000002e-05, 'epoch': 6.85} 52%|█████▏ | 5236/10000 [20:34:48<18:26:45, 13.94s/it] 52%|█████▏ | 5237/10000 [20:35:02<18:30:15, 13.99s/it] {'loss': 0.0195, 'learning_rate': 2.3855e-05, 'epoch': 6.85} 52%|█████▏ | 5237/10000 [20:35:02<18:30:15, 13.99s/it] 52%|█████▏ | 5238/10000 [20:35:16<18:31:36, 14.01s/it] {'loss': 0.0178, 'learning_rate': 2.385e-05, 'epoch': 6.86} 52%|█████▏ | 5238/10000 
[20:35:16<18:31:36, 14.01s/it] 52%|█████▏ | 5239/10000 [20:35:30<18:29:19, 13.98s/it] {'loss': 0.0177, 'learning_rate': 2.3845e-05, 'epoch': 6.86} 52%|█████▏ | 5239/10000 [20:35:30<18:29:19, 13.98s/it] 52%|█████▏ | 5240/10000 [20:35:44<18:29:40, 13.99s/it] {'loss': 0.0255, 'learning_rate': 2.3840000000000002e-05, 'epoch': 6.86} 52%|█████▏ | 5240/10000 [20:35:44<18:29:40, 13.99s/it] 52%|█████▏ | 5241/10000 [20:35:58<18:27:32, 13.96s/it] {'loss': 0.0238, 'learning_rate': 2.3835e-05, 'epoch': 6.86} 52%|█████▏ | 5241/10000 [20:35:58<18:27:32, 13.96s/it] 52%|█████▏ | 5242/10000 [20:36:12<18:25:13, 13.94s/it] {'loss': 0.0205, 'learning_rate': 2.3830000000000004e-05, 'epoch': 6.86} 52%|█████▏ | 5242/10000 [20:36:12<18:25:13, 13.94s/it] 52%|█████▏ | 5243/10000 [20:36:25<18:22:34, 13.91s/it] {'loss': 0.0244, 'learning_rate': 2.3825e-05, 'epoch': 6.86} 52%|█████▏ | 5243/10000 [20:36:25<18:22:34, 13.91s/it] 52%|█████▏ | 5244/10000 [20:36:39<18:21:48, 13.90s/it] {'loss': 0.0219, 'learning_rate': 2.3820000000000002e-05, 'epoch': 6.86} 52%|█████▏ | 5244/10000 [20:36:39<18:21:48, 13.90s/it] 52%|█████▏ | 5245/10000 [20:36:53<18:22:20, 13.91s/it] {'loss': 0.0243, 'learning_rate': 2.3815e-05, 'epoch': 6.87} 52%|█████▏ | 5245/10000 [20:36:53<18:22:20, 13.91s/it] 52%|█████▏ | 5246/10000 [20:37:07<18:26:54, 13.97s/it] {'loss': 0.0217, 'learning_rate': 2.381e-05, 'epoch': 6.87} 52%|█████▏ | 5246/10000 [20:37:07<18:26:54, 13.97s/it] 52%|█████▏ | 5247/10000 [20:37:21<18:26:38, 13.97s/it] {'loss': 0.0215, 'learning_rate': 2.3805000000000003e-05, 'epoch': 6.87} 52%|█████▏ | 5247/10000 [20:37:21<18:26:38, 13.97s/it] 52%|█████▏ | 5248/10000 [20:37:35<18:26:49, 13.98s/it] {'loss': 0.0268, 'learning_rate': 2.38e-05, 'epoch': 6.87} 52%|█████▏ | 5248/10000 [20:37:35<18:26:49, 13.98s/it] 52%|█████▏ | 5249/10000 [20:37:49<18:24:22, 13.95s/it] {'loss': 0.0214, 'learning_rate': 2.3795000000000002e-05, 'epoch': 6.87} 52%|█████▏ | 5249/10000 [20:37:49<18:24:22, 13.95s/it] 52%|█████▎ | 5250/10000 
[20:38:03<18:23:16, 13.94s/it] {'loss': 0.0224, 'learning_rate': 2.379e-05, 'epoch': 6.87} 52%|█████▎ | 5250/10000 [20:38:03<18:23:16, 13.94s/it] 53%|█████▎ | 5251/10000 [20:38:17<18:20:56, 13.91s/it] {'loss': 0.0206, 'learning_rate': 2.3785e-05, 'epoch': 6.87} 53%|█████▎ | 5251/10000 [20:38:17<18:20:56, 13.91s/it] 53%|█████▎ | 5252/10000 [20:38:31<18:17:26, 13.87s/it] {'loss': 0.0233, 'learning_rate': 2.3780000000000003e-05, 'epoch': 6.87} 53%|█████▎ | 5252/10000 [20:38:31<18:17:26, 13.87s/it] 53%|█████▎ | 5253/10000 [20:38:44<18:14:04, 13.83s/it] {'loss': 0.0188, 'learning_rate': 2.3775e-05, 'epoch': 6.88} 53%|█████▎ | 5253/10000 [20:38:44<18:14:04, 13.83s/it] 53%|█████▎ | 5254/10000 [20:38:58<18:18:47, 13.89s/it] {'loss': 0.0223, 'learning_rate': 2.377e-05, 'epoch': 6.88} 53%|█████▎ | 5254/10000 [20:38:59<18:18:47, 13.89s/it] 53%|█████▎ | 5255/10000 [20:39:13<18:23:22, 13.95s/it] {'loss': 0.0227, 'learning_rate': 2.3765e-05, 'epoch': 6.88} 53%|█████▎ | 5255/10000 [20:39:13<18:23:22, 13.95s/it] 53%|█████▎ | 5256/10000 [20:39:27<18:23:15, 13.95s/it] {'loss': 0.0247, 'learning_rate': 2.3760000000000003e-05, 'epoch': 6.88} 53%|█████▎ | 5256/10000 [20:39:27<18:23:15, 13.95s/it] 53%|█████▎ | 5257/10000 [20:39:41<18:23:45, 13.96s/it] {'loss': 0.0208, 'learning_rate': 2.3755000000000002e-05, 'epoch': 6.88} 53%|█████▎ | 5257/10000 [20:39:41<18:23:45, 13.96s/it] 53%|█████▎ | 5258/10000 [20:39:54<18:24:12, 13.97s/it] {'loss': 0.0239, 'learning_rate': 2.375e-05, 'epoch': 6.88} 53%|█████▎ | 5258/10000 [20:39:55<18:24:12, 13.97s/it] 53%|█████▎ | 5259/10000 [20:40:08<18:21:59, 13.95s/it] {'loss': 0.0255, 'learning_rate': 2.3745e-05, 'epoch': 6.88} 53%|█████▎ | 5259/10000 [20:40:08<18:21:59, 13.95s/it] 53%|█████▎ | 5260/10000 [20:40:22<18:18:54, 13.91s/it] {'loss': 0.0265, 'learning_rate': 2.374e-05, 'epoch': 6.88} 53%|█████▎ | 5260/10000 [20:40:22<18:18:54, 13.91s/it] 53%|█████▎ | 5261/10000 [20:40:36<18:22:49, 13.96s/it] {'loss': 0.0213, 'learning_rate': 
2.3735000000000002e-05, 'epoch': 6.89} 53%|█████▎ | 5261/10000 [20:40:36<18:22:49, 13.96s/it] 53%|█████▎ | 5262/10000 [20:40:50<18:25:06, 13.99s/it] {'loss': 0.023, 'learning_rate': 2.373e-05, 'epoch': 6.89} 53%|█████▎ | 5262/10000 [20:40:50<18:25:06, 13.99s/it] 53%|█████▎ | 5263/10000 [20:41:04<18:21:29, 13.95s/it] {'loss': 0.0214, 'learning_rate': 2.3725e-05, 'epoch': 6.89} 53%|█████▎ | 5263/10000 [20:41:04<18:21:29, 13.95s/it] 53%|█████▎ | 5264/10000 [20:41:18<18:23:04, 13.97s/it] {'loss': 0.0231, 'learning_rate': 2.372e-05, 'epoch': 6.89} 53%|█████▎ | 5264/10000 [20:41:18<18:23:04, 13.97s/it] 53%|█████▎ | 5265/10000 [20:41:32<18:18:16, 13.92s/it] {'loss': 0.0249, 'learning_rate': 2.3715000000000002e-05, 'epoch': 6.89} 53%|█████▎ | 5265/10000 [20:41:32<18:18:16, 13.92s/it] 53%|█████▎ | 5266/10000 [20:41:46<18:19:25, 13.93s/it] {'loss': 0.0183, 'learning_rate': 2.371e-05, 'epoch': 6.89} 53%|█████▎ | 5266/10000 [20:41:46<18:19:25, 13.93s/it] 53%|█████▎ | 5267/10000 [20:42:00<18:17:35, 13.91s/it] {'loss': 0.0232, 'learning_rate': 2.3705e-05, 'epoch': 6.89} 53%|█████▎ | 5267/10000 [20:42:00<18:17:35, 13.91s/it] 53%|█████▎ | 5268/10000 [20:42:14<18:17:46, 13.92s/it] {'loss': 0.0267, 'learning_rate': 2.37e-05, 'epoch': 6.9} 53%|█████▎ | 5268/10000 [20:42:14<18:17:46, 13.92s/it] 53%|█████▎ | 5269/10000 [20:42:28<18:15:01, 13.89s/it] {'loss': 0.0251, 'learning_rate': 2.3695e-05, 'epoch': 6.9} 53%|█████▎ | 5269/10000 [20:42:28<18:15:01, 13.89s/it] 53%|█████▎ | 5270/10000 [20:42:42<18:16:23, 13.91s/it] {'loss': 0.0222, 'learning_rate': 2.3690000000000002e-05, 'epoch': 6.9} 53%|█████▎ | 5270/10000 [20:42:42<18:16:23, 13.91s/it] 53%|█████▎ | 5271/10000 [20:42:56<18:20:59, 13.97s/it] {'loss': 0.0191, 'learning_rate': 2.3685e-05, 'epoch': 6.9} 53%|█████▎ | 5271/10000 [20:42:56<18:20:59, 13.97s/it] 53%|█████▎ | 5272/10000 [20:43:10<18:20:09, 13.96s/it] {'loss': 0.0208, 'learning_rate': 2.3680000000000004e-05, 'epoch': 6.9} 53%|█████▎ | 5272/10000 [20:43:10<18:20:09, 13.96s/it] 
53%|█████▎ | 5273/10000 [20:43:24<18:20:24, 13.97s/it] {'loss': 0.0266, 'learning_rate': 2.3675e-05, 'epoch': 6.9} 53%|█████▎ | 5273/10000 [20:43:24<18:20:24, 13.97s/it] 53%|█████▎ | 5274/10000 [20:43:38<18:22:53, 14.00s/it] {'loss': 0.0202, 'learning_rate': 2.3670000000000002e-05, 'epoch': 6.9} 53%|█████▎ | 5274/10000 [20:43:38<18:22:53, 14.00s/it] 53%|█████▎ | 5275/10000 [20:43:52<18:18:58, 13.96s/it] {'loss': 0.0222, 'learning_rate': 2.3665e-05, 'epoch': 6.9} 53%|█████▎ | 5275/10000 [20:43:52<18:18:58, 13.96s/it] 53%|█████▎ | 5276/10000 [20:44:05<18:18:47, 13.96s/it] {'loss': 0.0207, 'learning_rate': 2.366e-05, 'epoch': 6.91} 53%|█████▎ | 5276/10000 [20:44:06<18:18:47, 13.96s/it] 53%|█████▎ | 5277/10000 [20:44:19<18:18:03, 13.95s/it] {'loss': 0.022, 'learning_rate': 2.3655000000000003e-05, 'epoch': 6.91} 53%|█████▎ | 5277/10000 [20:44:19<18:18:03, 13.95s/it] 53%|█████▎ | 5278/10000 [20:44:33<18:19:17, 13.97s/it] {'loss': 0.0224, 'learning_rate': 2.365e-05, 'epoch': 6.91} 53%|█████▎ | 5278/10000 [20:44:33<18:19:17, 13.97s/it] 53%|█████▎ | 5279/10000 [20:44:47<18:18:53, 13.97s/it] {'loss': 0.0269, 'learning_rate': 2.3645e-05, 'epoch': 6.91} 53%|█████▎ | 5279/10000 [20:44:47<18:18:53, 13.97s/it] 53%|█████▎ | 5280/10000 [20:45:01<18:14:27, 13.91s/it] {'loss': 0.0222, 'learning_rate': 2.364e-05, 'epoch': 6.91} 53%|█████▎ | 5280/10000 [20:45:01<18:14:27, 13.91s/it] 53%|█████▎ | 5281/10000 [20:45:15<18:15:51, 13.93s/it] {'loss': 0.0213, 'learning_rate': 2.3635000000000003e-05, 'epoch': 6.91} 53%|█████▎ | 5281/10000 [20:45:15<18:15:51, 13.93s/it] 53%|█████▎ | 5282/10000 [20:45:29<18:13:47, 13.91s/it] {'loss': 0.0219, 'learning_rate': 2.3630000000000002e-05, 'epoch': 6.91} 53%|█████▎ | 5282/10000 [20:45:29<18:13:47, 13.91s/it] 53%|█████▎ | 5283/10000 [20:45:43<18:17:17, 13.96s/it] {'loss': 0.0266, 'learning_rate': 2.3624999999999998e-05, 'epoch': 6.91} 53%|█████▎ | 5283/10000 [20:45:43<18:17:17, 13.96s/it] 53%|█████▎ | 5284/10000 [20:45:57<18:20:07, 14.00s/it] {'loss': 
0.0235, 'learning_rate': 2.362e-05, 'epoch': 6.92} 53%|█████▎ | 5284/10000 [20:45:57<18:20:07, 14.00s/it] 53%|█████▎ | 5285/10000 [20:46:11<18:16:21, 13.95s/it] {'loss': 0.0232, 'learning_rate': 2.3615e-05, 'epoch': 6.92} 53%|█████▎ | 5285/10000 [20:46:11<18:16:21, 13.95s/it] 53%|█████▎ | 5286/10000 [20:46:25<18:16:04, 13.95s/it] {'loss': 0.0222, 'learning_rate': 2.3610000000000003e-05, 'epoch': 6.92} 53%|█████▎ | 5286/10000 [20:46:25<18:16:04, 13.95s/it] 53%|█████▎ | 5287/10000 [20:46:39<18:17:23, 13.97s/it] {'loss': 0.0216, 'learning_rate': 2.3605000000000002e-05, 'epoch': 6.92} 53%|█████▎ | 5287/10000 [20:46:39<18:17:23, 13.97s/it] 53%|█████▎ | 5288/10000 [20:46:53<18:18:50, 13.99s/it] {'loss': 0.0232, 'learning_rate': 2.36e-05, 'epoch': 6.92} 53%|█████▎ | 5288/10000 [20:46:53<18:18:50, 13.99s/it] 53%|█████▎ | 5289/10000 [20:47:07<18:19:29, 14.00s/it] {'loss': 0.0249, 'learning_rate': 2.3595e-05, 'epoch': 6.92} 53%|█████▎ | 5289/10000 [20:47:07<18:19:29, 14.00s/it] 53%|█████▎ | 5290/10000 [20:47:21<18:16:26, 13.97s/it] {'loss': 0.0193, 'learning_rate': 2.359e-05, 'epoch': 6.92} 53%|█████▎ | 5290/10000 [20:47:21<18:16:26, 13.97s/it] 53%|█████▎ | 5291/10000 [20:47:35<18:14:55, 13.95s/it] {'loss': 0.0184, 'learning_rate': 2.3585000000000002e-05, 'epoch': 6.93} 53%|█████▎ | 5291/10000 [20:47:35<18:14:55, 13.95s/it] 53%|█████▎ | 5292/10000 [20:47:49<18:15:38, 13.96s/it] {'loss': 0.0264, 'learning_rate': 2.358e-05, 'epoch': 6.93} 53%|█████▎ | 5292/10000 [20:47:49<18:15:38, 13.96s/it] 53%|█████▎ | 5293/10000 [20:48:03<18:15:12, 13.96s/it] {'loss': 0.0207, 'learning_rate': 2.3575e-05, 'epoch': 6.93} 53%|█████▎ | 5293/10000 [20:48:03<18:15:12, 13.96s/it] 53%|█████▎ | 5294/10000 [20:48:17<18:13:54, 13.95s/it] {'loss': 0.02, 'learning_rate': 2.357e-05, 'epoch': 6.93} 53%|█████▎ | 5294/10000 [20:48:17<18:13:54, 13.95s/it] 53%|█████▎ | 5295/10000 [20:48:31<18:14:41, 13.96s/it] {'loss': 0.0222, 'learning_rate': 2.3565000000000002e-05, 'epoch': 6.93} 53%|█████▎ | 5295/10000 
[20:48:31<18:14:41, 13.96s/it] 53%|█████▎ | 5296/10000 [20:48:45<18:16:37, 13.99s/it] {'loss': 0.0212, 'learning_rate': 2.356e-05, 'epoch': 6.93} 53%|█████▎ | 5296/10000 [20:48:45<18:16:37, 13.99s/it] 53%|█████▎ | 5297/10000 [20:48:59<18:17:03, 14.00s/it] {'loss': 0.0204, 'learning_rate': 2.3555e-05, 'epoch': 6.93} 53%|█████▎ | 5297/10000 [20:48:59<18:17:03, 14.00s/it] 53%|█████▎ | 5298/10000 [20:49:13<18:13:41, 13.96s/it] {'loss': 0.0228, 'learning_rate': 2.355e-05, 'epoch': 6.93} 53%|█████▎ | 5298/10000 [20:49:13<18:13:41, 13.96s/it] 53%|█████▎ | 5299/10000 [20:49:27<18:11:25, 13.93s/it] {'loss': 0.0222, 'learning_rate': 2.3545e-05, 'epoch': 6.94} 53%|█████▎ | 5299/10000 [20:49:27<18:11:25, 13.93s/it] 53%|█████▎ | 5300/10000 [20:49:40<18:09:57, 13.91s/it] {'loss': 0.0232, 'learning_rate': 2.354e-05, 'epoch': 6.94} 53%|█████▎ | 5300/10000 [20:49:40<18:09:57, 13.91s/it] 53%|█████▎ | 5301/10000 [20:49:54<18:10:35, 13.93s/it] {'loss': 0.0255, 'learning_rate': 2.3535e-05, 'epoch': 6.94} 53%|█████▎ | 5301/10000 [20:49:54<18:10:35, 13.93s/it] 53%|█████▎ | 5302/10000 [20:50:08<18:12:09, 13.95s/it] {'loss': 0.0187, 'learning_rate': 2.3530000000000003e-05, 'epoch': 6.94} 53%|█████▎ | 5302/10000 [20:50:08<18:12:09, 13.95s/it] 53%|█████▎ | 5303/10000 [20:50:22<18:10:43, 13.93s/it] {'loss': 0.026, 'learning_rate': 2.3525e-05, 'epoch': 6.94} 53%|█████▎ | 5303/10000 [20:50:22<18:10:43, 13.93s/it] 53%|█████▎ | 5304/10000 [20:50:36<18:11:52, 13.95s/it] {'loss': 0.0215, 'learning_rate': 2.3520000000000002e-05, 'epoch': 6.94} 53%|█████▎ | 5304/10000 [20:50:36<18:11:52, 13.95s/it] 53%|█████▎ | 5305/10000 [20:50:50<18:12:36, 13.96s/it] {'loss': 0.024, 'learning_rate': 2.3515e-05, 'epoch': 6.94} 53%|█████▎ | 5305/10000 [20:50:50<18:12:36, 13.96s/it] 53%|█████▎ | 5306/10000 [20:51:04<18:09:07, 13.92s/it] {'loss': 0.0188, 'learning_rate': 2.351e-05, 'epoch': 6.95} 53%|█████▎ | 5306/10000 [20:51:04<18:09:07, 13.92s/it] 53%|█████▎ | 5307/10000 [20:51:18<18:08:26, 13.92s/it] {'loss': 
0.0233, 'learning_rate': 2.3505000000000003e-05, 'epoch': 6.95} 53%|█████▎ | 5307/10000 [20:51:18<18:08:26, 13.92s/it] 53%|█████▎ | 5308/10000 [20:51:32<18:10:07, 13.94s/it] {'loss': 0.0239, 'learning_rate': 2.35e-05, 'epoch': 6.95} 53%|█████▎ | 5308/10000 [20:51:32<18:10:07, 13.94s/it] 53%|█████▎ | 5309/10000 [20:51:46<18:10:54, 13.95s/it] {'loss': 0.0245, 'learning_rate': 2.3495e-05, 'epoch': 6.95} 53%|█████▎ | 5309/10000 [20:51:46<18:10:54, 13.95s/it] 53%|█████▎ | 5310/10000 [20:52:00<18:11:50, 13.97s/it] {'loss': 0.0283, 'learning_rate': 2.349e-05, 'epoch': 6.95} 53%|█████▎ | 5310/10000 [20:52:00<18:11:50, 13.97s/it] 53%|█████▎ | 5311/10000 [20:52:14<18:15:17, 14.02s/it] {'loss': 0.0179, 'learning_rate': 2.3485000000000003e-05, 'epoch': 6.95} 53%|█████▎ | 5311/10000 [20:52:14<18:15:17, 14.02s/it] 53%|█████▎ | 5312/10000 [20:52:28<18:13:39, 14.00s/it] {'loss': 0.0194, 'learning_rate': 2.3480000000000002e-05, 'epoch': 6.95} 53%|█████▎ | 5312/10000 [20:52:28<18:13:39, 14.00s/it] 53%|█████▎ | 5313/10000 [20:52:42<18:11:00, 13.97s/it] {'loss': 0.0214, 'learning_rate': 2.3475e-05, 'epoch': 6.95} 53%|█████▎ | 5313/10000 [20:52:42<18:11:00, 13.97s/it] 53%|█████▎ | 5314/10000 [20:52:56<18:10:14, 13.96s/it] {'loss': 0.0176, 'learning_rate': 2.347e-05, 'epoch': 6.96} 53%|█████▎ | 5314/10000 [20:52:56<18:10:14, 13.96s/it] 53%|█████▎ | 5315/10000 [20:53:10<18:08:57, 13.95s/it] {'loss': 0.0192, 'learning_rate': 2.3465e-05, 'epoch': 6.96} 53%|█████▎ | 5315/10000 [20:53:10<18:08:57, 13.95s/it] 53%|█████▎ | 5316/10000 [20:53:24<18:09:20, 13.95s/it] {'loss': 0.0197, 'learning_rate': 2.3460000000000002e-05, 'epoch': 6.96} 53%|█████▎ | 5316/10000 [20:53:24<18:09:20, 13.95s/it] 53%|█████▎ | 5317/10000 [20:53:38<18:06:11, 13.92s/it] {'loss': 0.024, 'learning_rate': 2.3455e-05, 'epoch': 6.96} 53%|█████▎ | 5317/10000 [20:53:38<18:06:11, 13.92s/it] 53%|█████▎ | 5318/10000 [20:53:51<18:01:40, 13.86s/it] {'loss': 0.0185, 'learning_rate': 2.345e-05, 'epoch': 6.96} 53%|█████▎ | 5318/10000 
[20:53:51<18:01:40, 13.86s/it] 53%|█████▎ | 5319/10000 [20:54:05<18:01:13, 13.86s/it] {'loss': 0.0209, 'learning_rate': 2.3445e-05, 'epoch': 6.96} 53%|█████▎ | 5319/10000 [20:54:05<18:01:13, 13.86s/it] 53%|█████▎ | 5320/10000 [20:54:19<18:00:47, 13.86s/it] {'loss': 0.0202, 'learning_rate': 2.344e-05, 'epoch': 6.96} 53%|█████▎ | 5320/10000 [20:54:19<18:00:47, 13.86s/it] 53%|█████▎ | 5321/10000 [20:54:33<18:01:09, 13.86s/it] {'loss': 0.0223, 'learning_rate': 2.3435000000000002e-05, 'epoch': 6.96} 53%|█████▎ | 5321/10000 [20:54:33<18:01:09, 13.86s/it] 53%|█████▎ | 5322/10000 [20:54:47<18:01:12, 13.87s/it] {'loss': 0.0236, 'learning_rate': 2.343e-05, 'epoch': 6.97} 53%|█████▎ | 5322/10000 [20:54:47<18:01:12, 13.87s/it] 53%|█████▎ | 5323/10000 [20:55:01<18:03:53, 13.90s/it] {'loss': 0.0262, 'learning_rate': 2.3425000000000004e-05, 'epoch': 6.97} 53%|█████▎ | 5323/10000 [20:55:01<18:03:53, 13.90s/it] 53%|█████▎ | 5324/10000 [20:55:15<17:59:53, 13.86s/it] {'loss': 0.0278, 'learning_rate': 2.342e-05, 'epoch': 6.97} 53%|█████▎ | 5324/10000 [20:55:15<17:59:53, 13.86s/it] 53%|█████▎ | 5325/10000 [20:55:28<17:59:55, 13.86s/it] {'loss': 0.0221, 'learning_rate': 2.3415000000000002e-05, 'epoch': 6.97} 53%|█████▎ | 5325/10000 [20:55:28<17:59:55, 13.86s/it] 53%|█████▎ | 5326/10000 [20:55:42<18:01:34, 13.88s/it] {'loss': 0.0261, 'learning_rate': 2.341e-05, 'epoch': 6.97} 53%|█████▎ | 5326/10000 [20:55:42<18:01:34, 13.88s/it] 53%|█████▎ | 5327/10000 [20:55:56<17:58:59, 13.85s/it] {'loss': 0.0243, 'learning_rate': 2.3405e-05, 'epoch': 6.97} 53%|█████▎ | 5327/10000 [20:55:56<17:58:59, 13.85s/it] 53%|█████▎ | 5328/10000 [20:56:10<18:00:31, 13.88s/it] {'loss': 0.0249, 'learning_rate': 2.3400000000000003e-05, 'epoch': 6.97} 53%|█████▎ | 5328/10000 [20:56:10<18:00:31, 13.88s/it] 53%|█████▎ | 5329/10000 [20:56:24<17:58:34, 13.85s/it] {'loss': 0.022, 'learning_rate': 2.3395e-05, 'epoch': 6.98} 53%|█████▎ | 5329/10000 [20:56:24<17:58:34, 13.85s/it] 53%|█████▎ | 5330/10000 [20:56:38<17:56:07, 
13.83s/it] {'loss': 0.0208, 'learning_rate': 2.339e-05, 'epoch': 6.98} 53%|█████▎ | 5330/10000 [20:56:38<17:56:07, 13.83s/it] 53%|█████▎ | 5331/10000 [20:56:51<17:57:26, 13.85s/it] {'loss': 0.0193, 'learning_rate': 2.3385e-05, 'epoch': 6.98} 53%|█████▎ | 5331/10000 [20:56:52<17:57:26, 13.85s/it] 53%|█████▎ | 5332/10000 [20:57:05<17:57:27, 13.85s/it] {'loss': 0.0249, 'learning_rate': 2.3380000000000003e-05, 'epoch': 6.98} 53%|█████▎ | 5332/10000 [20:57:05<17:57:27, 13.85s/it] 53%|█████▎ | 5333/10000 [20:57:19<17:57:25, 13.85s/it] {'loss': 0.0163, 'learning_rate': 2.3375000000000002e-05, 'epoch': 6.98} 53%|█████▎ | 5333/10000 [20:57:19<17:57:25, 13.85s/it] 53%|█████▎ | 5334/10000 [20:57:33<17:58:16, 13.87s/it] {'loss': 0.0262, 'learning_rate': 2.337e-05, 'epoch': 6.98} 53%|█████▎ | 5334/10000 [20:57:33<17:58:16, 13.87s/it] 53%|█████▎ | 5335/10000 [20:57:47<17:58:39, 13.87s/it] {'loss': 0.0278, 'learning_rate': 2.3365e-05, 'epoch': 6.98} 53%|█████▎ | 5335/10000 [20:57:47<17:58:39, 13.87s/it] 53%|█████▎ | 5336/10000 [20:58:01<17:58:28, 13.87s/it] {'loss': 0.0204, 'learning_rate': 2.336e-05, 'epoch': 6.98} 53%|█████▎ | 5336/10000 [20:58:01<17:58:28, 13.87s/it] 53%|█████▎ | 5337/10000 [20:58:15<17:54:51, 13.83s/it] {'loss': 0.0262, 'learning_rate': 2.3355000000000003e-05, 'epoch': 6.99} 53%|█████▎ | 5337/10000 [20:58:15<17:54:51, 13.83s/it] 53%|█████▎ | 5338/10000 [20:58:28<17:54:53, 13.83s/it] {'loss': 0.0283, 'learning_rate': 2.3350000000000002e-05, 'epoch': 6.99} 53%|█████▎ | 5338/10000 [20:58:28<17:54:53, 13.83s/it] 53%|█████▎ | 5339/10000 [20:58:42<17:57:27, 13.87s/it] {'loss': 0.0177, 'learning_rate': 2.3345e-05, 'epoch': 6.99} 53%|█████▎ | 5339/10000 [20:58:42<17:57:27, 13.87s/it] 53%|█████▎ | 5340/10000 [20:58:56<17:55:35, 13.85s/it] {'loss': 0.0237, 'learning_rate': 2.334e-05, 'epoch': 6.99} 53%|█████▎ | 5340/10000 [20:58:56<17:55:35, 13.85s/it] 53%|█████▎ | 5341/10000 [20:59:10<17:53:12, 13.82s/it] {'loss': 0.018, 'learning_rate': 2.3335000000000003e-05, 
'epoch': 6.99} 53%|█████▎ | 5341/10000 [20:59:10<17:53:12, 13.82s/it] 53%|█████▎ | 5342/10000 [20:59:24<17:50:51, 13.79s/it] {'loss': 0.0208, 'learning_rate': 2.3330000000000002e-05, 'epoch': 6.99} 53%|█████▎ | 5342/10000 [20:59:24<17:50:51, 13.79s/it] 53%|█████▎ | 5343/10000 [20:59:37<17:50:35, 13.79s/it] {'loss': 0.0189, 'learning_rate': 2.3325e-05, 'epoch': 6.99} 53%|█████▎ | 5343/10000 [20:59:37<17:50:35, 13.79s/it] 53%|█████▎ | 5344/10000 [20:59:51<17:50:40, 13.80s/it] {'loss': 0.0216, 'learning_rate': 2.332e-05, 'epoch': 6.99} 53%|█████▎ | 5344/10000 [20:59:51<17:50:40, 13.80s/it] 53%|█████▎ | 5345/10000 [21:00:05<17:56:00, 13.87s/it] {'loss': 0.0238, 'learning_rate': 2.3315e-05, 'epoch': 7.0} 53%|█████▎ | 5345/10000 [21:00:05<17:56:00, 13.87s/it] 53%|█████▎ | 5346/10000 [21:00:19<17:53:16, 13.84s/it] {'loss': 0.0206, 'learning_rate': 2.3310000000000002e-05, 'epoch': 7.0} 53%|█████▎ | 5346/10000 [21:00:19<17:53:16, 13.84s/it] 53%|█████▎ | 5347/10000 [21:00:33<17:55:00, 13.86s/it] {'loss': 0.0232, 'learning_rate': 2.3305e-05, 'epoch': 7.0} 53%|█████▎ | 5347/10000 [21:00:33<17:55:00, 13.86s/it] 53%|█████▎ | 5348/10000 [21:00:45<17:22:03, 13.44s/it] {'loss': 0.0262, 'learning_rate': 2.3300000000000004e-05, 'epoch': 7.0} 53%|█████▎ | 5348/10000 [21:00:45<17:22:03, 13.44s/it] 53%|█████▎ | 5349/10000 [21:00:59<17:30:47, 13.56s/it] {'loss': 0.0137, 'learning_rate': 2.3295e-05, 'epoch': 7.0} 53%|█████▎ | 5349/10000 [21:00:59<17:30:47, 13.56s/it] 54%|█████▎ | 5350/10000 [21:01:13<17:32:55, 13.59s/it] {'loss': 0.0131, 'learning_rate': 2.3290000000000002e-05, 'epoch': 7.0} 54%|█████▎ | 5350/10000 [21:01:13<17:32:55, 13.59s/it] 54%|█████▎ | 5351/10000 [21:01:27<17:41:42, 13.70s/it] {'loss': 0.0137, 'learning_rate': 2.3285e-05, 'epoch': 7.0} 54%|█████▎ | 5351/10000 [21:01:27<17:41:42, 13.70s/it] 54%|█████▎ | 5352/10000 [21:01:41<17:44:51, 13.75s/it] {'loss': 0.0132, 'learning_rate': 2.328e-05, 'epoch': 7.01} 54%|█████▎ | 5352/10000 [21:01:41<17:44:51, 13.75s/it] 
54%|█████▎ | 5353/10000 [21:01:55<17:46:28, 13.77s/it] {'loss': 0.0141, 'learning_rate': 2.3275000000000003e-05, 'epoch': 7.01} 54%|█████▎ | 5353/10000 [21:01:55<17:46:28, 13.77s/it] 54%|█████▎ | 5354/10000 [21:02:08<17:48:13, 13.80s/it] {'loss': 0.0107, 'learning_rate': 2.327e-05, 'epoch': 7.01} 54%|█████▎ | 5354/10000 [21:02:08<17:48:13, 13.80s/it] 54%|█████▎ | 5355/10000 [21:02:22<17:50:12, 13.82s/it] {'loss': 0.0102, 'learning_rate': 2.3265000000000002e-05, 'epoch': 7.01} 54%|█████▎ | 5355/10000 [21:02:22<17:50:12, 13.82s/it] 54%|█████▎ | 5356/10000 [21:02:36<17:50:57, 13.84s/it] {'loss': 0.0114, 'learning_rate': 2.326e-05, 'epoch': 7.01} 54%|█████▎ | 5356/10000 [21:02:36<17:50:57, 13.84s/it] 54%|█████▎ | 5357/10000 [21:02:50<17:52:26, 13.86s/it] {'loss': 0.0115, 'learning_rate': 2.3255e-05, 'epoch': 7.01} 54%|█████▎ | 5357/10000 [21:02:50<17:52:26, 13.86s/it] 54%|█████▎ | 5358/10000 [21:03:04<17:50:30, 13.84s/it] {'loss': 0.0104, 'learning_rate': 2.3250000000000003e-05, 'epoch': 7.01} 54%|█████▎ | 5358/10000 [21:03:04<17:50:30, 13.84s/it] 54%|█████▎ | 5359/10000 [21:03:18<17:52:49, 13.87s/it] {'loss': 0.0104, 'learning_rate': 2.3245e-05, 'epoch': 7.01} 54%|█████▎ | 5359/10000 [21:03:18<17:52:49, 13.87s/it] 54%|█████▎ | 5360/10000 [21:03:32<17:54:36, 13.90s/it] {'loss': 0.0144, 'learning_rate': 2.324e-05, 'epoch': 7.02} 54%|█████▎ | 5360/10000 [21:03:32<17:54:36, 13.90s/it] 54%|█████▎ | 5361/10000 [21:03:46<17:53:34, 13.89s/it] {'loss': 0.0151, 'learning_rate': 2.3235e-05, 'epoch': 7.02} 54%|█████▎ | 5361/10000 [21:03:46<17:53:34, 13.89s/it] 54%|█████▎ | 5362/10000 [21:03:59<17:49:21, 13.83s/it] {'loss': 0.0117, 'learning_rate': 2.3230000000000003e-05, 'epoch': 7.02} 54%|█████▎ | 5362/10000 [21:03:59<17:49:21, 13.83s/it] 54%|█████▎ | 5363/10000 [21:04:13<17:49:06, 13.83s/it] {'loss': 0.0166, 'learning_rate': 2.3225000000000002e-05, 'epoch': 7.02} 54%|█████▎ | 5363/10000 [21:04:13<17:49:06, 13.83s/it] 54%|█████▎ | 5364/10000 [21:04:27<17:45:24, 13.79s/it] 
{'loss': 0.0121, 'learning_rate': 2.322e-05, 'epoch': 7.02} 54%|█████▎ | 5364/10000 [21:04:27<17:45:24, 13.79s/it] 54%|█████▎ | 5365/10000 [21:04:41<17:49:31, 13.84s/it] {'loss': 0.0132, 'learning_rate': 2.3215e-05, 'epoch': 7.02} 54%|█████▎ | 5365/10000 [21:04:41<17:49:31, 13.84s/it] 54%|█████▎ | 5366/10000 [21:04:55<17:50:11, 13.86s/it] {'loss': 0.0125, 'learning_rate': 2.321e-05, 'epoch': 7.02} 54%|█████▎ | 5366/10000 [21:04:55<17:50:11, 13.86s/it] 54%|█████▎ | 5367/10000 [21:05:09<17:51:32, 13.88s/it] {'loss': 0.0141, 'learning_rate': 2.3205000000000002e-05, 'epoch': 7.02} 54%|█████▎ | 5367/10000 [21:05:09<17:51:32, 13.88s/it] 54%|█████▎ | 5368/10000 [21:05:22<17:48:01, 13.83s/it] {'loss': 0.0103, 'learning_rate': 2.32e-05, 'epoch': 7.03} 54%|█████▎ | 5368/10000 [21:05:22<17:48:01, 13.83s/it] 54%|█████▎ | 5369/10000 [21:05:36<17:46:59, 13.82s/it] {'loss': 0.0121, 'learning_rate': 2.3195e-05, 'epoch': 7.03} 54%|█████▎ | 5369/10000 [21:05:36<17:46:59, 13.82s/it] 54%|█████▎ | 5370/10000 [21:05:50<17:47:14, 13.83s/it] {'loss': 0.0133, 'learning_rate': 2.319e-05, 'epoch': 7.03} 54%|█████▎ | 5370/10000 [21:05:50<17:47:14, 13.83s/it] 54%|█████▎ | 5371/10000 [21:06:04<17:44:48, 13.80s/it] {'loss': 0.0142, 'learning_rate': 2.3185000000000002e-05, 'epoch': 7.03} 54%|█████▎ | 5371/10000 [21:06:04<17:44:48, 13.80s/it] 54%|█████▎ | 5372/10000 [21:06:18<17:44:45, 13.80s/it] {'loss': 0.0134, 'learning_rate': 2.318e-05, 'epoch': 7.03} 54%|█████▎ | 5372/10000 [21:06:18<17:44:45, 13.80s/it] 54%|█████▎ | 5373/10000 [21:06:31<17:46:20, 13.83s/it] {'loss': 0.0109, 'learning_rate': 2.3175e-05, 'epoch': 7.03} 54%|█████▎ | 5373/10000 [21:06:31<17:46:20, 13.83s/it] 54%|█████▎ | 5374/10000 [21:06:45<17:43:21, 13.79s/it] {'loss': 0.0132, 'learning_rate': 2.317e-05, 'epoch': 7.03} 54%|█████▎ | 5374/10000 [21:06:45<17:43:21, 13.79s/it] 54%|█████▍ | 5375/10000 [21:06:59<17:47:44, 13.85s/it] {'loss': 0.0107, 'learning_rate': 2.3165e-05, 'epoch': 7.04} 54%|█████▍ | 5375/10000 
[21:06:59<17:47:44, 13.85s/it] 54%|█████▍ | 5376/10000 [21:07:13<17:47:36, 13.85s/it] {'loss': 0.0117, 'learning_rate': 2.3160000000000002e-05, 'epoch': 7.04} 54%|█████▍ | 5376/10000 [21:07:13<17:47:36, 13.85s/it] 54%|█████▍ | 5377/10000 [21:07:27<17:47:04, 13.85s/it] {'loss': 0.0133, 'learning_rate': 2.3155e-05, 'epoch': 7.04} 54%|█████▍ | 5377/10000 [21:07:27<17:47:04, 13.85s/it] 54%|█████▍ | 5378/10000 [21:07:41<17:46:15, 13.84s/it] {'loss': 0.012, 'learning_rate': 2.3150000000000004e-05, 'epoch': 7.04} 54%|█████▍ | 5378/10000 [21:07:41<17:46:15, 13.84s/it] 54%|█████▍ | 5379/10000 [21:07:55<17:47:05, 13.86s/it] {'loss': 0.0092, 'learning_rate': 2.3145e-05, 'epoch': 7.04} 54%|█████▍ | 5379/10000 [21:07:55<17:47:05, 13.86s/it] 54%|█████▍ | 5380/10000 [21:08:09<17:48:51, 13.88s/it] {'loss': 0.0134, 'learning_rate': 2.3140000000000002e-05, 'epoch': 7.04} 54%|█████▍ | 5380/10000 [21:08:09<17:48:51, 13.88s/it] 54%|█████▍ | 5381/10000 [21:08:22<17:47:42, 13.87s/it] {'loss': 0.0111, 'learning_rate': 2.3135e-05, 'epoch': 7.04} 54%|█████▍ | 5381/10000 [21:08:22<17:47:42, 13.87s/it] 54%|█████▍ | 5382/10000 [21:08:36<17:46:16, 13.85s/it] {'loss': 0.0126, 'learning_rate': 2.313e-05, 'epoch': 7.04} 54%|█████▍ | 5382/10000 [21:08:36<17:46:16, 13.85s/it] 54%|█████▍ | 5383/10000 [21:08:50<17:46:49, 13.86s/it] {'loss': 0.0105, 'learning_rate': 2.3125000000000003e-05, 'epoch': 7.05} 54%|█████▍ | 5383/10000 [21:08:50<17:46:49, 13.86s/it] 54%|█████▍ | 5384/10000 [21:09:04<17:44:34, 13.84s/it] {'loss': 0.0136, 'learning_rate': 2.312e-05, 'epoch': 7.05} 54%|█████▍ | 5384/10000 [21:09:04<17:44:34, 13.84s/it] 54%|█████▍ | 5385/10000 [21:09:18<17:46:48, 13.87s/it] {'loss': 0.0125, 'learning_rate': 2.3115e-05, 'epoch': 7.05} 54%|█████▍ | 5385/10000 [21:09:18<17:46:48, 13.87s/it] 54%|█████▍ | 5386/10000 [21:09:32<17:46:53, 13.87s/it] {'loss': 0.0112, 'learning_rate': 2.311e-05, 'epoch': 7.05} 54%|█████▍ | 5386/10000 [21:09:32<17:46:53, 13.87s/it] 54%|█████▍ | 5387/10000 [21:09:46<17:48:00, 
13.89s/it] {'loss': 0.0084, 'learning_rate': 2.3105000000000003e-05, 'epoch': 7.05} 54%|█████▍ | 5387/10000 [21:09:46<17:48:00, 13.89s/it] 54%|█████▍ | 5388/10000 [21:09:59<17:46:12, 13.87s/it] {'loss': 0.0132, 'learning_rate': 2.3100000000000002e-05, 'epoch': 7.05} 54%|█████▍ | 5388/10000 [21:09:59<17:46:12, 13.87s/it] 54%|█████▍ | 5389/10000 [21:10:13<17:46:19, 13.88s/it] {'loss': 0.0146, 'learning_rate': 2.3095e-05, 'epoch': 7.05} 54%|█████▍ | 5389/10000 [21:10:13<17:46:19, 13.88s/it] 54%|█████▍ | 5390/10000 [21:10:27<17:44:02, 13.85s/it] {'loss': 0.0138, 'learning_rate': 2.309e-05, 'epoch': 7.05} 54%|█████▍ | 5390/10000 [21:10:27<17:44:02, 13.85s/it] 54%|█████▍ | 5391/10000 [21:10:41<17:43:37, 13.85s/it] {'loss': 0.0118, 'learning_rate': 2.3085e-05, 'epoch': 7.06} 54%|█████▍ | 5391/10000 [21:10:41<17:43:37, 13.85s/it] 54%|█████▍ | 5392/10000 [21:10:55<17:45:29, 13.87s/it] {'loss': 0.0113, 'learning_rate': 2.3080000000000003e-05, 'epoch': 7.06} 54%|█████▍ | 5392/10000 [21:10:55<17:45:29, 13.87s/it] 54%|█████▍ | 5393/10000 [21:11:09<17:45:54, 13.88s/it] {'loss': 0.0148, 'learning_rate': 2.3075000000000002e-05, 'epoch': 7.06} 54%|█████▍ | 5393/10000 [21:11:09<17:45:54, 13.88s/it] 54%|█████▍ | 5394/10000 [21:11:23<17:48:45, 13.92s/it] {'loss': 0.0139, 'learning_rate': 2.307e-05, 'epoch': 7.06} 54%|█████▍ | 5394/10000 [21:11:23<17:48:45, 13.92s/it] 54%|█████▍ | 5395/10000 [21:11:37<17:46:01, 13.89s/it] {'loss': 0.0122, 'learning_rate': 2.3065e-05, 'epoch': 7.06} 54%|█████▍ | 5395/10000 [21:11:37<17:46:01, 13.89s/it] 54%|█████▍ | 5396/10000 [21:11:50<17:43:29, 13.86s/it] {'loss': 0.014, 'learning_rate': 2.306e-05, 'epoch': 7.06} 54%|█████▍ | 5396/10000 [21:11:50<17:43:29, 13.86s/it] 54%|█████▍ | 5397/10000 [21:12:04<17:41:11, 13.83s/it] {'loss': 0.0149, 'learning_rate': 2.3055000000000002e-05, 'epoch': 7.06} 54%|█████▍ | 5397/10000 [21:12:04<17:41:11, 13.83s/it] 54%|█████▍ | 5398/10000 [21:12:18<17:41:26, 13.84s/it] {'loss': 0.0127, 'learning_rate': 2.305e-05, 
'epoch': 7.07} 54%|█████▍ | 5398/10000 [21:12:18<17:41:26, 13.84s/it] 54%|█████▍ | 5399/10000 [21:12:32<17:39:30, 13.82s/it] {'loss': 0.0151, 'learning_rate': 2.3045e-05, 'epoch': 7.07} 54%|█████▍ | 5399/10000 [21:12:32<17:39:30, 13.82s/it] 54%|█████▍ | 5400/10000 [21:12:46<17:41:19, 13.84s/it] {'loss': 0.0119, 'learning_rate': 2.304e-05, 'epoch': 7.07} 54%|█████▍ | 5400/10000 [21:12:46<17:41:19, 13.84s/it] 54%|█████▍ | 5401/10000 [21:13:00<17:41:49, 13.85s/it] {'loss': 0.0108, 'learning_rate': 2.3035000000000002e-05, 'epoch': 7.07} 54%|█████▍ | 5401/10000 [21:13:00<17:41:49, 13.85s/it] 54%|█████▍ | 5402/10000 [21:13:13<17:40:35, 13.84s/it] {'loss': 0.0126, 'learning_rate': 2.303e-05, 'epoch': 7.07} 54%|█████▍ | 5402/10000 [21:13:13<17:40:35, 13.84s/it] 54%|█████▍ | 5403/10000 [21:13:27<17:42:29, 13.87s/it] {'loss': 0.0125, 'learning_rate': 2.3025e-05, 'epoch': 7.07} 54%|█████▍ | 5403/10000 [21:13:27<17:42:29, 13.87s/it] 54%|█████▍ | 5404/10000 [21:13:41<17:42:04, 13.87s/it] {'loss': 0.0114, 'learning_rate': 2.302e-05, 'epoch': 7.07} 54%|█████▍ | 5404/10000 [21:13:41<17:42:04, 13.87s/it] 54%|█████▍ | 5405/10000 [21:13:55<17:41:35, 13.86s/it] {'loss': 0.0158, 'learning_rate': 2.3015e-05, 'epoch': 7.07} 54%|█████▍ | 5405/10000 [21:13:55<17:41:35, 13.86s/it] 54%|█████▍ | 5406/10000 [21:14:09<17:38:27, 13.82s/it] {'loss': 0.0168, 'learning_rate': 2.301e-05, 'epoch': 7.08} 54%|█████▍ | 5406/10000 [21:14:09<17:38:27, 13.82s/it] 54%|█████▍ | 5407/10000 [21:14:23<17:41:19, 13.86s/it] {'loss': 0.0148, 'learning_rate': 2.3005e-05, 'epoch': 7.08} 54%|█████▍ | 5407/10000 [21:14:23<17:41:19, 13.86s/it] 54%|█████▍ | 5408/10000 [21:14:37<17:41:08, 13.87s/it] {'loss': 0.0113, 'learning_rate': 2.3000000000000003e-05, 'epoch': 7.08} 54%|█████▍ | 5408/10000 [21:14:37<17:41:08, 13.87s/it] 54%|█████▍ | 5409/10000 [21:14:51<17:42:16, 13.88s/it] {'loss': 0.0107, 'learning_rate': 2.2995e-05, 'epoch': 7.08} 54%|█████▍ | 5409/10000 [21:14:51<17:42:16, 13.88s/it] 54%|█████▍ | 5410/10000 
[21:15:04<17:42:22, 13.89s/it] {'loss': 0.0123, 'learning_rate': 2.2990000000000002e-05, 'epoch': 7.08} 54%|█████▍ | 5410/10000 [21:15:04<17:42:22, 13.89s/it] 54%|█████▍ | 5411/10000 [21:15:18<17:41:29, 13.88s/it] {'loss': 0.0112, 'learning_rate': 2.2985e-05, 'epoch': 7.08} 54%|█████▍ | 5411/10000 [21:15:18<17:41:29, 13.88s/it] 54%|█████▍ | 5412/10000 [21:15:32<17:39:22, 13.85s/it] {'loss': 0.0109, 'learning_rate': 2.298e-05, 'epoch': 7.08} 54%|█████▍ | 5412/10000 [21:15:32<17:39:22, 13.85s/it] 54%|█████▍ | 5413/10000 [21:15:46<17:41:11, 13.88s/it] {'loss': 0.0122, 'learning_rate': 2.2975000000000003e-05, 'epoch': 7.09} 54%|█████▍ | 5413/10000 [21:15:46<17:41:11, 13.88s/it] 54%|█████▍ | 5414/10000 [21:16:00<17:37:40, 13.84s/it] {'loss': 0.0085, 'learning_rate': 2.297e-05, 'epoch': 7.09} 54%|█████▍ | 5414/10000 [21:16:00<17:37:40, 13.84s/it] 54%|█████▍ | 5415/10000 [21:16:14<17:38:20, 13.85s/it] {'loss': 0.0128, 'learning_rate': 2.2965e-05, 'epoch': 7.09} 54%|█████▍ | 5415/10000 [21:16:14<17:38:20, 13.85s/it] 54%|█████▍ | 5416/10000 [21:16:27<17:38:24, 13.85s/it] {'loss': 0.0141, 'learning_rate': 2.296e-05, 'epoch': 7.09} 54%|█████▍ | 5416/10000 [21:16:28<17:38:24, 13.85s/it] 54%|█████▍ | 5417/10000 [21:16:41<17:39:07, 13.87s/it] {'loss': 0.0122, 'learning_rate': 2.2955000000000003e-05, 'epoch': 7.09} 54%|█████▍ | 5417/10000 [21:16:41<17:39:07, 13.87s/it] 54%|█████▍ | 5418/10000 [21:16:55<17:38:19, 13.86s/it] {'loss': 0.011, 'learning_rate': 2.2950000000000002e-05, 'epoch': 7.09} 54%|█████▍ | 5418/10000 [21:16:55<17:38:19, 13.86s/it] 54%|█████▍ | 5419/10000 [21:17:09<17:41:04, 13.90s/it] {'loss': 0.0109, 'learning_rate': 2.2945e-05, 'epoch': 7.09} 54%|█████▍ | 5419/10000 [21:17:09<17:41:04, 13.90s/it] 54%|█████▍ | 5420/10000 [21:17:23<17:38:44, 13.87s/it] {'loss': 0.0121, 'learning_rate': 2.294e-05, 'epoch': 7.09} 54%|█████▍ | 5420/10000 [21:17:23<17:38:44, 13.87s/it] 54%|█████▍ | 5421/10000 [21:17:37<17:39:19, 13.88s/it] {'loss': 0.0109, 'learning_rate': 
2.2935e-05, 'epoch': 7.1} 54%|█████▍ | 5421/10000 [21:17:37<17:39:19, 13.88s/it] 54%|█████▍ | 5422/10000 [21:17:51<17:40:47, 13.90s/it] {'loss': 0.012, 'learning_rate': 2.2930000000000002e-05, 'epoch': 7.1} 54%|█████▍ | 5422/10000 [21:17:51<17:40:47, 13.90s/it] 54%|█████▍ | 5423/10000 [21:18:05<17:44:07, 13.95s/it] {'loss': 0.0148, 'learning_rate': 2.2925e-05, 'epoch': 7.1} 54%|█████▍ | 5423/10000 [21:18:05<17:44:07, 13.95s/it] 54%|█████▍ | 5424/10000 [21:18:19<17:45:30, 13.97s/it] {'loss': 0.0136, 'learning_rate': 2.292e-05, 'epoch': 7.1} 54%|█████▍ | 5424/10000 [21:18:19<17:45:30, 13.97s/it] 54%|█████▍ | 5425/10000 [21:18:33<17:44:10, 13.96s/it] {'loss': 0.0122, 'learning_rate': 2.2915e-05, 'epoch': 7.1} 54%|█████▍ | 5425/10000 [21:18:33<17:44:10, 13.96s/it] 54%|█████▍ | 5426/10000 [21:18:47<17:44:37, 13.97s/it] {'loss': 0.0158, 'learning_rate': 2.2910000000000003e-05, 'epoch': 7.1} 54%|█████▍ | 5426/10000 [21:18:47<17:44:37, 13.97s/it] 54%|█████▍ | 5427/10000 [21:19:01<17:44:44, 13.97s/it] {'loss': 0.0142, 'learning_rate': 2.2905000000000002e-05, 'epoch': 7.1} 54%|█████▍ | 5427/10000 [21:19:01<17:44:44, 13.97s/it] 54%|█████▍ | 5428/10000 [21:19:15<17:42:16, 13.94s/it] {'loss': 0.0137, 'learning_rate': 2.29e-05, 'epoch': 7.1} 54%|█████▍ | 5428/10000 [21:19:15<17:42:16, 13.94s/it] 54%|█████▍ | 5429/10000 [21:19:29<17:42:55, 13.95s/it] {'loss': 0.0119, 'learning_rate': 2.2895e-05, 'epoch': 7.11} 54%|█████▍ | 5429/10000 [21:19:29<17:42:55, 13.95s/it] 54%|█████▍ | 5430/10000 [21:19:43<17:39:31, 13.91s/it] {'loss': 0.0122, 'learning_rate': 2.289e-05, 'epoch': 7.11} 54%|█████▍ | 5430/10000 [21:19:43<17:39:31, 13.91s/it] 54%|█████▍ | 5431/10000 [21:19:56<17:39:28, 13.91s/it] {'loss': 0.0109, 'learning_rate': 2.2885000000000002e-05, 'epoch': 7.11} 54%|█████▍ | 5431/10000 [21:19:56<17:39:28, 13.91s/it] 54%|█████▍ | 5432/10000 [21:20:10<17:40:41, 13.93s/it] {'loss': 0.0134, 'learning_rate': 2.288e-05, 'epoch': 7.11} 54%|█████▍ | 5432/10000 [21:20:10<17:40:41, 13.93s/it] 
54%|█████▍ | 5433/10000 [21:20:24<17:42:56, 13.96s/it] {'loss': 0.0129, 'learning_rate': 2.2875e-05, 'epoch': 7.11} 54%|█████▍ | 5433/10000 [21:20:24<17:42:56, 13.96s/it] 54%|█████▍ | 5434/10000 [21:20:38<17:40:55, 13.94s/it] {'loss': 0.0161, 'learning_rate': 2.287e-05, 'epoch': 7.11} 54%|█████▍ | 5434/10000 [21:20:38<17:40:55, 13.94s/it] 54%|█████▍ | 5435/10000 [21:20:52<17:39:47, 13.93s/it] {'loss': 0.0116, 'learning_rate': 2.2865e-05, 'epoch': 7.11} 54%|█████▍ | 5435/10000 [21:20:52<17:39:47, 13.93s/it] 54%|█████▍ | 5436/10000 [21:21:06<17:37:56, 13.91s/it] {'loss': 0.011, 'learning_rate': 2.286e-05, 'epoch': 7.12} 54%|█████▍ | 5436/10000 [21:21:06<17:37:56, 13.91s/it] 54%|█████▍ | 5437/10000 [21:21:20<17:39:32, 13.93s/it] {'loss': 0.0152, 'learning_rate': 2.2855e-05, 'epoch': 7.12} 54%|█████▍ | 5437/10000 [21:21:20<17:39:32, 13.93s/it] 54%|█████▍ | 5438/10000 [21:21:34<17:37:07, 13.90s/it] {'loss': 0.0096, 'learning_rate': 2.2850000000000003e-05, 'epoch': 7.12} 54%|█████▍ | 5438/10000 [21:21:34<17:37:07, 13.90s/it] 54%|█████▍ | 5439/10000 [21:21:48<17:35:12, 13.88s/it] {'loss': 0.0123, 'learning_rate': 2.2845e-05, 'epoch': 7.12} 54%|█████▍ | 5439/10000 [21:21:48<17:35:12, 13.88s/it] 54%|█████▍ | 5440/10000 [21:22:02<17:34:06, 13.87s/it] {'loss': 0.0113, 'learning_rate': 2.284e-05, 'epoch': 7.12} 54%|█████▍ | 5440/10000 [21:22:02<17:34:06, 13.87s/it] 54%|█████▍ | 5441/10000 [21:22:16<17:37:07, 13.91s/it] {'loss': 0.0115, 'learning_rate': 2.2835e-05, 'epoch': 7.12} 54%|█████▍ | 5441/10000 [21:22:16<17:37:07, 13.91s/it] 54%|█████▍ | 5442/10000 [21:22:29<17:36:27, 13.91s/it] {'loss': 0.0154, 'learning_rate': 2.283e-05, 'epoch': 7.12} 54%|█████▍ | 5442/10000 [21:22:30<17:36:27, 13.91s/it] 54%|█████▍ | 5443/10000 [21:22:43<17:33:56, 13.88s/it] {'loss': 0.0114, 'learning_rate': 2.2825000000000003e-05, 'epoch': 7.12} 54%|█████▍ | 5443/10000 [21:22:43<17:33:56, 13.88s/it] 54%|█████▍ | 5444/10000 [21:22:57<17:36:43, 13.92s/it] {'loss': 0.0121, 'learning_rate': 2.282e-05, 
'epoch': 7.13} 54%|█████▍ | 5444/10000 [21:22:57<17:36:43, 13.92s/it] 54%|█████▍ | 5445/10000 [21:23:11<17:35:57, 13.91s/it] {'loss': 0.0125, 'learning_rate': 2.2815e-05, 'epoch': 7.13} 54%|█████▍ | 5445/10000 [21:23:11<17:35:57, 13.91s/it] 54%|█████▍ | 5446/10000 [21:23:25<17:36:20, 13.92s/it] {'loss': 0.0139, 'learning_rate': 2.281e-05, 'epoch': 7.13} 54%|█████▍ | 5446/10000 [21:23:25<17:36:20, 13.92s/it] 54%|█████▍ | 5447/10000 [21:23:39<17:36:26, 13.92s/it] {'loss': 0.0129, 'learning_rate': 2.2805000000000003e-05, 'epoch': 7.13} 54%|█████▍ | 5447/10000 [21:23:39<17:36:26, 13.92s/it] 54%|█████▍ | 5448/10000 [21:23:53<17:36:04, 13.92s/it] {'loss': 0.0135, 'learning_rate': 2.2800000000000002e-05, 'epoch': 7.13} 54%|█████▍ | 5448/10000 [21:23:53<17:36:04, 13.92s/it] 54%|█████▍ | 5449/10000 [21:24:07<17:34:57, 13.91s/it] {'loss': 0.0119, 'learning_rate': 2.2795e-05, 'epoch': 7.13} 54%|█████▍ | 5449/10000 [21:24:07<17:34:57, 13.91s/it] 55%|█████▍ | 5450/10000 [21:24:21<17:36:15, 13.93s/it] {'loss': 0.0116, 'learning_rate': 2.279e-05, 'epoch': 7.13} 55%|█████▍ | 5450/10000 [21:24:21<17:36:15, 13.93s/it] 55%|█████▍ | 5451/10000 [21:24:35<17:35:54, 13.93s/it] {'loss': 0.0138, 'learning_rate': 2.2785e-05, 'epoch': 7.13} 55%|█████▍ | 5451/10000 [21:24:35<17:35:54, 13.93s/it] 55%|█████▍ | 5452/10000 [21:24:49<17:35:31, 13.93s/it] {'loss': 0.0118, 'learning_rate': 2.2780000000000002e-05, 'epoch': 7.14} 55%|█████▍ | 5452/10000 [21:24:49<17:35:31, 13.93s/it] 55%|█████▍ | 5453/10000 [21:25:03<17:33:48, 13.91s/it] {'loss': 0.0134, 'learning_rate': 2.2775e-05, 'epoch': 7.14} 55%|█████▍ | 5453/10000 [21:25:03<17:33:48, 13.91s/it] 55%|█████▍ | 5454/10000 [21:25:16<17:32:27, 13.89s/it] {'loss': 0.0148, 'learning_rate': 2.2770000000000004e-05, 'epoch': 7.14} 55%|█████▍ | 5454/10000 [21:25:16<17:32:27, 13.89s/it] 55%|█████▍ | 5455/10000 [21:25:30<17:36:28, 13.95s/it] {'loss': 0.0094, 'learning_rate': 2.2765e-05, 'epoch': 7.14} 55%|█████▍ | 5455/10000 [21:25:31<17:36:28, 13.95s/it] 
55%|█████▍ | 5456/10000 [21:25:44<17:36:34, 13.95s/it] {'loss': 0.0118, 'learning_rate': 2.2760000000000002e-05, 'epoch': 7.14} 55%|█████▍ | 5456/10000 [21:25:44<17:36:34, 13.95s/it] 55%|█████▍ | 5457/10000 [21:25:58<17:33:32, 13.91s/it] {'loss': 0.0175, 'learning_rate': 2.2755e-05, 'epoch': 7.14} 55%|█████▍ | 5457/10000 [21:25:58<17:33:32, 13.91s/it] 55%|█████▍ | 5458/10000 [21:26:12<17:30:58, 13.88s/it] {'loss': 0.01, 'learning_rate': 2.275e-05, 'epoch': 7.14} 55%|█████▍ | 5458/10000 [21:26:12<17:30:58, 13.88s/it] 55%|█████▍ | 5459/10000 [21:26:26<17:31:25, 13.89s/it] {'loss': 0.0116, 'learning_rate': 2.2745000000000003e-05, 'epoch': 7.15} 55%|█████▍ | 5459/10000 [21:26:26<17:31:25, 13.89s/it] 55%|█████▍ | 5460/10000 [21:26:40<17:32:14, 13.91s/it] {'loss': 0.0144, 'learning_rate': 2.274e-05, 'epoch': 7.15} 55%|█████▍ | 5460/10000 [21:26:40<17:32:14, 13.91s/it] 55%|█████▍ | 5461/10000 [21:26:54<17:32:35, 13.91s/it] {'loss': 0.0126, 'learning_rate': 2.2735000000000002e-05, 'epoch': 7.15} 55%|█████▍ | 5461/10000 [21:26:54<17:32:35, 13.91s/it] 55%|█████▍ | 5462/10000 [21:27:08<17:32:29, 13.92s/it] {'loss': 0.0118, 'learning_rate': 2.273e-05, 'epoch': 7.15} 55%|█████▍ | 5462/10000 [21:27:08<17:32:29, 13.92s/it] 55%|█████▍ | 5463/10000 [21:27:22<17:34:03, 13.94s/it] {'loss': 0.0165, 'learning_rate': 2.2725000000000003e-05, 'epoch': 7.15} 55%|█████▍ | 5463/10000 [21:27:22<17:34:03, 13.94s/it] 55%|█████▍ | 5464/10000 [21:27:36<17:36:49, 13.98s/it] {'loss': 0.0127, 'learning_rate': 2.2720000000000003e-05, 'epoch': 7.15} 55%|█████▍ | 5464/10000 [21:27:36<17:36:49, 13.98s/it] 55%|█████▍ | 5465/10000 [21:27:50<17:33:35, 13.94s/it] {'loss': 0.0088, 'learning_rate': 2.2715e-05, 'epoch': 7.15} 55%|█████▍ | 5465/10000 [21:27:50<17:33:35, 13.94s/it] 55%|█████▍ | 5466/10000 [21:28:04<17:30:48, 13.91s/it] {'loss': 0.0134, 'learning_rate': 2.271e-05, 'epoch': 7.15} 55%|█████▍ | 5466/10000 [21:28:04<17:30:48, 13.91s/it] 55%|█████▍ | 5467/10000 [21:28:17<17:30:03, 13.90s/it] {'loss': 
0.0126, 'learning_rate': 2.2705e-05, 'epoch': 7.16} 55%|█████▍ | 5467/10000 [21:28:17<17:30:03, 13.90s/it] 55%|█████▍ | 5468/10000 [21:28:31<17:28:06, 13.88s/it] {'loss': 0.0106, 'learning_rate': 2.2700000000000003e-05, 'epoch': 7.16} 55%|█████▍ | 5468/10000 [21:28:31<17:28:06, 13.88s/it] 55%|█████▍ | 5469/10000 [21:28:45<17:27:07, 13.87s/it] {'loss': 0.0101, 'learning_rate': 2.2695000000000002e-05, 'epoch': 7.16} 55%|█████▍ | 5469/10000 [21:28:45<17:27:07, 13.87s/it] 55%|█████▍ | 5470/10000 [21:28:59<17:28:44, 13.89s/it] {'loss': 0.0113, 'learning_rate': 2.269e-05, 'epoch': 7.16} 55%|█████▍ | 5470/10000 [21:28:59<17:28:44, 13.89s/it] 55%|█████▍ | 5471/10000 [21:29:13<17:27:18, 13.87s/it] {'loss': 0.0114, 'learning_rate': 2.2685e-05, 'epoch': 7.16} 55%|█████▍ | 5471/10000 [21:29:13<17:27:18, 13.87s/it] 55%|█████▍ | 5472/10000 [21:29:27<17:29:19, 13.90s/it] {'loss': 0.0118, 'learning_rate': 2.268e-05, 'epoch': 7.16} 55%|█████▍ | 5472/10000 [21:29:27<17:29:19, 13.90s/it] 55%|█████▍ | 5473/10000 [21:29:41<17:27:45, 13.89s/it] {'loss': 0.0093, 'learning_rate': 2.2675000000000002e-05, 'epoch': 7.16} 55%|█████▍ | 5473/10000 [21:29:41<17:27:45, 13.89s/it] 55%|█████▍ | 5474/10000 [21:29:54<17:26:10, 13.87s/it] {'loss': 0.0197, 'learning_rate': 2.267e-05, 'epoch': 7.16} 55%|█████▍ | 5474/10000 [21:29:55<17:26:10, 13.87s/it] 55%|█████▍ | 5475/10000 [21:30:08<17:27:24, 13.89s/it] {'loss': 0.0111, 'learning_rate': 2.2665e-05, 'epoch': 7.17} 55%|█████▍ | 5475/10000 [21:30:08<17:27:24, 13.89s/it] 55%|█████▍ | 5476/10000 [21:30:22<17:28:56, 13.91s/it] {'loss': 0.0116, 'learning_rate': 2.266e-05, 'epoch': 7.17} 55%|█████▍ | 5476/10000 [21:30:22<17:28:56, 13.91s/it] 55%|█████▍ | 5477/10000 [21:30:36<17:27:37, 13.90s/it] {'loss': 0.012, 'learning_rate': 2.2655000000000002e-05, 'epoch': 7.17} 55%|█████▍ | 5477/10000 [21:30:36<17:27:37, 13.90s/it] 55%|█████▍ | 5478/10000 [21:30:50<17:25:23, 13.87s/it] {'loss': 0.0111, 'learning_rate': 2.265e-05, 'epoch': 7.17} 55%|█████▍ | 5478/10000 
[21:30:50<17:25:23, 13.87s/it] 55%|█████▍ | 5479/10000 [21:31:04<17:24:21, 13.86s/it] {'loss': 0.0119, 'learning_rate': 2.2645e-05, 'epoch': 7.17} 55%|█████▍ | 5479/10000 [21:31:04<17:24:21, 13.86s/it] 55%|█████▍ | 5480/10000 [21:31:18<17:23:29, 13.85s/it] {'loss': 0.0085, 'learning_rate': 2.264e-05, 'epoch': 7.17} 55%|█████▍ | 5480/10000 [21:31:18<17:23:29, 13.85s/it] 55%|█████▍ | 5481/10000 [21:31:32<17:25:44, 13.88s/it] {'loss': 0.0146, 'learning_rate': 2.2635e-05, 'epoch': 7.17} 55%|█████▍ | 5481/10000 [21:31:32<17:25:44, 13.88s/it] 55%|█████▍ | 5482/10000 [21:31:45<17:23:11, 13.85s/it] {'loss': 0.0103, 'learning_rate': 2.2630000000000002e-05, 'epoch': 7.18} 55%|█████▍ | 5482/10000 [21:31:46<17:23:11, 13.85s/it] 55%|█████▍ | 5483/10000 [21:31:59<17:23:20, 13.86s/it] {'loss': 0.013, 'learning_rate': 2.2625e-05, 'epoch': 7.18} 55%|█████▍ | 5483/10000 [21:31:59<17:23:20, 13.86s/it] 55%|█████▍ | 5484/10000 [21:32:13<17:21:49, 13.84s/it] {'loss': 0.0129, 'learning_rate': 2.2620000000000004e-05, 'epoch': 7.18} 55%|█████▍ | 5484/10000 [21:32:13<17:21:49, 13.84s/it] 55%|█████▍ | 5485/10000 [21:32:27<17:19:33, 13.81s/it] {'loss': 0.0137, 'learning_rate': 2.2615e-05, 'epoch': 7.18} 55%|█████▍ | 5485/10000 [21:32:27<17:19:33, 13.81s/it] 55%|█████▍ | 5486/10000 [21:32:41<17:19:27, 13.82s/it] {'loss': 0.0098, 'learning_rate': 2.2610000000000002e-05, 'epoch': 7.18} 55%|█████▍ | 5486/10000 [21:32:41<17:19:27, 13.82s/it] 55%|█████▍ | 5487/10000 [21:32:54<17:17:58, 13.80s/it] {'loss': 0.0131, 'learning_rate': 2.2605e-05, 'epoch': 7.18} 55%|█████▍ | 5487/10000 [21:32:55<17:17:58, 13.80s/it] 55%|█████▍ | 5488/10000 [21:33:08<17:17:25, 13.80s/it] {'loss': 0.0117, 'learning_rate': 2.26e-05, 'epoch': 7.18} 55%|█████▍ | 5488/10000 [21:33:08<17:17:25, 13.80s/it] 55%|█████▍ | 5489/10000 [21:33:22<17:15:24, 13.77s/it] {'loss': 0.0131, 'learning_rate': 2.2595000000000003e-05, 'epoch': 7.18} 55%|█████▍ | 5489/10000 [21:33:22<17:15:24, 13.77s/it] 55%|█████▍ | 5490/10000 [21:33:36<17:12:35, 
13.74s/it] {'loss': 0.0127, 'learning_rate': 2.259e-05, 'epoch': 7.19} 55%|█████▍ | 5490/10000 [21:33:36<17:12:35, 13.74s/it] 55%|█████▍ | 5491/10000 [21:33:49<17:13:47, 13.76s/it] {'loss': 0.0087, 'learning_rate': 2.2585e-05, 'epoch': 7.19} 55%|█████▍ | 5491/10000 [21:33:49<17:13:47, 13.76s/it] 55%|█████▍ | 5492/10000 [21:34:03<17:13:27, 13.76s/it] {'loss': 0.0093, 'learning_rate': 2.258e-05, 'epoch': 7.19} 55%|█████▍ | 5492/10000 [21:34:03<17:13:27, 13.76s/it] 55%|█████▍ | 5493/10000 [21:34:17<17:16:23, 13.80s/it] {'loss': 0.0111, 'learning_rate': 2.2575000000000003e-05, 'epoch': 7.19} 55%|█████▍ | 5493/10000 [21:34:17<17:16:23, 13.80s/it] 55%|█████▍ | 5494/10000 [21:34:31<17:15:52, 13.79s/it] {'loss': 0.0127, 'learning_rate': 2.2570000000000002e-05, 'epoch': 7.19} 55%|█████▍ | 5494/10000 [21:34:31<17:15:52, 13.79s/it] 55%|█████▍ | 5495/10000 [21:34:45<17:14:58, 13.78s/it] {'loss': 0.0135, 'learning_rate': 2.2565e-05, 'epoch': 7.19} 55%|█████▍ | 5495/10000 [21:34:45<17:14:58, 13.78s/it] 55%|█████▍ | 5496/10000 [21:34:59<17:16:50, 13.81s/it] {'loss': 0.0111, 'learning_rate': 2.256e-05, 'epoch': 7.19} 55%|█████▍ | 5496/10000 [21:34:59<17:16:50, 13.81s/it] 55%|█████▍ | 5497/10000 [21:35:12<17:17:45, 13.83s/it] {'loss': 0.0144, 'learning_rate': 2.2555e-05, 'epoch': 7.2} 55%|█████▍ | 5497/10000 [21:35:12<17:17:45, 13.83s/it] 55%|█████▍ | 5498/10000 [21:35:26<17:18:25, 13.84s/it] {'loss': 0.0131, 'learning_rate': 2.2550000000000003e-05, 'epoch': 7.2} 55%|█████▍ | 5498/10000 [21:35:26<17:18:25, 13.84s/it] 55%|█████▍ | 5499/10000 [21:35:40<17:15:03, 13.80s/it] {'loss': 0.014, 'learning_rate': 2.2545000000000002e-05, 'epoch': 7.2} 55%|█████▍ | 5499/10000 [21:35:40<17:15:03, 13.80s/it] 55%|█████▌ | 5500/10000 [21:35:54<17:17:06, 13.83s/it] {'loss': 0.0131, 'learning_rate': 2.254e-05, 'epoch': 7.2} 55%|█████▌ | 5500/10000 [21:35:54<17:17:06, 13.83s/it] 55%|█████▌ | 5501/10000 [21:36:08<17:16:20, 13.82s/it] {'loss': 0.0144, 'learning_rate': 2.2535e-05, 'epoch': 7.2} 
55%|█████▌ | 5501/10000 [21:36:08<17:16:20, 13.82s/it] 55%|█████▌ | 5502/10000 [21:36:21<17:15:41, 13.82s/it] {'loss': 0.0128, 'learning_rate': 2.253e-05, 'epoch': 7.2} 55%|█████▌ | 5502/10000 [21:36:21<17:15:41, 13.82s/it] 55%|█████▌ | 5503/10000 [21:36:35<17:14:43, 13.81s/it] {'loss': 0.0113, 'learning_rate': 2.2525000000000002e-05, 'epoch': 7.2} 55%|█████▌ | 5503/10000 [21:36:35<17:14:43, 13.81s/it] 55%|█████▌ | 5504/10000 [21:36:49<17:12:27, 13.78s/it] {'loss': 0.0099, 'learning_rate': 2.252e-05, 'epoch': 7.2} 55%|█████▌ | 5504/10000 [21:36:49<17:12:27, 13.78s/it] 55%|█████▌ | 5505/10000 [21:37:03<17:14:56, 13.81s/it] {'loss': 0.0122, 'learning_rate': 2.2515e-05, 'epoch': 7.21} 55%|█████▌ | 5505/10000 [21:37:03<17:14:56, 13.81s/it] 55%|█████▌ | 5506/10000 [21:37:17<17:14:11, 13.81s/it] {'loss': 0.0126, 'learning_rate': 2.251e-05, 'epoch': 7.21} 55%|█████▌ | 5506/10000 [21:37:17<17:14:11, 13.81s/it] 55%|█████▌ | 5507/10000 [21:37:30<17:14:11, 13.81s/it] {'loss': 0.0133, 'learning_rate': 2.2505000000000002e-05, 'epoch': 7.21} 55%|█████▌ | 5507/10000 [21:37:30<17:14:11, 13.81s/it] 55%|█████▌ | 5508/10000 [21:37:44<17:14:10, 13.81s/it] {'loss': 0.0121, 'learning_rate': 2.25e-05, 'epoch': 7.21} 55%|█████▌ | 5508/10000 [21:37:44<17:14:10, 13.81s/it] 55%|█████▌ | 5509/10000 [21:37:58<17:12:50, 13.80s/it] {'loss': 0.0106, 'learning_rate': 2.2495e-05, 'epoch': 7.21} 55%|█████▌ | 5509/10000 [21:37:58<17:12:50, 13.80s/it] 55%|█████▌ | 5510/10000 [21:38:12<17:12:59, 13.80s/it] {'loss': 0.0178, 'learning_rate': 2.249e-05, 'epoch': 7.21} 55%|█████▌ | 5510/10000 [21:38:12<17:12:59, 13.80s/it] 55%|█████▌ | 5511/10000 [21:38:26<17:11:14, 13.78s/it] {'loss': 0.0131, 'learning_rate': 2.2485e-05, 'epoch': 7.21} 55%|█████▌ | 5511/10000 [21:38:26<17:11:14, 13.78s/it] 55%|█████▌ | 5512/10000 [21:38:39<17:09:24, 13.76s/it] {'loss': 0.0144, 'learning_rate': 2.248e-05, 'epoch': 7.21} 55%|█████▌ | 5512/10000 [21:38:39<17:09:24, 13.76s/it] 55%|█████▌ | 5513/10000 [21:38:53<17:09:18, 
13.76s/it] {'loss': 0.0137, 'learning_rate': 2.2475e-05, 'epoch': 7.22} 55%|█████▌ | 5513/10000 [21:38:53<17:09:18, 13.76s/it] 55%|█████▌ | 5514/10000 [21:39:07<17:10:10, 13.78s/it] {'loss': 0.0102, 'learning_rate': 2.2470000000000003e-05, 'epoch': 7.22} 55%|█████▌ | 5514/10000 [21:39:07<17:10:10, 13.78s/it] 55%|█████▌ | 5515/10000 [21:39:21<17:12:01, 13.81s/it] {'loss': 0.0131, 'learning_rate': 2.2465e-05, 'epoch': 7.22} 55%|█████▌ | 5515/10000 [21:39:21<17:12:01, 13.81s/it] 55%|█████▌ | 5516/10000 [21:39:35<17:11:24, 13.80s/it] {'loss': 0.0134, 'learning_rate': 2.2460000000000002e-05, 'epoch': 7.22} 55%|█████▌ | 5516/10000 [21:39:35<17:11:24, 13.80s/it] 55%|█████▌ | 5517/10000 [21:39:48<17:09:02, 13.77s/it] {'loss': 0.016, 'learning_rate': 2.2455e-05, 'epoch': 7.22} 55%|█████▌ | 5517/10000 [21:39:48<17:09:02, 13.77s/it] 55%|█████▌ | 5518/10000 [21:40:02<17:10:34, 13.80s/it] {'loss': 0.012, 'learning_rate': 2.245e-05, 'epoch': 7.22} 55%|█████▌ | 5518/10000 [21:40:02<17:10:34, 13.80s/it] 55%|█████▌ | 5519/10000 [21:40:16<17:11:11, 13.81s/it] {'loss': 0.0143, 'learning_rate': 2.2445000000000003e-05, 'epoch': 7.22} 55%|█████▌ | 5519/10000 [21:40:16<17:11:11, 13.81s/it] 55%|█████▌ | 5520/10000 [21:40:30<17:13:28, 13.84s/it] {'loss': 0.0144, 'learning_rate': 2.244e-05, 'epoch': 7.23} 55%|█████▌ | 5520/10000 [21:40:30<17:13:28, 13.84s/it] 55%|█████▌ | 5521/10000 [21:40:44<17:13:36, 13.85s/it] {'loss': 0.0149, 'learning_rate': 2.2435e-05, 'epoch': 7.23} 55%|█████▌ | 5521/10000 [21:40:44<17:13:36, 13.85s/it] 55%|█████▌ | 5522/10000 [21:40:57<17:09:49, 13.80s/it] {'loss': 0.0122, 'learning_rate': 2.243e-05, 'epoch': 7.23} 55%|█████▌ | 5522/10000 [21:40:57<17:09:49, 13.80s/it] 55%|█████▌ | 5523/10000 [21:41:11<17:12:39, 13.84s/it] {'loss': 0.0122, 'learning_rate': 2.2425000000000003e-05, 'epoch': 7.23} 55%|█████▌ | 5523/10000 [21:41:11<17:12:39, 13.84s/it] 55%|█████▌ | 5524/10000 [21:41:25<17:12:21, 13.84s/it] {'loss': 0.0125, 'learning_rate': 2.2420000000000002e-05, 
'epoch': 7.23} 55%|█████▌ | 5524/10000 [21:41:25<17:12:21, 13.84s/it] 55%|█████▌ | 5525/10000 [21:41:39<17:12:56, 13.85s/it] {'loss': 0.0118, 'learning_rate': 2.2415e-05, 'epoch': 7.23} 55%|█████▌ | 5525/10000 [21:41:39<17:12:56, 13.85s/it] 55%|█████▌ | 5526/10000 [21:41:53<17:12:52, 13.85s/it] {'loss': 0.0108, 'learning_rate': 2.241e-05, 'epoch': 7.23} 55%|█████▌ | 5526/10000 [21:41:53<17:12:52, 13.85s/it] 55%|█████▌ | 5527/10000 [21:42:07<17:11:06, 13.83s/it] {'loss': 0.0107, 'learning_rate': 2.2405e-05, 'epoch': 7.23} 55%|█████▌ | 5527/10000 [21:42:07<17:11:06, 13.83s/it] 55%|█████▌ | 5528/10000 [21:42:20<17:09:02, 13.81s/it] {'loss': 0.0165, 'learning_rate': 2.2400000000000002e-05, 'epoch': 7.24} 55%|█████▌ | 5528/10000 [21:42:20<17:09:02, 13.81s/it] 55%|█████▌ | 5529/10000 [21:42:34<17:10:42, 13.83s/it] {'loss': 0.0133, 'learning_rate': 2.2395e-05, 'epoch': 7.24} 55%|█████▌ | 5529/10000 [21:42:34<17:10:42, 13.83s/it] 55%|█████▌ | 5530/10000 [21:42:48<17:08:04, 13.80s/it] {'loss': 0.0112, 'learning_rate': 2.239e-05, 'epoch': 7.24} 55%|█████▌ | 5530/10000 [21:42:48<17:08:04, 13.80s/it] 55%|█████▌ | 5531/10000 [21:43:02<17:05:45, 13.77s/it] {'loss': 0.014, 'learning_rate': 2.2385e-05, 'epoch': 7.24} 55%|█████▌ | 5531/10000 [21:43:02<17:05:45, 13.77s/it] 55%|█████▌ | 5532/10000 [21:43:16<17:04:59, 13.76s/it] {'loss': 0.0126, 'learning_rate': 2.2380000000000003e-05, 'epoch': 7.24} 55%|█████▌ | 5532/10000 [21:43:16<17:04:59, 13.76s/it] 55%|█████▌ | 5533/10000 [21:43:29<17:04:16, 13.76s/it] {'loss': 0.0103, 'learning_rate': 2.2375000000000002e-05, 'epoch': 7.24} 55%|█████▌ | 5533/10000 [21:43:29<17:04:16, 13.76s/it] 55%|█████▌ | 5534/10000 [21:43:43<17:05:35, 13.78s/it] {'loss': 0.0138, 'learning_rate': 2.237e-05, 'epoch': 7.24} 55%|█████▌ | 5534/10000 [21:43:43<17:05:35, 13.78s/it] 55%|█████▌ | 5535/10000 [21:43:57<17:05:44, 13.78s/it] {'loss': 0.009, 'learning_rate': 2.2365e-05, 'epoch': 7.24} 55%|█████▌ | 5535/10000 [21:43:57<17:05:44, 13.78s/it] 55%|█████▌ | 
5536/10000 [21:44:11<17:02:37, 13.74s/it] {'loss': 0.0098, 'learning_rate': 2.236e-05, 'epoch': 7.25} 55%|█████▌ | 5536/10000 [21:44:11<17:02:37, 13.74s/it] 55%|█████▌ | 5537/10000 [21:44:24<17:04:44, 13.78s/it] {'loss': 0.0132, 'learning_rate': 2.2355000000000002e-05, 'epoch': 7.25} 55%|█████▌ | 5537/10000 [21:44:24<17:04:44, 13.78s/it] 55%|█████▌ | 5538/10000 [21:44:38<17:07:53, 13.82s/it] {'loss': 0.0105, 'learning_rate': 2.235e-05, 'epoch': 7.25} 55%|█████▌ | 5538/10000 [21:44:38<17:07:53, 13.82s/it] 55%|█████▌ | 5539/10000 [21:44:52<17:08:41, 13.84s/it] {'loss': 0.0124, 'learning_rate': 2.2345e-05, 'epoch': 7.25} 55%|█████▌ | 5539/10000 [21:44:52<17:08:41, 13.84s/it] 55%|█████▌ | 5540/10000 [21:45:06<17:05:07, 13.79s/it] {'loss': 0.0108, 'learning_rate': 2.234e-05, 'epoch': 7.25} 55%|█████▌ | 5540/10000 [21:45:06<17:05:07, 13.79s/it] 55%|█████▌ | 5541/10000 [21:45:20<17:05:45, 13.80s/it] {'loss': 0.013, 'learning_rate': 2.2335e-05, 'epoch': 7.25} 55%|█████▌ | 5541/10000 [21:45:20<17:05:45, 13.80s/it] 55%|█████▌ | 5542/10000 [21:45:34<17:06:00, 13.81s/it] {'loss': 0.0136, 'learning_rate': 2.233e-05, 'epoch': 7.25} 55%|█████▌ | 5542/10000 [21:45:34<17:06:00, 13.81s/it] 55%|█████▌ | 5543/10000 [21:45:47<17:07:42, 13.84s/it] {'loss': 0.016, 'learning_rate': 2.2325e-05, 'epoch': 7.26} 55%|█████▌ | 5543/10000 [21:45:47<17:07:42, 13.84s/it] 55%|█████▌ | 5544/10000 [21:46:01<17:04:49, 13.80s/it] {'loss': 0.0102, 'learning_rate': 2.2320000000000003e-05, 'epoch': 7.26} 55%|█████▌ | 5544/10000 [21:46:01<17:04:49, 13.80s/it] 55%|█████▌ | 5545/10000 [21:46:15<17:06:13, 13.82s/it] {'loss': 0.0243, 'learning_rate': 2.2315e-05, 'epoch': 7.26} 55%|█████▌ | 5545/10000 [21:46:15<17:06:13, 13.82s/it] 55%|█████▌ | 5546/10000 [21:46:29<17:05:36, 13.82s/it] {'loss': 0.0127, 'learning_rate': 2.231e-05, 'epoch': 7.26} 55%|█████▌ | 5546/10000 [21:46:29<17:05:36, 13.82s/it] 55%|█████▌ | 5547/10000 [21:46:43<17:04:49, 13.81s/it] {'loss': 0.0174, 'learning_rate': 2.2305e-05, 'epoch': 
7.26} 55%|█████▌ | 5547/10000 [21:46:43<17:04:49, 13.81s/it] 55%|█████▌ | 5548/10000 [21:46:56<17:05:29, 13.82s/it] {'loss': 0.014, 'learning_rate': 2.23e-05, 'epoch': 7.26} 55%|█████▌ | 5548/10000 [21:46:56<17:05:29, 13.82s/it] 55%|█████▌ | 5549/10000 [21:47:10<17:06:42, 13.84s/it] {'loss': 0.0124, 'learning_rate': 2.2295000000000003e-05, 'epoch': 7.26} 55%|█████▌ | 5549/10000 [21:47:10<17:06:42, 13.84s/it] 56%|█████▌ | 5550/10000 [21:47:24<17:04:55, 13.82s/it] {'loss': 0.0133, 'learning_rate': 2.229e-05, 'epoch': 7.26} 56%|█████▌ | 5550/10000 [21:47:24<17:04:55, 13.82s/it] 56%|█████▌ | 5551/10000 [21:47:38<17:06:33, 13.84s/it] {'loss': 0.0131, 'learning_rate': 2.2285e-05, 'epoch': 7.27} 56%|█████▌ | 5551/10000 [21:47:38<17:06:33, 13.84s/it] 56%|█████▌ | 5552/10000 [21:47:52<17:07:03, 13.85s/it] {'loss': 0.0133, 'learning_rate': 2.228e-05, 'epoch': 7.27} 56%|█████▌ | 5552/10000 [21:47:52<17:07:03, 13.85s/it] 56%|█████▌ | 5553/10000 [21:48:06<17:08:44, 13.88s/it] {'loss': 0.0182, 'learning_rate': 2.2275000000000003e-05, 'epoch': 7.27} 56%|█████▌ | 5553/10000 [21:48:06<17:08:44, 13.88s/it] 56%|█████▌ | 5554/10000 [21:48:20<17:07:37, 13.87s/it] {'loss': 0.013, 'learning_rate': 2.2270000000000002e-05, 'epoch': 7.27} 56%|█████▌ | 5554/10000 [21:48:20<17:07:37, 13.87s/it] 56%|█████▌ | 5555/10000 [21:48:33<17:04:59, 13.84s/it] {'loss': 0.0129, 'learning_rate': 2.2265e-05, 'epoch': 7.27} 56%|█████▌ | 5555/10000 [21:48:33<17:04:59, 13.84s/it] 56%|█████▌ | 5556/10000 [21:48:47<17:08:50, 13.89s/it] {'loss': 0.0113, 'learning_rate': 2.226e-05, 'epoch': 7.27} 56%|█████▌ | 5556/10000 [21:48:47<17:08:50, 13.89s/it] 56%|█████▌ | 5557/10000 [21:49:01<17:08:38, 13.89s/it] {'loss': 0.0133, 'learning_rate': 2.2255e-05, 'epoch': 7.27} 56%|█████▌ | 5557/10000 [21:49:01<17:08:38, 13.89s/it] 56%|█████▌ | 5558/10000 [21:49:15<17:10:22, 13.92s/it] {'loss': 0.0134, 'learning_rate': 2.2250000000000002e-05, 'epoch': 7.27} 56%|█████▌ | 5558/10000 [21:49:15<17:10:22, 13.92s/it] 56%|█████▌ | 
5559/10000 [21:49:29<17:06:12, 13.86s/it] {'loss': 0.0111, 'learning_rate': 2.2245e-05, 'epoch': 7.28} 56%|█████▌ | 5559/10000 [21:49:29<17:06:12, 13.86s/it] 56%|█████▌ | 5560/10000 [21:49:43<17:05:52, 13.86s/it] {'loss': 0.0128, 'learning_rate': 2.224e-05, 'epoch': 7.28} 56%|█████▌ | 5560/10000 [21:49:43<17:05:52, 13.86s/it] 56%|█████▌ | 5561/10000 [21:49:57<17:05:34, 13.86s/it] {'loss': 0.012, 'learning_rate': 2.2235e-05, 'epoch': 7.28} 56%|█████▌ | 5561/10000 [21:49:57<17:05:34, 13.86s/it] 56%|█████▌ | 5562/10000 [21:50:11<17:02:37, 13.83s/it] {'loss': 0.0145, 'learning_rate': 2.2230000000000002e-05, 'epoch': 7.28} 56%|█████▌ | 5562/10000 [21:50:11<17:02:37, 13.83s/it] 56%|█████▌ | 5563/10000 [21:50:24<17:01:39, 13.82s/it] {'loss': 0.0119, 'learning_rate': 2.2225e-05, 'epoch': 7.28} 56%|█████▌ | 5563/10000 [21:50:24<17:01:39, 13.82s/it] 56%|█████▌ | 5564/10000 [21:50:38<17:04:10, 13.85s/it] {'loss': 0.0156, 'learning_rate': 2.222e-05, 'epoch': 7.28} 56%|█████▌ | 5564/10000 [21:50:38<17:04:10, 13.85s/it] 56%|█████▌ | 5565/10000 [21:50:52<17:06:34, 13.89s/it] {'loss': 0.0119, 'learning_rate': 2.2215e-05, 'epoch': 7.28} 56%|█████▌ | 5565/10000 [21:50:52<17:06:34, 13.89s/it] 56%|█████▌ | 5566/10000 [21:51:06<17:02:22, 13.83s/it] {'loss': 0.0134, 'learning_rate': 2.221e-05, 'epoch': 7.29} 56%|█████▌ | 5566/10000 [21:51:06<17:02:22, 13.83s/it] 56%|█████▌ | 5567/10000 [21:51:20<17:01:24, 13.82s/it] {'loss': 0.014, 'learning_rate': 2.2205000000000002e-05, 'epoch': 7.29} 56%|█████▌ | 5567/10000 [21:51:20<17:01:24, 13.82s/it] 56%|█████▌ | 5568/10000 [21:51:33<16:59:53, 13.81s/it] {'loss': 0.011, 'learning_rate': 2.22e-05, 'epoch': 7.29} 56%|█████▌ | 5568/10000 [21:51:34<16:59:53, 13.81s/it] 56%|█████▌ | 5569/10000 [21:51:47<16:58:42, 13.79s/it] {'loss': 0.0137, 'learning_rate': 2.2195000000000003e-05, 'epoch': 7.29} 56%|█████▌ | 5569/10000 [21:51:47<16:58:42, 13.79s/it] 56%|█████▌ | 5570/10000 [21:52:01<17:00:33, 13.82s/it] {'loss': 0.0105, 'learning_rate': 2.219e-05, 
'epoch': 7.29} 56%|█████▌ | 5570/10000 [21:52:01<17:00:33, 13.82s/it] 56%|█████▌ | 5571/10000 [21:52:15<17:01:40, 13.84s/it] {'loss': 0.0118, 'learning_rate': 2.2185000000000002e-05, 'epoch': 7.29} 56%|█████▌ | 5571/10000 [21:52:15<17:01:40, 13.84s/it] 56%|█████▌ | 5572/10000 [21:52:29<17:00:12, 13.82s/it] {'loss': 0.011, 'learning_rate': 2.218e-05, 'epoch': 7.29} 56%|█████▌ | 5572/10000 [21:52:29<17:00:12, 13.82s/it] 56%|█████▌ | 5573/10000 [21:52:43<17:00:08, 13.83s/it] {'loss': 0.0108, 'learning_rate': 2.2175e-05, 'epoch': 7.29} 56%|█████▌ | 5573/10000 [21:52:43<17:00:08, 13.83s/it] 56%|█████▌ | 5574/10000 [21:52:56<16:59:30, 13.82s/it] {'loss': 0.0106, 'learning_rate': 2.2170000000000003e-05, 'epoch': 7.3} 56%|█████▌ | 5574/10000 [21:52:56<16:59:30, 13.82s/it] 56%|█████▌ | 5575/10000 [21:53:10<17:01:29, 13.85s/it] {'loss': 0.0157, 'learning_rate': 2.2165000000000002e-05, 'epoch': 7.3} 56%|█████▌ | 5575/10000 [21:53:10<17:01:29, 13.85s/it] 56%|█████▌ | 5576/10000 [21:53:24<17:00:18, 13.84s/it] {'loss': 0.0092, 'learning_rate': 2.216e-05, 'epoch': 7.3} 56%|█████▌ | 5576/10000 [21:53:24<17:00:18, 13.84s/it] 56%|█████▌ | 5577/10000 [21:53:38<16:58:52, 13.82s/it] {'loss': 0.0141, 'learning_rate': 2.2155e-05, 'epoch': 7.3} 56%|█████▌ | 5577/10000 [21:53:38<16:58:52, 13.82s/it] 56%|█████▌ | 5578/10000 [21:53:52<16:57:54, 13.81s/it] {'loss': 0.0135, 'learning_rate': 2.215e-05, 'epoch': 7.3} 56%|█████▌ | 5578/10000 [21:53:52<16:57:54, 13.81s/it] 56%|█████▌ | 5579/10000 [21:54:06<16:58:13, 13.82s/it] {'loss': 0.0117, 'learning_rate': 2.2145000000000002e-05, 'epoch': 7.3} 56%|█████▌ | 5579/10000 [21:54:06<16:58:13, 13.82s/it] 56%|█████▌ | 5580/10000 [21:54:19<16:58:21, 13.82s/it] {'loss': 0.0118, 'learning_rate': 2.214e-05, 'epoch': 7.3} 56%|█████▌ | 5580/10000 [21:54:19<16:58:21, 13.82s/it] 56%|█████▌ | 5581/10000 [21:54:33<16:59:28, 13.84s/it] {'loss': 0.0103, 'learning_rate': 2.2135e-05, 'epoch': 7.3} 56%|█████▌ | 5581/10000 [21:54:33<16:59:28, 13.84s/it] 56%|█████▌ | 
5582/10000 [21:54:47<16:58:34, 13.83s/it] {'loss': 0.0117, 'learning_rate': 2.213e-05, 'epoch': 7.31} 56%|█████▌ | 5582/10000 [21:54:47<16:58:34, 13.83s/it] 56%|█████▌ | 5583/10000 [21:55:01<16:56:35, 13.81s/it] {'loss': 0.011, 'learning_rate': 2.2125000000000002e-05, 'epoch': 7.31} 56%|█████▌ | 5583/10000 [21:55:01<16:56:35, 13.81s/it] 56%|█████▌ | 5584/10000 [21:55:15<16:55:06, 13.79s/it] {'loss': 0.0111, 'learning_rate': 2.212e-05, 'epoch': 7.31} 56%|█████▌ | 5584/10000 [21:55:15<16:55:06, 13.79s/it] 56%|█████▌ | 5585/10000 [21:55:29<16:59:05, 13.85s/it] {'loss': 0.0133, 'learning_rate': 2.2115e-05, 'epoch': 7.31} 56%|█████▌ | 5585/10000 [21:55:29<16:59:05, 13.85s/it] 56%|█████▌ | 5586/10000 [21:55:43<17:01:10, 13.88s/it] {'loss': 0.0118, 'learning_rate': 2.211e-05, 'epoch': 7.31} 56%|█████▌ | 5586/10000 [21:55:43<17:01:10, 13.88s/it] 56%|█████▌ | 5587/10000 [21:55:56<16:57:55, 13.84s/it] {'loss': 0.0104, 'learning_rate': 2.2105e-05, 'epoch': 7.31} 56%|█████▌ | 5587/10000 [21:55:56<16:57:55, 13.84s/it] 56%|█████▌ | 5588/10000 [21:56:10<16:58:16, 13.85s/it] {'loss': 0.0105, 'learning_rate': 2.2100000000000002e-05, 'epoch': 7.31} 56%|█████▌ | 5588/10000 [21:56:10<16:58:16, 13.85s/it] 56%|█████▌ | 5589/10000 [21:56:24<16:56:51, 13.83s/it] {'loss': 0.0089, 'learning_rate': 2.2095e-05, 'epoch': 7.32} 56%|█████▌ | 5589/10000 [21:56:24<16:56:51, 13.83s/it] 56%|█████▌ | 5590/10000 [21:56:38<16:54:30, 13.80s/it] {'loss': 0.0114, 'learning_rate': 2.2090000000000004e-05, 'epoch': 7.32} 56%|█████▌ | 5590/10000 [21:56:38<16:54:30, 13.80s/it] 56%|█████▌ | 5591/10000 [21:56:51<16:53:16, 13.79s/it] {'loss': 0.013, 'learning_rate': 2.2085e-05, 'epoch': 7.32} 56%|█████▌ | 5591/10000 [21:56:52<16:53:16, 13.79s/it] 56%|█████▌ | 5592/10000 [21:57:05<16:52:01, 13.78s/it] {'loss': 0.0132, 'learning_rate': 2.2080000000000002e-05, 'epoch': 7.32} 56%|█████▌ | 5592/10000 [21:57:05<16:52:01, 13.78s/it] 56%|█████▌ | 5593/10000 [21:57:19<16:52:52, 13.79s/it] {'loss': 0.0144, 'learning_rate': 
2.2075e-05, 'epoch': 7.32} 56%|█████▌ | 5593/10000 [21:57:19<16:52:52, 13.79s/it] 56%|█████▌ | 5594/10000 [21:57:33<16:52:59, 13.79s/it] {'loss': 0.0119, 'learning_rate': 2.207e-05, 'epoch': 7.32} 56%|█████▌ | 5594/10000 [21:57:33<16:52:59, 13.79s/it] 56%|█████▌ | 5595/10000 [21:57:47<16:52:23, 13.79s/it] {'loss': 0.0134, 'learning_rate': 2.2065000000000003e-05, 'epoch': 7.32} 56%|█████▌ | 5595/10000 [21:57:47<16:52:23, 13.79s/it] 56%|█████▌ | 5596/10000 [21:58:00<16:53:00, 13.80s/it] {'loss': 0.0134, 'learning_rate': 2.206e-05, 'epoch': 7.32} 56%|█████▌ | 5596/10000 [21:58:00<16:53:00, 13.80s/it] 56%|█████▌ | 5597/10000 [21:58:14<16:52:56, 13.80s/it] {'loss': 0.0091, 'learning_rate': 2.2055e-05, 'epoch': 7.33} 56%|█████▌ | 5597/10000 [21:58:14<16:52:56, 13.80s/it] 56%|█████▌ | 5598/10000 [21:58:28<16:53:40, 13.82s/it] {'loss': 0.0144, 'learning_rate': 2.205e-05, 'epoch': 7.33} 56%|█████▌ | 5598/10000 [21:58:28<16:53:40, 13.82s/it] 56%|█████▌ | 5599/10000 [21:58:42<16:52:15, 13.80s/it] {'loss': 0.0132, 'learning_rate': 2.2045000000000003e-05, 'epoch': 7.33} 56%|█████▌ | 5599/10000 [21:58:42<16:52:15, 13.80s/it] 56%|█████▌ | 5600/10000 [21:58:56<16:53:59, 13.83s/it] {'loss': 0.0144, 'learning_rate': 2.2040000000000002e-05, 'epoch': 7.33} 56%|█████▌ | 5600/10000 [21:58:56<16:53:59, 13.83s/it] 56%|█████▌ | 5601/10000 [21:59:10<16:53:22, 13.82s/it] {'loss': 0.0104, 'learning_rate': 2.2035e-05, 'epoch': 7.33} 56%|█████▌ | 5601/10000 [21:59:10<16:53:22, 13.82s/it] 56%|█████▌ | 5602/10000 [21:59:23<16:51:23, 13.80s/it] {'loss': 0.0127, 'learning_rate': 2.203e-05, 'epoch': 7.33} 56%|█████▌ | 5602/10000 [21:59:23<16:51:23, 13.80s/it] 56%|█████▌ | 5603/10000 [21:59:37<16:49:32, 13.78s/it] {'loss': 0.0093, 'learning_rate': 2.2025e-05, 'epoch': 7.33} 56%|█████▌ | 5603/10000 [21:59:37<16:49:32, 13.78s/it] 56%|█████▌ | 5604/10000 [21:59:51<16:54:15, 13.84s/it] {'loss': 0.0136, 'learning_rate': 2.2020000000000003e-05, 'epoch': 7.34} 56%|█████▌ | 5604/10000 [21:59:51<16:54:15, 
13.84s/it] 56%|█████▌ | 5605/10000 [22:00:05<16:52:06, 13.82s/it] {'loss': 0.0131, 'learning_rate': 2.2015000000000002e-05, 'epoch': 7.34} 56%|█████▌ | 5605/10000 [22:00:05<16:52:06, 13.82s/it] 56%|█████▌ | 5606/10000 [22:00:19<16:52:00, 13.82s/it] {'loss': 0.0122, 'learning_rate': 2.201e-05, 'epoch': 7.34} 56%|█████▌ | 5606/10000 [22:00:19<16:52:00, 13.82s/it] 56%|█████▌ | 5607/10000 [22:00:32<16:53:11, 13.84s/it] {'loss': 0.0149, 'learning_rate': 2.2005e-05, 'epoch': 7.34} 56%|█████▌ | 5607/10000 [22:00:33<16:53:11, 13.84s/it] 56%|█████▌ | 5608/10000 [22:00:46<16:50:14, 13.80s/it] {'loss': 0.0126, 'learning_rate': 2.2000000000000003e-05, 'epoch': 7.34} 56%|█████▌ | 5608/10000 [22:00:46<16:50:14, 13.80s/it] 56%|█████▌ | 5609/10000 [22:01:00<16:53:43, 13.85s/it] {'loss': 0.0127, 'learning_rate': 2.1995000000000002e-05, 'epoch': 7.34} 56%|█████▌ | 5609/10000 [22:01:00<16:53:43, 13.85s/it] 56%|█████▌ | 5610/10000 [22:01:14<16:56:26, 13.89s/it] {'loss': 0.017, 'learning_rate': 2.199e-05, 'epoch': 7.34} 56%|█████▌ | 5610/10000 [22:01:14<16:56:26, 13.89s/it] 56%|█████▌ | 5611/10000 [22:01:28<16:59:48, 13.94s/it] {'loss': 0.0141, 'learning_rate': 2.1985e-05, 'epoch': 7.34} 56%|█████▌ | 5611/10000 [22:01:28<16:59:48, 13.94s/it] 56%|█████▌ | 5612/10000 [22:01:42<16:58:21, 13.92s/it] {'loss': 0.0104, 'learning_rate': 2.198e-05, 'epoch': 7.35} 56%|█████▌ | 5612/10000 [22:01:42<16:58:21, 13.92s/it] 56%|█████▌ | 5613/10000 [22:01:56<16:58:06, 13.92s/it] {'loss': 0.014, 'learning_rate': 2.1975000000000002e-05, 'epoch': 7.35} 56%|█████▌ | 5613/10000 [22:01:56<16:58:06, 13.92s/it] 56%|█████▌ | 5614/10000 [22:02:10<16:57:37, 13.92s/it] {'loss': 0.0133, 'learning_rate': 2.197e-05, 'epoch': 7.35} 56%|█████▌ | 5614/10000 [22:02:10<16:57:37, 13.92s/it] 56%|█████▌ | 5615/10000 [22:02:24<16:57:05, 13.92s/it] {'loss': 0.012, 'learning_rate': 2.1965e-05, 'epoch': 7.35} 56%|█████▌ | 5615/10000 [22:02:24<16:57:05, 13.92s/it] 56%|█████▌ | 5616/10000 [22:02:38<16:55:08, 13.89s/it] {'loss': 
0.0102, 'learning_rate': 2.196e-05, 'epoch': 7.35} 56%|█████▌ | 5616/10000 [22:02:38<16:55:08, 13.89s/it] 56%|█████▌ | 5617/10000 [22:02:52<16:54:46, 13.89s/it] {'loss': 0.0108, 'learning_rate': 2.1955e-05, 'epoch': 7.35} 56%|█████▌ | 5617/10000 [22:02:52<16:54:46, 13.89s/it] 56%|█████▌ | 5618/10000 [22:03:05<16:51:38, 13.85s/it] {'loss': 0.0118, 'learning_rate': 2.195e-05, 'epoch': 7.35} 56%|█████▌ | 5618/10000 [22:03:05<16:51:38, 13.85s/it] 56%|█████▌ | 5619/10000 [22:03:19<16:52:03, 13.86s/it] {'loss': 0.0127, 'learning_rate': 2.1945e-05, 'epoch': 7.35} 56%|█████▌ | 5619/10000 [22:03:19<16:52:03, 13.86s/it] 56%|█████▌ | 5620/10000 [22:03:33<16:53:43, 13.89s/it] {'loss': 0.0183, 'learning_rate': 2.1940000000000003e-05, 'epoch': 7.36} 56%|█████▌ | 5620/10000 [22:03:33<16:53:43, 13.89s/it] 56%|█████▌ | 5621/10000 [22:03:47<16:50:20, 13.84s/it] {'loss': 0.0137, 'learning_rate': 2.1935e-05, 'epoch': 7.36} 56%|█████▌ | 5621/10000 [22:03:47<16:50:20, 13.84s/it] 56%|█████▌ | 5622/10000 [22:04:01<16:56:04, 13.93s/it] {'loss': 0.0149, 'learning_rate': 2.1930000000000002e-05, 'epoch': 7.36} 56%|█████▌ | 5622/10000 [22:04:01<16:56:04, 13.93s/it] 56%|█████▌ | 5623/10000 [22:04:15<16:55:43, 13.92s/it] {'loss': 0.0102, 'learning_rate': 2.1925e-05, 'epoch': 7.36} 56%|█████▌ | 5623/10000 [22:04:15<16:55:43, 13.92s/it] 56%|█████▌ | 5624/10000 [22:04:29<16:54:21, 13.91s/it] {'loss': 0.0125, 'learning_rate': 2.192e-05, 'epoch': 7.36} 56%|█████▌ | 5624/10000 [22:04:29<16:54:21, 13.91s/it] 56%|█████▋ | 5625/10000 [22:04:43<16:53:41, 13.90s/it] {'loss': 0.0125, 'learning_rate': 2.1915000000000003e-05, 'epoch': 7.36} 56%|█████▋ | 5625/10000 [22:04:43<16:53:41, 13.90s/it] 56%|█████▋ | 5626/10000 [22:04:56<16:51:03, 13.87s/it] {'loss': 0.0112, 'learning_rate': 2.191e-05, 'epoch': 7.36} 56%|█████▋ | 5626/10000 [22:04:57<16:51:03, 13.87s/it] 56%|█████▋ | 5627/10000 [22:05:10<16:49:44, 13.85s/it] {'loss': 0.0105, 'learning_rate': 2.1905e-05, 'epoch': 7.37} 56%|█████▋ | 5627/10000 
[22:05:10<16:49:44, 13.85s/it] 56%|█████▋ | 5628/10000 [22:05:24<16:49:18, 13.85s/it] {'loss': 0.0131, 'learning_rate': 2.19e-05, 'epoch': 7.37} 56%|█████▋ | 5628/10000 [22:05:24<16:49:18, 13.85s/it] 56%|█████▋ | 5629/10000 [22:05:38<16:48:53, 13.85s/it] {'loss': 0.0127, 'learning_rate': 2.1895000000000003e-05, 'epoch': 7.37} 56%|█████▋ | 5629/10000 [22:05:38<16:48:53, 13.85s/it] 56%|█████▋ | 5630/10000 [22:05:52<16:48:38, 13.85s/it] {'loss': 0.0138, 'learning_rate': 2.1890000000000002e-05, 'epoch': 7.37} 56%|█████▋ | 5630/10000 [22:05:52<16:48:38, 13.85s/it] 56%|█████▋ | 5631/10000 [22:06:06<16:47:52, 13.84s/it] {'loss': 0.0139, 'learning_rate': 2.1885e-05, 'epoch': 7.37} 56%|█████▋ | 5631/10000 [22:06:06<16:47:52, 13.84s/it] 56%|█████▋ | 5632/10000 [22:06:20<16:48:03, 13.85s/it] {'loss': 0.0209, 'learning_rate': 2.188e-05, 'epoch': 7.37} 56%|█████▋ | 5632/10000 [22:06:20<16:48:03, 13.85s/it] 56%|█████▋ | 5633/10000 [22:06:33<16:47:08, 13.84s/it] {'loss': 0.0104, 'learning_rate': 2.1875e-05, 'epoch': 7.37} 56%|█████▋ | 5633/10000 [22:06:33<16:47:08, 13.84s/it] 56%|█████▋ | 5634/10000 [22:06:47<16:46:13, 13.83s/it] {'loss': 0.0108, 'learning_rate': 2.1870000000000002e-05, 'epoch': 7.37} 56%|█████▋ | 5634/10000 [22:06:47<16:46:13, 13.83s/it] 56%|█████▋ | 5635/10000 [22:07:01<16:46:50, 13.84s/it] {'loss': 0.0152, 'learning_rate': 2.1865e-05, 'epoch': 7.38} 56%|█████▋ | 5635/10000 [22:07:01<16:46:50, 13.84s/it] 56%|█████▋ | 5636/10000 [22:07:15<16:44:57, 13.82s/it] {'loss': 0.0121, 'learning_rate': 2.186e-05, 'epoch': 7.38} 56%|█████▋ | 5636/10000 [22:07:15<16:44:57, 13.82s/it] 56%|█████▋ | 5637/10000 [22:07:29<16:45:02, 13.82s/it] {'loss': 0.0093, 'learning_rate': 2.1855e-05, 'epoch': 7.38} 56%|█████▋ | 5637/10000 [22:07:29<16:45:02, 13.82s/it] 56%|█████▋ | 5638/10000 [22:07:42<16:45:34, 13.83s/it] {'loss': 0.0119, 'learning_rate': 2.1850000000000003e-05, 'epoch': 7.38} 56%|█████▋ | 5638/10000 [22:07:43<16:45:34, 13.83s/it] 56%|█████▋ | 5639/10000 [22:07:56<16:44:54, 
13.83s/it] {'loss': 0.0141, 'learning_rate': 2.1845000000000002e-05, 'epoch': 7.38} 56%|█████▋ | 5639/10000 [22:07:56<16:44:54, 13.83s/it] 56%|█████▋ | 5640/10000 [22:08:10<16:42:53, 13.80s/it] {'loss': 0.0087, 'learning_rate': 2.184e-05, 'epoch': 7.38} 56%|█████▋ | 5640/10000 [22:08:10<16:42:53, 13.80s/it] 56%|█████▋ | 5641/10000 [22:08:24<16:45:04, 13.83s/it] {'loss': 0.0119, 'learning_rate': 2.1835e-05, 'epoch': 7.38} 56%|█████▋ | 5641/10000 [22:08:24<16:45:04, 13.83s/it] 56%|█████▋ | 5642/10000 [22:08:38<16:45:20, 13.84s/it] {'loss': 0.0121, 'learning_rate': 2.183e-05, 'epoch': 7.38} 56%|█████▋ | 5642/10000 [22:08:38<16:45:20, 13.84s/it] 56%|█████▋ | 5643/10000 [22:08:52<16:44:53, 13.84s/it] {'loss': 0.0115, 'learning_rate': 2.1825000000000002e-05, 'epoch': 7.39} 56%|█████▋ | 5643/10000 [22:08:52<16:44:53, 13.84s/it] 56%|█████▋ | 5644/10000 [22:09:06<16:46:41, 13.87s/it] {'loss': 0.0117, 'learning_rate': 2.182e-05, 'epoch': 7.39} 56%|█████▋ | 5644/10000 [22:09:06<16:46:41, 13.87s/it] 56%|█████▋ | 5645/10000 [22:09:20<16:48:19, 13.89s/it] {'loss': 0.0107, 'learning_rate': 2.1815000000000004e-05, 'epoch': 7.39} 56%|█████▋ | 5645/10000 [22:09:20<16:48:19, 13.89s/it] 56%|█████▋ | 5646/10000 [22:09:33<16:46:39, 13.87s/it] {'loss': 0.0111, 'learning_rate': 2.181e-05, 'epoch': 7.39} 56%|█████▋ | 5646/10000 [22:09:33<16:46:39, 13.87s/it] 56%|█████▋ | 5647/10000 [22:09:47<16:44:38, 13.85s/it] {'loss': 0.0149, 'learning_rate': 2.1805e-05, 'epoch': 7.39} 56%|█████▋ | 5647/10000 [22:09:47<16:44:38, 13.85s/it] 56%|█████▋ | 5648/10000 [22:10:01<16:45:16, 13.86s/it] {'loss': 0.0113, 'learning_rate': 2.18e-05, 'epoch': 7.39} 56%|█████▋ | 5648/10000 [22:10:01<16:45:16, 13.86s/it] 56%|█████▋ | 5649/10000 [22:10:15<16:41:42, 13.81s/it] {'loss': 0.0917, 'learning_rate': 2.1795e-05, 'epoch': 7.39} 56%|█████▋ | 5649/10000 [22:10:15<16:41:42, 13.81s/it] 56%|█████▋ | 5650/10000 [22:10:29<16:43:09, 13.84s/it] {'loss': 0.0144, 'learning_rate': 2.1790000000000003e-05, 'epoch': 7.4} 
56%|█████▋ | 5650/10000 [22:10:29<16:43:09, 13.84s/it] 57%|█████▋ | 5651/10000 [22:10:42<16:42:27, 13.83s/it] {'loss': 0.0136, 'learning_rate': 2.1785e-05, 'epoch': 7.4} 57%|█████▋ | 5651/10000 [22:10:42<16:42:27, 13.83s/it] 57%|█████▋ | 5652/10000 [22:10:56<16:45:18, 13.87s/it] {'loss': 0.0102, 'learning_rate': 2.178e-05, 'epoch': 7.4} 57%|█████▋ | 5652/10000 [22:10:56<16:45:18, 13.87s/it] 57%|█████▋ | 5653/10000 [22:11:10<16:44:50, 13.87s/it] {'loss': 0.0113, 'learning_rate': 2.1775e-05, 'epoch': 7.4} 57%|█████▋ | 5653/10000 [22:11:10<16:44:50, 13.87s/it] 57%|█████▋ | 5654/10000 [22:11:24<16:44:26, 13.87s/it] {'loss': 0.012, 'learning_rate': 2.177e-05, 'epoch': 7.4} 57%|█████▋ | 5654/10000 [22:11:24<16:44:26, 13.87s/it] 57%|█████▋ | 5655/10000 [22:11:38<16:44:20, 13.87s/it] {'loss': 0.0123, 'learning_rate': 2.1765000000000003e-05, 'epoch': 7.4} 57%|█████▋ | 5655/10000 [22:11:38<16:44:20, 13.87s/it] 57%|█████▋ | 5656/10000 [22:11:52<16:43:24, 13.86s/it] {'loss': 0.0141, 'learning_rate': 2.176e-05, 'epoch': 7.4} 57%|█████▋ | 5656/10000 [22:11:52<16:43:24, 13.86s/it] 57%|█████▋ | 5657/10000 [22:12:06<16:43:20, 13.86s/it] {'loss': 0.0138, 'learning_rate': 2.1755e-05, 'epoch': 7.4} 57%|█████▋ | 5657/10000 [22:12:06<16:43:20, 13.86s/it] 57%|█████▋ | 5658/10000 [22:12:20<16:42:02, 13.85s/it] {'loss': 0.0103, 'learning_rate': 2.175e-05, 'epoch': 7.41} 57%|█████▋ | 5658/10000 [22:12:20<16:42:02, 13.85s/it] 57%|█████▋ | 5659/10000 [22:12:33<16:41:44, 13.85s/it] {'loss': 0.0134, 'learning_rate': 2.1745000000000003e-05, 'epoch': 7.41} 57%|█████▋ | 5659/10000 [22:12:33<16:41:44, 13.85s/it] 57%|█████▋ | 5660/10000 [22:12:47<16:37:56, 13.80s/it] {'loss': 0.0133, 'learning_rate': 2.1740000000000002e-05, 'epoch': 7.41} 57%|█████▋ | 5660/10000 [22:12:47<16:37:56, 13.80s/it] 57%|█████▋ | 5661/10000 [22:13:01<16:41:39, 13.85s/it] {'loss': 0.0115, 'learning_rate': 2.1735e-05, 'epoch': 7.41} 57%|█████▋ | 5661/10000 [22:13:01<16:41:39, 13.85s/it] 57%|█████▋ | 5662/10000 
[22:13:15<16:43:12, 13.88s/it] {'loss': 0.0134, 'learning_rate': 2.173e-05, 'epoch': 7.41} 57%|█████▋ | 5662/10000 [22:13:15<16:43:12, 13.88s/it] 57%|█████▋ | 5663/10000 [22:13:29<16:40:30, 13.84s/it] {'loss': 0.0126, 'learning_rate': 2.1725e-05, 'epoch': 7.41} 57%|█████▋ | 5663/10000 [22:13:29<16:40:30, 13.84s/it] 57%|█████▋ | 5664/10000 [22:13:43<16:41:13, 13.85s/it] {'loss': 0.0153, 'learning_rate': 2.1720000000000002e-05, 'epoch': 7.41} 57%|█████▋ | 5664/10000 [22:13:43<16:41:13, 13.85s/it] 57%|█████▋ | 5665/10000 [22:13:56<16:40:35, 13.85s/it] {'loss': 0.0098, 'learning_rate': 2.1715e-05, 'epoch': 7.41} 57%|█████▋ | 5665/10000 [22:13:56<16:40:35, 13.85s/it] 57%|█████▋ | 5666/10000 [22:14:10<16:38:57, 13.83s/it] {'loss': 0.0105, 'learning_rate': 2.171e-05, 'epoch': 7.42} 57%|█████▋ | 5666/10000 [22:14:10<16:38:57, 13.83s/it] 57%|█████▋ | 5667/10000 [22:14:24<16:40:35, 13.86s/it] {'loss': 0.0123, 'learning_rate': 2.1705e-05, 'epoch': 7.42} 57%|█████▋ | 5667/10000 [22:14:24<16:40:35, 13.86s/it] 57%|█████▋ | 5668/10000 [22:14:38<16:36:59, 13.81s/it] {'loss': 0.0133, 'learning_rate': 2.1700000000000002e-05, 'epoch': 7.42} 57%|█████▋ | 5668/10000 [22:14:38<16:36:59, 13.81s/it] 57%|█████▋ | 5669/10000 [22:14:52<16:38:23, 13.83s/it] {'loss': 0.0175, 'learning_rate': 2.1695e-05, 'epoch': 7.42} 57%|█████▋ | 5669/10000 [22:14:52<16:38:23, 13.83s/it] 57%|█████▋ | 5670/10000 [22:15:06<16:38:32, 13.84s/it] {'loss': 0.0123, 'learning_rate': 2.169e-05, 'epoch': 7.42} 57%|█████▋ | 5670/10000 [22:15:06<16:38:32, 13.84s/it] 57%|█████▋ | 5671/10000 [22:15:19<16:39:24, 13.85s/it] {'loss': 0.0103, 'learning_rate': 2.1685e-05, 'epoch': 7.42} 57%|█████▋ | 5671/10000 [22:15:19<16:39:24, 13.85s/it] 57%|█████▋ | 5672/10000 [22:15:33<16:41:42, 13.89s/it] {'loss': 0.013, 'learning_rate': 2.168e-05, 'epoch': 7.42} 57%|█████▋ | 5672/10000 [22:15:33<16:41:42, 13.89s/it] 57%|█████▋ | 5673/10000 [22:15:47<16:38:39, 13.85s/it] {'loss': 0.0119, 'learning_rate': 2.1675e-05, 'epoch': 7.43} 
57%|█████▋ | 5673/10000 [22:15:47<16:38:39, 13.85s/it] 57%|█████▋ | 5674/10000 [22:16:01<16:37:27, 13.83s/it] {'loss': 0.0122, 'learning_rate': 2.167e-05, 'epoch': 7.43} 57%|█████▋ | 5674/10000 [22:16:01<16:37:27, 13.83s/it] 57%|█████▋ | 5675/10000 [22:16:15<16:38:06, 13.85s/it] {'loss': 0.0127, 'learning_rate': 2.1665000000000003e-05, 'epoch': 7.43} 57%|█████▋ | 5675/10000 [22:16:15<16:38:06, 13.85s/it] 57%|█████▋ | 5676/10000 [22:16:29<16:37:08, 13.84s/it] {'loss': 0.0131, 'learning_rate': 2.166e-05, 'epoch': 7.43} 57%|█████▋ | 5676/10000 [22:16:29<16:37:08, 13.84s/it][2024-11-04 18:34:49,997] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 57%|█████▋ | 5677/10000 [22:16:42<16:20:33, 13.61s/it] {'loss': 0.0486, 'learning_rate': 2.166e-05, 'epoch': 7.43} 57%|█████▋ | 5677/10000 [22:16:42<16:20:33, 13.61s/it] 57%|█████▋ | 5678/10000 [22:16:56<16:23:59, 13.66s/it] {'loss': 0.0156, 'learning_rate': 2.1655000000000002e-05, 'epoch': 7.43} 57%|█████▋ | 5678/10000 [22:16:56<16:23:59, 13.66s/it] 57%|█████▋ | 5679/10000 [22:17:09<16:26:24, 13.70s/it] {'loss': 0.011, 'learning_rate': 2.165e-05, 'epoch': 7.43} 57%|█████▋ | 5679/10000 [22:17:09<16:26:24, 13.70s/it] 57%|█████▋ | 5680/10000 [22:17:23<16:30:26, 13.76s/it] {'loss': 0.0158, 'learning_rate': 2.1645e-05, 'epoch': 7.43} 57%|█████▋ | 5680/10000 [22:17:23<16:30:26, 13.76s/it] 57%|█████▋ | 5681/10000 [22:17:37<16:29:04, 13.74s/it] {'loss': 0.0138, 'learning_rate': 2.1640000000000003e-05, 'epoch': 7.44} 57%|█████▋ | 5681/10000 [22:17:37<16:29:04, 13.74s/it] 57%|█████▋ | 5682/10000 [22:17:51<16:31:21, 13.78s/it] {'loss': 0.0114, 'learning_rate': 2.1635e-05, 'epoch': 7.44} 57%|█████▋ | 5682/10000 [22:17:51<16:31:21, 13.78s/it] 57%|█████▋ | 5683/10000 [22:18:04<16:29:40, 13.76s/it] {'loss': 0.0117, 'learning_rate': 2.163e-05, 'epoch': 7.44} 57%|█████▋ | 5683/10000 [22:18:04<16:29:40, 13.76s/it] 57%|█████▋ | 5684/10000 [22:18:18<16:31:58, 
13.79s/it] {'loss': 0.0131, 'learning_rate': 2.1625e-05, 'epoch': 7.44} 57%|█████▋ | 5684/10000 [22:18:18<16:31:58, 13.79s/it] 57%|█████▋ | 5685/10000 [22:18:32<16:33:56, 13.82s/it] {'loss': 0.0136, 'learning_rate': 2.162e-05, 'epoch': 7.44} 57%|█████▋ | 5685/10000 [22:18:32<16:33:56, 13.82s/it] 57%|█████▋ | 5686/10000 [22:18:46<16:32:26, 13.80s/it] {'loss': 0.014, 'learning_rate': 2.1615000000000002e-05, 'epoch': 7.44} 57%|█████▋ | 5686/10000 [22:18:46<16:32:26, 13.80s/it] 57%|█████▋ | 5687/10000 [22:19:00<16:32:25, 13.81s/it] {'loss': 0.0113, 'learning_rate': 2.1609999999999998e-05, 'epoch': 7.44} 57%|█████▋ | 5687/10000 [22:19:00<16:32:25, 13.81s/it] 57%|█████▋ | 5688/10000 [22:19:14<16:32:56, 13.82s/it] {'loss': 0.013, 'learning_rate': 2.1605e-05, 'epoch': 7.45} 57%|█████▋ | 5688/10000 [22:19:14<16:32:56, 13.82s/it] 57%|█████▋ | 5689/10000 [22:19:27<16:29:21, 13.77s/it] {'loss': 0.0134, 'learning_rate': 2.16e-05, 'epoch': 7.45} 57%|█████▋ | 5689/10000 [22:19:27<16:29:21, 13.77s/it] 57%|█████▋ | 5690/10000 [22:19:41<16:29:17, 13.77s/it] {'loss': 0.0143, 'learning_rate': 2.1595000000000002e-05, 'epoch': 7.45} 57%|█████▋ | 5690/10000 [22:19:41<16:29:17, 13.77s/it] 57%|█████▋ | 5691/10000 [22:19:55<16:34:58, 13.85s/it] {'loss': 0.0125, 'learning_rate': 2.159e-05, 'epoch': 7.45} 57%|█████▋ | 5691/10000 [22:19:55<16:34:58, 13.85s/it] 57%|█████▋ | 5692/10000 [22:20:09<16:35:12, 13.86s/it] {'loss': 0.0154, 'learning_rate': 2.1585e-05, 'epoch': 7.45} 57%|█████▋ | 5692/10000 [22:20:09<16:35:12, 13.86s/it] 57%|█████▋ | 5693/10000 [22:20:23<16:34:33, 13.86s/it] {'loss': 0.0127, 'learning_rate': 2.158e-05, 'epoch': 7.45} 57%|█████▋ | 5693/10000 [22:20:23<16:34:33, 13.86s/it] 57%|█████▋ | 5694/10000 [22:20:37<16:30:52, 13.81s/it] {'loss': 0.0143, 'learning_rate': 2.1575e-05, 'epoch': 7.45} 57%|█████▋ | 5694/10000 [22:20:37<16:30:52, 13.81s/it] 57%|█████▋ | 5695/10000 [22:20:50<16:29:50, 13.80s/it] {'loss': 0.0136, 'learning_rate': 2.1570000000000002e-05, 'epoch': 7.45} 
57%|█████▋ | 5695/10000 [22:20:50<16:29:50, 13.80s/it] 57%|█████▋ | 5696/10000 [22:21:04<16:29:55, 13.80s/it] {'loss': 0.0122, 'learning_rate': 2.1565e-05, 'epoch': 7.46} 57%|█████▋ | 5696/10000 [22:21:04<16:29:55, 13.80s/it] 57%|█████▋ | 5697/10000 [22:21:18<16:31:30, 13.83s/it] {'loss': 0.0131, 'learning_rate': 2.1560000000000004e-05, 'epoch': 7.46} 57%|█████▋ | 5697/10000 [22:21:18<16:31:30, 13.83s/it] 57%|█████▋ | 5698/10000 [22:21:32<16:30:59, 13.82s/it] {'loss': 0.0115, 'learning_rate': 2.1555e-05, 'epoch': 7.46} 57%|█████▋ | 5698/10000 [22:21:32<16:30:59, 13.82s/it] 57%|█████▋ | 5699/10000 [22:21:46<16:31:20, 13.83s/it] {'loss': 0.0136, 'learning_rate': 2.1550000000000002e-05, 'epoch': 7.46} 57%|█████▋ | 5699/10000 [22:21:46<16:31:20, 13.83s/it] 57%|█████▋ | 5700/10000 [22:22:00<16:31:34, 13.84s/it] {'loss': 0.0101, 'learning_rate': 2.1545e-05, 'epoch': 7.46} 57%|█████▋ | 5700/10000 [22:22:00<16:31:34, 13.84s/it] 57%|█████▋ | 5701/10000 [22:22:14<16:34:54, 13.89s/it] {'loss': 0.0141, 'learning_rate': 2.154e-05, 'epoch': 7.46} 57%|█████▋ | 5701/10000 [22:22:14<16:34:54, 13.89s/it] 57%|█████▋ | 5702/10000 [22:22:27<16:32:27, 13.85s/it] {'loss': 0.0129, 'learning_rate': 2.1535000000000003e-05, 'epoch': 7.46} 57%|█████▋ | 5702/10000 [22:22:27<16:32:27, 13.85s/it] 57%|█████▋ | 5703/10000 [22:22:41<16:35:19, 13.90s/it] {'loss': 0.0125, 'learning_rate': 2.153e-05, 'epoch': 7.46} 57%|█████▋ | 5703/10000 [22:22:41<16:35:19, 13.90s/it] 57%|█████▋ | 5704/10000 [22:22:55<16:32:18, 13.86s/it] {'loss': 0.0128, 'learning_rate': 2.1525e-05, 'epoch': 7.47} 57%|█████▋ | 5704/10000 [22:22:55<16:32:18, 13.86s/it] 57%|█████▋ | 5705/10000 [22:23:09<16:30:03, 13.83s/it] {'loss': 0.0139, 'learning_rate': 2.152e-05, 'epoch': 7.47} 57%|█████▋ | 5705/10000 [22:23:09<16:30:03, 13.83s/it] 57%|█████▋ | 5706/10000 [22:23:23<16:30:07, 13.84s/it] {'loss': 0.0166, 'learning_rate': 2.1515000000000003e-05, 'epoch': 7.47} 57%|█████▋ | 5706/10000 [22:23:23<16:30:07, 13.84s/it] 57%|█████▋ | 
5707/10000 [22:23:37<16:31:42, 13.86s/it] {'loss': 0.0139, 'learning_rate': 2.1510000000000002e-05, 'epoch': 7.47} 57%|█████▋ | 5707/10000 [22:23:37<16:31:42, 13.86s/it] 57%|█████▋ | 5708/10000 [22:23:50<16:32:10, 13.87s/it] {'loss': 0.0126, 'learning_rate': 2.1505e-05, 'epoch': 7.47} 57%|█████▋ | 5708/10000 [22:23:51<16:32:10, 13.87s/it] 57%|█████▋ | 5709/10000 [22:24:04<16:31:36, 13.87s/it] {'loss': 0.0123, 'learning_rate': 2.15e-05, 'epoch': 7.47} 57%|█████▋ | 5709/10000 [22:24:04<16:31:36, 13.87s/it] 57%|█████▋ | 5710/10000 [22:24:18<16:31:15, 13.86s/it] {'loss': 0.0136, 'learning_rate': 2.1495e-05, 'epoch': 7.47} 57%|█████▋ | 5710/10000 [22:24:18<16:31:15, 13.86s/it] 57%|█████▋ | 5711/10000 [22:24:32<16:29:03, 13.84s/it] {'loss': 0.0137, 'learning_rate': 2.1490000000000003e-05, 'epoch': 7.48} 57%|█████▋ | 5711/10000 [22:24:32<16:29:03, 13.84s/it] 57%|█████▋ | 5712/10000 [22:24:46<16:26:46, 13.81s/it] {'loss': 0.0102, 'learning_rate': 2.1485000000000002e-05, 'epoch': 7.48} 57%|█████▋ | 5712/10000 [22:24:46<16:26:46, 13.81s/it] 57%|█████▋ | 5713/10000 [22:25:00<16:26:17, 13.80s/it] {'loss': 0.014, 'learning_rate': 2.148e-05, 'epoch': 7.48} 57%|█████▋ | 5713/10000 [22:25:00<16:26:17, 13.80s/it] 57%|█████▋ | 5714/10000 [22:25:13<16:28:27, 13.84s/it] {'loss': 0.015, 'learning_rate': 2.1475e-05, 'epoch': 7.48} 57%|█████▋ | 5714/10000 [22:25:13<16:28:27, 13.84s/it] 57%|█████▋ | 5715/10000 [22:25:27<16:26:21, 13.81s/it] {'loss': 0.0123, 'learning_rate': 2.1470000000000003e-05, 'epoch': 7.48} 57%|█████▋ | 5715/10000 [22:25:27<16:26:21, 13.81s/it] 57%|█████▋ | 5716/10000 [22:25:41<16:28:06, 13.84s/it] {'loss': 0.0114, 'learning_rate': 2.1465000000000002e-05, 'epoch': 7.48} 57%|█████▋ | 5716/10000 [22:25:41<16:28:06, 13.84s/it] 57%|█████▋ | 5717/10000 [22:25:55<16:28:58, 13.85s/it] {'loss': 0.013, 'learning_rate': 2.146e-05, 'epoch': 7.48} 57%|█████▋ | 5717/10000 [22:25:55<16:28:58, 13.85s/it] 57%|█████▋ | 5718/10000 [22:26:09<16:26:48, 13.83s/it] {'loss': 0.0129, 
'learning_rate': 2.1455e-05, 'epoch': 7.48} 57%|█████▋ | 5718/10000 [22:26:09<16:26:48, 13.83s/it] 57%|█████▋ | 5719/10000 [22:26:23<16:25:28, 13.81s/it] {'loss': 0.0113, 'learning_rate': 2.145e-05, 'epoch': 7.49} 57%|█████▋ | 5719/10000 [22:26:23<16:25:28, 13.81s/it] 57%|█████▋ | 5720/10000 [22:26:36<16:24:42, 13.80s/it] {'loss': 0.0162, 'learning_rate': 2.1445000000000002e-05, 'epoch': 7.49} 57%|█████▋ | 5720/10000 [22:26:36<16:24:42, 13.80s/it] 57%|█████▋ | 5721/10000 [22:26:50<16:24:55, 13.81s/it] {'loss': 0.0154, 'learning_rate': 2.144e-05, 'epoch': 7.49} 57%|█████▋ | 5721/10000 [22:26:50<16:24:55, 13.81s/it] 57%|█████▋ | 5722/10000 [22:27:04<16:23:42, 13.80s/it] {'loss': 0.0112, 'learning_rate': 2.1435000000000004e-05, 'epoch': 7.49} 57%|█████▋ | 5722/10000 [22:27:04<16:23:42, 13.80s/it] 57%|█████▋ | 5723/10000 [22:27:18<16:22:15, 13.78s/it] {'loss': 0.0122, 'learning_rate': 2.143e-05, 'epoch': 7.49} 57%|█████▋ | 5723/10000 [22:27:18<16:22:15, 13.78s/it] 57%|█████▋ | 5724/10000 [22:27:31<16:21:52, 13.78s/it] {'loss': 0.0139, 'learning_rate': 2.1425e-05, 'epoch': 7.49} 57%|█████▋ | 5724/10000 [22:27:31<16:21:52, 13.78s/it] 57%|█████▋ | 5725/10000 [22:27:45<16:22:22, 13.79s/it] {'loss': 0.0134, 'learning_rate': 2.142e-05, 'epoch': 7.49} 57%|█████▋ | 5725/10000 [22:27:45<16:22:22, 13.79s/it] 57%|█████▋ | 5726/10000 [22:27:59<16:23:36, 13.81s/it] {'loss': 0.0134, 'learning_rate': 2.1415e-05, 'epoch': 7.49} 57%|█████▋ | 5726/10000 [22:27:59<16:23:36, 13.81s/it] 57%|█████▋ | 5727/10000 [22:28:13<16:23:00, 13.80s/it] {'loss': 0.0119, 'learning_rate': 2.1410000000000003e-05, 'epoch': 7.5} 57%|█████▋ | 5727/10000 [22:28:13<16:23:00, 13.80s/it] 57%|█████▋ | 5728/10000 [22:28:27<16:24:52, 13.83s/it] {'loss': 0.0103, 'learning_rate': 2.1405e-05, 'epoch': 7.5} 57%|█████▋ | 5728/10000 [22:28:27<16:24:52, 13.83s/it] 57%|█████▋ | 5729/10000 [22:28:41<16:24:27, 13.83s/it] {'loss': 0.0111, 'learning_rate': 2.1400000000000002e-05, 'epoch': 7.5} 57%|█████▋ | 5729/10000 
[22:28:41<16:24:27, 13.83s/it] 57%|█████▋ | 5730/10000 [22:28:54<16:22:55, 13.81s/it] {'loss': 0.0116, 'learning_rate': 2.1395e-05, 'epoch': 7.5} 57%|█████▋ | 5730/10000 [22:28:54<16:22:55, 13.81s/it] 57%|█████▋ | 5731/10000 [22:29:08<16:23:27, 13.82s/it] {'loss': 0.0135, 'learning_rate': 2.139e-05, 'epoch': 7.5} 57%|█████▋ | 5731/10000 [22:29:08<16:23:27, 13.82s/it] 57%|█████▋ | 5732/10000 [22:29:22<16:23:08, 13.82s/it] {'loss': 0.0108, 'learning_rate': 2.1385000000000003e-05, 'epoch': 7.5} 57%|█████▋ | 5732/10000 [22:29:22<16:23:08, 13.82s/it] 57%|█████▋ | 5733/10000 [22:29:36<16:24:22, 13.84s/it] {'loss': 0.0147, 'learning_rate': 2.138e-05, 'epoch': 7.5} 57%|█████▋ | 5733/10000 [22:29:36<16:24:22, 13.84s/it] 57%|█████▋ | 5734/10000 [22:29:50<16:21:52, 13.81s/it] {'loss': 0.0142, 'learning_rate': 2.1375e-05, 'epoch': 7.51} 57%|█████▋ | 5734/10000 [22:29:50<16:21:52, 13.81s/it] 57%|█████▋ | 5735/10000 [22:30:03<16:21:48, 13.81s/it] {'loss': 0.0108, 'learning_rate': 2.137e-05, 'epoch': 7.51} 57%|█████▋ | 5735/10000 [22:30:04<16:21:48, 13.81s/it] 57%|█████▋ | 5736/10000 [22:30:17<16:22:34, 13.83s/it] {'loss': 0.0124, 'learning_rate': 2.1365000000000003e-05, 'epoch': 7.51} 57%|█████▋ | 5736/10000 [22:30:17<16:22:34, 13.83s/it] 57%|█████▋ | 5737/10000 [22:30:31<16:24:47, 13.86s/it] {'loss': 0.0109, 'learning_rate': 2.1360000000000002e-05, 'epoch': 7.51} 57%|█████▋ | 5737/10000 [22:30:31<16:24:47, 13.86s/it] 57%|█████▋ | 5738/10000 [22:30:45<16:25:48, 13.88s/it] {'loss': 0.0129, 'learning_rate': 2.1355e-05, 'epoch': 7.51} 57%|█████▋ | 5738/10000 [22:30:45<16:25:48, 13.88s/it] 57%|█████▋ | 5739/10000 [22:30:59<16:25:11, 13.87s/it] {'loss': 0.0132, 'learning_rate': 2.135e-05, 'epoch': 7.51} 57%|█████▋ | 5739/10000 [22:30:59<16:25:11, 13.87s/it] 57%|█████▋ | 5740/10000 [22:31:13<16:25:06, 13.87s/it] {'loss': 0.0108, 'learning_rate': 2.1345e-05, 'epoch': 7.51} 57%|█████▋ | 5740/10000 [22:31:13<16:25:06, 13.87s/it] 57%|█████▋ | 5741/10000 [22:31:27<16:23:23, 13.85s/it] 
{'loss': 0.0115, 'learning_rate': 2.1340000000000002e-05, 'epoch': 7.51} 57%|█████▋ | 5741/10000 [22:31:27<16:23:23, 13.85s/it] 57%|█████▋ | 5742/10000 [22:31:40<16:21:19, 13.83s/it] {'loss': 0.0117, 'learning_rate': 2.1335e-05, 'epoch': 7.52} 57%|█████▋ | 5742/10000 [22:31:41<16:21:19, 13.83s/it] 57%|█████▋ | 5743/10000 [22:31:54<16:20:50, 13.82s/it] {'loss': 0.0127, 'learning_rate': 2.133e-05, 'epoch': 7.52} 57%|█████▋ | 5743/10000 [22:31:54<16:20:50, 13.82s/it] 57%|█████▋ | 5744/10000 [22:32:08<16:21:17, 13.83s/it] {'loss': 0.0133, 'learning_rate': 2.1325e-05, 'epoch': 7.52} 57%|█████▋ | 5744/10000 [22:32:08<16:21:17, 13.83s/it] 57%|█████▋ | 5745/10000 [22:32:22<16:22:21, 13.85s/it] {'loss': 0.0167, 'learning_rate': 2.1320000000000003e-05, 'epoch': 7.52} 57%|█████▋ | 5745/10000 [22:32:22<16:22:21, 13.85s/it] 57%|█████▋ | 5746/10000 [22:32:36<16:20:30, 13.83s/it] {'loss': 0.0126, 'learning_rate': 2.1315000000000002e-05, 'epoch': 7.52} 57%|█████▋ | 5746/10000 [22:32:36<16:20:30, 13.83s/it] 57%|█████▋ | 5747/10000 [22:32:50<16:21:47, 13.85s/it] {'loss': 0.0106, 'learning_rate': 2.131e-05, 'epoch': 7.52} 57%|█████▋ | 5747/10000 [22:32:50<16:21:47, 13.85s/it] 57%|█████▋ | 5748/10000 [22:33:04<16:23:04, 13.87s/it] {'loss': 0.0138, 'learning_rate': 2.1305e-05, 'epoch': 7.52} 57%|█████▋ | 5748/10000 [22:33:04<16:23:04, 13.87s/it] 57%|█████▋ | 5749/10000 [22:33:18<16:23:00, 13.87s/it] {'loss': 0.0135, 'learning_rate': 2.13e-05, 'epoch': 7.52} 57%|█████▋ | 5749/10000 [22:33:18<16:23:00, 13.87s/it] 57%|█████▊ | 5750/10000 [22:33:31<16:20:22, 13.84s/it] {'loss': 0.0097, 'learning_rate': 2.1295000000000002e-05, 'epoch': 7.53} 57%|█████▊ | 5750/10000 [22:33:31<16:20:22, 13.84s/it] 58%|█████▊ | 5751/10000 [22:33:45<16:22:21, 13.87s/it] {'loss': 0.0149, 'learning_rate': 2.129e-05, 'epoch': 7.53} 58%|█████▊ | 5751/10000 [22:33:45<16:22:21, 13.87s/it] 58%|█████▊ | 5752/10000 [22:33:59<16:19:34, 13.84s/it] {'loss': 0.0117, 'learning_rate': 2.1285000000000004e-05, 'epoch': 7.53} 
58%|█████▊ | 5752/10000 [22:33:59<16:19:34, 13.84s/it] 58%|█████▊ | 5753/10000 [22:34:13<16:16:22, 13.79s/it] {'loss': 0.0108, 'learning_rate': 2.128e-05, 'epoch': 7.53} 58%|█████▊ | 5753/10000 [22:34:13<16:16:22, 13.79s/it] 58%|█████▊ | 5754/10000 [22:34:27<16:20:49, 13.86s/it] {'loss': 0.0141, 'learning_rate': 2.1275000000000002e-05, 'epoch': 7.53} 58%|█████▊ | 5754/10000 [22:34:27<16:20:49, 13.86s/it] 58%|█████▊ | 5755/10000 [22:34:41<16:19:38, 13.85s/it] {'loss': 0.014, 'learning_rate': 2.127e-05, 'epoch': 7.53} 58%|█████▊ | 5755/10000 [22:34:41<16:19:38, 13.85s/it] 58%|█████▊ | 5756/10000 [22:34:54<16:19:39, 13.85s/it] {'loss': 0.0127, 'learning_rate': 2.1265e-05, 'epoch': 7.53} 58%|█████▊ | 5756/10000 [22:34:54<16:19:39, 13.85s/it] 58%|█████▊ | 5757/10000 [22:35:08<16:18:07, 13.83s/it] {'loss': 0.0105, 'learning_rate': 2.1260000000000003e-05, 'epoch': 7.54} 58%|█████▊ | 5757/10000 [22:35:08<16:18:07, 13.83s/it] 58%|█████▊ | 5758/10000 [22:35:22<16:16:27, 13.81s/it] {'loss': 0.0163, 'learning_rate': 2.1255e-05, 'epoch': 7.54} 58%|█████▊ | 5758/10000 [22:35:22<16:16:27, 13.81s/it] 58%|█████▊ | 5759/10000 [22:35:36<16:14:52, 13.79s/it] {'loss': 0.0128, 'learning_rate': 2.125e-05, 'epoch': 7.54} 58%|█████▊ | 5759/10000 [22:35:36<16:14:52, 13.79s/it] 58%|█████▊ | 5760/10000 [22:35:50<16:17:29, 13.83s/it] {'loss': 0.0122, 'learning_rate': 2.1245e-05, 'epoch': 7.54} 58%|█████▊ | 5760/10000 [22:35:50<16:17:29, 13.83s/it] 58%|█████▊ | 5761/10000 [22:36:03<16:16:36, 13.82s/it] {'loss': 0.0121, 'learning_rate': 2.124e-05, 'epoch': 7.54} 58%|█████▊ | 5761/10000 [22:36:03<16:16:36, 13.82s/it] 58%|█████▊ | 5762/10000 [22:36:17<16:21:24, 13.89s/it] {'loss': 0.0134, 'learning_rate': 2.1235000000000003e-05, 'epoch': 7.54} 58%|█████▊ | 5762/10000 [22:36:18<16:21:24, 13.89s/it] 58%|█████▊ | 5763/10000 [22:36:31<16:22:17, 13.91s/it] {'loss': 0.0125, 'learning_rate': 2.123e-05, 'epoch': 7.54} 58%|█████▊ | 5763/10000 [22:36:31<16:22:17, 13.91s/it] 58%|█████▊ | 5764/10000 
[22:36:45<16:21:30, 13.90s/it] {'loss': 0.0131, 'learning_rate': 2.1225e-05, 'epoch': 7.54} 58%|█████▊ | 5764/10000 [22:36:45<16:21:30, 13.90s/it] 58%|█████▊ | 5765/10000 [22:36:59<16:17:26, 13.85s/it] {'loss': 0.014, 'learning_rate': 2.122e-05, 'epoch': 7.55} 58%|█████▊ | 5765/10000 [22:36:59<16:17:26, 13.85s/it] 58%|█████▊ | 5766/10000 [22:37:13<16:15:43, 13.83s/it] {'loss': 0.0149, 'learning_rate': 2.1215000000000003e-05, 'epoch': 7.55} 58%|█████▊ | 5766/10000 [22:37:13<16:15:43, 13.83s/it] 58%|█████▊ | 5767/10000 [22:37:27<16:14:41, 13.82s/it] {'loss': 0.0161, 'learning_rate': 2.1210000000000002e-05, 'epoch': 7.55} 58%|█████▊ | 5767/10000 [22:37:27<16:14:41, 13.82s/it] 58%|█████▊ | 5768/10000 [22:37:40<16:14:01, 13.81s/it] {'loss': 0.0147, 'learning_rate': 2.1205e-05, 'epoch': 7.55} 58%|█████▊ | 5768/10000 [22:37:40<16:14:01, 13.81s/it] 58%|█████▊ | 5769/10000 [22:37:54<16:13:32, 13.81s/it] {'loss': 0.0153, 'learning_rate': 2.12e-05, 'epoch': 7.55} 58%|█████▊ | 5769/10000 [22:37:54<16:13:32, 13.81s/it] 58%|█████▊ | 5770/10000 [22:38:08<16:14:51, 13.83s/it] {'loss': 0.0144, 'learning_rate': 2.1195e-05, 'epoch': 7.55} 58%|█████▊ | 5770/10000 [22:38:08<16:14:51, 13.83s/it] 58%|█████▊ | 5771/10000 [22:38:22<16:14:33, 13.83s/it] {'loss': 0.0138, 'learning_rate': 2.1190000000000002e-05, 'epoch': 7.55} 58%|█████▊ | 5771/10000 [22:38:22<16:14:33, 13.83s/it] 58%|█████▊ | 5772/10000 [22:38:36<16:14:24, 13.83s/it] {'loss': 0.0167, 'learning_rate': 2.1185e-05, 'epoch': 7.55} 58%|█████▊ | 5772/10000 [22:38:36<16:14:24, 13.83s/it] 58%|█████▊ | 5773/10000 [22:38:50<16:13:27, 13.82s/it] {'loss': 0.0135, 'learning_rate': 2.118e-05, 'epoch': 7.56} 58%|█████▊ | 5773/10000 [22:38:50<16:13:27, 13.82s/it] 58%|█████▊ | 5774/10000 [22:39:03<16:12:07, 13.80s/it] {'loss': 0.0102, 'learning_rate': 2.1175e-05, 'epoch': 7.56} 58%|█████▊ | 5774/10000 [22:39:03<16:12:07, 13.80s/it] 58%|█████▊ | 5775/10000 [22:39:17<16:13:14, 13.82s/it] {'loss': 0.0143, 'learning_rate': 
2.1170000000000002e-05, 'epoch': 7.56} 58%|█████▊ | 5775/10000 [22:39:17<16:13:14, 13.82s/it] 58%|█████▊ | 5776/10000 [22:39:31<16:16:50, 13.88s/it] {'loss': 0.0155, 'learning_rate': 2.1165e-05, 'epoch': 7.56} 58%|█████▊ | 5776/10000 [22:39:31<16:16:50, 13.88s/it] 58%|█████▊ | 5777/10000 [22:39:45<16:16:39, 13.88s/it] {'loss': 0.0154, 'learning_rate': 2.116e-05, 'epoch': 7.56} 58%|█████▊ | 5777/10000 [22:39:45<16:16:39, 13.88s/it] 58%|█████▊ | 5778/10000 [22:39:59<16:14:57, 13.86s/it] {'loss': 0.0136, 'learning_rate': 2.1155e-05, 'epoch': 7.56} 58%|█████▊ | 5778/10000 [22:39:59<16:14:57, 13.86s/it] 58%|█████▊ | 5779/10000 [22:40:13<16:14:15, 13.85s/it] {'loss': 0.0148, 'learning_rate': 2.115e-05, 'epoch': 7.56} 58%|█████▊ | 5779/10000 [22:40:13<16:14:15, 13.85s/it] 58%|█████▊ | 5780/10000 [22:40:26<16:13:15, 13.84s/it] {'loss': 0.0112, 'learning_rate': 2.1145e-05, 'epoch': 7.57} 58%|█████▊ | 5780/10000 [22:40:27<16:13:15, 13.84s/it] 58%|█████▊ | 5781/10000 [22:40:40<16:10:52, 13.81s/it] {'loss': 0.01, 'learning_rate': 2.114e-05, 'epoch': 7.57} 58%|█████▊ | 5781/10000 [22:40:40<16:10:52, 13.81s/it] 58%|█████▊ | 5782/10000 [22:40:54<16:08:30, 13.78s/it] {'loss': 0.0115, 'learning_rate': 2.1135000000000003e-05, 'epoch': 7.57} 58%|█████▊ | 5782/10000 [22:40:54<16:08:30, 13.78s/it] 58%|█████▊ | 5783/10000 [22:41:08<16:10:25, 13.81s/it] {'loss': 0.0109, 'learning_rate': 2.113e-05, 'epoch': 7.57} 58%|█████▊ | 5783/10000 [22:41:08<16:10:25, 13.81s/it] 58%|█████▊ | 5784/10000 [22:41:22<16:09:40, 13.80s/it] {'loss': 0.0259, 'learning_rate': 2.1125000000000002e-05, 'epoch': 7.57} 58%|█████▊ | 5784/10000 [22:41:22<16:09:40, 13.80s/it] 58%|█████▊ | 5785/10000 [22:41:35<16:10:21, 13.81s/it] {'loss': 0.0126, 'learning_rate': 2.112e-05, 'epoch': 7.57} 58%|█████▊ | 5785/10000 [22:41:35<16:10:21, 13.81s/it] 58%|█████▊ | 5786/10000 [22:41:49<16:11:09, 13.83s/it] {'loss': 0.0126, 'learning_rate': 2.1115e-05, 'epoch': 7.57} 58%|█████▊ | 5786/10000 [22:41:49<16:11:09, 13.83s/it] 
58%|█████▊ | 5787/10000 [22:42:03<16:10:30, 13.82s/it] {'loss': 0.0124, 'learning_rate': 2.1110000000000003e-05, 'epoch': 7.57} 58%|█████▊ | 5787/10000 [22:42:03<16:10:30, 13.82s/it] 58%|█████▊ | 5788/10000 [22:42:17<16:09:34, 13.81s/it] {'loss': 0.0153, 'learning_rate': 2.1105e-05, 'epoch': 7.58} 58%|█████▊ | 5788/10000 [22:42:17<16:09:34, 13.81s/it] 58%|█████▊ | 5789/10000 [22:42:31<16:10:32, 13.83s/it] {'loss': 0.0182, 'learning_rate': 2.11e-05, 'epoch': 7.58} 58%|█████▊ | 5789/10000 [22:42:31<16:10:32, 13.83s/it] 58%|█████▊ | 5790/10000 [22:42:45<16:09:41, 13.82s/it] {'loss': 0.0115, 'learning_rate': 2.1095e-05, 'epoch': 7.58} 58%|█████▊ | 5790/10000 [22:42:45<16:09:41, 13.82s/it] 58%|█████▊ | 5791/10000 [22:42:58<16:08:22, 13.80s/it] {'loss': 0.0137, 'learning_rate': 2.1090000000000003e-05, 'epoch': 7.58} 58%|█████▊ | 5791/10000 [22:42:58<16:08:22, 13.80s/it] 58%|█████▊ | 5792/10000 [22:43:12<16:08:29, 13.81s/it] {'loss': 0.0119, 'learning_rate': 2.1085000000000002e-05, 'epoch': 7.58} 58%|█████▊ | 5792/10000 [22:43:12<16:08:29, 13.81s/it] 58%|█████▊ | 5793/10000 [22:43:26<16:07:17, 13.80s/it] {'loss': 0.0124, 'learning_rate': 2.1079999999999998e-05, 'epoch': 7.58} 58%|█████▊ | 5793/10000 [22:43:26<16:07:17, 13.80s/it] 58%|█████▊ | 5794/10000 [22:43:40<16:09:40, 13.83s/it] {'loss': 0.0143, 'learning_rate': 2.1075e-05, 'epoch': 7.58} 58%|█████▊ | 5794/10000 [22:43:40<16:09:40, 13.83s/it] 58%|█████▊ | 5795/10000 [22:43:54<16:09:13, 13.83s/it] {'loss': 0.0172, 'learning_rate': 2.107e-05, 'epoch': 7.59} 58%|█████▊ | 5795/10000 [22:43:54<16:09:13, 13.83s/it] 58%|█████▊ | 5796/10000 [22:44:07<16:08:49, 13.83s/it] {'loss': 0.0123, 'learning_rate': 2.1065000000000002e-05, 'epoch': 7.59} 58%|█████▊ | 5796/10000 [22:44:07<16:08:49, 13.83s/it] 58%|█████▊ | 5797/10000 [22:44:21<16:12:08, 13.88s/it] {'loss': 0.0136, 'learning_rate': 2.106e-05, 'epoch': 7.59} 58%|█████▊ | 5797/10000 [22:44:21<16:12:08, 13.88s/it] 58%|█████▊ | 5798/10000 [22:44:35<16:11:23, 13.87s/it] 
{'loss': 0.0109, 'learning_rate': 2.1055e-05, 'epoch': 7.59} 58%|█████▊ | 5798/10000 [22:44:35<16:11:23, 13.87s/it] 58%|█████▊ | 5799/10000 [22:44:49<16:10:20, 13.86s/it] {'loss': 0.0137, 'learning_rate': 2.105e-05, 'epoch': 7.59} 58%|█████▊ | 5799/10000 [22:44:49<16:10:20, 13.86s/it] 58%|█████▊ | 5800/10000 [22:45:03<16:10:23, 13.86s/it] {'loss': 0.0184, 'learning_rate': 2.1045e-05, 'epoch': 7.59} 58%|█████▊ | 5800/10000 [22:45:03<16:10:23, 13.86s/it] 58%|█████▊ | 5801/10000 [22:45:17<16:09:05, 13.85s/it] {'loss': 0.0117, 'learning_rate': 2.1040000000000002e-05, 'epoch': 7.59} 58%|█████▊ | 5801/10000 [22:45:17<16:09:05, 13.85s/it] 58%|█████▊ | 5802/10000 [22:45:31<16:09:22, 13.85s/it] {'loss': 0.0151, 'learning_rate': 2.1035e-05, 'epoch': 7.59} 58%|█████▊ | 5802/10000 [22:45:31<16:09:22, 13.85s/it] 58%|█████▊ | 5803/10000 [22:45:45<16:09:23, 13.86s/it] {'loss': 0.0146, 'learning_rate': 2.103e-05, 'epoch': 7.6} 58%|█████▊ | 5803/10000 [22:45:45<16:09:23, 13.86s/it] 58%|█████▊ | 5804/10000 [22:45:59<16:10:55, 13.88s/it] {'loss': 0.0129, 'learning_rate': 2.1025e-05, 'epoch': 7.6} 58%|█████▊ | 5804/10000 [22:45:59<16:10:55, 13.88s/it] 58%|█████▊ | 5805/10000 [22:46:13<16:13:06, 13.92s/it] {'loss': 0.0129, 'learning_rate': 2.1020000000000002e-05, 'epoch': 7.6} 58%|█████▊ | 5805/10000 [22:46:13<16:13:06, 13.92s/it] 58%|█████▊ | 5806/10000 [22:47:09<30:58:14, 26.58s/it] {'loss': 0.0131, 'learning_rate': 2.1015e-05, 'epoch': 7.6} 58%|█████▊ | 5806/10000 [22:47:09<30:58:14, 26.58s/it] 58%|█████▊ | 5807/10000 [22:47:23<26:32:12, 22.78s/it] {'loss': 0.0144, 'learning_rate': 2.101e-05, 'epoch': 7.6} 58%|█████▊ | 5807/10000 [22:47:23<26:32:12, 22.78s/it] 58%|█████▊ | 5808/10000 [22:47:36<23:23:57, 20.09s/it] {'loss': 0.0151, 'learning_rate': 2.1005e-05, 'epoch': 7.6} 58%|█████▊ | 5808/10000 [22:47:36<23:23:57, 20.09s/it] 58%|█████▊ | 5809/10000 [22:47:50<21:12:07, 18.21s/it] {'loss': 0.0161, 'learning_rate': 2.1e-05, 'epoch': 7.6} 58%|█████▊ | 5809/10000 [22:47:50<21:12:07, 
18.21s/it] 58%|█████▊ | 5810/10000 [22:48:04<19:42:24, 16.93s/it] {'loss': 0.0141, 'learning_rate': 2.0995e-05, 'epoch': 7.6} 58%|█████▊ | 5810/10000 [22:48:04<19:42:24, 16.93s/it] 58%|█████▊ | 5811/10000 [22:48:18<18:38:47, 16.02s/it] {'loss': 0.015, 'learning_rate': 2.099e-05, 'epoch': 7.61} 58%|█████▊ | 5811/10000 [22:48:18<18:38:47, 16.02s/it] 58%|█████▊ | 5812/10000 [22:48:32<17:54:21, 15.39s/it] {'loss': 0.0172, 'learning_rate': 2.0985000000000003e-05, 'epoch': 7.61} 58%|█████▊ | 5812/10000 [22:48:32<17:54:21, 15.39s/it] 58%|█████▊ | 5813/10000 [22:48:46<17:21:27, 14.92s/it] {'loss': 0.017, 'learning_rate': 2.098e-05, 'epoch': 7.61} 58%|█████▊ | 5813/10000 [22:48:46<17:21:27, 14.92s/it] 58%|█████▊ | 5814/10000 [22:49:00<16:58:42, 14.60s/it] {'loss': 0.0119, 'learning_rate': 2.0975e-05, 'epoch': 7.61} 58%|█████▊ | 5814/10000 [22:49:00<16:58:42, 14.60s/it] 58%|█████▊ | 5815/10000 [22:49:14<16:43:55, 14.39s/it] {'loss': 0.0136, 'learning_rate': 2.097e-05, 'epoch': 7.61} 58%|█████▊ | 5815/10000 [22:49:14<16:43:55, 14.39s/it] 58%|█████▊ | 5816/10000 [22:49:27<16:32:56, 14.24s/it] {'loss': 0.0139, 'learning_rate': 2.0965e-05, 'epoch': 7.61} 58%|█████▊ | 5816/10000 [22:49:27<16:32:56, 14.24s/it] 58%|█████▊ | 5817/10000 [22:49:41<16:25:07, 14.13s/it] {'loss': 0.0158, 'learning_rate': 2.0960000000000003e-05, 'epoch': 7.61} 58%|█████▊ | 5817/10000 [22:49:41<16:25:07, 14.13s/it] 58%|█████▊ | 5818/10000 [22:49:55<16:18:08, 14.03s/it] {'loss': 0.0155, 'learning_rate': 2.0955e-05, 'epoch': 7.62} 58%|█████▊ | 5818/10000 [22:49:55<16:18:08, 14.03s/it] 58%|█████▊ | 5819/10000 [22:50:09<16:13:41, 13.97s/it] {'loss': 0.0127, 'learning_rate': 2.095e-05, 'epoch': 7.62} 58%|█████▊ | 5819/10000 [22:50:09<16:13:41, 13.97s/it] 58%|█████▊ | 5820/10000 [22:50:23<16:09:42, 13.92s/it] {'loss': 0.0136, 'learning_rate': 2.0945e-05, 'epoch': 7.62} 58%|█████▊ | 5820/10000 [22:50:23<16:09:42, 13.92s/it] 58%|█████▊ | 5821/10000 [22:50:37<16:06:47, 13.88s/it] {'loss': 0.014, 'learning_rate': 
2.0940000000000003e-05, 'epoch': 7.62} 58%|█████▊ | 5821/10000 [22:50:37<16:06:47, 13.88s/it] 58%|█████▊ | 5822/10000 [22:50:51<16:08:20, 13.91s/it] {'loss': 0.0169, 'learning_rate': 2.0935000000000002e-05, 'epoch': 7.62} 58%|█████▊ | 5822/10000 [22:50:51<16:08:20, 13.91s/it] 58%|█████▊ | 5823/10000 [22:51:04<16:04:35, 13.86s/it] {'loss': 0.014, 'learning_rate': 2.093e-05, 'epoch': 7.62} 58%|█████▊ | 5823/10000 [22:51:04<16:04:35, 13.86s/it] 58%|█████▊ | 5824/10000 [22:51:18<16:04:56, 13.86s/it] {'loss': 0.0133, 'learning_rate': 2.0925e-05, 'epoch': 7.62} 58%|█████▊ | 5824/10000 [22:51:18<16:04:56, 13.86s/it] 58%|█████▊ | 5825/10000 [22:51:32<16:06:42, 13.89s/it] {'loss': 0.0125, 'learning_rate': 2.092e-05, 'epoch': 7.62} 58%|█████▊ | 5825/10000 [22:51:32<16:06:42, 13.89s/it] 58%|█████▊ | 5826/10000 [22:51:46<16:05:18, 13.88s/it] {'loss': 0.0148, 'learning_rate': 2.0915000000000002e-05, 'epoch': 7.63} 58%|█████▊ | 5826/10000 [22:51:46<16:05:18, 13.88s/it] 58%|█████▊ | 5827/10000 [22:52:00<16:02:42, 13.84s/it] {'loss': 0.0164, 'learning_rate': 2.091e-05, 'epoch': 7.63} 58%|█████▊ | 5827/10000 [22:52:00<16:02:42, 13.84s/it] 58%|█████▊ | 5828/10000 [22:52:14<16:02:07, 13.84s/it] {'loss': 0.0103, 'learning_rate': 2.0905000000000004e-05, 'epoch': 7.63} 58%|█████▊ | 5828/10000 [22:52:14<16:02:07, 13.84s/it] 58%|█████▊ | 5829/10000 [22:52:27<16:03:02, 13.85s/it] {'loss': 0.0158, 'learning_rate': 2.09e-05, 'epoch': 7.63} 58%|█████▊ | 5829/10000 [22:52:27<16:03:02, 13.85s/it] 58%|█████▊ | 5830/10000 [22:52:41<16:05:11, 13.89s/it] {'loss': 0.0131, 'learning_rate': 2.0895e-05, 'epoch': 7.63} 58%|█████▊ | 5830/10000 [22:52:41<16:05:11, 13.89s/it] 58%|█████▊ | 5831/10000 [22:52:55<16:04:24, 13.88s/it] {'loss': 0.0135, 'learning_rate': 2.089e-05, 'epoch': 7.63} 58%|█████▊ | 5831/10000 [22:52:55<16:04:24, 13.88s/it] 58%|█████▊ | 5832/10000 [22:53:09<16:03:14, 13.87s/it] {'loss': 0.0183, 'learning_rate': 2.0885e-05, 'epoch': 7.63} 58%|█████▊ | 5832/10000 [22:53:09<16:03:14, 
13.87s/it] 58%|█████▊ | 5833/10000 [22:53:23<16:03:53, 13.88s/it] {'loss': 0.0139, 'learning_rate': 2.0880000000000003e-05, 'epoch': 7.63} 58%|█████▊ | 5833/10000 [22:53:23<16:03:53, 13.88s/it] 58%|█████▊ | 5834/10000 [22:53:37<16:01:45, 13.85s/it] {'loss': 0.0136, 'learning_rate': 2.0875e-05, 'epoch': 7.64} 58%|█████▊ | 5834/10000 [22:53:37<16:01:45, 13.85s/it] 58%|█████▊ | 5835/10000 [22:53:51<15:59:22, 13.82s/it] {'loss': 0.0129, 'learning_rate': 2.0870000000000002e-05, 'epoch': 7.64} 58%|█████▊ | 5835/10000 [22:53:51<15:59:22, 13.82s/it] 58%|█████▊ | 5836/10000 [22:54:04<15:57:27, 13.80s/it] {'loss': 0.0128, 'learning_rate': 2.0865e-05, 'epoch': 7.64} 58%|█████▊ | 5836/10000 [22:54:04<15:57:27, 13.80s/it] 58%|█████▊ | 5837/10000 [22:54:18<15:58:43, 13.82s/it] {'loss': 0.0139, 'learning_rate': 2.086e-05, 'epoch': 7.64} 58%|█████▊ | 5837/10000 [22:54:18<15:58:43, 13.82s/it] 58%|█████▊ | 5838/10000 [22:54:32<16:01:07, 13.86s/it] {'loss': 0.0139, 'learning_rate': 2.0855000000000003e-05, 'epoch': 7.64} 58%|█████▊ | 5838/10000 [22:54:32<16:01:07, 13.86s/it] 58%|█████▊ | 5839/10000 [22:54:46<15:58:24, 13.82s/it] {'loss': 0.016, 'learning_rate': 2.085e-05, 'epoch': 7.64} 58%|█████▊ | 5839/10000 [22:54:46<15:58:24, 13.82s/it] 58%|█████▊ | 5840/10000 [22:55:00<16:01:15, 13.86s/it] {'loss': 0.0124, 'learning_rate': 2.0845e-05, 'epoch': 7.64} 58%|█████▊ | 5840/10000 [22:55:00<16:01:15, 13.86s/it] 58%|█████▊ | 5841/10000 [22:55:14<16:00:22, 13.85s/it] {'loss': 0.0182, 'learning_rate': 2.084e-05, 'epoch': 7.65} 58%|█████▊ | 5841/10000 [22:55:14<16:00:22, 13.85s/it] 58%|█████▊ | 5842/10000 [22:55:27<16:00:08, 13.85s/it] {'loss': 0.0135, 'learning_rate': 2.0835000000000003e-05, 'epoch': 7.65} 58%|█████▊ | 5842/10000 [22:55:27<16:00:08, 13.85s/it] 58%|█████▊ | 5843/10000 [22:55:41<16:00:47, 13.87s/it] {'loss': 0.014, 'learning_rate': 2.0830000000000002e-05, 'epoch': 7.65} 58%|█████▊ | 5843/10000 [22:55:41<16:00:47, 13.87s/it] 58%|█████▊ | 5844/10000 [22:55:55<15:59:13, 
13.85s/it] {'loss': 0.0137, 'learning_rate': 2.0825e-05, 'epoch': 7.65} 58%|█████▊ | 5844/10000 [22:55:55<15:59:13, 13.85s/it] 58%|█████▊ | 5845/10000 [22:56:09<15:59:19, 13.85s/it] {'loss': 0.0139, 'learning_rate': 2.082e-05, 'epoch': 7.65} 58%|█████▊ | 5845/10000 [22:56:09<15:59:19, 13.85s/it] 58%|█████▊ | 5846/10000 [22:56:23<15:56:09, 13.81s/it] {'loss': 0.0126, 'learning_rate': 2.0815e-05, 'epoch': 7.65} 58%|█████▊ | 5846/10000 [22:56:23<15:56:09, 13.81s/it] 58%|█████▊ | 5847/10000 [22:56:37<15:55:55, 13.81s/it] {'loss': 0.0118, 'learning_rate': 2.0810000000000002e-05, 'epoch': 7.65} 58%|█████▊ | 5847/10000 [22:56:37<15:55:55, 13.81s/it] 58%|█████▊ | 5848/10000 [22:56:50<15:56:15, 13.82s/it] {'loss': 0.014, 'learning_rate': 2.0805e-05, 'epoch': 7.65} 58%|█████▊ | 5848/10000 [22:56:50<15:56:15, 13.82s/it] 58%|█████▊ | 5849/10000 [22:57:04<15:56:50, 13.83s/it] {'loss': 0.0146, 'learning_rate': 2.08e-05, 'epoch': 7.66} 58%|█████▊ | 5849/10000 [22:57:04<15:56:50, 13.83s/it] 58%|█████▊ | 5850/10000 [22:57:18<15:58:23, 13.86s/it] {'loss': 0.015, 'learning_rate': 2.0795e-05, 'epoch': 7.66} 58%|█████▊ | 5850/10000 [22:57:18<15:58:23, 13.86s/it] 59%|█████▊ | 5851/10000 [22:57:32<15:55:56, 13.82s/it] {'loss': 0.016, 'learning_rate': 2.0790000000000003e-05, 'epoch': 7.66} 59%|█████▊ | 5851/10000 [22:57:32<15:55:56, 13.82s/it] 59%|█████▊ | 5852/10000 [22:57:46<15:55:37, 13.82s/it] {'loss': 0.0193, 'learning_rate': 2.0785000000000002e-05, 'epoch': 7.66} 59%|█████▊ | 5852/10000 [22:57:46<15:55:37, 13.82s/it] 59%|█████▊ | 5853/10000 [22:58:00<15:55:16, 13.82s/it] {'loss': 0.0149, 'learning_rate': 2.078e-05, 'epoch': 7.66} 59%|█████▊ | 5853/10000 [22:58:00<15:55:16, 13.82s/it] 59%|█████▊ | 5854/10000 [22:58:13<15:55:16, 13.82s/it] {'loss': 0.0148, 'learning_rate': 2.0775e-05, 'epoch': 7.66} 59%|█████▊ | 5854/10000 [22:58:13<15:55:16, 13.82s/it] 59%|█████▊ | 5855/10000 [22:58:27<15:54:40, 13.82s/it] {'loss': 0.0127, 'learning_rate': 2.077e-05, 'epoch': 7.66} 59%|█████▊ | 
5855/10000 [22:58:27<15:54:40, 13.82s/it] 59%|█████▊ | 5856/10000 [22:58:41<15:56:05, 13.84s/it] {'loss': 0.012, 'learning_rate': 2.0765000000000002e-05, 'epoch': 7.66} 59%|█████▊ | 5856/10000 [22:58:41<15:56:05, 13.84s/it] 59%|█████▊ | 5857/10000 [22:58:55<15:55:14, 13.83s/it] {'loss': 0.0154, 'learning_rate': 2.076e-05, 'epoch': 7.67} 59%|█████▊ | 5857/10000 [22:58:55<15:55:14, 13.83s/it] 59%|█████▊ | 5858/10000 [22:59:09<15:55:53, 13.85s/it] {'loss': 0.0129, 'learning_rate': 2.0755000000000004e-05, 'epoch': 7.67} 59%|█████▊ | 5858/10000 [22:59:09<15:55:53, 13.85s/it] 59%|█████▊ | 5859/10000 [22:59:23<15:57:12, 13.87s/it] {'loss': 0.0123, 'learning_rate': 2.075e-05, 'epoch': 7.67} 59%|█████▊ | 5859/10000 [22:59:23<15:57:12, 13.87s/it] 59%|█████▊ | 5860/10000 [22:59:37<15:57:09, 13.87s/it] {'loss': 0.0149, 'learning_rate': 2.0745000000000002e-05, 'epoch': 7.67} 59%|█████▊ | 5860/10000 [22:59:37<15:57:09, 13.87s/it] 59%|█████▊ | 5861/10000 [22:59:50<15:57:20, 13.88s/it] {'loss': 0.0121, 'learning_rate': 2.074e-05, 'epoch': 7.67} 59%|█████▊ | 5861/10000 [22:59:50<15:57:20, 13.88s/it] 59%|█████▊ | 5862/10000 [23:00:04<15:55:38, 13.86s/it] {'loss': 0.0141, 'learning_rate': 2.0735e-05, 'epoch': 7.67} 59%|█████▊ | 5862/10000 [23:00:04<15:55:38, 13.86s/it] 59%|█████▊ | 5863/10000 [23:00:18<15:56:39, 13.87s/it] {'loss': 0.0165, 'learning_rate': 2.0730000000000003e-05, 'epoch': 7.67} 59%|█████▊ | 5863/10000 [23:00:18<15:56:39, 13.87s/it] 59%|█████▊ | 5864/10000 [23:00:32<15:53:48, 13.84s/it] {'loss': 0.0141, 'learning_rate': 2.0725e-05, 'epoch': 7.68} 59%|█████▊ | 5864/10000 [23:00:32<15:53:48, 13.84s/it] 59%|█████▊ | 5865/10000 [23:00:46<15:53:48, 13.84s/it] {'loss': 0.0136, 'learning_rate': 2.072e-05, 'epoch': 7.68} 59%|█████▊ | 5865/10000 [23:00:46<15:53:48, 13.84s/it] 59%|█████▊ | 5866/10000 [23:01:00<15:54:44, 13.86s/it] {'loss': 0.0112, 'learning_rate': 2.0715e-05, 'epoch': 7.68} 59%|█████▊ | 5866/10000 [23:01:00<15:54:44, 13.86s/it] 59%|█████▊ | 5867/10000 
[23:01:14<15:55:26, 13.87s/it] {'loss': 0.0143, 'learning_rate': 2.0710000000000003e-05, 'epoch': 7.68} 59%|█████▊ | 5867/10000 [23:01:14<15:55:26, 13.87s/it] 59%|█████▊ | 5868/10000 [23:01:28<15:57:11, 13.90s/it] {'loss': 0.0122, 'learning_rate': 2.0705000000000003e-05, 'epoch': 7.68} 59%|█████▊ | 5868/10000 [23:01:28<15:57:11, 13.90s/it] 59%|█████▊ | 5869/10000 [23:01:41<15:55:55, 13.88s/it] {'loss': 0.0147, 'learning_rate': 2.07e-05, 'epoch': 7.68} 59%|█████▊ | 5869/10000 [23:01:41<15:55:55, 13.88s/it] 59%|█████▊ | 5870/10000 [23:01:55<15:56:44, 13.90s/it] {'loss': 0.0138, 'learning_rate': 2.0695e-05, 'epoch': 7.68} 59%|█████▊ | 5870/10000 [23:01:55<15:56:44, 13.90s/it] 59%|█████▊ | 5871/10000 [23:02:09<15:54:56, 13.88s/it] {'loss': 0.0163, 'learning_rate': 2.069e-05, 'epoch': 7.68} 59%|█████▊ | 5871/10000 [23:02:09<15:54:56, 13.88s/it] 59%|█████▊ | 5872/10000 [23:02:23<15:53:18, 13.86s/it] {'loss': 0.0139, 'learning_rate': 2.0685000000000003e-05, 'epoch': 7.69} 59%|█████▊ | 5872/10000 [23:02:23<15:53:18, 13.86s/it] 59%|█████▊ | 5873/10000 [23:02:37<15:53:00, 13.86s/it] {'loss': 0.0134, 'learning_rate': 2.0680000000000002e-05, 'epoch': 7.69} 59%|█████▊ | 5873/10000 [23:02:37<15:53:00, 13.86s/it] 59%|█████▊ | 5874/10000 [23:02:51<15:51:17, 13.83s/it] {'loss': 0.0117, 'learning_rate': 2.0675e-05, 'epoch': 7.69} 59%|█████▊ | 5874/10000 [23:02:51<15:51:17, 13.83s/it] 59%|█████▉ | 5875/10000 [23:03:04<15:50:26, 13.82s/it] {'loss': 0.0112, 'learning_rate': 2.067e-05, 'epoch': 7.69} 59%|█████▉ | 5875/10000 [23:03:04<15:50:26, 13.82s/it] 59%|█████▉ | 5876/10000 [23:03:18<15:48:18, 13.80s/it] {'loss': 0.0143, 'learning_rate': 2.0665e-05, 'epoch': 7.69} 59%|█████▉ | 5876/10000 [23:03:18<15:48:18, 13.80s/it] 59%|█████▉ | 5877/10000 [23:03:32<15:47:19, 13.79s/it] {'loss': 0.0126, 'learning_rate': 2.0660000000000002e-05, 'epoch': 7.69} 59%|█████▉ | 5877/10000 [23:03:32<15:47:19, 13.79s/it] 59%|█████▉ | 5878/10000 [23:03:46<15:49:50, 13.83s/it] {'loss': 0.0129, 
'learning_rate': 2.0655e-05, 'epoch': 7.69} 59%|█████▉ | 5878/10000 [23:03:46<15:49:50, 13.83s/it] 59%|█████▉ | 5879/10000 [23:04:00<15:52:00, 13.86s/it] {'loss': 0.0167, 'learning_rate': 2.065e-05, 'epoch': 7.7} 59%|█████▉ | 5879/10000 [23:04:00<15:52:00, 13.86s/it] 59%|█████▉ | 5880/10000 [23:04:14<15:54:56, 13.91s/it] {'loss': 0.0138, 'learning_rate': 2.0645e-05, 'epoch': 7.7} 59%|█████▉ | 5880/10000 [23:04:14<15:54:56, 13.91s/it] 59%|█████▉ | 5881/10000 [23:04:28<15:54:08, 13.90s/it] {'loss': 0.016, 'learning_rate': 2.0640000000000002e-05, 'epoch': 7.7} 59%|█████▉ | 5881/10000 [23:04:28<15:54:08, 13.90s/it] 59%|█████▉ | 5882/10000 [23:04:42<15:53:03, 13.89s/it] {'loss': 0.0107, 'learning_rate': 2.0635e-05, 'epoch': 7.7} 59%|█████▉ | 5882/10000 [23:04:42<15:53:03, 13.89s/it] 59%|█████▉ | 5883/10000 [23:04:55<15:52:57, 13.89s/it] {'loss': 0.0113, 'learning_rate': 2.063e-05, 'epoch': 7.7} 59%|█████▉ | 5883/10000 [23:04:55<15:52:57, 13.89s/it] 59%|█████▉ | 5884/10000 [23:05:09<15:54:20, 13.91s/it] {'loss': 0.0153, 'learning_rate': 2.0625e-05, 'epoch': 7.7} 59%|█████▉ | 5884/10000 [23:05:09<15:54:20, 13.91s/it] 59%|█████▉ | 5885/10000 [23:05:23<15:52:19, 13.89s/it] {'loss': 0.013, 'learning_rate': 2.062e-05, 'epoch': 7.7} 59%|█████▉ | 5885/10000 [23:05:23<15:52:19, 13.89s/it] 59%|█████▉ | 5886/10000 [23:05:37<15:54:06, 13.92s/it] {'loss': 0.0144, 'learning_rate': 2.0615e-05, 'epoch': 7.7} 59%|█████▉ | 5886/10000 [23:05:37<15:54:06, 13.92s/it] 59%|█████▉ | 5887/10000 [23:05:51<15:51:53, 13.89s/it] {'loss': 0.0177, 'learning_rate': 2.061e-05, 'epoch': 7.71} 59%|█████▉ | 5887/10000 [23:05:51<15:51:53, 13.89s/it] 59%|█████▉ | 5888/10000 [23:06:05<15:52:03, 13.89s/it] {'loss': 0.0135, 'learning_rate': 2.0605000000000003e-05, 'epoch': 7.71} 59%|█████▉ | 5888/10000 [23:06:05<15:52:03, 13.89s/it] 59%|█████▉ | 5889/10000 [23:06:19<15:52:20, 13.90s/it] {'loss': 0.0144, 'learning_rate': 2.06e-05, 'epoch': 7.71} 59%|█████▉ | 5889/10000 [23:06:19<15:52:20, 13.90s/it] 59%|█████▉ 
| 5890/10000 [23:06:33<15:51:52, 13.90s/it] {'loss': 0.0132, 'learning_rate': 2.0595000000000002e-05, 'epoch': 7.71} 59%|█████▉ | 5890/10000 [23:06:33<15:51:52, 13.90s/it] 59%|█████▉ | 5891/10000 [23:06:47<15:54:07, 13.93s/it] {'loss': 0.0133, 'learning_rate': 2.059e-05, 'epoch': 7.71} 59%|█████▉ | 5891/10000 [23:06:47<15:54:07, 13.93s/it] 59%|█████▉ | 5892/10000 [23:07:01<15:51:58, 13.90s/it] {'loss': 0.0095, 'learning_rate': 2.0585e-05, 'epoch': 7.71} 59%|█████▉ | 5892/10000 [23:07:01<15:51:58, 13.90s/it] 59%|█████▉ | 5893/10000 [23:07:14<15:51:09, 13.90s/it] {'loss': 0.0136, 'learning_rate': 2.0580000000000003e-05, 'epoch': 7.71} 59%|█████▉ | 5893/10000 [23:07:14<15:51:09, 13.90s/it] 59%|█████▉ | 5894/10000 [23:07:28<15:51:35, 13.91s/it] {'loss': 0.0145, 'learning_rate': 2.0575e-05, 'epoch': 7.71} 59%|█████▉ | 5894/10000 [23:07:28<15:51:35, 13.91s/it] 59%|█████▉ | 5895/10000 [23:07:42<15:52:14, 13.92s/it] {'loss': 0.0129, 'learning_rate': 2.057e-05, 'epoch': 7.72} 59%|█████▉ | 5895/10000 [23:07:42<15:52:14, 13.92s/it] 59%|█████▉ | 5896/10000 [23:07:56<15:51:27, 13.91s/it] {'loss': 0.0161, 'learning_rate': 2.0565e-05, 'epoch': 7.72} 59%|█████▉ | 5896/10000 [23:07:56<15:51:27, 13.91s/it] 59%|█████▉ | 5897/10000 [23:08:10<15:53:24, 13.94s/it] {'loss': 0.0156, 'learning_rate': 2.0560000000000003e-05, 'epoch': 7.72} 59%|█████▉ | 5897/10000 [23:08:10<15:53:24, 13.94s/it] 59%|█████▉ | 5898/10000 [23:08:24<15:53:52, 13.95s/it] {'loss': 0.0143, 'learning_rate': 2.0555000000000002e-05, 'epoch': 7.72} 59%|█████▉ | 5898/10000 [23:08:24<15:53:52, 13.95s/it] 59%|█████▉ | 5899/10000 [23:08:38<15:53:42, 13.95s/it] {'loss': 0.0168, 'learning_rate': 2.055e-05, 'epoch': 7.72} 59%|█████▉ | 5899/10000 [23:08:38<15:53:42, 13.95s/it] 59%|█████▉ | 5900/10000 [23:08:52<15:53:31, 13.95s/it] {'loss': 0.0146, 'learning_rate': 2.0545e-05, 'epoch': 7.72} 59%|█████▉ | 5900/10000 [23:08:52<15:53:31, 13.95s/it] 59%|█████▉ | 5901/10000 [23:09:06<15:50:20, 13.91s/it] {'loss': 0.0149, 
'learning_rate': 2.054e-05, 'epoch': 7.72} 59%|█████▉ | 5901/10000 [23:09:06<15:50:20, 13.91s/it] 59%|█████▉ | 5902/10000 [23:09:20<15:48:28, 13.89s/it] {'loss': 0.0151, 'learning_rate': 2.0535000000000002e-05, 'epoch': 7.73} 59%|█████▉ | 5902/10000 [23:09:20<15:48:28, 13.89s/it] 59%|█████▉ | 5903/10000 [23:09:34<15:47:09, 13.87s/it] {'loss': 0.0138, 'learning_rate': 2.053e-05, 'epoch': 7.73} 59%|█████▉ | 5903/10000 [23:09:34<15:47:09, 13.87s/it] 59%|█████▉ | 5904/10000 [23:09:47<15:47:27, 13.88s/it] {'loss': 0.0112, 'learning_rate': 2.0525e-05, 'epoch': 7.73} 59%|█████▉ | 5904/10000 [23:09:48<15:47:27, 13.88s/it] 59%|█████▉ | 5905/10000 [23:10:01<15:46:52, 13.87s/it] {'loss': 0.0135, 'learning_rate': 2.052e-05, 'epoch': 7.73} 59%|█████▉ | 5905/10000 [23:10:01<15:46:52, 13.87s/it] 59%|█████▉ | 5906/10000 [23:10:15<15:46:31, 13.87s/it] {'loss': 0.0139, 'learning_rate': 2.0515e-05, 'epoch': 7.73} 59%|█████▉ | 5906/10000 [23:10:15<15:46:31, 13.87s/it] 59%|█████▉ | 5907/10000 [23:10:29<15:47:44, 13.89s/it] {'loss': 0.0116, 'learning_rate': 2.0510000000000002e-05, 'epoch': 7.73} 59%|█████▉ | 5907/10000 [23:10:29<15:47:44, 13.89s/it] 59%|█████▉ | 5908/10000 [23:10:43<15:49:54, 13.93s/it] {'loss': 0.0137, 'learning_rate': 2.0505e-05, 'epoch': 7.73} 59%|█████▉ | 5908/10000 [23:10:43<15:49:54, 13.93s/it] 59%|█████▉ | 5909/10000 [23:10:57<15:49:36, 13.93s/it] {'loss': 0.0129, 'learning_rate': 2.05e-05, 'epoch': 7.73} 59%|█████▉ | 5909/10000 [23:10:57<15:49:36, 13.93s/it] 59%|█████▉ | 5910/10000 [23:11:11<15:50:22, 13.94s/it] {'loss': 0.0189, 'learning_rate': 2.0495e-05, 'epoch': 7.74} 59%|█████▉ | 5910/10000 [23:11:11<15:50:22, 13.94s/it] 59%|█████▉ | 5911/10000 [23:11:25<15:50:12, 13.94s/it] {'loss': 0.0126, 'learning_rate': 2.0490000000000002e-05, 'epoch': 7.74} 59%|█████▉ | 5911/10000 [23:11:25<15:50:12, 13.94s/it] 59%|█████▉ | 5912/10000 [23:11:39<15:46:17, 13.89s/it] {'loss': 0.0108, 'learning_rate': 2.0485e-05, 'epoch': 7.74} 59%|█████▉ | 5912/10000 [23:11:39<15:46:17, 
13.89s/it] 59%|█████▉ | 5913/10000 [23:11:53<15:45:30, 13.88s/it] {'loss': 0.0156, 'learning_rate': 2.048e-05, 'epoch': 7.74} 59%|█████▉ | 5913/10000 [23:11:53<15:45:30, 13.88s/it] 59%|█████▉ | 5914/10000 [23:12:07<15:45:59, 13.89s/it] {'loss': 0.0122, 'learning_rate': 2.0475e-05, 'epoch': 7.74} 59%|█████▉ | 5914/10000 [23:12:07<15:45:59, 13.89s/it] 59%|█████▉ | 5915/10000 [23:12:20<15:45:04, 13.88s/it] {'loss': 0.0111, 'learning_rate': 2.047e-05, 'epoch': 7.74} 59%|█████▉ | 5915/10000 [23:12:20<15:45:04, 13.88s/it] 59%|█████▉ | 5916/10000 [23:12:34<15:43:57, 13.87s/it] {'loss': 0.0131, 'learning_rate': 2.0465e-05, 'epoch': 7.74} 59%|█████▉ | 5916/10000 [23:12:34<15:43:57, 13.87s/it] 59%|█████▉ | 5917/10000 [23:12:48<15:46:28, 13.91s/it] {'loss': 0.0102, 'learning_rate': 2.046e-05, 'epoch': 7.74} 59%|█████▉ | 5917/10000 [23:12:48<15:46:28, 13.91s/it] 59%|█████▉ | 5918/10000 [23:13:02<15:52:01, 13.99s/it] {'loss': 0.0127, 'learning_rate': 2.0455000000000003e-05, 'epoch': 7.75} 59%|█████▉ | 5918/10000 [23:13:02<15:52:01, 13.99s/it] 59%|█████▉ | 5919/10000 [23:13:16<15:50:48, 13.98s/it] {'loss': 0.0144, 'learning_rate': 2.045e-05, 'epoch': 7.75} 59%|█████▉ | 5919/10000 [23:13:16<15:50:48, 13.98s/it] 59%|█████▉ | 5920/10000 [23:13:30<15:49:09, 13.96s/it] {'loss': 0.0168, 'learning_rate': 2.0445e-05, 'epoch': 7.75} 59%|█████▉ | 5920/10000 [23:13:30<15:49:09, 13.96s/it] 59%|█████▉ | 5921/10000 [23:13:44<15:50:15, 13.98s/it] {'loss': 0.013, 'learning_rate': 2.044e-05, 'epoch': 7.75} 59%|█████▉ | 5921/10000 [23:13:44<15:50:15, 13.98s/it] 59%|█████▉ | 5922/10000 [23:13:58<15:48:33, 13.96s/it] {'loss': 0.0125, 'learning_rate': 2.0435e-05, 'epoch': 7.75} 59%|█████▉ | 5922/10000 [23:13:58<15:48:33, 13.96s/it] 59%|█████▉ | 5923/10000 [23:14:12<15:48:03, 13.95s/it] {'loss': 0.0118, 'learning_rate': 2.0430000000000003e-05, 'epoch': 7.75} 59%|█████▉ | 5923/10000 [23:14:12<15:48:03, 13.95s/it] 59%|█████▉ | 5924/10000 [23:14:26<15:46:41, 13.94s/it] {'loss': 0.0134, 'learning_rate': 
2.0425e-05, 'epoch': 7.75} 59%|█████▉ | 5924/10000 [23:14:26<15:46:41, 13.94s/it] 59%|█████▉ | 5925/10000 [23:14:40<15:45:34, 13.92s/it] {'loss': 0.0109, 'learning_rate': 2.042e-05, 'epoch': 7.76} 59%|█████▉ | 5925/10000 [23:14:40<15:45:34, 13.92s/it] 59%|█████▉ | 5926/10000 [23:14:54<15:43:55, 13.90s/it] {'loss': 0.0162, 'learning_rate': 2.0415e-05, 'epoch': 7.76} 59%|█████▉ | 5926/10000 [23:14:54<15:43:55, 13.90s/it] 59%|█████▉ | 5927/10000 [23:15:08<15:44:46, 13.92s/it] {'loss': 0.0125, 'learning_rate': 2.0410000000000003e-05, 'epoch': 7.76} 59%|█████▉ | 5927/10000 [23:15:08<15:44:46, 13.92s/it] 59%|█████▉ | 5928/10000 [23:15:22<15:44:40, 13.92s/it] {'loss': 0.0156, 'learning_rate': 2.0405000000000002e-05, 'epoch': 7.76} 59%|█████▉ | 5928/10000 [23:15:22<15:44:40, 13.92s/it] 59%|█████▉ | 5929/10000 [23:15:36<15:46:53, 13.96s/it] {'loss': 0.0145, 'learning_rate': 2.04e-05, 'epoch': 7.76} 59%|█████▉ | 5929/10000 [23:15:36<15:46:53, 13.96s/it] 59%|█████▉ | 5930/10000 [23:15:50<15:46:37, 13.96s/it] {'loss': 0.0151, 'learning_rate': 2.0395e-05, 'epoch': 7.76} 59%|█████▉ | 5930/10000 [23:15:50<15:46:37, 13.96s/it] 59%|█████▉ | 5931/10000 [23:16:04<15:45:18, 13.94s/it] {'loss': 0.0142, 'learning_rate': 2.039e-05, 'epoch': 7.76} 59%|█████▉ | 5931/10000 [23:16:04<15:45:18, 13.94s/it] 59%|█████▉ | 5932/10000 [23:16:18<15:45:39, 13.95s/it] {'loss': 0.0138, 'learning_rate': 2.0385000000000002e-05, 'epoch': 7.76} 59%|█████▉ | 5932/10000 [23:16:18<15:45:39, 13.95s/it] 59%|█████▉ | 5933/10000 [23:16:32<15:46:05, 13.96s/it] {'loss': 0.0128, 'learning_rate': 2.038e-05, 'epoch': 7.77} 59%|█████▉ | 5933/10000 [23:16:32<15:46:05, 13.96s/it] 59%|█████▉ | 5934/10000 [23:16:45<15:43:49, 13.93s/it] {'loss': 0.0124, 'learning_rate': 2.0375e-05, 'epoch': 7.77} 59%|█████▉ | 5934/10000 [23:16:45<15:43:49, 13.93s/it] 59%|█████▉ | 5935/10000 [23:16:59<15:45:38, 13.96s/it] {'loss': 0.0126, 'learning_rate': 2.037e-05, 'epoch': 7.77} 59%|█████▉ | 5935/10000 [23:16:59<15:45:38, 13.96s/it] 
59%|█████▉ | 5936/10000 [23:17:13<15:44:49, 13.95s/it] {'loss': 0.0132, 'learning_rate': 2.0365000000000002e-05, 'epoch': 7.77} 59%|█████▉ | 5936/10000 [23:17:13<15:44:49, 13.95s/it] 59%|█████▉ | 5937/10000 [23:17:27<15:46:45, 13.98s/it] {'loss': 0.013, 'learning_rate': 2.036e-05, 'epoch': 7.77} 59%|█████▉ | 5937/10000 [23:17:27<15:46:45, 13.98s/it] 59%|█████▉ | 5938/10000 [23:17:41<15:44:54, 13.96s/it] {'loss': 0.0131, 'learning_rate': 2.0355e-05, 'epoch': 7.77} 59%|█████▉ | 5938/10000 [23:17:41<15:44:54, 13.96s/it] 59%|█████▉ | 5939/10000 [23:17:55<15:41:52, 13.92s/it] {'loss': 0.0133, 'learning_rate': 2.035e-05, 'epoch': 7.77} 59%|█████▉ | 5939/10000 [23:17:55<15:41:52, 13.92s/it] 59%|█████▉ | 5940/10000 [23:18:09<15:43:25, 13.94s/it] {'loss': 0.0118, 'learning_rate': 2.0345e-05, 'epoch': 7.77} 59%|█████▉ | 5940/10000 [23:18:09<15:43:25, 13.94s/it] 59%|█████▉ | 5941/10000 [23:18:23<15:41:31, 13.92s/it] {'loss': 0.0142, 'learning_rate': 2.0340000000000002e-05, 'epoch': 7.78} 59%|█████▉ | 5941/10000 [23:18:23<15:41:31, 13.92s/it] 59%|█████▉ | 5942/10000 [23:18:37<15:41:20, 13.92s/it] {'loss': 0.0136, 'learning_rate': 2.0335e-05, 'epoch': 7.78} 59%|█████▉ | 5942/10000 [23:18:37<15:41:20, 13.92s/it] 59%|█████▉ | 5943/10000 [23:18:51<15:42:26, 13.94s/it] {'loss': 0.0118, 'learning_rate': 2.033e-05, 'epoch': 7.78} 59%|█████▉ | 5943/10000 [23:18:51<15:42:26, 13.94s/it] 59%|█████▉ | 5944/10000 [23:19:05<15:41:12, 13.92s/it] {'loss': 0.0225, 'learning_rate': 2.0325e-05, 'epoch': 7.78} 59%|█████▉ | 5944/10000 [23:19:05<15:41:12, 13.92s/it] 59%|█████▉ | 5945/10000 [23:19:19<15:39:36, 13.90s/it] {'loss': 0.0142, 'learning_rate': 2.032e-05, 'epoch': 7.78} 59%|█████▉ | 5945/10000 [23:19:19<15:39:36, 13.90s/it] 59%|█████▉ | 5946/10000 [23:19:33<15:40:45, 13.92s/it] {'loss': 0.0129, 'learning_rate': 2.0315e-05, 'epoch': 7.78} 59%|█████▉ | 5946/10000 [23:19:33<15:40:45, 13.92s/it] 59%|█████▉ | 5947/10000 [23:19:46<15:37:29, 13.88s/it] {'loss': 0.0115, 'learning_rate': 2.031e-05, 
'epoch': 7.78} 59%|█████▉ | 5947/10000 [23:19:46<15:37:29, 13.88s/it] 59%|█████▉ | 5948/10000 [23:20:00<15:36:57, 13.87s/it] {'loss': 0.0157, 'learning_rate': 2.0305000000000003e-05, 'epoch': 7.79} 59%|█████▉ | 5948/10000 [23:20:00<15:36:57, 13.87s/it] 59%|█████▉ | 5949/10000 [23:20:14<15:38:47, 13.90s/it] {'loss': 0.0133, 'learning_rate': 2.0300000000000002e-05, 'epoch': 7.79} 59%|█████▉ | 5949/10000 [23:20:14<15:38:47, 13.90s/it] 60%|█████▉ | 5950/10000 [23:20:28<15:39:19, 13.92s/it] {'loss': 0.012, 'learning_rate': 2.0295e-05, 'epoch': 7.79} 60%|█████▉ | 5950/10000 [23:20:28<15:39:19, 13.92s/it] 60%|█████▉ | 5951/10000 [23:20:42<15:36:10, 13.87s/it] {'loss': 0.0165, 'learning_rate': 2.029e-05, 'epoch': 7.79} 60%|█████▉ | 5951/10000 [23:20:42<15:36:10, 13.87s/it] 60%|█████▉ | 5952/10000 [23:20:56<15:34:44, 13.85s/it] {'loss': 0.0142, 'learning_rate': 2.0285e-05, 'epoch': 7.79} 60%|█████▉ | 5952/10000 [23:20:56<15:34:44, 13.85s/it] 60%|█████▉ | 5953/10000 [23:21:10<15:36:33, 13.89s/it] {'loss': 0.0159, 'learning_rate': 2.0280000000000002e-05, 'epoch': 7.79} 60%|█████▉ | 5953/10000 [23:21:10<15:36:33, 13.89s/it] 60%|█████▉ | 5954/10000 [23:21:24<15:36:48, 13.89s/it] {'loss': 0.0161, 'learning_rate': 2.0275e-05, 'epoch': 7.79} 60%|█████▉ | 5954/10000 [23:21:24<15:36:48, 13.89s/it] 60%|█████▉ | 5955/10000 [23:21:37<15:35:36, 13.88s/it] {'loss': 0.0082, 'learning_rate': 2.027e-05, 'epoch': 7.79} 60%|█████▉ | 5955/10000 [23:21:38<15:35:36, 13.88s/it] 60%|█████▉ | 5956/10000 [23:21:51<15:38:23, 13.92s/it] {'loss': 0.0149, 'learning_rate': 2.0265e-05, 'epoch': 7.8} 60%|█████▉ | 5956/10000 [23:21:52<15:38:23, 13.92s/it] 60%|█████▉ | 5957/10000 [23:22:05<15:35:35, 13.88s/it] {'loss': 0.0151, 'learning_rate': 2.0260000000000003e-05, 'epoch': 7.8} 60%|█████▉ | 5957/10000 [23:22:05<15:35:35, 13.88s/it] 60%|█████▉ | 5958/10000 [23:22:19<15:37:50, 13.92s/it] {'loss': 0.0125, 'learning_rate': 2.0255000000000002e-05, 'epoch': 7.8} 60%|█████▉ | 5958/10000 [23:22:19<15:37:50, 
13.92s/it] 60%|█████▉ | 5959/10000 [23:22:33<15:37:49, 13.92s/it] {'loss': 0.0108, 'learning_rate': 2.025e-05, 'epoch': 7.8} 60%|█████▉ | 5959/10000 [23:22:33<15:37:49, 13.92s/it] 60%|█████▉ | 5960/10000 [23:22:47<15:37:04, 13.92s/it] {'loss': 0.0141, 'learning_rate': 2.0245e-05, 'epoch': 7.8} 60%|█████▉ | 5960/10000 [23:22:47<15:37:04, 13.92s/it] 60%|█████▉ | 5961/10000 [23:23:01<15:39:06, 13.95s/it] {'loss': 0.0156, 'learning_rate': 2.024e-05, 'epoch': 7.8} 60%|█████▉ | 5961/10000 [23:23:01<15:39:06, 13.95s/it] 60%|█████▉ | 5962/10000 [23:23:15<15:37:52, 13.94s/it] {'loss': 0.0118, 'learning_rate': 2.0235000000000002e-05, 'epoch': 7.8} 60%|█████▉ | 5962/10000 [23:23:15<15:37:52, 13.94s/it] 60%|█████▉ | 5963/10000 [23:23:29<15:35:46, 13.91s/it] {'loss': 0.0119, 'learning_rate': 2.023e-05, 'epoch': 7.8} 60%|█████▉ | 5963/10000 [23:23:29<15:35:46, 13.91s/it] 60%|█████▉ | 5964/10000 [23:23:43<15:33:17, 13.87s/it] {'loss': 0.0174, 'learning_rate': 2.0225000000000004e-05, 'epoch': 7.81} 60%|█████▉ | 5964/10000 [23:23:43<15:33:17, 13.87s/it] 60%|█████▉ | 5965/10000 [23:23:57<15:32:59, 13.87s/it] {'loss': 0.0163, 'learning_rate': 2.022e-05, 'epoch': 7.81} 60%|█████▉ | 5965/10000 [23:23:57<15:32:59, 13.87s/it] 60%|█████▉ | 5966/10000 [23:24:10<15:30:09, 13.83s/it] {'loss': 0.0154, 'learning_rate': 2.0215000000000002e-05, 'epoch': 7.81} 60%|█████▉ | 5966/10000 [23:24:10<15:30:09, 13.83s/it] 60%|█████▉ | 5967/10000 [23:24:24<15:33:34, 13.89s/it] {'loss': 0.0187, 'learning_rate': 2.021e-05, 'epoch': 7.81} 60%|█████▉ | 5967/10000 [23:24:24<15:33:34, 13.89s/it] 60%|█████▉ | 5968/10000 [23:24:38<15:31:42, 13.86s/it] {'loss': 0.0133, 'learning_rate': 2.0205e-05, 'epoch': 7.81} 60%|█████▉ | 5968/10000 [23:24:38<15:31:42, 13.86s/it] 60%|█████▉ | 5969/10000 [23:24:52<15:31:27, 13.86s/it] {'loss': 0.0184, 'learning_rate': 2.0200000000000003e-05, 'epoch': 7.81} 60%|█████▉ | 5969/10000 [23:24:52<15:31:27, 13.86s/it] 60%|█████▉ | 5970/10000 [23:25:06<15:32:21, 13.88s/it] {'loss': 
0.0139, 'learning_rate': 2.0195e-05, 'epoch': 7.81} 60%|█████▉ | 5970/10000 [23:25:06<15:32:21, 13.88s/it] 60%|█████▉ | 5971/10000 [23:25:20<15:33:04, 13.90s/it] {'loss': 0.0179, 'learning_rate': 2.019e-05, 'epoch': 7.82} 60%|█████▉ | 5971/10000 [23:25:20<15:33:04, 13.90s/it] 60%|█████▉ | 5972/10000 [23:25:34<15:30:59, 13.87s/it] {'loss': 0.0131, 'learning_rate': 2.0185e-05, 'epoch': 7.82} 60%|█████▉ | 5972/10000 [23:25:34<15:30:59, 13.87s/it] 60%|█████▉ | 5973/10000 [23:25:48<15:32:56, 13.90s/it] {'loss': 0.0131, 'learning_rate': 2.0180000000000003e-05, 'epoch': 7.82} 60%|█████▉ | 5973/10000 [23:25:48<15:32:56, 13.90s/it] 60%|█████▉ | 5974/10000 [23:26:01<15:30:30, 13.87s/it] {'loss': 0.0112, 'learning_rate': 2.0175000000000003e-05, 'epoch': 7.82} 60%|█████▉ | 5974/10000 [23:26:01<15:30:30, 13.87s/it] 60%|█████▉ | 5975/10000 [23:26:15<15:33:26, 13.91s/it] {'loss': 0.0146, 'learning_rate': 2.017e-05, 'epoch': 7.82} 60%|█████▉ | 5975/10000 [23:26:15<15:33:26, 13.91s/it] 60%|█████▉ | 5976/10000 [23:26:29<15:34:26, 13.93s/it] {'loss': 0.0146, 'learning_rate': 2.0165e-05, 'epoch': 7.82} 60%|█████▉ | 5976/10000 [23:26:29<15:34:26, 13.93s/it] 60%|█████▉ | 5977/10000 [23:26:43<15:32:34, 13.91s/it] {'loss': 0.0134, 'learning_rate': 2.016e-05, 'epoch': 7.82} 60%|█████▉ | 5977/10000 [23:26:43<15:32:34, 13.91s/it] 60%|█████▉ | 5978/10000 [23:26:57<15:31:27, 13.90s/it] {'loss': 0.0149, 'learning_rate': 2.0155000000000003e-05, 'epoch': 7.82} 60%|█████▉ | 5978/10000 [23:26:57<15:31:27, 13.90s/it] 60%|█████▉ | 5979/10000 [23:27:11<15:32:26, 13.91s/it] {'loss': 0.0147, 'learning_rate': 2.0150000000000002e-05, 'epoch': 7.83} 60%|█████▉ | 5979/10000 [23:27:11<15:32:26, 13.91s/it] 60%|█████▉ | 5980/10000 [23:27:25<15:33:29, 13.93s/it] {'loss': 0.0123, 'learning_rate': 2.0145e-05, 'epoch': 7.83} 60%|█████▉ | 5980/10000 [23:27:25<15:33:29, 13.93s/it] 60%|█████▉ | 5981/10000 [23:27:39<15:33:14, 13.93s/it] {'loss': 0.0152, 'learning_rate': 2.014e-05, 'epoch': 7.83} 60%|█████▉ | 
5981/10000 [23:27:39<15:33:14, 13.93s/it] 60%|█████▉ | 5982/10000 [23:27:53<15:32:34, 13.93s/it] {'loss': 0.0155, 'learning_rate': 2.0135e-05, 'epoch': 7.83} 60%|█████▉ | 5982/10000 [23:27:53<15:32:34, 13.93s/it] 60%|█████▉ | 5983/10000 [23:28:07<15:33:40, 13.95s/it] {'loss': 0.0151, 'learning_rate': 2.0130000000000002e-05, 'epoch': 7.83} 60%|█████▉ | 5983/10000 [23:28:07<15:33:40, 13.95s/it] 60%|█████▉ | 5984/10000 [23:28:21<15:31:47, 13.92s/it] {'loss': 0.0152, 'learning_rate': 2.0125e-05, 'epoch': 7.83} 60%|█████▉ | 5984/10000 [23:28:21<15:31:47, 13.92s/it] 60%|█████▉ | 5985/10000 [23:28:35<15:30:55, 13.91s/it] {'loss': 0.0118, 'learning_rate': 2.012e-05, 'epoch': 7.83} 60%|█████▉ | 5985/10000 [23:28:35<15:30:55, 13.91s/it] 60%|█████▉ | 5986/10000 [23:28:49<15:30:06, 13.90s/it] {'loss': 0.0137, 'learning_rate': 2.0115e-05, 'epoch': 7.84} 60%|█████▉ | 5986/10000 [23:28:49<15:30:06, 13.90s/it] 60%|█████▉ | 5987/10000 [23:29:02<15:30:42, 13.92s/it] {'loss': 0.0136, 'learning_rate': 2.0110000000000002e-05, 'epoch': 7.84} 60%|█████▉ | 5987/10000 [23:29:03<15:30:42, 13.92s/it] 60%|█████▉ | 5988/10000 [23:29:16<15:29:45, 13.90s/it] {'loss': 0.0143, 'learning_rate': 2.0105e-05, 'epoch': 7.84} 60%|█████▉ | 5988/10000 [23:29:16<15:29:45, 13.90s/it] 60%|█████▉ | 5989/10000 [23:29:30<15:27:33, 13.88s/it] {'loss': 0.0167, 'learning_rate': 2.01e-05, 'epoch': 7.84} 60%|█████▉ | 5989/10000 [23:29:30<15:27:33, 13.88s/it] 60%|█████▉ | 5990/10000 [23:29:44<15:28:13, 13.89s/it] {'loss': 0.0155, 'learning_rate': 2.0095e-05, 'epoch': 7.84} 60%|█████▉ | 5990/10000 [23:29:44<15:28:13, 13.89s/it] 60%|█████▉ | 5991/10000 [23:29:58<15:30:29, 13.93s/it] {'loss': 0.0163, 'learning_rate': 2.009e-05, 'epoch': 7.84} 60%|█████▉ | 5991/10000 [23:29:58<15:30:29, 13.93s/it] 60%|█████▉ | 5992/10000 [23:30:12<15:31:46, 13.95s/it] {'loss': 0.0122, 'learning_rate': 2.0085e-05, 'epoch': 7.84} 60%|█████▉ | 5992/10000 [23:30:12<15:31:46, 13.95s/it] 60%|█████▉ | 5993/10000 [23:30:26<15:29:34, 13.92s/it] 
{'loss': 0.0151, 'learning_rate': 2.008e-05, 'epoch': 7.84} 60%|█████▉ | 5993/10000 [23:30:26<15:29:34, 13.92s/it] 60%|█████▉ | 5994/10000 [23:30:40<15:28:55, 13.91s/it] {'loss': 0.0146, 'learning_rate': 2.0075000000000003e-05, 'epoch': 7.85} 60%|█████▉ | 5994/10000 [23:30:40<15:28:55, 13.91s/it] 60%|█████▉ | 5995/10000 [23:30:54<15:26:52, 13.89s/it] {'loss': 0.0129, 'learning_rate': 2.007e-05, 'epoch': 7.85} 60%|█████▉ | 5995/10000 [23:30:54<15:26:52, 13.89s/it] 60%|█████▉ | 5996/10000 [23:31:07<15:24:21, 13.85s/it] {'loss': 0.0141, 'learning_rate': 2.0065000000000002e-05, 'epoch': 7.85} 60%|█████▉ | 5996/10000 [23:31:07<15:24:21, 13.85s/it] 60%|█████▉ | 5997/10000 [23:31:21<15:27:00, 13.89s/it] {'loss': 0.017, 'learning_rate': 2.006e-05, 'epoch': 7.85} 60%|█████▉ | 5997/10000 [23:31:21<15:27:00, 13.89s/it] 60%|█████▉ | 5998/10000 [23:31:35<15:26:38, 13.89s/it] {'loss': 0.015, 'learning_rate': 2.0055e-05, 'epoch': 7.85} 60%|█████▉ | 5998/10000 [23:31:35<15:26:38, 13.89s/it] 60%|█████▉ | 5999/10000 [23:31:49<15:25:51, 13.88s/it] {'loss': 0.0124, 'learning_rate': 2.0050000000000003e-05, 'epoch': 7.85} 60%|█████▉ | 5999/10000 [23:31:49<15:25:51, 13.88s/it] 60%|██████ | 6000/10000 [23:32:03<15:26:00, 13.89s/it] {'loss': 0.0161, 'learning_rate': 2.0045e-05, 'epoch': 7.85} 60%|██████ | 6000/10000 [23:32:03<15:26:00, 13.89s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-04 19:50:11,454 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-6000/config.json [INFO|configuration_utils.py:364] 2024-11-04 19:50:11,455 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-6000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-04 19:51:17,961 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-6000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-04 19:51:17,964 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-6000/tokenizer_config.json 
[INFO|tokenization_utils_base.py:2201] 2024-11-04 19:51:17,965 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-6000/special_tokens_map.json
[2024-11-04 19:51:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step6000 is about to be saved!
[2024-11-04 19:51:18,055] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt
[2024-11-04 19:51:18,055] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt...
[2024-11-04 19:52:38,358] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/mp_rank_00_model_states.pt.
[2024-11-04 19:52:38,526] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-04 19:54:03,165] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-04 19:54:03,462] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-6000/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-04 19:54:03,462] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step6000 is ready now!
60%|██████ | 6001/10000 [23:36:24<97:36:25, 87.87s/it] {'loss': 0.0143, 'learning_rate': 2.004e-05, 'epoch': 7.85} 60%|██████ | 6001/10000 [23:36:24<97:36:25, 87.87s/it] 60%|██████ | 6002/10000 [23:36:37<72:54:20, 65.65s/it] {'loss': 0.0118, 'learning_rate': 2.0035e-05, 'epoch': 7.86} 60%|██████ | 6002/10000 [23:36:37<72:54:20, 65.65s/it] 60%|██████ | 6003/10000 [23:36:51<55:37:27, 50.10s/it] {'loss': 0.0171, 'learning_rate': 2.0030000000000003e-05, 'epoch': 7.86} 60%|██████ | 6003/10000 [23:36:51<55:37:27, 50.10s/it] 60%|██████ | 6004/10000 [23:37:05<43:31:40, 39.21s/it] {'loss': 0.0145, 'learning_rate': 2.0025000000000002e-05, 'epoch': 7.86} 60%|██████ | 6004/10000 [23:37:05<43:31:40, 39.21s/it] 60%|██████ | 6005/10000 [23:37:19<35:04:06, 31.60s/it] {'loss': 0.0136, 'learning_rate': 2.002e-05, 'epoch': 7.86} 60%|██████ | 6005/10000 [23:37:19<35:04:06, 31.60s/it] 60%|██████ | 6006/10000 [23:37:33<29:10:13, 26.29s/it] {'loss': 0.0136, 'learning_rate': 2.0015e-05, 'epoch': 7.86} 60%|██████ | 6006/10000 [23:37:33<29:10:13, 26.29s/it] 60%|██████ | 6007/10000 [23:37:47<25:03:57, 22.60s/it] {'loss': 0.014, 'learning_rate': 2.001e-05, 'epoch': 7.86} 60%|██████ | 6007/10000 [23:37:47<25:03:57, 22.60s/it] 60%|██████ | 6008/10000 [23:38:01<22:14:06, 20.05s/it] {'loss': 0.0111, 'learning_rate': 2.0005000000000002e-05, 'epoch': 7.86} 60%|██████ | 6008/10000 [23:38:01<22:14:06, 20.05s/it] 60%|██████ | 6009/10000 [23:38:15<20:11:10, 18.21s/it] {'loss': 0.0128, 'learning_rate': 2e-05, 'epoch': 7.87} 60%|██████ | 6009/10000 [23:38:15<20:11:10, 18.21s/it] 60%|██████ | 6010/10000 [23:38:29<18:47:28, 16.95s/it] {'loss': 0.0151, 'learning_rate': 1.9995e-05, 'epoch': 7.87} 60%|██████ | 6010/10000 [23:38:29<18:47:28, 16.95s/it] 60%|██████ | 6011/10000 [23:38:43<17:47:46, 16.06s/it] {'loss': 0.0126, 'learning_rate': 1.999e-05, 'epoch': 7.87} 60%|██████ | 6011/10000 [23:38:43<17:47:46, 16.06s/it] 60%|██████ | 6012/10000 [23:38:57<17:06:41, 15.45s/it] {'loss': 0.0128, 'learning_rate': 
1.9985000000000003e-05, 'epoch': 7.87} 60%|██████ | 6012/10000 [23:38:57<17:06:41, 15.45s/it] 60%|██████ | 6013/10000 [23:39:11<16:33:05, 14.94s/it] {'loss': 0.014, 'learning_rate': 1.9980000000000002e-05, 'epoch': 7.87} 60%|██████ | 6013/10000 [23:39:11<16:33:05, 14.94s/it] 60%|██████ | 6014/10000 [23:39:25<16:15:31, 14.68s/it] {'loss': 0.0151, 'learning_rate': 1.9975e-05, 'epoch': 7.87} 60%|██████ | 6014/10000 [23:39:25<16:15:31, 14.68s/it] 60%|██████ | 6015/10000 [23:39:39<16:00:03, 14.45s/it] {'loss': 0.0117, 'learning_rate': 1.997e-05, 'epoch': 7.87} 60%|██████ | 6015/10000 [23:39:39<16:00:03, 14.45s/it] 60%|██████ | 6016/10000 [23:39:52<15:48:23, 14.28s/it] {'loss': 0.0122, 'learning_rate': 1.9965e-05, 'epoch': 7.87} 60%|██████ | 6016/10000 [23:39:52<15:48:23, 14.28s/it] 60%|██████ | 6017/10000 [23:40:06<15:40:05, 14.16s/it] {'loss': 0.0112, 'learning_rate': 1.9960000000000002e-05, 'epoch': 7.88} 60%|██████ | 6017/10000 [23:40:06<15:40:05, 14.16s/it] 60%|██████ | 6018/10000 [23:40:20<15:35:42, 14.10s/it] {'loss': 0.0174, 'learning_rate': 1.9955e-05, 'epoch': 7.88} 60%|██████ | 6018/10000 [23:40:20<15:35:42, 14.10s/it] 60%|██████ | 6019/10000 [23:40:34<15:32:26, 14.05s/it] {'loss': 0.0151, 'learning_rate': 1.995e-05, 'epoch': 7.88} 60%|██████ | 6019/10000 [23:40:34<15:32:26, 14.05s/it] 60%|██████ | 6020/10000 [23:40:48<15:27:53, 13.99s/it] {'loss': 0.0143, 'learning_rate': 1.9945e-05, 'epoch': 7.88} 60%|██████ | 6020/10000 [23:40:48<15:27:53, 13.99s/it] 60%|██████ | 6021/10000 [23:41:02<15:30:46, 14.04s/it] {'loss': 0.0186, 'learning_rate': 1.994e-05, 'epoch': 7.88} 60%|██████ | 6021/10000 [23:41:02<15:30:46, 14.04s/it] 60%|██████ | 6022/10000 [23:41:16<15:27:44, 13.99s/it] {'loss': 0.0148, 'learning_rate': 1.9935e-05, 'epoch': 7.88} 60%|██████ | 6022/10000 [23:41:16<15:27:44, 13.99s/it] 60%|██████ | 6023/10000 [23:41:30<15:28:00, 14.00s/it] {'loss': 0.0137, 'learning_rate': 1.993e-05, 'epoch': 7.88} 60%|██████ | 6023/10000 [23:41:30<15:28:00, 14.00s/it] 
60%|██████ | 6024/10000 [23:41:44<15:27:59, 14.00s/it] {'loss': 0.0129, 'learning_rate': 1.9925000000000003e-05, 'epoch': 7.88} 60%|██████ | 6024/10000 [23:41:44<15:27:59, 14.00s/it] 60%|██████ | 6025/10000 [23:41:58<15:27:55, 14.01s/it] {'loss': 0.0143, 'learning_rate': 1.992e-05, 'epoch': 7.89} 60%|██████ | 6025/10000 [23:41:58<15:27:55, 14.01s/it] 60%|██████ | 6026/10000 [23:42:12<15:24:16, 13.95s/it] {'loss': 0.0119, 'learning_rate': 1.9915e-05, 'epoch': 7.89} 60%|██████ | 6026/10000 [23:42:12<15:24:16, 13.95s/it] 60%|██████ | 6027/10000 [23:42:26<15:24:41, 13.96s/it] {'loss': 0.0095, 'learning_rate': 1.991e-05, 'epoch': 7.89} 60%|██████ | 6027/10000 [23:42:26<15:24:41, 13.96s/it] 60%|██████ | 6028/10000 [23:42:40<15:24:47, 13.97s/it] {'loss': 0.0152, 'learning_rate': 1.9905e-05, 'epoch': 7.89} 60%|██████ | 6028/10000 [23:42:40<15:24:47, 13.97s/it] 60%|██████ | 6029/10000 [23:42:54<15:23:33, 13.95s/it] {'loss': 0.0149, 'learning_rate': 1.9900000000000003e-05, 'epoch': 7.89} 60%|██████ | 6029/10000 [23:42:54<15:23:33, 13.95s/it] 60%|██████ | 6030/10000 [23:43:08<15:26:31, 14.00s/it] {'loss': 0.0154, 'learning_rate': 1.9895e-05, 'epoch': 7.89} 60%|██████ | 6030/10000 [23:43:08<15:26:31, 14.00s/it] 60%|██████ | 6031/10000 [23:43:22<15:23:51, 13.97s/it] {'loss': 0.0136, 'learning_rate': 1.989e-05, 'epoch': 7.89} 60%|██████ | 6031/10000 [23:43:22<15:23:51, 13.97s/it] 60%|██████ | 6032/10000 [23:43:36<15:26:16, 14.01s/it] {'loss': 0.012, 'learning_rate': 1.9885e-05, 'epoch': 7.9} 60%|██████ | 6032/10000 [23:43:36<15:26:16, 14.01s/it] 60%|██████ | 6033/10000 [23:43:50<15:22:59, 13.96s/it] {'loss': 0.0124, 'learning_rate': 1.9880000000000003e-05, 'epoch': 7.9} 60%|██████ | 6033/10000 [23:43:50<15:22:59, 13.96s/it] 60%|██████ | 6034/10000 [23:44:04<15:24:25, 13.99s/it] {'loss': 0.0159, 'learning_rate': 1.9875000000000002e-05, 'epoch': 7.9} 60%|██████ | 6034/10000 [23:44:04<15:24:25, 13.99s/it] 60%|██████ | 6035/10000 [23:44:18<15:25:20, 14.00s/it] {'loss': 0.0147, 
'learning_rate': 1.987e-05, 'epoch': 7.9} 60%|██████ | 6035/10000 [23:44:18<15:25:20, 14.00s/it] 60%|██████ | 6036/10000 [23:44:32<15:23:53, 13.98s/it] {'loss': 0.0124, 'learning_rate': 1.9865e-05, 'epoch': 7.9} 60%|██████ | 6036/10000 [23:44:32<15:23:53, 13.98s/it] 60%|██████ | 6037/10000 [23:44:46<15:25:46, 14.02s/it] {'loss': 0.011, 'learning_rate': 1.986e-05, 'epoch': 7.9} 60%|██████ | 6037/10000 [23:44:46<15:25:46, 14.02s/it] 60%|██████ | 6038/10000 [23:45:00<15:23:52, 13.99s/it] {'loss': 0.0177, 'learning_rate': 1.9855000000000002e-05, 'epoch': 7.9} 60%|██████ | 6038/10000 [23:45:00<15:23:52, 13.99s/it] 60%|██████ | 6039/10000 [23:45:14<15:23:34, 13.99s/it] {'loss': 0.0138, 'learning_rate': 1.985e-05, 'epoch': 7.9} 60%|██████ | 6039/10000 [23:45:14<15:23:34, 13.99s/it] 60%|██████ | 6040/10000 [23:45:28<15:22:53, 13.98s/it] {'loss': 0.0132, 'learning_rate': 1.9845e-05, 'epoch': 7.91} 60%|██████ | 6040/10000 [23:45:28<15:22:53, 13.98s/it] 60%|██████ | 6041/10000 [23:45:42<15:21:34, 13.97s/it] {'loss': 0.0129, 'learning_rate': 1.984e-05, 'epoch': 7.91} 60%|██████ | 6041/10000 [23:45:42<15:21:34, 13.97s/it] 60%|██████ | 6042/10000 [23:45:56<15:23:26, 14.00s/it] {'loss': 0.0161, 'learning_rate': 1.9835000000000002e-05, 'epoch': 7.91} 60%|██████ | 6042/10000 [23:45:56<15:23:26, 14.00s/it] 60%|██████ | 6043/10000 [23:46:10<15:27:48, 14.07s/it] {'loss': 0.0101, 'learning_rate': 1.983e-05, 'epoch': 7.91} 60%|██████ | 6043/10000 [23:46:10<15:27:48, 14.07s/it] 60%|██████ | 6044/10000 [23:46:24<15:25:51, 14.04s/it] {'loss': 0.0188, 'learning_rate': 1.9825e-05, 'epoch': 7.91} 60%|██████ | 6044/10000 [23:46:24<15:25:51, 14.04s/it] 60%|██████ | 6045/10000 [23:46:38<15:25:23, 14.04s/it] {'loss': 0.0113, 'learning_rate': 1.982e-05, 'epoch': 7.91} 60%|██████ | 6045/10000 [23:46:38<15:25:23, 14.04s/it] 60%|██████ | 6046/10000 [23:46:52<15:24:09, 14.02s/it] {'loss': 0.0168, 'learning_rate': 1.9815e-05, 'epoch': 7.91} 60%|██████ | 6046/10000 [23:46:52<15:24:09, 14.02s/it] 
60%|██████ | 6047/10000 [23:47:06<15:22:53, 14.01s/it] {'loss': 0.0231, 'learning_rate': 1.9810000000000002e-05, 'epoch': 7.91} 60%|██████ | 6047/10000 [23:47:06<15:22:53, 14.01s/it] 60%|██████ | 6048/10000 [23:47:20<15:21:12, 13.99s/it] {'loss': 0.015, 'learning_rate': 1.9805e-05, 'epoch': 7.92} 60%|██████ | 6048/10000 [23:47:20<15:21:12, 13.99s/it] 60%|██████ | 6049/10000 [23:47:34<15:19:14, 13.96s/it] {'loss': 0.0123, 'learning_rate': 1.9800000000000004e-05, 'epoch': 7.92} 60%|██████ | 6049/10000 [23:47:34<15:19:14, 13.96s/it] 60%|██████ | 6050/10000 [23:47:48<15:16:20, 13.92s/it] {'loss': 0.0149, 'learning_rate': 1.9795e-05, 'epoch': 7.92} 60%|██████ | 6050/10000 [23:47:48<15:16:20, 13.92s/it] 61%|██████ | 6051/10000 [23:48:02<15:18:28, 13.96s/it] {'loss': 0.0122, 'learning_rate': 1.979e-05, 'epoch': 7.92} 61%|██████ | 6051/10000 [23:48:02<15:18:28, 13.96s/it] 61%|██████ | 6052/10000 [23:48:16<15:18:17, 13.96s/it] {'loss': 0.014, 'learning_rate': 1.9785e-05, 'epoch': 7.92} 61%|██████ | 6052/10000 [23:48:16<15:18:17, 13.96s/it] 61%|██████ | 6053/10000 [23:48:30<15:22:23, 14.02s/it] {'loss': 0.0168, 'learning_rate': 1.978e-05, 'epoch': 7.92} 61%|██████ | 6053/10000 [23:48:30<15:22:23, 14.02s/it] 61%|██████ | 6054/10000 [23:48:44<15:22:53, 14.03s/it] {'loss': 0.0123, 'learning_rate': 1.9775000000000003e-05, 'epoch': 7.92} 61%|██████ | 6054/10000 [23:48:44<15:22:53, 14.03s/it] 61%|██████ | 6055/10000 [23:48:58<15:19:58, 13.99s/it] {'loss': 0.0125, 'learning_rate': 1.977e-05, 'epoch': 7.93} 61%|██████ | 6055/10000 [23:48:58<15:19:58, 13.99s/it] 61%|██████ | 6056/10000 [23:49:12<15:21:29, 14.02s/it] {'loss': 0.0136, 'learning_rate': 1.9765e-05, 'epoch': 7.93} 61%|██████ | 6056/10000 [23:49:12<15:21:29, 14.02s/it] 61%|██████ | 6057/10000 [23:49:26<15:18:53, 13.98s/it] {'loss': 0.0151, 'learning_rate': 1.976e-05, 'epoch': 7.93} 61%|██████ | 6057/10000 [23:49:26<15:18:53, 13.98s/it] 61%|██████ | 6058/10000 [23:49:40<15:20:58, 14.02s/it] {'loss': 0.0144, 'learning_rate': 
1.9755e-05, 'epoch': 7.93} 61%|██████ | 6058/10000 [23:49:40<15:20:58, 14.02s/it] 61%|██████ | 6059/10000 [23:49:54<15:20:32, 14.01s/it] {'loss': 0.0135, 'learning_rate': 1.9750000000000002e-05, 'epoch': 7.93} 61%|██████ | 6059/10000 [23:49:54<15:20:32, 14.01s/it] 61%|██████ | 6060/10000 [23:50:08<15:18:55, 13.99s/it] {'loss': 0.012, 'learning_rate': 1.9744999999999998e-05, 'epoch': 7.93} 61%|██████ | 6060/10000 [23:50:08<15:18:55, 13.99s/it] 61%|██████ | 6061/10000 [23:50:22<15:19:02, 14.00s/it] {'loss': 0.0125, 'learning_rate': 1.974e-05, 'epoch': 7.93} 61%|██████ | 6061/10000 [23:50:22<15:19:02, 14.00s/it] 61%|██████ | 6062/10000 [23:50:36<15:17:52, 13.98s/it] {'loss': 0.0164, 'learning_rate': 1.9735e-05, 'epoch': 7.93} 61%|██████ | 6062/10000 [23:50:36<15:17:52, 13.98s/it] 61%|██████ | 6063/10000 [23:50:50<15:17:23, 13.98s/it] {'loss': 0.0155, 'learning_rate': 1.9730000000000003e-05, 'epoch': 7.94} 61%|██████ | 6063/10000 [23:50:50<15:17:23, 13.98s/it] 61%|██████ | 6064/10000 [23:51:04<15:15:43, 13.96s/it] {'loss': 0.0118, 'learning_rate': 1.9725000000000002e-05, 'epoch': 7.94} 61%|██████ | 6064/10000 [23:51:04<15:15:43, 13.96s/it] 61%|██████ | 6065/10000 [23:51:18<15:14:47, 13.95s/it] {'loss': 0.0127, 'learning_rate': 1.972e-05, 'epoch': 7.94} 61%|██████ | 6065/10000 [23:51:18<15:14:47, 13.95s/it] 61%|██████ | 6066/10000 [23:51:32<15:14:23, 13.95s/it] {'loss': 0.0164, 'learning_rate': 1.9715e-05, 'epoch': 7.94} 61%|██████ | 6066/10000 [23:51:32<15:14:23, 13.95s/it] 61%|██████ | 6067/10000 [23:51:45<15:12:28, 13.92s/it] {'loss': 0.0177, 'learning_rate': 1.971e-05, 'epoch': 7.94} 61%|██████ | 6067/10000 [23:51:45<15:12:28, 13.92s/it] 61%|██████ | 6068/10000 [23:52:00<15:17:39, 14.00s/it] {'loss': 0.0146, 'learning_rate': 1.9705000000000002e-05, 'epoch': 7.94} 61%|██████ | 6068/10000 [23:52:00<15:17:39, 14.00s/it] 61%|██████ | 6069/10000 [23:52:14<15:17:32, 14.00s/it] {'loss': 0.0166, 'learning_rate': 1.97e-05, 'epoch': 7.94} 61%|██████ | 6069/10000 
[23:52:14<15:17:32, 14.00s/it] 61%|██████ | 6070/10000 [23:52:28<15:15:42, 13.98s/it] {'loss': 0.0145, 'learning_rate': 1.9695e-05, 'epoch': 7.95} 61%|██████ | 6070/10000 [23:52:28<15:15:42, 13.98s/it] 61%|██████ | 6071/10000 [23:52:42<15:20:01, 14.05s/it] {'loss': 0.015, 'learning_rate': 1.969e-05, 'epoch': 7.95} 61%|██████ | 6071/10000 [23:52:42<15:20:01, 14.05s/it] 61%|██████ | 6072/10000 [23:52:56<15:19:02, 14.04s/it] {'loss': 0.0148, 'learning_rate': 1.9685000000000002e-05, 'epoch': 7.95} 61%|██████ | 6072/10000 [23:52:56<15:19:02, 14.04s/it] 61%|██████ | 6073/10000 [23:53:10<15:17:02, 14.01s/it] {'loss': 0.0157, 'learning_rate': 1.968e-05, 'epoch': 7.95} 61%|██████ | 6073/10000 [23:53:10<15:17:02, 14.01s/it] 61%|██████ | 6074/10000 [23:53:24<15:16:02, 14.00s/it] {'loss': 0.0159, 'learning_rate': 1.9675e-05, 'epoch': 7.95} 61%|██████ | 6074/10000 [23:53:24<15:16:02, 14.00s/it] 61%|██████ | 6075/10000 [23:53:38<15:16:16, 14.01s/it] {'loss': 0.0117, 'learning_rate': 1.9670000000000003e-05, 'epoch': 7.95} 61%|██████ | 6075/10000 [23:53:38<15:16:16, 14.01s/it] 61%|██████ | 6076/10000 [23:53:52<15:15:49, 14.00s/it] {'loss': 0.0144, 'learning_rate': 1.9665e-05, 'epoch': 7.95} 61%|██████ | 6076/10000 [23:53:52<15:15:49, 14.00s/it] 61%|██████ | 6077/10000 [23:54:06<15:15:31, 14.00s/it] {'loss': 0.0156, 'learning_rate': 1.966e-05, 'epoch': 7.95} 61%|██████ | 6077/10000 [23:54:06<15:15:31, 14.00s/it] 61%|██████ | 6078/10000 [23:54:20<15:14:46, 13.99s/it] {'loss': 0.0129, 'learning_rate': 1.9655e-05, 'epoch': 7.96} 61%|██████ | 6078/10000 [23:54:20<15:14:46, 13.99s/it] 61%|██████ | 6079/10000 [23:54:34<15:13:47, 13.98s/it] {'loss': 0.0146, 'learning_rate': 1.9650000000000003e-05, 'epoch': 7.96} 61%|██████ | 6079/10000 [23:54:34<15:13:47, 13.98s/it] 61%|██████ | 6080/10000 [23:54:48<15:12:44, 13.97s/it] {'loss': 0.0118, 'learning_rate': 1.9645000000000002e-05, 'epoch': 7.96} 61%|██████ | 6080/10000 [23:54:48<15:12:44, 13.97s/it] 61%|██████ | 6081/10000 [23:55:01<15:11:40, 
13.96s/it] {'loss': 0.0166, 'learning_rate': 1.9640000000000002e-05, 'epoch': 7.96} 61%|██████ | 6081/10000 [23:55:02<15:11:40, 13.96s/it] 61%|██████ | 6082/10000 [23:55:15<15:12:34, 13.98s/it] {'loss': 0.0131, 'learning_rate': 1.9635e-05, 'epoch': 7.96} 61%|██████ | 6082/10000 [23:55:16<15:12:34, 13.98s/it] 61%|██████ | 6083/10000 [23:55:29<15:11:28, 13.96s/it] {'loss': 0.0152, 'learning_rate': 1.963e-05, 'epoch': 7.96} 61%|██████ | 6083/10000 [23:55:29<15:11:28, 13.96s/it] 61%|██████ | 6084/10000 [23:55:44<15:14:01, 14.00s/it] {'loss': 0.0159, 'learning_rate': 1.9625000000000003e-05, 'epoch': 7.96} 61%|██████ | 6084/10000 [23:55:44<15:14:01, 14.00s/it] 61%|██████ | 6085/10000 [23:55:58<15:15:02, 14.02s/it] {'loss': 0.0153, 'learning_rate': 1.9620000000000002e-05, 'epoch': 7.96} 61%|██████ | 6085/10000 [23:55:58<15:15:02, 14.02s/it] 61%|██████ | 6086/10000 [23:56:11<15:11:12, 13.97s/it] {'loss': 0.0097, 'learning_rate': 1.9615e-05, 'epoch': 7.97} 61%|██████ | 6086/10000 [23:56:11<15:11:12, 13.97s/it] 61%|██████ | 6087/10000 [23:56:25<15:11:44, 13.98s/it] {'loss': 0.0149, 'learning_rate': 1.961e-05, 'epoch': 7.97} 61%|██████ | 6087/10000 [23:56:25<15:11:44, 13.98s/it] 61%|██████ | 6088/10000 [23:56:39<15:11:49, 13.99s/it] {'loss': 0.0143, 'learning_rate': 1.9605e-05, 'epoch': 7.97} 61%|██████ | 6088/10000 [23:56:39<15:11:49, 13.99s/it] 61%|██████ | 6089/10000 [23:56:53<15:12:58, 14.01s/it] {'loss': 0.0183, 'learning_rate': 1.9600000000000002e-05, 'epoch': 7.97} 61%|██████ | 6089/10000 [23:56:54<15:12:58, 14.01s/it] 61%|██████ | 6090/10000 [23:57:08<15:13:20, 14.02s/it] {'loss': 0.0143, 'learning_rate': 1.9595e-05, 'epoch': 7.97} 61%|██████ | 6090/10000 [23:57:08<15:13:20, 14.02s/it] 61%|██████ | 6091/10000 [23:57:22<15:16:09, 14.06s/it] {'loss': 0.0147, 'learning_rate': 1.959e-05, 'epoch': 7.97} 61%|██████ | 6091/10000 [23:57:22<15:16:09, 14.06s/it] 61%|██████ | 6092/10000 [23:57:36<15:14:13, 14.04s/it] {'loss': 0.0131, 'learning_rate': 1.9585e-05, 'epoch': 7.97} 
61%|██████ | 6092/10000 [23:57:36<15:14:13, 14.04s/it] 61%|██████ | 6093/10000 [23:57:50<15:12:29, 14.01s/it] {'loss': 0.0145, 'learning_rate': 1.9580000000000002e-05, 'epoch': 7.98} 61%|██████ | 6093/10000 [23:57:50<15:12:29, 14.01s/it] 61%|██████ | 6094/10000 [23:58:04<15:11:23, 14.00s/it] {'loss': 0.0137, 'learning_rate': 1.9575e-05, 'epoch': 7.98} 61%|██████ | 6094/10000 [23:58:04<15:11:23, 14.00s/it] 61%|██████ | 6095/10000 [23:58:17<15:08:39, 13.96s/it] {'loss': 0.0129, 'learning_rate': 1.957e-05, 'epoch': 7.98} 61%|██████ | 6095/10000 [23:58:18<15:08:39, 13.96s/it] 61%|██████ | 6096/10000 [23:58:31<15:09:32, 13.98s/it] {'loss': 0.0137, 'learning_rate': 1.9565e-05, 'epoch': 7.98} 61%|██████ | 6096/10000 [23:58:32<15:09:32, 13.98s/it] 61%|██████ | 6097/10000 [23:58:46<15:13:13, 14.04s/it] {'loss': 0.0148, 'learning_rate': 1.956e-05, 'epoch': 7.98} 61%|██████ | 6097/10000 [23:58:46<15:13:13, 14.04s/it] 61%|██████ | 6098/10000 [23:59:00<15:11:07, 14.01s/it] {'loss': 0.0182, 'learning_rate': 1.9555e-05, 'epoch': 7.98} 61%|██████ | 6098/10000 [23:59:00<15:11:07, 14.01s/it] 61%|██████ | 6099/10000 [23:59:14<15:13:16, 14.05s/it] {'loss': 0.012, 'learning_rate': 1.955e-05, 'epoch': 7.98} 61%|██████ | 6099/10000 [23:59:14<15:13:16, 14.05s/it] 61%|██████ | 6100/10000 [23:59:28<15:13:21, 14.05s/it] {'loss': 0.0128, 'learning_rate': 1.9545000000000003e-05, 'epoch': 7.98} 61%|██████ | 6100/10000 [23:59:28<15:13:21, 14.05s/it] 61%|██████ | 6101/10000 [23:59:42<15:08:54, 13.99s/it] {'loss': 0.0122, 'learning_rate': 1.954e-05, 'epoch': 7.99} 61%|██████ | 6101/10000 [23:59:42<15:08:54, 13.99s/it] 61%|██████ | 6102/10000 [23:59:55<15:05:58, 13.95s/it] {'loss': 0.0107, 'learning_rate': 1.9535000000000002e-05, 'epoch': 7.99} 61%|██████ | 6102/10000 [23:59:56<15:05:58, 13.95s/it] 61%|██████ | 6103/10000 [24:00:09<15:05:47, 13.95s/it] {'loss': 0.016, 'learning_rate': 1.953e-05, 'epoch': 7.99} 61%|██████ | 6103/10000 [24:00:09<15:05:47, 13.95s/it] 61%|██████ | 6104/10000 
[24:00:23<15:06:55, 13.97s/it] {'loss': 0.0164, 'learning_rate': 1.9525e-05, 'epoch': 7.99} 61%|██████ | 6104/10000 [24:00:24<15:06:55, 13.97s/it] 61%|██████ | 6105/10000 [24:00:37<15:07:31, 13.98s/it] {'loss': 0.017, 'learning_rate': 1.9520000000000003e-05, 'epoch': 7.99} 61%|██████ | 6105/10000 [24:00:38<15:07:31, 13.98s/it] 61%|██████ | 6106/10000 [24:00:51<15:06:11, 13.96s/it] {'loss': 0.0149, 'learning_rate': 1.9515e-05, 'epoch': 7.99} 61%|██████ | 6106/10000 [24:00:51<15:06:11, 13.96s/it] 61%|██████ | 6107/10000 [24:01:05<15:06:59, 13.98s/it] {'loss': 0.0156, 'learning_rate': 1.951e-05, 'epoch': 7.99} 61%|██████ | 6107/10000 [24:01:05<15:06:59, 13.98s/it] 61%|██████ | 6108/10000 [24:01:20<15:09:25, 14.02s/it] {'loss': 0.0147, 'learning_rate': 1.9505e-05, 'epoch': 7.99} 61%|██████ | 6108/10000 [24:01:20<15:09:25, 14.02s/it] 61%|██████ | 6109/10000 [24:01:34<15:08:38, 14.01s/it] {'loss': 0.0146, 'learning_rate': 1.9500000000000003e-05, 'epoch': 8.0} 61%|██████ | 6109/10000 [24:01:34<15:08:38, 14.01s/it] 61%|██████ | 6110/10000 [24:01:47<15:07:35, 14.00s/it] {'loss': 0.0149, 'learning_rate': 1.9495000000000002e-05, 'epoch': 8.0} 61%|██████ | 6110/10000 [24:01:48<15:07:35, 14.00s/it] 61%|██████ | 6111/10000 [24:02:02<15:08:02, 14.01s/it] {'loss': 0.0209, 'learning_rate': 1.949e-05, 'epoch': 8.0} 61%|██████ | 6111/10000 [24:02:02<15:08:02, 14.01s/it] 61%|██████ | 6112/10000 [24:02:14<14:41:48, 13.61s/it] {'loss': 0.0116, 'learning_rate': 1.9485e-05, 'epoch': 8.0} 61%|██████ | 6112/10000 [24:02:14<14:41:48, 13.61s/it] 61%|██████ | 6113/10000 [24:02:28<14:47:29, 13.70s/it] {'loss': 0.0082, 'learning_rate': 1.948e-05, 'epoch': 8.0} 61%|██████ | 6113/10000 [24:02:28<14:47:29, 13.70s/it] 61%|██████ | 6114/10000 [24:02:42<14:54:20, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.9475000000000002e-05, 'epoch': 8.0} 61%|██████ | 6114/10000 [24:02:42<14:54:20, 13.81s/it] 61%|██████ | 6115/10000 [24:02:56<14:56:58, 13.85s/it] {'loss': 0.0075, 'learning_rate': 1.947e-05, 
'epoch': 8.0} 61%|██████ | 6115/10000 [24:02:56<14:56:58, 13.85s/it] 61%|██████ | 6116/10000 [24:03:10<14:59:46, 13.90s/it] {'loss': 0.0091, 'learning_rate': 1.9465e-05, 'epoch': 8.01} 61%|██████ | 6116/10000 [24:03:10<14:59:46, 13.90s/it] 61%|██████ | 6117/10000 [24:03:24<15:01:52, 13.94s/it] {'loss': 0.0098, 'learning_rate': 1.946e-05, 'epoch': 8.01} 61%|██████ | 6117/10000 [24:03:24<15:01:52, 13.94s/it] 61%|██████ | 6118/10000 [24:03:38<15:05:12, 13.99s/it] {'loss': 0.0077, 'learning_rate': 1.9455000000000003e-05, 'epoch': 8.01} 61%|██████ | 6118/10000 [24:03:38<15:05:12, 13.99s/it] 61%|██████ | 6119/10000 [24:03:52<15:03:49, 13.97s/it] {'loss': 0.008, 'learning_rate': 1.9450000000000002e-05, 'epoch': 8.01} 61%|██████ | 6119/10000 [24:03:52<15:03:49, 13.97s/it] 61%|██████ | 6120/10000 [24:04:06<15:03:43, 13.98s/it] {'loss': 0.0085, 'learning_rate': 1.9445e-05, 'epoch': 8.01} 61%|██████ | 6120/10000 [24:04:06<15:03:43, 13.98s/it] 61%|██████ | 6121/10000 [24:04:20<15:03:35, 13.98s/it] {'loss': 0.0087, 'learning_rate': 1.944e-05, 'epoch': 8.01} 61%|██████ | 6121/10000 [24:04:20<15:03:35, 13.98s/it] 61%|██████ | 6122/10000 [24:04:34<15:02:24, 13.96s/it] {'loss': 0.0075, 'learning_rate': 1.9435e-05, 'epoch': 8.01} 61%|██████ | 6122/10000 [24:04:34<15:02:24, 13.96s/it] 61%|██████ | 6123/10000 [24:04:48<15:00:31, 13.94s/it] {'loss': 0.0097, 'learning_rate': 1.9430000000000002e-05, 'epoch': 8.01} 61%|██████ | 6123/10000 [24:04:48<15:00:31, 13.94s/it] 61%|██████ | 6124/10000 [24:05:02<15:02:08, 13.96s/it] {'loss': 0.0078, 'learning_rate': 1.9425e-05, 'epoch': 8.02} 61%|██████ | 6124/10000 [24:05:02<15:02:08, 13.96s/it] 61%|██████▏ | 6125/10000 [24:05:16<15:02:02, 13.97s/it] {'loss': 0.0095, 'learning_rate': 1.942e-05, 'epoch': 8.02} 61%|██████▏ | 6125/10000 [24:05:16<15:02:02, 13.97s/it] 61%|██████▏ | 6126/10000 [24:05:30<15:03:26, 13.99s/it] {'loss': 0.0099, 'learning_rate': 1.9415e-05, 'epoch': 8.02} 61%|██████▏ | 6126/10000 [24:05:30<15:03:26, 13.99s/it] 61%|██████▏ | 
6127/10000 [24:05:44<15:00:11, 13.95s/it] {'loss': 0.0086, 'learning_rate': 1.941e-05, 'epoch': 8.02} 61%|██████▏ | 6127/10000 [24:05:44<15:00:11, 13.95s/it] 61%|██████▏ | 6128/10000 [24:05:58<15:02:14, 13.98s/it] {'loss': 0.0102, 'learning_rate': 1.9405e-05, 'epoch': 8.02} 61%|██████▏ | 6128/10000 [24:05:58<15:02:14, 13.98s/it] 61%|██████▏ | 6129/10000 [24:06:12<15:00:21, 13.96s/it] {'loss': 0.0089, 'learning_rate': 1.94e-05, 'epoch': 8.02} 61%|██████▏ | 6129/10000 [24:06:12<15:00:21, 13.96s/it] 61%|██████▏ | 6130/10000 [24:06:26<15:02:27, 13.99s/it] {'loss': 0.008, 'learning_rate': 1.9395000000000003e-05, 'epoch': 8.02} 61%|██████▏ | 6130/10000 [24:06:26<15:02:27, 13.99s/it] 61%|██████▏ | 6131/10000 [24:06:40<15:03:09, 14.01s/it] {'loss': 0.0108, 'learning_rate': 1.939e-05, 'epoch': 8.02} 61%|██████▏ | 6131/10000 [24:06:40<15:03:09, 14.01s/it] 61%|██████▏ | 6132/10000 [24:06:54<15:00:34, 13.97s/it] {'loss': 0.0093, 'learning_rate': 1.9385e-05, 'epoch': 8.03} 61%|██████▏ | 6132/10000 [24:06:54<15:00:34, 13.97s/it] 61%|██████▏ | 6133/10000 [24:07:08<14:59:25, 13.96s/it] {'loss': 0.0083, 'learning_rate': 1.938e-05, 'epoch': 8.03} 61%|██████▏ | 6133/10000 [24:07:08<14:59:25, 13.96s/it] 61%|██████▏ | 6134/10000 [24:07:22<14:57:50, 13.93s/it] {'loss': 0.0086, 'learning_rate': 1.9375e-05, 'epoch': 8.03} 61%|██████▏ | 6134/10000 [24:07:22<14:57:50, 13.93s/it] 61%|██████▏ | 6135/10000 [24:07:35<14:55:23, 13.90s/it] {'loss': 0.007, 'learning_rate': 1.9370000000000003e-05, 'epoch': 8.03} 61%|██████▏ | 6135/10000 [24:07:35<14:55:23, 13.90s/it] 61%|██████▏ | 6136/10000 [24:07:49<14:55:42, 13.91s/it] {'loss': 0.0134, 'learning_rate': 1.9365e-05, 'epoch': 8.03} 61%|██████▏ | 6136/10000 [24:07:49<14:55:42, 13.91s/it] 61%|██████▏ | 6137/10000 [24:08:03<14:53:57, 13.88s/it] {'loss': 0.0086, 'learning_rate': 1.936e-05, 'epoch': 8.03} 61%|██████▏ | 6137/10000 [24:08:03<14:53:57, 13.88s/it] 61%|██████▏ | 6138/10000 [24:08:17<14:52:35, 13.87s/it] {'loss': 0.0077, 'learning_rate': 
1.9355e-05, 'epoch': 8.03} 61%|██████▏ | 6138/10000 [24:08:17<14:52:35, 13.87s/it] 61%|██████▏ | 6139/10000 [24:08:31<14:51:31, 13.85s/it] {'loss': 0.0076, 'learning_rate': 1.9350000000000003e-05, 'epoch': 8.04} 61%|██████▏ | 6139/10000 [24:08:31<14:51:31, 13.85s/it] 61%|██████▏ | 6140/10000 [24:08:45<14:51:10, 13.85s/it] {'loss': 0.0104, 'learning_rate': 1.9345000000000002e-05, 'epoch': 8.04} 61%|██████▏ | 6140/10000 [24:08:45<14:51:10, 13.85s/it] 61%|██████▏ | 6141/10000 [24:08:59<14:53:19, 13.89s/it] {'loss': 0.0085, 'learning_rate': 1.934e-05, 'epoch': 8.04} 61%|██████▏ | 6141/10000 [24:08:59<14:53:19, 13.89s/it] 61%|██████▏ | 6142/10000 [24:09:13<14:54:55, 13.92s/it] {'loss': 0.0076, 'learning_rate': 1.9335e-05, 'epoch': 8.04} 61%|██████▏ | 6142/10000 [24:09:13<14:54:55, 13.92s/it] 61%|██████▏ | 6143/10000 [24:09:27<14:53:20, 13.90s/it] {'loss': 0.0072, 'learning_rate': 1.933e-05, 'epoch': 8.04} 61%|██████▏ | 6143/10000 [24:09:27<14:53:20, 13.90s/it] 61%|██████▏ | 6144/10000 [24:09:40<14:54:05, 13.91s/it] {'loss': 0.0077, 'learning_rate': 1.9325000000000002e-05, 'epoch': 8.04} 61%|██████▏ | 6144/10000 [24:09:41<14:54:05, 13.91s/it] 61%|██████▏ | 6145/10000 [24:09:54<14:52:51, 13.90s/it] {'loss': 0.0087, 'learning_rate': 1.932e-05, 'epoch': 8.04} 61%|██████▏ | 6145/10000 [24:09:54<14:52:51, 13.90s/it] 61%|██████▏ | 6146/10000 [24:10:08<14:52:55, 13.90s/it] {'loss': 0.0074, 'learning_rate': 1.9315e-05, 'epoch': 8.04} 61%|██████▏ | 6146/10000 [24:10:08<14:52:55, 13.90s/it] 61%|██████▏ | 6147/10000 [24:10:22<14:52:32, 13.90s/it] {'loss': 0.0108, 'learning_rate': 1.931e-05, 'epoch': 8.05} 61%|██████▏ | 6147/10000 [24:10:22<14:52:32, 13.90s/it] 61%|██████▏ | 6148/10000 [24:10:36<14:51:45, 13.89s/it] {'loss': 0.0085, 'learning_rate': 1.9305000000000002e-05, 'epoch': 8.05} 61%|██████▏ | 6148/10000 [24:10:36<14:51:45, 13.89s/it] 61%|██████▏ | 6149/10000 [24:10:50<14:52:33, 13.91s/it] {'loss': 0.0082, 'learning_rate': 1.93e-05, 'epoch': 8.05} 61%|██████▏ | 6149/10000 
[24:10:50<14:52:33, 13.91s/it] 62%|██████▏ | 6150/10000 [24:11:04<14:50:17, 13.87s/it] {'loss': 0.009, 'learning_rate': 1.9295e-05, 'epoch': 8.05} 62%|██████▏ | 6150/10000 [24:11:04<14:50:17, 13.87s/it] 62%|██████▏ | 6151/10000 [24:11:18<14:50:13, 13.88s/it] {'loss': 0.0119, 'learning_rate': 1.929e-05, 'epoch': 8.05} 62%|██████▏ | 6151/10000 [24:11:18<14:50:13, 13.88s/it] 62%|██████▏ | 6152/10000 [24:11:31<14:48:38, 13.86s/it] {'loss': 0.0088, 'learning_rate': 1.9285e-05, 'epoch': 8.05} 62%|██████▏ | 6152/10000 [24:11:31<14:48:38, 13.86s/it] 62%|██████▏ | 6153/10000 [24:11:45<14:48:09, 13.85s/it] {'loss': 0.0106, 'learning_rate': 1.9280000000000002e-05, 'epoch': 8.05} 62%|██████▏ | 6153/10000 [24:11:45<14:48:09, 13.85s/it] 62%|██████▏ | 6154/10000 [24:11:59<14:45:43, 13.82s/it] {'loss': 0.0105, 'learning_rate': 1.9275e-05, 'epoch': 8.05} 62%|██████▏ | 6154/10000 [24:11:59<14:45:43, 13.82s/it] 62%|██████▏ | 6155/10000 [24:12:13<14:48:48, 13.87s/it] {'loss': 0.0083, 'learning_rate': 1.9270000000000004e-05, 'epoch': 8.06} 62%|██████▏ | 6155/10000 [24:12:13<14:48:48, 13.87s/it] 62%|██████▏ | 6156/10000 [24:12:27<14:49:26, 13.88s/it] {'loss': 0.0104, 'learning_rate': 1.9265e-05, 'epoch': 8.06} 62%|██████▏ | 6156/10000 [24:12:27<14:49:26, 13.88s/it] 62%|██████▏ | 6157/10000 [24:12:41<14:48:55, 13.88s/it] {'loss': 0.009, 'learning_rate': 1.9260000000000002e-05, 'epoch': 8.06} 62%|██████▏ | 6157/10000 [24:12:41<14:48:55, 13.88s/it] 62%|██████▏ | 6158/10000 [24:12:55<14:51:16, 13.92s/it] {'loss': 0.0087, 'learning_rate': 1.9255e-05, 'epoch': 8.06} 62%|██████▏ | 6158/10000 [24:12:55<14:51:16, 13.92s/it] 62%|██████▏ | 6159/10000 [24:13:09<14:50:07, 13.90s/it] {'loss': 0.0094, 'learning_rate': 1.925e-05, 'epoch': 8.06} 62%|██████▏ | 6159/10000 [24:13:09<14:50:07, 13.90s/it] 62%|██████▏ | 6160/10000 [24:13:23<14:48:47, 13.89s/it] {'loss': 0.0095, 'learning_rate': 1.9245000000000003e-05, 'epoch': 8.06} 62%|██████▏ | 6160/10000 [24:13:23<14:48:47, 13.89s/it] 62%|██████▏ | 
6161/10000 [24:13:37<14:52:22, 13.95s/it] {'loss': 0.009, 'learning_rate': 1.924e-05, 'epoch': 8.06} 62%|██████▏ | 6161/10000 [24:13:37<14:52:22, 13.95s/it] 62%|██████▏ | 6162/10000 [24:13:50<14:48:45, 13.89s/it] {'loss': 0.0092, 'learning_rate': 1.9235e-05, 'epoch': 8.07} 62%|██████▏ | 6162/10000 [24:13:50<14:48:45, 13.89s/it] 62%|██████▏ | 6163/10000 [24:14:04<14:48:38, 13.90s/it] {'loss': 0.0094, 'learning_rate': 1.923e-05, 'epoch': 8.07} 62%|██████▏ | 6163/10000 [24:14:04<14:48:38, 13.90s/it] 62%|██████▏ | 6164/10000 [24:14:18<14:46:08, 13.86s/it] {'loss': 0.009, 'learning_rate': 1.9225e-05, 'epoch': 8.07} 62%|██████▏ | 6164/10000 [24:14:18<14:46:08, 13.86s/it] 62%|██████▏ | 6165/10000 [24:14:32<14:48:23, 13.90s/it] {'loss': 0.0077, 'learning_rate': 1.9220000000000002e-05, 'epoch': 8.07} 62%|██████▏ | 6165/10000 [24:14:32<14:48:23, 13.90s/it] 62%|██████▏ | 6166/10000 [24:14:46<14:48:29, 13.90s/it] {'loss': 0.0088, 'learning_rate': 1.9214999999999998e-05, 'epoch': 8.07} 62%|██████▏ | 6166/10000 [24:14:46<14:48:29, 13.90s/it] 62%|██████▏ | 6167/10000 [24:15:00<14:47:28, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.921e-05, 'epoch': 8.07} 62%|██████▏ | 6167/10000 [24:15:00<14:47:28, 13.89s/it] 62%|██████▏ | 6168/10000 [24:15:14<14:47:00, 13.89s/it] {'loss': 0.0096, 'learning_rate': 1.9205e-05, 'epoch': 8.07} 62%|██████▏ | 6168/10000 [24:15:14<14:47:00, 13.89s/it] 62%|██████▏ | 6169/10000 [24:15:28<14:45:22, 13.87s/it] {'loss': 0.0113, 'learning_rate': 1.9200000000000003e-05, 'epoch': 8.07} 62%|██████▏ | 6169/10000 [24:15:28<14:45:22, 13.87s/it] 62%|██████▏ | 6170/10000 [24:15:41<14:45:20, 13.87s/it] {'loss': 0.0084, 'learning_rate': 1.9195000000000002e-05, 'epoch': 8.08} 62%|██████▏ | 6170/10000 [24:15:41<14:45:20, 13.87s/it] 62%|██████▏ | 6171/10000 [24:15:55<14:47:50, 13.91s/it] {'loss': 0.0086, 'learning_rate': 1.919e-05, 'epoch': 8.08} 62%|██████▏ | 6171/10000 [24:15:55<14:47:50, 13.91s/it] 62%|██████▏ | 6172/10000 [24:16:09<14:48:12, 13.92s/it] {'loss': 
0.0065, 'learning_rate': 1.9185e-05, 'epoch': 8.08} 62%|██████▏ | 6172/10000 [24:16:09<14:48:12, 13.92s/it] 62%|██████▏ | 6173/10000 [24:16:23<14:45:48, 13.89s/it] {'loss': 0.0087, 'learning_rate': 1.918e-05, 'epoch': 8.08} 62%|██████▏ | 6173/10000 [24:16:23<14:45:48, 13.89s/it] 62%|██████▏ | 6174/10000 [24:16:37<14:48:01, 13.93s/it] {'loss': 0.0089, 'learning_rate': 1.9175000000000002e-05, 'epoch': 8.08} 62%|██████▏ | 6174/10000 [24:16:37<14:48:01, 13.93s/it] 62%|██████▏ | 6175/10000 [24:16:51<14:48:30, 13.94s/it] {'loss': 0.0107, 'learning_rate': 1.917e-05, 'epoch': 8.08} 62%|██████▏ | 6175/10000 [24:16:51<14:48:30, 13.94s/it] 62%|██████▏ | 6176/10000 [24:17:05<14:51:28, 13.99s/it] {'loss': 0.0083, 'learning_rate': 1.9165e-05, 'epoch': 8.08} 62%|██████▏ | 6176/10000 [24:17:05<14:51:28, 13.99s/it] 62%|██████▏ | 6177/10000 [24:17:19<14:48:34, 13.95s/it] {'loss': 0.0094, 'learning_rate': 1.916e-05, 'epoch': 8.09} 62%|██████▏ | 6177/10000 [24:17:19<14:48:34, 13.95s/it] 62%|██████▏ | 6178/10000 [24:17:33<14:48:32, 13.95s/it] {'loss': 0.0093, 'learning_rate': 1.9155000000000002e-05, 'epoch': 8.09} 62%|██████▏ | 6178/10000 [24:17:33<14:48:32, 13.95s/it] 62%|██████▏ | 6179/10000 [24:17:47<14:47:32, 13.94s/it] {'loss': 0.0084, 'learning_rate': 1.915e-05, 'epoch': 8.09} 62%|██████▏ | 6179/10000 [24:17:47<14:47:32, 13.94s/it] 62%|██████▏ | 6180/10000 [24:18:01<14:45:04, 13.90s/it] {'loss': 0.0085, 'learning_rate': 1.9145e-05, 'epoch': 8.09} 62%|██████▏ | 6180/10000 [24:18:01<14:45:04, 13.90s/it] 62%|██████▏ | 6181/10000 [24:18:15<14:47:00, 13.94s/it] {'loss': 0.0086, 'learning_rate': 1.914e-05, 'epoch': 8.09} 62%|██████▏ | 6181/10000 [24:18:15<14:47:00, 13.94s/it] 62%|██████▏ | 6182/10000 [24:18:29<14:44:25, 13.90s/it] {'loss': 0.0077, 'learning_rate': 1.9135e-05, 'epoch': 8.09} 62%|██████▏ | 6182/10000 [24:18:29<14:44:25, 13.90s/it] 62%|██████▏ | 6183/10000 [24:18:43<14:45:44, 13.92s/it] {'loss': 0.0092, 'learning_rate': 1.913e-05, 'epoch': 8.09} 62%|██████▏ | 6183/10000 
[24:18:43<14:45:44, 13.92s/it] 62%|██████▏ | 6184/10000 [24:18:56<14:42:00, 13.87s/it] {'loss': 0.0092, 'learning_rate': 1.9125e-05, 'epoch': 8.09} 62%|██████▏ | 6184/10000 [24:18:56<14:42:00, 13.87s/it] 62%|██████▏ | 6185/10000 [24:19:10<14:43:22, 13.89s/it] {'loss': 0.0093, 'learning_rate': 1.9120000000000003e-05, 'epoch': 8.1} 62%|██████▏ | 6185/10000 [24:19:10<14:43:22, 13.89s/it] 62%|██████▏ | 6186/10000 [24:19:24<14:42:49, 13.89s/it] {'loss': 0.0102, 'learning_rate': 1.9115e-05, 'epoch': 8.1} 62%|██████▏ | 6186/10000 [24:19:24<14:42:49, 13.89s/it] 62%|██████▏ | 6187/10000 [24:19:38<14:45:02, 13.93s/it] {'loss': 0.0094, 'learning_rate': 1.911e-05, 'epoch': 8.1} 62%|██████▏ | 6187/10000 [24:19:38<14:45:02, 13.93s/it] 62%|██████▏ | 6188/10000 [24:19:52<14:44:45, 13.93s/it] {'loss': 0.0073, 'learning_rate': 1.9105e-05, 'epoch': 8.1} 62%|██████▏ | 6188/10000 [24:19:52<14:44:45, 13.93s/it] 62%|██████▏ | 6189/10000 [24:20:06<14:44:56, 13.93s/it] {'loss': 0.0093, 'learning_rate': 1.91e-05, 'epoch': 8.1} 62%|██████▏ | 6189/10000 [24:20:06<14:44:56, 13.93s/it] 62%|██████▏ | 6190/10000 [24:20:20<14:43:45, 13.92s/it] {'loss': 0.0112, 'learning_rate': 1.9095000000000003e-05, 'epoch': 8.1} 62%|██████▏ | 6190/10000 [24:20:20<14:43:45, 13.92s/it] 62%|██████▏ | 6191/10000 [24:20:34<14:41:24, 13.88s/it] {'loss': 0.0079, 'learning_rate': 1.909e-05, 'epoch': 8.1} 62%|██████▏ | 6191/10000 [24:20:34<14:41:24, 13.88s/it] 62%|██████▏ | 6192/10000 [24:20:48<14:40:52, 13.88s/it] {'loss': 0.0091, 'learning_rate': 1.9085e-05, 'epoch': 8.1} 62%|██████▏ | 6192/10000 [24:20:48<14:40:52, 13.88s/it] 62%|██████▏ | 6193/10000 [24:21:01<14:40:23, 13.88s/it] {'loss': 0.0099, 'learning_rate': 1.908e-05, 'epoch': 8.11} 62%|██████▏ | 6193/10000 [24:21:02<14:40:23, 13.88s/it] 62%|██████▏ | 6194/10000 [24:21:15<14:38:55, 13.86s/it] {'loss': 0.0084, 'learning_rate': 1.9075000000000003e-05, 'epoch': 8.11} 62%|██████▏ | 6194/10000 [24:21:15<14:38:55, 13.86s/it] 62%|██████▏ | 6195/10000 
[24:21:29<14:38:16, 13.85s/it] {'loss': 0.0098, 'learning_rate': 1.9070000000000002e-05, 'epoch': 8.11} 62%|██████▏ | 6195/10000 [24:21:29<14:38:16, 13.85s/it] 62%|██████▏ | 6196/10000 [24:21:43<14:40:00, 13.88s/it] {'loss': 0.0089, 'learning_rate': 1.9064999999999998e-05, 'epoch': 8.11} 62%|██████▏ | 6196/10000 [24:21:43<14:40:00, 13.88s/it] 62%|██████▏ | 6197/10000 [24:21:57<14:40:40, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.906e-05, 'epoch': 8.11} 62%|██████▏ | 6197/10000 [24:21:57<14:40:40, 13.89s/it] 62%|██████▏ | 6198/10000 [24:22:11<14:40:02, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.9055e-05, 'epoch': 8.11} 62%|██████▏ | 6198/10000 [24:22:11<14:40:02, 13.89s/it] 62%|██████▏ | 6199/10000 [24:22:25<14:41:36, 13.92s/it] {'loss': 0.009, 'learning_rate': 1.9050000000000002e-05, 'epoch': 8.11} 62%|██████▏ | 6199/10000 [24:22:25<14:41:36, 13.92s/it] 62%|██████▏ | 6200/10000 [24:22:39<14:44:06, 13.96s/it] {'loss': 0.0159, 'learning_rate': 1.9045e-05, 'epoch': 8.12} 62%|██████▏ | 6200/10000 [24:22:39<14:44:06, 13.96s/it] 62%|██████▏ | 6201/10000 [24:22:53<14:43:20, 13.95s/it] {'loss': 0.0105, 'learning_rate': 1.904e-05, 'epoch': 8.12} 62%|██████▏ | 6201/10000 [24:22:53<14:43:20, 13.95s/it] 62%|██████▏ | 6202/10000 [24:23:07<14:41:45, 13.93s/it] {'loss': 0.0095, 'learning_rate': 1.9035e-05, 'epoch': 8.12} 62%|██████▏ | 6202/10000 [24:23:07<14:41:45, 13.93s/it] 62%|██████▏ | 6203/10000 [24:23:21<14:43:59, 13.97s/it] {'loss': 0.0108, 'learning_rate': 1.903e-05, 'epoch': 8.12} 62%|██████▏ | 6203/10000 [24:23:21<14:43:59, 13.97s/it] 62%|██████▏ | 6204/10000 [24:23:35<14:42:13, 13.94s/it] {'loss': 0.0099, 'learning_rate': 1.9025e-05, 'epoch': 8.12} 62%|██████▏ | 6204/10000 [24:23:35<14:42:13, 13.94s/it] 62%|██████▏ | 6205/10000 [24:23:49<14:40:47, 13.93s/it] {'loss': 0.009, 'learning_rate': 1.902e-05, 'epoch': 8.12} 62%|██████▏ | 6205/10000 [24:23:49<14:40:47, 13.93s/it] 62%|██████▏ | 6206/10000 [24:24:02<14:40:35, 13.93s/it] {'loss': 0.0093, 'learning_rate': 
1.9015000000000003e-05, 'epoch': 8.12} 62%|██████▏ | 6206/10000 [24:24:02<14:40:35, 13.93s/it] 62%|██████▏ | 6207/10000 [24:24:16<14:41:23, 13.94s/it] {'loss': 0.0084, 'learning_rate': 1.901e-05, 'epoch': 8.12} 62%|██████▏ | 6207/10000 [24:24:16<14:41:23, 13.94s/it] 62%|██████▏ | 6208/10000 [24:24:30<14:38:54, 13.91s/it] {'loss': 0.01, 'learning_rate': 1.9005000000000002e-05, 'epoch': 8.13} 62%|██████▏ | 6208/10000 [24:24:30<14:38:54, 13.91s/it] 62%|██████▏ | 6209/10000 [24:24:44<14:38:00, 13.90s/it] {'loss': 0.0086, 'learning_rate': 1.9e-05, 'epoch': 8.13} 62%|██████▏ | 6209/10000 [24:24:44<14:38:00, 13.90s/it] 62%|██████▏ | 6210/10000 [24:24:58<14:36:51, 13.88s/it] {'loss': 0.0089, 'learning_rate': 1.8995e-05, 'epoch': 8.13} 62%|██████▏ | 6210/10000 [24:24:58<14:36:51, 13.88s/it] 62%|██████▏ | 6211/10000 [24:25:12<14:36:12, 13.88s/it] {'loss': 0.0061, 'learning_rate': 1.8990000000000003e-05, 'epoch': 8.13} 62%|██████▏ | 6211/10000 [24:25:12<14:36:12, 13.88s/it] 62%|██████▏ | 6212/10000 [24:25:26<14:36:36, 13.89s/it] {'loss': 0.0063, 'learning_rate': 1.8985e-05, 'epoch': 8.13} 62%|██████▏ | 6212/10000 [24:25:26<14:36:36, 13.89s/it] 62%|██████▏ | 6213/10000 [24:25:40<14:37:16, 13.90s/it] {'loss': 0.0072, 'learning_rate': 1.898e-05, 'epoch': 8.13} 62%|██████▏ | 6213/10000 [24:25:40<14:37:16, 13.90s/it] 62%|██████▏ | 6214/10000 [24:25:53<14:34:28, 13.86s/it] {'loss': 0.0117, 'learning_rate': 1.8975e-05, 'epoch': 8.13} 62%|██████▏ | 6214/10000 [24:25:54<14:34:28, 13.86s/it] 62%|██████▏ | 6215/10000 [24:26:07<14:35:13, 13.87s/it] {'loss': 0.0098, 'learning_rate': 1.8970000000000003e-05, 'epoch': 8.13} 62%|██████▏ | 6215/10000 [24:26:07<14:35:13, 13.87s/it] 62%|██████▏ | 6216/10000 [24:26:21<14:34:37, 13.87s/it] {'loss': 0.0071, 'learning_rate': 1.8965000000000002e-05, 'epoch': 8.14} 62%|██████▏ | 6216/10000 [24:26:21<14:34:37, 13.87s/it] 62%|██████▏ | 6217/10000 [24:26:35<14:33:05, 13.85s/it] {'loss': 0.0097, 'learning_rate': 1.896e-05, 'epoch': 8.14} 62%|██████▏ | 
6217/10000 [24:26:35<14:33:05, 13.85s/it] 62%|██████▏ | 6218/10000 [24:26:49<14:31:16, 13.82s/it] {'loss': 0.0083, 'learning_rate': 1.8955e-05, 'epoch': 8.14} 62%|██████▏ | 6218/10000 [24:26:49<14:31:16, 13.82s/it] 62%|██████▏ | 6219/10000 [24:27:03<14:32:51, 13.85s/it] {'loss': 0.0105, 'learning_rate': 1.895e-05, 'epoch': 8.14} 62%|██████▏ | 6219/10000 [24:27:03<14:32:51, 13.85s/it] 62%|██████▏ | 6220/10000 [24:27:17<14:32:55, 13.86s/it] {'loss': 0.007, 'learning_rate': 1.8945000000000002e-05, 'epoch': 8.14} 62%|██████▏ | 6220/10000 [24:27:17<14:32:55, 13.86s/it] 62%|██████▏ | 6221/10000 [24:27:30<14:31:54, 13.84s/it] {'loss': 0.009, 'learning_rate': 1.894e-05, 'epoch': 8.14} 62%|██████▏ | 6221/10000 [24:27:30<14:31:54, 13.84s/it] 62%|██████▏ | 6222/10000 [24:27:44<14:31:15, 13.84s/it] {'loss': 0.0079, 'learning_rate': 1.8935e-05, 'epoch': 8.14} 62%|██████▏ | 6222/10000 [24:27:44<14:31:15, 13.84s/it] 62%|██████▏ | 6223/10000 [24:27:58<14:31:08, 13.84s/it] {'loss': 0.0083, 'learning_rate': 1.893e-05, 'epoch': 8.15} 62%|██████▏ | 6223/10000 [24:27:58<14:31:08, 13.84s/it] 62%|██████▏ | 6224/10000 [24:28:12<14:31:28, 13.85s/it] {'loss': 0.0068, 'learning_rate': 1.8925000000000003e-05, 'epoch': 8.15} 62%|██████▏ | 6224/10000 [24:28:12<14:31:28, 13.85s/it] 62%|██████▏ | 6225/10000 [24:28:26<14:33:22, 13.88s/it] {'loss': 0.0108, 'learning_rate': 1.8920000000000002e-05, 'epoch': 8.15} 62%|██████▏ | 6225/10000 [24:28:26<14:33:22, 13.88s/it] 62%|██████▏ | 6226/10000 [24:28:40<14:33:06, 13.88s/it] {'loss': 0.0077, 'learning_rate': 1.8915e-05, 'epoch': 8.15} 62%|██████▏ | 6226/10000 [24:28:40<14:33:06, 13.88s/it] 62%|██████▏ | 6227/10000 [24:28:54<14:32:34, 13.88s/it] {'loss': 0.0098, 'learning_rate': 1.891e-05, 'epoch': 8.15} 62%|██████▏ | 6227/10000 [24:28:54<14:32:34, 13.88s/it] 62%|██████▏ | 6228/10000 [24:29:08<14:34:57, 13.92s/it] {'loss': 0.0066, 'learning_rate': 1.8905e-05, 'epoch': 8.15} 62%|██████▏ | 6228/10000 [24:29:08<14:34:57, 13.92s/it] 62%|██████▏ | 6229/10000 
[24:29:22<14:34:54, 13.92s/it] {'loss': 0.0076, 'learning_rate': 1.8900000000000002e-05, 'epoch': 8.15} 62%|██████▏ | 6229/10000 [24:29:22<14:34:54, 13.92s/it] 62%|██████▏ | 6230/10000 [24:29:35<14:33:51, 13.91s/it] {'loss': 0.0181, 'learning_rate': 1.8895e-05, 'epoch': 8.15} 62%|██████▏ | 6230/10000 [24:29:35<14:33:51, 13.91s/it] 62%|██████▏ | 6231/10000 [24:29:49<14:33:37, 13.91s/it] {'loss': 0.0089, 'learning_rate': 1.8890000000000004e-05, 'epoch': 8.16} 62%|██████▏ | 6231/10000 [24:29:49<14:33:37, 13.91s/it] 62%|██████▏ | 6232/10000 [24:30:03<14:32:16, 13.89s/it] {'loss': 0.008, 'learning_rate': 1.8885e-05, 'epoch': 8.16} 62%|██████▏ | 6232/10000 [24:30:03<14:32:16, 13.89s/it] 62%|██████▏ | 6233/10000 [24:30:17<14:33:54, 13.92s/it] {'loss': 0.0097, 'learning_rate': 1.888e-05, 'epoch': 8.16} 62%|██████▏ | 6233/10000 [24:30:17<14:33:54, 13.92s/it] 62%|██████▏ | 6234/10000 [24:30:31<14:36:36, 13.97s/it] {'loss': 0.0077, 'learning_rate': 1.8875e-05, 'epoch': 8.16} 62%|██████▏ | 6234/10000 [24:30:31<14:36:36, 13.97s/it] 62%|██████▏ | 6235/10000 [24:30:45<14:35:05, 13.95s/it] {'loss': 0.0124, 'learning_rate': 1.887e-05, 'epoch': 8.16} 62%|██████▏ | 6235/10000 [24:30:45<14:35:05, 13.95s/it] 62%|██████▏ | 6236/10000 [24:30:59<14:32:27, 13.91s/it] {'loss': 0.0079, 'learning_rate': 1.8865000000000003e-05, 'epoch': 8.16} 62%|██████▏ | 6236/10000 [24:30:59<14:32:27, 13.91s/it] 62%|██████▏ | 6237/10000 [24:31:13<14:31:48, 13.90s/it] {'loss': 0.0098, 'learning_rate': 1.886e-05, 'epoch': 8.16} 62%|██████▏ | 6237/10000 [24:31:13<14:31:48, 13.90s/it] 62%|██████▏ | 6238/10000 [24:31:27<14:30:19, 13.88s/it] {'loss': 0.0096, 'learning_rate': 1.8855e-05, 'epoch': 8.16} 62%|██████▏ | 6238/10000 [24:31:27<14:30:19, 13.88s/it] 62%|██████▏ | 6239/10000 [24:31:41<14:30:02, 13.88s/it] {'loss': 0.0087, 'learning_rate': 1.885e-05, 'epoch': 8.17} 62%|██████▏ | 6239/10000 [24:31:41<14:30:02, 13.88s/it] 62%|██████▏ | 6240/10000 [24:31:54<14:29:24, 13.87s/it] {'loss': 0.0078, 'learning_rate': 
1.8845e-05, 'epoch': 8.17} 62%|██████▏ | 6240/10000 [24:31:54<14:29:24, 13.87s/it] 62%|██████▏ | 6241/10000 [24:32:08<14:29:31, 13.88s/it] {'loss': 0.0121, 'learning_rate': 1.8840000000000003e-05, 'epoch': 8.17} 62%|██████▏ | 6241/10000 [24:32:08<14:29:31, 13.88s/it] 62%|██████▏ | 6242/10000 [24:32:22<14:33:31, 13.95s/it] {'loss': 0.0078, 'learning_rate': 1.8835e-05, 'epoch': 8.17} 62%|██████▏ | 6242/10000 [24:32:22<14:33:31, 13.95s/it] 62%|██████▏ | 6243/10000 [24:32:36<14:33:27, 13.95s/it] {'loss': 0.008, 'learning_rate': 1.883e-05, 'epoch': 8.17} 62%|██████▏ | 6243/10000 [24:32:36<14:33:27, 13.95s/it] 62%|██████▏ | 6244/10000 [24:32:50<14:31:38, 13.92s/it] {'loss': 0.008, 'learning_rate': 1.8825e-05, 'epoch': 8.17} 62%|██████▏ | 6244/10000 [24:32:50<14:31:38, 13.92s/it] 62%|██████▏ | 6245/10000 [24:33:04<14:32:12, 13.94s/it] {'loss': 0.0101, 'learning_rate': 1.8820000000000003e-05, 'epoch': 8.17} 62%|██████▏ | 6245/10000 [24:33:04<14:32:12, 13.94s/it] 62%|██████▏ | 6246/10000 [24:33:18<14:30:19, 13.91s/it] {'loss': 0.008, 'learning_rate': 1.8815000000000002e-05, 'epoch': 8.18} 62%|██████▏ | 6246/10000 [24:33:18<14:30:19, 13.91s/it] 62%|██████▏ | 6247/10000 [24:33:32<14:27:31, 13.87s/it] {'loss': 0.0096, 'learning_rate': 1.881e-05, 'epoch': 8.18} 62%|██████▏ | 6247/10000 [24:33:32<14:27:31, 13.87s/it] 62%|██████▏ | 6248/10000 [24:33:46<14:27:20, 13.87s/it] {'loss': 0.0081, 'learning_rate': 1.8805e-05, 'epoch': 8.18} 62%|██████▏ | 6248/10000 [24:33:46<14:27:20, 13.87s/it] 62%|██████▏ | 6249/10000 [24:34:00<14:26:36, 13.86s/it] {'loss': 0.0112, 'learning_rate': 1.88e-05, 'epoch': 8.18} 62%|██████▏ | 6249/10000 [24:34:00<14:26:36, 13.86s/it] 62%|██████▎ | 6250/10000 [24:34:13<14:26:03, 13.86s/it] {'loss': 0.0085, 'learning_rate': 1.8795000000000002e-05, 'epoch': 8.18} 62%|██████▎ | 6250/10000 [24:34:13<14:26:03, 13.86s/it] 63%|██████▎ | 6251/10000 [24:34:27<14:28:33, 13.90s/it] {'loss': 0.0066, 'learning_rate': 1.879e-05, 'epoch': 8.18} 63%|██████▎ | 6251/10000 
[24:34:27<14:28:33, 13.90s/it] 63%|██████▎ | 6252/10000 [24:34:41<14:28:49, 13.91s/it] {'loss': 0.0089, 'learning_rate': 1.8785e-05, 'epoch': 8.18} 63%|██████▎ | 6252/10000 [24:34:41<14:28:49, 13.91s/it] 63%|██████▎ | 6253/10000 [24:34:55<14:27:27, 13.89s/it] {'loss': 0.0079, 'learning_rate': 1.878e-05, 'epoch': 8.18} 63%|██████▎ | 6253/10000 [24:34:55<14:27:27, 13.89s/it] 63%|██████▎ | 6254/10000 [24:35:09<14:27:16, 13.89s/it] {'loss': 0.0084, 'learning_rate': 1.8775000000000002e-05, 'epoch': 8.19} 63%|██████▎ | 6254/10000 [24:35:09<14:27:16, 13.89s/it] 63%|██████▎ | 6255/10000 [24:35:23<14:28:18, 13.91s/it] {'loss': 0.0097, 'learning_rate': 1.877e-05, 'epoch': 8.19} 63%|██████▎ | 6255/10000 [24:35:23<14:28:18, 13.91s/it] 63%|██████▎ | 6256/10000 [24:35:37<14:27:30, 13.90s/it] {'loss': 0.0119, 'learning_rate': 1.8765e-05, 'epoch': 8.19} 63%|██████▎ | 6256/10000 [24:35:37<14:27:30, 13.90s/it] 63%|██████▎ | 6257/10000 [24:35:51<14:29:37, 13.94s/it] {'loss': 0.009, 'learning_rate': 1.876e-05, 'epoch': 8.19} 63%|██████▎ | 6257/10000 [24:35:51<14:29:37, 13.94s/it] 63%|██████▎ | 6258/10000 [24:36:05<14:27:20, 13.91s/it] {'loss': 0.01, 'learning_rate': 1.8755e-05, 'epoch': 8.19} 63%|██████▎ | 6258/10000 [24:36:05<14:27:20, 13.91s/it] 63%|██████▎ | 6259/10000 [24:36:19<14:25:47, 13.89s/it] {'loss': 0.0093, 'learning_rate': 1.8750000000000002e-05, 'epoch': 8.19} 63%|██████▎ | 6259/10000 [24:36:19<14:25:47, 13.89s/it] 63%|██████▎ | 6260/10000 [24:36:32<14:25:17, 13.88s/it] {'loss': 0.0085, 'learning_rate': 1.8745e-05, 'epoch': 8.19} 63%|██████▎ | 6260/10000 [24:36:33<14:25:17, 13.88s/it] 63%|██████▎ | 6261/10000 [24:36:46<14:23:43, 13.86s/it] {'loss': 0.0084, 'learning_rate': 1.8740000000000004e-05, 'epoch': 8.2} 63%|██████▎ | 6261/10000 [24:36:46<14:23:43, 13.86s/it] 63%|██████▎ | 6262/10000 [24:37:00<14:24:03, 13.87s/it] {'loss': 0.01, 'learning_rate': 1.8735e-05, 'epoch': 8.2} 63%|██████▎ | 6262/10000 [24:37:00<14:24:03, 13.87s/it] 63%|██████▎ | 6263/10000 
[24:37:14<14:21:45, 13.84s/it] {'loss': 0.0102, 'learning_rate': 1.8730000000000002e-05, 'epoch': 8.2} 63%|██████▎ | 6263/10000 [24:37:14<14:21:45, 13.84s/it] 63%|██████▎ | 6264/10000 [24:37:28<14:21:23, 13.83s/it] {'loss': 0.008, 'learning_rate': 1.8725e-05, 'epoch': 8.2} 63%|██████▎ | 6264/10000 [24:37:28<14:21:23, 13.83s/it] 63%|██████▎ | 6265/10000 [24:37:42<14:20:22, 13.82s/it] {'loss': 0.0073, 'learning_rate': 1.872e-05, 'epoch': 8.2} 63%|██████▎ | 6265/10000 [24:37:42<14:20:22, 13.82s/it] 63%|██████▎ | 6266/10000 [24:37:55<14:20:57, 13.83s/it] {'loss': 0.0091, 'learning_rate': 1.8715000000000003e-05, 'epoch': 8.2} 63%|██████▎ | 6266/10000 [24:37:55<14:20:57, 13.83s/it] 63%|██████▎ | 6267/10000 [24:38:09<14:21:39, 13.85s/it] {'loss': 0.0096, 'learning_rate': 1.871e-05, 'epoch': 8.2} 63%|██████▎ | 6267/10000 [24:38:09<14:21:39, 13.85s/it] 63%|██████▎ | 6268/10000 [24:38:23<14:20:57, 13.84s/it] {'loss': 0.0074, 'learning_rate': 1.8705e-05, 'epoch': 8.2} 63%|██████▎ | 6268/10000 [24:38:23<14:20:57, 13.84s/it] 63%|██████▎ | 6269/10000 [24:38:37<14:22:58, 13.88s/it] {'loss': 0.0081, 'learning_rate': 1.87e-05, 'epoch': 8.21} 63%|██████▎ | 6269/10000 [24:38:37<14:22:58, 13.88s/it] 63%|██████▎ | 6270/10000 [24:38:51<14:22:12, 13.87s/it] {'loss': 0.0111, 'learning_rate': 1.8695e-05, 'epoch': 8.21} 63%|██████▎ | 6270/10000 [24:38:51<14:22:12, 13.87s/it] 63%|██████▎ | 6271/10000 [24:39:05<14:24:15, 13.91s/it] {'loss': 0.0079, 'learning_rate': 1.8690000000000002e-05, 'epoch': 8.21} 63%|██████▎ | 6271/10000 [24:39:05<14:24:15, 13.91s/it] 63%|██████▎ | 6272/10000 [24:39:19<14:23:50, 13.90s/it] {'loss': 0.0107, 'learning_rate': 1.8684999999999998e-05, 'epoch': 8.21} 63%|██████▎ | 6272/10000 [24:39:19<14:23:50, 13.90s/it] 63%|██████▎ | 6273/10000 [24:39:33<14:23:24, 13.90s/it] {'loss': 0.0083, 'learning_rate': 1.868e-05, 'epoch': 8.21} 63%|██████▎ | 6273/10000 [24:39:33<14:23:24, 13.90s/it] 63%|██████▎ | 6274/10000 [24:39:47<14:23:32, 13.91s/it] {'loss': 0.0079, 
'learning_rate': 1.8675e-05, 'epoch': 8.21} 63%|██████▎ | 6274/10000 [24:39:47<14:23:32, 13.91s/it] 63%|██████▎ | 6275/10000 [24:40:01<14:23:48, 13.91s/it] {'loss': 0.0094, 'learning_rate': 1.8670000000000003e-05, 'epoch': 8.21} 63%|██████▎ | 6275/10000 [24:40:01<14:23:48, 13.91s/it] 63%|██████▎ | 6276/10000 [24:40:14<14:23:24, 13.91s/it] {'loss': 0.0104, 'learning_rate': 1.8665000000000002e-05, 'epoch': 8.21} 63%|██████▎ | 6276/10000 [24:40:15<14:23:24, 13.91s/it] 63%|██████▎ | 6277/10000 [24:40:29<14:26:00, 13.96s/it] {'loss': 0.01, 'learning_rate': 1.866e-05, 'epoch': 8.22} 63%|██████▎ | 6277/10000 [24:40:29<14:26:00, 13.96s/it] 63%|██████▎ | 6278/10000 [24:40:42<14:23:59, 13.93s/it] {'loss': 0.0077, 'learning_rate': 1.8655e-05, 'epoch': 8.22} 63%|██████▎ | 6278/10000 [24:40:42<14:23:59, 13.93s/it] 63%|██████▎ | 6279/10000 [24:40:56<14:23:37, 13.93s/it] {'loss': 0.0079, 'learning_rate': 1.865e-05, 'epoch': 8.22} 63%|██████▎ | 6279/10000 [24:40:56<14:23:37, 13.93s/it] 63%|██████▎ | 6280/10000 [24:41:10<14:21:51, 13.90s/it] {'loss': 0.0104, 'learning_rate': 1.8645000000000002e-05, 'epoch': 8.22} 63%|██████▎ | 6280/10000 [24:41:10<14:21:51, 13.90s/it] 63%|██████▎ | 6281/10000 [24:41:24<14:21:12, 13.89s/it] {'loss': 0.0076, 'learning_rate': 1.864e-05, 'epoch': 8.22} 63%|██████▎ | 6281/10000 [24:41:24<14:21:12, 13.89s/it] 63%|██████▎ | 6282/10000 [24:41:38<14:23:11, 13.93s/it] {'loss': 0.0087, 'learning_rate': 1.8635e-05, 'epoch': 8.22} 63%|██████▎ | 6282/10000 [24:41:38<14:23:11, 13.93s/it] 63%|██████▎ | 6283/10000 [24:41:52<14:21:52, 13.91s/it] {'loss': 0.0069, 'learning_rate': 1.863e-05, 'epoch': 8.22} 63%|██████▎ | 6283/10000 [24:41:52<14:21:52, 13.91s/it] 63%|██████▎ | 6284/10000 [24:42:06<14:23:31, 13.94s/it] {'loss': 0.0089, 'learning_rate': 1.8625000000000002e-05, 'epoch': 8.23} 63%|██████▎ | 6284/10000 [24:42:06<14:23:31, 13.94s/it] 63%|██████▎ | 6285/10000 [24:42:20<14:21:34, 13.92s/it] {'loss': 0.0077, 'learning_rate': 1.862e-05, 'epoch': 8.23} 63%|██████▎ 
| 6285/10000 [24:42:20<14:21:34, 13.92s/it] 63%|██████▎ | 6286/10000 [24:42:34<14:20:50, 13.91s/it] {'loss': 0.0079, 'learning_rate': 1.8615e-05, 'epoch': 8.23} 63%|██████▎ | 6286/10000 [24:42:34<14:20:50, 13.91s/it] 63%|██████▎ | 6287/10000 [24:42:48<14:19:44, 13.89s/it] {'loss': 0.008, 'learning_rate': 1.861e-05, 'epoch': 8.23} 63%|██████▎ | 6287/10000 [24:42:48<14:19:44, 13.89s/it] 63%|██████▎ | 6288/10000 [24:43:01<14:19:50, 13.90s/it] {'loss': 0.0089, 'learning_rate': 1.8605e-05, 'epoch': 8.23} 63%|██████▎ | 6288/10000 [24:43:02<14:19:50, 13.90s/it] 63%|██████▎ | 6289/10000 [24:43:15<14:18:51, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.86e-05, 'epoch': 8.23} 63%|██████▎ | 6289/10000 [24:43:15<14:18:51, 13.89s/it] 63%|██████▎ | 6290/10000 [24:43:29<14:19:05, 13.89s/it] {'loss': 0.0101, 'learning_rate': 1.8595e-05, 'epoch': 8.23} 63%|██████▎ | 6290/10000 [24:43:29<14:19:05, 13.89s/it] 63%|██████▎ | 6291/10000 [24:43:43<14:18:50, 13.89s/it] {'loss': 0.0078, 'learning_rate': 1.8590000000000003e-05, 'epoch': 8.23} 63%|██████▎ | 6291/10000 [24:43:43<14:18:50, 13.89s/it] 63%|██████▎ | 6292/10000 [24:43:57<14:18:31, 13.89s/it] {'loss': 0.0086, 'learning_rate': 1.8585e-05, 'epoch': 8.24} 63%|██████▎ | 6292/10000 [24:43:57<14:18:31, 13.89s/it] 63%|██████▎ | 6293/10000 [24:44:11<14:17:36, 13.88s/it] {'loss': 0.0091, 'learning_rate': 1.858e-05, 'epoch': 8.24} 63%|██████▎ | 6293/10000 [24:44:11<14:17:36, 13.88s/it] 63%|██████▎ | 6294/10000 [24:44:25<14:17:56, 13.89s/it] {'loss': 0.0079, 'learning_rate': 1.8575e-05, 'epoch': 8.24} 63%|██████▎ | 6294/10000 [24:44:25<14:17:56, 13.89s/it] 63%|██████▎ | 6295/10000 [24:44:39<14:17:23, 13.88s/it] {'loss': 0.0097, 'learning_rate': 1.857e-05, 'epoch': 8.24} 63%|██████▎ | 6295/10000 [24:44:39<14:17:23, 13.88s/it] 63%|██████▎ | 6296/10000 [24:44:53<14:18:27, 13.91s/it] {'loss': 0.0108, 'learning_rate': 1.8565000000000003e-05, 'epoch': 8.24} 63%|██████▎ | 6296/10000 [24:44:53<14:18:27, 13.91s/it] 63%|██████▎ | 6297/10000 
[24:45:06<14:17:55, 13.90s/it] {'loss': 0.0085, 'learning_rate': 1.856e-05, 'epoch': 8.24} 63%|██████▎ | 6297/10000 [24:45:07<14:17:55, 13.90s/it] 63%|██████▎ | 6298/10000 [24:45:20<14:17:03, 13.89s/it] {'loss': 0.0093, 'learning_rate': 1.8555e-05, 'epoch': 8.24} 63%|██████▎ | 6298/10000 [24:45:20<14:17:03, 13.89s/it] 63%|██████▎ | 6299/10000 [24:45:34<14:19:17, 13.93s/it] {'loss': 0.0142, 'learning_rate': 1.855e-05, 'epoch': 8.24} 63%|██████▎ | 6299/10000 [24:45:34<14:19:17, 13.93s/it] 63%|██████▎ | 6300/10000 [24:45:48<14:17:22, 13.90s/it] {'loss': 0.0109, 'learning_rate': 1.8545000000000003e-05, 'epoch': 8.25} 63%|██████▎ | 6300/10000 [24:45:48<14:17:22, 13.90s/it] 63%|██████▎ | 6301/10000 [24:46:02<14:16:03, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.8540000000000002e-05, 'epoch': 8.25} 63%|██████▎ | 6301/10000 [24:46:02<14:16:03, 13.89s/it] 63%|██████▎ | 6302/10000 [24:46:16<14:15:43, 13.88s/it] {'loss': 0.0079, 'learning_rate': 1.8535e-05, 'epoch': 8.25} 63%|██████▎ | 6302/10000 [24:46:16<14:15:43, 13.88s/it] 63%|██████▎ | 6303/10000 [24:46:30<14:16:36, 13.90s/it] {'loss': 0.0129, 'learning_rate': 1.853e-05, 'epoch': 8.25} 63%|██████▎ | 6303/10000 [24:46:30<14:16:36, 13.90s/it] 63%|██████▎ | 6304/10000 [24:46:44<14:15:52, 13.89s/it] {'loss': 0.0076, 'learning_rate': 1.8525e-05, 'epoch': 8.25} 63%|██████▎ | 6304/10000 [24:46:44<14:15:52, 13.89s/it] 63%|██████▎ | 6305/10000 [24:46:58<14:15:13, 13.89s/it] {'loss': 0.0086, 'learning_rate': 1.8520000000000002e-05, 'epoch': 8.25} 63%|██████▎ | 6305/10000 [24:46:58<14:15:13, 13.89s/it] 63%|██████▎ | 6306/10000 [24:47:12<14:16:56, 13.92s/it] {'loss': 0.0075, 'learning_rate': 1.8515e-05, 'epoch': 8.25} 63%|██████▎ | 6306/10000 [24:47:12<14:16:56, 13.92s/it] 63%|██████▎ | 6307/10000 [24:47:26<14:16:13, 13.91s/it] {'loss': 0.0072, 'learning_rate': 1.851e-05, 'epoch': 8.26} 63%|██████▎ | 6307/10000 [24:47:26<14:16:13, 13.91s/it] 63%|██████▎ | 6308/10000 [24:47:39<14:15:55, 13.91s/it] {'loss': 0.0075, 'learning_rate': 
1.8505e-05, 'epoch': 8.26} 63%|██████▎ | 6308/10000 [24:47:39<14:15:55, 13.91s/it] 63%|██████▎ | 6309/10000 [24:47:53<14:14:28, 13.89s/it] {'loss': 0.0088, 'learning_rate': 1.85e-05, 'epoch': 8.26} 63%|██████▎ | 6309/10000 [24:47:53<14:14:28, 13.89s/it] 63%|██████▎ | 6310/10000 [24:48:07<14:13:54, 13.88s/it] {'loss': 0.0088, 'learning_rate': 1.8495e-05, 'epoch': 8.26} 63%|██████▎ | 6310/10000 [24:48:07<14:13:54, 13.88s/it] 63%|██████▎ | 6311/10000 [24:48:21<14:14:46, 13.90s/it] {'loss': 0.008, 'learning_rate': 1.849e-05, 'epoch': 8.26} 63%|██████▎ | 6311/10000 [24:48:21<14:14:46, 13.90s/it] 63%|██████▎ | 6312/10000 [24:48:35<14:14:25, 13.90s/it] {'loss': 0.0078, 'learning_rate': 1.8485e-05, 'epoch': 8.26} 63%|██████▎ | 6312/10000 [24:48:35<14:14:25, 13.90s/it] 63%|██████▎ | 6313/10000 [24:48:49<14:13:08, 13.88s/it] {'loss': 0.0076, 'learning_rate': 1.848e-05, 'epoch': 8.26} 63%|██████▎ | 6313/10000 [24:48:49<14:13:08, 13.88s/it] 63%|██████▎ | 6314/10000 [24:49:03<14:13:27, 13.89s/it] {'loss': 0.0068, 'learning_rate': 1.8475000000000002e-05, 'epoch': 8.26} 63%|██████▎ | 6314/10000 [24:49:03<14:13:27, 13.89s/it] 63%|██████▎ | 6315/10000 [24:49:17<14:11:39, 13.87s/it] {'loss': 0.0085, 'learning_rate': 1.847e-05, 'epoch': 8.27} 63%|██████▎ | 6315/10000 [24:49:17<14:11:39, 13.87s/it] 63%|██████▎ | 6316/10000 [24:49:30<14:11:48, 13.87s/it] {'loss': 0.0081, 'learning_rate': 1.8465e-05, 'epoch': 8.27} 63%|██████▎ | 6316/10000 [24:49:30<14:11:48, 13.87s/it] 63%|██████▎ | 6317/10000 [24:49:44<14:10:20, 13.85s/it] {'loss': 0.0074, 'learning_rate': 1.846e-05, 'epoch': 8.27} 63%|██████▎ | 6317/10000 [24:49:44<14:10:20, 13.85s/it] 63%|██████▎ | 6318/10000 [24:49:58<14:11:40, 13.88s/it] {'loss': 0.0093, 'learning_rate': 1.8455e-05, 'epoch': 8.27} 63%|██████▎ | 6318/10000 [24:49:58<14:11:40, 13.88s/it] 63%|██████▎ | 6319/10000 [24:50:12<14:10:48, 13.87s/it] {'loss': 0.0087, 'learning_rate': 1.845e-05, 'epoch': 8.27} 63%|██████▎ | 6319/10000 [24:50:12<14:10:48, 13.87s/it] 
63%|██████▎ | 6320/10000 [24:50:26<14:12:04, 13.89s/it] {'loss': 0.0078, 'learning_rate': 1.8445e-05, 'epoch': 8.27} 63%|██████▎ | 6320/10000 [24:50:26<14:12:04, 13.89s/it] 63%|██████▎ | 6321/10000 [24:50:40<14:11:52, 13.89s/it] {'loss': 0.0151, 'learning_rate': 1.8440000000000003e-05, 'epoch': 8.27} 63%|██████▎ | 6321/10000 [24:50:40<14:11:52, 13.89s/it] 63%|██████▎ | 6322/10000 [24:50:54<14:10:52, 13.88s/it] {'loss': 0.0108, 'learning_rate': 1.8435000000000002e-05, 'epoch': 8.27} 63%|██████▎ | 6322/10000 [24:50:54<14:10:52, 13.88s/it] 63%|██████▎ | 6323/10000 [24:51:08<14:09:11, 13.86s/it] {'loss': 0.0083, 'learning_rate': 1.843e-05, 'epoch': 8.28} 63%|██████▎ | 6323/10000 [24:51:08<14:09:11, 13.86s/it] 63%|██████▎ | 6324/10000 [24:51:21<14:10:32, 13.88s/it] {'loss': 0.0086, 'learning_rate': 1.8425e-05, 'epoch': 8.28} 63%|██████▎ | 6324/10000 [24:51:21<14:10:32, 13.88s/it] 63%|██████▎ | 6325/10000 [24:51:35<14:10:24, 13.88s/it] {'loss': 0.0104, 'learning_rate': 1.842e-05, 'epoch': 8.28} 63%|██████▎ | 6325/10000 [24:51:35<14:10:24, 13.88s/it] 63%|██████▎ | 6326/10000 [24:51:49<14:10:12, 13.88s/it] {'loss': 0.0087, 'learning_rate': 1.8415000000000002e-05, 'epoch': 8.28} 63%|██████▎ | 6326/10000 [24:51:49<14:10:12, 13.88s/it] 63%|██████▎ | 6327/10000 [24:52:03<14:13:45, 13.95s/it] {'loss': 0.0068, 'learning_rate': 1.841e-05, 'epoch': 8.28} 63%|██████▎ | 6327/10000 [24:52:03<14:13:45, 13.95s/it] 63%|██████▎ | 6328/10000 [24:52:17<14:12:36, 13.93s/it] {'loss': 0.0099, 'learning_rate': 1.8405e-05, 'epoch': 8.28} 63%|██████▎ | 6328/10000 [24:52:17<14:12:36, 13.93s/it] 63%|██████▎ | 6329/10000 [24:52:31<14:11:33, 13.92s/it] {'loss': 0.0099, 'learning_rate': 1.84e-05, 'epoch': 8.28} 63%|██████▎ | 6329/10000 [24:52:31<14:11:33, 13.92s/it] 63%|██████▎ | 6330/10000 [24:52:45<14:15:20, 13.98s/it] {'loss': 0.0103, 'learning_rate': 1.8395000000000003e-05, 'epoch': 8.29} 63%|██████▎ | 6330/10000 [24:52:45<14:15:20, 13.98s/it] 63%|██████▎ | 6331/10000 [24:52:59<14:14:22, 
13.97s/it] {'loss': 0.009, 'learning_rate': 1.8390000000000002e-05, 'epoch': 8.29} 63%|██████▎ | 6331/10000 [24:52:59<14:14:22, 13.97s/it] 63%|██████▎ | 6332/10000 [24:53:13<14:12:57, 13.95s/it] {'loss': 0.0079, 'learning_rate': 1.8385e-05, 'epoch': 8.29} 63%|██████▎ | 6332/10000 [24:53:13<14:12:57, 13.95s/it] 63%|██████▎ | 6333/10000 [24:53:27<14:09:53, 13.91s/it] {'loss': 0.0091, 'learning_rate': 1.838e-05, 'epoch': 8.29} 63%|██████▎ | 6333/10000 [24:53:27<14:09:53, 13.91s/it] 63%|██████▎ | 6334/10000 [24:53:41<14:08:56, 13.89s/it] {'loss': 0.0101, 'learning_rate': 1.8375e-05, 'epoch': 8.29} 63%|██████▎ | 6334/10000 [24:53:41<14:08:56, 13.89s/it] 63%|██████▎ | 6335/10000 [24:53:55<14:07:08, 13.87s/it] {'loss': 0.0067, 'learning_rate': 1.8370000000000002e-05, 'epoch': 8.29} 63%|██████▎ | 6335/10000 [24:53:55<14:07:08, 13.87s/it] 63%|██████▎ | 6336/10000 [24:54:08<14:06:06, 13.86s/it] {'loss': 0.0071, 'learning_rate': 1.8365e-05, 'epoch': 8.29} 63%|██████▎ | 6336/10000 [24:54:08<14:06:06, 13.86s/it] 63%|██████▎ | 6337/10000 [24:54:22<14:07:14, 13.88s/it] {'loss': 0.0074, 'learning_rate': 1.8360000000000004e-05, 'epoch': 8.29} 63%|██████▎ | 6337/10000 [24:54:22<14:07:14, 13.88s/it] 63%|██████▎ | 6338/10000 [24:54:36<14:11:15, 13.95s/it] {'loss': 0.0076, 'learning_rate': 1.8355e-05, 'epoch': 8.3} 63%|██████▎ | 6338/10000 [24:54:36<14:11:15, 13.95s/it] 63%|██████▎ | 6339/10000 [24:54:50<14:09:11, 13.92s/it] {'loss': 0.0091, 'learning_rate': 1.8350000000000002e-05, 'epoch': 8.3} 63%|██████▎ | 6339/10000 [24:54:50<14:09:11, 13.92s/it] 63%|██████▎ | 6340/10000 [24:55:04<14:09:31, 13.93s/it] {'loss': 0.0074, 'learning_rate': 1.8345e-05, 'epoch': 8.3} 63%|██████▎ | 6340/10000 [24:55:04<14:09:31, 13.93s/it] 63%|██████▎ | 6341/10000 [24:55:18<14:07:54, 13.90s/it] {'loss': 0.0078, 'learning_rate': 1.834e-05, 'epoch': 8.3} 63%|██████▎ | 6341/10000 [24:55:18<14:07:54, 13.90s/it] 63%|██████▎ | 6342/10000 [24:55:32<14:07:34, 13.90s/it] {'loss': 0.0075, 'learning_rate': 
1.8335000000000003e-05, 'epoch': 8.3} 63%|██████▎ | 6342/10000 [24:55:32<14:07:34, 13.90s/it] 63%|██████▎ | 6343/10000 [24:55:46<14:08:52, 13.93s/it] {'loss': 0.0731, 'learning_rate': 1.833e-05, 'epoch': 8.3} 63%|██████▎ | 6343/10000 [24:55:46<14:08:52, 13.93s/it] 63%|██████▎ | 6344/10000 [24:56:00<14:07:29, 13.91s/it] {'loss': 0.0107, 'learning_rate': 1.8325e-05, 'epoch': 8.3} 63%|██████▎ | 6344/10000 [24:56:00<14:07:29, 13.91s/it] 63%|██████▎ | 6345/10000 [24:56:14<14:05:38, 13.88s/it] {'loss': 0.0079, 'learning_rate': 1.832e-05, 'epoch': 8.3} 63%|██████▎ | 6345/10000 [24:56:14<14:05:38, 13.88s/it] 63%|██████▎ | 6346/10000 [24:56:28<14:08:31, 13.93s/it] {'loss': 0.0078, 'learning_rate': 1.8315e-05, 'epoch': 8.31} 63%|██████▎ | 6346/10000 [24:56:28<14:08:31, 13.93s/it] 63%|██████▎ | 6347/10000 [24:56:42<14:05:59, 13.90s/it] {'loss': 0.0098, 'learning_rate': 1.8310000000000003e-05, 'epoch': 8.31} 63%|██████▎ | 6347/10000 [24:56:42<14:05:59, 13.90s/it] 63%|██████▎ | 6348/10000 [24:56:55<14:04:28, 13.87s/it] {'loss': 0.0089, 'learning_rate': 1.8305e-05, 'epoch': 8.31} 63%|██████▎ | 6348/10000 [24:56:55<14:04:28, 13.87s/it] 63%|██████▎ | 6349/10000 [24:57:09<14:02:12, 13.84s/it] {'loss': 0.0085, 'learning_rate': 1.83e-05, 'epoch': 8.31} 63%|██████▎ | 6349/10000 [24:57:09<14:02:12, 13.84s/it] 64%|██████▎ | 6350/10000 [24:57:23<14:04:43, 13.89s/it] {'loss': 0.0087, 'learning_rate': 1.8295e-05, 'epoch': 8.31} 64%|██████▎ | 6350/10000 [24:57:23<14:04:43, 13.89s/it] 64%|██████▎ | 6351/10000 [24:57:37<14:04:16, 13.88s/it] {'loss': 0.0066, 'learning_rate': 1.8290000000000003e-05, 'epoch': 8.31} 64%|██████▎ | 6351/10000 [24:57:37<14:04:16, 13.88s/it] 64%|██████▎ | 6352/10000 [24:57:51<14:04:52, 13.90s/it] {'loss': 0.0098, 'learning_rate': 1.8285000000000002e-05, 'epoch': 8.31} 64%|██████▎ | 6352/10000 [24:57:51<14:04:52, 13.90s/it] 64%|██████▎ | 6353/10000 [24:58:05<14:01:26, 13.84s/it] {'loss': 0.0105, 'learning_rate': 1.828e-05, 'epoch': 8.32} 64%|██████▎ | 6353/10000 
[24:58:05<14:01:26, 13.84s/it] 64%|██████▎ | 6354/10000 [24:58:18<14:00:44, 13.84s/it] {'loss': 0.0103, 'learning_rate': 1.8275e-05, 'epoch': 8.32} 64%|██████▎ | 6354/10000 [24:58:18<14:00:44, 13.84s/it] 64%|██████▎ | 6355/10000 [24:58:32<14:00:41, 13.84s/it] {'loss': 0.0064, 'learning_rate': 1.827e-05, 'epoch': 8.32} 64%|██████▎ | 6355/10000 [24:58:32<14:00:41, 13.84s/it] 64%|██████▎ | 6356/10000 [24:58:46<14:04:08, 13.90s/it] {'loss': 0.0068, 'learning_rate': 1.8265000000000002e-05, 'epoch': 8.32} 64%|██████▎ | 6356/10000 [24:58:46<14:04:08, 13.90s/it] 64%|██████▎ | 6357/10000 [24:59:00<14:04:09, 13.90s/it] {'loss': 0.0069, 'learning_rate': 1.826e-05, 'epoch': 8.32} 64%|██████▎ | 6357/10000 [24:59:00<14:04:09, 13.90s/it] 64%|██████▎ | 6358/10000 [24:59:14<14:03:03, 13.89s/it] {'loss': 0.0099, 'learning_rate': 1.8255e-05, 'epoch': 8.32} 64%|██████▎ | 6358/10000 [24:59:14<14:03:03, 13.89s/it] 64%|██████▎ | 6359/10000 [24:59:28<14:01:50, 13.87s/it] {'loss': 0.0077, 'learning_rate': 1.825e-05, 'epoch': 8.32} 64%|██████▎ | 6359/10000 [24:59:28<14:01:50, 13.87s/it] 64%|██████▎ | 6360/10000 [24:59:42<14:02:41, 13.89s/it] {'loss': 0.0087, 'learning_rate': 1.8245000000000002e-05, 'epoch': 8.32} 64%|██████▎ | 6360/10000 [24:59:42<14:02:41, 13.89s/it] 64%|██████▎ | 6361/10000 [24:59:56<14:02:04, 13.88s/it] {'loss': 0.0083, 'learning_rate': 1.824e-05, 'epoch': 8.33} 64%|██████▎ | 6361/10000 [24:59:56<14:02:04, 13.88s/it] 64%|██████▎ | 6362/10000 [25:00:10<14:04:12, 13.92s/it] {'loss': 0.0113, 'learning_rate': 1.8235e-05, 'epoch': 8.33} 64%|██████▎ | 6362/10000 [25:00:10<14:04:12, 13.92s/it] 64%|██████▎ | 6363/10000 [25:00:24<14:04:30, 13.93s/it] {'loss': 0.009, 'learning_rate': 1.823e-05, 'epoch': 8.33} 64%|██████▎ | 6363/10000 [25:00:24<14:04:30, 13.93s/it] 64%|██████▎ | 6364/10000 [25:00:37<14:00:01, 13.86s/it] {'loss': 0.0078, 'learning_rate': 1.8225e-05, 'epoch': 8.33} 64%|██████▎ | 6364/10000 [25:00:37<14:00:01, 13.86s/it] 64%|██████▎ | 6365/10000 [25:00:51<13:59:38, 
13.86s/it] {'loss': 0.0076, 'learning_rate': 1.8220000000000002e-05, 'epoch': 8.33} 64%|██████▎ | 6365/10000 [25:00:51<13:59:38, 13.86s/it] 64%|██████▎ | 6366/10000 [25:01:05<13:58:24, 13.84s/it] {'loss': 0.0094, 'learning_rate': 1.8215e-05, 'epoch': 8.33} 64%|██████▎ | 6366/10000 [25:01:05<13:58:24, 13.84s/it] 64%|██████▎ | 6367/10000 [25:01:19<13:58:43, 13.85s/it] {'loss': 0.0102, 'learning_rate': 1.8210000000000004e-05, 'epoch': 8.33} 64%|██████▎ | 6367/10000 [25:01:19<13:58:43, 13.85s/it] 64%|██████▎ | 6368/10000 [25:01:33<13:57:19, 13.83s/it] {'loss': 0.0085, 'learning_rate': 1.8205e-05, 'epoch': 8.34} 64%|██████▎ | 6368/10000 [25:01:33<13:57:19, 13.83s/it] 64%|██████▎ | 6369/10000 [25:01:47<13:57:22, 13.84s/it] {'loss': 0.0076, 'learning_rate': 1.8200000000000002e-05, 'epoch': 8.34} 64%|██████▎ | 6369/10000 [25:01:47<13:57:22, 13.84s/it] 64%|██████▎ | 6370/10000 [25:02:00<13:55:17, 13.81s/it] {'loss': 0.01, 'learning_rate': 1.8195e-05, 'epoch': 8.34} 64%|██████▎ | 6370/10000 [25:02:00<13:55:17, 13.81s/it] 64%|██████▎ | 6371/10000 [25:02:14<13:56:25, 13.83s/it] {'loss': 0.0112, 'learning_rate': 1.819e-05, 'epoch': 8.34} 64%|██████▎ | 6371/10000 [25:02:14<13:56:25, 13.83s/it] 64%|██████▎ | 6372/10000 [25:02:28<13:54:48, 13.81s/it] {'loss': 0.0098, 'learning_rate': 1.8185000000000003e-05, 'epoch': 8.34} 64%|██████▎ | 6372/10000 [25:02:28<13:54:48, 13.81s/it] 64%|██████▎ | 6373/10000 [25:02:42<13:53:38, 13.79s/it] {'loss': 0.0097, 'learning_rate': 1.818e-05, 'epoch': 8.34} 64%|██████▎ | 6373/10000 [25:02:42<13:53:38, 13.79s/it] 64%|██████▎ | 6374/10000 [25:02:56<13:55:41, 13.83s/it] {'loss': 0.0103, 'learning_rate': 1.8175e-05, 'epoch': 8.34} 64%|██████▎ | 6374/10000 [25:02:56<13:55:41, 13.83s/it] 64%|██████▍ | 6375/10000 [25:03:09<13:51:54, 13.77s/it] {'loss': 0.0115, 'learning_rate': 1.817e-05, 'epoch': 8.34} 64%|██████▍ | 6375/10000 [25:03:09<13:51:54, 13.77s/it] 64%|██████▍ | 6376/10000 [25:03:23<13:52:40, 13.79s/it] {'loss': 0.0103, 'learning_rate': 
1.8165000000000003e-05, 'epoch': 8.35} 64%|██████▍ | 6376/10000 [25:03:23<13:52:40, 13.79s/it] 64%|██████▍ | 6377/10000 [25:03:37<13:52:33, 13.79s/it] {'loss': 0.0101, 'learning_rate': 1.8160000000000002e-05, 'epoch': 8.35} 64%|██████▍ | 6377/10000 [25:03:37<13:52:33, 13.79s/it] 64%|██████▍ | 6378/10000 [25:03:51<13:52:13, 13.79s/it] {'loss': 0.0067, 'learning_rate': 1.8154999999999998e-05, 'epoch': 8.35} 64%|██████▍ | 6378/10000 [25:03:51<13:52:13, 13.79s/it] 64%|██████▍ | 6379/10000 [25:04:04<13:50:18, 13.76s/it] {'loss': 0.0085, 'learning_rate': 1.815e-05, 'epoch': 8.35} 64%|██████▍ | 6379/10000 [25:04:04<13:50:18, 13.76s/it] 64%|██████▍ | 6380/10000 [25:04:18<13:49:44, 13.75s/it] {'loss': 0.0071, 'learning_rate': 1.8145e-05, 'epoch': 8.35} 64%|██████▍ | 6380/10000 [25:04:18<13:49:44, 13.75s/it] 64%|██████▍ | 6381/10000 [25:04:32<13:50:23, 13.77s/it] {'loss': 0.0086, 'learning_rate': 1.8140000000000003e-05, 'epoch': 8.35} 64%|██████▍ | 6381/10000 [25:04:32<13:50:23, 13.77s/it] 64%|██████▍ | 6382/10000 [25:04:46<13:51:37, 13.79s/it] {'loss': 0.0095, 'learning_rate': 1.8135000000000002e-05, 'epoch': 8.35} 64%|██████▍ | 6382/10000 [25:04:46<13:51:37, 13.79s/it] 64%|██████▍ | 6383/10000 [25:05:00<13:52:08, 13.80s/it] {'loss': 0.0073, 'learning_rate': 1.813e-05, 'epoch': 8.35} 64%|██████▍ | 6383/10000 [25:05:00<13:52:08, 13.80s/it] 64%|██████▍ | 6384/10000 [25:05:13<13:52:17, 13.81s/it] {'loss': 0.0092, 'learning_rate': 1.8125e-05, 'epoch': 8.36} 64%|██████▍ | 6384/10000 [25:05:13<13:52:17, 13.81s/it] 64%|██████▍ | 6385/10000 [25:05:27<13:50:23, 13.78s/it] {'loss': 0.0088, 'learning_rate': 1.812e-05, 'epoch': 8.36} 64%|██████▍ | 6385/10000 [25:05:27<13:50:23, 13.78s/it] 64%|██████▍ | 6386/10000 [25:05:41<13:51:25, 13.80s/it] {'loss': 0.0099, 'learning_rate': 1.8115000000000002e-05, 'epoch': 8.36} 64%|██████▍ | 6386/10000 [25:05:41<13:51:25, 13.80s/it] 64%|██████▍ | 6387/10000 [25:05:55<13:51:33, 13.81s/it] {'loss': 0.0086, 'learning_rate': 1.811e-05, 'epoch': 8.36} 
64%|██████▍ | 6387/10000 [25:05:55<13:51:33, 13.81s/it] 64%|██████▍ | 6388/10000 [25:06:09<13:52:09, 13.82s/it] {'loss': 0.0079, 'learning_rate': 1.8105e-05, 'epoch': 8.36} 64%|██████▍ | 6388/10000 [25:06:09<13:52:09, 13.82s/it] 64%|██████▍ | 6389/10000 [25:06:22<13:51:11, 13.81s/it] {'loss': 0.009, 'learning_rate': 1.81e-05, 'epoch': 8.36} 64%|██████▍ | 6389/10000 [25:06:22<13:51:11, 13.81s/it] 64%|██████▍ | 6390/10000 [25:06:36<13:49:31, 13.79s/it] {'loss': 0.0107, 'learning_rate': 1.8095000000000002e-05, 'epoch': 8.36} 64%|██████▍ | 6390/10000 [25:06:36<13:49:31, 13.79s/it] 64%|██████▍ | 6391/10000 [25:06:50<13:50:26, 13.81s/it] {'loss': 0.011, 'learning_rate': 1.809e-05, 'epoch': 8.37} 64%|██████▍ | 6391/10000 [25:06:50<13:50:26, 13.81s/it] 64%|██████▍ | 6392/10000 [25:07:04<13:50:11, 13.81s/it] {'loss': 0.0072, 'learning_rate': 1.8085e-05, 'epoch': 8.37} 64%|██████▍ | 6392/10000 [25:07:04<13:50:11, 13.81s/it] 64%|██████▍ | 6393/10000 [25:07:18<13:49:46, 13.80s/it] {'loss': 0.0083, 'learning_rate': 1.808e-05, 'epoch': 8.37} 64%|██████▍ | 6393/10000 [25:07:18<13:49:46, 13.80s/it] 64%|██████▍ | 6394/10000 [25:07:31<13:46:57, 13.76s/it] {'loss': 0.0098, 'learning_rate': 1.8075e-05, 'epoch': 8.37} 64%|██████▍ | 6394/10000 [25:07:31<13:46:57, 13.76s/it] 64%|██████▍ | 6395/10000 [25:07:45<13:48:07, 13.78s/it] {'loss': 0.009, 'learning_rate': 1.807e-05, 'epoch': 8.37} 64%|██████▍ | 6395/10000 [25:07:45<13:48:07, 13.78s/it] 64%|██████▍ | 6396/10000 [25:07:59<13:48:20, 13.79s/it] {'loss': 0.0079, 'learning_rate': 1.8065e-05, 'epoch': 8.37} 64%|██████▍ | 6396/10000 [25:07:59<13:48:20, 13.79s/it] 64%|██████▍ | 6397/10000 [25:08:13<13:52:01, 13.86s/it] {'loss': 0.0095, 'learning_rate': 1.8060000000000003e-05, 'epoch': 8.37} 64%|██████▍ | 6397/10000 [25:08:13<13:52:01, 13.86s/it] 64%|██████▍ | 6398/10000 [25:08:27<13:52:22, 13.87s/it] {'loss': 0.009, 'learning_rate': 1.8055e-05, 'epoch': 8.37} 64%|██████▍ | 6398/10000 [25:08:27<13:52:22, 13.87s/it] 64%|██████▍ | 6399/10000 
[25:08:41<13:50:07, 13.83s/it] {'loss': 0.0115, 'learning_rate': 1.805e-05, 'epoch': 8.38} 64%|██████▍ | 6399/10000 [25:08:41<13:50:07, 13.83s/it] 64%|██████▍ | 6400/10000 [25:08:54<13:51:01, 13.85s/it] {'loss': 0.0091, 'learning_rate': 1.8045e-05, 'epoch': 8.38} 64%|██████▍ | 6400/10000 [25:08:54<13:51:01, 13.85s/it] 64%|██████▍ | 6401/10000 [25:09:08<13:50:32, 13.85s/it] {'loss': 0.0078, 'learning_rate': 1.804e-05, 'epoch': 8.38} 64%|██████▍ | 6401/10000 [25:09:08<13:50:32, 13.85s/it] 64%|██████▍ | 6402/10000 [25:09:22<13:52:09, 13.88s/it] {'loss': 0.0106, 'learning_rate': 1.8035000000000003e-05, 'epoch': 8.38} 64%|██████▍ | 6402/10000 [25:09:22<13:52:09, 13.88s/it] 64%|██████▍ | 6403/10000 [25:09:36<13:47:56, 13.81s/it] {'loss': 0.0084, 'learning_rate': 1.803e-05, 'epoch': 8.38} 64%|██████▍ | 6403/10000 [25:09:36<13:47:56, 13.81s/it] 64%|██████▍ | 6404/10000 [25:09:50<13:48:24, 13.82s/it] {'loss': 0.0101, 'learning_rate': 1.8025e-05, 'epoch': 8.38} 64%|██████▍ | 6404/10000 [25:09:50<13:48:24, 13.82s/it] 64%|██████▍ | 6405/10000 [25:10:03<13:47:32, 13.81s/it] {'loss': 0.0111, 'learning_rate': 1.802e-05, 'epoch': 8.38} 64%|██████▍ | 6405/10000 [25:10:04<13:47:32, 13.81s/it] 64%|██████▍ | 6406/10000 [25:10:17<13:47:54, 13.82s/it] {'loss': 0.0088, 'learning_rate': 1.8015000000000003e-05, 'epoch': 8.38} 64%|██████▍ | 6406/10000 [25:10:17<13:47:54, 13.82s/it] 64%|██████▍ | 6407/10000 [25:10:31<13:46:43, 13.81s/it] {'loss': 0.0114, 'learning_rate': 1.8010000000000002e-05, 'epoch': 8.39} 64%|██████▍ | 6407/10000 [25:10:31<13:46:43, 13.81s/it] 64%|██████▍ | 6408/10000 [25:10:45<13:46:20, 13.80s/it] {'loss': 0.0057, 'learning_rate': 1.8005e-05, 'epoch': 8.39} 64%|██████▍ | 6408/10000 [25:10:45<13:46:20, 13.80s/it] 64%|██████▍ | 6409/10000 [25:10:59<13:44:08, 13.77s/it] {'loss': 0.011, 'learning_rate': 1.8e-05, 'epoch': 8.39} 64%|██████▍ | 6409/10000 [25:10:59<13:44:08, 13.77s/it] 64%|██████▍ | 6410/10000 [25:11:13<13:46:21, 13.81s/it] {'loss': 0.0099, 'learning_rate': 
1.7995e-05, 'epoch': 8.39} 64%|██████▍ | 6410/10000 [25:11:13<13:46:21, 13.81s/it] 64%|██████▍ | 6411/10000 [25:11:26<13:46:41, 13.82s/it] {'loss': 0.0069, 'learning_rate': 1.7990000000000002e-05, 'epoch': 8.39} 64%|██████▍ | 6411/10000 [25:11:26<13:46:41, 13.82s/it] 64%|██████▍ | 6412/10000 [25:11:40<13:46:13, 13.82s/it] {'loss': 0.0102, 'learning_rate': 1.7985e-05, 'epoch': 8.39} 64%|██████▍ | 6412/10000 [25:11:40<13:46:13, 13.82s/it] 64%|██████▍ | 6413/10000 [25:11:54<13:45:29, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.798e-05, 'epoch': 8.39} 64%|██████▍ | 6413/10000 [25:11:54<13:45:29, 13.81s/it] 64%|██████▍ | 6414/10000 [25:12:08<13:46:20, 13.83s/it] {'loss': 0.0094, 'learning_rate': 1.7975e-05, 'epoch': 8.4} 64%|██████▍ | 6414/10000 [25:12:08<13:46:20, 13.83s/it] 64%|██████▍ | 6415/10000 [25:12:22<13:46:09, 13.83s/it] {'loss': 0.0083, 'learning_rate': 1.797e-05, 'epoch': 8.4} 64%|██████▍ | 6415/10000 [25:12:22<13:46:09, 13.83s/it] 64%|██████▍ | 6416/10000 [25:12:35<13:45:57, 13.83s/it] {'loss': 0.0102, 'learning_rate': 1.7965e-05, 'epoch': 8.4} 64%|██████▍ | 6416/10000 [25:12:35<13:45:57, 13.83s/it] 64%|██████▍ | 6417/10000 [25:12:49<13:46:51, 13.85s/it] {'loss': 0.0072, 'learning_rate': 1.796e-05, 'epoch': 8.4} 64%|██████▍ | 6417/10000 [25:12:49<13:46:51, 13.85s/it] 64%|██████▍ | 6418/10000 [25:13:03<13:44:43, 13.81s/it] {'loss': 0.0101, 'learning_rate': 1.7955e-05, 'epoch': 8.4} 64%|██████▍ | 6418/10000 [25:13:03<13:44:43, 13.81s/it] 64%|██████▍ | 6419/10000 [25:13:17<13:46:48, 13.85s/it] {'loss': 0.009, 'learning_rate': 1.795e-05, 'epoch': 8.4} 64%|██████▍ | 6419/10000 [25:13:17<13:46:48, 13.85s/it] 64%|██████▍ | 6420/10000 [25:13:31<13:46:02, 13.84s/it] {'loss': 0.0084, 'learning_rate': 1.7945000000000002e-05, 'epoch': 8.4} 64%|██████▍ | 6420/10000 [25:13:31<13:46:02, 13.84s/it] 64%|██████▍ | 6421/10000 [25:13:45<13:45:07, 13.83s/it] {'loss': 0.0095, 'learning_rate': 1.794e-05, 'epoch': 8.4} 64%|██████▍ | 6421/10000 [25:13:45<13:45:07, 13.83s/it] 
64%|██████▍ | 6422/10000 [25:13:58<13:43:40, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.7935e-05, 'epoch': 8.41} 64%|██████▍ | 6422/10000 [25:13:58<13:43:40, 13.81s/it] 64%|██████▍ | 6423/10000 [25:14:12<13:44:06, 13.82s/it] {'loss': 0.0106, 'learning_rate': 1.793e-05, 'epoch': 8.41} 64%|██████▍ | 6423/10000 [25:14:12<13:44:06, 13.82s/it] 64%|██████▍ | 6424/10000 [25:14:26<13:41:49, 13.79s/it] {'loss': 0.0075, 'learning_rate': 1.7925e-05, 'epoch': 8.41} 64%|██████▍ | 6424/10000 [25:14:26<13:41:49, 13.79s/it] 64%|██████▍ | 6425/10000 [25:14:40<13:42:34, 13.81s/it] {'loss': 0.0092, 'learning_rate': 1.792e-05, 'epoch': 8.41} 64%|██████▍ | 6425/10000 [25:14:40<13:42:34, 13.81s/it] 64%|██████▍ | 6426/10000 [25:14:54<13:41:25, 13.79s/it] {'loss': 0.01, 'learning_rate': 1.7915e-05, 'epoch': 8.41} 64%|██████▍ | 6426/10000 [25:14:54<13:41:25, 13.79s/it] 64%|██████▍ | 6427/10000 [25:15:07<13:41:30, 13.80s/it] {'loss': 0.0093, 'learning_rate': 1.7910000000000003e-05, 'epoch': 8.41} 64%|██████▍ | 6427/10000 [25:15:07<13:41:30, 13.80s/it] 64%|██████▍ | 6428/10000 [25:15:21<13:39:29, 13.77s/it] {'loss': 0.0091, 'learning_rate': 1.7905e-05, 'epoch': 8.41} 64%|██████▍ | 6428/10000 [25:15:21<13:39:29, 13.77s/it] 64%|██████▍ | 6429/10000 [25:15:35<13:40:42, 13.79s/it] {'loss': 0.0071, 'learning_rate': 1.79e-05, 'epoch': 8.41} 64%|██████▍ | 6429/10000 [25:15:35<13:40:42, 13.79s/it] 64%|██████▍ | 6430/10000 [25:15:49<13:40:48, 13.80s/it] {'loss': 0.0077, 'learning_rate': 1.7895e-05, 'epoch': 8.42} 64%|██████▍ | 6430/10000 [25:15:49<13:40:48, 13.80s/it] 64%|██████▍ | 6431/10000 [25:16:03<13:42:29, 13.83s/it] {'loss': 0.0082, 'learning_rate': 1.789e-05, 'epoch': 8.42} 64%|██████▍ | 6431/10000 [25:16:03<13:42:29, 13.83s/it] 64%|██████▍ | 6432/10000 [25:16:16<13:40:18, 13.79s/it] {'loss': 0.0116, 'learning_rate': 1.7885000000000002e-05, 'epoch': 8.42} 64%|██████▍ | 6432/10000 [25:16:16<13:40:18, 13.79s/it] 64%|██████▍ | 6433/10000 [25:16:30<13:37:42, 13.75s/it] {'loss': 0.0099, 
'learning_rate': 1.7879999999999998e-05, 'epoch': 8.42} 64%|██████▍ | 6433/10000 [25:16:30<13:37:42, 13.75s/it] 64%|██████▍ | 6434/10000 [25:16:44<13:38:47, 13.78s/it] {'loss': 0.0081, 'learning_rate': 1.7875e-05, 'epoch': 8.42} 64%|██████▍ | 6434/10000 [25:16:44<13:38:47, 13.78s/it] 64%|██████▍ | 6435/10000 [25:16:58<13:37:47, 13.76s/it] {'loss': 0.0103, 'learning_rate': 1.787e-05, 'epoch': 8.42} 64%|██████▍ | 6435/10000 [25:16:58<13:37:47, 13.76s/it] 64%|██████▍ | 6436/10000 [25:17:11<13:38:06, 13.77s/it] {'loss': 0.0085, 'learning_rate': 1.7865000000000003e-05, 'epoch': 8.42} 64%|██████▍ | 6436/10000 [25:17:11<13:38:06, 13.77s/it] 64%|██████▍ | 6437/10000 [25:17:25<13:37:13, 13.76s/it] {'loss': 0.0083, 'learning_rate': 1.7860000000000002e-05, 'epoch': 8.43} 64%|██████▍ | 6437/10000 [25:17:25<13:37:13, 13.76s/it] 64%|██████▍ | 6438/10000 [25:17:39<13:37:35, 13.77s/it] {'loss': 0.0054, 'learning_rate': 1.7855e-05, 'epoch': 8.43} 64%|██████▍ | 6438/10000 [25:17:39<13:37:35, 13.77s/it] 64%|██████▍ | 6439/10000 [25:17:53<13:41:21, 13.84s/it] {'loss': 0.0089, 'learning_rate': 1.785e-05, 'epoch': 8.43} 64%|██████▍ | 6439/10000 [25:17:53<13:41:21, 13.84s/it] 64%|██████▍ | 6440/10000 [25:18:07<13:42:08, 13.86s/it] {'loss': 0.0085, 'learning_rate': 1.7845e-05, 'epoch': 8.43} 64%|██████▍ | 6440/10000 [25:18:07<13:42:08, 13.86s/it] 64%|██████▍ | 6441/10000 [25:18:21<13:42:11, 13.86s/it] {'loss': 0.0098, 'learning_rate': 1.7840000000000002e-05, 'epoch': 8.43} 64%|██████▍ | 6441/10000 [25:18:21<13:42:11, 13.86s/it] 64%|██████▍ | 6442/10000 [25:18:35<13:42:30, 13.87s/it] {'loss': 0.0095, 'learning_rate': 1.7835e-05, 'epoch': 8.43} 64%|██████▍ | 6442/10000 [25:18:35<13:42:30, 13.87s/it] 64%|██████▍ | 6443/10000 [25:18:48<13:42:27, 13.87s/it] {'loss': 0.0089, 'learning_rate': 1.783e-05, 'epoch': 8.43} 64%|██████▍ | 6443/10000 [25:18:48<13:42:27, 13.87s/it] 64%|██████▍ | 6444/10000 [25:19:02<13:40:47, 13.85s/it] {'loss': 0.008, 'learning_rate': 1.7825e-05, 'epoch': 8.43} 
64%|██████▍ | 6444/10000 [25:19:02<13:40:47, 13.85s/it] 64%|██████▍ | 6445/10000 [25:19:16<13:40:57, 13.86s/it] {'loss': 0.0091, 'learning_rate': 1.7820000000000002e-05, 'epoch': 8.44} 64%|██████▍ | 6445/10000 [25:19:16<13:40:57, 13.86s/it] 64%|██████▍ | 6446/10000 [25:19:30<13:40:23, 13.85s/it] {'loss': 0.0085, 'learning_rate': 1.7815e-05, 'epoch': 8.44} 64%|██████▍ | 6446/10000 [25:19:30<13:40:23, 13.85s/it] 64%|██████▍ | 6447/10000 [25:19:44<13:40:44, 13.86s/it] {'loss': 0.0088, 'learning_rate': 1.781e-05, 'epoch': 8.44} 64%|██████▍ | 6447/10000 [25:19:44<13:40:44, 13.86s/it] 64%|██████▍ | 6448/10000 [25:19:58<13:39:02, 13.84s/it] {'loss': 0.0083, 'learning_rate': 1.7805000000000003e-05, 'epoch': 8.44} 64%|██████▍ | 6448/10000 [25:19:58<13:39:02, 13.84s/it] 64%|██████▍ | 6449/10000 [25:20:11<13:38:18, 13.83s/it] {'loss': 0.0076, 'learning_rate': 1.78e-05, 'epoch': 8.44} 64%|██████▍ | 6449/10000 [25:20:11<13:38:18, 13.83s/it] 64%|██████▍ | 6450/10000 [25:20:25<13:38:02, 13.83s/it] {'loss': 0.0102, 'learning_rate': 1.7795e-05, 'epoch': 8.44} 64%|██████▍ | 6450/10000 [25:20:25<13:38:02, 13.83s/it] 65%|██████▍ | 6451/10000 [25:20:39<13:37:07, 13.81s/it] {'loss': 0.0083, 'learning_rate': 1.779e-05, 'epoch': 8.44} 65%|██████▍ | 6451/10000 [25:20:39<13:37:07, 13.81s/it] 65%|██████▍ | 6452/10000 [25:20:53<13:38:23, 13.84s/it] {'loss': 0.0102, 'learning_rate': 1.7785e-05, 'epoch': 8.45} 65%|██████▍ | 6452/10000 [25:20:53<13:38:23, 13.84s/it] 65%|██████▍ | 6453/10000 [25:21:07<13:38:23, 13.84s/it] {'loss': 0.0075, 'learning_rate': 1.7780000000000003e-05, 'epoch': 8.45} 65%|██████▍ | 6453/10000 [25:21:07<13:38:23, 13.84s/it] 65%|██████▍ | 6454/10000 [25:21:21<13:39:25, 13.87s/it] {'loss': 0.0097, 'learning_rate': 1.7775e-05, 'epoch': 8.45} 65%|██████▍ | 6454/10000 [25:21:21<13:39:25, 13.87s/it] 65%|██████▍ | 6455/10000 [25:21:35<13:40:28, 13.89s/it] {'loss': 0.0092, 'learning_rate': 1.777e-05, 'epoch': 8.45} 65%|██████▍ | 6455/10000 [25:21:35<13:40:28, 13.89s/it] 
65%|██████▍ | 6456/10000 [25:21:49<13:40:37, 13.89s/it] {'loss': 0.0081, 'learning_rate': 1.7765e-05, 'epoch': 8.45} 65%|██████▍ | 6456/10000 [25:21:49<13:40:37, 13.89s/it] 65%|██████▍ | 6457/10000 [25:22:02<13:40:13, 13.89s/it] {'loss': 0.0091, 'learning_rate': 1.7760000000000003e-05, 'epoch': 8.45} 65%|██████▍ | 6457/10000 [25:22:02<13:40:13, 13.89s/it] 65%|██████▍ | 6458/10000 [25:22:16<13:38:53, 13.87s/it] {'loss': 0.0082, 'learning_rate': 1.7755000000000002e-05, 'epoch': 8.45} 65%|██████▍ | 6458/10000 [25:22:16<13:38:53, 13.87s/it] 65%|██████▍ | 6459/10000 [25:22:30<13:39:47, 13.89s/it] {'loss': 0.0089, 'learning_rate': 1.775e-05, 'epoch': 8.45} 65%|██████▍ | 6459/10000 [25:22:30<13:39:47, 13.89s/it] 65%|██████▍ | 6460/10000 [25:22:44<13:38:02, 13.87s/it] {'loss': 0.0088, 'learning_rate': 1.7745e-05, 'epoch': 8.46} 65%|██████▍ | 6460/10000 [25:22:44<13:38:02, 13.87s/it] 65%|██████▍ | 6461/10000 [25:22:58<13:38:08, 13.87s/it] {'loss': 0.0095, 'learning_rate': 1.774e-05, 'epoch': 8.46} 65%|██████▍ | 6461/10000 [25:22:58<13:38:08, 13.87s/it] 65%|██████▍ | 6462/10000 [25:23:12<13:39:41, 13.90s/it] {'loss': 0.0074, 'learning_rate': 1.7735000000000002e-05, 'epoch': 8.46} 65%|██████▍ | 6462/10000 [25:23:12<13:39:41, 13.90s/it] 65%|██████▍ | 6463/10000 [25:23:26<13:38:36, 13.89s/it] {'loss': 0.009, 'learning_rate': 1.773e-05, 'epoch': 8.46} 65%|██████▍ | 6463/10000 [25:23:26<13:38:36, 13.89s/it] 65%|██████▍ | 6464/10000 [25:23:40<13:38:32, 13.89s/it] {'loss': 0.0082, 'learning_rate': 1.7725e-05, 'epoch': 8.46} 65%|██████▍ | 6464/10000 [25:23:40<13:38:32, 13.89s/it] 65%|██████▍ | 6465/10000 [25:23:53<13:37:30, 13.88s/it] {'loss': 0.0106, 'learning_rate': 1.772e-05, 'epoch': 8.46} 65%|██████▍ | 6465/10000 [25:23:53<13:37:30, 13.88s/it] 65%|██████▍ | 6466/10000 [25:24:07<13:36:46, 13.87s/it] {'loss': 0.0076, 'learning_rate': 1.7715000000000002e-05, 'epoch': 8.46} 65%|██████▍ | 6466/10000 [25:24:07<13:36:46, 13.87s/it] 65%|██████▍ | 6467/10000 [25:24:21<13:38:05, 
13.89s/it] {'loss': 0.0091, 'learning_rate': 1.771e-05, 'epoch': 8.46} 65%|██████▍ | 6467/10000 [25:24:21<13:38:05, 13.89s/it] 65%|██████▍ | 6468/10000 [25:24:35<13:34:20, 13.83s/it] {'loss': 0.0118, 'learning_rate': 1.7705e-05, 'epoch': 8.47} 65%|██████▍ | 6468/10000 [25:24:35<13:34:20, 13.83s/it] 65%|██████▍ | 6469/10000 [25:24:49<13:33:29, 13.82s/it] {'loss': 0.009, 'learning_rate': 1.77e-05, 'epoch': 8.47} 65%|██████▍ | 6469/10000 [25:24:49<13:33:29, 13.82s/it] 65%|██████▍ | 6470/10000 [25:25:03<13:33:43, 13.83s/it] {'loss': 0.0086, 'learning_rate': 1.7695e-05, 'epoch': 8.47} 65%|██████▍ | 6470/10000 [25:25:03<13:33:43, 13.83s/it] 65%|██████▍ | 6471/10000 [25:25:17<13:34:47, 13.85s/it] {'loss': 0.0082, 'learning_rate': 1.7690000000000002e-05, 'epoch': 8.47} 65%|██████▍ | 6471/10000 [25:25:17<13:34:47, 13.85s/it] 65%|██████▍ | 6472/10000 [25:25:30<13:35:01, 13.86s/it] {'loss': 0.0123, 'learning_rate': 1.7685e-05, 'epoch': 8.47} 65%|██████▍ | 6472/10000 [25:25:30<13:35:01, 13.86s/it] 65%|██████▍ | 6473/10000 [25:25:44<13:33:38, 13.84s/it] {'loss': 0.0094, 'learning_rate': 1.7680000000000004e-05, 'epoch': 8.47} 65%|██████▍ | 6473/10000 [25:25:44<13:33:38, 13.84s/it] 65%|██████▍ | 6474/10000 [25:25:58<13:32:53, 13.83s/it] {'loss': 0.0087, 'learning_rate': 1.7675e-05, 'epoch': 8.47} 65%|██████▍ | 6474/10000 [25:25:58<13:32:53, 13.83s/it] 65%|██████▍ | 6475/10000 [25:26:12<13:33:01, 13.84s/it] {'loss': 0.0077, 'learning_rate': 1.7670000000000002e-05, 'epoch': 8.48} 65%|██████▍ | 6475/10000 [25:26:12<13:33:01, 13.84s/it] 65%|██████▍ | 6476/10000 [25:26:26<13:30:55, 13.81s/it] {'loss': 0.011, 'learning_rate': 1.7665e-05, 'epoch': 8.48} 65%|██████▍ | 6476/10000 [25:26:26<13:30:55, 13.81s/it] 65%|██████▍ | 6477/10000 [25:26:39<13:31:44, 13.82s/it] {'loss': 0.0089, 'learning_rate': 1.766e-05, 'epoch': 8.48} 65%|██████▍ | 6477/10000 [25:26:39<13:31:44, 13.82s/it] 65%|██████▍ | 6478/10000 [25:26:53<13:31:03, 13.82s/it] {'loss': 0.0073, 'learning_rate': 
1.7655000000000003e-05, 'epoch': 8.48} 65%|██████▍ | 6478/10000 [25:26:53<13:31:03, 13.82s/it] 65%|██████▍ | 6479/10000 [25:27:07<13:28:24, 13.78s/it] {'loss': 0.0088, 'learning_rate': 1.765e-05, 'epoch': 8.48} 65%|██████▍ | 6479/10000 [25:27:07<13:28:24, 13.78s/it] 65%|██████▍ | 6480/10000 [25:27:21<13:27:47, 13.77s/it] {'loss': 0.0048, 'learning_rate': 1.7645e-05, 'epoch': 8.48} 65%|██████▍ | 6480/10000 [25:27:21<13:27:47, 13.77s/it] 65%|██████▍ | 6481/10000 [25:27:35<13:30:25, 13.82s/it] {'loss': 0.0063, 'learning_rate': 1.764e-05, 'epoch': 8.48} 65%|██████▍ | 6481/10000 [25:27:35<13:30:25, 13.82s/it] 65%|██████▍ | 6482/10000 [25:27:48<13:29:10, 13.80s/it] {'loss': 0.0076, 'learning_rate': 1.7635000000000003e-05, 'epoch': 8.48} 65%|██████▍ | 6482/10000 [25:27:48<13:29:10, 13.80s/it] 65%|██████▍ | 6483/10000 [25:28:02<13:31:27, 13.84s/it] {'loss': 0.0074, 'learning_rate': 1.7630000000000002e-05, 'epoch': 8.49} 65%|██████▍ | 6483/10000 [25:28:02<13:31:27, 13.84s/it] 65%|██████▍ | 6484/10000 [25:28:16<13:30:14, 13.83s/it] {'loss': 0.0103, 'learning_rate': 1.7625e-05, 'epoch': 8.49} 65%|██████▍ | 6484/10000 [25:28:16<13:30:14, 13.83s/it] 65%|██████▍ | 6485/10000 [25:28:30<13:29:25, 13.82s/it] {'loss': 0.0066, 'learning_rate': 1.762e-05, 'epoch': 8.49} 65%|██████▍ | 6485/10000 [25:28:30<13:29:25, 13.82s/it] 65%|██████▍ | 6486/10000 [25:28:44<13:30:47, 13.84s/it] {'loss': 0.0089, 'learning_rate': 1.7615e-05, 'epoch': 8.49} 65%|██████▍ | 6486/10000 [25:28:44<13:30:47, 13.84s/it] 65%|██████▍ | 6487/10000 [25:28:58<13:29:47, 13.83s/it] {'loss': 0.0098, 'learning_rate': 1.7610000000000002e-05, 'epoch': 8.49} 65%|██████▍ | 6487/10000 [25:28:58<13:29:47, 13.83s/it] 65%|██████▍ | 6488/10000 [25:29:11<13:27:57, 13.80s/it] {'loss': 0.0089, 'learning_rate': 1.7605000000000002e-05, 'epoch': 8.49} 65%|██████▍ | 6488/10000 [25:29:11<13:27:57, 13.80s/it] 65%|██████▍ | 6489/10000 [25:29:25<13:28:05, 13.81s/it] {'loss': 0.0091, 'learning_rate': 1.76e-05, 'epoch': 8.49} 65%|██████▍ | 
6489/10000 [25:29:25<13:28:05, 13.81s/it] 65%|██████▍ | 6490/10000 [25:29:39<13:27:54, 13.81s/it] {'loss': 0.0084, 'learning_rate': 1.7595e-05, 'epoch': 8.49} 65%|██████▍ | 6490/10000 [25:29:39<13:27:54, 13.81s/it] 65%|██████▍ | 6491/10000 [25:29:53<13:27:42, 13.81s/it] {'loss': 0.0093, 'learning_rate': 1.759e-05, 'epoch': 8.5} 65%|██████▍ | 6491/10000 [25:29:53<13:27:42, 13.81s/it] 65%|██████▍ | 6492/10000 [25:30:07<13:25:52, 13.78s/it] {'loss': 0.0113, 'learning_rate': 1.7585000000000002e-05, 'epoch': 8.5} 65%|██████▍ | 6492/10000 [25:30:07<13:25:52, 13.78s/it] 65%|██████▍ | 6493/10000 [25:30:20<13:24:57, 13.77s/it] {'loss': 0.0081, 'learning_rate': 1.758e-05, 'epoch': 8.5} 65%|██████▍ | 6493/10000 [25:30:20<13:24:57, 13.77s/it] 65%|██████▍ | 6494/10000 [25:30:34<13:25:59, 13.79s/it] {'loss': 0.009, 'learning_rate': 1.7575e-05, 'epoch': 8.5} 65%|██████▍ | 6494/10000 [25:30:34<13:25:59, 13.79s/it] 65%|██████▍ | 6495/10000 [25:30:48<13:26:55, 13.81s/it] {'loss': 0.0127, 'learning_rate': 1.757e-05, 'epoch': 8.5} 65%|██████▍ | 6495/10000 [25:30:48<13:26:55, 13.81s/it] 65%|██████▍ | 6496/10000 [25:31:02<13:27:14, 13.82s/it] {'loss': 0.0091, 'learning_rate': 1.7565000000000002e-05, 'epoch': 8.5} 65%|██████▍ | 6496/10000 [25:31:02<13:27:14, 13.82s/it] 65%|██████▍ | 6497/10000 [25:31:16<13:29:28, 13.86s/it] {'loss': 0.009, 'learning_rate': 1.756e-05, 'epoch': 8.5} 65%|██████▍ | 6497/10000 [25:31:16<13:29:28, 13.86s/it] 65%|██████▍ | 6498/10000 [25:31:30<13:30:26, 13.89s/it] {'loss': 0.0088, 'learning_rate': 1.7555e-05, 'epoch': 8.51} 65%|██████▍ | 6498/10000 [25:31:30<13:30:26, 13.89s/it] 65%|██████▍ | 6499/10000 [25:31:43<13:27:21, 13.84s/it] {'loss': 0.0061, 'learning_rate': 1.755e-05, 'epoch': 8.51} 65%|██████▍ | 6499/10000 [25:31:43<13:27:21, 13.84s/it] 65%|██████▌ | 6500/10000 [25:31:57<13:26:11, 13.82s/it] {'loss': 0.0088, 'learning_rate': 1.7545e-05, 'epoch': 8.51} 65%|██████▌ | 6500/10000 [25:31:57<13:26:11, 13.82s/it] 65%|██████▌ | 6501/10000 [25:32:11<13:26:22, 
13.83s/it] {'loss': 0.0083, 'learning_rate': 1.754e-05, 'epoch': 8.51} 65%|██████▌ | 6501/10000 [25:32:11<13:26:22, 13.83s/it] 65%|██████▌ | 6502/10000 [25:32:25<13:27:47, 13.86s/it] {'loss': 0.0096, 'learning_rate': 1.7535e-05, 'epoch': 8.51} 65%|██████▌ | 6502/10000 [25:32:25<13:27:47, 13.86s/it] 65%|██████▌ | 6503/10000 [25:32:39<13:25:23, 13.82s/it] {'loss': 0.0096, 'learning_rate': 1.7530000000000003e-05, 'epoch': 8.51} 65%|██████▌ | 6503/10000 [25:32:39<13:25:23, 13.82s/it] 65%|██████▌ | 6504/10000 [25:32:52<13:23:09, 13.78s/it] {'loss': 0.0073, 'learning_rate': 1.7525e-05, 'epoch': 8.51} 65%|██████▌ | 6504/10000 [25:32:52<13:23:09, 13.78s/it] 65%|██████▌ | 6505/10000 [25:33:06<13:22:58, 13.79s/it] {'loss': 0.0083, 'learning_rate': 1.752e-05, 'epoch': 8.51} 65%|██████▌ | 6505/10000 [25:33:06<13:22:58, 13.79s/it] 65%|██████▌ | 6506/10000 [25:33:20<13:23:06, 13.79s/it] {'loss': 0.011, 'learning_rate': 1.7515e-05, 'epoch': 8.52} 65%|██████▌ | 6506/10000 [25:33:20<13:23:06, 13.79s/it] 65%|██████▌ | 6507/10000 [25:33:34<13:23:18, 13.80s/it] {'loss': 0.01, 'learning_rate': 1.751e-05, 'epoch': 8.52} 65%|██████▌ | 6507/10000 [25:33:34<13:23:18, 13.80s/it] 65%|██████▌ | 6508/10000 [25:33:48<13:23:29, 13.81s/it] {'loss': 0.0781, 'learning_rate': 1.7505000000000003e-05, 'epoch': 8.52} 65%|██████▌ | 6508/10000 [25:33:48<13:23:29, 13.81s/it] 65%|██████▌ | 6509/10000 [25:34:02<13:26:07, 13.85s/it] {'loss': 0.0086, 'learning_rate': 1.75e-05, 'epoch': 8.52} 65%|██████▌ | 6509/10000 [25:34:02<13:26:07, 13.85s/it] 65%|██████▌ | 6510/10000 [25:34:15<13:24:05, 13.82s/it] {'loss': 0.0081, 'learning_rate': 1.7495e-05, 'epoch': 8.52} 65%|██████▌ | 6510/10000 [25:34:15<13:24:05, 13.82s/it] 65%|██████▌ | 6511/10000 [25:34:29<13:23:51, 13.82s/it] {'loss': 0.0117, 'learning_rate': 1.749e-05, 'epoch': 8.52} 65%|██████▌ | 6511/10000 [25:34:29<13:23:51, 13.82s/it] 65%|██████▌ | 6512/10000 [25:34:43<13:20:54, 13.78s/it] {'loss': 0.0068, 'learning_rate': 1.7485000000000003e-05, 'epoch': 
8.52} 65%|██████▌ | 6512/10000 [25:34:43<13:20:54, 13.78s/it] 65%|██████▌ | 6513/10000 [25:34:57<13:22:16, 13.80s/it] {'loss': 0.0111, 'learning_rate': 1.7480000000000002e-05, 'epoch': 8.52} 65%|██████▌ | 6513/10000 [25:34:57<13:22:16, 13.80s/it] 65%|██████▌ | 6514/10000 [25:35:11<13:22:11, 13.81s/it] {'loss': 0.0114, 'learning_rate': 1.7475e-05, 'epoch': 8.53} 65%|██████▌ | 6514/10000 [25:35:11<13:22:11, 13.81s/it] 65%|██████▌ | 6515/10000 [25:35:24<13:22:22, 13.81s/it] {'loss': 0.012, 'learning_rate': 1.747e-05, 'epoch': 8.53} 65%|██████▌ | 6515/10000 [25:35:24<13:22:22, 13.81s/it] 65%|██████▌ | 6516/10000 [25:35:38<13:22:27, 13.82s/it] {'loss': 0.0095, 'learning_rate': 1.7465e-05, 'epoch': 8.53} 65%|██████▌ | 6516/10000 [25:35:38<13:22:27, 13.82s/it] 65%|██████▌ | 6517/10000 [25:35:52<13:22:17, 13.82s/it] {'loss': 0.0224, 'learning_rate': 1.7460000000000002e-05, 'epoch': 8.53} 65%|██████▌ | 6517/10000 [25:35:52<13:22:17, 13.82s/it] 65%|██████▌ | 6518/10000 [25:36:06<13:22:07, 13.82s/it] {'loss': 0.0091, 'learning_rate': 1.7455e-05, 'epoch': 8.53} 65%|██████▌ | 6518/10000 [25:36:06<13:22:07, 13.82s/it] 65%|██████▌ | 6519/10000 [25:36:20<13:21:11, 13.81s/it] {'loss': 0.0115, 'learning_rate': 1.745e-05, 'epoch': 8.53} 65%|██████▌ | 6519/10000 [25:36:20<13:21:11, 13.81s/it] 65%|██████▌ | 6520/10000 [25:36:33<13:21:05, 13.81s/it] {'loss': 0.0123, 'learning_rate': 1.7445e-05, 'epoch': 8.53} 65%|██████▌ | 6520/10000 [25:36:33<13:21:05, 13.81s/it] 65%|██████▌ | 6521/10000 [25:36:47<13:18:18, 13.77s/it] {'loss': 0.0109, 'learning_rate': 1.7440000000000002e-05, 'epoch': 8.54} 65%|██████▌ | 6521/10000 [25:36:47<13:18:18, 13.77s/it] 65%|██████▌ | 6522/10000 [25:37:01<13:21:21, 13.82s/it] {'loss': 0.0095, 'learning_rate': 1.7435e-05, 'epoch': 8.54} 65%|██████▌ | 6522/10000 [25:37:01<13:21:21, 13.82s/it] 65%|██████▌ | 6523/10000 [25:37:15<13:21:24, 13.83s/it] {'loss': 0.0125, 'learning_rate': 1.743e-05, 'epoch': 8.54} 65%|██████▌ | 6523/10000 [25:37:15<13:21:24, 13.83s/it] 
65%|██████▌ | 6524/10000 [25:37:29<13:21:45, 13.84s/it] {'loss': 0.0097, 'learning_rate': 1.7425e-05, 'epoch': 8.54} 65%|██████▌ | 6524/10000 [25:37:29<13:21:45, 13.84s/it] 65%|██████▌ | 6525/10000 [25:37:42<13:19:19, 13.80s/it] {'loss': 0.0086, 'learning_rate': 1.742e-05, 'epoch': 8.54} 65%|██████▌ | 6525/10000 [25:37:43<13:19:19, 13.80s/it] 65%|██████▌ | 6526/10000 [25:37:56<13:19:08, 13.80s/it] {'loss': 0.0124, 'learning_rate': 1.7415000000000002e-05, 'epoch': 8.54} 65%|██████▌ | 6526/10000 [25:37:56<13:19:08, 13.80s/it] 65%|██████▌ | 6527/10000 [25:38:10<13:20:36, 13.83s/it] {'loss': 0.0137, 'learning_rate': 1.741e-05, 'epoch': 8.54} 65%|██████▌ | 6527/10000 [25:38:10<13:20:36, 13.83s/it] 65%|██████▌ | 6528/10000 [25:38:24<13:22:51, 13.87s/it] {'loss': 0.0082, 'learning_rate': 1.7405e-05, 'epoch': 8.54} 65%|██████▌ | 6528/10000 [25:38:24<13:22:51, 13.87s/it] 65%|██████▌ | 6529/10000 [25:38:38<13:22:46, 13.88s/it] {'loss': 0.0086, 'learning_rate': 1.74e-05, 'epoch': 8.55} 65%|██████▌ | 6529/10000 [25:38:38<13:22:46, 13.88s/it] 65%|██████▌ | 6530/10000 [25:38:52<13:22:06, 13.87s/it] {'loss': 0.0099, 'learning_rate': 1.7395e-05, 'epoch': 8.55} 65%|██████▌ | 6530/10000 [25:38:52<13:22:06, 13.87s/it] 65%|██████▌ | 6531/10000 [25:39:06<13:24:20, 13.91s/it] {'loss': 0.0108, 'learning_rate': 1.739e-05, 'epoch': 8.55} 65%|██████▌ | 6531/10000 [25:39:06<13:24:20, 13.91s/it] 65%|██████▌ | 6532/10000 [25:39:20<13:22:22, 13.88s/it] {'loss': 0.0125, 'learning_rate': 1.7385e-05, 'epoch': 8.55} 65%|██████▌ | 6532/10000 [25:39:20<13:22:22, 13.88s/it] 65%|██████▌ | 6533/10000 [25:39:34<13:21:08, 13.86s/it] {'loss': 0.0078, 'learning_rate': 1.7380000000000003e-05, 'epoch': 8.55} 65%|██████▌ | 6533/10000 [25:39:34<13:21:08, 13.86s/it] 65%|██████▌ | 6534/10000 [25:39:47<13:17:30, 13.81s/it] {'loss': 0.0082, 'learning_rate': 1.7375e-05, 'epoch': 8.55} 65%|██████▌ | 6534/10000 [25:39:47<13:17:30, 13.81s/it] 65%|██████▌ | 6535/10000 [25:40:01<13:16:28, 13.79s/it] {'loss': 0.0104, 
'learning_rate': 1.737e-05, 'epoch': 8.55} 65%|██████▌ | 6535/10000 [25:40:01<13:16:28, 13.79s/it] 65%|██████▌ | 6536/10000 [25:40:15<13:17:07, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.7365e-05, 'epoch': 8.55} 65%|██████▌ | 6536/10000 [25:40:15<13:17:07, 13.81s/it] 65%|██████▌ | 6537/10000 [25:40:29<13:16:48, 13.81s/it] {'loss': 0.0095, 'learning_rate': 1.736e-05, 'epoch': 8.56} 65%|██████▌ | 6537/10000 [25:40:29<13:16:48, 13.81s/it] 65%|██████▌ | 6538/10000 [25:40:42<13:16:13, 13.80s/it] {'loss': 0.011, 'learning_rate': 1.7355000000000002e-05, 'epoch': 8.56} 65%|██████▌ | 6538/10000 [25:40:42<13:16:13, 13.80s/it] 65%|██████▌ | 6539/10000 [25:40:56<13:17:31, 13.83s/it] {'loss': 0.0116, 'learning_rate': 1.7349999999999998e-05, 'epoch': 8.56} 65%|██████▌ | 6539/10000 [25:40:56<13:17:31, 13.83s/it] 65%|██████▌ | 6540/10000 [25:41:10<13:17:47, 13.83s/it] {'loss': 0.0114, 'learning_rate': 1.7345e-05, 'epoch': 8.56} 65%|██████▌ | 6540/10000 [25:41:10<13:17:47, 13.83s/it] 65%|██████▌ | 6541/10000 [25:41:24<13:18:38, 13.85s/it] {'loss': 0.0078, 'learning_rate': 1.734e-05, 'epoch': 8.56} 65%|██████▌ | 6541/10000 [25:41:24<13:18:38, 13.85s/it] 65%|██████▌ | 6542/10000 [25:41:38<13:16:01, 13.81s/it] {'loss': 0.0091, 'learning_rate': 1.7335000000000003e-05, 'epoch': 8.56} 65%|██████▌ | 6542/10000 [25:41:38<13:16:01, 13.81s/it] 65%|██████▌ | 6543/10000 [25:41:52<13:18:54, 13.87s/it] {'loss': 0.012, 'learning_rate': 1.7330000000000002e-05, 'epoch': 8.56} 65%|██████▌ | 6543/10000 [25:41:52<13:18:54, 13.87s/it] 65%|██████▌ | 6544/10000 [25:42:06<13:18:59, 13.87s/it] {'loss': 0.0087, 'learning_rate': 1.7325e-05, 'epoch': 8.57} 65%|██████▌ | 6544/10000 [25:42:06<13:18:59, 13.87s/it] 65%|██████▌ | 6545/10000 [25:42:19<13:17:49, 13.86s/it] {'loss': 0.0101, 'learning_rate': 1.732e-05, 'epoch': 8.57} 65%|██████▌ | 6545/10000 [25:42:19<13:17:49, 13.86s/it] 65%|██████▌ | 6546/10000 [25:42:33<13:17:47, 13.86s/it] {'loss': 0.0089, 'learning_rate': 1.7315e-05, 'epoch': 8.57} 
65%|██████▌ | 6546/10000 [25:42:33<13:17:47, 13.86s/it] 65%|██████▌ | 6547/10000 [25:42:47<13:17:43, 13.86s/it] {'loss': 0.011, 'learning_rate': 1.7310000000000002e-05, 'epoch': 8.57} 65%|██████▌ | 6547/10000 [25:42:47<13:17:43, 13.86s/it] 65%|██████▌ | 6548/10000 [25:43:01<13:17:03, 13.85s/it] {'loss': 0.0102, 'learning_rate': 1.7305e-05, 'epoch': 8.57} 65%|██████▌ | 6548/10000 [25:43:01<13:17:03, 13.85s/it] 65%|██████▌ | 6549/10000 [25:43:15<13:17:32, 13.87s/it] {'loss': 0.0098, 'learning_rate': 1.73e-05, 'epoch': 8.57} 65%|██████▌ | 6549/10000 [25:43:15<13:17:32, 13.87s/it] 66%|██████▌ | 6550/10000 [25:43:29<13:17:20, 13.87s/it] {'loss': 0.0093, 'learning_rate': 1.7295e-05, 'epoch': 8.57} 66%|██████▌ | 6550/10000 [25:43:29<13:17:20, 13.87s/it] 66%|██████▌ | 6551/10000 [25:43:43<13:15:16, 13.83s/it] {'loss': 0.0104, 'learning_rate': 1.7290000000000002e-05, 'epoch': 8.57} 66%|██████▌ | 6551/10000 [25:43:43<13:15:16, 13.83s/it] 66%|██████▌ | 6552/10000 [25:43:56<13:15:06, 13.84s/it] {'loss': 0.0096, 'learning_rate': 1.7285e-05, 'epoch': 8.58} 66%|██████▌ | 6552/10000 [25:43:56<13:15:06, 13.84s/it] 66%|██████▌ | 6553/10000 [25:44:10<13:15:47, 13.85s/it] {'loss': 0.0092, 'learning_rate': 1.728e-05, 'epoch': 8.58} 66%|██████▌ | 6553/10000 [25:44:10<13:15:47, 13.85s/it] 66%|██████▌ | 6554/10000 [25:44:24<13:15:59, 13.86s/it] {'loss': 0.0094, 'learning_rate': 1.7275e-05, 'epoch': 8.58} 66%|██████▌ | 6554/10000 [25:44:24<13:15:59, 13.86s/it] 66%|██████▌ | 6555/10000 [25:44:38<13:14:56, 13.85s/it] {'loss': 0.0095, 'learning_rate': 1.727e-05, 'epoch': 8.58} 66%|██████▌ | 6555/10000 [25:44:38<13:14:56, 13.85s/it] 66%|██████▌ | 6556/10000 [25:44:52<13:15:07, 13.85s/it] {'loss': 0.0086, 'learning_rate': 1.7265e-05, 'epoch': 8.58} 66%|██████▌ | 6556/10000 [25:44:52<13:15:07, 13.85s/it] 66%|██████▌ | 6557/10000 [25:45:06<13:12:45, 13.82s/it] {'loss': 0.0111, 'learning_rate': 1.726e-05, 'epoch': 8.58} 66%|██████▌ | 6557/10000 [25:45:06<13:12:45, 13.82s/it] 66%|██████▌ | 
6558/10000 [25:45:19<13:11:09, 13.79s/it] {'loss': 0.0115, 'learning_rate': 1.7255000000000003e-05, 'epoch': 8.58} 66%|██████▌ | 6558/10000 [25:45:19<13:11:09, 13.79s/it] 66%|██████▌ | 6559/10000 [25:45:33<13:13:02, 13.83s/it] {'loss': 0.0083, 'learning_rate': 1.725e-05, 'epoch': 8.59} 66%|██████▌ | 6559/10000 [25:45:33<13:13:02, 13.83s/it] 66%|██████▌ | 6560/10000 [25:45:47<13:13:34, 13.84s/it] {'loss': 0.0077, 'learning_rate': 1.7245e-05, 'epoch': 8.59} 66%|██████▌ | 6560/10000 [25:45:47<13:13:34, 13.84s/it] 66%|██████▌ | 6561/10000 [25:46:01<13:13:42, 13.85s/it] {'loss': 0.0094, 'learning_rate': 1.724e-05, 'epoch': 8.59} 66%|██████▌ | 6561/10000 [25:46:01<13:13:42, 13.85s/it] 66%|██████▌ | 6562/10000 [25:46:15<13:15:51, 13.89s/it] {'loss': 0.0098, 'learning_rate': 1.7235e-05, 'epoch': 8.59} 66%|██████▌ | 6562/10000 [25:46:15<13:15:51, 13.89s/it] 66%|██████▌ | 6563/10000 [25:46:29<13:14:29, 13.87s/it] {'loss': 0.0111, 'learning_rate': 1.7230000000000003e-05, 'epoch': 8.59} 66%|██████▌ | 6563/10000 [25:46:29<13:14:29, 13.87s/it] 66%|██████▌ | 6564/10000 [25:46:43<13:12:47, 13.84s/it] {'loss': 0.0099, 'learning_rate': 1.7225e-05, 'epoch': 8.59} 66%|██████▌ | 6564/10000 [25:46:43<13:12:47, 13.84s/it] 66%|██████▌ | 6565/10000 [25:46:56<13:12:33, 13.84s/it] {'loss': 0.0099, 'learning_rate': 1.722e-05, 'epoch': 8.59} 66%|██████▌ | 6565/10000 [25:46:56<13:12:33, 13.84s/it] 66%|██████▌ | 6566/10000 [25:47:10<13:12:49, 13.85s/it] {'loss': 0.0086, 'learning_rate': 1.7215e-05, 'epoch': 8.59} 66%|██████▌ | 6566/10000 [25:47:10<13:12:49, 13.85s/it] 66%|██████▌ | 6567/10000 [25:47:24<13:08:52, 13.79s/it] {'loss': 0.0097, 'learning_rate': 1.721e-05, 'epoch': 8.6} 66%|██████▌ | 6567/10000 [25:47:24<13:08:52, 13.79s/it] 66%|██████▌ | 6568/10000 [25:47:38<13:10:17, 13.82s/it] {'loss': 0.0095, 'learning_rate': 1.7205000000000002e-05, 'epoch': 8.6} 66%|██████▌ | 6568/10000 [25:47:38<13:10:17, 13.82s/it] 66%|██████▌ | 6569/10000 [25:47:52<13:09:34, 13.81s/it] {'loss': 0.0077, 
'learning_rate': 1.7199999999999998e-05, 'epoch': 8.6} 66%|██████▌ | 6569/10000 [25:47:52<13:09:34, 13.81s/it] 66%|██████▌ | 6570/10000 [25:48:05<13:09:25, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.7195e-05, 'epoch': 8.6} 66%|██████▌ | 6570/10000 [25:48:05<13:09:25, 13.81s/it] 66%|██████▌ | 6571/10000 [25:48:19<13:09:18, 13.81s/it] {'loss': 0.0096, 'learning_rate': 1.719e-05, 'epoch': 8.6} 66%|██████▌ | 6571/10000 [25:48:19<13:09:18, 13.81s/it] 66%|██████▌ | 6572/10000 [25:48:33<13:10:10, 13.83s/it] {'loss': 0.0081, 'learning_rate': 1.7185000000000002e-05, 'epoch': 8.6} 66%|██████▌ | 6572/10000 [25:48:33<13:10:10, 13.83s/it] 66%|██████▌ | 6573/10000 [25:48:47<13:07:51, 13.79s/it] {'loss': 0.0101, 'learning_rate': 1.718e-05, 'epoch': 8.6} 66%|██████▌ | 6573/10000 [25:48:47<13:07:51, 13.79s/it] 66%|██████▌ | 6574/10000 [25:49:01<13:08:08, 13.80s/it] {'loss': 0.0084, 'learning_rate': 1.7175e-05, 'epoch': 8.6} 66%|██████▌ | 6574/10000 [25:49:01<13:08:08, 13.80s/it] 66%|██████▌ | 6575/10000 [25:49:14<13:08:04, 13.81s/it] {'loss': 0.0094, 'learning_rate': 1.717e-05, 'epoch': 8.61} 66%|██████▌ | 6575/10000 [25:49:14<13:08:04, 13.81s/it] 66%|██████▌ | 6576/10000 [25:49:28<13:07:52, 13.81s/it] {'loss': 0.008, 'learning_rate': 1.7165e-05, 'epoch': 8.61} 66%|██████▌ | 6576/10000 [25:49:28<13:07:52, 13.81s/it] 66%|██████▌ | 6577/10000 [25:49:42<13:07:16, 13.80s/it] {'loss': 0.0087, 'learning_rate': 1.7160000000000002e-05, 'epoch': 8.61} 66%|██████▌ | 6577/10000 [25:49:42<13:07:16, 13.80s/it] 66%|██████▌ | 6578/10000 [25:49:56<13:08:05, 13.82s/it] {'loss': 0.009, 'learning_rate': 1.7155e-05, 'epoch': 8.61} 66%|██████▌ | 6578/10000 [25:49:56<13:08:05, 13.82s/it] 66%|██████▌ | 6579/10000 [25:50:10<13:06:46, 13.80s/it] {'loss': 0.01, 'learning_rate': 1.7150000000000004e-05, 'epoch': 8.61} 66%|██████▌ | 6579/10000 [25:50:10<13:06:46, 13.80s/it] 66%|██████▌ | 6580/10000 [25:50:23<13:06:30, 13.80s/it] {'loss': 0.0083, 'learning_rate': 1.7145e-05, 'epoch': 8.61} 66%|██████▌ | 
6580/10000 [25:50:23<13:06:30, 13.80s/it] 66%|██████▌ | 6581/10000 [25:50:37<13:06:42, 13.81s/it] {'loss': 0.0074, 'learning_rate': 1.7140000000000002e-05, 'epoch': 8.61} 66%|██████▌ | 6581/10000 [25:50:37<13:06:42, 13.81s/it] 66%|██████▌ | 6582/10000 [25:50:51<13:08:04, 13.83s/it] {'loss': 0.0078, 'learning_rate': 1.7135e-05, 'epoch': 8.62} 66%|██████▌ | 6582/10000 [25:50:51<13:08:04, 13.83s/it] 66%|██████▌ | 6583/10000 [25:51:05<13:08:37, 13.85s/it] {'loss': 0.0126, 'learning_rate': 1.713e-05, 'epoch': 8.62} 66%|██████▌ | 6583/10000 [25:51:05<13:08:37, 13.85s/it] 66%|██████▌ | 6584/10000 [25:51:19<13:10:03, 13.88s/it] {'loss': 0.0104, 'learning_rate': 1.7125000000000003e-05, 'epoch': 8.62} 66%|██████▌ | 6584/10000 [25:51:19<13:10:03, 13.88s/it] 66%|██████▌ | 6585/10000 [25:51:33<13:10:36, 13.89s/it] {'loss': 0.0094, 'learning_rate': 1.712e-05, 'epoch': 8.62} 66%|██████▌ | 6585/10000 [25:51:33<13:10:36, 13.89s/it] 66%|██████▌ | 6586/10000 [25:51:47<13:07:36, 13.84s/it] {'loss': 0.0113, 'learning_rate': 1.7115e-05, 'epoch': 8.62} 66%|██████▌ | 6586/10000 [25:51:47<13:07:36, 13.84s/it] 66%|██████▌ | 6587/10000 [25:52:00<13:05:30, 13.81s/it] {'loss': 0.0066, 'learning_rate': 1.711e-05, 'epoch': 8.62} 66%|██████▌ | 6587/10000 [25:52:00<13:05:30, 13.81s/it] 66%|██████▌ | 6588/10000 [25:52:14<13:08:27, 13.87s/it] {'loss': 0.0079, 'learning_rate': 1.7105000000000003e-05, 'epoch': 8.62} 66%|██████▌ | 6588/10000 [25:52:14<13:08:27, 13.87s/it] 66%|██████▌ | 6589/10000 [25:52:28<13:07:07, 13.85s/it] {'loss': 0.0099, 'learning_rate': 1.7100000000000002e-05, 'epoch': 8.62} 66%|██████▌ | 6589/10000 [25:52:28<13:07:07, 13.85s/it] 66%|██████▌ | 6590/10000 [25:52:42<13:04:41, 13.81s/it] {'loss': 0.0089, 'learning_rate': 1.7095e-05, 'epoch': 8.63} 66%|██████▌ | 6590/10000 [25:52:42<13:04:41, 13.81s/it] 66%|██████▌ | 6591/10000 [25:52:56<13:03:43, 13.79s/it] {'loss': 0.007, 'learning_rate': 1.709e-05, 'epoch': 8.63} 66%|██████▌ | 6591/10000 [25:52:56<13:03:43, 13.79s/it] 66%|██████▌ 
| 6592/10000 [25:53:10<13:05:32, 13.83s/it] {'loss': 0.0112, 'learning_rate': 1.7085e-05, 'epoch': 8.63} 66%|██████▌ | 6592/10000 [25:53:10<13:05:32, 13.83s/it] 66%|██████▌ | 6593/10000 [25:53:23<13:04:02, 13.81s/it] {'loss': 0.0071, 'learning_rate': 1.7080000000000002e-05, 'epoch': 8.63} 66%|██████▌ | 6593/10000 [25:53:23<13:04:02, 13.81s/it] 66%|██████▌ | 6594/10000 [25:53:37<13:07:07, 13.87s/it] {'loss': 0.0092, 'learning_rate': 1.7075e-05, 'epoch': 8.63} 66%|██████▌ | 6594/10000 [25:53:37<13:07:07, 13.87s/it] 66%|██████▌ | 6595/10000 [25:53:51<13:06:03, 13.85s/it] {'loss': 0.0105, 'learning_rate': 1.707e-05, 'epoch': 8.63} 66%|██████▌ | 6595/10000 [25:53:51<13:06:03, 13.85s/it] 66%|██████▌ | 6596/10000 [25:54:05<13:05:50, 13.85s/it] {'loss': 0.0078, 'learning_rate': 1.7065e-05, 'epoch': 8.63} 66%|██████▌ | 6596/10000 [25:54:05<13:05:50, 13.85s/it] 66%|██████▌ | 6597/10000 [25:54:19<13:06:41, 13.87s/it] {'loss': 0.0087, 'learning_rate': 1.706e-05, 'epoch': 8.63} 66%|██████▌ | 6597/10000 [25:54:19<13:06:41, 13.87s/it] 66%|██████▌ | 6598/10000 [25:54:33<13:07:18, 13.89s/it] {'loss': 0.0089, 'learning_rate': 1.7055000000000002e-05, 'epoch': 8.64} 66%|██████▌ | 6598/10000 [25:54:33<13:07:18, 13.89s/it] 66%|██████▌ | 6599/10000 [25:54:47<13:05:53, 13.86s/it] {'loss': 0.0092, 'learning_rate': 1.705e-05, 'epoch': 8.64} 66%|██████▌ | 6599/10000 [25:54:47<13:05:53, 13.86s/it] 66%|██████▌ | 6600/10000 [25:55:00<13:05:41, 13.87s/it] {'loss': 0.0086, 'learning_rate': 1.7045e-05, 'epoch': 8.64} 66%|██████▌ | 6600/10000 [25:55:00<13:05:41, 13.87s/it] 66%|██████▌ | 6601/10000 [25:55:14<13:06:54, 13.89s/it] {'loss': 0.0077, 'learning_rate': 1.704e-05, 'epoch': 8.64} 66%|██████▌ | 6601/10000 [25:55:14<13:06:54, 13.89s/it] 66%|██████▌ | 6602/10000 [25:55:28<13:05:52, 13.88s/it] {'loss': 0.0097, 'learning_rate': 1.7035000000000002e-05, 'epoch': 8.64} 66%|██████▌ | 6602/10000 [25:55:28<13:05:52, 13.88s/it] 66%|██████▌ | 6603/10000 [25:55:42<13:05:21, 13.87s/it] {'loss': 0.0081, 
'learning_rate': 1.703e-05, 'epoch': 8.64} 66%|██████▌ | 6603/10000 [25:55:42<13:05:21, 13.87s/it] 66%|██████▌ | 6604/10000 [25:55:56<13:06:58, 13.90s/it] {'loss': 0.0104, 'learning_rate': 1.7025e-05, 'epoch': 8.64} 66%|██████▌ | 6604/10000 [25:55:56<13:06:58, 13.90s/it] 66%|██████▌ | 6605/10000 [25:56:10<13:06:23, 13.90s/it] {'loss': 0.0087, 'learning_rate': 1.702e-05, 'epoch': 8.65} 66%|██████▌ | 6605/10000 [25:56:10<13:06:23, 13.90s/it] 66%|██████▌ | 6606/10000 [25:56:24<13:04:22, 13.87s/it] {'loss': 0.0115, 'learning_rate': 1.7015e-05, 'epoch': 8.65} 66%|██████▌ | 6606/10000 [25:56:24<13:04:22, 13.87s/it] 66%|██████▌ | 6607/10000 [25:56:38<13:03:00, 13.85s/it] {'loss': 0.0129, 'learning_rate': 1.701e-05, 'epoch': 8.65} 66%|██████▌ | 6607/10000 [25:56:38<13:03:00, 13.85s/it] 66%|██████▌ | 6608/10000 [25:56:51<13:03:00, 13.85s/it] {'loss': 0.0076, 'learning_rate': 1.7005e-05, 'epoch': 8.65} 66%|██████▌ | 6608/10000 [25:56:51<13:03:00, 13.85s/it] 66%|██████▌ | 6609/10000 [25:57:05<13:00:08, 13.80s/it] {'loss': 0.0101, 'learning_rate': 1.7000000000000003e-05, 'epoch': 8.65} 66%|██████▌ | 6609/10000 [25:57:05<13:00:08, 13.80s/it] 66%|██████▌ | 6610/10000 [25:57:19<12:58:02, 13.77s/it] {'loss': 0.0089, 'learning_rate': 1.6995e-05, 'epoch': 8.65} 66%|██████▌ | 6610/10000 [25:57:19<12:58:02, 13.77s/it] 66%|██████▌ | 6611/10000 [25:57:33<12:57:17, 13.76s/it] {'loss': 0.0091, 'learning_rate': 1.699e-05, 'epoch': 8.65} 66%|██████▌ | 6611/10000 [25:57:33<12:57:17, 13.76s/it] 66%|██████▌ | 6612/10000 [25:57:46<12:58:56, 13.79s/it] {'loss': 0.0095, 'learning_rate': 1.6985e-05, 'epoch': 8.65} 66%|██████▌ | 6612/10000 [25:57:46<12:58:56, 13.79s/it] 66%|██████▌ | 6613/10000 [25:58:00<12:59:28, 13.81s/it] {'loss': 0.01, 'learning_rate': 1.698e-05, 'epoch': 8.66} 66%|██████▌ | 6613/10000 [25:58:00<12:59:28, 13.81s/it] 66%|██████▌ | 6614/10000 [25:58:14<13:00:08, 13.82s/it] {'loss': 0.0083, 'learning_rate': 1.6975000000000003e-05, 'epoch': 8.66} 66%|██████▌ | 6614/10000 
[25:58:14<13:00:08, 13.82s/it] 66%|██████▌ | 6615/10000 [25:58:28<13:00:04, 13.83s/it] {'loss': 0.0101, 'learning_rate': 1.697e-05, 'epoch': 8.66} 66%|██████▌ | 6615/10000 [25:58:28<13:00:04, 13.83s/it] 66%|██████▌ | 6616/10000 [25:58:42<13:01:30, 13.86s/it] {'loss': 0.0096, 'learning_rate': 1.6965e-05, 'epoch': 8.66} 66%|██████▌ | 6616/10000 [25:58:42<13:01:30, 13.86s/it] 66%|██████▌ | 6617/10000 [25:58:56<12:59:01, 13.82s/it] {'loss': 0.0076, 'learning_rate': 1.696e-05, 'epoch': 8.66} 66%|██████▌ | 6617/10000 [25:58:56<12:59:01, 13.82s/it] 66%|██████▌ | 6618/10000 [25:59:09<12:58:06, 13.80s/it] {'loss': 0.0093, 'learning_rate': 1.6955000000000003e-05, 'epoch': 8.66} 66%|██████▌ | 6618/10000 [25:59:09<12:58:06, 13.80s/it] 66%|██████▌ | 6619/10000 [25:59:23<12:57:48, 13.80s/it] {'loss': 0.0098, 'learning_rate': 1.6950000000000002e-05, 'epoch': 8.66} 66%|██████▌ | 6619/10000 [25:59:23<12:57:48, 13.80s/it] 66%|██████▌ | 6620/10000 [25:59:37<12:58:13, 13.81s/it] {'loss': 0.0083, 'learning_rate': 1.6945e-05, 'epoch': 8.66} 66%|██████▌ | 6620/10000 [25:59:37<12:58:13, 13.81s/it] 66%|██████▌ | 6621/10000 [25:59:51<12:56:50, 13.79s/it] {'loss': 0.0094, 'learning_rate': 1.694e-05, 'epoch': 8.67} 66%|██████▌ | 6621/10000 [25:59:51<12:56:50, 13.79s/it] 66%|██████▌ | 6622/10000 [26:00:05<12:57:15, 13.81s/it] {'loss': 0.0092, 'learning_rate': 1.6935e-05, 'epoch': 8.67} 66%|██████▌ | 6622/10000 [26:00:05<12:57:15, 13.81s/it] 66%|██████▌ | 6623/10000 [26:00:19<12:58:44, 13.84s/it] {'loss': 0.0097, 'learning_rate': 1.6930000000000002e-05, 'epoch': 8.67} 66%|██████▌ | 6623/10000 [26:00:19<12:58:44, 13.84s/it] 66%|██████▌ | 6624/10000 [26:00:32<12:57:46, 13.82s/it] {'loss': 0.0075, 'learning_rate': 1.6925e-05, 'epoch': 8.67} 66%|██████▌ | 6624/10000 [26:00:32<12:57:46, 13.82s/it] 66%|██████▋ | 6625/10000 [26:00:46<12:57:53, 13.83s/it] {'loss': 0.0125, 'learning_rate': 1.692e-05, 'epoch': 8.67} 66%|██████▋ | 6625/10000 [26:00:46<12:57:53, 13.83s/it] 66%|██████▋ | 6626/10000 
[26:01:00<12:59:17, 13.86s/it] {'loss': 0.0093, 'learning_rate': 1.6915e-05, 'epoch': 8.67} 66%|██████▋ | 6626/10000 [26:01:00<12:59:17, 13.86s/it] 66%|██████▋ | 6627/10000 [26:01:14<12:59:37, 13.87s/it] {'loss': 0.0119, 'learning_rate': 1.6910000000000002e-05, 'epoch': 8.67} 66%|██████▋ | 6627/10000 [26:01:14<12:59:37, 13.87s/it] 66%|██████▋ | 6628/10000 [26:01:28<12:58:54, 13.86s/it] {'loss': 0.0093, 'learning_rate': 1.6905e-05, 'epoch': 8.68} 66%|██████▋ | 6628/10000 [26:01:28<12:58:54, 13.86s/it] 66%|██████▋ | 6629/10000 [26:01:42<12:59:17, 13.87s/it] {'loss': 0.0081, 'learning_rate': 1.69e-05, 'epoch': 8.68} 66%|██████▋ | 6629/10000 [26:01:42<12:59:17, 13.87s/it] 66%|██████▋ | 6630/10000 [26:01:56<12:57:53, 13.85s/it] {'loss': 0.0084, 'learning_rate': 1.6895e-05, 'epoch': 8.68} 66%|██████▋ | 6630/10000 [26:01:56<12:57:53, 13.85s/it] 66%|██████▋ | 6631/10000 [26:02:09<12:57:52, 13.85s/it] {'loss': 0.0086, 'learning_rate': 1.689e-05, 'epoch': 8.68} 66%|██████▋ | 6631/10000 [26:02:09<12:57:52, 13.85s/it] 66%|██████▋ | 6632/10000 [26:02:23<12:59:00, 13.88s/it] {'loss': 0.0106, 'learning_rate': 1.6885000000000002e-05, 'epoch': 8.68} 66%|██████▋ | 6632/10000 [26:02:23<12:59:00, 13.88s/it] 66%|██████▋ | 6633/10000 [26:02:37<12:58:21, 13.87s/it] {'loss': 0.0093, 'learning_rate': 1.688e-05, 'epoch': 8.68} 66%|██████▋ | 6633/10000 [26:02:37<12:58:21, 13.87s/it] 66%|██████▋ | 6634/10000 [26:02:51<12:56:16, 13.84s/it] {'loss': 0.0078, 'learning_rate': 1.6875000000000004e-05, 'epoch': 8.68} 66%|██████▋ | 6634/10000 [26:02:51<12:56:16, 13.84s/it] 66%|██████▋ | 6635/10000 [26:03:05<12:55:57, 13.84s/it] {'loss': 0.0103, 'learning_rate': 1.687e-05, 'epoch': 8.68} 66%|██████▋ | 6635/10000 [26:03:05<12:55:57, 13.84s/it] 66%|██████▋ | 6636/10000 [26:03:19<12:54:59, 13.82s/it] {'loss': 0.0073, 'learning_rate': 1.6865e-05, 'epoch': 8.69} 66%|██████▋ | 6636/10000 [26:03:19<12:54:59, 13.82s/it] 66%|██████▋ | 6637/10000 [26:03:32<12:55:02, 13.83s/it] {'loss': 0.0126, 'learning_rate': 
1.686e-05, 'epoch': 8.69} 66%|██████▋ | 6637/10000 [26:03:32<12:55:02, 13.83s/it] 66%|██████▋ | 6638/10000 [26:03:46<12:56:46, 13.86s/it] {'loss': 0.0109, 'learning_rate': 1.6855e-05, 'epoch': 8.69} 66%|██████▋ | 6638/10000 [26:03:46<12:56:46, 13.86s/it] 66%|██████▋ | 6639/10000 [26:04:00<12:55:49, 13.85s/it] {'loss': 0.0084, 'learning_rate': 1.6850000000000003e-05, 'epoch': 8.69} 66%|██████▋ | 6639/10000 [26:04:00<12:55:49, 13.85s/it] 66%|██████▋ | 6640/10000 [26:04:14<12:55:18, 13.84s/it] {'loss': 0.0093, 'learning_rate': 1.6845e-05, 'epoch': 8.69} 66%|██████▋ | 6640/10000 [26:04:14<12:55:18, 13.84s/it] 66%|██████▋ | 6641/10000 [26:04:28<12:53:41, 13.82s/it] {'loss': 0.0097, 'learning_rate': 1.684e-05, 'epoch': 8.69} 66%|██████▋ | 6641/10000 [26:04:28<12:53:41, 13.82s/it] 66%|██████▋ | 6642/10000 [26:04:41<12:52:10, 13.80s/it] {'loss': 0.0071, 'learning_rate': 1.6835e-05, 'epoch': 8.69} 66%|██████▋ | 6642/10000 [26:04:42<12:52:10, 13.80s/it] 66%|██████▋ | 6643/10000 [26:04:55<12:52:53, 13.81s/it] {'loss': 0.0078, 'learning_rate': 1.683e-05, 'epoch': 8.7} 66%|██████▋ | 6643/10000 [26:04:55<12:52:53, 13.81s/it] 66%|██████▋ | 6644/10000 [26:05:09<12:51:29, 13.79s/it] {'loss': 0.0087, 'learning_rate': 1.6825000000000002e-05, 'epoch': 8.7} 66%|██████▋ | 6644/10000 [26:05:09<12:51:29, 13.79s/it] 66%|██████▋ | 6645/10000 [26:05:23<12:52:20, 13.81s/it] {'loss': 0.0102, 'learning_rate': 1.6819999999999998e-05, 'epoch': 8.7} 66%|██████▋ | 6645/10000 [26:05:23<12:52:20, 13.81s/it] 66%|██████▋ | 6646/10000 [26:05:37<12:50:32, 13.78s/it] {'loss': 0.0109, 'learning_rate': 1.6815e-05, 'epoch': 8.7} 66%|██████▋ | 6646/10000 [26:05:37<12:50:32, 13.78s/it] 66%|██████▋ | 6647/10000 [26:05:50<12:50:23, 13.79s/it] {'loss': 0.0088, 'learning_rate': 1.681e-05, 'epoch': 8.7} 66%|██████▋ | 6647/10000 [26:05:50<12:50:23, 13.79s/it] 66%|██████▋ | 6648/10000 [26:06:04<12:49:46, 13.78s/it] {'loss': 0.0166, 'learning_rate': 1.6805000000000003e-05, 'epoch': 8.7} 66%|██████▋ | 6648/10000 
[26:06:04<12:49:46, 13.78s/it] 66%|██████▋ | 6649/10000 [26:06:18<12:50:46, 13.80s/it] {'loss': 0.01, 'learning_rate': 1.6800000000000002e-05, 'epoch': 8.7} 66%|██████▋ | 6649/10000 [26:06:18<12:50:46, 13.80s/it] 66%|██████▋ | 6650/10000 [26:06:32<12:52:12, 13.83s/it] {'loss': 0.0074, 'learning_rate': 1.6795e-05, 'epoch': 8.7} 66%|██████▋ | 6650/10000 [26:06:32<12:52:12, 13.83s/it] 67%|██████▋ | 6651/10000 [26:06:46<12:51:08, 13.82s/it] {'loss': 0.0096, 'learning_rate': 1.679e-05, 'epoch': 8.71} 67%|██████▋ | 6651/10000 [26:06:46<12:51:08, 13.82s/it] 67%|██████▋ | 6652/10000 [26:07:00<12:50:50, 13.81s/it] {'loss': 0.0073, 'learning_rate': 1.6785e-05, 'epoch': 8.71} 67%|██████▋ | 6652/10000 [26:07:00<12:50:50, 13.81s/it] 67%|██████▋ | 6653/10000 [26:07:13<12:52:14, 13.84s/it] {'loss': 0.009, 'learning_rate': 1.6780000000000002e-05, 'epoch': 8.71} 67%|██████▋ | 6653/10000 [26:07:14<12:52:14, 13.84s/it] 67%|██████▋ | 6654/10000 [26:07:27<12:52:17, 13.85s/it] {'loss': 0.0083, 'learning_rate': 1.6775e-05, 'epoch': 8.71} 67%|██████▋ | 6654/10000 [26:07:27<12:52:17, 13.85s/it] 67%|██████▋ | 6655/10000 [26:07:41<12:51:42, 13.84s/it] {'loss': 0.01, 'learning_rate': 1.677e-05, 'epoch': 8.71} 67%|██████▋ | 6655/10000 [26:07:41<12:51:42, 13.84s/it] 67%|██████▋ | 6656/10000 [26:07:55<12:51:43, 13.85s/it] {'loss': 0.0094, 'learning_rate': 1.6765e-05, 'epoch': 8.71} 67%|██████▋ | 6656/10000 [26:07:55<12:51:43, 13.85s/it] 67%|██████▋ | 6657/10000 [26:08:09<12:51:26, 13.85s/it] {'loss': 0.0083, 'learning_rate': 1.6760000000000002e-05, 'epoch': 8.71} 67%|██████▋ | 6657/10000 [26:08:09<12:51:26, 13.85s/it] 67%|██████▋ | 6658/10000 [26:08:23<12:50:21, 13.83s/it] {'loss': 0.0073, 'learning_rate': 1.6755e-05, 'epoch': 8.71} 67%|██████▋ | 6658/10000 [26:08:23<12:50:21, 13.83s/it] 67%|██████▋ | 6659/10000 [26:08:36<12:49:30, 13.82s/it] {'loss': 0.0084, 'learning_rate': 1.675e-05, 'epoch': 8.72} 67%|██████▋ | 6659/10000 [26:08:36<12:49:30, 13.82s/it] 67%|██████▋ | 6660/10000 
[26:08:50<12:49:11, 13.82s/it] {'loss': 0.0072, 'learning_rate': 1.6745e-05, 'epoch': 8.72} 67%|██████▋ | 6660/10000 [26:08:50<12:49:11, 13.82s/it] 67%|██████▋ | 6661/10000 [26:09:04<12:49:27, 13.83s/it] {'loss': 0.0091, 'learning_rate': 1.674e-05, 'epoch': 8.72} 67%|██████▋ | 6661/10000 [26:09:04<12:49:27, 13.83s/it] 67%|██████▋ | 6662/10000 [26:09:18<12:48:00, 13.80s/it] {'loss': 0.008, 'learning_rate': 1.6735e-05, 'epoch': 8.72} 67%|██████▋ | 6662/10000 [26:09:18<12:48:00, 13.80s/it] 67%|██████▋ | 6663/10000 [26:09:32<12:47:13, 13.79s/it] {'loss': 0.012, 'learning_rate': 1.673e-05, 'epoch': 8.72} 67%|██████▋ | 6663/10000 [26:09:32<12:47:13, 13.79s/it] 67%|██████▋ | 6664/10000 [26:09:45<12:47:21, 13.80s/it] {'loss': 0.0092, 'learning_rate': 1.6725000000000003e-05, 'epoch': 8.72} 67%|██████▋ | 6664/10000 [26:09:45<12:47:21, 13.80s/it] 67%|██████▋ | 6665/10000 [26:09:59<12:47:29, 13.81s/it] {'loss': 0.0115, 'learning_rate': 1.672e-05, 'epoch': 8.72} 67%|██████▋ | 6665/10000 [26:09:59<12:47:29, 13.81s/it] 67%|██████▋ | 6666/10000 [26:10:13<12:46:07, 13.79s/it] {'loss': 0.0091, 'learning_rate': 1.6715000000000002e-05, 'epoch': 8.73} 67%|██████▋ | 6666/10000 [26:10:13<12:46:07, 13.79s/it] 67%|██████▋ | 6667/10000 [26:10:27<12:46:06, 13.79s/it] {'loss': 0.0094, 'learning_rate': 1.671e-05, 'epoch': 8.73} 67%|██████▋ | 6667/10000 [26:10:27<12:46:06, 13.79s/it] 67%|██████▋ | 6668/10000 [26:10:41<12:47:13, 13.82s/it] {'loss': 0.0079, 'learning_rate': 1.6705e-05, 'epoch': 8.73} 67%|██████▋ | 6668/10000 [26:10:41<12:47:13, 13.82s/it] 67%|██████▋ | 6669/10000 [26:10:54<12:45:32, 13.79s/it] {'loss': 0.014, 'learning_rate': 1.6700000000000003e-05, 'epoch': 8.73} 67%|██████▋ | 6669/10000 [26:10:54<12:45:32, 13.79s/it] 67%|██████▋ | 6670/10000 [26:11:08<12:44:53, 13.78s/it] {'loss': 0.0083, 'learning_rate': 1.6695e-05, 'epoch': 8.73} 67%|██████▋ | 6670/10000 [26:11:08<12:44:53, 13.78s/it] 67%|██████▋ | 6671/10000 [26:11:22<12:46:08, 13.81s/it] {'loss': 0.0094, 'learning_rate': 
1.669e-05, 'epoch': 8.73} 67%|██████▋ | 6671/10000 [26:11:22<12:46:08, 13.81s/it] 67%|██████▋ | 6672/10000 [26:11:36<12:44:39, 13.79s/it] {'loss': 0.0131, 'learning_rate': 1.6685e-05, 'epoch': 8.73} 67%|██████▋ | 6672/10000 [26:11:36<12:44:39, 13.79s/it] 67%|██████▋ | 6673/10000 [26:11:50<12:43:45, 13.77s/it] {'loss': 0.0094, 'learning_rate': 1.668e-05, 'epoch': 8.73} 67%|██████▋ | 6673/10000 [26:11:50<12:43:45, 13.77s/it] 67%|██████▋ | 6674/10000 [26:12:03<12:43:52, 13.78s/it] {'loss': 0.0102, 'learning_rate': 1.6675000000000002e-05, 'epoch': 8.74} 67%|██████▋ | 6674/10000 [26:12:03<12:43:52, 13.78s/it] 67%|██████▋ | 6675/10000 [26:12:17<12:41:14, 13.74s/it] {'loss': 0.009, 'learning_rate': 1.6669999999999998e-05, 'epoch': 8.74} 67%|██████▋ | 6675/10000 [26:12:17<12:41:14, 13.74s/it] 67%|██████▋ | 6676/10000 [26:12:31<12:41:51, 13.75s/it] {'loss': 0.0092, 'learning_rate': 1.6665e-05, 'epoch': 8.74} 67%|██████▋ | 6676/10000 [26:12:31<12:41:51, 13.75s/it] 67%|██████▋ | 6677/10000 [26:12:45<12:44:24, 13.80s/it] {'loss': 0.0093, 'learning_rate': 1.666e-05, 'epoch': 8.74} 67%|██████▋ | 6677/10000 [26:12:45<12:44:24, 13.80s/it] 67%|██████▋ | 6678/10000 [26:12:58<12:42:57, 13.78s/it] {'loss': 0.0093, 'learning_rate': 1.6655000000000002e-05, 'epoch': 8.74} 67%|██████▋ | 6678/10000 [26:12:58<12:42:57, 13.78s/it] 67%|██████▋ | 6679/10000 [26:13:12<12:47:17, 13.86s/it] {'loss': 0.0092, 'learning_rate': 1.665e-05, 'epoch': 8.74} 67%|██████▋ | 6679/10000 [26:13:12<12:47:17, 13.86s/it] 67%|██████▋ | 6680/10000 [26:13:26<12:45:30, 13.83s/it] {'loss': 0.0099, 'learning_rate': 1.6645e-05, 'epoch': 8.74} 67%|██████▋ | 6680/10000 [26:13:26<12:45:30, 13.83s/it] 67%|██████▋ | 6681/10000 [26:13:40<12:44:58, 13.83s/it] {'loss': 0.0083, 'learning_rate': 1.664e-05, 'epoch': 8.74} 67%|██████▋ | 6681/10000 [26:13:40<12:44:58, 13.83s/it] 67%|██████▋ | 6682/10000 [26:13:54<12:44:35, 13.83s/it] {'loss': 0.0098, 'learning_rate': 1.6635e-05, 'epoch': 8.75} 67%|██████▋ | 6682/10000 
[26:13:54<12:44:35, 13.83s/it] 67%|██████▋ | 6683/10000 [26:14:08<12:43:28, 13.81s/it] {'loss': 0.0082, 'learning_rate': 1.6630000000000002e-05, 'epoch': 8.75} 67%|██████▋ | 6683/10000 [26:14:08<12:43:28, 13.81s/it] 67%|██████▋ | 6684/10000 [26:14:21<12:42:05, 13.79s/it] {'loss': 0.0069, 'learning_rate': 1.6625e-05, 'epoch': 8.75} 67%|██████▋ | 6684/10000 [26:14:21<12:42:05, 13.79s/it] 67%|██████▋ | 6685/10000 [26:14:35<12:43:36, 13.82s/it] {'loss': 0.0073, 'learning_rate': 1.662e-05, 'epoch': 8.75} 67%|██████▋ | 6685/10000 [26:14:35<12:43:36, 13.82s/it] 67%|██████▋ | 6686/10000 [26:14:49<12:44:29, 13.84s/it] {'loss': 0.0079, 'learning_rate': 1.6615e-05, 'epoch': 8.75} 67%|██████▋ | 6686/10000 [26:14:49<12:44:29, 13.84s/it] 67%|██████▋ | 6687/10000 [26:15:03<12:44:22, 13.84s/it] {'loss': 0.0083, 'learning_rate': 1.6610000000000002e-05, 'epoch': 8.75} 67%|██████▋ | 6687/10000 [26:15:03<12:44:22, 13.84s/it] 67%|██████▋ | 6688/10000 [26:15:17<12:42:52, 13.82s/it] {'loss': 0.0079, 'learning_rate': 1.6605e-05, 'epoch': 8.75} 67%|██████▋ | 6688/10000 [26:15:17<12:42:52, 13.82s/it] 67%|██████▋ | 6689/10000 [26:15:31<12:43:43, 13.84s/it] {'loss': 0.0116, 'learning_rate': 1.66e-05, 'epoch': 8.76} 67%|██████▋ | 6689/10000 [26:15:31<12:43:43, 13.84s/it] 67%|██████▋ | 6690/10000 [26:15:45<12:43:45, 13.84s/it] {'loss': 0.0099, 'learning_rate': 1.6595e-05, 'epoch': 8.76} 67%|██████▋ | 6690/10000 [26:15:45<12:43:45, 13.84s/it] 67%|██████▋ | 6691/10000 [26:15:58<12:43:10, 13.84s/it] {'loss': 0.0067, 'learning_rate': 1.659e-05, 'epoch': 8.76} 67%|██████▋ | 6691/10000 [26:15:58<12:43:10, 13.84s/it] 67%|██████▋ | 6692/10000 [26:16:12<12:42:34, 13.83s/it] {'loss': 0.0095, 'learning_rate': 1.6585e-05, 'epoch': 8.76} 67%|██████▋ | 6692/10000 [26:16:12<12:42:34, 13.83s/it] 67%|██████▋ | 6693/10000 [26:16:26<12:42:51, 13.84s/it] {'loss': 0.0078, 'learning_rate': 1.658e-05, 'epoch': 8.76} 67%|██████▋ | 6693/10000 [26:16:26<12:42:51, 13.84s/it] 67%|██████▋ | 6694/10000 [26:16:40<12:43:47, 
13.86s/it] {'loss': 0.0094, 'learning_rate': 1.6575000000000003e-05, 'epoch': 8.76} 67%|██████▋ | 6694/10000 [26:16:40<12:43:47, 13.86s/it] 67%|██████▋ | 6695/10000 [26:16:54<12:47:02, 13.93s/it] {'loss': 0.0098, 'learning_rate': 1.657e-05, 'epoch': 8.76} 67%|██████▋ | 6695/10000 [26:16:54<12:47:02, 13.93s/it] 67%|██████▋ | 6696/10000 [26:17:08<12:44:59, 13.89s/it] {'loss': 0.0103, 'learning_rate': 1.6565e-05, 'epoch': 8.76} 67%|██████▋ | 6696/10000 [26:17:08<12:44:59, 13.89s/it] 67%|██████▋ | 6697/10000 [26:17:22<12:43:58, 13.88s/it] {'loss': 0.0106, 'learning_rate': 1.656e-05, 'epoch': 8.77} 67%|██████▋ | 6697/10000 [26:17:22<12:43:58, 13.88s/it] 67%|██████▋ | 6698/10000 [26:17:36<12:46:28, 13.93s/it] {'loss': 0.0094, 'learning_rate': 1.6555e-05, 'epoch': 8.77} 67%|██████▋ | 6698/10000 [26:17:36<12:46:28, 13.93s/it] 67%|██████▋ | 6699/10000 [26:17:50<12:48:16, 13.96s/it] {'loss': 0.0205, 'learning_rate': 1.6550000000000002e-05, 'epoch': 8.77} 67%|██████▋ | 6699/10000 [26:17:50<12:48:16, 13.96s/it] 67%|██████▋ | 6700/10000 [26:18:04<12:45:19, 13.91s/it] {'loss': 0.0138, 'learning_rate': 1.6545e-05, 'epoch': 8.77} 67%|██████▋ | 6700/10000 [26:18:04<12:45:19, 13.91s/it] 67%|██████▋ | 6701/10000 [26:18:17<12:43:52, 13.89s/it] {'loss': 0.0083, 'learning_rate': 1.654e-05, 'epoch': 8.77} 67%|██████▋ | 6701/10000 [26:18:17<12:43:52, 13.89s/it] 67%|██████▋ | 6702/10000 [26:18:31<12:43:41, 13.89s/it] {'loss': 0.013, 'learning_rate': 1.6535e-05, 'epoch': 8.77} 67%|██████▋ | 6702/10000 [26:18:31<12:43:41, 13.89s/it] 67%|██████▋ | 6703/10000 [26:18:45<12:45:41, 13.93s/it] {'loss': 0.0093, 'learning_rate': 1.6530000000000003e-05, 'epoch': 8.77} 67%|██████▋ | 6703/10000 [26:18:45<12:45:41, 13.93s/it] 67%|██████▋ | 6704/10000 [26:18:59<12:43:10, 13.89s/it] {'loss': 0.0083, 'learning_rate': 1.6525000000000002e-05, 'epoch': 8.77} 67%|██████▋ | 6704/10000 [26:18:59<12:43:10, 13.89s/it] 67%|██████▋ | 6705/10000 [26:19:13<12:41:32, 13.87s/it] {'loss': 0.0083, 'learning_rate': 
1.652e-05, 'epoch': 8.78} 67%|██████▋ | 6705/10000 [26:19:13<12:41:32, 13.87s/it] 67%|██████▋ | 6706/10000 [26:19:27<12:39:45, 13.84s/it] {'loss': 0.0098, 'learning_rate': 1.6515e-05, 'epoch': 8.78} 67%|██████▋ | 6706/10000 [26:19:27<12:39:45, 13.84s/it] 67%|██████▋ | 6707/10000 [26:19:40<12:38:13, 13.82s/it] {'loss': 0.0098, 'learning_rate': 1.651e-05, 'epoch': 8.78} 67%|██████▋ | 6707/10000 [26:19:40<12:38:13, 13.82s/it] 67%|██████▋ | 6708/10000 [26:19:54<12:36:48, 13.79s/it] {'loss': 0.0073, 'learning_rate': 1.6505000000000002e-05, 'epoch': 8.78} 67%|██████▋ | 6708/10000 [26:19:54<12:36:48, 13.79s/it] 67%|██████▋ | 6709/10000 [26:20:08<12:36:50, 13.80s/it] {'loss': 0.0101, 'learning_rate': 1.65e-05, 'epoch': 8.78} 67%|██████▋ | 6709/10000 [26:20:08<12:36:50, 13.80s/it] 67%|██████▋ | 6710/10000 [26:20:22<12:36:56, 13.80s/it] {'loss': 0.0095, 'learning_rate': 1.6495e-05, 'epoch': 8.78} 67%|██████▋ | 6710/10000 [26:20:22<12:36:56, 13.80s/it] 67%|██████▋ | 6711/10000 [26:20:36<12:35:46, 13.79s/it] {'loss': 0.0101, 'learning_rate': 1.649e-05, 'epoch': 8.78} 67%|██████▋ | 6711/10000 [26:20:36<12:35:46, 13.79s/it] 67%|██████▋ | 6712/10000 [26:20:49<12:36:48, 13.81s/it] {'loss': 0.0093, 'learning_rate': 1.6485e-05, 'epoch': 8.79} 67%|██████▋ | 6712/10000 [26:20:49<12:36:48, 13.81s/it] 67%|██████▋ | 6713/10000 [26:21:03<12:34:26, 13.77s/it] {'loss': 0.0104, 'learning_rate': 1.648e-05, 'epoch': 8.79} 67%|██████▋ | 6713/10000 [26:21:03<12:34:26, 13.77s/it] 67%|██████▋ | 6714/10000 [26:21:17<12:35:39, 13.80s/it] {'loss': 0.0072, 'learning_rate': 1.6475e-05, 'epoch': 8.79} 67%|██████▋ | 6714/10000 [26:21:17<12:35:39, 13.80s/it] 67%|██████▋ | 6715/10000 [26:21:31<12:35:01, 13.79s/it] {'loss': 0.0119, 'learning_rate': 1.6470000000000003e-05, 'epoch': 8.79} 67%|██████▋ | 6715/10000 [26:21:31<12:35:01, 13.79s/it] 67%|██████▋ | 6716/10000 [26:21:45<12:35:32, 13.80s/it] {'loss': 0.0084, 'learning_rate': 1.6465e-05, 'epoch': 8.79} 67%|██████▋ | 6716/10000 [26:21:45<12:35:32, 
13.80s/it] 67%|██████▋ | 6717/10000 [26:21:58<12:36:17, 13.82s/it] {'loss': 0.0087, 'learning_rate': 1.646e-05, 'epoch': 8.79} 67%|██████▋ | 6717/10000 [26:21:58<12:36:17, 13.82s/it] 67%|██████▋ | 6718/10000 [26:22:12<12:36:35, 13.83s/it] {'loss': 0.0083, 'learning_rate': 1.6455e-05, 'epoch': 8.79} 67%|██████▋ | 6718/10000 [26:22:12<12:36:35, 13.83s/it] 67%|██████▋ | 6719/10000 [26:22:26<12:37:03, 13.84s/it] {'loss': 0.0165, 'learning_rate': 1.645e-05, 'epoch': 8.79} 67%|██████▋ | 6719/10000 [26:22:26<12:37:03, 13.84s/it] 67%|██████▋ | 6720/10000 [26:22:40<12:37:24, 13.86s/it] {'loss': 0.0091, 'learning_rate': 1.6445000000000003e-05, 'epoch': 8.8} 67%|██████▋ | 6720/10000 [26:22:40<12:37:24, 13.86s/it] 67%|██████▋ | 6721/10000 [26:22:54<12:40:14, 13.91s/it] {'loss': 0.0089, 'learning_rate': 1.644e-05, 'epoch': 8.8} 67%|██████▋ | 6721/10000 [26:22:54<12:40:14, 13.91s/it] 67%|██████▋ | 6722/10000 [26:23:08<12:39:47, 13.91s/it] {'loss': 0.01, 'learning_rate': 1.6435e-05, 'epoch': 8.8} 67%|██████▋ | 6722/10000 [26:23:08<12:39:47, 13.91s/it] 67%|██████▋ | 6723/10000 [26:23:22<12:39:14, 13.90s/it] {'loss': 0.0084, 'learning_rate': 1.643e-05, 'epoch': 8.8} 67%|██████▋ | 6723/10000 [26:23:22<12:39:14, 13.90s/it] 67%|██████▋ | 6724/10000 [26:23:36<12:37:20, 13.87s/it] {'loss': 0.0079, 'learning_rate': 1.6425000000000003e-05, 'epoch': 8.8} 67%|██████▋ | 6724/10000 [26:23:36<12:37:20, 13.87s/it] 67%|██████▋ | 6725/10000 [26:23:50<12:36:56, 13.87s/it] {'loss': 0.0105, 'learning_rate': 1.6420000000000002e-05, 'epoch': 8.8} 67%|██████▋ | 6725/10000 [26:23:50<12:36:56, 13.87s/it] 67%|██████▋ | 6726/10000 [26:24:03<12:35:49, 13.85s/it] {'loss': 0.0092, 'learning_rate': 1.6415e-05, 'epoch': 8.8} 67%|██████▋ | 6726/10000 [26:24:03<12:35:49, 13.85s/it] 67%|██████▋ | 6727/10000 [26:24:17<12:36:30, 13.87s/it] {'loss': 0.0095, 'learning_rate': 1.641e-05, 'epoch': 8.8} 67%|██████▋ | 6727/10000 [26:24:17<12:36:30, 13.87s/it] 67%|██████▋ | 6728/10000 [26:24:31<12:35:24, 13.85s/it] {'loss': 
0.0084, 'learning_rate': 1.6405e-05, 'epoch': 8.81} 67%|██████▋ | 6728/10000 [26:24:31<12:35:24, 13.85s/it] 67%|██████▋ | 6729/10000 [26:24:45<12:33:59, 13.83s/it] {'loss': 0.0104, 'learning_rate': 1.6400000000000002e-05, 'epoch': 8.81} 67%|██████▋ | 6729/10000 [26:24:45<12:33:59, 13.83s/it] 67%|██████▋ | 6730/10000 [26:24:59<12:34:22, 13.84s/it] {'loss': 0.008, 'learning_rate': 1.6395e-05, 'epoch': 8.81} 67%|██████▋ | 6730/10000 [26:24:59<12:34:22, 13.84s/it] 67%|██████▋ | 6731/10000 [26:25:13<12:34:24, 13.85s/it] {'loss': 0.0067, 'learning_rate': 1.639e-05, 'epoch': 8.81} 67%|██████▋ | 6731/10000 [26:25:13<12:34:24, 13.85s/it] 67%|██████▋ | 6732/10000 [26:25:26<12:34:34, 13.85s/it] {'loss': 0.0072, 'learning_rate': 1.6385e-05, 'epoch': 8.81} 67%|██████▋ | 6732/10000 [26:25:26<12:34:34, 13.85s/it] 67%|██████▋ | 6733/10000 [26:25:40<12:32:56, 13.83s/it] {'loss': 0.0076, 'learning_rate': 1.6380000000000002e-05, 'epoch': 8.81} 67%|██████▋ | 6733/10000 [26:25:40<12:32:56, 13.83s/it] 67%|██████▋ | 6734/10000 [26:25:54<12:31:17, 13.80s/it] {'loss': 0.0105, 'learning_rate': 1.6375e-05, 'epoch': 8.81} 67%|██████▋ | 6734/10000 [26:25:54<12:31:17, 13.80s/it] 67%|██████▋ | 6735/10000 [26:26:08<12:31:32, 13.81s/it] {'loss': 0.0095, 'learning_rate': 1.637e-05, 'epoch': 8.82} 67%|██████▋ | 6735/10000 [26:26:08<12:31:32, 13.81s/it] 67%|██████▋ | 6736/10000 [26:26:22<12:30:17, 13.79s/it] {'loss': 0.008, 'learning_rate': 1.6365e-05, 'epoch': 8.82} 67%|██████▋ | 6736/10000 [26:26:22<12:30:17, 13.79s/it] 67%|██████▋ | 6737/10000 [26:26:35<12:31:28, 13.82s/it] {'loss': 0.0084, 'learning_rate': 1.636e-05, 'epoch': 8.82} 67%|██████▋ | 6737/10000 [26:26:35<12:31:28, 13.82s/it] 67%|██████▋ | 6738/10000 [26:26:49<12:32:27, 13.84s/it] {'loss': 0.0077, 'learning_rate': 1.6355000000000002e-05, 'epoch': 8.82} 67%|██████▋ | 6738/10000 [26:26:49<12:32:27, 13.84s/it] 67%|██████▋ | 6739/10000 [26:27:03<12:32:45, 13.85s/it] {'loss': 0.0087, 'learning_rate': 1.635e-05, 'epoch': 8.82} 67%|██████▋ | 
6739/10000 [26:27:03<12:32:45, 13.85s/it] 67%|██████▋ | 6740/10000 [26:27:17<12:31:28, 13.83s/it] {'loss': 0.0091, 'learning_rate': 1.6345000000000004e-05, 'epoch': 8.82} 67%|██████▋ | 6740/10000 [26:27:17<12:31:28, 13.83s/it] 67%|██████▋ | 6741/10000 [26:27:31<12:31:21, 13.83s/it] {'loss': 0.0081, 'learning_rate': 1.634e-05, 'epoch': 8.82} 67%|██████▋ | 6741/10000 [26:27:31<12:31:21, 13.83s/it] 67%|██████▋ | 6742/10000 [26:27:45<12:32:20, 13.86s/it] {'loss': 0.0068, 'learning_rate': 1.6335e-05, 'epoch': 8.82} 67%|██████▋ | 6742/10000 [26:27:45<12:32:20, 13.86s/it] 67%|██████▋ | 6743/10000 [26:27:59<12:33:32, 13.88s/it] {'loss': 0.0094, 'learning_rate': 1.633e-05, 'epoch': 8.83} 67%|██████▋ | 6743/10000 [26:27:59<12:33:32, 13.88s/it] 67%|██████▋ | 6744/10000 [26:28:12<12:31:18, 13.84s/it] {'loss': 0.0111, 'learning_rate': 1.6325e-05, 'epoch': 8.83} 67%|██████▋ | 6744/10000 [26:28:12<12:31:18, 13.84s/it] 67%|██████▋ | 6745/10000 [26:28:26<12:29:41, 13.82s/it] {'loss': 0.011, 'learning_rate': 1.6320000000000003e-05, 'epoch': 8.83} 67%|██████▋ | 6745/10000 [26:28:26<12:29:41, 13.82s/it] 67%|██████▋ | 6746/10000 [26:28:40<12:29:51, 13.83s/it] {'loss': 0.013, 'learning_rate': 1.6315e-05, 'epoch': 8.83} 67%|██████▋ | 6746/10000 [26:28:40<12:29:51, 13.83s/it] 67%|██████▋ | 6747/10000 [26:28:54<12:27:58, 13.80s/it] {'loss': 0.0079, 'learning_rate': 1.631e-05, 'epoch': 8.83} 67%|██████▋ | 6747/10000 [26:28:54<12:27:58, 13.80s/it] 67%|██████▋ | 6748/10000 [26:29:08<12:29:36, 13.83s/it] {'loss': 0.0093, 'learning_rate': 1.6305e-05, 'epoch': 8.83} 67%|██████▋ | 6748/10000 [26:29:08<12:29:36, 13.83s/it] 67%|██████▋ | 6749/10000 [26:29:21<12:29:23, 13.83s/it] {'loss': 0.0098, 'learning_rate': 1.63e-05, 'epoch': 8.83} 67%|██████▋ | 6749/10000 [26:29:22<12:29:23, 13.83s/it] 68%|██████▊ | 6750/10000 [26:29:35<12:29:52, 13.84s/it] {'loss': 0.01, 'learning_rate': 1.6295000000000002e-05, 'epoch': 8.84} 68%|██████▊ | 6750/10000 [26:29:35<12:29:52, 13.84s/it] 68%|██████▊ | 6751/10000 
[26:29:49<12:29:29, 13.84s/it] {'loss': 0.0105, 'learning_rate': 1.6289999999999998e-05, 'epoch': 8.84} 68%|██████▊ | 6751/10000 [26:29:49<12:29:29, 13.84s/it] 68%|██████▊ | 6752/10000 [26:30:03<12:29:34, 13.85s/it] {'loss': 0.0085, 'learning_rate': 1.6285e-05, 'epoch': 8.84} 68%|██████▊ | 6752/10000 [26:30:03<12:29:34, 13.85s/it] 68%|██████▊ | 6753/10000 [26:30:17<12:31:25, 13.89s/it] {'loss': 0.0091, 'learning_rate': 1.628e-05, 'epoch': 8.84} 68%|██████▊ | 6753/10000 [26:30:17<12:31:25, 13.89s/it] 68%|██████▊ | 6754/10000 [26:30:31<12:30:50, 13.88s/it] {'loss': 0.0117, 'learning_rate': 1.6275000000000003e-05, 'epoch': 8.84} 68%|██████▊ | 6754/10000 [26:30:31<12:30:50, 13.88s/it] 68%|██████▊ | 6755/10000 [26:30:45<12:29:27, 13.86s/it] {'loss': 0.0072, 'learning_rate': 1.6270000000000002e-05, 'epoch': 8.84} 68%|██████▊ | 6755/10000 [26:30:45<12:29:27, 13.86s/it] 68%|██████▊ | 6756/10000 [26:30:58<12:26:55, 13.82s/it] {'loss': 0.0072, 'learning_rate': 1.6265e-05, 'epoch': 8.84} 68%|██████▊ | 6756/10000 [26:30:58<12:26:55, 13.82s/it] 68%|██████▊ | 6757/10000 [26:31:12<12:25:40, 13.80s/it] {'loss': 0.0122, 'learning_rate': 1.626e-05, 'epoch': 8.84} 68%|██████▊ | 6757/10000 [26:31:12<12:25:40, 13.80s/it] 68%|██████▊ | 6758/10000 [26:31:26<12:25:43, 13.80s/it] {'loss': 0.0096, 'learning_rate': 1.6255e-05, 'epoch': 8.85} 68%|██████▊ | 6758/10000 [26:31:26<12:25:43, 13.80s/it] 68%|██████▊ | 6759/10000 [26:31:40<12:26:17, 13.82s/it] {'loss': 0.0078, 'learning_rate': 1.6250000000000002e-05, 'epoch': 8.85} 68%|██████▊ | 6759/10000 [26:31:40<12:26:17, 13.82s/it] 68%|██████▊ | 6760/10000 [26:31:54<12:28:19, 13.86s/it] {'loss': 0.0081, 'learning_rate': 1.6245e-05, 'epoch': 8.85} 68%|██████▊ | 6760/10000 [26:31:54<12:28:19, 13.86s/it] 68%|██████▊ | 6761/10000 [26:32:08<12:28:12, 13.86s/it] {'loss': 0.0106, 'learning_rate': 1.624e-05, 'epoch': 8.85} 68%|██████▊ | 6761/10000 [26:32:08<12:28:12, 13.86s/it] 68%|██████▊ | 6762/10000 [26:32:21<12:27:36, 13.85s/it] {'loss': 0.0088, 
'learning_rate': 1.6235e-05, 'epoch': 8.85} 68%|██████▊ | 6762/10000 [26:32:22<12:27:36, 13.85s/it] 68%|██████▊ | 6763/10000 [26:32:35<12:25:39, 13.82s/it] {'loss': 0.0093, 'learning_rate': 1.6230000000000002e-05, 'epoch': 8.85} 68%|██████▊ | 6763/10000 [26:32:35<12:25:39, 13.82s/it] 68%|██████▊ | 6764/10000 [26:32:49<12:26:04, 13.83s/it] {'loss': 0.0113, 'learning_rate': 1.6225e-05, 'epoch': 8.85} 68%|██████▊ | 6764/10000 [26:32:49<12:26:04, 13.83s/it] 68%|██████▊ | 6765/10000 [26:33:03<12:26:52, 13.85s/it] {'loss': 0.006, 'learning_rate': 1.622e-05, 'epoch': 8.85} 68%|██████▊ | 6765/10000 [26:33:03<12:26:52, 13.85s/it] 68%|██████▊ | 6766/10000 [26:33:17<12:25:47, 13.84s/it] {'loss': 0.0099, 'learning_rate': 1.6215e-05, 'epoch': 8.86} 68%|██████▊ | 6766/10000 [26:33:17<12:25:47, 13.84s/it] 68%|██████▊ | 6767/10000 [26:33:31<12:26:23, 13.85s/it] {'loss': 0.0084, 'learning_rate': 1.621e-05, 'epoch': 8.86} 68%|██████▊ | 6767/10000 [26:33:31<12:26:23, 13.85s/it] 68%|██████▊ | 6768/10000 [26:33:45<12:27:37, 13.88s/it] {'loss': 0.0065, 'learning_rate': 1.6205e-05, 'epoch': 8.86} 68%|██████▊ | 6768/10000 [26:33:45<12:27:37, 13.88s/it] 68%|██████▊ | 6769/10000 [26:33:59<12:27:27, 13.88s/it] {'loss': 0.0095, 'learning_rate': 1.62e-05, 'epoch': 8.86} 68%|██████▊ | 6769/10000 [26:33:59<12:27:27, 13.88s/it] 68%|██████▊ | 6770/10000 [26:34:12<12:26:24, 13.87s/it] {'loss': 0.0068, 'learning_rate': 1.6195000000000003e-05, 'epoch': 8.86} 68%|██████▊ | 6770/10000 [26:34:12<12:26:24, 13.87s/it] 68%|██████▊ | 6771/10000 [26:34:26<12:26:23, 13.87s/it] {'loss': 0.0088, 'learning_rate': 1.619e-05, 'epoch': 8.86} 68%|██████▊ | 6771/10000 [26:34:26<12:26:23, 13.87s/it] 68%|██████▊ | 6772/10000 [26:34:40<12:24:27, 13.84s/it] {'loss': 0.0089, 'learning_rate': 1.6185000000000002e-05, 'epoch': 8.86} 68%|██████▊ | 6772/10000 [26:34:40<12:24:27, 13.84s/it] 68%|██████▊ | 6773/10000 [26:34:54<12:24:35, 13.84s/it] {'loss': 0.0129, 'learning_rate': 1.618e-05, 'epoch': 8.87} 68%|██████▊ | 
6773/10000 [26:34:54<12:24:35, 13.84s/it] 68%|██████▊ | 6774/10000 [26:35:08<12:23:15, 13.82s/it] {'loss': 0.0077, 'learning_rate': 1.6175e-05, 'epoch': 8.87} 68%|██████▊ | 6774/10000 [26:35:08<12:23:15, 13.82s/it] 68%|██████▊ | 6775/10000 [26:35:21<12:23:41, 13.84s/it] {'loss': 0.0076, 'learning_rate': 1.6170000000000003e-05, 'epoch': 8.87} 68%|██████▊ | 6775/10000 [26:35:22<12:23:41, 13.84s/it] 68%|██████▊ | 6776/10000 [26:35:35<12:22:06, 13.81s/it] {'loss': 0.0094, 'learning_rate': 1.6165e-05, 'epoch': 8.87} 68%|██████▊ | 6776/10000 [26:35:35<12:22:06, 13.81s/it] 68%|██████▊ | 6777/10000 [26:35:49<12:22:40, 13.83s/it] {'loss': 0.0099, 'learning_rate': 1.616e-05, 'epoch': 8.87} 68%|██████▊ | 6777/10000 [26:35:49<12:22:40, 13.83s/it] 68%|██████▊ | 6778/10000 [26:36:03<12:23:01, 13.84s/it] {'loss': 0.0098, 'learning_rate': 1.6155e-05, 'epoch': 8.87} 68%|██████▊ | 6778/10000 [26:36:03<12:23:01, 13.84s/it] 68%|██████▊ | 6779/10000 [26:36:17<12:22:36, 13.83s/it] {'loss': 0.0075, 'learning_rate': 1.6150000000000003e-05, 'epoch': 8.87} 68%|██████▊ | 6779/10000 [26:36:17<12:22:36, 13.83s/it] 68%|██████▊ | 6780/10000 [26:36:31<12:21:53, 13.82s/it] {'loss': 0.0079, 'learning_rate': 1.6145000000000002e-05, 'epoch': 8.87} 68%|██████▊ | 6780/10000 [26:36:31<12:21:53, 13.82s/it] 68%|██████▊ | 6781/10000 [26:36:44<12:19:35, 13.79s/it] {'loss': 0.0081, 'learning_rate': 1.6139999999999998e-05, 'epoch': 8.88} 68%|██████▊ | 6781/10000 [26:36:44<12:19:35, 13.79s/it] 68%|██████▊ | 6782/10000 [26:36:58<12:20:32, 13.81s/it] {'loss': 0.0099, 'learning_rate': 1.6135e-05, 'epoch': 8.88} 68%|██████▊ | 6782/10000 [26:36:58<12:20:32, 13.81s/it] 68%|██████▊ | 6783/10000 [26:37:12<12:19:42, 13.80s/it] {'loss': 0.0074, 'learning_rate': 1.613e-05, 'epoch': 8.88} 68%|██████▊ | 6783/10000 [26:37:12<12:19:42, 13.80s/it] 68%|██████▊ | 6784/10000 [26:37:26<12:19:01, 13.79s/it] {'loss': 0.0093, 'learning_rate': 1.6125000000000002e-05, 'epoch': 8.88} 68%|██████▊ | 6784/10000 [26:37:26<12:19:01, 
13.79s/it] 68%|██████▊ | 6785/10000 [26:37:40<12:19:57, 13.81s/it] {'loss': 0.0087, 'learning_rate': 1.612e-05, 'epoch': 8.88} 68%|██████▊ | 6785/10000 [26:37:40<12:19:57, 13.81s/it] 68%|██████▊ | 6786/10000 [26:37:53<12:19:34, 13.81s/it] {'loss': 0.0066, 'learning_rate': 1.6115e-05, 'epoch': 8.88} 68%|██████▊ | 6786/10000 [26:37:53<12:19:34, 13.81s/it] 68%|██████▊ | 6787/10000 [26:38:07<12:18:16, 13.79s/it] {'loss': 0.0078, 'learning_rate': 1.611e-05, 'epoch': 8.88} 68%|██████▊ | 6787/10000 [26:38:07<12:18:16, 13.79s/it] 68%|██████▊ | 6788/10000 [26:38:21<12:20:07, 13.83s/it] {'loss': 0.0077, 'learning_rate': 1.6105e-05, 'epoch': 8.88} 68%|██████▊ | 6788/10000 [26:38:21<12:20:07, 13.83s/it] 68%|██████▊ | 6789/10000 [26:38:35<12:19:17, 13.81s/it] {'loss': 0.0098, 'learning_rate': 1.6100000000000002e-05, 'epoch': 8.89} 68%|██████▊ | 6789/10000 [26:38:35<12:19:17, 13.81s/it] 68%|██████▊ | 6790/10000 [26:38:49<12:20:40, 13.84s/it] {'loss': 0.0115, 'learning_rate': 1.6095e-05, 'epoch': 8.89} 68%|██████▊ | 6790/10000 [26:38:49<12:20:40, 13.84s/it] 68%|██████▊ | 6791/10000 [26:39:03<12:21:11, 13.86s/it] {'loss': 0.0077, 'learning_rate': 1.609e-05, 'epoch': 8.89} 68%|██████▊ | 6791/10000 [26:39:03<12:21:11, 13.86s/it] 68%|██████▊ | 6792/10000 [26:39:17<12:22:46, 13.89s/it] {'loss': 0.0069, 'learning_rate': 1.6085e-05, 'epoch': 8.89} 68%|██████▊ | 6792/10000 [26:39:17<12:22:46, 13.89s/it] 68%|██████▊ | 6793/10000 [26:39:30<12:21:30, 13.87s/it] {'loss': 0.0093, 'learning_rate': 1.6080000000000002e-05, 'epoch': 8.89} 68%|██████▊ | 6793/10000 [26:39:30<12:21:30, 13.87s/it] 68%|██████▊ | 6794/10000 [26:39:44<12:22:47, 13.90s/it] {'loss': 0.0081, 'learning_rate': 1.6075e-05, 'epoch': 8.89} 68%|██████▊ | 6794/10000 [26:39:44<12:22:47, 13.90s/it] 68%|██████▊ | 6795/10000 [26:39:58<12:20:32, 13.86s/it] {'loss': 0.0093, 'learning_rate': 1.607e-05, 'epoch': 8.89} 68%|██████▊ | 6795/10000 [26:39:58<12:20:32, 13.86s/it] 68%|██████▊ | 6796/10000 [26:40:12<12:22:04, 13.90s/it] {'loss': 
0.0079, 'learning_rate': 1.6065e-05, 'epoch': 8.9} 68%|██████▊ | 6796/10000 [26:40:12<12:22:04, 13.90s/it] 68%|██████▊ | 6797/10000 [26:40:26<12:23:21, 13.93s/it] {'loss': 0.0082, 'learning_rate': 1.606e-05, 'epoch': 8.9} 68%|██████▊ | 6797/10000 [26:40:26<12:23:21, 13.93s/it] 68%|██████▊ | 6798/10000 [26:40:40<12:23:51, 13.94s/it] {'loss': 0.0103, 'learning_rate': 1.6055e-05, 'epoch': 8.9} 68%|██████▊ | 6798/10000 [26:40:40<12:23:51, 13.94s/it] 68%|██████▊ | 6799/10000 [26:40:54<12:23:15, 13.93s/it] {'loss': 0.0099, 'learning_rate': 1.605e-05, 'epoch': 8.9} 68%|██████▊ | 6799/10000 [26:40:54<12:23:15, 13.93s/it] 68%|██████▊ | 6800/10000 [26:41:08<12:22:02, 13.91s/it] {'loss': 0.0096, 'learning_rate': 1.6045000000000003e-05, 'epoch': 8.9} 68%|██████▊ | 6800/10000 [26:41:08<12:22:02, 13.91s/it] 68%|██████▊ | 6801/10000 [26:41:22<12:18:36, 13.85s/it] {'loss': 0.0099, 'learning_rate': 1.604e-05, 'epoch': 8.9} 68%|██████▊ | 6801/10000 [26:41:22<12:18:36, 13.85s/it] 68%|██████▊ | 6802/10000 [26:41:35<12:18:40, 13.86s/it] {'loss': 0.0108, 'learning_rate': 1.6035e-05, 'epoch': 8.9} 68%|██████▊ | 6802/10000 [26:41:35<12:18:40, 13.86s/it] 68%|██████▊ | 6803/10000 [26:41:49<12:17:36, 13.84s/it] {'loss': 0.0097, 'learning_rate': 1.603e-05, 'epoch': 8.9} 68%|██████▊ | 6803/10000 [26:41:49<12:17:36, 13.84s/it] 68%|██████▊ | 6804/10000 [26:42:03<12:17:35, 13.85s/it] {'loss': 0.0066, 'learning_rate': 1.6025e-05, 'epoch': 8.91} 68%|██████▊ | 6804/10000 [26:42:03<12:17:35, 13.85s/it] 68%|██████▊ | 6805/10000 [26:42:17<12:19:02, 13.88s/it] {'loss': 0.0088, 'learning_rate': 1.6020000000000002e-05, 'epoch': 8.91} 68%|██████▊ | 6805/10000 [26:42:17<12:19:02, 13.88s/it] 68%|██████▊ | 6806/10000 [26:42:31<12:17:33, 13.86s/it] {'loss': 0.0084, 'learning_rate': 1.6014999999999998e-05, 'epoch': 8.91} 68%|██████▊ | 6806/10000 [26:42:31<12:17:33, 13.86s/it] 68%|██████▊ | 6807/10000 [26:42:45<12:15:39, 13.82s/it] {'loss': 0.007, 'learning_rate': 1.601e-05, 'epoch': 8.91} 68%|██████▊ | 
6807/10000 [26:42:45<12:15:39, 13.82s/it] 68%|██████▊ | 6808/10000 [26:42:59<12:16:39, 13.85s/it] {'loss': 0.0099, 'learning_rate': 1.6005e-05, 'epoch': 8.91} 68%|██████▊ | 6808/10000 [26:42:59<12:16:39, 13.85s/it] 68%|██████▊ | 6809/10000 [26:43:12<12:15:18, 13.83s/it] {'loss': 0.009, 'learning_rate': 1.6000000000000003e-05, 'epoch': 8.91} 68%|██████▊ | 6809/10000 [26:43:12<12:15:18, 13.83s/it] 68%|██████▊ | 6810/10000 [26:43:26<12:15:32, 13.83s/it] {'loss': 0.0078, 'learning_rate': 1.5995000000000002e-05, 'epoch': 8.91} 68%|██████▊ | 6810/10000 [26:43:26<12:15:32, 13.83s/it] 68%|██████▊ | 6811/10000 [26:43:40<12:13:37, 13.80s/it] {'loss': 0.0089, 'learning_rate': 1.599e-05, 'epoch': 8.91} 68%|██████▊ | 6811/10000 [26:43:40<12:13:37, 13.80s/it] 68%|██████▊ | 6812/10000 [26:43:54<12:12:49, 13.79s/it] {'loss': 0.0057, 'learning_rate': 1.5985e-05, 'epoch': 8.92} 68%|██████▊ | 6812/10000 [26:43:54<12:12:49, 13.79s/it] 68%|██████▊ | 6813/10000 [26:44:07<12:12:54, 13.80s/it] {'loss': 0.009, 'learning_rate': 1.598e-05, 'epoch': 8.92} 68%|██████▊ | 6813/10000 [26:44:07<12:12:54, 13.80s/it] 68%|██████▊ | 6814/10000 [26:44:21<12:12:23, 13.79s/it] {'loss': 0.0099, 'learning_rate': 1.5975000000000002e-05, 'epoch': 8.92} 68%|██████▊ | 6814/10000 [26:44:21<12:12:23, 13.79s/it] 68%|██████▊ | 6815/10000 [26:44:35<12:13:10, 13.81s/it] {'loss': 0.0081, 'learning_rate': 1.597e-05, 'epoch': 8.92} 68%|██████▊ | 6815/10000 [26:44:35<12:13:10, 13.81s/it] 68%|██████▊ | 6816/10000 [26:44:49<12:12:10, 13.80s/it] {'loss': 0.0097, 'learning_rate': 1.5965e-05, 'epoch': 8.92} 68%|██████▊ | 6816/10000 [26:44:49<12:12:10, 13.80s/it] 68%|██████▊ | 6817/10000 [26:45:03<12:12:31, 13.81s/it] {'loss': 0.0086, 'learning_rate': 1.596e-05, 'epoch': 8.92} 68%|██████▊ | 6817/10000 [26:45:03<12:12:31, 13.81s/it] 68%|██████▊ | 6818/10000 [26:45:17<12:14:30, 13.85s/it] {'loss': 0.0106, 'learning_rate': 1.5955e-05, 'epoch': 8.92} 68%|██████▊ | 6818/10000 [26:45:17<12:14:30, 13.85s/it] 68%|██████▊ | 6819/10000 
[26:45:30<12:14:32, 13.86s/it] {'loss': 0.0067, 'learning_rate': 1.595e-05, 'epoch': 8.93} 68%|██████▊ | 6819/10000 [26:45:31<12:14:32, 13.86s/it] 68%|██████▊ | 6820/10000 [26:45:44<12:14:16, 13.85s/it] {'loss': 0.0078, 'learning_rate': 1.5945e-05, 'epoch': 8.93} 68%|██████▊ | 6820/10000 [26:45:44<12:14:16, 13.85s/it] 68%|██████▊ | 6821/10000 [26:45:58<12:12:28, 13.82s/it] {'loss': 0.0093, 'learning_rate': 1.594e-05, 'epoch': 8.93} 68%|██████▊ | 6821/10000 [26:45:58<12:12:28, 13.82s/it] 68%|██████▊ | 6822/10000 [26:46:12<12:12:02, 13.82s/it] {'loss': 0.0094, 'learning_rate': 1.5935e-05, 'epoch': 8.93} 68%|██████▊ | 6822/10000 [26:46:12<12:12:02, 13.82s/it] 68%|██████▊ | 6823/10000 [26:46:26<12:12:21, 13.83s/it] {'loss': 0.0103, 'learning_rate': 1.593e-05, 'epoch': 8.93} 68%|██████▊ | 6823/10000 [26:46:26<12:12:21, 13.83s/it] 68%|██████▊ | 6824/10000 [26:46:40<12:10:51, 13.81s/it] {'loss': 0.0078, 'learning_rate': 1.5925e-05, 'epoch': 8.93} 68%|██████▊ | 6824/10000 [26:46:40<12:10:51, 13.81s/it] 68%|██████▊ | 6825/10000 [26:46:53<12:12:28, 13.84s/it] {'loss': 0.0086, 'learning_rate': 1.592e-05, 'epoch': 8.93} 68%|██████▊ | 6825/10000 [26:46:53<12:12:28, 13.84s/it] 68%|██████▊ | 6826/10000 [26:47:07<12:11:15, 13.82s/it] {'loss': 0.0113, 'learning_rate': 1.5915000000000003e-05, 'epoch': 8.93} 68%|██████▊ | 6826/10000 [26:47:07<12:11:15, 13.82s/it] 68%|██████▊ | 6827/10000 [26:47:21<12:11:34, 13.83s/it] {'loss': 0.0081, 'learning_rate': 1.591e-05, 'epoch': 8.94} 68%|██████▊ | 6827/10000 [26:47:21<12:11:34, 13.83s/it] 68%|██████▊ | 6828/10000 [26:47:35<12:11:07, 13.83s/it] {'loss': 0.0087, 'learning_rate': 1.5905e-05, 'epoch': 8.94} 68%|██████▊ | 6828/10000 [26:47:35<12:11:07, 13.83s/it] 68%|██████▊ | 6829/10000 [26:47:49<12:10:48, 13.83s/it] {'loss': 0.0077, 'learning_rate': 1.59e-05, 'epoch': 8.94} 68%|██████▊ | 6829/10000 [26:47:49<12:10:48, 13.83s/it] 68%|██████▊ | 6830/10000 [26:48:03<12:12:02, 13.86s/it] {'loss': 0.0089, 'learning_rate': 1.5895000000000003e-05, 
'epoch': 8.94} 68%|██████▊ | 6830/10000 [26:48:03<12:12:02, 13.86s/it] 68%|██████▊ | 6831/10000 [26:48:17<12:12:44, 13.87s/it] {'loss': 0.0085, 'learning_rate': 1.5890000000000002e-05, 'epoch': 8.94} 68%|██████▊ | 6831/10000 [26:48:17<12:12:44, 13.87s/it] 68%|██████▊ | 6832/10000 [26:48:30<12:13:13, 13.89s/it] {'loss': 0.009, 'learning_rate': 1.5885e-05, 'epoch': 8.94} 68%|██████▊ | 6832/10000 [26:48:31<12:13:13, 13.89s/it] 68%|██████▊ | 6833/10000 [26:48:44<12:11:36, 13.86s/it] {'loss': 0.0089, 'learning_rate': 1.588e-05, 'epoch': 8.94} 68%|██████▊ | 6833/10000 [26:48:44<12:11:36, 13.86s/it] 68%|██████▊ | 6834/10000 [26:48:58<12:11:02, 13.85s/it] {'loss': 0.0091, 'learning_rate': 1.5875e-05, 'epoch': 8.95} 68%|██████▊ | 6834/10000 [26:48:58<12:11:02, 13.85s/it] 68%|██████▊ | 6835/10000 [26:49:12<12:11:56, 13.88s/it] {'loss': 0.0084, 'learning_rate': 1.5870000000000002e-05, 'epoch': 8.95} 68%|██████▊ | 6835/10000 [26:49:12<12:11:56, 13.88s/it] 68%|██████▊ | 6836/10000 [26:49:26<12:12:10, 13.88s/it] {'loss': 0.0073, 'learning_rate': 1.5865e-05, 'epoch': 8.95} 68%|██████▊ | 6836/10000 [26:49:26<12:12:10, 13.88s/it] 68%|██████▊ | 6837/10000 [26:49:40<12:11:13, 13.87s/it] {'loss': 0.0101, 'learning_rate': 1.586e-05, 'epoch': 8.95} 68%|██████▊ | 6837/10000 [26:49:40<12:11:13, 13.87s/it] 68%|██████▊ | 6838/10000 [26:49:54<12:11:31, 13.88s/it] {'loss': 0.0092, 'learning_rate': 1.5855e-05, 'epoch': 8.95} 68%|██████▊ | 6838/10000 [26:49:54<12:11:31, 13.88s/it] 68%|██████▊ | 6839/10000 [26:50:08<12:10:55, 13.87s/it] {'loss': 0.0095, 'learning_rate': 1.5850000000000002e-05, 'epoch': 8.95} 68%|██████▊ | 6839/10000 [26:50:08<12:10:55, 13.87s/it] 68%|██████▊ | 6840/10000 [26:50:21<12:10:51, 13.88s/it] {'loss': 0.0097, 'learning_rate': 1.5845e-05, 'epoch': 8.95} 68%|██████▊ | 6840/10000 [26:50:21<12:10:51, 13.88s/it] 68%|██████▊ | 6841/10000 [26:50:35<12:10:52, 13.88s/it] {'loss': 0.0118, 'learning_rate': 1.584e-05, 'epoch': 8.95} 68%|██████▊ | 6841/10000 [26:50:35<12:10:52, 
13.88s/it] 68%|██████▊ | 6842/10000 [26:50:49<12:07:49, 13.83s/it] {'loss': 0.0094, 'learning_rate': 1.5835e-05, 'epoch': 8.96} 68%|██████▊ | 6842/10000 [26:50:49<12:07:49, 13.83s/it] 68%|██████▊ | 6843/10000 [26:51:03<12:04:48, 13.78s/it] {'loss': 0.0088, 'learning_rate': 1.583e-05, 'epoch': 8.96} 68%|██████▊ | 6843/10000 [26:51:03<12:04:48, 13.78s/it] 68%|██████▊ | 6844/10000 [26:51:16<12:04:21, 13.77s/it] {'loss': 0.0111, 'learning_rate': 1.5825000000000002e-05, 'epoch': 8.96} 68%|██████▊ | 6844/10000 [26:51:16<12:04:21, 13.77s/it] 68%|██████▊ | 6845/10000 [26:51:30<12:03:41, 13.76s/it] {'loss': 0.0074, 'learning_rate': 1.582e-05, 'epoch': 8.96} 68%|██████▊ | 6845/10000 [26:51:30<12:03:41, 13.76s/it] 68%|██████▊ | 6846/10000 [26:51:44<12:04:36, 13.78s/it] {'loss': 0.0093, 'learning_rate': 1.5815000000000004e-05, 'epoch': 8.96} 68%|██████▊ | 6846/10000 [26:51:44<12:04:36, 13.78s/it] 68%|██████▊ | 6847/10000 [26:51:58<12:05:00, 13.80s/it] {'loss': 0.009, 'learning_rate': 1.581e-05, 'epoch': 8.96} 68%|██████▊ | 6847/10000 [26:51:58<12:05:00, 13.80s/it] 68%|██████▊ | 6848/10000 [26:52:12<12:06:04, 13.82s/it] {'loss': 0.0081, 'learning_rate': 1.5805000000000002e-05, 'epoch': 8.96} 68%|██████▊ | 6848/10000 [26:52:12<12:06:04, 13.82s/it] 68%|██████▊ | 6849/10000 [26:52:25<12:04:43, 13.80s/it] {'loss': 0.0078, 'learning_rate': 1.58e-05, 'epoch': 8.96} 68%|██████▊ | 6849/10000 [26:52:26<12:04:43, 13.80s/it] 68%|██████▊ | 6850/10000 [26:52:39<12:06:18, 13.83s/it] {'loss': 0.0095, 'learning_rate': 1.5795e-05, 'epoch': 8.97} 68%|██████▊ | 6850/10000 [26:52:39<12:06:18, 13.83s/it] 69%|██████▊ | 6851/10000 [26:52:53<12:06:17, 13.84s/it] {'loss': 0.0122, 'learning_rate': 1.5790000000000003e-05, 'epoch': 8.97} 69%|██████▊ | 6851/10000 [26:52:53<12:06:17, 13.84s/it] 69%|██████▊ | 6852/10000 [26:53:07<12:06:53, 13.85s/it] {'loss': 0.0101, 'learning_rate': 1.5785e-05, 'epoch': 8.97} 69%|██████▊ | 6852/10000 [26:53:07<12:06:53, 13.85s/it] 69%|██████▊ | 6853/10000 
[26:53:21<12:06:43, 13.86s/it] {'loss': 0.014, 'learning_rate': 1.578e-05, 'epoch': 8.97} 69%|██████▊ | 6853/10000 [26:53:21<12:06:43, 13.86s/it] 69%|██████▊ | 6854/10000 [26:53:35<12:05:33, 13.84s/it] {'loss': 0.009, 'learning_rate': 1.5775e-05, 'epoch': 8.97} 69%|██████▊ | 6854/10000 [26:53:35<12:05:33, 13.84s/it] 69%|██████▊ | 6855/10000 [26:53:49<12:05:57, 13.85s/it] {'loss': 0.0081, 'learning_rate': 1.577e-05, 'epoch': 8.97} 69%|██████▊ | 6855/10000 [26:53:49<12:05:57, 13.85s/it] 69%|██████▊ | 6856/10000 [26:54:02<12:04:31, 13.83s/it] {'loss': 0.0078, 'learning_rate': 1.5765000000000002e-05, 'epoch': 8.97} 69%|██████▊ | 6856/10000 [26:54:02<12:04:31, 13.83s/it] 69%|██████▊ | 6857/10000 [26:54:16<12:02:47, 13.80s/it] {'loss': 0.0077, 'learning_rate': 1.5759999999999998e-05, 'epoch': 8.98} 69%|██████▊ | 6857/10000 [26:54:16<12:02:47, 13.80s/it] 69%|██████▊ | 6858/10000 [26:54:30<12:03:43, 13.82s/it] {'loss': 0.0086, 'learning_rate': 1.5755e-05, 'epoch': 8.98} 69%|██████▊ | 6858/10000 [26:54:30<12:03:43, 13.82s/it] 69%|██████▊ | 6859/10000 [26:54:44<12:05:38, 13.86s/it] {'loss': 0.0081, 'learning_rate': 1.575e-05, 'epoch': 8.98} 69%|██████▊ | 6859/10000 [26:54:44<12:05:38, 13.86s/it] 69%|██████▊ | 6860/10000 [26:54:58<12:05:33, 13.86s/it] {'loss': 0.0091, 'learning_rate': 1.5745000000000003e-05, 'epoch': 8.98} 69%|██████▊ | 6860/10000 [26:54:58<12:05:33, 13.86s/it] 69%|██████▊ | 6861/10000 [26:55:12<12:05:45, 13.87s/it] {'loss': 0.0089, 'learning_rate': 1.5740000000000002e-05, 'epoch': 8.98} 69%|██████▊ | 6861/10000 [26:55:12<12:05:45, 13.87s/it] 69%|██████▊ | 6862/10000 [26:55:26<12:04:11, 13.85s/it] {'loss': 0.0102, 'learning_rate': 1.5735e-05, 'epoch': 8.98} 69%|██████▊ | 6862/10000 [26:55:26<12:04:11, 13.85s/it] 69%|██████▊ | 6863/10000 [26:55:40<12:06:07, 13.89s/it] {'loss': 0.0105, 'learning_rate': 1.573e-05, 'epoch': 8.98} 69%|██████▊ | 6863/10000 [26:55:40<12:06:07, 13.89s/it] 69%|██████▊ | 6864/10000 [26:55:53<12:04:27, 13.86s/it] {'loss': 0.0092, 
'learning_rate': 1.5725e-05, 'epoch': 8.98} 69%|██████▊ | 6864/10000 [26:55:53<12:04:27, 13.86s/it] 69%|██████▊ | 6865/10000 [26:56:07<12:04:53, 13.87s/it] {'loss': 0.0437, 'learning_rate': 1.5720000000000002e-05, 'epoch': 8.99} 69%|██████▊ | 6865/10000 [26:56:07<12:04:53, 13.87s/it] 69%|██████▊ | 6866/10000 [26:56:21<12:03:34, 13.85s/it] {'loss': 0.0112, 'learning_rate': 1.5715e-05, 'epoch': 8.99} 69%|██████▊ | 6866/10000 [26:56:21<12:03:34, 13.85s/it] 69%|██████▊ | 6867/10000 [26:56:35<12:02:50, 13.84s/it] {'loss': 0.009, 'learning_rate': 1.571e-05, 'epoch': 8.99} 69%|██████▊ | 6867/10000 [26:56:35<12:02:50, 13.84s/it] 69%|██████▊ | 6868/10000 [26:56:49<12:02:18, 13.84s/it] {'loss': 0.0084, 'learning_rate': 1.5705e-05, 'epoch': 8.99} 69%|██████▊ | 6868/10000 [26:56:49<12:02:18, 13.84s/it] 69%|██████▊ | 6869/10000 [26:57:03<12:03:25, 13.86s/it] {'loss': 0.0084, 'learning_rate': 1.5700000000000002e-05, 'epoch': 8.99} 69%|██████▊ | 6869/10000 [26:57:03<12:03:25, 13.86s/it] 69%|██████▊ | 6870/10000 [26:57:17<12:04:23, 13.89s/it] {'loss': 0.0096, 'learning_rate': 1.5695e-05, 'epoch': 8.99} 69%|██████▊ | 6870/10000 [26:57:17<12:04:23, 13.89s/it] 69%|██████▊ | 6871/10000 [26:57:31<12:06:03, 13.92s/it] {'loss': 0.0075, 'learning_rate': 1.569e-05, 'epoch': 8.99} 69%|██████▊ | 6871/10000 [26:57:31<12:06:03, 13.92s/it] 69%|██████▊ | 6872/10000 [26:57:44<12:06:14, 13.93s/it] {'loss': 0.0103, 'learning_rate': 1.5685e-05, 'epoch': 8.99} 69%|██████▊ | 6872/10000 [26:57:45<12:06:14, 13.93s/it] 69%|██████▊ | 6873/10000 [26:57:58<12:03:57, 13.89s/it] {'loss': 0.0095, 'learning_rate': 1.568e-05, 'epoch': 9.0} 69%|██████▊ | 6873/10000 [26:57:58<12:03:57, 13.89s/it] 69%|██████▊ | 6874/10000 [26:58:12<12:03:14, 13.88s/it] {'loss': 0.0132, 'learning_rate': 1.5675e-05, 'epoch': 9.0} 69%|██████▊ | 6874/10000 [26:58:12<12:03:14, 13.88s/it] 69%|██████▉ | 6875/10000 [26:58:26<12:02:25, 13.87s/it] {'loss': 0.0093, 'learning_rate': 1.567e-05, 'epoch': 9.0} 69%|██████▉ | 6875/10000 
[26:58:26<12:02:25, 13.87s/it] 69%|██████▉ | 6876/10000 [26:58:39<11:41:46, 13.48s/it] {'loss': 0.009, 'learning_rate': 1.5665000000000003e-05, 'epoch': 9.0} 69%|██████▉ | 6876/10000 [26:58:39<11:41:46, 13.48s/it] 69%|██████▉ | 6877/10000 [26:58:52<11:47:32, 13.59s/it] {'loss': 0.0093, 'learning_rate': 1.566e-05, 'epoch': 9.0} 69%|██████▉ | 6877/10000 [26:58:52<11:47:32, 13.59s/it] 69%|██████▉ | 6878/10000 [26:59:06<11:51:52, 13.68s/it] {'loss': 0.0061, 'learning_rate': 1.5655000000000002e-05, 'epoch': 9.0} 69%|██████▉ | 6878/10000 [26:59:06<11:51:52, 13.68s/it] 69%|██████▉ | 6879/10000 [26:59:20<11:54:20, 13.73s/it] {'loss': 0.0057, 'learning_rate': 1.565e-05, 'epoch': 9.0} 69%|██████▉ | 6879/10000 [26:59:20<11:54:20, 13.73s/it] 69%|██████▉ | 6880/10000 [26:59:34<11:55:57, 13.77s/it] {'loss': 0.0054, 'learning_rate': 1.5645e-05, 'epoch': 9.01} 69%|██████▉ | 6880/10000 [26:59:34<11:55:57, 13.77s/it] 69%|██████▉ | 6881/10000 [26:59:48<11:57:13, 13.80s/it] {'loss': 0.0058, 'learning_rate': 1.5640000000000003e-05, 'epoch': 9.01} 69%|██████▉ | 6881/10000 [26:59:48<11:57:13, 13.80s/it] 69%|██████▉ | 6882/10000 [27:00:02<12:00:00, 13.86s/it] {'loss': 0.0064, 'learning_rate': 1.5635e-05, 'epoch': 9.01} 69%|██████▉ | 6882/10000 [27:00:02<12:00:00, 13.86s/it] 69%|██████▉ | 6883/10000 [27:00:16<11:58:55, 13.84s/it] {'loss': 0.0055, 'learning_rate': 1.563e-05, 'epoch': 9.01} 69%|██████▉ | 6883/10000 [27:00:16<11:58:55, 13.84s/it] 69%|██████▉ | 6884/10000 [27:00:30<12:01:23, 13.89s/it] {'loss': 0.0054, 'learning_rate': 1.5625e-05, 'epoch': 9.01} 69%|██████▉ | 6884/10000 [27:00:30<12:01:23, 13.89s/it] 69%|██████▉ | 6885/10000 [27:00:43<11:59:59, 13.87s/it] {'loss': 0.0048, 'learning_rate': 1.5620000000000003e-05, 'epoch': 9.01} 69%|██████▉ | 6885/10000 [27:00:44<11:59:59, 13.87s/it] 69%|██████▉ | 6886/10000 [27:00:57<12:01:14, 13.90s/it] {'loss': 0.0057, 'learning_rate': 1.5615000000000002e-05, 'epoch': 9.01} 69%|██████▉ | 6886/10000 [27:00:58<12:01:14, 13.90s/it] 69%|██████▉ | 
6887/10000 [27:01:11<12:01:38, 13.91s/it] {'loss': 0.0052, 'learning_rate': 1.561e-05, 'epoch': 9.01} 69%|██████▉ | 6887/10000 [27:01:11<12:01:38, 13.91s/it] 69%|██████▉ | 6888/10000 [27:01:25<12:01:10, 13.90s/it] {'loss': 0.0056, 'learning_rate': 1.5605e-05, 'epoch': 9.02} 69%|██████▉ | 6888/10000 [27:01:25<12:01:10, 13.90s/it] 69%|██████▉ | 6889/10000 [27:01:39<12:00:17, 13.89s/it] {'loss': 0.0055, 'learning_rate': 1.56e-05, 'epoch': 9.02} 69%|██████▉ | 6889/10000 [27:01:39<12:00:17, 13.89s/it] 69%|██████▉ | 6890/10000 [27:01:53<11:59:37, 13.88s/it] {'loss': 0.0068, 'learning_rate': 1.5595000000000002e-05, 'epoch': 9.02} 69%|██████▉ | 6890/10000 [27:01:53<11:59:37, 13.88s/it] 69%|██████▉ | 6891/10000 [27:02:07<11:59:24, 13.88s/it] {'loss': 0.0058, 'learning_rate': 1.559e-05, 'epoch': 9.02} 69%|██████▉ | 6891/10000 [27:02:07<11:59:24, 13.88s/it] 69%|██████▉ | 6892/10000 [27:02:21<11:57:32, 13.85s/it] {'loss': 0.0073, 'learning_rate': 1.5585e-05, 'epoch': 9.02} 69%|██████▉ | 6892/10000 [27:02:21<11:57:32, 13.85s/it] 69%|██████▉ | 6893/10000 [27:02:35<11:57:42, 13.86s/it] {'loss': 0.0072, 'learning_rate': 1.558e-05, 'epoch': 9.02} 69%|██████▉ | 6893/10000 [27:02:35<11:57:42, 13.86s/it] 69%|██████▉ | 6894/10000 [27:02:49<11:58:49, 13.89s/it] {'loss': 0.0074, 'learning_rate': 1.5575e-05, 'epoch': 9.02} 69%|██████▉ | 6894/10000 [27:02:49<11:58:49, 13.89s/it] 69%|██████▉ | 6895/10000 [27:03:02<11:58:00, 13.87s/it] {'loss': 0.0055, 'learning_rate': 1.5570000000000002e-05, 'epoch': 9.02} 69%|██████▉ | 6895/10000 [27:03:02<11:58:00, 13.87s/it] 69%|██████▉ | 6896/10000 [27:03:16<11:59:12, 13.90s/it] {'loss': 0.0049, 'learning_rate': 1.5565e-05, 'epoch': 9.03} 69%|██████▉ | 6896/10000 [27:03:16<11:59:12, 13.90s/it] 69%|██████▉ | 6897/10000 [27:03:30<11:57:39, 13.88s/it] {'loss': 0.0052, 'learning_rate': 1.556e-05, 'epoch': 9.03} 69%|██████▉ | 6897/10000 [27:03:30<11:57:39, 13.88s/it] 69%|██████▉ | 6898/10000 [27:03:44<11:56:28, 13.86s/it] {'loss': 0.0053, 'learning_rate': 
1.5555e-05, 'epoch': 9.03} 69%|██████▉ | 6898/10000 [27:03:44<11:56:28, 13.86s/it] 69%|██████▉ | 6899/10000 [27:03:58<11:53:47, 13.81s/it] {'loss': 0.0075, 'learning_rate': 1.5550000000000002e-05, 'epoch': 9.03} 69%|██████▉ | 6899/10000 [27:03:58<11:53:47, 13.81s/it] 69%|██████▉ | 6900/10000 [27:04:11<11:52:39, 13.79s/it] {'loss': 0.0059, 'learning_rate': 1.5545e-05, 'epoch': 9.03} 69%|██████▉ | 6900/10000 [27:04:11<11:52:39, 13.79s/it] 69%|██████▉ | 6901/10000 [27:04:25<11:55:52, 13.86s/it] {'loss': 0.0049, 'learning_rate': 1.554e-05, 'epoch': 9.03} 69%|██████▉ | 6901/10000 [27:04:25<11:55:52, 13.86s/it] 69%|██████▉ | 6902/10000 [27:04:39<11:54:49, 13.84s/it] {'loss': 0.0064, 'learning_rate': 1.5535e-05, 'epoch': 9.03} 69%|██████▉ | 6902/10000 [27:04:39<11:54:49, 13.84s/it] 69%|██████▉ | 6903/10000 [27:04:53<11:57:15, 13.90s/it] {'loss': 0.005, 'learning_rate': 1.553e-05, 'epoch': 9.04} 69%|██████▉ | 6903/10000 [27:04:53<11:57:15, 13.90s/it] 69%|██████▉ | 6904/10000 [27:05:07<11:54:07, 13.84s/it] {'loss': 0.0047, 'learning_rate': 1.5525e-05, 'epoch': 9.04} 69%|██████▉ | 6904/10000 [27:05:07<11:54:07, 13.84s/it] 69%|██████▉ | 6905/10000 [27:05:21<11:54:38, 13.85s/it] {'loss': 0.0048, 'learning_rate': 1.552e-05, 'epoch': 9.04} 69%|██████▉ | 6905/10000 [27:05:21<11:54:38, 13.85s/it] 69%|██████▉ | 6906/10000 [27:05:35<11:54:15, 13.85s/it] {'loss': 0.0054, 'learning_rate': 1.5515000000000003e-05, 'epoch': 9.04} 69%|██████▉ | 6906/10000 [27:05:35<11:54:15, 13.85s/it] 69%|██████▉ | 6907/10000 [27:05:49<11:53:36, 13.84s/it] {'loss': 0.0053, 'learning_rate': 1.551e-05, 'epoch': 9.04} 69%|██████▉ | 6907/10000 [27:05:49<11:53:36, 13.84s/it] 69%|██████▉ | 6908/10000 [27:06:02<11:51:45, 13.81s/it] {'loss': 0.0053, 'learning_rate': 1.5505e-05, 'epoch': 9.04} 69%|██████▉ | 6908/10000 [27:06:02<11:51:45, 13.81s/it] 69%|██████▉ | 6909/10000 [27:06:16<11:52:28, 13.83s/it] {'loss': 0.0047, 'learning_rate': 1.55e-05, 'epoch': 9.04} 69%|██████▉ | 6909/10000 [27:06:16<11:52:28, 
13.83s/it] 69%|██████▉ | 6910/10000 [27:06:30<11:53:10, 13.85s/it] {'loss': 0.005, 'learning_rate': 1.5495e-05, 'epoch': 9.04} 69%|██████▉ | 6910/10000 [27:06:30<11:53:10, 13.85s/it] 69%|██████▉ | 6911/10000 [27:06:44<11:55:43, 13.90s/it] {'loss': 0.0056, 'learning_rate': 1.5490000000000002e-05, 'epoch': 9.05} 69%|██████▉ | 6911/10000 [27:06:44<11:55:43, 13.90s/it] 69%|██████▉ | 6912/10000 [27:06:58<11:55:11, 13.90s/it] {'loss': 0.0075, 'learning_rate': 1.5484999999999998e-05, 'epoch': 9.05} 69%|██████▉ | 6912/10000 [27:06:58<11:55:11, 13.90s/it] 69%|██████▉ | 6913/10000 [27:07:12<11:53:06, 13.86s/it] {'loss': 0.0068, 'learning_rate': 1.548e-05, 'epoch': 9.05} 69%|██████▉ | 6913/10000 [27:07:12<11:53:06, 13.86s/it] 69%|██████▉ | 6914/10000 [27:07:25<11:51:07, 13.83s/it] {'loss': 0.0046, 'learning_rate': 1.5475e-05, 'epoch': 9.05} 69%|██████▉ | 6914/10000 [27:07:25<11:51:07, 13.83s/it] 69%|██████▉ | 6915/10000 [27:07:39<11:51:41, 13.84s/it] {'loss': 0.0069, 'learning_rate': 1.5470000000000003e-05, 'epoch': 9.05} 69%|██████▉ | 6915/10000 [27:07:39<11:51:41, 13.84s/it] 69%|██████▉ | 6916/10000 [27:07:53<11:51:04, 13.83s/it] {'loss': 0.0071, 'learning_rate': 1.5465000000000002e-05, 'epoch': 9.05} 69%|██████▉ | 6916/10000 [27:07:53<11:51:04, 13.83s/it] 69%|██████▉ | 6917/10000 [27:08:07<11:51:24, 13.85s/it] {'loss': 0.0054, 'learning_rate': 1.546e-05, 'epoch': 9.05} 69%|██████▉ | 6917/10000 [27:08:07<11:51:24, 13.85s/it] 69%|██████▉ | 6918/10000 [27:08:21<11:51:13, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.5455e-05, 'epoch': 9.05} 69%|██████▉ | 6918/10000 [27:08:21<11:51:13, 13.85s/it] 69%|██████▉ | 6919/10000 [27:08:35<11:51:42, 13.86s/it] {'loss': 0.0069, 'learning_rate': 1.545e-05, 'epoch': 9.06} 69%|██████▉ | 6919/10000 [27:08:35<11:51:42, 13.86s/it] 69%|██████▉ | 6920/10000 [27:08:49<11:51:24, 13.86s/it] {'loss': 0.0069, 'learning_rate': 1.5445000000000002e-05, 'epoch': 9.06} 69%|██████▉ | 6920/10000 [27:08:49<11:51:24, 13.86s/it] 69%|██████▉ | 6921/10000 
[27:09:02<11:50:05, 13.84s/it] {'loss': 0.0047, 'learning_rate': 1.544e-05, 'epoch': 9.06} 69%|██████▉ | 6921/10000 [27:09:02<11:50:05, 13.84s/it] 69%|██████▉ | 6922/10000 [27:09:16<11:50:11, 13.84s/it] {'loss': 0.0055, 'learning_rate': 1.5435e-05, 'epoch': 9.06} 69%|██████▉ | 6922/10000 [27:09:16<11:50:11, 13.84s/it] 69%|██████▉ | 6923/10000 [27:09:30<11:48:33, 13.82s/it] {'loss': 0.0051, 'learning_rate': 1.543e-05, 'epoch': 9.06} 69%|██████▉ | 6923/10000 [27:09:30<11:48:33, 13.82s/it] 69%|██████▉ | 6924/10000 [27:09:44<11:49:02, 13.83s/it] {'loss': 0.0067, 'learning_rate': 1.5425000000000002e-05, 'epoch': 9.06} 69%|██████▉ | 6924/10000 [27:09:44<11:49:02, 13.83s/it] 69%|██████▉ | 6925/10000 [27:09:58<11:49:54, 13.85s/it] {'loss': 0.0058, 'learning_rate': 1.542e-05, 'epoch': 9.06} 69%|██████▉ | 6925/10000 [27:09:58<11:49:54, 13.85s/it] 69%|██████▉ | 6926/10000 [27:10:12<11:48:30, 13.83s/it] {'loss': 0.0047, 'learning_rate': 1.5415e-05, 'epoch': 9.07} 69%|██████▉ | 6926/10000 [27:10:12<11:48:30, 13.83s/it] 69%|██████▉ | 6927/10000 [27:10:25<11:47:49, 13.82s/it] {'loss': 0.0058, 'learning_rate': 1.541e-05, 'epoch': 9.07} 69%|██████▉ | 6927/10000 [27:10:25<11:47:49, 13.82s/it] 69%|██████▉ | 6928/10000 [27:10:39<11:47:09, 13.81s/it] {'loss': 0.0079, 'learning_rate': 1.5405e-05, 'epoch': 9.07} 69%|██████▉ | 6928/10000 [27:10:39<11:47:09, 13.81s/it] 69%|██████▉ | 6929/10000 [27:10:53<11:46:56, 13.81s/it] {'loss': 0.0043, 'learning_rate': 1.54e-05, 'epoch': 9.07} 69%|██████▉ | 6929/10000 [27:10:53<11:46:56, 13.81s/it] 69%|██████▉ | 6930/10000 [27:11:07<11:46:35, 13.81s/it] {'loss': 0.0045, 'learning_rate': 1.5395e-05, 'epoch': 9.07} 69%|██████▉ | 6930/10000 [27:11:07<11:46:35, 13.81s/it] 69%|██████▉ | 6931/10000 [27:11:21<11:46:45, 13.82s/it] {'loss': 0.0078, 'learning_rate': 1.539e-05, 'epoch': 9.07} 69%|██████▉ | 6931/10000 [27:11:21<11:46:45, 13.82s/it] 69%|██████▉ | 6932/10000 [27:11:35<11:48:15, 13.85s/it] {'loss': 0.0047, 'learning_rate': 1.5385e-05, 'epoch': 9.07} 
69%|██████▉ | 6932/10000 [27:11:35<11:48:15, 13.85s/it] 69%|██████▉ | 6933/10000 [27:11:48<11:46:21, 13.82s/it] {'loss': 0.0066, 'learning_rate': 1.538e-05, 'epoch': 9.07} 69%|██████▉ | 6933/10000 [27:11:48<11:46:21, 13.82s/it] 69%|██████▉ | 6934/10000 [27:12:02<11:44:29, 13.79s/it] {'loss': 0.0049, 'learning_rate': 1.5375e-05, 'epoch': 9.08} 69%|██████▉ | 6934/10000 [27:12:02<11:44:29, 13.79s/it] 69%|██████▉ | 6935/10000 [27:12:16<11:45:19, 13.81s/it] {'loss': 0.0048, 'learning_rate': 1.537e-05, 'epoch': 9.08} 69%|██████▉ | 6935/10000 [27:12:16<11:45:19, 13.81s/it] 69%|██████▉ | 6936/10000 [27:12:30<11:44:07, 13.79s/it] {'loss': 0.0074, 'learning_rate': 1.5365000000000003e-05, 'epoch': 9.08} 69%|██████▉ | 6936/10000 [27:12:30<11:44:07, 13.79s/it] 69%|██████▉ | 6937/10000 [27:12:43<11:44:00, 13.79s/it] {'loss': 0.0061, 'learning_rate': 1.536e-05, 'epoch': 9.08} 69%|██████▉ | 6937/10000 [27:12:43<11:44:00, 13.79s/it] 69%|██████▉ | 6938/10000 [27:12:57<11:45:57, 13.83s/it] {'loss': 0.0046, 'learning_rate': 1.5355e-05, 'epoch': 9.08} 69%|██████▉ | 6938/10000 [27:12:57<11:45:57, 13.83s/it] 69%|██████▉ | 6939/10000 [27:13:11<11:42:38, 13.77s/it] {'loss': 0.0041, 'learning_rate': 1.535e-05, 'epoch': 9.08} 69%|██████▉ | 6939/10000 [27:13:11<11:42:38, 13.77s/it] 69%|██████▉ | 6940/10000 [27:13:25<11:42:49, 13.78s/it] {'loss': 0.0059, 'learning_rate': 1.5345e-05, 'epoch': 9.08} 69%|██████▉ | 6940/10000 [27:13:25<11:42:49, 13.78s/it] 69%|██████▉ | 6941/10000 [27:13:39<11:43:26, 13.80s/it] {'loss': 0.0052, 'learning_rate': 1.5340000000000002e-05, 'epoch': 9.09} 69%|██████▉ | 6941/10000 [27:13:39<11:43:26, 13.80s/it] 69%|██████▉ | 6942/10000 [27:13:52<11:43:58, 13.81s/it] {'loss': 0.006, 'learning_rate': 1.5334999999999998e-05, 'epoch': 9.09} 69%|██████▉ | 6942/10000 [27:13:52<11:43:58, 13.81s/it] 69%|██████▉ | 6943/10000 [27:14:06<11:44:10, 13.82s/it] {'loss': 0.0074, 'learning_rate': 1.533e-05, 'epoch': 9.09} 69%|██████▉ | 6943/10000 [27:14:06<11:44:10, 13.82s/it] 
69%|██████▉ | 6944/10000 [27:14:20<11:43:21, 13.81s/it] {'loss': 0.0059, 'learning_rate': 1.5325e-05, 'epoch': 9.09} 69%|██████▉ | 6944/10000 [27:14:20<11:43:21, 13.81s/it] 69%|██████▉ | 6945/10000 [27:14:34<11:43:08, 13.81s/it] {'loss': 0.0048, 'learning_rate': 1.5320000000000002e-05, 'epoch': 9.09} 69%|██████▉ | 6945/10000 [27:14:34<11:43:08, 13.81s/it] 69%|██████▉ | 6946/10000 [27:14:48<11:42:46, 13.81s/it] {'loss': 0.0057, 'learning_rate': 1.5315e-05, 'epoch': 9.09} 69%|██████▉ | 6946/10000 [27:14:48<11:42:46, 13.81s/it] 69%|██████▉ | 6947/10000 [27:15:01<11:42:06, 13.80s/it] {'loss': 0.0055, 'learning_rate': 1.531e-05, 'epoch': 9.09} 69%|██████▉ | 6947/10000 [27:15:01<11:42:06, 13.80s/it] 69%|██████▉ | 6948/10000 [27:15:15<11:42:01, 13.80s/it] {'loss': 0.0038, 'learning_rate': 1.5305e-05, 'epoch': 9.09} 69%|██████▉ | 6948/10000 [27:15:15<11:42:01, 13.80s/it] 69%|██████▉ | 6949/10000 [27:15:29<11:44:34, 13.86s/it] {'loss': 0.0056, 'learning_rate': 1.53e-05, 'epoch': 9.1} 69%|██████▉ | 6949/10000 [27:15:29<11:44:34, 13.86s/it] 70%|██████▉ | 6950/10000 [27:15:43<11:44:14, 13.85s/it] {'loss': 0.0036, 'learning_rate': 1.5295000000000002e-05, 'epoch': 9.1} 70%|██████▉ | 6950/10000 [27:15:43<11:44:14, 13.85s/it] 70%|██████▉ | 6951/10000 [27:15:57<11:43:58, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.529e-05, 'epoch': 9.1} 70%|██████▉ | 6951/10000 [27:15:57<11:43:58, 13.85s/it] 70%|██████▉ | 6952/10000 [27:16:11<11:44:19, 13.86s/it] {'loss': 0.0052, 'learning_rate': 1.5285000000000004e-05, 'epoch': 9.1} 70%|██████▉ | 6952/10000 [27:16:11<11:44:19, 13.86s/it] 70%|██████▉ | 6953/10000 [27:16:25<11:44:52, 13.88s/it] {'loss': 0.0049, 'learning_rate': 1.528e-05, 'epoch': 9.1} 70%|██████▉ | 6953/10000 [27:16:25<11:44:52, 13.88s/it] 70%|██████▉ | 6954/10000 [27:16:38<11:42:46, 13.84s/it] {'loss': 0.0044, 'learning_rate': 1.5275000000000002e-05, 'epoch': 9.1} 70%|██████▉ | 6954/10000 [27:16:39<11:42:46, 13.84s/it] 70%|██████▉ | 6955/10000 [27:16:53<11:45:16, 13.90s/it] 
{'loss': 0.0059, 'learning_rate': 1.527e-05, 'epoch': 9.1} 70%|██████▉ | 6955/10000 [27:16:53<11:45:16, 13.90s/it] 70%|██████▉ | 6956/10000 [27:17:06<11:44:14, 13.88s/it] {'loss': 0.0038, 'learning_rate': 1.5265e-05, 'epoch': 9.1} 70%|██████▉ | 6956/10000 [27:17:06<11:44:14, 13.88s/it] 70%|██████▉ | 6957/10000 [27:17:20<11:45:00, 13.90s/it] {'loss': 0.006, 'learning_rate': 1.5260000000000003e-05, 'epoch': 9.11} 70%|██████▉ | 6957/10000 [27:17:20<11:45:00, 13.90s/it] 70%|██████▉ | 6958/10000 [27:17:34<11:43:33, 13.88s/it] {'loss': 0.0049, 'learning_rate': 1.5255e-05, 'epoch': 9.11} 70%|██████▉ | 6958/10000 [27:17:34<11:43:33, 13.88s/it] 70%|██████▉ | 6959/10000 [27:17:48<11:42:01, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.525e-05, 'epoch': 9.11} 70%|██████▉ | 6959/10000 [27:17:48<11:42:01, 13.85s/it] 70%|██████▉ | 6960/10000 [27:18:02<11:39:57, 13.81s/it] {'loss': 0.0068, 'learning_rate': 1.5245e-05, 'epoch': 9.11} 70%|██████▉ | 6960/10000 [27:18:02<11:39:57, 13.81s/it] 70%|██████▉ | 6961/10000 [27:18:16<11:41:56, 13.86s/it] {'loss': 0.0064, 'learning_rate': 1.5240000000000001e-05, 'epoch': 9.11} 70%|██████▉ | 6961/10000 [27:18:16<11:41:56, 13.86s/it] 70%|██████▉ | 6962/10000 [27:18:29<11:41:08, 13.85s/it] {'loss': 0.0082, 'learning_rate': 1.5235000000000002e-05, 'epoch': 9.11} 70%|██████▉ | 6962/10000 [27:18:29<11:41:08, 13.85s/it] 70%|██████▉ | 6963/10000 [27:18:43<11:41:38, 13.86s/it] {'loss': 0.0036, 'learning_rate': 1.523e-05, 'epoch': 9.11} 70%|██████▉ | 6963/10000 [27:18:43<11:41:38, 13.86s/it] 70%|██████▉ | 6964/10000 [27:18:57<11:40:35, 13.85s/it] {'loss': 0.0079, 'learning_rate': 1.5225e-05, 'epoch': 9.12} 70%|██████▉ | 6964/10000 [27:18:57<11:40:35, 13.85s/it] 70%|██████▉ | 6965/10000 [27:19:11<11:39:18, 13.82s/it] {'loss': 0.0041, 'learning_rate': 1.5220000000000002e-05, 'epoch': 9.12} 70%|██████▉ | 6965/10000 [27:19:11<11:39:18, 13.82s/it] 70%|██████▉ | 6966/10000 [27:19:25<11:38:41, 13.82s/it] {'loss': 0.0051, 'learning_rate': 
1.5215000000000001e-05, 'epoch': 9.12} 70%|██████▉ | 6966/10000 [27:19:25<11:38:41, 13.82s/it] 70%|██████▉ | 6967/10000 [27:19:38<11:37:25, 13.80s/it] {'loss': 0.0057, 'learning_rate': 1.5210000000000002e-05, 'epoch': 9.12} 70%|██████▉ | 6967/10000 [27:19:38<11:37:25, 13.80s/it] 70%|██████▉ | 6968/10000 [27:19:52<11:36:30, 13.78s/it] {'loss': 0.0066, 'learning_rate': 1.5205e-05, 'epoch': 9.12} 70%|██████▉ | 6968/10000 [27:19:52<11:36:30, 13.78s/it] 70%|██████▉ | 6969/10000 [27:20:06<11:37:47, 13.81s/it] {'loss': 0.0051, 'learning_rate': 1.52e-05, 'epoch': 9.12} 70%|██████▉ | 6969/10000 [27:20:06<11:37:47, 13.81s/it] 70%|██████▉ | 6970/10000 [27:20:20<11:37:11, 13.81s/it] {'loss': 0.0054, 'learning_rate': 1.5195000000000001e-05, 'epoch': 9.12} 70%|██████▉ | 6970/10000 [27:20:20<11:37:11, 13.81s/it] 70%|██████▉ | 6971/10000 [27:20:34<11:37:14, 13.81s/it] {'loss': 0.0039, 'learning_rate': 1.5190000000000002e-05, 'epoch': 9.12} 70%|██████▉ | 6971/10000 [27:20:34<11:37:14, 13.81s/it] 70%|██████▉ | 6972/10000 [27:20:48<11:37:36, 13.82s/it] {'loss': 0.0072, 'learning_rate': 1.5185000000000003e-05, 'epoch': 9.13} 70%|██████▉ | 6972/10000 [27:20:48<11:37:36, 13.82s/it] 70%|██████▉ | 6973/10000 [27:21:01<11:37:02, 13.82s/it] {'loss': 0.0052, 'learning_rate': 1.518e-05, 'epoch': 9.13} 70%|██████▉ | 6973/10000 [27:21:01<11:37:02, 13.82s/it] 70%|██████▉ | 6974/10000 [27:21:15<11:36:30, 13.81s/it] {'loss': 0.0044, 'learning_rate': 1.5175e-05, 'epoch': 9.13} 70%|██████▉ | 6974/10000 [27:21:15<11:36:30, 13.81s/it] 70%|██████▉ | 6975/10000 [27:21:29<11:37:23, 13.83s/it] {'loss': 0.0045, 'learning_rate': 1.517e-05, 'epoch': 9.13} 70%|██████▉ | 6975/10000 [27:21:29<11:37:23, 13.83s/it] 70%|██████▉ | 6976/10000 [27:21:43<11:37:53, 13.85s/it] {'loss': 0.0073, 'learning_rate': 1.5165000000000001e-05, 'epoch': 9.13} 70%|██████▉ | 6976/10000 [27:21:43<11:37:53, 13.85s/it] 70%|██████▉ | 6977/10000 [27:21:57<11:37:28, 13.84s/it] {'loss': 0.0055, 'learning_rate': 1.5160000000000002e-05, 
'epoch': 9.13} 70%|██████▉ | 6977/10000 [27:21:57<11:37:28, 13.84s/it] 70%|██████▉ | 6978/10000 [27:22:11<11:37:48, 13.85s/it] {'loss': 0.0064, 'learning_rate': 1.5155e-05, 'epoch': 9.13} 70%|██████▉ | 6978/10000 [27:22:11<11:37:48, 13.85s/it] 70%|██████▉ | 6979/10000 [27:22:24<11:36:44, 13.84s/it] {'loss': 0.006, 'learning_rate': 1.515e-05, 'epoch': 9.13} 70%|██████▉ | 6979/10000 [27:22:24<11:36:44, 13.84s/it] 70%|██████▉ | 6980/10000 [27:22:38<11:37:34, 13.86s/it] {'loss': 0.0038, 'learning_rate': 1.5145000000000002e-05, 'epoch': 9.14} 70%|██████▉ | 6980/10000 [27:22:38<11:37:34, 13.86s/it] 70%|██████▉ | 6981/10000 [27:22:52<11:35:57, 13.83s/it] {'loss': 0.0052, 'learning_rate': 1.514e-05, 'epoch': 9.14} 70%|██████▉ | 6981/10000 [27:22:52<11:35:57, 13.83s/it] 70%|██████▉ | 6982/10000 [27:23:06<11:35:58, 13.84s/it] {'loss': 0.0059, 'learning_rate': 1.5135000000000002e-05, 'epoch': 9.14} 70%|██████▉ | 6982/10000 [27:23:06<11:35:58, 13.84s/it] 70%|██████▉ | 6983/10000 [27:23:20<11:34:35, 13.81s/it] {'loss': 0.0057, 'learning_rate': 1.5129999999999999e-05, 'epoch': 9.14} 70%|██████▉ | 6983/10000 [27:23:20<11:34:35, 13.81s/it] 70%|██████▉ | 6984/10000 [27:23:33<11:33:49, 13.80s/it] {'loss': 0.0072, 'learning_rate': 1.5125e-05, 'epoch': 9.14} 70%|██████▉ | 6984/10000 [27:23:34<11:33:49, 13.80s/it] 70%|██████▉ | 6985/10000 [27:23:47<11:33:13, 13.80s/it] {'loss': 0.0073, 'learning_rate': 1.5120000000000001e-05, 'epoch': 9.14} 70%|██████▉ | 6985/10000 [27:23:47<11:33:13, 13.80s/it] 70%|██████▉ | 6986/10000 [27:24:01<11:32:41, 13.79s/it] {'loss': 0.0044, 'learning_rate': 1.5115000000000002e-05, 'epoch': 9.14} 70%|██████▉ | 6986/10000 [27:24:01<11:32:41, 13.79s/it] 70%|██████▉ | 6987/10000 [27:24:15<11:32:05, 13.78s/it] {'loss': 0.004, 'learning_rate': 1.5110000000000003e-05, 'epoch': 9.15} 70%|██████▉ | 6987/10000 [27:24:15<11:32:05, 13.78s/it] 70%|██████▉ | 6988/10000 [27:24:29<11:33:34, 13.82s/it] {'loss': 0.0061, 'learning_rate': 1.5105e-05, 'epoch': 9.15} 70%|██████▉ | 
6988/10000 [27:24:29<11:33:34, 13.82s/it] 70%|██████▉ | 6989/10000 [27:24:42<11:32:02, 13.79s/it] {'loss': 0.004, 'learning_rate': 1.51e-05, 'epoch': 9.15} 70%|██████▉ | 6989/10000 [27:24:42<11:32:02, 13.79s/it] 70%|██████▉ | 6990/10000 [27:24:56<11:32:13, 13.80s/it] {'loss': 0.0051, 'learning_rate': 1.5095e-05, 'epoch': 9.15} 70%|██████▉ | 6990/10000 [27:24:56<11:32:13, 13.80s/it] 70%|██████▉ | 6991/10000 [27:25:10<11:32:12, 13.80s/it] {'loss': 0.0047, 'learning_rate': 1.5090000000000001e-05, 'epoch': 9.15} 70%|██████▉ | 6991/10000 [27:25:10<11:32:12, 13.80s/it] 70%|██████▉ | 6992/10000 [27:25:24<11:32:21, 13.81s/it] {'loss': 0.0044, 'learning_rate': 1.5085000000000002e-05, 'epoch': 9.15} 70%|██████▉ | 6992/10000 [27:25:24<11:32:21, 13.81s/it] 70%|██████▉ | 6993/10000 [27:25:38<11:33:36, 13.84s/it] {'loss': 0.007, 'learning_rate': 1.508e-05, 'epoch': 9.15} 70%|██████▉ | 6993/10000 [27:25:38<11:33:36, 13.84s/it] 70%|██████▉ | 6994/10000 [27:25:52<11:32:42, 13.83s/it] {'loss': 0.0055, 'learning_rate': 1.5075e-05, 'epoch': 9.15} 70%|██████▉ | 6994/10000 [27:25:52<11:32:42, 13.83s/it] 70%|██████▉ | 6995/10000 [27:26:05<11:32:37, 13.83s/it] {'loss': 0.0058, 'learning_rate': 1.5070000000000001e-05, 'epoch': 9.16} 70%|██████▉ | 6995/10000 [27:26:05<11:32:37, 13.83s/it] 70%|██████▉ | 6996/10000 [27:26:19<11:31:57, 13.82s/it] {'loss': 0.0078, 'learning_rate': 1.5065e-05, 'epoch': 9.16} 70%|██████▉ | 6996/10000 [27:26:19<11:31:57, 13.82s/it] 70%|██████▉ | 6997/10000 [27:26:33<11:31:36, 13.82s/it] {'loss': 0.0037, 'learning_rate': 1.5060000000000001e-05, 'epoch': 9.16} 70%|██████▉ | 6997/10000 [27:26:33<11:31:36, 13.82s/it] 70%|██████▉ | 6998/10000 [27:26:47<11:30:50, 13.81s/it] {'loss': 0.0056, 'learning_rate': 1.5054999999999999e-05, 'epoch': 9.16} 70%|██████▉ | 6998/10000 [27:26:47<11:30:50, 13.81s/it] 70%|██████▉ | 6999/10000 [27:27:01<11:31:12, 13.82s/it] {'loss': 0.008, 'learning_rate': 1.505e-05, 'epoch': 9.16} 70%|██████▉ | 6999/10000 [27:27:01<11:31:12, 13.82s/it] 
70%|███████ | 7000/10000 [27:27:14<11:28:53, 13.78s/it] {'loss': 0.0069, 'learning_rate': 1.5045e-05, 'epoch': 9.16} 70%|███████ | 7000/10000 [27:27:14<11:28:53, 13.78s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-04 23:45:22,651 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-7000/config.json [INFO|configuration_utils.py:364] 2024-11-04 23:45:22,653 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-7000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-04 23:46:13,078 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-7000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-04 23:46:13,080 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-7000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-04 23:46:13,082 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-7000/special_tokens_map.json [2024-11-04 23:46:13,093] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step7000 is about to be saved! [2024-11-04 23:46:13,151] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt [2024-11-04 23:46:13,151] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt... [2024-11-04 23:47:04,549] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/mp_rank_00_model_states.pt. [2024-11-04 23:47:04,720] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt... 
[2024-11-04 23:48:58,069] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-11-04 23:48:58,207] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-7000/global_step7000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-04 23:48:58,207] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step7000 is ready now! 70%|███████ | 7001/10000 [27:31:04<65:23:56, 78.50s/it] {'loss': 0.0064, 'learning_rate': 1.5040000000000002e-05, 'epoch': 9.16} 70%|███████ | 7001/10000 [27:31:04<65:23:56, 78.50s/it] 70%|███████ | 7002/10000 [27:31:17<49:09:39, 59.03s/it] {'loss': 0.0056, 'learning_rate': 1.5035000000000003e-05, 'epoch': 9.16} 70%|███████ | 7002/10000 [27:31:18<49:09:39, 59.03s/it] 70%|███████ | 7003/10000 [27:31:31<37:50:12, 45.45s/it] {'loss': 0.0063, 'learning_rate': 1.503e-05, 'epoch': 9.17} 70%|███████ | 7003/10000 [27:31:31<37:50:12, 45.45s/it] 70%|███████ | 7004/10000 [27:31:45<29:54:22, 35.94s/it] {'loss': 0.0059, 'learning_rate': 1.5025000000000001e-05, 'epoch': 9.17} 70%|███████ | 7004/10000 [27:31:45<29:54:22, 35.94s/it] 70%|███████ | 7005/10000 [27:31:59<24:23:08, 29.31s/it] {'loss': 0.007, 'learning_rate': 1.502e-05, 'epoch': 9.17} 70%|███████ | 7005/10000 [27:31:59<24:23:08, 29.31s/it] 70%|███████ | 7006/10000 [27:32:13<20:32:35, 24.70s/it] {'loss': 0.0053, 'learning_rate': 1.5015000000000001e-05, 'epoch': 9.17} 70%|███████ | 7006/10000 [27:32:13<20:32:35, 24.70s/it] 70%|███████ | 7007/10000 [27:32:27<17:48:46, 21.43s/it] {'loss': 0.0058, 'learning_rate': 1.5010000000000002e-05, 'epoch': 9.17} 70%|███████ | 7007/10000 [27:32:27<17:48:46, 21.43s/it] 70%|███████ | 7008/10000 [27:32:41<15:58:15, 19.22s/it] {'loss': 0.0054, 'learning_rate': 1.5005e-05, 'epoch': 9.17} 70%|███████ | 7008/10000 [27:32:41<15:58:15, 19.22s/it] 70%|███████ | 7009/10000 
[27:32:55<14:38:10, 17.62s/it] {'loss': 0.0038, 'learning_rate': 1.5e-05, 'epoch': 9.17} 70%|███████ | 7009/10000 [27:32:55<14:38:10, 17.62s/it] 70%|███████ | 7010/10000 [27:33:08<13:39:34, 16.45s/it] {'loss': 0.0046, 'learning_rate': 1.4995000000000001e-05, 'epoch': 9.18} 70%|███████ | 7010/10000 [27:33:08<13:39:34, 16.45s/it] 70%|███████ | 7011/10000 [27:33:22<13:01:16, 15.68s/it] {'loss': 0.0056, 'learning_rate': 1.499e-05, 'epoch': 9.18} 70%|███████ | 7011/10000 [27:33:22<13:01:16, 15.68s/it] 70%|███████ | 7012/10000 [27:33:36<12:33:13, 15.12s/it] {'loss': 0.0077, 'learning_rate': 1.4985000000000001e-05, 'epoch': 9.18} 70%|███████ | 7012/10000 [27:33:36<12:33:13, 15.12s/it] 70%|███████ | 7013/10000 [27:33:50<12:14:55, 14.76s/it] {'loss': 0.0047, 'learning_rate': 1.4979999999999999e-05, 'epoch': 9.18} 70%|███████ | 7013/10000 [27:33:50<12:14:55, 14.76s/it] 70%|███████ | 7014/10000 [27:34:04<12:02:44, 14.52s/it] {'loss': 0.0053, 'learning_rate': 1.4975e-05, 'epoch': 9.18} 70%|███████ | 7014/10000 [27:34:04<12:02:44, 14.52s/it] 70%|███████ | 7015/10000 [27:34:18<11:53:48, 14.35s/it] {'loss': 0.0051, 'learning_rate': 1.497e-05, 'epoch': 9.18} 70%|███████ | 7015/10000 [27:34:18<11:53:48, 14.35s/it] 70%|███████ | 7016/10000 [27:34:32<11:47:18, 14.22s/it] {'loss': 0.0141, 'learning_rate': 1.4965000000000002e-05, 'epoch': 9.18} 70%|███████ | 7016/10000 [27:34:32<11:47:18, 14.22s/it] 70%|███████ | 7017/10000 [27:34:46<11:41:36, 14.11s/it] {'loss': 0.005, 'learning_rate': 1.4960000000000002e-05, 'epoch': 9.18} 70%|███████ | 7017/10000 [27:34:46<11:41:36, 14.11s/it] 70%|███████ | 7018/10000 [27:34:59<11:37:29, 14.03s/it] {'loss': 0.0073, 'learning_rate': 1.4955e-05, 'epoch': 9.19} 70%|███████ | 7018/10000 [27:34:59<11:37:29, 14.03s/it] 70%|███████ | 7019/10000 [27:35:13<11:34:11, 13.97s/it] {'loss': 0.0095, 'learning_rate': 1.4950000000000001e-05, 'epoch': 9.19} 70%|███████ | 7019/10000 [27:35:13<11:34:11, 13.97s/it] 70%|███████ | 7020/10000 [27:35:27<11:32:34, 13.94s/it] 
{'loss': 0.0055, 'learning_rate': 1.4945e-05, 'epoch': 9.19} 70%|███████ | 7020/10000 [27:35:27<11:32:34, 13.94s/it] 70%|███████ | 7021/10000 [27:35:41<11:30:29, 13.91s/it] {'loss': 0.0057, 'learning_rate': 1.4940000000000001e-05, 'epoch': 9.19} 70%|███████ | 7021/10000 [27:35:41<11:30:29, 13.91s/it] 70%|███████ | 7022/10000 [27:35:55<11:30:22, 13.91s/it] {'loss': 0.0042, 'learning_rate': 1.4935000000000002e-05, 'epoch': 9.19} 70%|███████ | 7022/10000 [27:35:55<11:30:22, 13.91s/it] 70%|███████ | 7023/10000 [27:36:09<11:30:58, 13.93s/it] {'loss': 0.0057, 'learning_rate': 1.493e-05, 'epoch': 9.19} 70%|███████ | 7023/10000 [27:36:09<11:30:58, 13.93s/it] 70%|███████ | 7024/10000 [27:36:23<11:32:08, 13.95s/it] {'loss': 0.0056, 'learning_rate': 1.4925e-05, 'epoch': 9.19} 70%|███████ | 7024/10000 [27:36:23<11:32:08, 13.95s/it] 70%|███████ | 7025/10000 [27:36:37<11:30:27, 13.93s/it] {'loss': 0.0054, 'learning_rate': 1.4920000000000001e-05, 'epoch': 9.2} 70%|███████ | 7025/10000 [27:36:37<11:30:27, 13.93s/it] 70%|███████ | 7026/10000 [27:36:51<11:29:10, 13.90s/it] {'loss': 0.0052, 'learning_rate': 1.4915000000000002e-05, 'epoch': 9.2} 70%|███████ | 7026/10000 [27:36:51<11:29:10, 13.90s/it] 70%|███████ | 7027/10000 [27:37:04<11:29:32, 13.92s/it] {'loss': 0.0058, 'learning_rate': 1.4910000000000001e-05, 'epoch': 9.2} 70%|███████ | 7027/10000 [27:37:05<11:29:32, 13.92s/it] 70%|███████ | 7028/10000 [27:37:18<11:29:18, 13.92s/it] {'loss': 0.0054, 'learning_rate': 1.4904999999999999e-05, 'epoch': 9.2} 70%|███████ | 7028/10000 [27:37:18<11:29:18, 13.92s/it] 70%|███████ | 7029/10000 [27:37:32<11:28:26, 13.90s/it] {'loss': 0.0054, 'learning_rate': 1.49e-05, 'epoch': 9.2} 70%|███████ | 7029/10000 [27:37:32<11:28:26, 13.90s/it] 70%|███████ | 7030/10000 [27:37:46<11:27:43, 13.89s/it] {'loss': 0.0051, 'learning_rate': 1.4895e-05, 'epoch': 9.2} 70%|███████ | 7030/10000 [27:37:46<11:27:43, 13.89s/it] 70%|███████ | 7031/10000 [27:38:00<11:27:49, 13.90s/it] {'loss': 0.0041, 'learning_rate': 
1.4890000000000001e-05, 'epoch': 9.2} 70%|███████ | 7031/10000 [27:38:00<11:27:49, 13.90s/it] 70%|███████ | 7032/10000 [27:38:14<11:26:39, 13.88s/it] {'loss': 0.0044, 'learning_rate': 1.4885000000000002e-05, 'epoch': 9.2} 70%|███████ | 7032/10000 [27:38:14<11:26:39, 13.88s/it] 70%|███████ | 7033/10000 [27:38:28<11:26:08, 13.88s/it] {'loss': 0.0051, 'learning_rate': 1.488e-05, 'epoch': 9.21} 70%|███████ | 7033/10000 [27:38:28<11:26:08, 13.88s/it] 70%|███████ | 7034/10000 [27:38:42<11:26:52, 13.90s/it] {'loss': 0.0058, 'learning_rate': 1.4875e-05, 'epoch': 9.21} 70%|███████ | 7034/10000 [27:38:42<11:26:52, 13.90s/it] 70%|███████ | 7035/10000 [27:38:56<11:27:08, 13.90s/it] {'loss': 0.0055, 'learning_rate': 1.487e-05, 'epoch': 9.21} 70%|███████ | 7035/10000 [27:38:56<11:27:08, 13.90s/it] 70%|███████ | 7036/10000 [27:39:10<11:27:11, 13.91s/it] {'loss': 0.0059, 'learning_rate': 1.4865e-05, 'epoch': 9.21} 70%|███████ | 7036/10000 [27:39:10<11:27:11, 13.91s/it] 70%|███████ | 7037/10000 [27:39:24<11:27:56, 13.93s/it] {'loss': 0.0042, 'learning_rate': 1.4860000000000002e-05, 'epoch': 9.21} 70%|███████ | 7037/10000 [27:39:24<11:27:56, 13.93s/it] 70%|███████ | 7038/10000 [27:39:37<11:25:30, 13.89s/it] {'loss': 0.0064, 'learning_rate': 1.4855e-05, 'epoch': 9.21} 70%|███████ | 7038/10000 [27:39:37<11:25:30, 13.89s/it] 70%|███████ | 7039/10000 [27:39:51<11:27:42, 13.94s/it] {'loss': 0.0056, 'learning_rate': 1.485e-05, 'epoch': 9.21} 70%|███████ | 7039/10000 [27:39:51<11:27:42, 13.94s/it] 70%|███████ | 7040/10000 [27:40:05<11:28:42, 13.96s/it] {'loss': 0.0044, 'learning_rate': 1.4845000000000001e-05, 'epoch': 9.21} 70%|███████ | 7040/10000 [27:40:05<11:28:42, 13.96s/it] 70%|███████ | 7041/10000 [27:40:19<11:27:06, 13.93s/it] {'loss': 0.0051, 'learning_rate': 1.4840000000000002e-05, 'epoch': 9.22} 70%|███████ | 7041/10000 [27:40:19<11:27:06, 13.93s/it] 70%|███████ | 7042/10000 [27:40:33<11:25:38, 13.91s/it] {'loss': 0.0063, 'learning_rate': 1.4835000000000001e-05, 'epoch': 9.22} 
70%|███████ | 7042/10000 [27:40:33<11:25:38, 13.91s/it] 70%|███████ | 7043/10000 [27:40:47<11:25:20, 13.91s/it] {'loss': 0.0047, 'learning_rate': 1.4829999999999999e-05, 'epoch': 9.22} 70%|███████ | 7043/10000 [27:40:47<11:25:20, 13.91s/it] 70%|███████ | 7044/10000 [27:41:01<11:24:29, 13.89s/it] {'loss': 0.0048, 'learning_rate': 1.4825e-05, 'epoch': 9.22} 70%|███████ | 7044/10000 [27:41:01<11:24:29, 13.89s/it] 70%|███████ | 7045/10000 [27:41:15<11:24:38, 13.90s/it] {'loss': 0.0055, 'learning_rate': 1.482e-05, 'epoch': 9.22} 70%|███████ | 7045/10000 [27:41:15<11:24:38, 13.90s/it] 70%|███████ | 7046/10000 [27:41:29<11:23:26, 13.88s/it] {'loss': 0.0047, 'learning_rate': 1.4815000000000001e-05, 'epoch': 9.22} 70%|███████ | 7046/10000 [27:41:29<11:23:26, 13.88s/it] 70%|███████ | 7047/10000 [27:41:43<11:23:21, 13.88s/it] {'loss': 0.0058, 'learning_rate': 1.4810000000000002e-05, 'epoch': 9.22} 70%|███████ | 7047/10000 [27:41:43<11:23:21, 13.88s/it] 70%|███████ | 7048/10000 [27:41:56<11:22:31, 13.87s/it] {'loss': 0.005, 'learning_rate': 1.4805e-05, 'epoch': 9.23} 70%|███████ | 7048/10000 [27:41:56<11:22:31, 13.87s/it] 70%|███████ | 7049/10000 [27:42:10<11:22:06, 13.87s/it] {'loss': 0.0069, 'learning_rate': 1.48e-05, 'epoch': 9.23} 70%|███████ | 7049/10000 [27:42:10<11:22:06, 13.87s/it] 70%|███████ | 7050/10000 [27:42:24<11:21:28, 13.86s/it] {'loss': 0.0062, 'learning_rate': 1.4795e-05, 'epoch': 9.23} 70%|███████ | 7050/10000 [27:42:24<11:21:28, 13.86s/it] 71%|███████ | 7051/10000 [27:42:38<11:23:24, 13.90s/it] {'loss': 0.0047, 'learning_rate': 1.479e-05, 'epoch': 9.23} 71%|███████ | 7051/10000 [27:42:38<11:23:24, 13.90s/it] 71%|███████ | 7052/10000 [27:42:52<11:22:38, 13.89s/it] {'loss': 0.0053, 'learning_rate': 1.4785000000000002e-05, 'epoch': 9.23} 71%|███████ | 7052/10000 [27:42:52<11:22:38, 13.89s/it] 71%|███████ | 7053/10000 [27:43:06<11:21:58, 13.88s/it] {'loss': 0.0043, 'learning_rate': 1.4779999999999999e-05, 'epoch': 9.23} 71%|███████ | 7053/10000 
[27:43:06<11:21:58, 13.88s/it] 71%|███████ | 7054/10000 [27:43:20<11:22:08, 13.89s/it] {'loss': 0.0053, 'learning_rate': 1.4775e-05, 'epoch': 9.23} 71%|███████ | 7054/10000 [27:43:20<11:22:08, 13.89s/it] 71%|███████ | 7055/10000 [27:43:34<11:21:07, 13.88s/it] {'loss': 0.0038, 'learning_rate': 1.4770000000000001e-05, 'epoch': 9.23} 71%|███████ | 7055/10000 [27:43:34<11:21:07, 13.88s/it] 71%|███████ | 7056/10000 [27:43:47<11:19:21, 13.85s/it] {'loss': 0.0068, 'learning_rate': 1.4765000000000002e-05, 'epoch': 9.24} 71%|███████ | 7056/10000 [27:43:47<11:19:21, 13.85s/it] 71%|███████ | 7057/10000 [27:44:01<11:21:23, 13.89s/it] {'loss': 0.0049, 'learning_rate': 1.4760000000000001e-05, 'epoch': 9.24} 71%|███████ | 7057/10000 [27:44:01<11:21:23, 13.89s/it] 71%|███████ | 7058/10000 [27:44:15<11:21:50, 13.91s/it] {'loss': 0.0058, 'learning_rate': 1.4755e-05, 'epoch': 9.24} 71%|███████ | 7058/10000 [27:44:15<11:21:50, 13.91s/it] 71%|███████ | 7059/10000 [27:44:29<11:23:26, 13.94s/it] {'loss': 0.0076, 'learning_rate': 1.475e-05, 'epoch': 9.24} 71%|███████ | 7059/10000 [27:44:29<11:23:26, 13.94s/it] 71%|███████ | 7060/10000 [27:44:43<11:23:11, 13.94s/it] {'loss': 0.0046, 'learning_rate': 1.4745e-05, 'epoch': 9.24} 71%|███████ | 7060/10000 [27:44:43<11:23:11, 13.94s/it] 71%|███████ | 7061/10000 [27:44:57<11:23:03, 13.94s/it] {'loss': 0.0051, 'learning_rate': 1.4740000000000001e-05, 'epoch': 9.24} 71%|███████ | 7061/10000 [27:44:57<11:23:03, 13.94s/it] 71%|███████ | 7062/10000 [27:45:11<11:23:54, 13.97s/it] {'loss': 0.0053, 'learning_rate': 1.4735000000000002e-05, 'epoch': 9.24} 71%|███████ | 7062/10000 [27:45:11<11:23:54, 13.97s/it] 71%|███████ | 7063/10000 [27:45:25<11:22:21, 13.94s/it] {'loss': 0.0053, 'learning_rate': 1.473e-05, 'epoch': 9.24} 71%|███████ | 7063/10000 [27:45:25<11:22:21, 13.94s/it] 71%|███████ | 7064/10000 [27:45:39<11:22:28, 13.95s/it] {'loss': 0.0067, 'learning_rate': 1.4725e-05, 'epoch': 9.25} 71%|███████ | 7064/10000 [27:45:39<11:22:28, 13.95s/it] 
71%|███████ | 7065/10000 [27:45:53<11:21:58, 13.94s/it] {'loss': 0.0056, 'learning_rate': 1.472e-05, 'epoch': 9.25} 71%|███████ | 7065/10000 [27:45:53<11:21:58, 13.94s/it] 71%|███████ | 7066/10000 [27:46:07<11:20:42, 13.92s/it] {'loss': 0.0053, 'learning_rate': 1.4715e-05, 'epoch': 9.25} 71%|███████ | 7066/10000 [27:46:07<11:20:42, 13.92s/it] 71%|███████ | 7067/10000 [27:46:21<11:20:46, 13.93s/it] {'loss': 0.006, 'learning_rate': 1.4710000000000001e-05, 'epoch': 9.25} 71%|███████ | 7067/10000 [27:46:21<11:20:46, 13.93s/it] 71%|███████ | 7068/10000 [27:46:35<11:18:07, 13.88s/it] {'loss': 0.0055, 'learning_rate': 1.4704999999999999e-05, 'epoch': 9.25} 71%|███████ | 7068/10000 [27:46:35<11:18:07, 13.88s/it] 71%|███████ | 7069/10000 [27:46:48<11:17:23, 13.87s/it] {'loss': 0.0047, 'learning_rate': 1.47e-05, 'epoch': 9.25} 71%|███████ | 7069/10000 [27:46:48<11:17:23, 13.87s/it] 71%|███████ | 7070/10000 [27:47:02<11:16:38, 13.86s/it] {'loss': 0.0042, 'learning_rate': 1.4695e-05, 'epoch': 9.25} 71%|███████ | 7070/10000 [27:47:02<11:16:38, 13.86s/it] 71%|███████ | 7071/10000 [27:47:16<11:16:16, 13.85s/it] {'loss': 0.0064, 'learning_rate': 1.4690000000000002e-05, 'epoch': 9.26} 71%|███████ | 7071/10000 [27:47:16<11:16:16, 13.85s/it] 71%|███████ | 7072/10000 [27:47:30<11:17:03, 13.87s/it] {'loss': 0.0049, 'learning_rate': 1.4685000000000001e-05, 'epoch': 9.26} 71%|███████ | 7072/10000 [27:47:30<11:17:03, 13.87s/it] 71%|███████ | 7073/10000 [27:47:44<11:19:36, 13.93s/it] {'loss': 0.0062, 'learning_rate': 1.4680000000000002e-05, 'epoch': 9.26} 71%|███████ | 7073/10000 [27:47:44<11:19:36, 13.93s/it] 71%|███████ | 7074/10000 [27:47:58<11:18:36, 13.92s/it] {'loss': 0.0065, 'learning_rate': 1.4675e-05, 'epoch': 9.26} 71%|███████ | 7074/10000 [27:47:58<11:18:36, 13.92s/it] 71%|███████ | 7075/10000 [27:48:12<11:17:23, 13.90s/it] {'loss': 0.0055, 'learning_rate': 1.467e-05, 'epoch': 9.26} 71%|███████ | 7075/10000 [27:48:12<11:17:23, 13.90s/it] 71%|███████ | 7076/10000 
[27:48:26<11:16:28, 13.88s/it] {'loss': 0.0046, 'learning_rate': 1.4665000000000001e-05, 'epoch': 9.26} 71%|███████ | 7076/10000 [27:48:26<11:16:28, 13.88s/it] 71%|███████ | 7077/10000 [27:48:40<11:18:09, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.4660000000000002e-05, 'epoch': 9.26} 71%|███████ | 7077/10000 [27:48:40<11:18:09, 13.92s/it] 71%|███████ | 7078/10000 [27:48:54<11:18:30, 13.93s/it] {'loss': 0.0048, 'learning_rate': 1.4655000000000003e-05, 'epoch': 9.26} 71%|███████ | 7078/10000 [27:48:54<11:18:30, 13.93s/it] 71%|███████ | 7079/10000 [27:49:07<11:17:36, 13.92s/it] {'loss': 0.0052, 'learning_rate': 1.465e-05, 'epoch': 9.27} 71%|███████ | 7079/10000 [27:49:08<11:17:36, 13.92s/it] 71%|███████ | 7080/10000 [27:49:21<11:17:03, 13.91s/it] {'loss': 0.0103, 'learning_rate': 1.4645e-05, 'epoch': 9.27} 71%|███████ | 7080/10000 [27:49:21<11:17:03, 13.91s/it] 71%|███████ | 7081/10000 [27:49:35<11:16:32, 13.91s/it] {'loss': 0.0067, 'learning_rate': 1.464e-05, 'epoch': 9.27} 71%|███████ | 7081/10000 [27:49:35<11:16:32, 13.91s/it] 71%|███████ | 7082/10000 [27:49:49<11:17:43, 13.94s/it] {'loss': 0.005, 'learning_rate': 1.4635000000000001e-05, 'epoch': 9.27} 71%|███████ | 7082/10000 [27:49:49<11:17:43, 13.94s/it] 71%|███████ | 7083/10000 [27:50:03<11:17:09, 13.93s/it] {'loss': 0.0049, 'learning_rate': 1.4630000000000002e-05, 'epoch': 9.27} 71%|███████ | 7083/10000 [27:50:03<11:17:09, 13.93s/it] 71%|███████ | 7084/10000 [27:50:17<11:12:54, 13.85s/it] {'loss': 0.0065, 'learning_rate': 1.4625e-05, 'epoch': 9.27} 71%|███████ | 7084/10000 [27:50:17<11:12:54, 13.85s/it] 71%|███████ | 7085/10000 [27:50:31<11:13:02, 13.85s/it] {'loss': 0.0054, 'learning_rate': 1.462e-05, 'epoch': 9.27} 71%|███████ | 7085/10000 [27:50:31<11:13:02, 13.85s/it] 71%|███████ | 7086/10000 [27:50:45<11:14:49, 13.89s/it] {'loss': 0.0074, 'learning_rate': 1.4615000000000002e-05, 'epoch': 9.27} 71%|███████ | 7086/10000 [27:50:45<11:14:49, 13.89s/it] 71%|███████ | 7087/10000 [27:50:59<11:13:12, 
13.87s/it] {'loss': 0.0046, 'learning_rate': 1.461e-05, 'epoch': 9.28} 71%|███████ | 7087/10000 [27:50:59<11:13:12, 13.87s/it] 71%|███████ | 7088/10000 [27:51:12<11:13:09, 13.87s/it] {'loss': 0.0047, 'learning_rate': 1.4605000000000002e-05, 'epoch': 9.28} 71%|███████ | 7088/10000 [27:51:12<11:13:09, 13.87s/it] 71%|███████ | 7089/10000 [27:51:26<11:12:40, 13.86s/it] {'loss': 0.0057, 'learning_rate': 1.4599999999999999e-05, 'epoch': 9.28} 71%|███████ | 7089/10000 [27:51:26<11:12:40, 13.86s/it] 71%|███████ | 7090/10000 [27:51:40<11:13:36, 13.89s/it] {'loss': 0.0052, 'learning_rate': 1.4595e-05, 'epoch': 9.28} 71%|███████ | 7090/10000 [27:51:40<11:13:36, 13.89s/it] 71%|███████ | 7091/10000 [27:51:54<11:13:39, 13.89s/it] {'loss': 0.0064, 'learning_rate': 1.4590000000000001e-05, 'epoch': 9.28} 71%|███████ | 7091/10000 [27:51:54<11:13:39, 13.89s/it] 71%|███████ | 7092/10000 [27:52:08<11:14:12, 13.91s/it] {'loss': 0.0042, 'learning_rate': 1.4585000000000002e-05, 'epoch': 9.28} 71%|███████ | 7092/10000 [27:52:08<11:14:12, 13.91s/it] 71%|███████ | 7093/10000 [27:52:22<11:12:59, 13.89s/it] {'loss': 0.006, 'learning_rate': 1.4580000000000003e-05, 'epoch': 9.28} 71%|███████ | 7093/10000 [27:52:22<11:12:59, 13.89s/it] 71%|███████ | 7094/10000 [27:52:36<11:11:58, 13.87s/it] {'loss': 0.0073, 'learning_rate': 1.4575e-05, 'epoch': 9.29} 71%|███████ | 7094/10000 [27:52:36<11:11:58, 13.87s/it] 71%|███████ | 7095/10000 [27:52:50<11:13:06, 13.90s/it] {'loss': 0.013, 'learning_rate': 1.4570000000000001e-05, 'epoch': 9.29} 71%|███████ | 7095/10000 [27:52:50<11:13:06, 13.90s/it] 71%|███████ | 7096/10000 [27:53:04<11:13:24, 13.91s/it] {'loss': 0.006, 'learning_rate': 1.4565e-05, 'epoch': 9.29} 71%|███████ | 7096/10000 [27:53:04<11:13:24, 13.91s/it] 71%|███████ | 7097/10000 [27:53:18<11:13:38, 13.92s/it] {'loss': 0.0059, 'learning_rate': 1.4560000000000001e-05, 'epoch': 9.29} 71%|███████ | 7097/10000 [27:53:18<11:13:38, 13.92s/it] 71%|███████ | 7098/10000 [27:53:32<11:14:04, 13.94s/it] 
{'loss': 0.0044, 'learning_rate': 1.4555000000000002e-05, 'epoch': 9.29} 71%|███████ | 7098/10000 [27:53:32<11:14:04, 13.94s/it] 71%|███████ | 7099/10000 [27:53:45<11:12:17, 13.90s/it] {'loss': 0.0047, 'learning_rate': 1.455e-05, 'epoch': 9.29} 71%|███████ | 7099/10000 [27:53:45<11:12:17, 13.90s/it] 71%|███████ | 7100/10000 [27:53:59<11:11:42, 13.90s/it] {'loss': 0.0063, 'learning_rate': 1.4545e-05, 'epoch': 9.29} 71%|███████ | 7100/10000 [27:53:59<11:11:42, 13.90s/it] 71%|███████ | 7101/10000 [27:54:13<11:11:02, 13.89s/it] {'loss': 0.0066, 'learning_rate': 1.4540000000000001e-05, 'epoch': 9.29} 71%|███████ | 7101/10000 [27:54:13<11:11:02, 13.89s/it] 71%|███████ | 7102/10000 [27:54:27<11:09:56, 13.87s/it] {'loss': 0.006, 'learning_rate': 1.4535e-05, 'epoch': 9.3} 71%|███████ | 7102/10000 [27:54:27<11:09:56, 13.87s/it] 71%|███████ | 7103/10000 [27:54:41<11:09:58, 13.88s/it] {'loss': 0.0057, 'learning_rate': 1.4530000000000001e-05, 'epoch': 9.3} 71%|███████ | 7103/10000 [27:54:41<11:09:58, 13.88s/it] 71%|███████ | 7104/10000 [27:54:55<11:11:07, 13.90s/it] {'loss': 0.0055, 'learning_rate': 1.4524999999999999e-05, 'epoch': 9.3} 71%|███████ | 7104/10000 [27:54:55<11:11:07, 13.90s/it] 71%|███████ | 7105/10000 [27:55:09<11:11:00, 13.91s/it] {'loss': 0.0171, 'learning_rate': 1.452e-05, 'epoch': 9.3} 71%|███████ | 7105/10000 [27:55:09<11:11:00, 13.91s/it] 71%|███████ | 7106/10000 [27:55:22<11:08:33, 13.86s/it] {'loss': 0.0057, 'learning_rate': 1.4515e-05, 'epoch': 9.3} 71%|███████ | 7106/10000 [27:55:23<11:08:33, 13.86s/it] 71%|███████ | 7107/10000 [27:55:36<11:09:12, 13.88s/it] {'loss': 0.0045, 'learning_rate': 1.4510000000000002e-05, 'epoch': 9.3} 71%|███████ | 7107/10000 [27:55:36<11:09:12, 13.88s/it] 71%|███████ | 7108/10000 [27:55:50<11:07:58, 13.86s/it] {'loss': 0.0062, 'learning_rate': 1.4505000000000003e-05, 'epoch': 9.3} 71%|███████ | 7108/10000 [27:55:50<11:07:58, 13.86s/it] 71%|███████ | 7109/10000 [27:56:04<11:08:23, 13.87s/it] {'loss': 0.0063, 'learning_rate': 
1.45e-05, 'epoch': 9.3} 71%|███████ | 7109/10000 [27:56:04<11:08:23, 13.87s/it] 71%|███████ | 7110/10000 [27:56:18<11:10:30, 13.92s/it] {'loss': 0.0034, 'learning_rate': 1.4495000000000001e-05, 'epoch': 9.31} 71%|███████ | 7110/10000 [27:56:18<11:10:30, 13.92s/it] 71%|███████ | 7111/10000 [27:56:32<11:11:18, 13.94s/it] {'loss': 0.0057, 'learning_rate': 1.449e-05, 'epoch': 9.31} 71%|███████ | 7111/10000 [27:56:32<11:11:18, 13.94s/it] 71%|███████ | 7112/10000 [27:56:46<11:08:58, 13.90s/it] {'loss': 0.0053, 'learning_rate': 1.4485000000000001e-05, 'epoch': 9.31} 71%|███████ | 7112/10000 [27:56:46<11:08:58, 13.90s/it] 71%|███████ | 7113/10000 [27:57:00<11:08:18, 13.89s/it] {'loss': 0.0048, 'learning_rate': 1.4480000000000002e-05, 'epoch': 9.31} 71%|███████ | 7113/10000 [27:57:00<11:08:18, 13.89s/it] 71%|███████ | 7114/10000 [27:57:14<11:08:17, 13.89s/it] {'loss': 0.0061, 'learning_rate': 1.4475e-05, 'epoch': 9.31} 71%|███████ | 7114/10000 [27:57:14<11:08:17, 13.89s/it] 71%|███████ | 7115/10000 [27:57:28<11:09:11, 13.92s/it] {'loss': 0.0059, 'learning_rate': 1.447e-05, 'epoch': 9.31} 71%|███████ | 7115/10000 [27:57:28<11:09:11, 13.92s/it] 71%|███████ | 7116/10000 [27:57:42<11:09:54, 13.94s/it] {'loss': 0.0034, 'learning_rate': 1.4465000000000001e-05, 'epoch': 9.31} 71%|███████ | 7116/10000 [27:57:42<11:09:54, 13.94s/it] 71%|███████ | 7117/10000 [27:57:55<11:08:06, 13.90s/it] {'loss': 0.004, 'learning_rate': 1.4460000000000002e-05, 'epoch': 9.32} 71%|███████ | 7117/10000 [27:57:56<11:08:06, 13.90s/it] 71%|███████ | 7118/10000 [27:58:09<11:09:09, 13.93s/it] {'loss': 0.0056, 'learning_rate': 1.4455000000000001e-05, 'epoch': 9.32} 71%|███████ | 7118/10000 [27:58:10<11:09:09, 13.93s/it] 71%|███████ | 7119/10000 [27:58:23<11:07:07, 13.89s/it] {'loss': 0.0056, 'learning_rate': 1.4449999999999999e-05, 'epoch': 9.32} 71%|███████ | 7119/10000 [27:58:23<11:07:07, 13.89s/it] 71%|███████ | 7120/10000 [27:58:37<11:08:06, 13.92s/it] {'loss': 0.0036, 'learning_rate': 1.4445e-05, 
'epoch': 9.32} 71%|███████ | 7120/10000 [27:58:37<11:08:06, 13.92s/it] 71%|███████ | 7121/10000 [27:58:51<11:07:44, 13.92s/it] {'loss': 0.0058, 'learning_rate': 1.444e-05, 'epoch': 9.32} 71%|███████ | 7121/10000 [27:58:51<11:07:44, 13.92s/it] 71%|███████ | 7122/10000 [27:59:05<11:09:00, 13.95s/it] {'loss': 0.0047, 'learning_rate': 1.4435000000000002e-05, 'epoch': 9.32} 71%|███████ | 7122/10000 [27:59:05<11:09:00, 13.95s/it] 71%|███████ | 7123/10000 [27:59:19<11:08:44, 13.95s/it] {'loss': 0.0059, 'learning_rate': 1.4430000000000002e-05, 'epoch': 9.32} 71%|███████ | 7123/10000 [27:59:19<11:08:44, 13.95s/it] 71%|███████ | 7124/10000 [27:59:33<11:08:59, 13.96s/it] {'loss': 0.0038, 'learning_rate': 1.4425e-05, 'epoch': 9.32} 71%|███████ | 7124/10000 [27:59:33<11:08:59, 13.96s/it] 71%|███████▏ | 7125/10000 [27:59:47<11:09:36, 13.97s/it] {'loss': 0.0066, 'learning_rate': 1.4420000000000001e-05, 'epoch': 9.33} 71%|███████▏ | 7125/10000 [27:59:47<11:09:36, 13.97s/it] 71%|███████▏ | 7126/10000 [28:00:01<11:07:19, 13.93s/it] {'loss': 0.0048, 'learning_rate': 1.4415e-05, 'epoch': 9.33} 71%|███████▏ | 7126/10000 [28:00:01<11:07:19, 13.93s/it] 71%|███████▏ | 7127/10000 [28:00:15<11:07:37, 13.94s/it] {'loss': 0.0062, 'learning_rate': 1.4410000000000001e-05, 'epoch': 9.33} 71%|███████▏ | 7127/10000 [28:00:15<11:07:37, 13.94s/it] 71%|███████▏ | 7128/10000 [28:00:29<11:07:48, 13.95s/it] {'loss': 0.0065, 'learning_rate': 1.4405000000000002e-05, 'epoch': 9.33} 71%|███████▏ | 7128/10000 [28:00:29<11:07:48, 13.95s/it] 71%|███████▏ | 7129/10000 [28:00:43<11:07:09, 13.94s/it] {'loss': 0.0063, 'learning_rate': 1.44e-05, 'epoch': 9.33} 71%|███████▏ | 7129/10000 [28:00:43<11:07:09, 13.94s/it] 71%|███████▏ | 7130/10000 [28:00:57<11:07:55, 13.96s/it] {'loss': 0.0042, 'learning_rate': 1.4395e-05, 'epoch': 9.33} 71%|███████▏ | 7130/10000 [28:00:57<11:07:55, 13.96s/it] 71%|███████▏ | 7131/10000 [28:01:11<11:06:44, 13.94s/it] {'loss': 0.0035, 'learning_rate': 1.4390000000000001e-05, 'epoch': 9.33} 
71%|███████▏ | 7131/10000 [28:01:11<11:06:44, 13.94s/it] 71%|███████▏ | 7132/10000 [28:01:25<11:07:57, 13.97s/it] {'loss': 0.004, 'learning_rate': 1.4385000000000002e-05, 'epoch': 9.34} 71%|███████▏ | 7132/10000 [28:01:25<11:07:57, 13.97s/it] 71%|███████▏ | 7133/10000 [28:01:39<11:07:21, 13.97s/it] {'loss': 0.0059, 'learning_rate': 1.4380000000000001e-05, 'epoch': 9.34} 71%|███████▏ | 7133/10000 [28:01:39<11:07:21, 13.97s/it] 71%|███████▏ | 7134/10000 [28:01:53<11:05:08, 13.92s/it] {'loss': 0.0045, 'learning_rate': 1.4374999999999999e-05, 'epoch': 9.34} 71%|███████▏ | 7134/10000 [28:01:53<11:05:08, 13.92s/it] 71%|███████▏ | 7135/10000 [28:02:07<11:05:15, 13.93s/it] {'loss': 0.0065, 'learning_rate': 1.437e-05, 'epoch': 9.34} 71%|███████▏ | 7135/10000 [28:02:07<11:05:15, 13.93s/it] 71%|███████▏ | 7136/10000 [28:02:20<11:04:32, 13.92s/it] {'loss': 0.0056, 'learning_rate': 1.4365e-05, 'epoch': 9.34} 71%|███████▏ | 7136/10000 [28:02:20<11:04:32, 13.92s/it] 71%|███████▏ | 7137/10000 [28:02:34<11:04:16, 13.92s/it] {'loss': 0.0048, 'learning_rate': 1.4360000000000001e-05, 'epoch': 9.34} 71%|███████▏ | 7137/10000 [28:02:34<11:04:16, 13.92s/it] 71%|███████▏ | 7138/10000 [28:02:48<11:02:42, 13.89s/it] {'loss': 0.0049, 'learning_rate': 1.4355000000000002e-05, 'epoch': 9.34} 71%|███████▏ | 7138/10000 [28:02:48<11:02:42, 13.89s/it] 71%|███████▏ | 7139/10000 [28:03:02<11:03:12, 13.91s/it] {'loss': 0.005, 'learning_rate': 1.435e-05, 'epoch': 9.34} 71%|███████▏ | 7139/10000 [28:03:02<11:03:12, 13.91s/it] 71%|███████▏ | 7140/10000 [28:03:16<11:01:35, 13.88s/it] {'loss': 0.0111, 'learning_rate': 1.4345e-05, 'epoch': 9.35} 71%|███████▏ | 7140/10000 [28:03:16<11:01:35, 13.88s/it] 71%|███████▏ | 7141/10000 [28:03:30<11:01:01, 13.87s/it] {'loss': 0.0044, 'learning_rate': 1.434e-05, 'epoch': 9.35} 71%|███████▏ | 7141/10000 [28:03:30<11:01:01, 13.87s/it] 71%|███████▏ | 7142/10000 [28:03:44<11:01:02, 13.88s/it] {'loss': 0.0058, 'learning_rate': 1.4335e-05, 'epoch': 9.35} 71%|███████▏ | 
7142/10000 [28:03:44<11:01:02, 13.88s/it] 71%|███████▏ | 7143/10000 [28:03:58<11:00:31, 13.87s/it] {'loss': 0.0081, 'learning_rate': 1.4330000000000002e-05, 'epoch': 9.35} 71%|███████▏ | 7143/10000 [28:03:58<11:00:31, 13.87s/it] 71%|███████▏ | 7144/10000 [28:04:11<10:59:42, 13.86s/it] {'loss': 0.0039, 'learning_rate': 1.4325e-05, 'epoch': 9.35} 71%|███████▏ | 7144/10000 [28:04:11<10:59:42, 13.86s/it] 71%|███████▏ | 7145/10000 [28:04:25<10:59:50, 13.87s/it] {'loss': 0.0067, 'learning_rate': 1.432e-05, 'epoch': 9.35} 71%|███████▏ | 7145/10000 [28:04:25<10:59:50, 13.87s/it] 71%|███████▏ | 7146/10000 [28:04:39<10:59:00, 13.85s/it] {'loss': 0.0067, 'learning_rate': 1.4315000000000001e-05, 'epoch': 9.35} 71%|███████▏ | 7146/10000 [28:04:39<10:59:00, 13.85s/it] 71%|███████▏ | 7147/10000 [28:04:53<10:58:01, 13.84s/it] {'loss': 0.004, 'learning_rate': 1.4310000000000002e-05, 'epoch': 9.35} 71%|███████▏ | 7147/10000 [28:04:53<10:58:01, 13.84s/it] 71%|███████▏ | 7148/10000 [28:05:07<11:01:07, 13.91s/it] {'loss': 0.0049, 'learning_rate': 1.4305000000000001e-05, 'epoch': 9.36} 71%|███████▏ | 7148/10000 [28:05:07<11:01:07, 13.91s/it] 71%|███████▏ | 7149/10000 [28:05:21<11:02:13, 13.94s/it] {'loss': 0.0047, 'learning_rate': 1.43e-05, 'epoch': 9.36} 71%|███████▏ | 7149/10000 [28:05:21<11:02:13, 13.94s/it] 72%|███████▏ | 7150/10000 [28:05:35<11:01:21, 13.92s/it] {'loss': 0.0067, 'learning_rate': 1.4295e-05, 'epoch': 9.36} 72%|███████▏ | 7150/10000 [28:05:35<11:01:21, 13.92s/it] 72%|███████▏ | 7151/10000 [28:05:49<11:02:09, 13.95s/it] {'loss': 0.0046, 'learning_rate': 1.429e-05, 'epoch': 9.36} 72%|███████▏ | 7151/10000 [28:05:49<11:02:09, 13.95s/it] 72%|███████▏ | 7152/10000 [28:06:03<11:00:34, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.4285000000000001e-05, 'epoch': 9.36} 72%|███████▏ | 7152/10000 [28:06:03<11:00:34, 13.92s/it] 72%|███████▏ | 7153/10000 [28:06:17<10:59:48, 13.91s/it] {'loss': 0.0062, 'learning_rate': 1.4280000000000002e-05, 'epoch': 9.36} 72%|███████▏ | 
7153/10000 [28:06:17<10:59:48, 13.91s/it] 72%|███████▏ | 7154/10000 [28:06:30<10:59:52, 13.91s/it] {'loss': 0.0051, 'learning_rate': 1.4275e-05, 'epoch': 9.36} 72%|███████▏ | 7154/10000 [28:06:31<10:59:52, 13.91s/it] 72%|███████▏ | 7155/10000 [28:06:44<11:00:02, 13.92s/it] {'loss': 0.0046, 'learning_rate': 1.427e-05, 'epoch': 9.37} 72%|███████▏ | 7155/10000 [28:06:44<11:00:02, 13.92s/it] 72%|███████▏ | 7156/10000 [28:06:58<11:01:10, 13.95s/it] {'loss': 0.0056, 'learning_rate': 1.4265e-05, 'epoch': 9.37} 72%|███████▏ | 7156/10000 [28:06:58<11:01:10, 13.95s/it] 72%|███████▏ | 7157/10000 [28:07:12<11:00:32, 13.94s/it] {'loss': 0.0054, 'learning_rate': 1.426e-05, 'epoch': 9.37} 72%|███████▏ | 7157/10000 [28:07:12<11:00:32, 13.94s/it] 72%|███████▏ | 7158/10000 [28:07:26<10:59:58, 13.93s/it] {'loss': 0.006, 'learning_rate': 1.4255000000000002e-05, 'epoch': 9.37} 72%|███████▏ | 7158/10000 [28:07:26<10:59:58, 13.93s/it] 72%|███████▏ | 7159/10000 [28:07:40<10:57:49, 13.89s/it] {'loss': 0.0069, 'learning_rate': 1.4249999999999999e-05, 'epoch': 9.37} 72%|███████▏ | 7159/10000 [28:07:40<10:57:49, 13.89s/it] 72%|███████▏ | 7160/10000 [28:07:54<10:55:38, 13.85s/it] {'loss': 0.0054, 'learning_rate': 1.4245e-05, 'epoch': 9.37} 72%|███████▏ | 7160/10000 [28:07:54<10:55:38, 13.85s/it] 72%|███████▏ | 7161/10000 [28:08:08<10:55:39, 13.86s/it] {'loss': 0.0068, 'learning_rate': 1.4240000000000001e-05, 'epoch': 9.37} 72%|███████▏ | 7161/10000 [28:08:08<10:55:39, 13.86s/it] 72%|███████▏ | 7162/10000 [28:08:22<10:56:59, 13.89s/it] {'loss': 0.0053, 'learning_rate': 1.4235000000000002e-05, 'epoch': 9.37} 72%|███████▏ | 7162/10000 [28:08:22<10:56:59, 13.89s/it] 72%|███████▏ | 7163/10000 [28:08:36<10:56:04, 13.88s/it] {'loss': 0.0048, 'learning_rate': 1.4230000000000001e-05, 'epoch': 9.38} 72%|███████▏ | 7163/10000 [28:08:36<10:56:04, 13.88s/it] 72%|███████▏ | 7164/10000 [28:08:49<10:56:19, 13.89s/it] {'loss': 0.0049, 'learning_rate': 1.4225e-05, 'epoch': 9.38} 72%|███████▏ | 7164/10000 
[28:08:49<10:56:19, 13.89s/it] 72%|███████▏ | 7165/10000 [28:09:03<10:54:47, 13.86s/it] {'loss': 0.0074, 'learning_rate': 1.422e-05, 'epoch': 9.38} 72%|███████▏ | 7165/10000 [28:09:03<10:54:47, 13.86s/it] 72%|███████▏ | 7166/10000 [28:09:17<10:55:26, 13.88s/it] {'loss': 0.0054, 'learning_rate': 1.4215e-05, 'epoch': 9.38} 72%|███████▏ | 7166/10000 [28:09:17<10:55:26, 13.88s/it] 72%|███████▏ | 7167/10000 [28:09:31<10:57:34, 13.93s/it] {'loss': 0.0058, 'learning_rate': 1.4210000000000001e-05, 'epoch': 9.38} 72%|███████▏ | 7167/10000 [28:09:31<10:57:34, 13.93s/it] 72%|███████▏ | 7168/10000 [28:09:45<10:56:56, 13.92s/it] {'loss': 0.0048, 'learning_rate': 1.4205000000000002e-05, 'epoch': 9.38} 72%|███████▏ | 7168/10000 [28:09:45<10:56:56, 13.92s/it] 72%|███████▏ | 7169/10000 [28:09:59<10:56:31, 13.91s/it] {'loss': 0.0058, 'learning_rate': 1.42e-05, 'epoch': 9.38} 72%|███████▏ | 7169/10000 [28:09:59<10:56:31, 13.91s/it] 72%|███████▏ | 7170/10000 [28:10:13<10:57:07, 13.93s/it] {'loss': 0.0047, 'learning_rate': 1.4195e-05, 'epoch': 9.38} 72%|███████▏ | 7170/10000 [28:10:13<10:57:07, 13.93s/it] 72%|███████▏ | 7171/10000 [28:10:27<10:56:33, 13.92s/it] {'loss': 0.0049, 'learning_rate': 1.4190000000000001e-05, 'epoch': 9.39} 72%|███████▏ | 7171/10000 [28:10:27<10:56:33, 13.92s/it] 72%|███████▏ | 7172/10000 [28:10:41<10:57:43, 13.95s/it] {'loss': 0.0071, 'learning_rate': 1.4185e-05, 'epoch': 9.39} 72%|███████▏ | 7172/10000 [28:10:41<10:57:43, 13.95s/it] 72%|███████▏ | 7173/10000 [28:10:55<10:56:23, 13.93s/it] {'loss': 0.0047, 'learning_rate': 1.4180000000000001e-05, 'epoch': 9.39} 72%|███████▏ | 7173/10000 [28:10:55<10:56:23, 13.93s/it] 72%|███████▏ | 7174/10000 [28:11:09<10:56:19, 13.93s/it] {'loss': 0.0057, 'learning_rate': 1.4174999999999999e-05, 'epoch': 9.39} 72%|███████▏ | 7174/10000 [28:11:09<10:56:19, 13.93s/it] 72%|███████▏ | 7175/10000 [28:11:23<10:55:18, 13.92s/it] {'loss': 0.0071, 'learning_rate': 1.417e-05, 'epoch': 9.39} 72%|███████▏ | 7175/10000 
[28:11:23<10:55:18, 13.92s/it] 72%|███████▏ | 7176/10000 [28:11:37<10:55:55, 13.94s/it] {'loss': 0.0063, 'learning_rate': 1.4165e-05, 'epoch': 9.39} 72%|███████▏ | 7176/10000 [28:11:37<10:55:55, 13.94s/it] 72%|███████▏ | 7177/10000 [28:11:50<10:54:11, 13.90s/it] {'loss': 0.0071, 'learning_rate': 1.4160000000000002e-05, 'epoch': 9.39} 72%|███████▏ | 7177/10000 [28:11:50<10:54:11, 13.90s/it] 72%|███████▏ | 7178/10000 [28:12:04<10:52:47, 13.88s/it] {'loss': 0.0045, 'learning_rate': 1.4155000000000001e-05, 'epoch': 9.4} 72%|███████▏ | 7178/10000 [28:12:04<10:52:47, 13.88s/it] 72%|███████▏ | 7179/10000 [28:12:18<10:52:50, 13.89s/it] {'loss': 0.0047, 'learning_rate': 1.415e-05, 'epoch': 9.4} 72%|███████▏ | 7179/10000 [28:12:18<10:52:50, 13.89s/it] 72%|███████▏ | 7180/10000 [28:12:32<10:53:02, 13.89s/it] {'loss': 0.0057, 'learning_rate': 1.4145e-05, 'epoch': 9.4} 72%|███████▏ | 7180/10000 [28:12:32<10:53:02, 13.89s/it] 72%|███████▏ | 7181/10000 [28:12:46<10:53:01, 13.90s/it] {'loss': 0.0042, 'learning_rate': 1.414e-05, 'epoch': 9.4} 72%|███████▏ | 7181/10000 [28:12:46<10:53:01, 13.90s/it] 72%|███████▏ | 7182/10000 [28:13:00<10:52:32, 13.89s/it] {'loss': 0.0054, 'learning_rate': 1.4135000000000001e-05, 'epoch': 9.4} 72%|███████▏ | 7182/10000 [28:13:00<10:52:32, 13.89s/it] 72%|███████▏ | 7183/10000 [28:13:14<10:52:08, 13.89s/it] {'loss': 0.0034, 'learning_rate': 1.4130000000000002e-05, 'epoch': 9.4} 72%|███████▏ | 7183/10000 [28:13:14<10:52:08, 13.89s/it] 72%|███████▏ | 7184/10000 [28:13:28<10:51:52, 13.89s/it] {'loss': 0.0048, 'learning_rate': 1.4125e-05, 'epoch': 9.4} 72%|███████▏ | 7184/10000 [28:13:28<10:51:52, 13.89s/it] 72%|███████▏ | 7185/10000 [28:13:41<10:49:43, 13.85s/it] {'loss': 0.0052, 'learning_rate': 1.412e-05, 'epoch': 9.4} 72%|███████▏ | 7185/10000 [28:13:41<10:49:43, 13.85s/it] 72%|███████▏ | 7186/10000 [28:13:55<10:49:45, 13.85s/it] {'loss': 0.0044, 'learning_rate': 1.4115000000000001e-05, 'epoch': 9.41} 72%|███████▏ | 7186/10000 [28:13:55<10:49:45, 
13.85s/it] 72%|███████▏ | 7187/10000 [28:14:09<10:49:30, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.411e-05, 'epoch': 9.41} 72%|███████▏ | 7187/10000 [28:14:09<10:49:30, 13.85s/it] 72%|███████▏ | 7188/10000 [28:14:23<10:48:23, 13.83s/it] {'loss': 0.0322, 'learning_rate': 1.4105000000000001e-05, 'epoch': 9.41} 72%|███████▏ | 7188/10000 [28:14:23<10:48:23, 13.83s/it] 72%|███████▏ | 7189/10000 [28:14:37<10:46:19, 13.80s/it] {'loss': 0.0044, 'learning_rate': 1.4099999999999999e-05, 'epoch': 9.41} 72%|███████▏ | 7189/10000 [28:14:37<10:46:19, 13.80s/it] 72%|███████▏ | 7190/10000 [28:14:50<10:47:13, 13.82s/it] {'loss': 0.0053, 'learning_rate': 1.4095e-05, 'epoch': 9.41} 72%|███████▏ | 7190/10000 [28:14:50<10:47:13, 13.82s/it] 72%|███████▏ | 7191/10000 [28:15:04<10:47:02, 13.82s/it] {'loss': 0.0044, 'learning_rate': 1.409e-05, 'epoch': 9.41} 72%|███████▏ | 7191/10000 [28:15:04<10:47:02, 13.82s/it] 72%|███████▏ | 7192/10000 [28:15:18<10:46:25, 13.81s/it] {'loss': 0.0039, 'learning_rate': 1.4085000000000002e-05, 'epoch': 9.41} 72%|███████▏ | 7192/10000 [28:15:18<10:46:25, 13.81s/it] 72%|███████▏ | 7193/10000 [28:15:32<10:45:03, 13.79s/it] {'loss': 0.0047, 'learning_rate': 1.408e-05, 'epoch': 9.41} 72%|███████▏ | 7193/10000 [28:15:32<10:45:03, 13.79s/it] 72%|███████▏ | 7194/10000 [28:15:46<10:45:06, 13.79s/it] {'loss': 0.0052, 'learning_rate': 1.4075e-05, 'epoch': 9.42} 72%|███████▏ | 7194/10000 [28:15:46<10:45:06, 13.79s/it] 72%|███████▏ | 7195/10000 [28:15:59<10:46:01, 13.82s/it] {'loss': 0.0054, 'learning_rate': 1.4069999999999999e-05, 'epoch': 9.42} 72%|███████▏ | 7195/10000 [28:15:59<10:46:01, 13.82s/it] 72%|███████▏ | 7196/10000 [28:16:13<10:45:27, 13.81s/it] {'loss': 0.0057, 'learning_rate': 1.4065e-05, 'epoch': 9.42} 72%|███████▏ | 7196/10000 [28:16:13<10:45:27, 13.81s/it] 72%|███████▏ | 7197/10000 [28:16:27<10:43:50, 13.78s/it] {'loss': 0.0056, 'learning_rate': 1.4060000000000001e-05, 'epoch': 9.42} 72%|███████▏ | 7197/10000 [28:16:27<10:43:50, 13.78s/it] 
72%|███████▏ | 7198/10000 [28:16:41<10:44:49, 13.81s/it] {'loss': 0.0052, 'learning_rate': 1.4055000000000002e-05, 'epoch': 9.42} 72%|███████▏ | 7198/10000 [28:16:41<10:44:49, 13.81s/it] 72%|███████▏ | 7199/10000 [28:16:55<10:44:29, 13.81s/it] {'loss': 0.0056, 'learning_rate': 1.4050000000000003e-05, 'epoch': 9.42} 72%|███████▏ | 7199/10000 [28:16:55<10:44:29, 13.81s/it] 72%|███████▏ | 7200/10000 [28:17:08<10:43:49, 13.80s/it] {'loss': 0.0051, 'learning_rate': 1.4045e-05, 'epoch': 9.42} 72%|███████▏ | 7200/10000 [28:17:08<10:43:49, 13.80s/it] 72%|███████▏ | 7201/10000 [28:17:22<10:42:47, 13.78s/it] {'loss': 0.0062, 'learning_rate': 1.4040000000000001e-05, 'epoch': 9.43} 72%|███████▏ | 7201/10000 [28:17:22<10:42:47, 13.78s/it] 72%|███████▏ | 7202/10000 [28:17:36<10:43:36, 13.80s/it] {'loss': 0.0072, 'learning_rate': 1.4035e-05, 'epoch': 9.43} 72%|███████▏ | 7202/10000 [28:17:36<10:43:36, 13.80s/it] 72%|███████▏ | 7203/10000 [28:17:50<10:42:59, 13.79s/it] {'loss': 0.0039, 'learning_rate': 1.4030000000000001e-05, 'epoch': 9.43} 72%|███████▏ | 7203/10000 [28:17:50<10:42:59, 13.79s/it] 72%|███████▏ | 7204/10000 [28:18:04<10:43:04, 13.80s/it] {'loss': 0.0051, 'learning_rate': 1.4025000000000002e-05, 'epoch': 9.43} 72%|███████▏ | 7204/10000 [28:18:04<10:43:04, 13.80s/it] 72%|███████▏ | 7205/10000 [28:18:17<10:44:10, 13.83s/it] {'loss': 0.0042, 'learning_rate': 1.402e-05, 'epoch': 9.43} 72%|███████▏ | 7205/10000 [28:18:18<10:44:10, 13.83s/it] 72%|███████▏ | 7206/10000 [28:18:31<10:43:22, 13.82s/it] {'loss': 0.0067, 'learning_rate': 1.4015e-05, 'epoch': 9.43} 72%|███████▏ | 7206/10000 [28:18:31<10:43:22, 13.82s/it] 72%|███████▏ | 7207/10000 [28:18:45<10:42:24, 13.80s/it] {'loss': 0.007, 'learning_rate': 1.4010000000000001e-05, 'epoch': 9.43} 72%|███████▏ | 7207/10000 [28:18:45<10:42:24, 13.80s/it] 72%|███████▏ | 7208/10000 [28:18:59<10:43:48, 13.84s/it] {'loss': 0.005, 'learning_rate': 1.4005000000000002e-05, 'epoch': 9.43} 72%|███████▏ | 7208/10000 [28:18:59<10:43:48, 
13.84s/it] 72%|███████▏ | 7209/10000 [28:19:13<10:43:58, 13.84s/it] {'loss': 0.0062, 'learning_rate': 1.4000000000000001e-05, 'epoch': 9.44} 72%|███████▏ | 7209/10000 [28:19:13<10:43:58, 13.84s/it] 72%|███████▏ | 7210/10000 [28:19:27<10:42:08, 13.81s/it] {'loss': 0.0048, 'learning_rate': 1.3994999999999999e-05, 'epoch': 9.44} 72%|███████▏ | 7210/10000 [28:19:27<10:42:08, 13.81s/it] 72%|███████▏ | 7211/10000 [28:19:40<10:43:29, 13.84s/it] {'loss': 0.0057, 'learning_rate': 1.399e-05, 'epoch': 9.44} 72%|███████▏ | 7211/10000 [28:19:40<10:43:29, 13.84s/it] 72%|███████▏ | 7212/10000 [28:19:54<10:43:41, 13.85s/it] {'loss': 0.0052, 'learning_rate': 1.3985e-05, 'epoch': 9.44} 72%|███████▏ | 7212/10000 [28:19:54<10:43:41, 13.85s/it] 72%|███████▏ | 7213/10000 [28:20:08<10:44:26, 13.87s/it] {'loss': 0.0055, 'learning_rate': 1.3980000000000002e-05, 'epoch': 9.44} 72%|███████▏ | 7213/10000 [28:20:08<10:44:26, 13.87s/it] 72%|███████▏ | 7214/10000 [28:20:22<10:47:03, 13.94s/it] {'loss': 0.0062, 'learning_rate': 1.3975000000000003e-05, 'epoch': 9.44} 72%|███████▏ | 7214/10000 [28:20:22<10:47:03, 13.94s/it] 72%|███████▏ | 7215/10000 [28:20:36<10:45:40, 13.91s/it] {'loss': 0.0056, 'learning_rate': 1.397e-05, 'epoch': 9.44} 72%|███████▏ | 7215/10000 [28:20:36<10:45:40, 13.91s/it] 72%|███████▏ | 7216/10000 [28:20:50<10:46:44, 13.94s/it] {'loss': 0.0058, 'learning_rate': 1.3965000000000001e-05, 'epoch': 9.45} 72%|███████▏ | 7216/10000 [28:20:50<10:46:44, 13.94s/it] 72%|███████▏ | 7217/10000 [28:21:04<10:45:39, 13.92s/it] {'loss': 0.0052, 'learning_rate': 1.396e-05, 'epoch': 9.45} 72%|███████▏ | 7217/10000 [28:21:04<10:45:39, 13.92s/it] 72%|███████▏ | 7218/10000 [28:21:18<10:44:43, 13.90s/it] {'loss': 0.0054, 'learning_rate': 1.3955000000000001e-05, 'epoch': 9.45} 72%|███████▏ | 7218/10000 [28:21:18<10:44:43, 13.90s/it] 72%|███████▏ | 7219/10000 [28:21:32<10:42:37, 13.86s/it] {'loss': 0.0053, 'learning_rate': 1.3950000000000002e-05, 'epoch': 9.45} 72%|███████▏ | 7219/10000 
[28:21:32<10:42:37, 13.86s/it] 72%|███████▏ | 7220/10000 [28:21:45<10:40:36, 13.83s/it] {'loss': 0.0049, 'learning_rate': 1.3945e-05, 'epoch': 9.45} 72%|███████▏ | 7220/10000 [28:21:45<10:40:36, 13.83s/it] 72%|███████▏ | 7221/10000 [28:21:59<10:38:03, 13.78s/it] {'loss': 0.0064, 'learning_rate': 1.394e-05, 'epoch': 9.45} 72%|███████▏ | 7221/10000 [28:21:59<10:38:03, 13.78s/it] 72%|███████▏ | 7222/10000 [28:22:13<10:39:12, 13.81s/it] {'loss': 0.0053, 'learning_rate': 1.3935000000000001e-05, 'epoch': 9.45} 72%|███████▏ | 7222/10000 [28:22:13<10:39:12, 13.81s/it] 72%|███████▏ | 7223/10000 [28:22:27<10:37:59, 13.78s/it] {'loss': 0.0046, 'learning_rate': 1.3930000000000002e-05, 'epoch': 9.45} 72%|███████▏ | 7223/10000 [28:22:27<10:37:59, 13.78s/it] 72%|███████▏ | 7224/10000 [28:22:41<10:38:26, 13.80s/it] {'loss': 0.0049, 'learning_rate': 1.3925000000000001e-05, 'epoch': 9.46} 72%|███████▏ | 7224/10000 [28:22:41<10:38:26, 13.80s/it] 72%|███████▏ | 7225/10000 [28:22:54<10:38:52, 13.81s/it] {'loss': 0.005, 'learning_rate': 1.3919999999999999e-05, 'epoch': 9.46} 72%|███████▏ | 7225/10000 [28:22:54<10:38:52, 13.81s/it] 72%|███████▏ | 7226/10000 [28:23:08<10:39:54, 13.84s/it] {'loss': 0.0049, 'learning_rate': 1.3915e-05, 'epoch': 9.46} 72%|███████▏ | 7226/10000 [28:23:08<10:39:54, 13.84s/it] 72%|███████▏ | 7227/10000 [28:23:22<10:37:23, 13.79s/it] {'loss': 0.0053, 'learning_rate': 1.391e-05, 'epoch': 9.46} 72%|███████▏ | 7227/10000 [28:23:22<10:37:23, 13.79s/it] 72%|███████▏ | 7228/10000 [28:23:36<10:36:50, 13.78s/it] {'loss': 0.0076, 'learning_rate': 1.3905000000000002e-05, 'epoch': 9.46} 72%|███████▏ | 7228/10000 [28:23:36<10:36:50, 13.78s/it] 72%|███████▏ | 7229/10000 [28:23:49<10:35:19, 13.76s/it] {'loss': 0.0044, 'learning_rate': 1.3900000000000002e-05, 'epoch': 9.46} 72%|███████▏ | 7229/10000 [28:23:49<10:35:19, 13.76s/it] 72%|███████▏ | 7230/10000 [28:24:03<10:35:02, 13.76s/it] {'loss': 0.0045, 'learning_rate': 1.3895e-05, 'epoch': 9.46} 72%|███████▏ | 7230/10000 
[28:24:03<10:35:02, 13.76s/it] 72%|███████▏ | 7231/10000 [28:24:17<10:35:57, 13.78s/it] {'loss': 0.0047, 'learning_rate': 1.389e-05, 'epoch': 9.46} 72%|███████▏ | 7231/10000 [28:24:17<10:35:57, 13.78s/it] 72%|███████▏ | 7232/10000 [28:24:31<10:36:18, 13.79s/it] {'loss': 0.0046, 'learning_rate': 1.3885e-05, 'epoch': 9.47} 72%|███████▏ | 7232/10000 [28:24:31<10:36:18, 13.79s/it] 72%|███████▏ | 7233/10000 [28:24:45<10:37:30, 13.82s/it] {'loss': 0.0055, 'learning_rate': 1.3880000000000001e-05, 'epoch': 9.47} 72%|███████▏ | 7233/10000 [28:24:45<10:37:30, 13.82s/it] 72%|███████▏ | 7234/10000 [28:24:59<10:37:55, 13.84s/it] {'loss': 0.0056, 'learning_rate': 1.3875000000000002e-05, 'epoch': 9.47} 72%|███████▏ | 7234/10000 [28:24:59<10:37:55, 13.84s/it] 72%|███████▏ | 7235/10000 [28:25:12<10:35:54, 13.80s/it] {'loss': 0.012, 'learning_rate': 1.387e-05, 'epoch': 9.47} 72%|███████▏ | 7235/10000 [28:25:12<10:35:54, 13.80s/it] 72%|███████▏ | 7236/10000 [28:25:26<10:35:54, 13.80s/it] {'loss': 0.0045, 'learning_rate': 1.3865e-05, 'epoch': 9.47} 72%|███████▏ | 7236/10000 [28:25:26<10:35:54, 13.80s/it] 72%|███████▏ | 7237/10000 [28:25:40<10:34:41, 13.78s/it] {'loss': 0.0083, 'learning_rate': 1.3860000000000001e-05, 'epoch': 9.47} 72%|███████▏ | 7237/10000 [28:25:40<10:34:41, 13.78s/it] 72%|███████▏ | 7238/10000 [28:25:54<10:35:13, 13.80s/it] {'loss': 0.0044, 'learning_rate': 1.3855000000000002e-05, 'epoch': 9.47} 72%|███████▏ | 7238/10000 [28:25:54<10:35:13, 13.80s/it] 72%|███████▏ | 7239/10000 [28:26:07<10:33:46, 13.77s/it] {'loss': 0.0071, 'learning_rate': 1.3850000000000001e-05, 'epoch': 9.48} 72%|███████▏ | 7239/10000 [28:26:07<10:33:46, 13.77s/it] 72%|███████▏ | 7240/10000 [28:26:21<10:33:08, 13.76s/it] {'loss': 0.0069, 'learning_rate': 1.3845e-05, 'epoch': 9.48} 72%|███████▏ | 7240/10000 [28:26:21<10:33:08, 13.76s/it] 72%|███████▏ | 7241/10000 [28:26:35<10:33:47, 13.78s/it] {'loss': 0.006, 'learning_rate': 1.384e-05, 'epoch': 9.48} 72%|███████▏ | 7241/10000 [28:26:35<10:33:47, 
13.78s/it] 72%|███████▏ | 7242/10000 [28:26:49<10:34:05, 13.79s/it] {'loss': 0.0054, 'learning_rate': 1.3835e-05, 'epoch': 9.48} 72%|███████▏ | 7242/10000 [28:26:49<10:34:05, 13.79s/it] 72%|███████▏ | 7243/10000 [28:27:03<10:33:34, 13.79s/it] {'loss': 0.0047, 'learning_rate': 1.3830000000000001e-05, 'epoch': 9.48} 72%|███████▏ | 7243/10000 [28:27:03<10:33:34, 13.79s/it] 72%|███████▏ | 7244/10000 [28:27:16<10:34:13, 13.81s/it] {'loss': 0.0063, 'learning_rate': 1.3825000000000002e-05, 'epoch': 9.48} 72%|███████▏ | 7244/10000 [28:27:16<10:34:13, 13.81s/it] 72%|███████▏ | 7245/10000 [28:27:30<10:35:01, 13.83s/it] {'loss': 0.0073, 'learning_rate': 1.382e-05, 'epoch': 9.48} 72%|███████▏ | 7245/10000 [28:27:30<10:35:01, 13.83s/it] 72%|███████▏ | 7246/10000 [28:27:44<10:33:48, 13.81s/it] {'loss': 0.0054, 'learning_rate': 1.3815e-05, 'epoch': 9.48} 72%|███████▏ | 7246/10000 [28:27:44<10:33:48, 13.81s/it] 72%|███████▏ | 7247/10000 [28:27:58<10:33:02, 13.80s/it] {'loss': 0.0038, 'learning_rate': 1.381e-05, 'epoch': 9.49} 72%|███████▏ | 7247/10000 [28:27:58<10:33:02, 13.80s/it] 72%|███████▏ | 7248/10000 [28:28:12<10:32:30, 13.79s/it] {'loss': 0.0096, 'learning_rate': 1.3805e-05, 'epoch': 9.49} 72%|███████▏ | 7248/10000 [28:28:12<10:32:30, 13.79s/it] 72%|███████▏ | 7249/10000 [28:28:25<10:33:10, 13.81s/it] {'loss': 0.0035, 'learning_rate': 1.3800000000000002e-05, 'epoch': 9.49} 72%|███████▏ | 7249/10000 [28:28:26<10:33:10, 13.81s/it] 72%|███████▎ | 7250/10000 [28:28:39<10:32:28, 13.80s/it] {'loss': 0.0053, 'learning_rate': 1.3795e-05, 'epoch': 9.49} 72%|███████▎ | 7250/10000 [28:28:39<10:32:28, 13.80s/it] 73%|███████▎ | 7251/10000 [28:28:53<10:31:52, 13.79s/it] {'loss': 0.0051, 'learning_rate': 1.379e-05, 'epoch': 9.49} 73%|███████▎ | 7251/10000 [28:28:53<10:31:52, 13.79s/it] 73%|███████▎ | 7252/10000 [28:29:07<10:32:09, 13.80s/it] {'loss': 0.0054, 'learning_rate': 1.3785000000000001e-05, 'epoch': 9.49} 73%|███████▎ | 7252/10000 [28:29:07<10:32:09, 13.80s/it] 73%|███████▎ | 
7253/10000 [28:29:21<10:32:43, 13.82s/it] {'loss': 0.0053, 'learning_rate': 1.3780000000000002e-05, 'epoch': 9.49} 73%|███████▎ | 7253/10000 [28:29:21<10:32:43, 13.82s/it] 73%|███████▎ | 7254/10000 [28:29:35<10:33:54, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.3775000000000001e-05, 'epoch': 9.49} 73%|███████▎ | 7254/10000 [28:29:35<10:33:54, 13.85s/it] 73%|███████▎ | 7255/10000 [28:29:49<10:35:15, 13.89s/it] {'loss': 0.0056, 'learning_rate': 1.377e-05, 'epoch': 9.5} 73%|███████▎ | 7255/10000 [28:29:49<10:35:15, 13.89s/it] 73%|███████▎ | 7256/10000 [28:30:03<10:36:28, 13.92s/it] {'loss': 0.004, 'learning_rate': 1.3765e-05, 'epoch': 9.5} 73%|███████▎ | 7256/10000 [28:30:03<10:36:28, 13.92s/it] 73%|███████▎ | 7257/10000 [28:30:16<10:34:48, 13.89s/it] {'loss': 0.0069, 'learning_rate': 1.376e-05, 'epoch': 9.5} 73%|███████▎ | 7257/10000 [28:30:16<10:34:48, 13.89s/it] 73%|███████▎ | 7258/10000 [28:30:30<10:33:09, 13.85s/it] {'loss': 0.0064, 'learning_rate': 1.3755000000000001e-05, 'epoch': 9.5} 73%|███████▎ | 7258/10000 [28:30:30<10:33:09, 13.85s/it] 73%|███████▎ | 7259/10000 [28:30:44<10:34:02, 13.88s/it] {'loss': 0.004, 'learning_rate': 1.3750000000000002e-05, 'epoch': 9.5} 73%|███████▎ | 7259/10000 [28:30:44<10:34:02, 13.88s/it] 73%|███████▎ | 7260/10000 [28:30:58<10:31:30, 13.83s/it] {'loss': 0.0048, 'learning_rate': 1.3745e-05, 'epoch': 9.5} 73%|███████▎ | 7260/10000 [28:30:58<10:31:30, 13.83s/it] 73%|███████▎ | 7261/10000 [28:31:11<10:28:48, 13.77s/it] {'loss': 0.0053, 'learning_rate': 1.374e-05, 'epoch': 9.5} 73%|███████▎ | 7261/10000 [28:31:12<10:28:48, 13.77s/it] 73%|███████▎ | 7262/10000 [28:31:25<10:29:06, 13.79s/it] {'loss': 0.0044, 'learning_rate': 1.3735000000000001e-05, 'epoch': 9.51} 73%|███████▎ | 7262/10000 [28:31:25<10:29:06, 13.79s/it] 73%|███████▎ | 7263/10000 [28:31:39<10:28:37, 13.78s/it] {'loss': 0.0049, 'learning_rate': 1.373e-05, 'epoch': 9.51} 73%|███████▎ | 7263/10000 [28:31:39<10:28:37, 13.78s/it] 73%|███████▎ | 7264/10000 
[28:31:53<10:28:44, 13.79s/it] {'loss': 0.0052, 'learning_rate': 1.3725000000000002e-05, 'epoch': 9.51} 73%|███████▎ | 7264/10000 [28:31:53<10:28:44, 13.79s/it] 73%|███████▎ | 7265/10000 [28:32:07<10:28:24, 13.79s/it] {'loss': 0.0035, 'learning_rate': 1.3719999999999999e-05, 'epoch': 9.51} 73%|███████▎ | 7265/10000 [28:32:07<10:28:24, 13.79s/it] 73%|███████▎ | 7266/10000 [28:32:21<10:29:19, 13.81s/it] {'loss': 0.0051, 'learning_rate': 1.3715e-05, 'epoch': 9.51} 73%|███████▎ | 7266/10000 [28:32:21<10:29:19, 13.81s/it] 73%|███████▎ | 7267/10000 [28:32:34<10:28:04, 13.79s/it] {'loss': 0.0055, 'learning_rate': 1.3710000000000001e-05, 'epoch': 9.51} 73%|███████▎ | 7267/10000 [28:32:34<10:28:04, 13.79s/it] 73%|███████▎ | 7268/10000 [28:32:48<10:28:45, 13.81s/it] {'loss': 0.006, 'learning_rate': 1.3705000000000002e-05, 'epoch': 9.51} 73%|███████▎ | 7268/10000 [28:32:48<10:28:45, 13.81s/it] 73%|███████▎ | 7269/10000 [28:33:02<10:29:03, 13.82s/it] {'loss': 0.0045, 'learning_rate': 1.3700000000000001e-05, 'epoch': 9.51} 73%|███████▎ | 7269/10000 [28:33:02<10:29:03, 13.82s/it] 73%|███████▎ | 7270/10000 [28:33:16<10:28:10, 13.81s/it] {'loss': 0.0046, 'learning_rate': 1.3695e-05, 'epoch': 9.52} 73%|███████▎ | 7270/10000 [28:33:16<10:28:10, 13.81s/it] 73%|███████▎ | 7271/10000 [28:33:30<10:27:35, 13.80s/it] {'loss': 0.0047, 'learning_rate': 1.369e-05, 'epoch': 9.52} 73%|███████▎ | 7271/10000 [28:33:30<10:27:35, 13.80s/it] 73%|███████▎ | 7272/10000 [28:33:43<10:26:32, 13.78s/it] {'loss': 0.0045, 'learning_rate': 1.3685e-05, 'epoch': 9.52} 73%|███████▎ | 7272/10000 [28:33:43<10:26:32, 13.78s/it] 73%|███████▎ | 7273/10000 [28:33:57<10:27:46, 13.81s/it] {'loss': 0.0045, 'learning_rate': 1.3680000000000001e-05, 'epoch': 9.52} 73%|███████▎ | 7273/10000 [28:33:57<10:27:46, 13.81s/it] 73%|███████▎ | 7274/10000 [28:34:11<10:27:43, 13.82s/it] {'loss': 0.0056, 'learning_rate': 1.3675000000000002e-05, 'epoch': 9.52} 73%|███████▎ | 7274/10000 [28:34:11<10:27:43, 13.82s/it] 73%|███████▎ | 
7275/10000 [28:34:25<10:26:09, 13.79s/it] {'loss': 0.005, 'learning_rate': 1.367e-05, 'epoch': 9.52} 73%|███████▎ | 7275/10000 [28:34:25<10:26:09, 13.79s/it] 73%|███████▎ | 7276/10000 [28:34:39<10:26:56, 13.81s/it] {'loss': 0.0076, 'learning_rate': 1.3665e-05, 'epoch': 9.52} 73%|███████▎ | 7276/10000 [28:34:39<10:26:56, 13.81s/it] 73%|███████▎ | 7277/10000 [28:34:52<10:26:32, 13.81s/it] {'loss': 0.0077, 'learning_rate': 1.3660000000000001e-05, 'epoch': 9.52} 73%|███████▎ | 7277/10000 [28:34:52<10:26:32, 13.81s/it] 73%|███████▎ | 7278/10000 [28:35:06<10:27:39, 13.84s/it] {'loss': 0.0062, 'learning_rate': 1.3655e-05, 'epoch': 9.53} 73%|███████▎ | 7278/10000 [28:35:06<10:27:39, 13.84s/it] 73%|███████▎ | 7279/10000 [28:35:20<10:29:26, 13.88s/it] {'loss': 0.0054, 'learning_rate': 1.3650000000000001e-05, 'epoch': 9.53} 73%|███████▎ | 7279/10000 [28:35:20<10:29:26, 13.88s/it] 73%|███████▎ | 7280/10000 [28:35:34<10:28:04, 13.85s/it] {'loss': 0.009, 'learning_rate': 1.3644999999999999e-05, 'epoch': 9.53} 73%|███████▎ | 7280/10000 [28:35:34<10:28:04, 13.85s/it] 73%|███████▎ | 7281/10000 [28:35:48<10:25:51, 13.81s/it] {'loss': 0.0042, 'learning_rate': 1.364e-05, 'epoch': 9.53} 73%|███████▎ | 7281/10000 [28:35:48<10:25:51, 13.81s/it] 73%|███████▎ | 7282/10000 [28:36:02<10:24:54, 13.80s/it] {'loss': 0.0039, 'learning_rate': 1.3635e-05, 'epoch': 9.53} 73%|███████▎ | 7282/10000 [28:36:02<10:24:54, 13.80s/it] 73%|███████▎ | 7283/10000 [28:36:15<10:26:53, 13.84s/it] {'loss': 0.0063, 'learning_rate': 1.3630000000000002e-05, 'epoch': 9.53} 73%|███████▎ | 7283/10000 [28:36:15<10:26:53, 13.84s/it] 73%|███████▎ | 7284/10000 [28:36:29<10:25:57, 13.83s/it] {'loss': 0.0048, 'learning_rate': 1.3625e-05, 'epoch': 9.53} 73%|███████▎ | 7284/10000 [28:36:29<10:25:57, 13.83s/it] 73%|███████▎ | 7285/10000 [28:36:43<10:24:25, 13.80s/it] {'loss': 0.0064, 'learning_rate': 1.362e-05, 'epoch': 9.54} 73%|███████▎ | 7285/10000 [28:36:43<10:24:25, 13.80s/it] 73%|███████▎ | 7286/10000 [28:36:57<10:25:13, 
13.82s/it] {'loss': 0.0072, 'learning_rate': 1.3615e-05, 'epoch': 9.54} 73%|███████▎ | 7286/10000 [28:36:57<10:25:13, 13.82s/it] 73%|███████▎ | 7287/10000 [28:37:11<10:25:11, 13.83s/it] {'loss': 0.0046, 'learning_rate': 1.361e-05, 'epoch': 9.54} 73%|███████▎ | 7287/10000 [28:37:11<10:25:11, 13.83s/it] 73%|███████▎ | 7288/10000 [28:37:24<10:24:09, 13.81s/it] {'loss': 0.0069, 'learning_rate': 1.3605000000000001e-05, 'epoch': 9.54} 73%|███████▎ | 7288/10000 [28:37:24<10:24:09, 13.81s/it] 73%|███████▎ | 7289/10000 [28:37:38<10:24:20, 13.82s/it] {'loss': 0.0056, 'learning_rate': 1.3600000000000002e-05, 'epoch': 9.54} 73%|███████▎ | 7289/10000 [28:37:38<10:24:20, 13.82s/it] 73%|███████▎ | 7290/10000 [28:37:52<10:25:04, 13.84s/it] {'loss': 0.0049, 'learning_rate': 1.3595e-05, 'epoch': 9.54} 73%|███████▎ | 7290/10000 [28:37:52<10:25:04, 13.84s/it] 73%|███████▎ | 7291/10000 [28:38:06<10:24:31, 13.83s/it] {'loss': 0.0041, 'learning_rate': 1.359e-05, 'epoch': 9.54} 73%|███████▎ | 7291/10000 [28:38:06<10:24:31, 13.83s/it] 73%|███████▎ | 7292/10000 [28:38:20<10:26:59, 13.89s/it] {'loss': 0.0071, 'learning_rate': 1.3585000000000001e-05, 'epoch': 9.54} 73%|███████▎ | 7292/10000 [28:38:20<10:26:59, 13.89s/it] 73%|███████▎ | 7293/10000 [28:38:34<10:24:39, 13.85s/it] {'loss': 0.0046, 'learning_rate': 1.358e-05, 'epoch': 9.55} 73%|███████▎ | 7293/10000 [28:38:34<10:24:39, 13.85s/it] 73%|███████▎ | 7294/10000 [28:38:48<10:25:46, 13.88s/it] {'loss': 0.0049, 'learning_rate': 1.3575000000000001e-05, 'epoch': 9.55} 73%|███████▎ | 7294/10000 [28:38:48<10:25:46, 13.88s/it] 73%|███████▎ | 7295/10000 [28:39:01<10:23:12, 13.82s/it] {'loss': 0.0067, 'learning_rate': 1.3569999999999999e-05, 'epoch': 9.55} 73%|███████▎ | 7295/10000 [28:39:01<10:23:12, 13.82s/it] 73%|███████▎ | 7296/10000 [28:39:15<10:22:01, 13.80s/it] {'loss': 0.0071, 'learning_rate': 1.3565e-05, 'epoch': 9.55} 73%|███████▎ | 7296/10000 [28:39:15<10:22:01, 13.80s/it] 73%|███████▎ | 7297/10000 [28:39:29<10:22:04, 13.81s/it] 
{'loss': 0.0065, 'learning_rate': 1.356e-05, 'epoch': 9.55} 73%|███████▎ | 7297/10000 [28:39:29<10:22:04, 13.81s/it] 73%|███████▎ | 7298/10000 [28:39:43<10:21:18, 13.80s/it] {'loss': 0.0068, 'learning_rate': 1.3555000000000002e-05, 'epoch': 9.55} 73%|███████▎ | 7298/10000 [28:39:43<10:21:18, 13.80s/it] 73%|███████▎ | 7299/10000 [28:39:57<10:21:30, 13.81s/it] {'loss': 0.0045, 'learning_rate': 1.3550000000000002e-05, 'epoch': 9.55} 73%|███████▎ | 7299/10000 [28:39:57<10:21:30, 13.81s/it] 73%|███████▎ | 7300/10000 [28:40:10<10:21:13, 13.80s/it] {'loss': 0.0074, 'learning_rate': 1.3545e-05, 'epoch': 9.55} 73%|███████▎ | 7300/10000 [28:40:10<10:21:13, 13.80s/it] 73%|███████▎ | 7301/10000 [28:40:24<10:19:16, 13.77s/it] {'loss': 0.0055, 'learning_rate': 1.3539999999999999e-05, 'epoch': 9.56} 73%|███████▎ | 7301/10000 [28:40:24<10:19:16, 13.77s/it] 73%|███████▎ | 7302/10000 [28:40:38<10:21:12, 13.81s/it] {'loss': 0.0054, 'learning_rate': 1.3535e-05, 'epoch': 9.56} 73%|███████▎ | 7302/10000 [28:40:38<10:21:12, 13.81s/it] 73%|███████▎ | 7303/10000 [28:40:52<10:20:53, 13.81s/it] {'loss': 0.0095, 'learning_rate': 1.3530000000000001e-05, 'epoch': 9.56} 73%|███████▎ | 7303/10000 [28:40:52<10:20:53, 13.81s/it] 73%|███████▎ | 7304/10000 [28:41:06<10:20:51, 13.82s/it] {'loss': 0.006, 'learning_rate': 1.3525000000000002e-05, 'epoch': 9.56} 73%|███████▎ | 7304/10000 [28:41:06<10:20:51, 13.82s/it] 73%|███████▎ | 7305/10000 [28:41:20<10:21:23, 13.83s/it] {'loss': 0.005, 'learning_rate': 1.352e-05, 'epoch': 9.56} 73%|███████▎ | 7305/10000 [28:41:20<10:21:23, 13.83s/it] 73%|███████▎ | 7306/10000 [28:41:33<10:21:55, 13.85s/it] {'loss': 0.0073, 'learning_rate': 1.3515e-05, 'epoch': 9.56} 73%|███████▎ | 7306/10000 [28:41:33<10:21:55, 13.85s/it] 73%|███████▎ | 7307/10000 [28:41:47<10:21:35, 13.85s/it] {'loss': 0.0069, 'learning_rate': 1.3510000000000001e-05, 'epoch': 9.56} 73%|███████▎ | 7307/10000 [28:41:47<10:21:35, 13.85s/it] 73%|███████▎ | 7308/10000 [28:42:01<10:21:45, 13.86s/it] 
{'loss': 0.0063, 'learning_rate': 1.3505e-05, 'epoch': 9.57} 73%|███████▎ | 7308/10000 [28:42:01<10:21:45, 13.86s/it] 73%|███████▎ | 7309/10000 [28:42:15<10:21:20, 13.85s/it] {'loss': 0.0057, 'learning_rate': 1.3500000000000001e-05, 'epoch': 9.57} 73%|███████▎ | 7309/10000 [28:42:15<10:21:20, 13.85s/it] 73%|███████▎ | 7310/10000 [28:42:29<10:21:44, 13.87s/it] {'loss': 0.0057, 'learning_rate': 1.3494999999999999e-05, 'epoch': 9.57} 73%|███████▎ | 7310/10000 [28:42:29<10:21:44, 13.87s/it] 73%|███████▎ | 7311/10000 [28:42:43<10:21:33, 13.87s/it] {'loss': 0.0037, 'learning_rate': 1.349e-05, 'epoch': 9.57} 73%|███████▎ | 7311/10000 [28:42:43<10:21:33, 13.87s/it] 73%|███████▎ | 7312/10000 [28:42:57<10:21:38, 13.88s/it] {'loss': 0.0047, 'learning_rate': 1.3485e-05, 'epoch': 9.57} 73%|███████▎ | 7312/10000 [28:42:57<10:21:38, 13.88s/it] 73%|███████▎ | 7313/10000 [28:43:10<10:20:02, 13.85s/it] {'loss': 0.0052, 'learning_rate': 1.3480000000000001e-05, 'epoch': 9.57} 73%|███████▎ | 7313/10000 [28:43:10<10:20:02, 13.85s/it] 73%|███████▎ | 7314/10000 [28:43:24<10:20:17, 13.86s/it] {'loss': 0.0045, 'learning_rate': 1.3475000000000002e-05, 'epoch': 9.57} 73%|███████▎ | 7314/10000 [28:43:24<10:20:17, 13.86s/it] 73%|███████▎ | 7315/10000 [28:43:38<10:19:54, 13.85s/it] {'loss': 0.0066, 'learning_rate': 1.347e-05, 'epoch': 9.57} 73%|███████▎ | 7315/10000 [28:43:38<10:19:54, 13.85s/it] 73%|███████▎ | 7316/10000 [28:43:52<10:20:05, 13.86s/it] {'loss': 0.0054, 'learning_rate': 1.3465e-05, 'epoch': 9.58} 73%|███████▎ | 7316/10000 [28:43:52<10:20:05, 13.86s/it] 73%|███████▎ | 7317/10000 [28:44:06<10:18:00, 13.82s/it] {'loss': 0.0063, 'learning_rate': 1.346e-05, 'epoch': 9.58} 73%|███████▎ | 7317/10000 [28:44:06<10:18:00, 13.82s/it] 73%|███████▎ | 7318/10000 [28:44:20<10:18:04, 13.83s/it] {'loss': 0.007, 'learning_rate': 1.3455e-05, 'epoch': 9.58} 73%|███████▎ | 7318/10000 [28:44:20<10:18:04, 13.83s/it] 73%|███████▎ | 7319/10000 [28:44:33<10:18:05, 13.83s/it] {'loss': 0.0062, 
'learning_rate': 1.3450000000000002e-05, 'epoch': 9.58} 73%|███████▎ | 7319/10000 [28:44:33<10:18:05, 13.83s/it] 73%|███████▎ | 7320/10000 [28:44:47<10:19:51, 13.88s/it] {'loss': 0.0063, 'learning_rate': 1.3445e-05, 'epoch': 9.58} 73%|███████▎ | 7320/10000 [28:44:47<10:19:51, 13.88s/it] 73%|███████▎ | 7321/10000 [28:45:01<10:21:21, 13.92s/it] {'loss': 0.0055, 'learning_rate': 1.344e-05, 'epoch': 9.58} 73%|███████▎ | 7321/10000 [28:45:01<10:21:21, 13.92s/it] 73%|███████▎ | 7322/10000 [28:45:15<10:18:37, 13.86s/it] {'loss': 0.0063, 'learning_rate': 1.3435000000000001e-05, 'epoch': 9.58} 73%|███████▎ | 7322/10000 [28:45:15<10:18:37, 13.86s/it] 73%|███████▎ | 7323/10000 [28:45:29<10:17:57, 13.85s/it] {'loss': 0.0052, 'learning_rate': 1.343e-05, 'epoch': 9.59} 73%|███████▎ | 7323/10000 [28:45:29<10:17:57, 13.85s/it] 73%|███████▎ | 7324/10000 [28:45:43<10:19:57, 13.90s/it] {'loss': 0.0041, 'learning_rate': 1.3425000000000001e-05, 'epoch': 9.59} 73%|███████▎ | 7324/10000 [28:45:43<10:19:57, 13.90s/it] 73%|███████▎ | 7325/10000 [28:45:57<10:20:16, 13.91s/it] {'loss': 0.0039, 'learning_rate': 1.3420000000000002e-05, 'epoch': 9.59} 73%|███████▎ | 7325/10000 [28:45:57<10:20:16, 13.91s/it] 73%|███████▎ | 7326/10000 [28:46:11<10:18:44, 13.88s/it] {'loss': 0.0049, 'learning_rate': 1.3415e-05, 'epoch': 9.59} 73%|███████▎ | 7326/10000 [28:46:11<10:18:44, 13.88s/it] 73%|███████▎ | 7327/10000 [28:46:25<10:19:32, 13.91s/it] {'loss': 0.0056, 'learning_rate': 1.341e-05, 'epoch': 9.59} 73%|███████▎ | 7327/10000 [28:46:25<10:19:32, 13.91s/it] 73%|███████▎ | 7328/10000 [28:46:39<10:19:35, 13.91s/it] {'loss': 0.0044, 'learning_rate': 1.3405000000000001e-05, 'epoch': 9.59} 73%|███████▎ | 7328/10000 [28:46:39<10:19:35, 13.91s/it] 73%|███████▎ | 7329/10000 [28:46:53<10:18:37, 13.90s/it] {'loss': 0.0059, 'learning_rate': 1.3400000000000002e-05, 'epoch': 9.59} 73%|███████▎ | 7329/10000 [28:46:53<10:18:37, 13.90s/it] 73%|███████▎ | 7330/10000 [28:47:06<10:17:47, 13.88s/it] {'loss': 0.0065, 
'learning_rate': 1.3395000000000001e-05, 'epoch': 9.59} 73%|███████▎ | 7330/10000 [28:47:06<10:17:47, 13.88s/it] 73%|███████▎ | 7331/10000 [28:47:20<10:18:49, 13.91s/it] {'loss': 0.0047, 'learning_rate': 1.339e-05, 'epoch': 9.6} 73%|███████▎ | 7331/10000 [28:47:20<10:18:49, 13.91s/it] 73%|███████▎ | 7332/10000 [28:47:34<10:18:34, 13.91s/it] {'loss': 0.0041, 'learning_rate': 1.3385e-05, 'epoch': 9.6} 73%|███████▎ | 7332/10000 [28:47:34<10:18:34, 13.91s/it] 73%|███████▎ | 7333/10000 [28:47:48<10:15:43, 13.85s/it] {'loss': 0.0075, 'learning_rate': 1.338e-05, 'epoch': 9.6} 73%|███████▎ | 7333/10000 [28:47:48<10:15:43, 13.85s/it] 73%|███████▎ | 7334/10000 [28:48:02<10:13:02, 13.80s/it] {'loss': 0.0083, 'learning_rate': 1.3375000000000002e-05, 'epoch': 9.6} 73%|███████▎ | 7334/10000 [28:48:02<10:13:02, 13.80s/it] 73%|███████▎ | 7335/10000 [28:48:16<10:14:36, 13.84s/it] {'loss': 0.0058, 'learning_rate': 1.3370000000000002e-05, 'epoch': 9.6} 73%|███████▎ | 7335/10000 [28:48:16<10:14:36, 13.84s/it] 73%|███████▎ | 7336/10000 [28:48:29<10:13:20, 13.81s/it] {'loss': 0.0061, 'learning_rate': 1.3365e-05, 'epoch': 9.6} 73%|███████▎ | 7336/10000 [28:48:29<10:13:20, 13.81s/it] 73%|███████▎ | 7337/10000 [28:48:43<10:14:53, 13.85s/it] {'loss': 0.0041, 'learning_rate': 1.336e-05, 'epoch': 9.6} 73%|███████▎ | 7337/10000 [28:48:43<10:14:53, 13.85s/it] 73%|███████▎ | 7338/10000 [28:48:57<10:13:38, 13.83s/it] {'loss': 0.005, 'learning_rate': 1.3355e-05, 'epoch': 9.6} 73%|███████▎ | 7338/10000 [28:48:57<10:13:38, 13.83s/it] 73%|███████▎ | 7339/10000 [28:49:11<10:14:44, 13.86s/it] {'loss': 0.0059, 'learning_rate': 1.3350000000000001e-05, 'epoch': 9.61} 73%|███████▎ | 7339/10000 [28:49:11<10:14:44, 13.86s/it] 73%|███████▎ | 7340/10000 [28:49:25<10:13:19, 13.83s/it] {'loss': 0.0047, 'learning_rate': 1.3345000000000002e-05, 'epoch': 9.61} 73%|███████▎ | 7340/10000 [28:49:25<10:13:19, 13.83s/it] 73%|███████▎ | 7341/10000 [28:49:39<10:14:01, 13.86s/it] {'loss': 0.0063, 'learning_rate': 
1.334e-05, 'epoch': 9.61} 73%|███████▎ | 7341/10000 [28:49:39<10:14:01, 13.86s/it] 73%|███████▎ | 7342/10000 [28:49:53<10:14:45, 13.88s/it] {'loss': 0.0036, 'learning_rate': 1.3335e-05, 'epoch': 9.61} 73%|███████▎ | 7342/10000 [28:49:53<10:14:45, 13.88s/it] 73%|███████▎ | 7343/10000 [28:50:06<10:11:38, 13.81s/it] {'loss': 0.0056, 'learning_rate': 1.3330000000000001e-05, 'epoch': 9.61} 73%|███████▎ | 7343/10000 [28:50:06<10:11:38, 13.81s/it] 73%|███████▎ | 7344/10000 [28:50:20<10:12:21, 13.83s/it] {'loss': 0.0068, 'learning_rate': 1.3325000000000002e-05, 'epoch': 9.61} 73%|███████▎ | 7344/10000 [28:50:20<10:12:21, 13.83s/it] 73%|███████▎ | 7345/10000 [28:50:34<10:13:55, 13.87s/it] {'loss': 0.006, 'learning_rate': 1.3320000000000001e-05, 'epoch': 9.61} 73%|███████▎ | 7345/10000 [28:50:34<10:13:55, 13.87s/it] 73%|███████▎ | 7346/10000 [28:50:48<10:13:41, 13.87s/it] {'loss': 0.0068, 'learning_rate': 1.3315e-05, 'epoch': 9.62} 73%|███████▎ | 7346/10000 [28:50:48<10:13:41, 13.87s/it] 73%|███████▎ | 7347/10000 [28:51:02<10:11:03, 13.82s/it] {'loss': 0.0036, 'learning_rate': 1.331e-05, 'epoch': 9.62} 73%|███████▎ | 7347/10000 [28:51:02<10:11:03, 13.82s/it] 73%|███████▎ | 7348/10000 [28:51:15<10:11:10, 13.83s/it] {'loss': 0.0063, 'learning_rate': 1.3305e-05, 'epoch': 9.62} 73%|███████▎ | 7348/10000 [28:51:16<10:11:10, 13.83s/it] 73%|███████▎ | 7349/10000 [28:51:29<10:10:53, 13.83s/it] {'loss': 0.007, 'learning_rate': 1.3300000000000001e-05, 'epoch': 9.62} 73%|███████▎ | 7349/10000 [28:51:29<10:10:53, 13.83s/it] 74%|███████▎ | 7350/10000 [28:51:43<10:11:22, 13.84s/it] {'loss': 0.0042, 'learning_rate': 1.3295000000000002e-05, 'epoch': 9.62} 74%|███████▎ | 7350/10000 [28:51:43<10:11:22, 13.84s/it] 74%|███████▎ | 7351/10000 [28:51:57<10:11:35, 13.85s/it] {'loss': 0.0062, 'learning_rate': 1.329e-05, 'epoch': 9.62} 74%|███████▎ | 7351/10000 [28:51:57<10:11:35, 13.85s/it] 74%|███████▎ | 7352/10000 [28:52:11<10:10:01, 13.82s/it] {'loss': 0.0059, 'learning_rate': 1.3285e-05, 
'epoch': 9.62} 74%|███████▎ | 7352/10000 [28:52:11<10:10:01, 13.82s/it] 74%|███████▎ | 7353/10000 [28:52:25<10:11:21, 13.86s/it] {'loss': 0.0061, 'learning_rate': 1.3280000000000002e-05, 'epoch': 9.62} 74%|███████▎ | 7353/10000 [28:52:25<10:11:21, 13.86s/it] 74%|███████▎ | 7354/10000 [28:52:39<10:11:44, 13.87s/it] {'loss': 0.0045, 'learning_rate': 1.3275e-05, 'epoch': 9.63} 74%|███████▎ | 7354/10000 [28:52:39<10:11:44, 13.87s/it] 74%|███████▎ | 7355/10000 [28:52:53<10:11:08, 13.86s/it] {'loss': 0.0057, 'learning_rate': 1.3270000000000002e-05, 'epoch': 9.63} 74%|███████▎ | 7355/10000 [28:52:53<10:11:08, 13.86s/it] 74%|███████▎ | 7356/10000 [28:53:06<10:09:03, 13.82s/it] {'loss': 0.005, 'learning_rate': 1.3265e-05, 'epoch': 9.63} 74%|███████▎ | 7356/10000 [28:53:06<10:09:03, 13.82s/it] 74%|███████▎ | 7357/10000 [28:53:20<10:08:42, 13.82s/it] {'loss': 0.0058, 'learning_rate': 1.326e-05, 'epoch': 9.63} 74%|███████▎ | 7357/10000 [28:53:20<10:08:42, 13.82s/it] 74%|███████▎ | 7358/10000 [28:53:34<10:07:40, 13.80s/it] {'loss': 0.0044, 'learning_rate': 1.3255000000000001e-05, 'epoch': 9.63} 74%|███████▎ | 7358/10000 [28:53:34<10:07:40, 13.80s/it] 74%|███████▎ | 7359/10000 [28:53:48<10:07:23, 13.80s/it] {'loss': 0.0041, 'learning_rate': 1.3250000000000002e-05, 'epoch': 9.63} 74%|███████▎ | 7359/10000 [28:53:48<10:07:23, 13.80s/it] 74%|███████▎ | 7360/10000 [28:54:02<10:09:33, 13.85s/it] {'loss': 0.0075, 'learning_rate': 1.3245000000000001e-05, 'epoch': 9.63} 74%|███████▎ | 7360/10000 [28:54:02<10:09:33, 13.85s/it] 74%|███████▎ | 7361/10000 [28:54:15<10:07:58, 13.82s/it] {'loss': 0.0042, 'learning_rate': 1.324e-05, 'epoch': 9.63} 74%|███████▎ | 7361/10000 [28:54:15<10:07:58, 13.82s/it] 74%|███████▎ | 7362/10000 [28:54:29<10:06:41, 13.80s/it] {'loss': 0.0054, 'learning_rate': 1.3235e-05, 'epoch': 9.64} 74%|███████▎ | 7362/10000 [28:54:29<10:06:41, 13.80s/it] 74%|███████▎ | 7363/10000 [28:54:43<10:06:37, 13.80s/it] {'loss': 0.0046, 'learning_rate': 1.323e-05, 'epoch': 9.64} 
74%|███████▎ | 7363/10000 [28:54:43<10:06:37, 13.80s/it] 74%|███████▎ | 7364/10000 [28:54:57<10:06:24, 13.80s/it] {'loss': 0.0061, 'learning_rate': 1.3225000000000001e-05, 'epoch': 9.64} 74%|███████▎ | 7364/10000 [28:54:57<10:06:24, 13.80s/it] 74%|███████▎ | 7365/10000 [28:55:10<10:05:49, 13.79s/it] {'loss': 0.007, 'learning_rate': 1.3220000000000002e-05, 'epoch': 9.64} 74%|███████▎ | 7365/10000 [28:55:10<10:05:49, 13.79s/it] 74%|███████▎ | 7366/10000 [28:55:24<10:06:47, 13.82s/it] {'loss': 0.0054, 'learning_rate': 1.3215e-05, 'epoch': 9.64} 74%|███████▎ | 7366/10000 [28:55:24<10:06:47, 13.82s/it] 74%|███████▎ | 7367/10000 [28:55:38<10:09:08, 13.88s/it] {'loss': 0.0073, 'learning_rate': 1.321e-05, 'epoch': 9.64} 74%|███████▎ | 7367/10000 [28:55:38<10:09:08, 13.88s/it] 74%|███████▎ | 7368/10000 [28:55:52<10:07:50, 13.86s/it] {'loss': 0.0051, 'learning_rate': 1.3205000000000001e-05, 'epoch': 9.64} 74%|███████▎ | 7368/10000 [28:55:52<10:07:50, 13.86s/it] 74%|███████▎ | 7369/10000 [28:56:06<10:06:18, 13.83s/it] {'loss': 0.0038, 'learning_rate': 1.32e-05, 'epoch': 9.65} 74%|███████▎ | 7369/10000 [28:56:06<10:06:18, 13.83s/it] 74%|███████▎ | 7370/10000 [28:56:20<10:04:06, 13.78s/it] {'loss': 0.0063, 'learning_rate': 1.3195000000000002e-05, 'epoch': 9.65} 74%|███████▎ | 7370/10000 [28:56:20<10:04:06, 13.78s/it] 74%|███████▎ | 7371/10000 [28:56:34<10:05:34, 13.82s/it] {'loss': 0.0047, 'learning_rate': 1.3189999999999999e-05, 'epoch': 9.65} 74%|███████▎ | 7371/10000 [28:56:34<10:05:34, 13.82s/it] 74%|███████▎ | 7372/10000 [28:56:47<10:04:35, 13.80s/it] {'loss': 0.0032, 'learning_rate': 1.3185e-05, 'epoch': 9.65} 74%|███████▎ | 7372/10000 [28:56:47<10:04:35, 13.80s/it] 74%|███████▎ | 7373/10000 [28:57:01<10:06:15, 13.85s/it] {'loss': 0.0046, 'learning_rate': 1.3180000000000001e-05, 'epoch': 9.65} 74%|███████▎ | 7373/10000 [28:57:01<10:06:15, 13.85s/it] 74%|███████▎ | 7374/10000 [28:57:15<10:06:24, 13.86s/it] {'loss': 0.0104, 'learning_rate': 1.3175000000000002e-05, 'epoch': 
9.65} 74%|███████▎ | 7374/10000 [28:57:15<10:06:24, 13.86s/it] 74%|███████▍ | 7375/10000 [28:57:29<10:05:51, 13.85s/it] {'loss': 0.0053, 'learning_rate': 1.3170000000000001e-05, 'epoch': 9.65} 74%|███████▍ | 7375/10000 [28:57:29<10:05:51, 13.85s/it] 74%|███████▍ | 7376/10000 [28:57:43<10:06:25, 13.87s/it] {'loss': 0.0047, 'learning_rate': 1.3165e-05, 'epoch': 9.65} 74%|███████▍ | 7376/10000 [28:57:43<10:06:25, 13.87s/it] 74%|███████▍ | 7377/10000 [28:57:57<10:06:22, 13.87s/it] {'loss': 0.0049, 'learning_rate': 1.316e-05, 'epoch': 9.66} 74%|███████▍ | 7377/10000 [28:57:57<10:06:22, 13.87s/it] 74%|███████▍ | 7378/10000 [28:58:11<10:05:24, 13.85s/it] {'loss': 0.0073, 'learning_rate': 1.3155e-05, 'epoch': 9.66} 74%|███████▍ | 7378/10000 [28:58:11<10:05:24, 13.85s/it] 74%|███████▍ | 7379/10000 [28:58:24<10:05:25, 13.86s/it] {'loss': 0.0047, 'learning_rate': 1.3150000000000001e-05, 'epoch': 9.66} 74%|███████▍ | 7379/10000 [28:58:24<10:05:25, 13.86s/it] 74%|███████▍ | 7380/10000 [28:58:38<10:06:26, 13.89s/it] {'loss': 0.0057, 'learning_rate': 1.3145000000000002e-05, 'epoch': 9.66} 74%|███████▍ | 7380/10000 [28:58:38<10:06:26, 13.89s/it] 74%|███████▍ | 7381/10000 [28:58:52<10:04:57, 13.86s/it] {'loss': 0.0048, 'learning_rate': 1.314e-05, 'epoch': 9.66} 74%|███████▍ | 7381/10000 [28:58:52<10:04:57, 13.86s/it] 74%|███████▍ | 7382/10000 [28:59:06<10:05:21, 13.87s/it] {'loss': 0.004, 'learning_rate': 1.3135e-05, 'epoch': 9.66} 74%|███████▍ | 7382/10000 [28:59:06<10:05:21, 13.87s/it] 74%|███████▍ | 7383/10000 [28:59:20<10:05:11, 13.88s/it] {'loss': 0.0062, 'learning_rate': 1.3130000000000001e-05, 'epoch': 9.66} 74%|███████▍ | 7383/10000 [28:59:20<10:05:11, 13.88s/it] 74%|███████▍ | 7384/10000 [28:59:34<10:06:03, 13.90s/it] {'loss': 0.0058, 'learning_rate': 1.3125e-05, 'epoch': 9.66} 74%|███████▍ | 7384/10000 [28:59:34<10:06:03, 13.90s/it] 74%|███████▍ | 7385/10000 [28:59:48<10:04:57, 13.88s/it] {'loss': 0.0073, 'learning_rate': 1.3120000000000001e-05, 'epoch': 9.67} 
74%|███████▍ | 7385/10000 [28:59:48<10:04:57, 13.88s/it] 74%|███████▍ | 7386/10000 [29:00:02<10:04:26, 13.87s/it] {'loss': 0.006, 'learning_rate': 1.3114999999999999e-05, 'epoch': 9.67} 74%|███████▍ | 7386/10000 [29:00:02<10:04:26, 13.87s/it] 74%|███████▍ | 7387/10000 [29:00:16<10:05:04, 13.89s/it] {'loss': 0.0062, 'learning_rate': 1.311e-05, 'epoch': 9.67} 74%|███████▍ | 7387/10000 [29:00:16<10:05:04, 13.89s/it] 74%|███████▍ | 7388/10000 [29:00:29<10:04:28, 13.89s/it] {'loss': 0.007, 'learning_rate': 1.3105e-05, 'epoch': 9.67} 74%|███████▍ | 7388/10000 [29:00:29<10:04:28, 13.89s/it] 74%|███████▍ | 7389/10000 [29:00:43<10:03:43, 13.87s/it] {'loss': 0.0042, 'learning_rate': 1.3100000000000002e-05, 'epoch': 9.67} 74%|███████▍ | 7389/10000 [29:00:43<10:03:43, 13.87s/it] 74%|███████▍ | 7390/10000 [29:00:57<10:04:22, 13.89s/it] {'loss': 0.004, 'learning_rate': 1.3095000000000003e-05, 'epoch': 9.67} 74%|███████▍ | 7390/10000 [29:00:57<10:04:22, 13.89s/it] 74%|███████▍ | 7391/10000 [29:01:11<10:01:56, 13.84s/it] {'loss': 0.0068, 'learning_rate': 1.309e-05, 'epoch': 9.67} 74%|███████▍ | 7391/10000 [29:01:11<10:01:56, 13.84s/it] 74%|███████▍ | 7392/10000 [29:01:25<10:01:27, 13.84s/it] {'loss': 0.0047, 'learning_rate': 1.3085e-05, 'epoch': 9.68} 74%|███████▍ | 7392/10000 [29:01:25<10:01:27, 13.84s/it] 74%|███████▍ | 7393/10000 [29:01:39<10:01:41, 13.85s/it] {'loss': 0.0047, 'learning_rate': 1.308e-05, 'epoch': 9.68} 74%|███████▍ | 7393/10000 [29:01:39<10:01:41, 13.85s/it] 74%|███████▍ | 7394/10000 [29:01:53<10:02:19, 13.87s/it] {'loss': 0.0064, 'learning_rate': 1.3075000000000001e-05, 'epoch': 9.68} 74%|███████▍ | 7394/10000 [29:01:53<10:02:19, 13.87s/it] 74%|███████▍ | 7395/10000 [29:02:06<10:03:13, 13.89s/it] {'loss': 0.0041, 'learning_rate': 1.3070000000000002e-05, 'epoch': 9.68} 74%|███████▍ | 7395/10000 [29:02:07<10:03:13, 13.89s/it] 74%|███████▍ | 7396/10000 [29:02:20<10:03:56, 13.92s/it] {'loss': 0.005, 'learning_rate': 1.3065e-05, 'epoch': 9.68} 74%|███████▍ | 
7396/10000 [29:02:20<10:03:56, 13.92s/it] 74%|███████▍ | 7397/10000 [29:02:34<10:02:40, 13.89s/it] {'loss': 0.0031, 'learning_rate': 1.306e-05, 'epoch': 9.68} 74%|███████▍ | 7397/10000 [29:02:34<10:02:40, 13.89s/it] 74%|███████▍ | 7398/10000 [29:02:48<10:02:28, 13.89s/it] {'loss': 0.0045, 'learning_rate': 1.3055000000000001e-05, 'epoch': 9.68} 74%|███████▍ | 7398/10000 [29:02:48<10:02:28, 13.89s/it] 74%|███████▍ | 7399/10000 [29:03:02<10:03:00, 13.91s/it] {'loss': 0.0067, 'learning_rate': 1.305e-05, 'epoch': 9.68} 74%|███████▍ | 7399/10000 [29:03:02<10:03:00, 13.91s/it] 74%|███████▍ | 7400/10000 [29:03:16<10:02:45, 13.91s/it] {'loss': 0.0055, 'learning_rate': 1.3045000000000001e-05, 'epoch': 9.69} 74%|███████▍ | 7400/10000 [29:03:16<10:02:45, 13.91s/it] 74%|███████▍ | 7401/10000 [29:03:30<10:04:03, 13.95s/it] {'loss': 0.0058, 'learning_rate': 1.3039999999999999e-05, 'epoch': 9.69} 74%|███████▍ | 7401/10000 [29:03:30<10:04:03, 13.95s/it] 74%|███████▍ | 7402/10000 [29:03:44<10:03:41, 13.94s/it] {'loss': 0.0043, 'learning_rate': 1.3035e-05, 'epoch': 9.69} 74%|███████▍ | 7402/10000 [29:03:44<10:03:41, 13.94s/it] 74%|███████▍ | 7403/10000 [29:03:58<10:03:20, 13.94s/it] {'loss': 0.0062, 'learning_rate': 1.303e-05, 'epoch': 9.69} 74%|███████▍ | 7403/10000 [29:03:58<10:03:20, 13.94s/it] 74%|███████▍ | 7404/10000 [29:04:12<10:02:26, 13.92s/it] {'loss': 0.0043, 'learning_rate': 1.3025000000000002e-05, 'epoch': 9.69} 74%|███████▍ | 7404/10000 [29:04:12<10:02:26, 13.92s/it] 74%|███████▍ | 7405/10000 [29:04:26<10:01:04, 13.90s/it] {'loss': 0.0057, 'learning_rate': 1.3020000000000002e-05, 'epoch': 9.69} 74%|███████▍ | 7405/10000 [29:04:26<10:01:04, 13.90s/it] 74%|███████▍ | 7406/10000 [29:04:40<10:02:19, 13.93s/it] {'loss': 0.0039, 'learning_rate': 1.3015e-05, 'epoch': 9.69} 74%|███████▍ | 7406/10000 [29:04:40<10:02:19, 13.93s/it] 74%|███████▍ | 7407/10000 [29:04:54<10:01:49, 13.93s/it] {'loss': 0.0037, 'learning_rate': 1.301e-05, 'epoch': 9.7} 74%|███████▍ | 7407/10000 
[29:04:54<10:01:49, 13.93s/it] 74%|███████▍ | 7408/10000 [29:05:08<10:02:40, 13.95s/it] {'loss': 0.005, 'learning_rate': 1.3005e-05, 'epoch': 9.7} 74%|███████▍ | 7408/10000 [29:05:08<10:02:40, 13.95s/it] 74%|███████▍ | 7409/10000 [29:05:21<10:01:49, 13.94s/it] {'loss': 0.0078, 'learning_rate': 1.3000000000000001e-05, 'epoch': 9.7} 74%|███████▍ | 7409/10000 [29:05:22<10:01:49, 13.94s/it] 74%|███████▍ | 7410/10000 [29:05:35<10:01:00, 13.92s/it] {'loss': 0.0052, 'learning_rate': 1.2995000000000002e-05, 'epoch': 9.7} 74%|███████▍ | 7410/10000 [29:05:35<10:01:00, 13.92s/it] 74%|███████▍ | 7411/10000 [29:05:49<9:59:55, 13.90s/it] {'loss': 0.0054, 'learning_rate': 1.299e-05, 'epoch': 9.7} 74%|███████▍ | 7411/10000 [29:05:49<9:59:55, 13.90s/it] 74%|███████▍ | 7412/10000 [29:06:03<9:59:01, 13.89s/it] {'loss': 0.0039, 'learning_rate': 1.2985e-05, 'epoch': 9.7} 74%|███████▍ | 7412/10000 [29:06:03<9:59:01, 13.89s/it] 74%|███████▍ | 7413/10000 [29:06:17<9:59:09, 13.90s/it] {'loss': 0.0061, 'learning_rate': 1.2980000000000001e-05, 'epoch': 9.7} 74%|███████▍ | 7413/10000 [29:06:17<9:59:09, 13.90s/it] 74%|███████▍ | 7414/10000 [29:06:31<9:59:46, 13.92s/it] {'loss': 0.0051, 'learning_rate': 1.2975e-05, 'epoch': 9.7} 74%|███████▍ | 7414/10000 [29:06:31<9:59:46, 13.92s/it] 74%|███████▍ | 7415/10000 [29:06:45<9:58:10, 13.88s/it] {'loss': 0.0059, 'learning_rate': 1.2970000000000001e-05, 'epoch': 9.71} 74%|███████▍ | 7415/10000 [29:06:45<9:58:10, 13.88s/it] 74%|███████▍ | 7416/10000 [29:06:59<9:59:45, 13.93s/it] {'loss': 0.0045, 'learning_rate': 1.2964999999999999e-05, 'epoch': 9.71} 74%|███████▍ | 7416/10000 [29:06:59<9:59:45, 13.93s/it] 74%|███████▍ | 7417/10000 [29:07:13<9:59:43, 13.93s/it] {'loss': 0.006, 'learning_rate': 1.296e-05, 'epoch': 9.71} 74%|███████▍ | 7417/10000 [29:07:13<9:59:43, 13.93s/it] 74%|███████▍ | 7418/10000 [29:07:27<9:58:20, 13.90s/it] {'loss': 0.0047, 'learning_rate': 1.2955e-05, 'epoch': 9.71} 74%|███████▍ | 7418/10000 [29:07:27<9:58:20, 13.90s/it] 
74%|███████▍ | 7419/10000 [29:07:40<9:57:45, 13.90s/it] {'loss': 0.0043, 'learning_rate': 1.2950000000000001e-05, 'epoch': 9.71} 74%|███████▍ | 7419/10000 [29:07:41<9:57:45, 13.90s/it] 74%|███████▍ | 7420/10000 [29:07:54<9:58:02, 13.91s/it] {'loss': 0.0054, 'learning_rate': 1.2945000000000002e-05, 'epoch': 9.71} 74%|███████▍ | 7420/10000 [29:07:54<9:58:02, 13.91s/it] 74%|███████▍ | 7421/10000 [29:08:08<9:57:35, 13.90s/it] {'loss': 0.0053, 'learning_rate': 1.294e-05, 'epoch': 9.71} 74%|███████▍ | 7421/10000 [29:08:08<9:57:35, 13.90s/it] 74%|███████▍ | 7422/10000 [29:08:22<9:57:11, 13.90s/it] {'loss': 0.0054, 'learning_rate': 1.2935e-05, 'epoch': 9.71} 74%|███████▍ | 7422/10000 [29:08:22<9:57:11, 13.90s/it] 74%|███████▍ | 7423/10000 [29:08:36<9:57:03, 13.90s/it] {'loss': 0.0051, 'learning_rate': 1.293e-05, 'epoch': 9.72} 74%|███████▍ | 7423/10000 [29:08:36<9:57:03, 13.90s/it] 74%|███████▍ | 7424/10000 [29:08:50<9:56:47, 13.90s/it] {'loss': 0.0092, 'learning_rate': 1.2925e-05, 'epoch': 9.72} 74%|███████▍ | 7424/10000 [29:08:50<9:56:47, 13.90s/it] 74%|███████▍ | 7425/10000 [29:09:04<9:56:09, 13.89s/it] {'loss': 0.0056, 'learning_rate': 1.2920000000000002e-05, 'epoch': 9.72} 74%|███████▍ | 7425/10000 [29:09:04<9:56:09, 13.89s/it] 74%|███████▍ | 7426/10000 [29:09:18<9:57:44, 13.93s/it] {'loss': 0.0043, 'learning_rate': 1.2915e-05, 'epoch': 9.72} 74%|███████▍ | 7426/10000 [29:09:18<9:57:44, 13.93s/it] 74%|███████▍ | 7427/10000 [29:09:32<9:58:09, 13.95s/it] {'loss': 0.0054, 'learning_rate': 1.291e-05, 'epoch': 9.72} 74%|███████▍ | 7427/10000 [29:09:32<9:58:09, 13.95s/it] 74%|███████▍ | 7428/10000 [29:09:46<9:58:48, 13.97s/it] {'loss': 0.0076, 'learning_rate': 1.2905000000000001e-05, 'epoch': 9.72} 74%|███████▍ | 7428/10000 [29:09:46<9:58:48, 13.97s/it] 74%|███████▍ | 7429/10000 [29:10:00<9:55:55, 13.91s/it] {'loss': 0.006, 'learning_rate': 1.29e-05, 'epoch': 9.72} 74%|███████▍ | 7429/10000 [29:10:00<9:55:55, 13.91s/it] 74%|███████▍ | 7430/10000 [29:10:14<9:55:29, 
13.90s/it] {'loss': 0.0042, 'learning_rate': 1.2895000000000001e-05, 'epoch': 9.73} 74%|███████▍ | 7430/10000 [29:10:14<9:55:29, 13.90s/it] 74%|███████▍ | 7431/10000 [29:10:27<9:55:23, 13.91s/it] {'loss': 0.0053, 'learning_rate': 1.2889999999999999e-05, 'epoch': 9.73} 74%|███████▍ | 7431/10000 [29:10:28<9:55:23, 13.91s/it] 74%|███████▍ | 7432/10000 [29:10:41<9:56:28, 13.94s/it] {'loss': 0.0061, 'learning_rate': 1.2885e-05, 'epoch': 9.73} 74%|███████▍ | 7432/10000 [29:10:42<9:56:28, 13.94s/it] 74%|███████▍ | 7433/10000 [29:10:55<9:56:57, 13.95s/it] {'loss': 0.0056, 'learning_rate': 1.288e-05, 'epoch': 9.73} 74%|███████▍ | 7433/10000 [29:10:56<9:56:57, 13.95s/it] 74%|███████▍ | 7434/10000 [29:11:09<9:54:23, 13.90s/it] {'loss': 0.006, 'learning_rate': 1.2875000000000001e-05, 'epoch': 9.73} 74%|███████▍ | 7434/10000 [29:11:09<9:54:23, 13.90s/it] 74%|███████▍ | 7435/10000 [29:11:23<9:55:15, 13.92s/it] {'loss': 0.0041, 'learning_rate': 1.2870000000000002e-05, 'epoch': 9.73} 74%|███████▍ | 7435/10000 [29:11:23<9:55:15, 13.92s/it] 74%|███████▍ | 7436/10000 [29:11:37<9:55:41, 13.94s/it] {'loss': 0.0058, 'learning_rate': 1.2865e-05, 'epoch': 9.73} 74%|███████▍ | 7436/10000 [29:11:37<9:55:41, 13.94s/it] 74%|███████▍ | 7437/10000 [29:11:51<9:55:33, 13.94s/it] {'loss': 0.0056, 'learning_rate': 1.286e-05, 'epoch': 9.73} 74%|███████▍ | 7437/10000 [29:11:51<9:55:33, 13.94s/it] 74%|███████▍ | 7438/10000 [29:12:05<9:55:51, 13.95s/it] {'loss': 0.0099, 'learning_rate': 1.2855e-05, 'epoch': 9.74} 74%|███████▍ | 7438/10000 [29:12:05<9:55:51, 13.95s/it] 74%|███████▍ | 7439/10000 [29:12:19<9:54:56, 13.94s/it] {'loss': 0.005, 'learning_rate': 1.285e-05, 'epoch': 9.74} 74%|███████▍ | 7439/10000 [29:12:19<9:54:56, 13.94s/it] 74%|███████▍ | 7440/10000 [29:12:33<9:54:18, 13.93s/it] {'loss': 0.0059, 'learning_rate': 1.2845000000000002e-05, 'epoch': 9.74} 74%|███████▍ | 7440/10000 [29:12:33<9:54:18, 13.93s/it] 74%|███████▍ | 7441/10000 [29:12:47<9:54:38, 13.94s/it] {'loss': 0.0043, 
'learning_rate': 1.2839999999999999e-05, 'epoch': 9.74} 74%|███████▍ | 7441/10000 [29:12:47<9:54:38, 13.94s/it] 74%|███████▍ | 7442/10000 [29:13:01<9:54:40, 13.95s/it] {'loss': 0.0058, 'learning_rate': 1.2835e-05, 'epoch': 9.74} 74%|███████▍ | 7442/10000 [29:13:01<9:54:40, 13.95s/it] 74%|███████▍ | 7443/10000 [29:13:15<9:54:05, 13.94s/it] {'loss': 0.0071, 'learning_rate': 1.283e-05, 'epoch': 9.74} 74%|███████▍ | 7443/10000 [29:13:15<9:54:05, 13.94s/it] 74%|███████▍ | 7444/10000 [29:13:29<9:54:15, 13.95s/it] {'loss': 0.0052, 'learning_rate': 1.2825000000000002e-05, 'epoch': 9.74} 74%|███████▍ | 7444/10000 [29:13:29<9:54:15, 13.95s/it] 74%|███████▍ | 7445/10000 [29:13:43<9:54:56, 13.97s/it] {'loss': 0.0051, 'learning_rate': 1.2820000000000001e-05, 'epoch': 9.74} 74%|███████▍ | 7445/10000 [29:13:43<9:54:56, 13.97s/it] 74%|███████▍ | 7446/10000 [29:13:57<9:53:06, 13.93s/it] {'loss': 0.0058, 'learning_rate': 1.2814999999999998e-05, 'epoch': 9.75} 74%|███████▍ | 7446/10000 [29:13:57<9:53:06, 13.93s/it] 74%|███████▍ | 7447/10000 [29:14:11<9:52:05, 13.92s/it] {'loss': 0.0059, 'learning_rate': 1.281e-05, 'epoch': 9.75} 74%|███████▍ | 7447/10000 [29:14:11<9:52:05, 13.92s/it] 74%|███████▍ | 7448/10000 [29:14:24<9:51:18, 13.90s/it] {'loss': 0.0052, 'learning_rate': 1.2805e-05, 'epoch': 9.75} 74%|███████▍ | 7448/10000 [29:14:24<9:51:18, 13.90s/it] 74%|███████▍ | 7449/10000 [29:14:38<9:50:56, 13.90s/it] {'loss': 0.0064, 'learning_rate': 1.2800000000000001e-05, 'epoch': 9.75} 74%|███████▍ | 7449/10000 [29:14:38<9:50:56, 13.90s/it] 74%|███████▍ | 7450/10000 [29:14:52<9:51:26, 13.92s/it] {'loss': 0.0056, 'learning_rate': 1.2795000000000002e-05, 'epoch': 9.75} 74%|███████▍ | 7450/10000 [29:14:52<9:51:26, 13.92s/it] 75%|███████▍ | 7451/10000 [29:15:06<9:52:40, 13.95s/it] {'loss': 0.0039, 'learning_rate': 1.2790000000000001e-05, 'epoch': 9.75} 75%|███████▍ | 7451/10000 [29:15:06<9:52:40, 13.95s/it] 75%|███████▍ | 7452/10000 [29:15:20<9:52:55, 13.96s/it] {'loss': 0.0062, 
'learning_rate': 1.2785e-05, 'epoch': 9.75} 75%|███████▍ | 7452/10000 [29:15:20<9:52:55, 13.96s/it] 75%|███████▍ | 7453/10000 [29:15:34<9:52:17, 13.95s/it] {'loss': 0.005, 'learning_rate': 1.278e-05, 'epoch': 9.76} 75%|███████▍ | 7453/10000 [29:15:34<9:52:17, 13.95s/it] 75%|███████▍ | 7454/10000 [29:15:48<9:52:10, 13.96s/it] {'loss': 0.0056, 'learning_rate': 1.2775e-05, 'epoch': 9.76} 75%|███████▍ | 7454/10000 [29:15:48<9:52:10, 13.96s/it] 75%|███████▍ | 7455/10000 [29:16:02<9:50:46, 13.93s/it] {'loss': 0.0039, 'learning_rate': 1.2770000000000001e-05, 'epoch': 9.76} 75%|███████▍ | 7455/10000 [29:16:02<9:50:46, 13.93s/it] 75%|███████▍ | 7456/10000 [29:16:16<9:52:00, 13.96s/it] {'loss': 0.0052, 'learning_rate': 1.2765000000000002e-05, 'epoch': 9.76} 75%|███████▍ | 7456/10000 [29:16:16<9:52:00, 13.96s/it] 75%|███████▍ | 7457/10000 [29:16:30<9:50:01, 13.92s/it] {'loss': 0.0074, 'learning_rate': 1.276e-05, 'epoch': 9.76} 75%|███████▍ | 7457/10000 [29:16:30<9:50:01, 13.92s/it] 75%|███████▍ | 7458/10000 [29:16:44<9:50:49, 13.95s/it] {'loss': 0.0045, 'learning_rate': 1.2755e-05, 'epoch': 9.76} 75%|███████▍ | 7458/10000 [29:16:44<9:50:49, 13.95s/it] 75%|███████▍ | 7459/10000 [29:16:58<9:50:44, 13.95s/it] {'loss': 0.0057, 'learning_rate': 1.2750000000000002e-05, 'epoch': 9.76} 75%|███████▍ | 7459/10000 [29:16:58<9:50:44, 13.95s/it] 75%|███████▍ | 7460/10000 [29:17:12<9:49:56, 13.94s/it] {'loss': 0.0042, 'learning_rate': 1.2745e-05, 'epoch': 9.76} 75%|███████▍ | 7460/10000 [29:17:12<9:49:56, 13.94s/it] 75%|███████▍ | 7461/10000 [29:17:26<9:50:25, 13.95s/it] {'loss': 0.0058, 'learning_rate': 1.2740000000000002e-05, 'epoch': 9.77} 75%|███████▍ | 7461/10000 [29:17:26<9:50:25, 13.95s/it] 75%|███████▍ | 7462/10000 [29:17:40<9:48:56, 13.92s/it] {'loss': 0.0054, 'learning_rate': 1.2735e-05, 'epoch': 9.77} 75%|███████▍ | 7462/10000 [29:17:40<9:48:56, 13.92s/it] 75%|███████▍ | 7463/10000 [29:17:54<9:48:47, 13.92s/it] {'loss': 0.0052, 'learning_rate': 1.273e-05, 'epoch': 9.77} 
75%|███████▍ | 7463/10000 [29:17:54<9:48:47, 13.92s/it] 75%|███████▍ | 7464/10000 [29:18:08<9:49:39, 13.95s/it] {'loss': 0.006, 'learning_rate': 1.2725000000000001e-05, 'epoch': 9.77} 75%|███████▍ | 7464/10000 [29:18:08<9:49:39, 13.95s/it] 75%|███████▍ | 7465/10000 [29:18:21<9:47:52, 13.91s/it] {'loss': 0.0055, 'learning_rate': 1.2720000000000002e-05, 'epoch': 9.77} 75%|███████▍ | 7465/10000 [29:18:21<9:47:52, 13.91s/it] 75%|███████▍ | 7466/10000 [29:18:35<9:49:44, 13.96s/it] {'loss': 0.0056, 'learning_rate': 1.2715000000000001e-05, 'epoch': 9.77} 75%|███████▍ | 7466/10000 [29:18:35<9:49:44, 13.96s/it] 75%|███████▍ | 7467/10000 [29:18:49<9:49:58, 13.98s/it] {'loss': 0.0074, 'learning_rate': 1.271e-05, 'epoch': 9.77} 75%|███████▍ | 7467/10000 [29:18:49<9:49:58, 13.98s/it] 75%|███████▍ | 7468/10000 [29:19:03<9:48:12, 13.94s/it] {'loss': 0.0059, 'learning_rate': 1.2705e-05, 'epoch': 9.77} 75%|███████▍ | 7468/10000 [29:19:03<9:48:12, 13.94s/it] 75%|███████▍ | 7469/10000 [29:19:17<9:47:58, 13.94s/it] {'loss': 0.0086, 'learning_rate': 1.27e-05, 'epoch': 9.78} 75%|███████▍ | 7469/10000 [29:19:17<9:47:58, 13.94s/it] 75%|███████▍ | 7470/10000 [29:19:31<9:46:53, 13.92s/it] {'loss': 0.0054, 'learning_rate': 1.2695000000000001e-05, 'epoch': 9.78} 75%|███████▍ | 7470/10000 [29:19:31<9:46:53, 13.92s/it] 75%|███████▍ | 7471/10000 [29:19:45<9:45:21, 13.89s/it] {'loss': 0.0047, 'learning_rate': 1.2690000000000002e-05, 'epoch': 9.78} 75%|███████▍ | 7471/10000 [29:19:45<9:45:21, 13.89s/it] 75%|███████▍ | 7472/10000 [29:19:59<9:44:14, 13.87s/it] {'loss': 0.0049, 'learning_rate': 1.2685e-05, 'epoch': 9.78} 75%|███████▍ | 7472/10000 [29:19:59<9:44:14, 13.87s/it] 75%|███████▍ | 7473/10000 [29:20:13<9:45:04, 13.89s/it] {'loss': 0.0068, 'learning_rate': 1.268e-05, 'epoch': 9.78} 75%|███████▍ | 7473/10000 [29:20:13<9:45:04, 13.89s/it] 75%|███████▍ | 7474/10000 [29:20:27<9:45:55, 13.92s/it] {'loss': 0.0077, 'learning_rate': 1.2675000000000001e-05, 'epoch': 9.78} 75%|███████▍ | 7474/10000 
[29:20:27<9:45:55, 13.92s/it] 75%|███████▍ | 7475/10000 [29:20:41<9:46:34, 13.94s/it] {'loss': 0.0054, 'learning_rate': 1.267e-05, 'epoch': 9.78} 75%|███████▍ | 7475/10000 [29:20:41<9:46:34, 13.94s/it] 75%|███████▍ | 7476/10000 [29:20:54<9:45:05, 13.91s/it] {'loss': 0.007, 'learning_rate': 1.2665000000000002e-05, 'epoch': 9.79} 75%|███████▍ | 7476/10000 [29:20:55<9:45:05, 13.91s/it] 75%|███████▍ | 7477/10000 [29:21:08<9:45:12, 13.92s/it] {'loss': 0.0048, 'learning_rate': 1.2659999999999999e-05, 'epoch': 9.79} 75%|███████▍ | 7477/10000 [29:21:08<9:45:12, 13.92s/it] 75%|███████▍ | 7478/10000 [29:21:22<9:44:54, 13.92s/it] {'loss': 0.0063, 'learning_rate': 1.2655e-05, 'epoch': 9.79} 75%|███████▍ | 7478/10000 [29:21:22<9:44:54, 13.92s/it] 75%|███████▍ | 7479/10000 [29:21:36<9:44:44, 13.92s/it] {'loss': 0.0054, 'learning_rate': 1.2650000000000001e-05, 'epoch': 9.79} 75%|███████▍ | 7479/10000 [29:21:36<9:44:44, 13.92s/it] 75%|███████▍ | 7480/10000 [29:21:50<9:44:15, 13.91s/it] {'loss': 0.0037, 'learning_rate': 1.2645000000000002e-05, 'epoch': 9.79} 75%|███████▍ | 7480/10000 [29:21:50<9:44:15, 13.91s/it] 75%|███████▍ | 7481/10000 [29:22:04<9:43:40, 13.90s/it] {'loss': 0.0052, 'learning_rate': 1.2640000000000003e-05, 'epoch': 9.79} 75%|███████▍ | 7481/10000 [29:22:04<9:43:40, 13.90s/it] 75%|███████▍ | 7482/10000 [29:22:18<9:42:18, 13.88s/it] {'loss': 0.0045, 'learning_rate': 1.2635e-05, 'epoch': 9.79} 75%|███████▍ | 7482/10000 [29:22:18<9:42:18, 13.88s/it] 75%|███████▍ | 7483/10000 [29:22:32<9:41:22, 13.86s/it] {'loss': 0.0076, 'learning_rate': 1.263e-05, 'epoch': 9.79} 75%|███████▍ | 7483/10000 [29:22:32<9:41:22, 13.86s/it] 75%|███████▍ | 7484/10000 [29:22:46<9:42:09, 13.88s/it] {'loss': 0.0053, 'learning_rate': 1.2625e-05, 'epoch': 9.8} 75%|███████▍ | 7484/10000 [29:22:46<9:42:09, 13.88s/it] 75%|███████▍ | 7485/10000 [29:22:59<9:41:13, 13.87s/it] {'loss': 0.007, 'learning_rate': 1.2620000000000001e-05, 'epoch': 9.8} 75%|███████▍ | 7485/10000 [29:22:59<9:41:13, 13.87s/it] 
75%|███████▍ | 7486/10000 [29:23:13<9:42:41, 13.91s/it] {'loss': 0.0051, 'learning_rate': 1.2615000000000002e-05, 'epoch': 9.8} 75%|███████▍ | 7486/10000 [29:23:13<9:42:41, 13.91s/it] 75%|███████▍ | 7487/10000 [29:23:27<9:42:30, 13.91s/it] {'loss': 0.0054, 'learning_rate': 1.261e-05, 'epoch': 9.8} 75%|███████▍ | 7487/10000 [29:23:27<9:42:30, 13.91s/it] 75%|███████▍ | 7488/10000 [29:23:41<9:40:44, 13.87s/it] {'loss': 0.0055, 'learning_rate': 1.2605e-05, 'epoch': 9.8} 75%|███████▍ | 7488/10000 [29:23:41<9:40:44, 13.87s/it] 75%|███████▍ | 7489/10000 [29:23:55<9:42:40, 13.92s/it] {'loss': 0.0037, 'learning_rate': 1.2600000000000001e-05, 'epoch': 9.8} 75%|███████▍ | 7489/10000 [29:23:55<9:42:40, 13.92s/it] 75%|███████▍ | 7490/10000 [29:24:09<9:43:30, 13.95s/it] {'loss': 0.0066, 'learning_rate': 1.2595e-05, 'epoch': 9.8} 75%|███████▍ | 7490/10000 [29:24:09<9:43:30, 13.95s/it] 75%|███████▍ | 7491/10000 [29:24:23<9:44:01, 13.97s/it] {'loss': 0.0051, 'learning_rate': 1.2590000000000001e-05, 'epoch': 9.8} 75%|███████▍ | 7491/10000 [29:24:23<9:44:01, 13.97s/it] 75%|███████▍ | 7492/10000 [29:24:37<9:42:25, 13.93s/it] {'loss': 0.0045, 'learning_rate': 1.2584999999999999e-05, 'epoch': 9.81} 75%|███████▍ | 7492/10000 [29:24:37<9:42:25, 13.93s/it] 75%|███████▍ | 7493/10000 [29:24:51<9:40:26, 13.89s/it] {'loss': 0.0058, 'learning_rate': 1.258e-05, 'epoch': 9.81} 75%|███████▍ | 7493/10000 [29:24:51<9:40:26, 13.89s/it] 75%|███████▍ | 7494/10000 [29:25:05<9:41:29, 13.92s/it] {'loss': 0.0046, 'learning_rate': 1.2575e-05, 'epoch': 9.81} 75%|███████▍ | 7494/10000 [29:25:05<9:41:29, 13.92s/it] 75%|███████▍ | 7495/10000 [29:25:19<9:42:10, 13.94s/it] {'loss': 0.0052, 'learning_rate': 1.2570000000000002e-05, 'epoch': 9.81} 75%|███████▍ | 7495/10000 [29:25:19<9:42:10, 13.94s/it] 75%|███████▍ | 7496/10000 [29:25:33<9:41:16, 13.93s/it] {'loss': 0.005, 'learning_rate': 1.2565000000000003e-05, 'epoch': 9.81} 75%|███████▍ | 7496/10000 [29:25:33<9:41:16, 13.93s/it] 75%|███████▍ | 7497/10000 
[29:25:47<9:41:38, 13.94s/it] {'loss': 0.0042, 'learning_rate': 1.256e-05, 'epoch': 9.81} 75%|███████▍ | 7497/10000 [29:25:47<9:41:38, 13.94s/it] 75%|███████▍ | 7498/10000 [29:26:01<9:42:30, 13.97s/it] {'loss': 0.0053, 'learning_rate': 1.2555000000000001e-05, 'epoch': 9.81} 75%|███████▍ | 7498/10000 [29:26:01<9:42:30, 13.97s/it] 75%|███████▍ | 7499/10000 [29:26:15<9:41:52, 13.96s/it] {'loss': 0.0049, 'learning_rate': 1.255e-05, 'epoch': 9.82} 75%|███████▍ | 7499/10000 [29:26:15<9:41:52, 13.96s/it] 75%|███████▌ | 7500/10000 [29:26:29<9:42:21, 13.98s/it] {'loss': 0.0058, 'learning_rate': 1.2545000000000001e-05, 'epoch': 9.82} 75%|███████▌ | 7500/10000 [29:26:29<9:42:21, 13.98s/it] 75%|███████▌ | 7501/10000 [29:26:43<9:40:27, 13.94s/it] {'loss': 0.0058, 'learning_rate': 1.2540000000000002e-05, 'epoch': 9.82} 75%|███████▌ | 7501/10000 [29:26:43<9:40:27, 13.94s/it] 75%|███████▌ | 7502/10000 [29:26:57<9:40:55, 13.95s/it] {'loss': 0.0053, 'learning_rate': 1.2535e-05, 'epoch': 9.82} 75%|███████▌ | 7502/10000 [29:26:57<9:40:55, 13.95s/it] 75%|███████▌ | 7503/10000 [29:27:10<9:40:40, 13.95s/it] {'loss': 0.0074, 'learning_rate': 1.253e-05, 'epoch': 9.82} 75%|███████▌ | 7503/10000 [29:27:11<9:40:40, 13.95s/it] 75%|███████▌ | 7504/10000 [29:27:24<9:41:00, 13.97s/it] {'loss': 0.0046, 'learning_rate': 1.2525000000000001e-05, 'epoch': 9.82} 75%|███████▌ | 7504/10000 [29:27:25<9:41:00, 13.97s/it] 75%|███████▌ | 7505/10000 [29:27:38<9:40:55, 13.97s/it] {'loss': 0.0064, 'learning_rate': 1.252e-05, 'epoch': 9.82} 75%|███████▌ | 7505/10000 [29:27:39<9:40:55, 13.97s/it] 75%|███████▌ | 7506/10000 [29:27:52<9:39:12, 13.93s/it] {'loss': 0.0056, 'learning_rate': 1.2515000000000001e-05, 'epoch': 9.82} 75%|███████▌ | 7506/10000 [29:27:52<9:39:12, 13.93s/it] 75%|███████▌ | 7507/10000 [29:28:06<9:39:29, 13.95s/it] {'loss': 0.0048, 'learning_rate': 1.2509999999999999e-05, 'epoch': 9.83} 75%|███████▌ | 7507/10000 [29:28:06<9:39:29, 13.95s/it] 75%|███████▌ | 7508/10000 [29:28:20<9:40:04, 
13.97s/it] {'loss': 0.0067, 'learning_rate': 1.2505e-05, 'epoch': 9.83} 75%|███████▌ | 7508/10000 [29:28:20<9:40:04, 13.97s/it] 75%|███████▌ | 7509/10000 [29:28:34<9:39:27, 13.96s/it] {'loss': 0.0051, 'learning_rate': 1.25e-05, 'epoch': 9.83} 75%|███████▌ | 7509/10000 [29:28:34<9:39:27, 13.96s/it] 75%|███████▌ | 7510/10000 [29:28:48<9:38:13, 13.93s/it] {'loss': 0.005, 'learning_rate': 1.2495000000000001e-05, 'epoch': 9.83} 75%|███████▌ | 7510/10000 [29:28:48<9:38:13, 13.93s/it] 75%|███████▌ | 7511/10000 [29:29:02<9:37:49, 13.93s/it] {'loss': 0.0064, 'learning_rate': 1.249e-05, 'epoch': 9.83} 75%|███████▌ | 7511/10000 [29:29:02<9:37:49, 13.93s/it] 75%|███████▌ | 7512/10000 [29:29:16<9:36:47, 13.91s/it] {'loss': 0.0046, 'learning_rate': 1.2485000000000002e-05, 'epoch': 9.83} 75%|███████▌ | 7512/10000 [29:29:16<9:36:47, 13.91s/it] 75%|███████▌ | 7513/10000 [29:29:30<9:36:31, 13.91s/it] {'loss': 0.0056, 'learning_rate': 1.248e-05, 'epoch': 9.83} 75%|███████▌ | 7513/10000 [29:29:30<9:36:31, 13.91s/it] 75%|███████▌ | 7514/10000 [29:29:44<9:35:30, 13.89s/it] {'loss': 0.0044, 'learning_rate': 1.2475e-05, 'epoch': 9.84} 75%|███████▌ | 7514/10000 [29:29:44<9:35:30, 13.89s/it] 75%|███████▌ | 7515/10000 [29:29:57<9:34:04, 13.86s/it] {'loss': 0.0078, 'learning_rate': 1.2470000000000001e-05, 'epoch': 9.84} 75%|███████▌ | 7515/10000 [29:29:57<9:34:04, 13.86s/it] 75%|███████▌ | 7516/10000 [29:30:11<9:35:34, 13.90s/it] {'loss': 0.0049, 'learning_rate': 1.2465e-05, 'epoch': 9.84} 75%|███████▌ | 7516/10000 [29:30:11<9:35:34, 13.90s/it] 75%|███████▌ | 7517/10000 [29:30:25<9:35:14, 13.90s/it] {'loss': 0.0058, 'learning_rate': 1.2460000000000001e-05, 'epoch': 9.84} 75%|███████▌ | 7517/10000 [29:30:25<9:35:14, 13.90s/it] 75%|███████▌ | 7518/10000 [29:30:39<9:36:08, 13.93s/it] {'loss': 0.0054, 'learning_rate': 1.2455e-05, 'epoch': 9.84} 75%|███████▌ | 7518/10000 [29:30:39<9:36:08, 13.93s/it] 75%|███████▌ | 7519/10000 [29:30:53<9:35:50, 13.93s/it] {'loss': 0.009, 'learning_rate': 
1.2450000000000001e-05, 'epoch': 9.84} 75%|███████▌ | 7519/10000 [29:30:53<9:35:50, 13.93s/it] 75%|███████▌ | 7520/10000 [29:31:07<9:34:26, 13.90s/it] {'loss': 0.0078, 'learning_rate': 1.2445e-05, 'epoch': 9.84} 75%|███████▌ | 7520/10000 [29:31:07<9:34:26, 13.90s/it] 75%|███████▌ | 7521/10000 [29:31:21<9:34:35, 13.91s/it] {'loss': 0.0049, 'learning_rate': 1.244e-05, 'epoch': 9.84} 75%|███████▌ | 7521/10000 [29:31:21<9:34:35, 13.91s/it] 75%|███████▌ | 7522/10000 [29:31:35<9:36:20, 13.96s/it] {'loss': 0.0067, 'learning_rate': 1.2435e-05, 'epoch': 9.85} 75%|███████▌ | 7522/10000 [29:31:35<9:36:20, 13.96s/it] 75%|███████▌ | 7523/10000 [29:31:49<9:37:10, 13.98s/it] {'loss': 0.006, 'learning_rate': 1.243e-05, 'epoch': 9.85} 75%|███████▌ | 7523/10000 [29:31:49<9:37:10, 13.98s/it] 75%|███████▌ | 7524/10000 [29:32:03<9:35:49, 13.95s/it] {'loss': 0.0058, 'learning_rate': 1.2425e-05, 'epoch': 9.85} 75%|███████▌ | 7524/10000 [29:32:03<9:35:49, 13.95s/it] 75%|███████▌ | 7525/10000 [29:32:17<9:35:26, 13.95s/it] {'loss': 0.0041, 'learning_rate': 1.2420000000000001e-05, 'epoch': 9.85} 75%|███████▌ | 7525/10000 [29:32:17<9:35:26, 13.95s/it] 75%|███████▌ | 7526/10000 [29:32:31<9:35:22, 13.95s/it] {'loss': 0.0061, 'learning_rate': 1.2415e-05, 'epoch': 9.85} 75%|███████▌ | 7526/10000 [29:32:31<9:35:22, 13.95s/it] 75%|███████▌ | 7527/10000 [29:32:45<9:33:59, 13.93s/it] {'loss': 0.0048, 'learning_rate': 1.2410000000000001e-05, 'epoch': 9.85} 75%|███████▌ | 7527/10000 [29:32:45<9:33:59, 13.93s/it] 75%|███████▌ | 7528/10000 [29:32:59<9:34:40, 13.95s/it] {'loss': 0.0057, 'learning_rate': 1.2405e-05, 'epoch': 9.85} 75%|███████▌ | 7528/10000 [29:32:59<9:34:40, 13.95s/it] 75%|███████▌ | 7529/10000 [29:33:13<9:33:22, 13.92s/it] {'loss': 0.0051, 'learning_rate': 1.24e-05, 'epoch': 9.85} 75%|███████▌ | 7529/10000 [29:33:13<9:33:22, 13.92s/it] 75%|███████▌ | 7530/10000 [29:33:27<9:33:45, 13.94s/it] {'loss': 0.0057, 'learning_rate': 1.2395e-05, 'epoch': 9.86} 75%|███████▌ | 7530/10000 
[29:33:27<9:33:45, 13.94s/it] 75%|███████▌ | 7531/10000 [29:33:41<9:33:35, 13.94s/it] {'loss': 0.0064, 'learning_rate': 1.239e-05, 'epoch': 9.86} 75%|███████▌ | 7531/10000 [29:33:41<9:33:35, 13.94s/it] 75%|███████▌ | 7532/10000 [29:33:54<9:32:49, 13.93s/it] {'loss': 0.0051, 'learning_rate': 1.2385000000000001e-05, 'epoch': 9.86} 75%|███████▌ | 7532/10000 [29:33:54<9:32:49, 13.93s/it] 75%|███████▌ | 7533/10000 [29:34:08<9:32:47, 13.93s/it] {'loss': 0.0047, 'learning_rate': 1.238e-05, 'epoch': 9.86} 75%|███████▌ | 7533/10000 [29:34:08<9:32:47, 13.93s/it] 75%|███████▌ | 7534/10000 [29:34:22<9:31:11, 13.90s/it] {'loss': 0.0051, 'learning_rate': 1.2375000000000001e-05, 'epoch': 9.86} 75%|███████▌ | 7534/10000 [29:34:22<9:31:11, 13.90s/it] 75%|███████▌ | 7535/10000 [29:34:36<9:31:05, 13.90s/it] {'loss': 0.0056, 'learning_rate': 1.2370000000000002e-05, 'epoch': 9.86} 75%|███████▌ | 7535/10000 [29:34:36<9:31:05, 13.90s/it] 75%|███████▌ | 7536/10000 [29:34:50<9:32:22, 13.94s/it] {'loss': 0.0049, 'learning_rate': 1.2365e-05, 'epoch': 9.86} 75%|███████▌ | 7536/10000 [29:34:50<9:32:22, 13.94s/it] 75%|███████▌ | 7537/10000 [29:35:04<9:30:54, 13.91s/it] {'loss': 0.0053, 'learning_rate': 1.236e-05, 'epoch': 9.87} 75%|███████▌ | 7537/10000 [29:35:04<9:30:54, 13.91s/it] 75%|███████▌ | 7538/10000 [29:35:18<9:30:57, 13.91s/it] {'loss': 0.0079, 'learning_rate': 1.2355e-05, 'epoch': 9.87} 75%|███████▌ | 7538/10000 [29:35:18<9:30:57, 13.91s/it] 75%|███████▌ | 7539/10000 [29:35:32<9:30:20, 13.91s/it] {'loss': 0.0068, 'learning_rate': 1.235e-05, 'epoch': 9.87} 75%|███████▌ | 7539/10000 [29:35:32<9:30:20, 13.91s/it] 75%|███████▌ | 7540/10000 [29:35:46<9:31:11, 13.93s/it] {'loss': 0.0058, 'learning_rate': 1.2345000000000001e-05, 'epoch': 9.87} 75%|███████▌ | 7540/10000 [29:35:46<9:31:11, 13.93s/it] 75%|███████▌ | 7541/10000 [29:36:00<9:33:19, 13.99s/it] {'loss': 0.0056, 'learning_rate': 1.234e-05, 'epoch': 9.87} 75%|███████▌ | 7541/10000 [29:36:00<9:33:19, 13.99s/it] 75%|███████▌ | 
7542/10000 [29:36:14<9:31:44, 13.96s/it] {'loss': 0.0065, 'learning_rate': 1.2335000000000001e-05, 'epoch': 9.87} 75%|███████▌ | 7542/10000 [29:36:14<9:31:44, 13.96s/it] 75%|███████▌ | 7543/10000 [29:36:28<9:32:22, 13.98s/it] {'loss': 0.0042, 'learning_rate': 1.233e-05, 'epoch': 9.87} 75%|███████▌ | 7543/10000 [29:36:28<9:32:22, 13.98s/it] 75%|███████▌ | 7544/10000 [29:36:42<9:31:39, 13.97s/it] {'loss': 0.0079, 'learning_rate': 1.2325e-05, 'epoch': 9.87} 75%|███████▌ | 7544/10000 [29:36:42<9:31:39, 13.97s/it] 75%|███████▌ | 7545/10000 [29:36:56<9:29:32, 13.92s/it] {'loss': 0.0053, 'learning_rate': 1.232e-05, 'epoch': 9.88} 75%|███████▌ | 7545/10000 [29:36:56<9:29:32, 13.92s/it] 75%|███████▌ | 7546/10000 [29:37:09<9:29:40, 13.93s/it] {'loss': 0.0056, 'learning_rate': 1.2315e-05, 'epoch': 9.88} 75%|███████▌ | 7546/10000 [29:37:10<9:29:40, 13.93s/it] 75%|███████▌ | 7547/10000 [29:37:24<9:30:33, 13.96s/it] {'loss': 0.0038, 'learning_rate': 1.231e-05, 'epoch': 9.88} 75%|███████▌ | 7547/10000 [29:37:24<9:30:33, 13.96s/it] 75%|███████▌ | 7548/10000 [29:37:37<9:30:04, 13.95s/it] {'loss': 0.0059, 'learning_rate': 1.2305000000000002e-05, 'epoch': 9.88} 75%|███████▌ | 7548/10000 [29:37:37<9:30:04, 13.95s/it] 75%|███████▌ | 7549/10000 [29:37:51<9:28:26, 13.92s/it] {'loss': 0.0049, 'learning_rate': 1.23e-05, 'epoch': 9.88} 75%|███████▌ | 7549/10000 [29:37:51<9:28:26, 13.92s/it] 76%|███████▌ | 7550/10000 [29:38:05<9:28:31, 13.92s/it] {'loss': 0.0051, 'learning_rate': 1.2295000000000002e-05, 'epoch': 9.88} 76%|███████▌ | 7550/10000 [29:38:05<9:28:31, 13.92s/it] 76%|███████▌ | 7551/10000 [29:38:19<9:28:23, 13.93s/it] {'loss': 0.0058, 'learning_rate': 1.2290000000000001e-05, 'epoch': 9.88} 76%|███████▌ | 7551/10000 [29:38:19<9:28:23, 13.93s/it] 76%|███████▌ | 7552/10000 [29:38:33<9:27:04, 13.90s/it] {'loss': 0.0073, 'learning_rate': 1.2285e-05, 'epoch': 9.88} 76%|███████▌ | 7552/10000 [29:38:33<9:27:04, 13.90s/it] 76%|███████▌ | 7553/10000 [29:38:47<9:27:26, 13.91s/it] {'loss': 
0.005, 'learning_rate': 1.2280000000000001e-05, 'epoch': 9.89} 76%|███████▌ | 7553/10000 [29:38:47<9:27:26, 13.91s/it] 76%|███████▌ | 7554/10000 [29:39:01<9:27:31, 13.92s/it] {'loss': 0.0049, 'learning_rate': 1.2275e-05, 'epoch': 9.89} 76%|███████▌ | 7554/10000 [29:39:01<9:27:31, 13.92s/it] 76%|███████▌ | 7555/10000 [29:39:15<9:26:15, 13.90s/it] {'loss': 0.0057, 'learning_rate': 1.2270000000000001e-05, 'epoch': 9.89} 76%|███████▌ | 7555/10000 [29:39:15<9:26:15, 13.90s/it] 76%|███████▌ | 7556/10000 [29:39:29<9:27:23, 13.93s/it] {'loss': 0.0049, 'learning_rate': 1.2265e-05, 'epoch': 9.89} 76%|███████▌ | 7556/10000 [29:39:29<9:27:23, 13.93s/it] 76%|███████▌ | 7557/10000 [29:39:43<9:28:04, 13.95s/it] {'loss': 0.0064, 'learning_rate': 1.2260000000000001e-05, 'epoch': 9.89} 76%|███████▌ | 7557/10000 [29:39:43<9:28:04, 13.95s/it] 76%|███████▌ | 7558/10000 [29:39:57<9:26:37, 13.92s/it] {'loss': 0.0055, 'learning_rate': 1.2255e-05, 'epoch': 9.89} 76%|███████▌ | 7558/10000 [29:39:57<9:26:37, 13.92s/it] 76%|███████▌ | 7559/10000 [29:40:11<9:28:13, 13.97s/it] {'loss': 0.0054, 'learning_rate': 1.225e-05, 'epoch': 9.89} 76%|███████▌ | 7559/10000 [29:40:11<9:28:13, 13.97s/it] 76%|███████▌ | 7560/10000 [29:40:25<9:28:14, 13.97s/it] {'loss': 0.0055, 'learning_rate': 1.2245e-05, 'epoch': 9.9} 76%|███████▌ | 7560/10000 [29:40:25<9:28:14, 13.97s/it] 76%|███████▌ | 7561/10000 [29:40:39<9:26:49, 13.94s/it] {'loss': 0.0063, 'learning_rate': 1.224e-05, 'epoch': 9.9} 76%|███████▌ | 7561/10000 [29:40:39<9:26:49, 13.94s/it] 76%|███████▌ | 7562/10000 [29:40:52<9:26:03, 13.93s/it] {'loss': 0.0067, 'learning_rate': 1.2235e-05, 'epoch': 9.9} 76%|███████▌ | 7562/10000 [29:40:52<9:26:03, 13.93s/it] 76%|███████▌ | 7563/10000 [29:41:06<9:25:12, 13.92s/it] {'loss': 0.0047, 'learning_rate': 1.2230000000000001e-05, 'epoch': 9.9} 76%|███████▌ | 7563/10000 [29:41:06<9:25:12, 13.92s/it] 76%|███████▌ | 7564/10000 [29:41:20<9:26:08, 13.94s/it] {'loss': 0.0057, 'learning_rate': 1.2225e-05, 'epoch': 9.9} 
76%|███████▌ | 7564/10000 [29:41:20<9:26:08, 13.94s/it] 76%|███████▌ | 7565/10000 [29:41:34<9:26:52, 13.97s/it] {'loss': 0.0045, 'learning_rate': 1.2220000000000002e-05, 'epoch': 9.9} 76%|███████▌ | 7565/10000 [29:41:34<9:26:52, 13.97s/it] 76%|███████▌ | 7566/10000 [29:41:48<9:25:46, 13.95s/it] {'loss': 0.0059, 'learning_rate': 1.2215e-05, 'epoch': 9.9} 76%|███████▌ | 7566/10000 [29:41:48<9:25:46, 13.95s/it] 76%|███████▌ | 7567/10000 [29:42:02<9:27:09, 13.99s/it] {'loss': 0.0053, 'learning_rate': 1.221e-05, 'epoch': 9.9} 76%|███████▌ | 7567/10000 [29:42:02<9:27:09, 13.99s/it] 76%|███████▌ | 7568/10000 [29:42:16<9:26:36, 13.98s/it] {'loss': 0.0047, 'learning_rate': 1.2205000000000001e-05, 'epoch': 9.91} 76%|███████▌ | 7568/10000 [29:42:16<9:26:36, 13.98s/it] 76%|███████▌ | 7569/10000 [29:42:30<9:25:59, 13.97s/it] {'loss': 0.0055, 'learning_rate': 1.22e-05, 'epoch': 9.91} 76%|███████▌ | 7569/10000 [29:42:30<9:25:59, 13.97s/it] 76%|███████▌ | 7570/10000 [29:42:44<9:23:35, 13.92s/it] {'loss': 0.0048, 'learning_rate': 1.2195000000000001e-05, 'epoch': 9.91} 76%|███████▌ | 7570/10000 [29:42:44<9:23:35, 13.92s/it] 76%|███████▌ | 7571/10000 [29:42:58<9:23:16, 13.91s/it] {'loss': 0.0057, 'learning_rate': 1.219e-05, 'epoch': 9.91} 76%|███████▌ | 7571/10000 [29:42:58<9:23:16, 13.91s/it] 76%|███████▌ | 7572/10000 [29:43:12<9:23:21, 13.92s/it] {'loss': 0.0047, 'learning_rate': 1.2185000000000001e-05, 'epoch': 9.91} 76%|███████▌ | 7572/10000 [29:43:12<9:23:21, 13.92s/it] 76%|███████▌ | 7573/10000 [29:43:26<9:24:40, 13.96s/it] {'loss': 0.0054, 'learning_rate': 1.2180000000000002e-05, 'epoch': 9.91} 76%|███████▌ | 7573/10000 [29:43:26<9:24:40, 13.96s/it] 76%|███████▌ | 7574/10000 [29:43:40<9:23:51, 13.95s/it] {'loss': 0.005, 'learning_rate': 1.2175e-05, 'epoch': 9.91} 76%|███████▌ | 7574/10000 [29:43:40<9:23:51, 13.95s/it] 76%|███████▌ | 7575/10000 [29:43:54<9:23:41, 13.95s/it] {'loss': 0.0055, 'learning_rate': 1.217e-05, 'epoch': 9.91} 76%|███████▌ | 7575/10000 [29:43:54<9:23:41, 
13.95s/it] 76%|███████▌ | 7576/10000 [29:44:08<9:23:29, 13.95s/it] {'loss': 0.0055, 'learning_rate': 1.2165e-05, 'epoch': 9.92} 76%|███████▌ | 7576/10000 [29:44:08<9:23:29, 13.95s/it] 76%|███████▌ | 7577/10000 [29:44:22<9:24:15, 13.97s/it] {'loss': 0.0051, 'learning_rate': 1.216e-05, 'epoch': 9.92} 76%|███████▌ | 7577/10000 [29:44:22<9:24:15, 13.97s/it] 76%|███████▌ | 7578/10000 [29:44:36<9:22:20, 13.93s/it] {'loss': 0.0053, 'learning_rate': 1.2155000000000001e-05, 'epoch': 9.92} 76%|███████▌ | 7578/10000 [29:44:36<9:22:20, 13.93s/it] 76%|███████▌ | 7579/10000 [29:44:49<9:21:38, 13.92s/it] {'loss': 0.0045, 'learning_rate': 1.215e-05, 'epoch': 9.92} 76%|███████▌ | 7579/10000 [29:44:50<9:21:38, 13.92s/it] 76%|███████▌ | 7580/10000 [29:45:03<9:20:39, 13.90s/it] {'loss': 0.01, 'learning_rate': 1.2145000000000001e-05, 'epoch': 9.92} 76%|███████▌ | 7580/10000 [29:45:03<9:20:39, 13.90s/it] 76%|███████▌ | 7581/10000 [29:45:17<9:20:23, 13.90s/it] {'loss': 0.0043, 'learning_rate': 1.214e-05, 'epoch': 9.92} 76%|███████▌ | 7581/10000 [29:45:17<9:20:23, 13.90s/it] 76%|███████▌ | 7582/10000 [29:45:31<9:20:02, 13.90s/it] {'loss': 0.0386, 'learning_rate': 1.2135e-05, 'epoch': 9.92} 76%|███████▌ | 7582/10000 [29:45:31<9:20:02, 13.90s/it] 76%|███████▌ | 7583/10000 [29:45:45<9:19:20, 13.89s/it] {'loss': 0.0051, 'learning_rate': 1.213e-05, 'epoch': 9.93} 76%|███████▌ | 7583/10000 [29:45:45<9:19:20, 13.89s/it] 76%|███████▌ | 7584/10000 [29:45:59<9:20:03, 13.91s/it] {'loss': 0.0048, 'learning_rate': 1.2125e-05, 'epoch': 9.93} 76%|███████▌ | 7584/10000 [29:45:59<9:20:03, 13.91s/it] 76%|███████▌ | 7585/10000 [29:46:13<9:21:13, 13.94s/it] {'loss': 0.0055, 'learning_rate': 1.2120000000000001e-05, 'epoch': 9.93} 76%|███████▌ | 7585/10000 [29:46:13<9:21:13, 13.94s/it] 76%|███████▌ | 7586/10000 [29:46:27<9:20:07, 13.92s/it] {'loss': 0.0059, 'learning_rate': 1.2115e-05, 'epoch': 9.93} 76%|███████▌ | 7586/10000 [29:46:27<9:20:07, 13.92s/it] 76%|███████▌ | 7587/10000 [29:46:41<9:21:45, 13.97s/it] 
{'loss': 0.0066, 'learning_rate': 1.2110000000000001e-05, 'epoch': 9.93} 76%|███████▌ | 7587/10000 [29:46:41<9:21:45, 13.97s/it] 76%|███████▌ | 7588/10000 [29:46:55<9:20:52, 13.95s/it] {'loss': 0.0063, 'learning_rate': 1.2105000000000002e-05, 'epoch': 9.93} 76%|███████▌ | 7588/10000 [29:46:55<9:20:52, 13.95s/it] 76%|███████▌ | 7589/10000 [29:47:09<9:20:16, 13.94s/it] {'loss': 0.0077, 'learning_rate': 1.2100000000000001e-05, 'epoch': 9.93} 76%|███████▌ | 7589/10000 [29:47:09<9:20:16, 13.94s/it] 76%|███████▌ | 7590/10000 [29:47:23<9:20:20, 13.95s/it] {'loss': 0.0053, 'learning_rate': 1.2095e-05, 'epoch': 9.93} 76%|███████▌ | 7590/10000 [29:47:23<9:20:20, 13.95s/it] 76%|███████▌ | 7591/10000 [29:47:37<9:20:52, 13.97s/it] {'loss': 0.005, 'learning_rate': 1.209e-05, 'epoch': 9.94} 76%|███████▌ | 7591/10000 [29:47:37<9:20:52, 13.97s/it] 76%|███████▌ | 7592/10000 [29:47:51<9:19:58, 13.95s/it] {'loss': 0.0053, 'learning_rate': 1.2085e-05, 'epoch': 9.94} 76%|███████▌ | 7592/10000 [29:47:51<9:19:58, 13.95s/it] 76%|███████▌ | 7593/10000 [29:48:04<9:18:30, 13.92s/it] {'loss': 0.0055, 'learning_rate': 1.2080000000000001e-05, 'epoch': 9.94} 76%|███████▌ | 7593/10000 [29:48:05<9:18:30, 13.92s/it] 76%|███████▌ | 7594/10000 [29:48:18<9:18:24, 13.93s/it] {'loss': 0.0053, 'learning_rate': 1.2075e-05, 'epoch': 9.94} 76%|███████▌ | 7594/10000 [29:48:18<9:18:24, 13.93s/it] 76%|███████▌ | 7595/10000 [29:48:32<9:17:58, 13.92s/it] {'loss': 0.0044, 'learning_rate': 1.2070000000000001e-05, 'epoch': 9.94} 76%|███████▌ | 7595/10000 [29:48:32<9:17:58, 13.92s/it] 76%|███████▌ | 7596/10000 [29:48:46<9:18:01, 13.93s/it] {'loss': 0.0062, 'learning_rate': 1.2065e-05, 'epoch': 9.94} 76%|███████▌ | 7596/10000 [29:48:46<9:18:01, 13.93s/it] 76%|███████▌ | 7597/10000 [29:49:00<9:16:03, 13.88s/it] {'loss': 0.0071, 'learning_rate': 1.206e-05, 'epoch': 9.94} 76%|███████▌ | 7597/10000 [29:49:00<9:16:03, 13.88s/it] 76%|███████▌ | 7598/10000 [29:49:14<9:16:46, 13.91s/it] {'loss': 0.005, 'learning_rate': 
1.2055e-05, 'epoch': 9.95} 76%|███████▌ | 7598/10000 [29:49:14<9:16:46, 13.91s/it] 76%|███████▌ | 7599/10000 [29:49:28<9:16:48, 13.91s/it] {'loss': 0.006, 'learning_rate': 1.205e-05, 'epoch': 9.95} 76%|███████▌ | 7599/10000 [29:49:28<9:16:48, 13.91s/it] 76%|███████▌ | 7600/10000 [29:49:42<9:17:36, 13.94s/it] {'loss': 0.0138, 'learning_rate': 1.2045e-05, 'epoch': 9.95} 76%|███████▌ | 7600/10000 [29:49:42<9:17:36, 13.94s/it] 76%|███████▌ | 7601/10000 [29:49:56<9:17:05, 13.93s/it] {'loss': 0.0049, 'learning_rate': 1.204e-05, 'epoch': 9.95} 76%|███████▌ | 7601/10000 [29:49:56<9:17:05, 13.93s/it] 76%|███████▌ | 7602/10000 [29:50:10<9:18:44, 13.98s/it] {'loss': 0.0059, 'learning_rate': 1.2035e-05, 'epoch': 9.95} 76%|███████▌ | 7602/10000 [29:50:10<9:18:44, 13.98s/it] 76%|███████▌ | 7603/10000 [29:50:24<9:18:07, 13.97s/it] {'loss': 0.0058, 'learning_rate': 1.2030000000000002e-05, 'epoch': 9.95} 76%|███████▌ | 7603/10000 [29:50:24<9:18:07, 13.97s/it] 76%|███████▌ | 7604/10000 [29:50:38<9:18:12, 13.98s/it] {'loss': 0.0054, 'learning_rate': 1.2025000000000001e-05, 'epoch': 9.95} 76%|███████▌ | 7604/10000 [29:50:38<9:18:12, 13.98s/it] 76%|███████▌ | 7605/10000 [29:50:52<9:17:13, 13.96s/it] {'loss': 0.0059, 'learning_rate': 1.202e-05, 'epoch': 9.95} 76%|███████▌ | 7605/10000 [29:50:52<9:17:13, 13.96s/it] 76%|███████▌ | 7606/10000 [29:51:06<9:16:16, 13.94s/it] {'loss': 0.0063, 'learning_rate': 1.2015000000000001e-05, 'epoch': 9.96} 76%|███████▌ | 7606/10000 [29:51:06<9:16:16, 13.94s/it] 76%|███████▌ | 7607/10000 [29:51:20<9:14:21, 13.90s/it] {'loss': 0.0057, 'learning_rate': 1.201e-05, 'epoch': 9.96} 76%|███████▌ | 7607/10000 [29:51:20<9:14:21, 13.90s/it] 76%|███████▌ | 7608/10000 [29:51:33<9:13:58, 13.90s/it] {'loss': 0.0049, 'learning_rate': 1.2005000000000001e-05, 'epoch': 9.96} 76%|███████▌ | 7608/10000 [29:51:33<9:13:58, 13.90s/it] 76%|███████▌ | 7609/10000 [29:51:47<9:14:05, 13.90s/it] {'loss': 0.0075, 'learning_rate': 1.2e-05, 'epoch': 9.96} 76%|███████▌ | 7609/10000 
[29:51:47<9:14:05, 13.90s/it] 76%|███████▌ | 7610/10000 [29:52:01<9:14:14, 13.91s/it] {'loss': 0.0053, 'learning_rate': 1.1995000000000001e-05, 'epoch': 9.96} 76%|███████▌ | 7610/10000 [29:52:01<9:14:14, 13.91s/it] 76%|███████▌ | 7611/10000 [29:52:15<9:12:38, 13.88s/it] {'loss': 0.0048, 'learning_rate': 1.199e-05, 'epoch': 9.96} 76%|███████▌ | 7611/10000 [29:52:15<9:12:38, 13.88s/it] 76%|███████▌ | 7612/10000 [29:52:29<9:11:45, 13.86s/it] {'loss': 0.0072, 'learning_rate': 1.1985e-05, 'epoch': 9.96} 76%|███████▌ | 7612/10000 [29:52:29<9:11:45, 13.86s/it] 76%|███████▌ | 7613/10000 [29:52:43<9:13:43, 13.92s/it] {'loss': 0.0074, 'learning_rate': 1.198e-05, 'epoch': 9.96} 76%|███████▌ | 7613/10000 [29:52:43<9:13:43, 13.92s/it] 76%|███████▌ | 7614/10000 [29:52:57<9:12:30, 13.89s/it] {'loss': 0.0053, 'learning_rate': 1.1975e-05, 'epoch': 9.97} 76%|███████▌ | 7614/10000 [29:52:57<9:12:30, 13.89s/it] 76%|███████▌ | 7615/10000 [29:53:11<9:11:49, 13.88s/it] {'loss': 0.0078, 'learning_rate': 1.197e-05, 'epoch': 9.97} 76%|███████▌ | 7615/10000 [29:53:11<9:11:49, 13.88s/it] 76%|███████▌ | 7616/10000 [29:53:25<9:11:40, 13.88s/it] {'loss': 0.0059, 'learning_rate': 1.1965000000000001e-05, 'epoch': 9.97} 76%|███████▌ | 7616/10000 [29:53:25<9:11:40, 13.88s/it] 76%|███████▌ | 7617/10000 [29:53:39<9:13:02, 13.92s/it] {'loss': 0.0042, 'learning_rate': 1.196e-05, 'epoch': 9.97} 76%|███████▌ | 7617/10000 [29:53:39<9:13:02, 13.92s/it] 76%|███████▌ | 7618/10000 [29:53:52<9:12:08, 13.91s/it] {'loss': 0.0078, 'learning_rate': 1.1955000000000002e-05, 'epoch': 9.97} 76%|███████▌ | 7618/10000 [29:53:52<9:12:08, 13.91s/it] 76%|███████▌ | 7619/10000 [29:54:06<9:11:48, 13.91s/it] {'loss': 0.007, 'learning_rate': 1.195e-05, 'epoch': 9.97} 76%|███████▌ | 7619/10000 [29:54:06<9:11:48, 13.91s/it] 76%|███████▌ | 7620/10000 [29:54:20<9:10:01, 13.87s/it] {'loss': 0.0051, 'learning_rate': 1.1945e-05, 'epoch': 9.97} 76%|███████▌ | 7620/10000 [29:54:20<9:10:01, 13.87s/it] 76%|███████▌ | 7621/10000 
[29:54:34<9:11:12, 13.90s/it] {'loss': 0.0062, 'learning_rate': 1.1940000000000001e-05, 'epoch': 9.98} 76%|███████▌ | 7621/10000 [29:54:34<9:11:12, 13.90s/it] 76%|███████▌ | 7622/10000 [29:54:48<9:11:34, 13.92s/it] {'loss': 0.007, 'learning_rate': 1.1935e-05, 'epoch': 9.98} 76%|███████▌ | 7622/10000 [29:54:48<9:11:34, 13.92s/it] 76%|███████▌ | 7623/10000 [29:55:02<9:11:53, 13.93s/it] {'loss': 0.008, 'learning_rate': 1.1930000000000001e-05, 'epoch': 9.98} 76%|███████▌ | 7623/10000 [29:55:02<9:11:53, 13.93s/it] 76%|███████▌ | 7624/10000 [29:55:16<9:12:26, 13.95s/it] {'loss': 0.0089, 'learning_rate': 1.1925e-05, 'epoch': 9.98} 76%|███████▌ | 7624/10000 [29:55:16<9:12:26, 13.95s/it] 76%|███████▋ | 7625/10000 [29:55:30<9:13:10, 13.98s/it] {'loss': 0.0058, 'learning_rate': 1.1920000000000001e-05, 'epoch': 9.98} 76%|███████▋ | 7625/10000 [29:55:30<9:13:10, 13.98s/it] 76%|███████▋ | 7626/10000 [29:55:44<9:12:57, 13.98s/it] {'loss': 0.0088, 'learning_rate': 1.1915000000000002e-05, 'epoch': 9.98} 76%|███████▋ | 7626/10000 [29:55:44<9:12:57, 13.98s/it] 76%|███████▋ | 7627/10000 [29:55:58<9:14:13, 14.01s/it] {'loss': 0.0055, 'learning_rate': 1.1910000000000001e-05, 'epoch': 9.98} 76%|███████▋ | 7627/10000 [29:55:58<9:14:13, 14.01s/it] 76%|███████▋ | 7628/10000 [29:56:12<9:12:40, 13.98s/it] {'loss': 0.0053, 'learning_rate': 1.1905e-05, 'epoch': 9.98} 76%|███████▋ | 7628/10000 [29:56:12<9:12:40, 13.98s/it] 76%|███████▋ | 7629/10000 [29:56:26<9:11:29, 13.96s/it] {'loss': 0.006, 'learning_rate': 1.19e-05, 'epoch': 9.99} 76%|███████▋ | 7629/10000 [29:56:26<9:11:29, 13.96s/it] 76%|███████▋ | 7630/10000 [29:56:40<9:10:58, 13.95s/it] {'loss': 0.0051, 'learning_rate': 1.1895e-05, 'epoch': 9.99} 76%|███████▋ | 7630/10000 [29:56:40<9:10:58, 13.95s/it] 76%|███████▋ | 7631/10000 [29:56:54<9:09:35, 13.92s/it] {'loss': 0.0057, 'learning_rate': 1.1890000000000001e-05, 'epoch': 9.99} 76%|███████▋ | 7631/10000 [29:56:54<9:09:35, 13.92s/it] 76%|███████▋ | 7632/10000 [29:57:08<9:09:33, 13.92s/it] 
{'loss': 0.0046, 'learning_rate': 1.1885e-05, 'epoch': 9.99} 76%|███████▋ | 7632/10000 [29:57:08<9:09:33, 13.92s/it] 76%|███████▋ | 7633/10000 [29:57:22<9:09:54, 13.94s/it] {'loss': 0.0045, 'learning_rate': 1.1880000000000001e-05, 'epoch': 9.99} 76%|███████▋ | 7633/10000 [29:57:22<9:09:54, 13.94s/it] 76%|███████▋ | 7634/10000 [29:57:36<9:09:19, 13.93s/it] {'loss': 0.0055, 'learning_rate': 1.1875e-05, 'epoch': 9.99} 76%|███████▋ | 7634/10000 [29:57:36<9:09:19, 13.93s/it] 76%|███████▋ | 7635/10000 [29:57:49<9:08:39, 13.92s/it] {'loss': 0.0061, 'learning_rate': 1.187e-05, 'epoch': 9.99} 76%|███████▋ | 7635/10000 [29:57:49<9:08:39, 13.92s/it] 76%|███████▋ | 7636/10000 [29:58:03<9:10:24, 13.97s/it] {'loss': 0.005, 'learning_rate': 1.1865e-05, 'epoch': 9.99} 76%|███████▋ | 7636/10000 [29:58:04<9:10:24, 13.97s/it] 76%|███████▋ | 7637/10000 [29:58:18<9:11:34, 14.01s/it] {'loss': 0.0085, 'learning_rate': 1.186e-05, 'epoch': 10.0} 76%|███████▋ | 7637/10000 [29:58:18<9:11:34, 14.01s/it] 76%|███████▋ | 7638/10000 [29:58:32<9:11:52, 14.02s/it] {'loss': 0.0053, 'learning_rate': 1.1855e-05, 'epoch': 10.0} 76%|███████▋ | 7638/10000 [29:58:32<9:11:52, 14.02s/it] 76%|███████▋ | 7639/10000 [29:58:46<9:12:14, 14.03s/it] {'loss': 0.0049, 'learning_rate': 1.185e-05, 'epoch': 10.0} 76%|███████▋ | 7639/10000 [29:58:46<9:12:14, 14.03s/it] 76%|███████▋ | 7640/10000 [29:58:58<8:54:24, 13.59s/it] {'loss': 0.0077, 'learning_rate': 1.1845000000000001e-05, 'epoch': 10.0} 76%|███████▋ | 7640/10000 [29:58:58<8:54:24, 13.59s/it] 76%|███████▋ | 7641/10000 [29:59:12<8:56:54, 13.66s/it] {'loss': 0.0042, 'learning_rate': 1.1840000000000002e-05, 'epoch': 10.0} 76%|███████▋ | 7641/10000 [29:59:12<8:56:54, 13.66s/it] 76%|███████▋ | 7642/10000 [29:59:26<8:59:44, 13.73s/it] {'loss': 0.0035, 'learning_rate': 1.1835000000000001e-05, 'epoch': 10.0} 76%|███████▋ | 7642/10000 [29:59:26<8:59:44, 13.73s/it] 76%|███████▋ | 7643/10000 [29:59:40<9:02:01, 13.80s/it] {'loss': 0.0037, 'learning_rate': 1.183e-05, 
'epoch': 10.0} 76%|███████▋ | 7643/10000 [29:59:40<9:02:01, 13.80s/it] 76%|███████▋ | 7644/10000 [29:59:54<9:04:13, 13.86s/it] {'loss': 0.0034, 'learning_rate': 1.1825e-05, 'epoch': 10.01} 76%|███████▋ | 7644/10000 [29:59:54<9:04:13, 13.86s/it] 76%|███████▋ | 7645/10000 [30:00:08<9:05:20, 13.89s/it] {'loss': 0.0042, 'learning_rate': 1.182e-05, 'epoch': 10.01} 76%|███████▋ | 7645/10000 [30:00:08<9:05:20, 13.89s/it] 76%|███████▋ | 7646/10000 [30:00:22<9:05:33, 13.91s/it] {'loss': 0.0035, 'learning_rate': 1.1815000000000001e-05, 'epoch': 10.01} 76%|███████▋ | 7646/10000 [30:00:22<9:05:33, 13.91s/it] 76%|███████▋ | 7647/10000 [30:00:36<9:05:41, 13.91s/it] {'loss': 0.0028, 'learning_rate': 1.181e-05, 'epoch': 10.01} 76%|███████▋ | 7647/10000 [30:00:36<9:05:41, 13.91s/it] 76%|███████▋ | 7648/10000 [30:00:50<9:06:11, 13.93s/it] {'loss': 0.003, 'learning_rate': 1.1805000000000001e-05, 'epoch': 10.01} 76%|███████▋ | 7648/10000 [30:00:50<9:06:11, 13.93s/it] 76%|███████▋ | 7649/10000 [30:01:04<9:04:55, 13.91s/it] {'loss': 0.0036, 'learning_rate': 1.18e-05, 'epoch': 10.01} 76%|███████▋ | 7649/10000 [30:01:04<9:04:55, 13.91s/it] 76%|███████▋ | 7650/10000 [30:01:17<9:04:26, 13.90s/it] {'loss': 0.0034, 'learning_rate': 1.1795e-05, 'epoch': 10.01} 76%|███████▋ | 7650/10000 [30:01:18<9:04:26, 13.90s/it] 77%|███████▋ | 7651/10000 [30:01:31<9:04:35, 13.91s/it] {'loss': 0.0045, 'learning_rate': 1.179e-05, 'epoch': 10.01} 77%|███████▋ | 7651/10000 [30:01:31<9:04:35, 13.91s/it] 77%|███████▋ | 7652/10000 [30:01:45<9:02:20, 13.86s/it] {'loss': 0.003, 'learning_rate': 1.1785e-05, 'epoch': 10.02} 77%|███████▋ | 7652/10000 [30:01:45<9:02:20, 13.86s/it] 77%|███████▋ | 7653/10000 [30:01:59<9:03:07, 13.88s/it] {'loss': 0.0041, 'learning_rate': 1.178e-05, 'epoch': 10.02} 77%|███████▋ | 7653/10000 [30:01:59<9:03:07, 13.88s/it] 77%|███████▋ | 7654/10000 [30:02:13<9:02:37, 13.88s/it] {'loss': 0.004, 'learning_rate': 1.1775e-05, 'epoch': 10.02} 77%|███████▋ | 7654/10000 [30:02:13<9:02:37, 13.88s/it] 
77%|███████▋ | 7655/10000 [30:02:27<9:02:23, 13.88s/it] {'loss': 0.0044, 'learning_rate': 1.177e-05, 'epoch': 10.02} 77%|███████▋ | 7655/10000 [30:02:27<9:02:23, 13.88s/it] 77%|███████▋ | 7656/10000 [30:02:41<9:01:30, 13.86s/it] {'loss': 0.0032, 'learning_rate': 1.1765000000000002e-05, 'epoch': 10.02} 77%|███████▋ | 7656/10000 [30:02:41<9:01:30, 13.86s/it] 77%|███████▋ | 7657/10000 [30:02:54<9:01:08, 13.86s/it] {'loss': 0.005, 'learning_rate': 1.1760000000000001e-05, 'epoch': 10.02} 77%|███████▋ | 7657/10000 [30:02:55<9:01:08, 13.86s/it] 77%|███████▋ | 7658/10000 [30:03:08<9:00:59, 13.86s/it] {'loss': 0.0035, 'learning_rate': 1.1755e-05, 'epoch': 10.02} 77%|███████▋ | 7658/10000 [30:03:08<9:00:59, 13.86s/it] 77%|███████▋ | 7659/10000 [30:03:22<8:59:57, 13.84s/it] {'loss': 0.0046, 'learning_rate': 1.175e-05, 'epoch': 10.02} 77%|███████▋ | 7659/10000 [30:03:22<8:59:57, 13.84s/it] 77%|███████▋ | 7660/10000 [30:03:36<8:58:48, 13.82s/it] {'loss': 0.0039, 'learning_rate': 1.1745e-05, 'epoch': 10.03} 77%|███████▋ | 7660/10000 [30:03:36<8:58:48, 13.82s/it] 77%|███████▋ | 7661/10000 [30:03:50<9:01:23, 13.89s/it] {'loss': 0.0043, 'learning_rate': 1.1740000000000001e-05, 'epoch': 10.03} 77%|███████▋ | 7661/10000 [30:03:50<9:01:23, 13.89s/it] 77%|███████▋ | 7662/10000 [30:04:04<9:01:54, 13.91s/it] {'loss': 0.003, 'learning_rate': 1.1735e-05, 'epoch': 10.03} 77%|███████▋ | 7662/10000 [30:04:04<9:01:54, 13.91s/it] 77%|███████▋ | 7663/10000 [30:04:18<9:02:08, 13.92s/it] {'loss': 0.0031, 'learning_rate': 1.1730000000000001e-05, 'epoch': 10.03} 77%|███████▋ | 7663/10000 [30:04:18<9:02:08, 13.92s/it] 77%|███████▋ | 7664/10000 [30:04:32<9:02:33, 13.94s/it] {'loss': 0.0029, 'learning_rate': 1.1725e-05, 'epoch': 10.03} 77%|███████▋ | 7664/10000 [30:04:32<9:02:33, 13.94s/it] 77%|███████▋ | 7665/10000 [30:04:46<9:01:36, 13.92s/it] {'loss': 0.0039, 'learning_rate': 1.172e-05, 'epoch': 10.03} 77%|███████▋ | 7665/10000 [30:04:46<9:01:36, 13.92s/it] 77%|███████▋ | 7666/10000 
[30:05:00<9:01:16, 13.91s/it] {'loss': 0.0031, 'learning_rate': 1.1715e-05, 'epoch': 10.03} 77%|███████▋ | 7666/10000 [30:05:00<9:01:16, 13.91s/it] 77%|███████▋ | 7667/10000 [30:05:13<9:00:28, 13.90s/it] {'loss': 0.0021, 'learning_rate': 1.171e-05, 'epoch': 10.04} 77%|███████▋ | 7667/10000 [30:05:14<9:00:28, 13.90s/it] 77%|███████▋ | 7668/10000 [30:05:27<8:59:06, 13.87s/it] {'loss': 0.0045, 'learning_rate': 1.1705e-05, 'epoch': 10.04} 77%|███████▋ | 7668/10000 [30:05:27<8:59:06, 13.87s/it] 77%|███████▋ | 7669/10000 [30:05:41<8:59:53, 13.90s/it] {'loss': 0.0048, 'learning_rate': 1.1700000000000001e-05, 'epoch': 10.04} 77%|███████▋ | 7669/10000 [30:05:41<8:59:53, 13.90s/it] 77%|███████▋ | 7670/10000 [30:05:55<9:01:43, 13.95s/it] {'loss': 0.0031, 'learning_rate': 1.1695e-05, 'epoch': 10.04} 77%|███████▋ | 7670/10000 [30:05:55<9:01:43, 13.95s/it] 77%|███████▋ | 7671/10000 [30:06:09<9:00:57, 13.94s/it] {'loss': 0.0049, 'learning_rate': 1.1690000000000002e-05, 'epoch': 10.04} 77%|███████▋ | 7671/10000 [30:06:09<9:00:57, 13.94s/it] 77%|███████▋ | 7672/10000 [30:06:23<9:00:06, 13.92s/it] {'loss': 0.0031, 'learning_rate': 1.1685e-05, 'epoch': 10.04} 77%|███████▋ | 7672/10000 [30:06:23<9:00:06, 13.92s/it] 77%|███████▋ | 7673/10000 [30:06:37<9:00:06, 13.93s/it] {'loss': 0.0036, 'learning_rate': 1.168e-05, 'epoch': 10.04} 77%|███████▋ | 7673/10000 [30:06:37<9:00:06, 13.93s/it] 77%|███████▋ | 7674/10000 [30:06:51<9:00:35, 13.94s/it] {'loss': 0.0033, 'learning_rate': 1.1675000000000001e-05, 'epoch': 10.04} 77%|███████▋ | 7674/10000 [30:06:51<9:00:35, 13.94s/it] 77%|███████▋ | 7675/10000 [30:07:05<9:01:33, 13.98s/it] {'loss': 0.0034, 'learning_rate': 1.167e-05, 'epoch': 10.05} 77%|███████▋ | 7675/10000 [30:07:05<9:01:33, 13.98s/it] 77%|███████▋ | 7676/10000 [30:07:19<9:00:43, 13.96s/it] {'loss': 0.003, 'learning_rate': 1.1665000000000001e-05, 'epoch': 10.05} 77%|███████▋ | 7676/10000 [30:07:19<9:00:43, 13.96s/it] 77%|███████▋ | 7677/10000 [30:07:33<8:59:12, 13.93s/it] {'loss': 
0.0037, 'learning_rate': 1.166e-05, 'epoch': 10.05} 77%|███████▋ | 7677/10000 [30:07:33<8:59:12, 13.93s/it] 77%|███████▋ | 7678/10000 [30:07:47<9:00:07, 13.96s/it] {'loss': 0.0038, 'learning_rate': 1.1655000000000001e-05, 'epoch': 10.05} 77%|███████▋ | 7678/10000 [30:07:47<9:00:07, 13.96s/it] 77%|███████▋ | 7679/10000 [30:08:01<8:59:47, 13.95s/it] {'loss': 0.0054, 'learning_rate': 1.1650000000000002e-05, 'epoch': 10.05} 77%|███████▋ | 7679/10000 [30:08:01<8:59:47, 13.95s/it] 77%|███████▋ | 7680/10000 [30:08:15<8:59:30, 13.95s/it] {'loss': 0.0033, 'learning_rate': 1.1645000000000001e-05, 'epoch': 10.05} 77%|███████▋ | 7680/10000 [30:08:15<8:59:30, 13.95s/it] 77%|███████▋ | 7681/10000 [30:08:29<8:57:24, 13.90s/it] {'loss': 0.0043, 'learning_rate': 1.164e-05, 'epoch': 10.05} 77%|███████▋ | 7681/10000 [30:08:29<8:57:24, 13.90s/it] 77%|███████▋ | 7682/10000 [30:08:42<8:56:42, 13.89s/it] {'loss': 0.0043, 'learning_rate': 1.1635e-05, 'epoch': 10.05} 77%|███████▋ | 7682/10000 [30:08:43<8:56:42, 13.89s/it] 77%|███████▋ | 7683/10000 [30:08:56<8:56:30, 13.89s/it] {'loss': 0.0057, 'learning_rate': 1.163e-05, 'epoch': 10.06} 77%|███████▋ | 7683/10000 [30:08:56<8:56:30, 13.89s/it] 77%|███████▋ | 7684/10000 [30:09:10<8:57:55, 13.94s/it] {'loss': 0.0032, 'learning_rate': 1.1625000000000001e-05, 'epoch': 10.06} 77%|███████▋ | 7684/10000 [30:09:10<8:57:55, 13.94s/it] 77%|███████▋ | 7685/10000 [30:09:24<8:58:25, 13.95s/it] {'loss': 0.0061, 'learning_rate': 1.162e-05, 'epoch': 10.06} 77%|███████▋ | 7685/10000 [30:09:24<8:58:25, 13.95s/it] 77%|███████▋ | 7686/10000 [30:09:38<8:57:40, 13.94s/it] {'loss': 0.0026, 'learning_rate': 1.1615000000000001e-05, 'epoch': 10.06} 77%|███████▋ | 7686/10000 [30:09:38<8:57:40, 13.94s/it] 77%|███████▋ | 7687/10000 [30:09:52<8:58:20, 13.96s/it] {'loss': 0.0043, 'learning_rate': 1.161e-05, 'epoch': 10.06} 77%|███████▋ | 7687/10000 [30:09:52<8:58:20, 13.96s/it] 77%|███████▋ | 7688/10000 [30:10:06<8:56:39, 13.93s/it] {'loss': 0.0027, 'learning_rate': 
1.1605e-05, 'epoch': 10.06} 77%|███████▋ | 7688/10000 [30:10:06<8:56:39, 13.93s/it] 77%|███████▋ | 7689/10000 [30:10:20<8:55:03, 13.89s/it] {'loss': 0.0041, 'learning_rate': 1.16e-05, 'epoch': 10.06} 77%|███████▋ | 7689/10000 [30:10:20<8:55:03, 13.89s/it] 77%|███████▋ | 7690/10000 [30:10:34<8:57:02, 13.95s/it] {'loss': 0.0029, 'learning_rate': 1.1595e-05, 'epoch': 10.07} 77%|███████▋ | 7690/10000 [30:10:34<8:57:02, 13.95s/it] 77%|███████▋ | 7691/10000 [30:10:48<8:56:21, 13.94s/it] {'loss': 0.0053, 'learning_rate': 1.159e-05, 'epoch': 10.07} 77%|███████▋ | 7691/10000 [30:10:48<8:56:21, 13.94s/it] 77%|███████▋ | 7692/10000 [30:11:02<8:57:10, 13.96s/it] {'loss': 0.0043, 'learning_rate': 1.1585e-05, 'epoch': 10.07} 77%|███████▋ | 7692/10000 [30:11:02<8:57:10, 13.96s/it] 77%|███████▋ | 7693/10000 [30:11:16<8:55:28, 13.93s/it] {'loss': 0.0058, 'learning_rate': 1.1580000000000001e-05, 'epoch': 10.07} 77%|███████▋ | 7693/10000 [30:11:16<8:55:28, 13.93s/it] 77%|███████▋ | 7694/10000 [30:11:30<8:55:21, 13.93s/it] {'loss': 0.0042, 'learning_rate': 1.1575000000000002e-05, 'epoch': 10.07} 77%|███████▋ | 7694/10000 [30:11:30<8:55:21, 13.93s/it] 77%|███████▋ | 7695/10000 [30:11:44<8:54:38, 13.92s/it] {'loss': 0.0028, 'learning_rate': 1.1570000000000001e-05, 'epoch': 10.07} 77%|███████▋ | 7695/10000 [30:11:44<8:54:38, 13.92s/it] 77%|███████▋ | 7696/10000 [30:11:58<8:53:59, 13.91s/it] {'loss': 0.0036, 'learning_rate': 1.1565e-05, 'epoch': 10.07} 77%|███████▋ | 7696/10000 [30:11:58<8:53:59, 13.91s/it] 77%|███████▋ | 7697/10000 [30:12:11<8:52:43, 13.88s/it] {'loss': 0.0079, 'learning_rate': 1.156e-05, 'epoch': 10.07} 77%|███████▋ | 7697/10000 [30:12:11<8:52:43, 13.88s/it] 77%|███████▋ | 7698/10000 [30:12:25<8:53:06, 13.89s/it] {'loss': 0.0044, 'learning_rate': 1.1555e-05, 'epoch': 10.08} 77%|███████▋ | 7698/10000 [30:12:25<8:53:06, 13.89s/it] 77%|███████▋ | 7699/10000 [30:12:39<8:53:23, 13.91s/it] {'loss': 0.0041, 'learning_rate': 1.1550000000000001e-05, 'epoch': 10.08} 77%|███████▋ 
| 7699/10000 [30:12:39<8:53:23, 13.91s/it] 77%|███████▋ | 7700/10000 [30:12:53<8:51:01, 13.85s/it] {'loss': 0.0049, 'learning_rate': 1.1545e-05, 'epoch': 10.08} 77%|███████▋ | 7700/10000 [30:12:53<8:51:01, 13.85s/it] 77%|███████▋ | 7701/10000 [30:13:07<8:51:06, 13.86s/it] {'loss': 0.0039, 'learning_rate': 1.1540000000000001e-05, 'epoch': 10.08} 77%|███████▋ | 7701/10000 [30:13:07<8:51:06, 13.86s/it] 77%|███████▋ | 7702/10000 [30:13:21<8:51:35, 13.88s/it] {'loss': 0.0036, 'learning_rate': 1.1535e-05, 'epoch': 10.08} 77%|███████▋ | 7702/10000 [30:13:21<8:51:35, 13.88s/it] 77%|███████▋ | 7703/10000 [30:13:35<8:51:55, 13.89s/it] {'loss': 0.0043, 'learning_rate': 1.153e-05, 'epoch': 10.08} 77%|███████▋ | 7703/10000 [30:13:35<8:51:55, 13.89s/it] 77%|███████▋ | 7704/10000 [30:13:49<8:52:18, 13.91s/it] {'loss': 0.003, 'learning_rate': 1.1525e-05, 'epoch': 10.08} 77%|███████▋ | 7704/10000 [30:13:49<8:52:18, 13.91s/it] 77%|███████▋ | 7705/10000 [30:14:03<8:51:57, 13.91s/it] {'loss': 0.0031, 'learning_rate': 1.152e-05, 'epoch': 10.09} 77%|███████▋ | 7705/10000 [30:14:03<8:51:57, 13.91s/it] 77%|███████▋ | 7706/10000 [30:14:16<8:52:34, 13.93s/it] {'loss': 0.0062, 'learning_rate': 1.1515e-05, 'epoch': 10.09} 77%|███████▋ | 7706/10000 [30:14:17<8:52:34, 13.93s/it] 77%|███████▋ | 7707/10000 [30:14:30<8:52:09, 13.92s/it] {'loss': 0.0025, 'learning_rate': 1.151e-05, 'epoch': 10.09} 77%|███████▋ | 7707/10000 [30:14:30<8:52:09, 13.92s/it] 77%|███████▋ | 7708/10000 [30:14:44<8:51:08, 13.90s/it] {'loss': 0.0031, 'learning_rate': 1.1505e-05, 'epoch': 10.09} 77%|███████▋ | 7708/10000 [30:14:44<8:51:08, 13.90s/it] 77%|███████▋ | 7709/10000 [30:14:58<8:50:54, 13.90s/it] {'loss': 0.0034, 'learning_rate': 1.1500000000000002e-05, 'epoch': 10.09} 77%|███████▋ | 7709/10000 [30:14:58<8:50:54, 13.90s/it] 77%|███████▋ | 7710/10000 [30:15:12<8:49:43, 13.88s/it] {'loss': 0.0048, 'learning_rate': 1.1495000000000001e-05, 'epoch': 10.09} 77%|███████▋ | 7710/10000 [30:15:12<8:49:43, 13.88s/it] 
77%|███████▋ | 7711/10000 [30:15:26<8:49:01, 13.87s/it] {'loss': 0.0042, 'learning_rate': 1.149e-05, 'epoch': 10.09} 77%|███████▋ | 7711/10000 [30:15:26<8:49:01, 13.87s/it] 77%|███████▋ | 7712/10000 [30:15:40<8:49:10, 13.88s/it] {'loss': 0.0028, 'learning_rate': 1.1485e-05, 'epoch': 10.09} 77%|███████▋ | 7712/10000 [30:15:40<8:49:10, 13.88s/it] 77%|███████▋ | 7713/10000 [30:15:54<8:49:53, 13.90s/it] {'loss': 0.0043, 'learning_rate': 1.148e-05, 'epoch': 10.1} 77%|███████▋ | 7713/10000 [30:15:54<8:49:53, 13.90s/it] 77%|███████▋ | 7714/10000 [30:16:08<8:49:41, 13.90s/it] {'loss': 0.0048, 'learning_rate': 1.1475000000000001e-05, 'epoch': 10.1} 77%|███████▋ | 7714/10000 [30:16:08<8:49:41, 13.90s/it] 77%|███████▋ | 7715/10000 [30:16:21<8:48:42, 13.88s/it] {'loss': 0.0031, 'learning_rate': 1.147e-05, 'epoch': 10.1} 77%|███████▋ | 7715/10000 [30:16:21<8:48:42, 13.88s/it] 77%|███████▋ | 7716/10000 [30:16:35<8:47:41, 13.86s/it] {'loss': 0.0043, 'learning_rate': 1.1465000000000001e-05, 'epoch': 10.1} 77%|███████▋ | 7716/10000 [30:16:35<8:47:41, 13.86s/it] 77%|███████▋ | 7717/10000 [30:16:49<8:49:16, 13.91s/it] {'loss': 0.0033, 'learning_rate': 1.146e-05, 'epoch': 10.1} 77%|███████▋ | 7717/10000 [30:16:49<8:49:16, 13.91s/it] 77%|███████▋ | 7718/10000 [30:17:03<8:49:40, 13.93s/it] {'loss': 0.0038, 'learning_rate': 1.1455000000000001e-05, 'epoch': 10.1} 77%|███████▋ | 7718/10000 [30:17:03<8:49:40, 13.93s/it] 77%|███████▋ | 7719/10000 [30:17:17<8:49:20, 13.92s/it] {'loss': 0.0034, 'learning_rate': 1.145e-05, 'epoch': 10.1} 77%|███████▋ | 7719/10000 [30:17:17<8:49:20, 13.92s/it] 77%|███████▋ | 7720/10000 [30:17:31<8:48:50, 13.92s/it] {'loss': 0.0039, 'learning_rate': 1.1445e-05, 'epoch': 10.1} 77%|███████▋ | 7720/10000 [30:17:31<8:48:50, 13.92s/it] 77%|███████▋ | 7721/10000 [30:17:45<8:49:44, 13.95s/it] {'loss': 0.0029, 'learning_rate': 1.144e-05, 'epoch': 10.11} 77%|███████▋ | 7721/10000 [30:17:45<8:49:44, 13.95s/it] 77%|███████▋ | 7722/10000 [30:17:59<8:49:31, 13.95s/it] 
{'loss': 0.0041, 'learning_rate': 1.1435e-05, 'epoch': 10.11} 77%|███████▋ | 7722/10000 [30:17:59<8:49:31, 13.95s/it] 77%|███████▋ | 7723/10000 [30:18:13<8:49:28, 13.95s/it] {'loss': 0.0041, 'learning_rate': 1.143e-05, 'epoch': 10.11} 77%|███████▋ | 7723/10000 [30:18:13<8:49:28, 13.95s/it] 77%|███████▋ | 7724/10000 [30:18:27<8:47:53, 13.92s/it] {'loss': 0.0027, 'learning_rate': 1.1425000000000002e-05, 'epoch': 10.11} 77%|███████▋ | 7724/10000 [30:18:27<8:47:53, 13.92s/it] 77%|███████▋ | 7725/10000 [30:18:41<8:48:19, 13.93s/it] {'loss': 0.0036, 'learning_rate': 1.142e-05, 'epoch': 10.11} 77%|███████▋ | 7725/10000 [30:18:41<8:48:19, 13.93s/it] 77%|███████▋ | 7726/10000 [30:18:55<8:48:29, 13.94s/it] {'loss': 0.0027, 'learning_rate': 1.1415e-05, 'epoch': 10.11} 77%|███████▋ | 7726/10000 [30:18:55<8:48:29, 13.94s/it] 77%|███████▋ | 7727/10000 [30:19:09<8:47:50, 13.93s/it] {'loss': 0.0024, 'learning_rate': 1.141e-05, 'epoch': 10.11} 77%|███████▋ | 7727/10000 [30:19:09<8:47:50, 13.93s/it] 77%|███████▋ | 7728/10000 [30:19:23<8:47:29, 13.93s/it] {'loss': 0.0027, 'learning_rate': 1.1405e-05, 'epoch': 10.12} 77%|███████▋ | 7728/10000 [30:19:23<8:47:29, 13.93s/it] 77%|███████▋ | 7729/10000 [30:19:37<8:47:16, 13.93s/it] {'loss': 0.0025, 'learning_rate': 1.1400000000000001e-05, 'epoch': 10.12} 77%|███████▋ | 7729/10000 [30:19:37<8:47:16, 13.93s/it] 77%|███████▋ | 7730/10000 [30:19:50<8:45:29, 13.89s/it] {'loss': 0.0051, 'learning_rate': 1.1395e-05, 'epoch': 10.12} 77%|███████▋ | 7730/10000 [30:19:50<8:45:29, 13.89s/it] 77%|███████▋ | 7731/10000 [30:20:04<8:46:12, 13.91s/it] {'loss': 0.0029, 'learning_rate': 1.1390000000000001e-05, 'epoch': 10.12} 77%|███████▋ | 7731/10000 [30:20:04<8:46:12, 13.91s/it] 77%|███████▋ | 7732/10000 [30:20:18<8:45:18, 13.90s/it] {'loss': 0.0061, 'learning_rate': 1.1385000000000002e-05, 'epoch': 10.12} 77%|███████▋ | 7732/10000 [30:20:18<8:45:18, 13.90s/it] 77%|███████▋ | 7733/10000 [30:20:32<8:45:29, 13.91s/it] {'loss': 0.0039, 'learning_rate': 
1.1380000000000001e-05, 'epoch': 10.12} 77%|███████▋ | 7733/10000 [30:20:32<8:45:29, 13.91s/it] 77%|███████▋ | 7734/10000 [30:20:46<8:45:51, 13.92s/it] {'loss': 0.0054, 'learning_rate': 1.1375e-05, 'epoch': 10.12} 77%|███████▋ | 7734/10000 [30:20:46<8:45:51, 13.92s/it] 77%|███████▋ | 7735/10000 [30:21:00<8:46:29, 13.95s/it] {'loss': 0.0036, 'learning_rate': 1.137e-05, 'epoch': 10.12} 77%|███████▋ | 7735/10000 [30:21:00<8:46:29, 13.95s/it] 77%|███████▋ | 7736/10000 [30:21:14<8:47:32, 13.98s/it] {'loss': 0.0028, 'learning_rate': 1.1365e-05, 'epoch': 10.13} 77%|███████▋ | 7736/10000 [30:21:14<8:47:32, 13.98s/it] 77%|███████▋ | 7737/10000 [30:21:28<8:47:49, 13.99s/it] {'loss': 0.0039, 'learning_rate': 1.1360000000000001e-05, 'epoch': 10.13} 77%|███████▋ | 7737/10000 [30:21:28<8:47:49, 13.99s/it] 77%|███████▋ | 7738/10000 [30:21:42<8:47:23, 13.99s/it] {'loss': 0.0044, 'learning_rate': 1.1355e-05, 'epoch': 10.13} 77%|███████▋ | 7738/10000 [30:21:42<8:47:23, 13.99s/it] 77%|███████▋ | 7739/10000 [30:21:56<8:46:36, 13.97s/it] {'loss': 0.0051, 'learning_rate': 1.1350000000000001e-05, 'epoch': 10.13} 77%|███████▋ | 7739/10000 [30:21:56<8:46:36, 13.97s/it] 77%|███████▋ | 7740/10000 [30:22:10<8:46:17, 13.97s/it] {'loss': 0.0023, 'learning_rate': 1.1345e-05, 'epoch': 10.13} 77%|███████▋ | 7740/10000 [30:22:10<8:46:17, 13.97s/it] 77%|███████▋ | 7741/10000 [30:22:24<8:44:15, 13.92s/it] {'loss': 0.0021, 'learning_rate': 1.134e-05, 'epoch': 10.13} 77%|███████▋ | 7741/10000 [30:22:24<8:44:15, 13.92s/it] 77%|███████▋ | 7742/10000 [30:22:38<8:44:41, 13.94s/it] {'loss': 0.0029, 'learning_rate': 1.1335e-05, 'epoch': 10.13} 77%|███████▋ | 7742/10000 [30:22:38<8:44:41, 13.94s/it] 77%|███████▋ | 7743/10000 [30:22:52<8:43:53, 13.93s/it] {'loss': 0.0022, 'learning_rate': 1.133e-05, 'epoch': 10.13} 77%|███████▋ | 7743/10000 [30:22:52<8:43:53, 13.93s/it] 77%|███████▋ | 7744/10000 [30:23:06<8:44:50, 13.96s/it] {'loss': 0.0052, 'learning_rate': 1.1325e-05, 'epoch': 10.14} 77%|███████▋ | 
7744/10000 [30:23:06<8:44:50, 13.96s/it] 77%|███████▋ | 7745/10000 [30:23:20<8:43:43, 13.93s/it] {'loss': 0.0047, 'learning_rate': 1.132e-05, 'epoch': 10.14} 77%|███████▋ | 7745/10000 [30:23:20<8:43:43, 13.93s/it] 77%|███████▋ | 7746/10000 [30:23:34<8:43:29, 13.94s/it] {'loss': 0.003, 'learning_rate': 1.1315000000000001e-05, 'epoch': 10.14} 77%|███████▋ | 7746/10000 [30:23:34<8:43:29, 13.94s/it] 77%|███████▋ | 7747/10000 [30:23:47<8:43:03, 13.93s/it] {'loss': 0.0082, 'learning_rate': 1.1310000000000002e-05, 'epoch': 10.14} 77%|███████▋ | 7747/10000 [30:23:47<8:43:03, 13.93s/it] 77%|███████▋ | 7748/10000 [30:24:01<8:42:53, 13.93s/it] {'loss': 0.0025, 'learning_rate': 1.1305000000000001e-05, 'epoch': 10.14} 77%|███████▋ | 7748/10000 [30:24:01<8:42:53, 13.93s/it] 77%|███████▋ | 7749/10000 [30:24:15<8:43:55, 13.97s/it] {'loss': 0.0038, 'learning_rate': 1.13e-05, 'epoch': 10.14} 77%|███████▋ | 7749/10000 [30:24:16<8:43:55, 13.97s/it] 78%|███████▊ | 7750/10000 [30:24:29<8:42:49, 13.94s/it] {'loss': 0.0037, 'learning_rate': 1.1295e-05, 'epoch': 10.14} 78%|███████▊ | 7750/10000 [30:24:29<8:42:49, 13.94s/it] 78%|███████▊ | 7751/10000 [30:24:43<8:42:50, 13.95s/it] {'loss': 0.0034, 'learning_rate': 1.129e-05, 'epoch': 10.15} 78%|███████▊ | 7751/10000 [30:24:43<8:42:50, 13.95s/it] 78%|███████▊ | 7752/10000 [30:24:57<8:43:28, 13.97s/it] {'loss': 0.0035, 'learning_rate': 1.1285000000000001e-05, 'epoch': 10.15} 78%|███████▊ | 7752/10000 [30:24:57<8:43:28, 13.97s/it] 78%|███████▊ | 7753/10000 [30:25:11<8:44:09, 14.00s/it] {'loss': 0.0042, 'learning_rate': 1.128e-05, 'epoch': 10.15} 78%|███████▊ | 7753/10000 [30:25:11<8:44:09, 14.00s/it] 78%|███████▊ | 7754/10000 [30:25:25<8:42:19, 13.95s/it] {'loss': 0.0046, 'learning_rate': 1.1275000000000001e-05, 'epoch': 10.15} 78%|███████▊ | 7754/10000 [30:25:25<8:42:19, 13.95s/it] 78%|███████▊ | 7755/10000 [30:25:39<8:40:51, 13.92s/it] {'loss': 0.0037, 'learning_rate': 1.127e-05, 'epoch': 10.15} 78%|███████▊ | 7755/10000 [30:25:39<8:40:51, 
13.92s/it] 78%|███████▊ | 7756/10000 [30:25:53<8:40:25, 13.91s/it] {'loss': 0.0049, 'learning_rate': 1.1265e-05, 'epoch': 10.15} 78%|███████▊ | 7756/10000 [30:25:53<8:40:25, 13.91s/it] 78%|███████▊ | 7757/10000 [30:26:07<8:40:26, 13.92s/it] {'loss': 0.004, 'learning_rate': 1.126e-05, 'epoch': 10.15} 78%|███████▊ | 7757/10000 [30:26:07<8:40:26, 13.92s/it] 78%|███████▊ | 7758/10000 [30:26:21<8:40:52, 13.94s/it] {'loss': 0.0022, 'learning_rate': 1.1255e-05, 'epoch': 10.15} 78%|███████▊ | 7758/10000 [30:26:21<8:40:52, 13.94s/it] 78%|███████▊ | 7759/10000 [30:26:35<8:40:37, 13.94s/it] {'loss': 0.0038, 'learning_rate': 1.125e-05, 'epoch': 10.16} 78%|███████▊ | 7759/10000 [30:26:35<8:40:37, 13.94s/it] 78%|███████▊ | 7760/10000 [30:26:49<8:39:20, 13.91s/it] {'loss': 0.003, 'learning_rate': 1.1245e-05, 'epoch': 10.16} 78%|███████▊ | 7760/10000 [30:26:49<8:39:20, 13.91s/it] 78%|███████▊ | 7761/10000 [30:27:02<8:38:05, 13.88s/it] {'loss': 0.0057, 'learning_rate': 1.124e-05, 'epoch': 10.16} 78%|███████▊ | 7761/10000 [30:27:03<8:38:05, 13.88s/it] 78%|███████▊ | 7762/10000 [30:27:16<8:38:05, 13.89s/it] {'loss': 0.0056, 'learning_rate': 1.1235000000000002e-05, 'epoch': 10.16} 78%|███████▊ | 7762/10000 [30:27:16<8:38:05, 13.89s/it] 78%|███████▊ | 7763/10000 [30:27:30<8:38:11, 13.90s/it] {'loss': 0.0032, 'learning_rate': 1.1230000000000001e-05, 'epoch': 10.16} 78%|███████▊ | 7763/10000 [30:27:30<8:38:11, 13.90s/it] 78%|███████▊ | 7764/10000 [30:27:44<8:37:56, 13.90s/it] {'loss': 0.0029, 'learning_rate': 1.1225e-05, 'epoch': 10.16} 78%|███████▊ | 7764/10000 [30:27:44<8:37:56, 13.90s/it] 78%|███████▊ | 7765/10000 [30:27:58<8:38:44, 13.93s/it] {'loss': 0.0041, 'learning_rate': 1.122e-05, 'epoch': 10.16} 78%|███████▊ | 7765/10000 [30:27:58<8:38:44, 13.93s/it] 78%|███████▊ | 7766/10000 [30:28:12<8:38:10, 13.92s/it] {'loss': 0.0032, 'learning_rate': 1.1215e-05, 'epoch': 10.16} 78%|███████▊ | 7766/10000 [30:28:12<8:38:10, 13.92s/it] 78%|███████▊ | 7767/10000 [30:28:26<8:36:57, 13.89s/it] 
{'loss': 0.0033, 'learning_rate': 1.1210000000000001e-05, 'epoch': 10.17} 78%|███████▊ | 7767/10000 [30:28:26<8:36:57, 13.89s/it] 78%|███████▊ | 7768/10000 [30:28:40<8:37:57, 13.92s/it] {'loss': 0.0034, 'learning_rate': 1.1205e-05, 'epoch': 10.17} 78%|███████▊ | 7768/10000 [30:28:40<8:37:57, 13.92s/it] 78%|███████▊ | 7769/10000 [30:28:54<8:38:23, 13.94s/it] {'loss': 0.0036, 'learning_rate': 1.1200000000000001e-05, 'epoch': 10.17} 78%|███████▊ | 7769/10000 [30:28:54<8:38:23, 13.94s/it] 78%|███████▊ | 7770/10000 [30:29:08<8:36:56, 13.91s/it] {'loss': 0.0028, 'learning_rate': 1.1195e-05, 'epoch': 10.17} 78%|███████▊ | 7770/10000 [30:29:08<8:36:56, 13.91s/it] 78%|███████▊ | 7771/10000 [30:29:22<8:36:44, 13.91s/it] {'loss': 0.003, 'learning_rate': 1.1190000000000001e-05, 'epoch': 10.17} 78%|███████▊ | 7771/10000 [30:29:22<8:36:44, 13.91s/it] 78%|███████▊ | 7772/10000 [30:29:36<8:37:33, 13.94s/it] {'loss': 0.003, 'learning_rate': 1.1185e-05, 'epoch': 10.17} 78%|███████▊ | 7772/10000 [30:29:36<8:37:33, 13.94s/it] 78%|███████▊ | 7773/10000 [30:29:50<8:36:52, 13.93s/it] {'loss': 0.004, 'learning_rate': 1.118e-05, 'epoch': 10.17} 78%|███████▊ | 7773/10000 [30:29:50<8:36:52, 13.93s/it] 78%|███████▊ | 7774/10000 [30:30:04<8:36:58, 13.93s/it] {'loss': 0.0044, 'learning_rate': 1.1175e-05, 'epoch': 10.18} 78%|███████▊ | 7774/10000 [30:30:04<8:36:58, 13.93s/it] 78%|███████▊ | 7775/10000 [30:30:17<8:36:22, 13.92s/it] {'loss': 0.0037, 'learning_rate': 1.117e-05, 'epoch': 10.18} 78%|███████▊ | 7775/10000 [30:30:17<8:36:22, 13.92s/it] 78%|███████▊ | 7776/10000 [30:30:31<8:35:42, 13.91s/it] {'loss': 0.0044, 'learning_rate': 1.1165e-05, 'epoch': 10.18} 78%|███████▊ | 7776/10000 [30:30:31<8:35:42, 13.91s/it] 78%|███████▊ | 7777/10000 [30:30:45<8:35:49, 13.92s/it] {'loss': 0.0044, 'learning_rate': 1.1160000000000002e-05, 'epoch': 10.18} 78%|███████▊ | 7777/10000 [30:30:45<8:35:49, 13.92s/it] 78%|███████▊ | 7778/10000 [30:30:59<8:36:37, 13.95s/it] {'loss': 0.0041, 'learning_rate': 
1.1155e-05, 'epoch': 10.18} 78%|███████▊ | 7778/10000 [30:30:59<8:36:37, 13.95s/it] 78%|███████▊ | 7779/10000 [30:31:13<8:36:40, 13.96s/it] {'loss': 0.0036, 'learning_rate': 1.115e-05, 'epoch': 10.18} 78%|███████▊ | 7779/10000 [30:31:13<8:36:40, 13.96s/it] 78%|███████▊ | 7780/10000 [30:31:27<8:37:07, 13.98s/it] {'loss': 0.0041, 'learning_rate': 1.1145e-05, 'epoch': 10.18} 78%|███████▊ | 7780/10000 [30:31:27<8:37:07, 13.98s/it] 78%|███████▊ | 7781/10000 [30:31:41<8:36:32, 13.97s/it] {'loss': 0.0021, 'learning_rate': 1.114e-05, 'epoch': 10.18} 78%|███████▊ | 7781/10000 [30:31:41<8:36:32, 13.97s/it] 78%|███████▊ | 7782/10000 [30:31:55<8:34:35, 13.92s/it] {'loss': 0.0036, 'learning_rate': 1.1135000000000001e-05, 'epoch': 10.19} 78%|███████▊ | 7782/10000 [30:31:55<8:34:35, 13.92s/it] 78%|███████▊ | 7783/10000 [30:32:09<8:33:10, 13.89s/it] {'loss': 0.004, 'learning_rate': 1.113e-05, 'epoch': 10.19} 78%|███████▊ | 7783/10000 [30:32:09<8:33:10, 13.89s/it] 78%|███████▊ | 7784/10000 [30:32:23<8:34:54, 13.94s/it] {'loss': 0.0057, 'learning_rate': 1.1125000000000001e-05, 'epoch': 10.19} 78%|███████▊ | 7784/10000 [30:32:23<8:34:54, 13.94s/it] 78%|███████▊ | 7785/10000 [30:32:37<8:34:18, 13.93s/it] {'loss': 0.0041, 'learning_rate': 1.112e-05, 'epoch': 10.19} 78%|███████▊ | 7785/10000 [30:32:37<8:34:18, 13.93s/it] 78%|███████▊ | 7786/10000 [30:32:51<8:34:03, 13.93s/it] {'loss': 0.0029, 'learning_rate': 1.1115000000000001e-05, 'epoch': 10.19} 78%|███████▊ | 7786/10000 [30:32:51<8:34:03, 13.93s/it] 78%|███████▊ | 7787/10000 [30:33:05<8:32:59, 13.91s/it] {'loss': 0.0058, 'learning_rate': 1.111e-05, 'epoch': 10.19} 78%|███████▊ | 7787/10000 [30:33:05<8:32:59, 13.91s/it] 78%|███████▊ | 7788/10000 [30:33:19<8:33:21, 13.92s/it] {'loss': 0.0057, 'learning_rate': 1.1105e-05, 'epoch': 10.19} 78%|███████▊ | 7788/10000 [30:33:19<8:33:21, 13.92s/it] 78%|███████▊ | 7789/10000 [30:33:32<8:31:37, 13.88s/it] {'loss': 0.0042, 'learning_rate': 1.11e-05, 'epoch': 10.2} 78%|███████▊ | 7789/10000 
[30:33:32<8:31:37, 13.88s/it] 78%|███████▊ | 7790/10000 [30:33:46<8:31:39, 13.89s/it] {'loss': 0.003, 'learning_rate': 1.1095e-05, 'epoch': 10.2} 78%|███████▊ | 7790/10000 [30:33:46<8:31:39, 13.89s/it] 78%|███████▊ | 7791/10000 [30:34:00<8:32:41, 13.93s/it] {'loss': 0.0029, 'learning_rate': 1.109e-05, 'epoch': 10.2} 78%|███████▊ | 7791/10000 [30:34:00<8:32:41, 13.93s/it] 78%|███████▊ | 7792/10000 [30:34:14<8:32:27, 13.93s/it] {'loss': 0.0034, 'learning_rate': 1.1085000000000001e-05, 'epoch': 10.2} 78%|███████▊ | 7792/10000 [30:34:14<8:32:27, 13.93s/it] 78%|███████▊ | 7793/10000 [30:34:28<8:31:30, 13.91s/it] {'loss': 0.0022, 'learning_rate': 1.108e-05, 'epoch': 10.2} 78%|███████▊ | 7793/10000 [30:34:28<8:31:30, 13.91s/it] 78%|███████▊ | 7794/10000 [30:34:42<8:31:27, 13.91s/it] {'loss': 0.0066, 'learning_rate': 1.1075e-05, 'epoch': 10.2} 78%|███████▊ | 7794/10000 [30:34:42<8:31:27, 13.91s/it] 78%|███████▊ | 7795/10000 [30:34:56<8:30:32, 13.89s/it] {'loss': 0.0036, 'learning_rate': 1.107e-05, 'epoch': 10.2} 78%|███████▊ | 7795/10000 [30:34:56<8:30:32, 13.89s/it] 78%|███████▊ | 7796/10000 [30:35:10<8:28:32, 13.84s/it] {'loss': 0.0036, 'learning_rate': 1.1065e-05, 'epoch': 10.2} 78%|███████▊ | 7796/10000 [30:35:10<8:28:32, 13.84s/it] 78%|███████▊ | 7797/10000 [30:35:24<8:30:38, 13.91s/it] {'loss': 0.0038, 'learning_rate': 1.106e-05, 'epoch': 10.21} 78%|███████▊ | 7797/10000 [30:35:24<8:30:38, 13.91s/it] 78%|███████▊ | 7798/10000 [30:35:38<8:31:36, 13.94s/it] {'loss': 0.0034, 'learning_rate': 1.1055e-05, 'epoch': 10.21} 78%|███████▊ | 7798/10000 [30:35:38<8:31:36, 13.94s/it] 78%|███████▊ | 7799/10000 [30:35:51<8:30:25, 13.91s/it] {'loss': 0.0027, 'learning_rate': 1.1050000000000001e-05, 'epoch': 10.21} 78%|███████▊ | 7799/10000 [30:35:52<8:30:25, 13.91s/it] 78%|███████▊ | 7800/10000 [30:36:06<8:32:24, 13.97s/it] {'loss': 0.0041, 'learning_rate': 1.1045000000000002e-05, 'epoch': 10.21} 78%|███████▊ | 7800/10000 [30:36:06<8:32:24, 13.97s/it] 78%|███████▊ | 7801/10000 
[30:36:20<8:31:48, 13.96s/it] {'loss': 0.0064, 'learning_rate': 1.1040000000000001e-05, 'epoch': 10.21} 78%|███████▊ | 7801/10000 [30:36:20<8:31:48, 13.96s/it] 78%|███████▊ | 7802/10000 [30:36:33<8:30:24, 13.93s/it] {'loss': 0.0061, 'learning_rate': 1.1035e-05, 'epoch': 10.21} 78%|███████▊ | 7802/10000 [30:36:33<8:30:24, 13.93s/it] 78%|███████▊ | 7803/10000 [30:36:47<8:28:57, 13.90s/it] {'loss': 0.0045, 'learning_rate': 1.103e-05, 'epoch': 10.21} 78%|███████▊ | 7803/10000 [30:36:47<8:28:57, 13.90s/it] 78%|███████▊ | 7804/10000 [30:37:01<8:28:41, 13.90s/it] {'loss': 0.0035, 'learning_rate': 1.1025e-05, 'epoch': 10.21} 78%|███████▊ | 7804/10000 [30:37:01<8:28:41, 13.90s/it] 78%|███████▊ | 7805/10000 [30:37:15<8:27:51, 13.88s/it] {'loss': 0.0042, 'learning_rate': 1.1020000000000001e-05, 'epoch': 10.22} 78%|███████▊ | 7805/10000 [30:37:15<8:27:51, 13.88s/it] 78%|███████▊ | 7806/10000 [30:37:29<8:27:47, 13.89s/it] {'loss': 0.0035, 'learning_rate': 1.1015e-05, 'epoch': 10.22} 78%|███████▊ | 7806/10000 [30:37:29<8:27:47, 13.89s/it] 78%|███████▊ | 7807/10000 [30:37:43<8:27:32, 13.89s/it] {'loss': 0.0036, 'learning_rate': 1.1010000000000001e-05, 'epoch': 10.22} 78%|███████▊ | 7807/10000 [30:37:43<8:27:32, 13.89s/it] 78%|███████▊ | 7808/10000 [30:37:57<8:27:31, 13.89s/it] {'loss': 0.0056, 'learning_rate': 1.1005e-05, 'epoch': 10.22} 78%|███████▊ | 7808/10000 [30:37:57<8:27:31, 13.89s/it] 78%|███████▊ | 7809/10000 [30:38:10<8:26:11, 13.86s/it] {'loss': 0.0064, 'learning_rate': 1.1000000000000001e-05, 'epoch': 10.22} 78%|███████▊ | 7809/10000 [30:38:10<8:26:11, 13.86s/it] 78%|███████▊ | 7810/10000 [30:38:24<8:26:20, 13.87s/it] {'loss': 0.0035, 'learning_rate': 1.0995e-05, 'epoch': 10.22} 78%|███████▊ | 7810/10000 [30:38:24<8:26:20, 13.87s/it] 78%|███████▊ | 7811/10000 [30:38:38<8:27:12, 13.90s/it] {'loss': 0.0026, 'learning_rate': 1.099e-05, 'epoch': 10.22} 78%|███████▊ | 7811/10000 [30:38:38<8:27:12, 13.90s/it] 78%|███████▊ | 7812/10000 [30:38:52<8:27:50, 13.93s/it] {'loss': 
0.0048, 'learning_rate': 1.0985e-05, 'epoch': 10.23} 78%|███████▊ | 7812/10000 [30:38:52<8:27:50, 13.93s/it] 78%|███████▊ | 7813/10000 [30:39:06<8:27:38, 13.93s/it] {'loss': 0.0045, 'learning_rate': 1.098e-05, 'epoch': 10.23} 78%|███████▊ | 7813/10000 [30:39:06<8:27:38, 13.93s/it] 78%|███████▊ | 7814/10000 [30:39:20<8:25:52, 13.88s/it] {'loss': 0.0046, 'learning_rate': 1.0975e-05, 'epoch': 10.23} 78%|███████▊ | 7814/10000 [30:39:20<8:25:52, 13.88s/it] 78%|███████▊ | 7815/10000 [30:39:34<8:25:36, 13.88s/it] {'loss': 0.0037, 'learning_rate': 1.0970000000000002e-05, 'epoch': 10.23} 78%|███████▊ | 7815/10000 [30:39:34<8:25:36, 13.88s/it] 78%|███████▊ | 7816/10000 [30:39:48<8:25:56, 13.90s/it] {'loss': 0.0037, 'learning_rate': 1.0965000000000001e-05, 'epoch': 10.23} 78%|███████▊ | 7816/10000 [30:39:48<8:25:56, 13.90s/it] 78%|███████▊ | 7817/10000 [30:40:02<8:25:19, 13.89s/it] {'loss': 0.0022, 'learning_rate': 1.096e-05, 'epoch': 10.23} 78%|███████▊ | 7817/10000 [30:40:02<8:25:19, 13.89s/it] 78%|███████▊ | 7818/10000 [30:40:16<8:25:47, 13.91s/it] {'loss': 0.0036, 'learning_rate': 1.0955e-05, 'epoch': 10.23} 78%|███████▊ | 7818/10000 [30:40:16<8:25:47, 13.91s/it] 78%|███████▊ | 7819/10000 [30:40:30<8:25:24, 13.90s/it] {'loss': 0.0019, 'learning_rate': 1.095e-05, 'epoch': 10.23} 78%|███████▊ | 7819/10000 [30:40:30<8:25:24, 13.90s/it] 78%|███████▊ | 7820/10000 [30:40:43<8:25:50, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.0945000000000001e-05, 'epoch': 10.24} 78%|███████▊ | 7820/10000 [30:40:44<8:25:50, 13.92s/it] 78%|███████▊ | 7821/10000 [30:40:57<8:25:19, 13.91s/it] {'loss': 0.0042, 'learning_rate': 1.094e-05, 'epoch': 10.24} 78%|███████▊ | 7821/10000 [30:40:57<8:25:19, 13.91s/it] 78%|███████▊ | 7822/10000 [30:41:11<8:24:58, 13.91s/it] {'loss': 0.0065, 'learning_rate': 1.0935000000000001e-05, 'epoch': 10.24} 78%|███████▊ | 7822/10000 [30:41:11<8:24:58, 13.91s/it] 78%|███████▊ | 7823/10000 [30:41:25<8:23:55, 13.89s/it] {'loss': 0.0027, 'learning_rate': 1.093e-05, 
'epoch': 10.24} 78%|███████▊ | 7823/10000 [30:41:25<8:23:55, 13.89s/it] 78%|███████▊ | 7824/10000 [30:41:39<8:22:25, 13.85s/it] {'loss': 0.0052, 'learning_rate': 1.0925000000000001e-05, 'epoch': 10.24} 78%|███████▊ | 7824/10000 [30:41:39<8:22:25, 13.85s/it] 78%|███████▊ | 7825/10000 [30:41:53<8:22:14, 13.86s/it] {'loss': 0.007, 'learning_rate': 1.092e-05, 'epoch': 10.24} 78%|███████▊ | 7825/10000 [30:41:53<8:22:14, 13.86s/it] 78%|███████▊ | 7826/10000 [30:42:07<8:21:10, 13.83s/it] {'loss': 0.0034, 'learning_rate': 1.0915e-05, 'epoch': 10.24} 78%|███████▊ | 7826/10000 [30:42:07<8:21:10, 13.83s/it] 78%|███████▊ | 7827/10000 [30:42:21<8:22:38, 13.88s/it] {'loss': 0.0035, 'learning_rate': 1.091e-05, 'epoch': 10.24} 78%|███████▊ | 7827/10000 [30:42:21<8:22:38, 13.88s/it] 78%|███████▊ | 7828/10000 [30:42:34<8:21:09, 13.84s/it] {'loss': 0.0029, 'learning_rate': 1.0905e-05, 'epoch': 10.25} 78%|███████▊ | 7828/10000 [30:42:34<8:21:09, 13.84s/it] 78%|███████▊ | 7829/10000 [30:42:48<8:21:19, 13.85s/it] {'loss': 0.0037, 'learning_rate': 1.09e-05, 'epoch': 10.25} 78%|███████▊ | 7829/10000 [30:42:48<8:21:19, 13.85s/it] 78%|███████▊ | 7830/10000 [30:43:02<8:21:29, 13.87s/it] {'loss': 0.0047, 'learning_rate': 1.0895000000000002e-05, 'epoch': 10.25} 78%|███████▊ | 7830/10000 [30:43:02<8:21:29, 13.87s/it] 78%|███████▊ | 7831/10000 [30:43:16<8:22:44, 13.91s/it] {'loss': 0.0043, 'learning_rate': 1.089e-05, 'epoch': 10.25} 78%|███████▊ | 7831/10000 [30:43:16<8:22:44, 13.91s/it] 78%|███████▊ | 7832/10000 [30:43:30<8:20:14, 13.84s/it] {'loss': 0.0054, 'learning_rate': 1.0885e-05, 'epoch': 10.25} 78%|███████▊ | 7832/10000 [30:43:30<8:20:14, 13.84s/it] 78%|███████▊ | 7833/10000 [30:43:44<8:19:42, 13.84s/it] {'loss': 0.0022, 'learning_rate': 1.088e-05, 'epoch': 10.25} 78%|███████▊ | 7833/10000 [30:43:44<8:19:42, 13.84s/it] 78%|███████▊ | 7834/10000 [30:43:57<8:20:17, 13.86s/it] {'loss': 0.0027, 'learning_rate': 1.0875e-05, 'epoch': 10.25} 78%|███████▊ | 7834/10000 [30:43:58<8:20:17, 
13.86s/it] 78%|███████▊ | 7835/10000 [30:44:11<8:20:54, 13.88s/it] {'loss': 0.0037, 'learning_rate': 1.0870000000000001e-05, 'epoch': 10.26} 78%|███████▊ | 7835/10000 [30:44:11<8:20:54, 13.88s/it] 78%|███████▊ | 7836/10000 [30:44:25<8:19:57, 13.86s/it] {'loss': 0.0041, 'learning_rate': 1.0865e-05, 'epoch': 10.26} 78%|███████▊ | 7836/10000 [30:44:25<8:19:57, 13.86s/it] 78%|███████▊ | 7837/10000 [30:44:39<8:22:07, 13.93s/it] {'loss': 0.0038, 'learning_rate': 1.0860000000000001e-05, 'epoch': 10.26} 78%|███████▊ | 7837/10000 [30:44:39<8:22:07, 13.93s/it] 78%|███████▊ | 7838/10000 [30:44:53<8:22:09, 13.94s/it] {'loss': 0.0029, 'learning_rate': 1.0855e-05, 'epoch': 10.26} 78%|███████▊ | 7838/10000 [30:44:53<8:22:09, 13.94s/it] 78%|███████▊ | 7839/10000 [30:45:07<8:21:29, 13.92s/it] {'loss': 0.0043, 'learning_rate': 1.0850000000000001e-05, 'epoch': 10.26} 78%|███████▊ | 7839/10000 [30:45:07<8:21:29, 13.92s/it] 78%|███████▊ | 7840/10000 [30:45:21<8:20:05, 13.89s/it] {'loss': 0.0036, 'learning_rate': 1.0845e-05, 'epoch': 10.26} 78%|███████▊ | 7840/10000 [30:45:21<8:20:05, 13.89s/it] 78%|███████▊ | 7841/10000 [30:45:35<8:20:36, 13.91s/it] {'loss': 0.0026, 'learning_rate': 1.084e-05, 'epoch': 10.26} 78%|███████▊ | 7841/10000 [30:45:35<8:20:36, 13.91s/it] 78%|███████▊ | 7842/10000 [30:45:49<8:19:27, 13.89s/it] {'loss': 0.0028, 'learning_rate': 1.0835e-05, 'epoch': 10.26} 78%|███████▊ | 7842/10000 [30:45:49<8:19:27, 13.89s/it] 78%|███████▊ | 7843/10000 [30:46:03<8:18:56, 13.88s/it] {'loss': 0.0036, 'learning_rate': 1.083e-05, 'epoch': 10.27} 78%|███████▊ | 7843/10000 [30:46:03<8:18:56, 13.88s/it] 78%|███████▊ | 7844/10000 [30:46:17<8:20:33, 13.93s/it] {'loss': 0.017, 'learning_rate': 1.0825e-05, 'epoch': 10.27} 78%|███████▊ | 7844/10000 [30:46:17<8:20:33, 13.93s/it] 78%|███████▊ | 7845/10000 [30:46:31<8:20:26, 13.93s/it] {'loss': 0.0032, 'learning_rate': 1.0820000000000001e-05, 'epoch': 10.27} 78%|███████▊ | 7845/10000 [30:46:31<8:20:26, 13.93s/it] 78%|███████▊ | 7846/10000 
[30:46:45<8:20:14, 13.93s/it] {'loss': 0.0033, 'learning_rate': 1.0815e-05, 'epoch': 10.27} 78%|███████▊ | 7846/10000 [30:46:45<8:20:14, 13.93s/it] 78%|███████▊ | 7847/10000 [30:46:59<8:20:58, 13.96s/it] {'loss': 0.0037, 'learning_rate': 1.081e-05, 'epoch': 10.27} 78%|███████▊ | 7847/10000 [30:46:59<8:20:58, 13.96s/it] 78%|███████▊ | 7848/10000 [30:47:12<8:19:12, 13.92s/it] {'loss': 0.0036, 'learning_rate': 1.0804999999999999e-05, 'epoch': 10.27} 78%|███████▊ | 7848/10000 [30:47:12<8:19:12, 13.92s/it] 78%|███████▊ | 7849/10000 [30:47:26<8:18:49, 13.91s/it] {'loss': 0.0028, 'learning_rate': 1.08e-05, 'epoch': 10.27} 78%|███████▊ | 7849/10000 [30:47:26<8:18:49, 13.91s/it] 78%|███████▊ | 7850/10000 [30:47:40<8:17:27, 13.88s/it] {'loss': 0.0065, 'learning_rate': 1.0795e-05, 'epoch': 10.27} 78%|███████▊ | 7850/10000 [30:47:40<8:17:27, 13.88s/it] 79%|███████▊ | 7851/10000 [30:47:54<8:16:43, 13.87s/it] {'loss': 0.0066, 'learning_rate': 1.079e-05, 'epoch': 10.28} 79%|███████▊ | 7851/10000 [30:47:54<8:16:43, 13.87s/it] 79%|███████▊ | 7852/10000 [30:48:08<8:15:50, 13.85s/it] {'loss': 0.0036, 'learning_rate': 1.0785000000000001e-05, 'epoch': 10.28} 79%|███████▊ | 7852/10000 [30:48:08<8:15:50, 13.85s/it] 79%|███████▊ | 7853/10000 [30:48:22<8:16:03, 13.86s/it] {'loss': 0.0031, 'learning_rate': 1.0780000000000002e-05, 'epoch': 10.28} 79%|███████▊ | 7853/10000 [30:48:22<8:16:03, 13.86s/it] 79%|███████▊ | 7854/10000 [30:48:35<8:14:53, 13.84s/it] {'loss': 0.004, 'learning_rate': 1.0775000000000001e-05, 'epoch': 10.28} 79%|███████▊ | 7854/10000 [30:48:35<8:14:53, 13.84s/it] 79%|███████▊ | 7855/10000 [30:48:49<8:14:54, 13.84s/it] {'loss': 0.0034, 'learning_rate': 1.077e-05, 'epoch': 10.28} 79%|███████▊ | 7855/10000 [30:48:49<8:14:54, 13.84s/it] 79%|███████▊ | 7856/10000 [30:49:03<8:15:09, 13.86s/it] {'loss': 0.0037, 'learning_rate': 1.0765e-05, 'epoch': 10.28} 79%|███████▊ | 7856/10000 [30:49:03<8:15:09, 13.86s/it] 79%|███████▊ | 7857/10000 [30:49:17<8:14:12, 13.84s/it] {'loss': 
0.0042, 'learning_rate': 1.076e-05, 'epoch': 10.28} 79%|███████▊ | 7857/10000 [30:49:17<8:14:12, 13.84s/it] 79%|███████▊ | 7858/10000 [30:49:31<8:14:58, 13.86s/it] {'loss': 0.0044, 'learning_rate': 1.0755000000000001e-05, 'epoch': 10.29} 79%|███████▊ | 7858/10000 [30:49:31<8:14:58, 13.86s/it] 79%|███████▊ | 7859/10000 [30:49:45<8:14:14, 13.85s/it] {'loss': 0.0032, 'learning_rate': 1.075e-05, 'epoch': 10.29} 79%|███████▊ | 7859/10000 [30:49:45<8:14:14, 13.85s/it] 79%|███████▊ | 7860/10000 [30:49:58<8:13:20, 13.83s/it] {'loss': 0.0047, 'learning_rate': 1.0745000000000001e-05, 'epoch': 10.29} 79%|███████▊ | 7860/10000 [30:49:59<8:13:20, 13.83s/it] 79%|███████▊ | 7861/10000 [30:50:12<8:13:01, 13.83s/it] {'loss': 0.0033, 'learning_rate': 1.074e-05, 'epoch': 10.29} 79%|███████▊ | 7861/10000 [30:50:12<8:13:01, 13.83s/it] 79%|███████▊ | 7862/10000 [30:50:26<8:13:05, 13.84s/it] {'loss': 0.0032, 'learning_rate': 1.0735000000000001e-05, 'epoch': 10.29} 79%|███████▊ | 7862/10000 [30:50:26<8:13:05, 13.84s/it] 79%|███████▊ | 7863/10000 [30:50:40<8:12:43, 13.83s/it] {'loss': 0.0056, 'learning_rate': 1.073e-05, 'epoch': 10.29} 79%|███████▊ | 7863/10000 [30:50:40<8:12:43, 13.83s/it] 79%|███████▊ | 7864/10000 [30:50:54<8:13:06, 13.85s/it] {'loss': 0.0026, 'learning_rate': 1.0725e-05, 'epoch': 10.29} 79%|███████▊ | 7864/10000 [30:50:54<8:13:06, 13.85s/it] 79%|███████▊ | 7865/10000 [30:51:08<8:11:23, 13.81s/it] {'loss': 0.0035, 'learning_rate': 1.072e-05, 'epoch': 10.29} 79%|███████▊ | 7865/10000 [30:51:08<8:11:23, 13.81s/it] 79%|███████▊ | 7866/10000 [30:51:21<8:10:42, 13.80s/it] {'loss': 0.0031, 'learning_rate': 1.0715e-05, 'epoch': 10.3} 79%|███████▊ | 7866/10000 [30:51:21<8:10:42, 13.80s/it] 79%|███████▊ | 7867/10000 [30:51:35<8:10:25, 13.80s/it] {'loss': 0.005, 'learning_rate': 1.071e-05, 'epoch': 10.3} 79%|███████▊ | 7867/10000 [30:51:35<8:10:25, 13.80s/it] 79%|███████▊ | 7868/10000 [30:51:49<8:09:56, 13.79s/it] {'loss': 0.0042, 'learning_rate': 1.0705000000000002e-05, 'epoch': 
10.3} 79%|███████▊ | 7868/10000 [30:51:49<8:09:56, 13.79s/it] 79%|███████▊ | 7869/10000 [30:52:03<8:09:55, 13.79s/it] {'loss': 0.0031, 'learning_rate': 1.0700000000000001e-05, 'epoch': 10.3} 79%|███████▊ | 7869/10000 [30:52:03<8:09:55, 13.79s/it] 79%|███████▊ | 7870/10000 [30:52:17<8:09:27, 13.79s/it] {'loss': 0.0027, 'learning_rate': 1.0695e-05, 'epoch': 10.3} 79%|███████▊ | 7870/10000 [30:52:17<8:09:27, 13.79s/it] 79%|███████▊ | 7871/10000 [30:52:30<8:10:41, 13.83s/it] {'loss': 0.0044, 'learning_rate': 1.069e-05, 'epoch': 10.3} 79%|███████▊ | 7871/10000 [30:52:30<8:10:41, 13.83s/it] 79%|███████▊ | 7872/10000 [30:52:44<8:12:11, 13.88s/it] {'loss': 0.0058, 'learning_rate': 1.0685e-05, 'epoch': 10.3} 79%|███████▊ | 7872/10000 [30:52:44<8:12:11, 13.88s/it] 79%|███████▊ | 7873/10000 [30:52:58<8:12:29, 13.89s/it] {'loss': 0.0027, 'learning_rate': 1.0680000000000001e-05, 'epoch': 10.3} 79%|███████▊ | 7873/10000 [30:52:58<8:12:29, 13.89s/it] 79%|███████▊ | 7874/10000 [30:53:12<8:12:30, 13.90s/it] {'loss': 0.0043, 'learning_rate': 1.0675e-05, 'epoch': 10.31} 79%|███████▊ | 7874/10000 [30:53:12<8:12:30, 13.90s/it] 79%|███████▉ | 7875/10000 [30:53:26<8:12:01, 13.89s/it] {'loss': 0.0034, 'learning_rate': 1.0670000000000001e-05, 'epoch': 10.31} 79%|███████▉ | 7875/10000 [30:53:26<8:12:01, 13.89s/it] 79%|███████▉ | 7876/10000 [30:53:40<8:11:09, 13.87s/it] {'loss': 0.0044, 'learning_rate': 1.0665e-05, 'epoch': 10.31} 79%|███████▉ | 7876/10000 [30:53:40<8:11:09, 13.87s/it] 79%|███████▉ | 7877/10000 [30:53:54<8:10:12, 13.85s/it] {'loss': 0.0036, 'learning_rate': 1.0660000000000001e-05, 'epoch': 10.31} 79%|███████▉ | 7877/10000 [30:53:54<8:10:12, 13.85s/it] 79%|███████▉ | 7878/10000 [30:54:08<8:09:57, 13.85s/it] {'loss': 0.0039, 'learning_rate': 1.0655e-05, 'epoch': 10.31} 79%|███████▉ | 7878/10000 [30:54:08<8:09:57, 13.85s/it] 79%|███████▉ | 7879/10000 [30:54:22<8:11:14, 13.90s/it] {'loss': 0.0047, 'learning_rate': 1.065e-05, 'epoch': 10.31} 79%|███████▉ | 7879/10000 
[30:54:22<8:11:14, 13.90s/it] 79%|███████▉ | 7880/10000 [30:54:36<8:12:29, 13.94s/it] {'loss': 0.0035, 'learning_rate': 1.0645e-05, 'epoch': 10.31} 79%|███████▉ | 7880/10000 [30:54:36<8:12:29, 13.94s/it] 79%|███████▉ | 7881/10000 [30:54:50<8:12:34, 13.95s/it] {'loss': 0.0035, 'learning_rate': 1.064e-05, 'epoch': 10.32} 79%|███████▉ | 7881/10000 [30:54:50<8:12:34, 13.95s/it] 79%|███████▉ | 7882/10000 [30:55:03<8:10:47, 13.90s/it] {'loss': 0.003, 'learning_rate': 1.0635e-05, 'epoch': 10.32} 79%|███████▉ | 7882/10000 [30:55:03<8:10:47, 13.90s/it] 79%|███████▉ | 7883/10000 [30:55:17<8:08:56, 13.86s/it] {'loss': 0.0034, 'learning_rate': 1.0630000000000002e-05, 'epoch': 10.32} 79%|███████▉ | 7883/10000 [30:55:17<8:08:56, 13.86s/it] 79%|███████▉ | 7884/10000 [30:55:31<8:08:22, 13.85s/it] {'loss': 0.0034, 'learning_rate': 1.0625e-05, 'epoch': 10.32} 79%|███████▉ | 7884/10000 [30:55:31<8:08:22, 13.85s/it] 79%|███████▉ | 7885/10000 [30:55:45<8:07:59, 13.84s/it] {'loss': 0.0034, 'learning_rate': 1.062e-05, 'epoch': 10.32} 79%|███████▉ | 7885/10000 [30:55:45<8:07:59, 13.84s/it] 79%|███████▉ | 7886/10000 [30:55:59<8:09:08, 13.88s/it] {'loss': 0.0032, 'learning_rate': 1.0615e-05, 'epoch': 10.32} 79%|███████▉ | 7886/10000 [30:55:59<8:09:08, 13.88s/it] 79%|███████▉ | 7887/10000 [30:56:13<8:11:06, 13.95s/it] {'loss': 0.0046, 'learning_rate': 1.061e-05, 'epoch': 10.32} 79%|███████▉ | 7887/10000 [30:56:13<8:11:06, 13.95s/it] 79%|███████▉ | 7888/10000 [30:56:27<8:09:33, 13.91s/it] {'loss': 0.0093, 'learning_rate': 1.0605000000000001e-05, 'epoch': 10.32} 79%|███████▉ | 7888/10000 [30:56:27<8:09:33, 13.91s/it] 79%|███████▉ | 7889/10000 [30:56:41<8:09:01, 13.90s/it] {'loss': 0.0039, 'learning_rate': 1.06e-05, 'epoch': 10.33} 79%|███████▉ | 7889/10000 [30:56:41<8:09:01, 13.90s/it] 79%|███████▉ | 7890/10000 [30:56:55<8:09:08, 13.91s/it] {'loss': 0.0051, 'learning_rate': 1.0595000000000001e-05, 'epoch': 10.33} 79%|███████▉ | 7890/10000 [30:56:55<8:09:08, 13.91s/it] 79%|███████▉ | 7891/10000 
[30:57:08<8:08:52, 13.91s/it] {'loss': 0.0036, 'learning_rate': 1.059e-05, 'epoch': 10.33} 79%|███████▉ | 7891/10000 [30:57:08<8:08:52, 13.91s/it] 79%|███████▉ | 7892/10000 [30:57:22<8:08:24, 13.90s/it] {'loss': 0.0036, 'learning_rate': 1.0585000000000001e-05, 'epoch': 10.33} 79%|███████▉ | 7892/10000 [30:57:22<8:08:24, 13.90s/it] 79%|███████▉ | 7893/10000 [30:57:36<8:08:16, 13.90s/it] {'loss': 0.0032, 'learning_rate': 1.058e-05, 'epoch': 10.33} 79%|███████▉ | 7893/10000 [30:57:36<8:08:16, 13.90s/it] 79%|███████▉ | 7894/10000 [30:57:50<8:07:27, 13.89s/it] {'loss': 0.0039, 'learning_rate': 1.0575e-05, 'epoch': 10.33} 79%|███████▉ | 7894/10000 [30:57:50<8:07:27, 13.89s/it] 79%|███████▉ | 7895/10000 [30:58:04<8:07:20, 13.89s/it] {'loss': 0.0038, 'learning_rate': 1.057e-05, 'epoch': 10.33} 79%|███████▉ | 7895/10000 [30:58:04<8:07:20, 13.89s/it] 79%|███████▉ | 7896/10000 [30:58:18<8:06:30, 13.87s/it] {'loss': 0.0031, 'learning_rate': 1.0565e-05, 'epoch': 10.34} 79%|███████▉ | 7896/10000 [30:58:18<8:06:30, 13.87s/it] 79%|███████▉ | 7897/10000 [30:58:32<8:06:14, 13.87s/it] {'loss': 0.0035, 'learning_rate': 1.056e-05, 'epoch': 10.34} 79%|███████▉ | 7897/10000 [30:58:32<8:06:14, 13.87s/it] 79%|███████▉ | 7898/10000 [30:58:46<8:06:29, 13.89s/it] {'loss': 0.0042, 'learning_rate': 1.0555000000000001e-05, 'epoch': 10.34} 79%|███████▉ | 7898/10000 [30:58:46<8:06:29, 13.89s/it] 79%|███████▉ | 7899/10000 [30:59:00<8:07:31, 13.92s/it] {'loss': 0.0044, 'learning_rate': 1.055e-05, 'epoch': 10.34} 79%|███████▉ | 7899/10000 [30:59:00<8:07:31, 13.92s/it] 79%|███████▉ | 7900/10000 [30:59:13<8:06:40, 13.91s/it] {'loss': 0.0037, 'learning_rate': 1.0545000000000002e-05, 'epoch': 10.34} 79%|███████▉ | 7900/10000 [30:59:14<8:06:40, 13.91s/it] 79%|███████▉ | 7901/10000 [30:59:27<8:07:12, 13.93s/it] {'loss': 0.0028, 'learning_rate': 1.0539999999999999e-05, 'epoch': 10.34} 79%|███████▉ | 7901/10000 [30:59:28<8:07:12, 13.93s/it] 79%|███████▉ | 7902/10000 [30:59:41<8:06:25, 13.91s/it] {'loss': 
0.0036, 'learning_rate': 1.0535e-05, 'epoch': 10.34} 79%|███████▉ | 7902/10000 [30:59:41<8:06:25, 13.91s/it] 79%|███████▉ | 7903/10000 [30:59:55<8:06:54, 13.93s/it] {'loss': 0.0043, 'learning_rate': 1.053e-05, 'epoch': 10.34} 79%|███████▉ | 7903/10000 [30:59:55<8:06:54, 13.93s/it] 79%|███████▉ | 7904/10000 [31:00:09<8:05:33, 13.90s/it] {'loss': 0.0038, 'learning_rate': 1.0525e-05, 'epoch': 10.35} 79%|███████▉ | 7904/10000 [31:00:09<8:05:33, 13.90s/it] 79%|███████▉ | 7905/10000 [31:00:23<8:05:05, 13.89s/it] {'loss': 0.0042, 'learning_rate': 1.0520000000000001e-05, 'epoch': 10.35} 79%|███████▉ | 7905/10000 [31:00:23<8:05:05, 13.89s/it] 79%|███████▉ | 7906/10000 [31:00:37<8:05:18, 13.91s/it] {'loss': 0.0033, 'learning_rate': 1.0515e-05, 'epoch': 10.35} 79%|███████▉ | 7906/10000 [31:00:37<8:05:18, 13.91s/it] 79%|███████▉ | 7907/10000 [31:00:51<8:06:13, 13.94s/it] {'loss': 0.004, 'learning_rate': 1.0510000000000001e-05, 'epoch': 10.35} 79%|███████▉ | 7907/10000 [31:00:51<8:06:13, 13.94s/it] 79%|███████▉ | 7908/10000 [31:01:05<8:04:20, 13.89s/it] {'loss': 0.0045, 'learning_rate': 1.0505e-05, 'epoch': 10.35} 79%|███████▉ | 7908/10000 [31:01:05<8:04:20, 13.89s/it] 79%|███████▉ | 7909/10000 [31:01:19<8:05:29, 13.93s/it] {'loss': 0.0042, 'learning_rate': 1.05e-05, 'epoch': 10.35} 79%|███████▉ | 7909/10000 [31:01:19<8:05:29, 13.93s/it] 79%|███████▉ | 7910/10000 [31:01:33<8:05:33, 13.94s/it] {'loss': 0.0043, 'learning_rate': 1.0495e-05, 'epoch': 10.35} 79%|███████▉ | 7910/10000 [31:01:33<8:05:33, 13.94s/it] 79%|███████▉ | 7911/10000 [31:01:47<8:04:02, 13.90s/it] {'loss': 0.0035, 'learning_rate': 1.049e-05, 'epoch': 10.35} 79%|███████▉ | 7911/10000 [31:01:47<8:04:02, 13.90s/it] 79%|███████▉ | 7912/10000 [31:02:00<8:03:29, 13.89s/it] {'loss': 0.0041, 'learning_rate': 1.0485e-05, 'epoch': 10.36} 79%|███████▉ | 7912/10000 [31:02:00<8:03:29, 13.89s/it] 79%|███████▉ | 7913/10000 [31:02:14<8:03:01, 13.89s/it] {'loss': 0.0041, 'learning_rate': 1.0480000000000001e-05, 'epoch': 10.36} 
79%|███████▉ | 7913/10000 [31:02:14<8:03:01, 13.89s/it] 79%|███████▉ | 7914/10000 [31:02:28<8:04:03, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.0475e-05, 'epoch': 10.36} 79%|███████▉ | 7914/10000 [31:02:28<8:04:03, 13.92s/it] 79%|███████▉ | 7915/10000 [31:02:42<8:03:16, 13.91s/it] {'loss': 0.0035, 'learning_rate': 1.0470000000000001e-05, 'epoch': 10.36} 79%|███████▉ | 7915/10000 [31:02:42<8:03:16, 13.91s/it] 79%|███████▉ | 7916/10000 [31:02:56<8:03:20, 13.92s/it] {'loss': 0.0042, 'learning_rate': 1.0465e-05, 'epoch': 10.36} 79%|███████▉ | 7916/10000 [31:02:56<8:03:20, 13.92s/it] 79%|███████▉ | 7917/10000 [31:03:10<8:03:03, 13.91s/it] {'loss': 0.0044, 'learning_rate': 1.046e-05, 'epoch': 10.36} 79%|███████▉ | 7917/10000 [31:03:10<8:03:03, 13.91s/it] 79%|███████▉ | 7918/10000 [31:03:24<8:02:54, 13.92s/it] {'loss': 0.0047, 'learning_rate': 1.0455e-05, 'epoch': 10.36} 79%|███████▉ | 7918/10000 [31:03:24<8:02:54, 13.92s/it] 79%|███████▉ | 7919/10000 [31:03:38<8:02:34, 13.91s/it] {'loss': 0.0058, 'learning_rate': 1.045e-05, 'epoch': 10.37} 79%|███████▉ | 7919/10000 [31:03:38<8:02:34, 13.91s/it] 79%|███████▉ | 7920/10000 [31:03:52<8:01:38, 13.89s/it] {'loss': 0.0034, 'learning_rate': 1.0445e-05, 'epoch': 10.37} 79%|███████▉ | 7920/10000 [31:03:52<8:01:38, 13.89s/it] 79%|███████▉ | 7921/10000 [31:04:06<8:02:44, 13.93s/it] {'loss': 0.0039, 'learning_rate': 1.0440000000000002e-05, 'epoch': 10.37} 79%|███████▉ | 7921/10000 [31:04:06<8:02:44, 13.93s/it] 79%|███████▉ | 7922/10000 [31:04:20<8:02:20, 13.93s/it] {'loss': 0.003, 'learning_rate': 1.0435000000000001e-05, 'epoch': 10.37} 79%|███████▉ | 7922/10000 [31:04:20<8:02:20, 13.93s/it] 79%|███████▉ | 7923/10000 [31:04:33<8:00:06, 13.87s/it] {'loss': 0.0072, 'learning_rate': 1.043e-05, 'epoch': 10.37} 79%|███████▉ | 7923/10000 [31:04:33<8:00:06, 13.87s/it] 79%|███████▉ | 7924/10000 [31:04:47<7:59:43, 13.86s/it] {'loss': 0.0035, 'learning_rate': 1.0425e-05, 'epoch': 10.37} 79%|███████▉ | 7924/10000 [31:04:47<7:59:43, 
13.86s/it] 79%|███████▉ | 7925/10000 [31:05:01<7:59:39, 13.87s/it] {'loss': 0.0045, 'learning_rate': 1.042e-05, 'epoch': 10.37} 79%|███████▉ | 7925/10000 [31:05:01<7:59:39, 13.87s/it] 79%|███████▉ | 7926/10000 [31:05:15<8:00:30, 13.90s/it] {'loss': 0.0031, 'learning_rate': 1.0415000000000001e-05, 'epoch': 10.37} 79%|███████▉ | 7926/10000 [31:05:15<8:00:30, 13.90s/it] 79%|███████▉ | 7927/10000 [31:05:29<7:59:46, 13.89s/it] {'loss': 0.0047, 'learning_rate': 1.041e-05, 'epoch': 10.38} 79%|███████▉ | 7927/10000 [31:05:29<7:59:46, 13.89s/it] 79%|███████▉ | 7928/10000 [31:05:43<7:59:17, 13.88s/it] {'loss': 0.0039, 'learning_rate': 1.0405000000000001e-05, 'epoch': 10.38} 79%|███████▉ | 7928/10000 [31:05:43<7:59:17, 13.88s/it] 79%|███████▉ | 7929/10000 [31:05:57<7:59:45, 13.90s/it] {'loss': 0.0025, 'learning_rate': 1.04e-05, 'epoch': 10.38} 79%|███████▉ | 7929/10000 [31:05:57<7:59:45, 13.90s/it] 79%|███████▉ | 7930/10000 [31:06:11<7:58:58, 13.88s/it] {'loss': 0.0049, 'learning_rate': 1.0395000000000001e-05, 'epoch': 10.38} 79%|███████▉ | 7930/10000 [31:06:11<7:58:58, 13.88s/it] 79%|███████▉ | 7931/10000 [31:06:25<7:59:23, 13.90s/it] {'loss': 0.0019, 'learning_rate': 1.039e-05, 'epoch': 10.38} 79%|███████▉ | 7931/10000 [31:06:25<7:59:23, 13.90s/it] 79%|███████▉ | 7932/10000 [31:06:38<7:58:35, 13.89s/it] {'loss': 0.0041, 'learning_rate': 1.0385e-05, 'epoch': 10.38} 79%|███████▉ | 7932/10000 [31:06:38<7:58:35, 13.89s/it] 79%|███████▉ | 7933/10000 [31:06:52<7:58:51, 13.90s/it] {'loss': 0.0074, 'learning_rate': 1.038e-05, 'epoch': 10.38} 79%|███████▉ | 7933/10000 [31:06:52<7:58:51, 13.90s/it] 79%|███████▉ | 7934/10000 [31:07:06<7:59:12, 13.92s/it] {'loss': 0.0057, 'learning_rate': 1.0375e-05, 'epoch': 10.38} 79%|███████▉ | 7934/10000 [31:07:06<7:59:12, 13.92s/it] 79%|███████▉ | 7935/10000 [31:07:20<8:01:10, 13.98s/it] {'loss': 0.0038, 'learning_rate': 1.037e-05, 'epoch': 10.39} 79%|███████▉ | 7935/10000 [31:07:20<8:01:10, 13.98s/it] 79%|███████▉ | 7936/10000 [31:07:34<8:00:40, 
13.97s/it] {'loss': 0.0038, 'learning_rate': 1.0365000000000002e-05, 'epoch': 10.39} 79%|███████▉ | 7936/10000 [31:07:34<8:00:40, 13.97s/it] 79%|███████▉ | 7937/10000 [31:07:48<8:00:17, 13.97s/it] {'loss': 0.0042, 'learning_rate': 1.036e-05, 'epoch': 10.39} 79%|███████▉ | 7937/10000 [31:07:48<8:00:17, 13.97s/it] 79%|███████▉ | 7938/10000 [31:08:02<7:59:24, 13.95s/it] {'loss': 0.0036, 'learning_rate': 1.0355000000000002e-05, 'epoch': 10.39} 79%|███████▉ | 7938/10000 [31:08:02<7:59:24, 13.95s/it] 79%|███████▉ | 7939/10000 [31:08:16<7:58:23, 13.93s/it] {'loss': 0.0062, 'learning_rate': 1.035e-05, 'epoch': 10.39} 79%|███████▉ | 7939/10000 [31:08:16<7:58:23, 13.93s/it] 79%|███████▉ | 7940/10000 [31:08:30<7:59:19, 13.96s/it] {'loss': 0.0033, 'learning_rate': 1.0345e-05, 'epoch': 10.39} 79%|███████▉ | 7940/10000 [31:08:30<7:59:19, 13.96s/it] 79%|███████▉ | 7941/10000 [31:08:44<7:58:09, 13.93s/it] {'loss': 0.0026, 'learning_rate': 1.0340000000000001e-05, 'epoch': 10.39} 79%|███████▉ | 7941/10000 [31:08:44<7:58:09, 13.93s/it] 79%|███████▉ | 7942/10000 [31:08:58<7:57:46, 13.93s/it] {'loss': 0.0043, 'learning_rate': 1.0335e-05, 'epoch': 10.4} 79%|███████▉ | 7942/10000 [31:08:58<7:57:46, 13.93s/it] 79%|███████▉ | 7943/10000 [31:09:12<7:57:33, 13.93s/it] {'loss': 0.0051, 'learning_rate': 1.0330000000000001e-05, 'epoch': 10.4} 79%|███████▉ | 7943/10000 [31:09:12<7:57:33, 13.93s/it] 79%|███████▉ | 7944/10000 [31:09:26<7:57:30, 13.93s/it] {'loss': 0.0031, 'learning_rate': 1.0325e-05, 'epoch': 10.4} 79%|███████▉ | 7944/10000 [31:09:26<7:57:30, 13.93s/it] 79%|███████▉ | 7945/10000 [31:09:40<7:56:35, 13.92s/it] {'loss': 0.0028, 'learning_rate': 1.0320000000000001e-05, 'epoch': 10.4} 79%|███████▉ | 7945/10000 [31:09:40<7:56:35, 13.92s/it] 79%|███████▉ | 7946/10000 [31:09:54<7:56:21, 13.92s/it] {'loss': 0.0024, 'learning_rate': 1.0315e-05, 'epoch': 10.4} 79%|███████▉ | 7946/10000 [31:09:54<7:56:21, 13.92s/it] 79%|███████▉ | 7947/10000 [31:10:08<7:56:14, 13.92s/it] {'loss': 0.0031, 
'learning_rate': 1.031e-05, 'epoch': 10.4} 79%|███████▉ | 7947/10000 [31:10:08<7:56:14, 13.92s/it] 79%|███████▉ | 7948/10000 [31:10:21<7:55:48, 13.91s/it] {'loss': 0.0036, 'learning_rate': 1.0305e-05, 'epoch': 10.4} 79%|███████▉ | 7948/10000 [31:10:21<7:55:48, 13.91s/it] 79%|███████▉ | 7949/10000 [31:10:35<7:55:36, 13.91s/it] {'loss': 0.003, 'learning_rate': 1.03e-05, 'epoch': 10.4} 79%|███████▉ | 7949/10000 [31:10:35<7:55:36, 13.91s/it] 80%|███████▉ | 7950/10000 [31:10:49<7:55:35, 13.92s/it] {'loss': 0.0029, 'learning_rate': 1.0295e-05, 'epoch': 10.41} 80%|███████▉ | 7950/10000 [31:10:49<7:55:35, 13.92s/it] 80%|███████▉ | 7951/10000 [31:11:03<7:54:51, 13.90s/it] {'loss': 0.0034, 'learning_rate': 1.0290000000000001e-05, 'epoch': 10.41} 80%|███████▉ | 7951/10000 [31:11:03<7:54:51, 13.90s/it] 80%|███████▉ | 7952/10000 [31:11:17<7:56:18, 13.95s/it] {'loss': 0.0042, 'learning_rate': 1.0285e-05, 'epoch': 10.41} 80%|███████▉ | 7952/10000 [31:11:17<7:56:18, 13.95s/it] 80%|███████▉ | 7953/10000 [31:11:31<7:55:40, 13.94s/it] {'loss': 0.004, 'learning_rate': 1.0280000000000002e-05, 'epoch': 10.41} 80%|███████▉ | 7953/10000 [31:11:31<7:55:40, 13.94s/it] 80%|███████▉ | 7954/10000 [31:11:45<7:56:06, 13.96s/it] {'loss': 0.0058, 'learning_rate': 1.0275e-05, 'epoch': 10.41} 80%|███████▉ | 7954/10000 [31:11:45<7:56:06, 13.96s/it] 80%|███████▉ | 7955/10000 [31:11:59<7:54:36, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.027e-05, 'epoch': 10.41} 80%|███████▉ | 7955/10000 [31:11:59<7:54:36, 13.92s/it] 80%|███████▉ | 7956/10000 [31:12:13<7:55:25, 13.96s/it] {'loss': 0.0036, 'learning_rate': 1.0265e-05, 'epoch': 10.41} 80%|███████▉ | 7956/10000 [31:12:13<7:55:25, 13.96s/it] 80%|███████▉ | 7957/10000 [31:12:27<7:54:27, 13.93s/it] {'loss': 0.0033, 'learning_rate': 1.026e-05, 'epoch': 10.41} 80%|███████▉ | 7957/10000 [31:12:27<7:54:27, 13.93s/it] 80%|███████▉ | 7958/10000 [31:12:41<7:53:44, 13.92s/it] {'loss': 0.003, 'learning_rate': 1.0255000000000001e-05, 'epoch': 10.42} 80%|███████▉ | 
7958/10000 [31:12:41<7:53:44, 13.92s/it] 80%|███████▉ | 7959/10000 [31:12:55<7:53:09, 13.91s/it] {'loss': 0.0043, 'learning_rate': 1.025e-05, 'epoch': 10.42} 80%|███████▉ | 7959/10000 [31:12:55<7:53:09, 13.91s/it] 80%|███████▉ | 7960/10000 [31:13:09<7:54:06, 13.94s/it] {'loss': 0.0032, 'learning_rate': 1.0245000000000001e-05, 'epoch': 10.42} 80%|███████▉ | 7960/10000 [31:13:09<7:54:06, 13.94s/it] 80%|███████▉ | 7961/10000 [31:13:23<7:52:47, 13.91s/it] {'loss': 0.0029, 'learning_rate': 1.024e-05, 'epoch': 10.42} 80%|███████▉ | 7961/10000 [31:13:23<7:52:47, 13.91s/it] 80%|███████▉ | 7962/10000 [31:13:36<7:51:47, 13.89s/it] {'loss': 0.008, 'learning_rate': 1.0235e-05, 'epoch': 10.42} 80%|███████▉ | 7962/10000 [31:13:36<7:51:47, 13.89s/it] 80%|███████▉ | 7963/10000 [31:13:50<7:51:19, 13.88s/it] {'loss': 0.0039, 'learning_rate': 1.023e-05, 'epoch': 10.42} 80%|███████▉ | 7963/10000 [31:13:50<7:51:19, 13.88s/it] 80%|███████▉ | 7964/10000 [31:14:04<7:51:42, 13.90s/it] {'loss': 0.0044, 'learning_rate': 1.0225e-05, 'epoch': 10.42} 80%|███████▉ | 7964/10000 [31:14:04<7:51:42, 13.90s/it] 80%|███████▉ | 7965/10000 [31:14:18<7:51:18, 13.90s/it] {'loss': 0.0037, 'learning_rate': 1.022e-05, 'epoch': 10.43} 80%|███████▉ | 7965/10000 [31:14:18<7:51:18, 13.90s/it] 80%|███████▉ | 7966/10000 [31:14:32<7:51:54, 13.92s/it] {'loss': 0.004, 'learning_rate': 1.0215000000000001e-05, 'epoch': 10.43} 80%|███████▉ | 7966/10000 [31:14:32<7:51:54, 13.92s/it] 80%|███████▉ | 7967/10000 [31:14:46<7:50:40, 13.89s/it] {'loss': 0.0054, 'learning_rate': 1.021e-05, 'epoch': 10.43} 80%|███████▉ | 7967/10000 [31:14:46<7:50:40, 13.89s/it] 80%|███████▉ | 7968/10000 [31:15:00<7:50:11, 13.88s/it] {'loss': 0.0035, 'learning_rate': 1.0205000000000001e-05, 'epoch': 10.43} 80%|███████▉ | 7968/10000 [31:15:00<7:50:11, 13.88s/it] 80%|███████▉ | 7969/10000 [31:15:14<7:49:42, 13.88s/it] {'loss': 0.0043, 'learning_rate': 1.02e-05, 'epoch': 10.43} 80%|███████▉ | 7969/10000 [31:15:14<7:49:42, 13.88s/it] 80%|███████▉ | 
7970/10000 [31:15:27<7:49:45, 13.88s/it] {'loss': 0.0029, 'learning_rate': 1.0195e-05, 'epoch': 10.43} 80%|███████▉ | 7970/10000 [31:15:28<7:49:45, 13.88s/it] 80%|███████▉ | 7971/10000 [31:15:41<7:49:50, 13.89s/it] {'loss': 0.0083, 'learning_rate': 1.019e-05, 'epoch': 10.43} 80%|███████▉ | 7971/10000 [31:15:41<7:49:50, 13.89s/it] 80%|███████▉ | 7972/10000 [31:15:55<7:49:13, 13.88s/it] {'loss': 0.004, 'learning_rate': 1.0185e-05, 'epoch': 10.43} 80%|███████▉ | 7972/10000 [31:15:55<7:49:13, 13.88s/it] 80%|███████▉ | 7973/10000 [31:16:09<7:49:14, 13.89s/it] {'loss': 0.003, 'learning_rate': 1.018e-05, 'epoch': 10.44} 80%|███████▉ | 7973/10000 [31:16:09<7:49:14, 13.89s/it] 80%|███████▉ | 7974/10000 [31:16:23<7:49:26, 13.90s/it] {'loss': 0.0054, 'learning_rate': 1.0175e-05, 'epoch': 10.44} 80%|███████▉ | 7974/10000 [31:16:23<7:49:26, 13.90s/it] 80%|███████▉ | 7975/10000 [31:16:37<7:50:37, 13.94s/it] {'loss': 0.0034, 'learning_rate': 1.0170000000000001e-05, 'epoch': 10.44} 80%|███████▉ | 7975/10000 [31:16:37<7:50:37, 13.94s/it] 80%|███████▉ | 7976/10000 [31:16:51<7:49:30, 13.92s/it] {'loss': 0.0038, 'learning_rate': 1.0165e-05, 'epoch': 10.44} 80%|███████▉ | 7976/10000 [31:16:51<7:49:30, 13.92s/it] 80%|███████▉ | 7977/10000 [31:17:05<7:48:49, 13.90s/it] {'loss': 0.0034, 'learning_rate': 1.016e-05, 'epoch': 10.44} 80%|███████▉ | 7977/10000 [31:17:05<7:48:49, 13.90s/it] 80%|███████▉ | 7978/10000 [31:17:19<7:48:07, 13.89s/it] {'loss': 0.0045, 'learning_rate': 1.0155e-05, 'epoch': 10.44} 80%|███████▉ | 7978/10000 [31:17:19<7:48:07, 13.89s/it] 80%|███████▉ | 7979/10000 [31:17:33<7:47:12, 13.87s/it] {'loss': 0.0066, 'learning_rate': 1.0150000000000001e-05, 'epoch': 10.44} 80%|███████▉ | 7979/10000 [31:17:33<7:47:12, 13.87s/it] 80%|███████▉ | 7980/10000 [31:17:47<7:48:14, 13.91s/it] {'loss': 0.0026, 'learning_rate': 1.0145e-05, 'epoch': 10.45} 80%|███████▉ | 7980/10000 [31:17:47<7:48:14, 13.91s/it] 80%|███████▉ | 7981/10000 [31:18:00<7:46:39, 13.87s/it] {'loss': 0.0037, 
'learning_rate': 1.0140000000000001e-05, 'epoch': 10.45} 80%|███████▉ | 7981/10000 [31:18:00<7:46:39, 13.87s/it] 80%|███████▉ | 7982/10000 [31:18:14<7:48:00, 13.91s/it] {'loss': 0.0035, 'learning_rate': 1.0135e-05, 'epoch': 10.45} 80%|███████▉ | 7982/10000 [31:18:14<7:48:00, 13.91s/it] 80%|███████▉ | 7983/10000 [31:18:28<7:47:13, 13.90s/it] {'loss': 0.0033, 'learning_rate': 1.0130000000000001e-05, 'epoch': 10.45} 80%|███████▉ | 7983/10000 [31:18:28<7:47:13, 13.90s/it] 80%|███████▉ | 7984/10000 [31:18:42<7:47:22, 13.91s/it] {'loss': 0.0046, 'learning_rate': 1.0125e-05, 'epoch': 10.45} 80%|███████▉ | 7984/10000 [31:18:42<7:47:22, 13.91s/it] 80%|███████▉ | 7985/10000 [31:18:56<7:47:45, 13.93s/it] {'loss': 0.0028, 'learning_rate': 1.012e-05, 'epoch': 10.45} 80%|███████▉ | 7985/10000 [31:18:56<7:47:45, 13.93s/it] 80%|███████▉ | 7986/10000 [31:19:10<7:47:12, 13.92s/it] {'loss': 0.0028, 'learning_rate': 1.0115e-05, 'epoch': 10.45} 80%|███████▉ | 7986/10000 [31:19:10<7:47:12, 13.92s/it] 80%|███████▉ | 7987/10000 [31:19:24<7:46:55, 13.92s/it] {'loss': 0.0033, 'learning_rate': 1.011e-05, 'epoch': 10.45} 80%|███████▉ | 7987/10000 [31:19:24<7:46:55, 13.92s/it] 80%|███████▉ | 7988/10000 [31:19:38<7:46:11, 13.90s/it] {'loss': 0.0026, 'learning_rate': 1.0105e-05, 'epoch': 10.46} 80%|███████▉ | 7988/10000 [31:19:38<7:46:11, 13.90s/it] 80%|███████▉ | 7989/10000 [31:19:52<7:46:04, 13.91s/it] {'loss': 0.006, 'learning_rate': 1.0100000000000002e-05, 'epoch': 10.46} 80%|███████▉ | 7989/10000 [31:19:52<7:46:04, 13.91s/it] 80%|███████▉ | 7990/10000 [31:20:06<7:46:50, 13.94s/it] {'loss': 0.0041, 'learning_rate': 1.0095e-05, 'epoch': 10.46} 80%|███████▉ | 7990/10000 [31:20:06<7:46:50, 13.94s/it] 80%|███████▉ | 7991/10000 [31:20:20<7:47:36, 13.97s/it] {'loss': 0.004, 'learning_rate': 1.0090000000000002e-05, 'epoch': 10.46} 80%|███████▉ | 7991/10000 [31:20:20<7:47:36, 13.97s/it] 80%|███████▉ | 7992/10000 [31:20:34<7:46:43, 13.95s/it] {'loss': 0.0045, 'learning_rate': 1.0085e-05, 'epoch': 
10.46} 80%|███████▉ | 7992/10000 [31:20:34<7:46:43, 13.95s/it] 80%|███████▉ | 7993/10000 [31:20:48<7:46:33, 13.95s/it] {'loss': 0.0032, 'learning_rate': 1.008e-05, 'epoch': 10.46} 80%|███████▉ | 7993/10000 [31:20:48<7:46:33, 13.95s/it] 80%|███████▉ | 7994/10000 [31:21:01<7:45:12, 13.91s/it] {'loss': 0.004, 'learning_rate': 1.0075000000000001e-05, 'epoch': 10.46} 80%|███████▉ | 7994/10000 [31:21:01<7:45:12, 13.91s/it] 80%|███████▉ | 7995/10000 [31:21:15<7:45:23, 13.93s/it] {'loss': 0.0037, 'learning_rate': 1.007e-05, 'epoch': 10.46} 80%|███████▉ | 7995/10000 [31:21:15<7:45:23, 13.93s/it] 80%|███████▉ | 7996/10000 [31:21:29<7:45:15, 13.93s/it] {'loss': 0.0145, 'learning_rate': 1.0065000000000001e-05, 'epoch': 10.47} 80%|███████▉ | 7996/10000 [31:21:29<7:45:15, 13.93s/it] 80%|███████▉ | 7997/10000 [31:21:43<7:44:18, 13.91s/it] {'loss': 0.0047, 'learning_rate': 1.006e-05, 'epoch': 10.47} 80%|███████▉ | 7997/10000 [31:21:43<7:44:18, 13.91s/it] 80%|███████▉ | 7998/10000 [31:21:57<7:43:14, 13.88s/it] {'loss': 0.0029, 'learning_rate': 1.0055000000000001e-05, 'epoch': 10.47} 80%|███████▉ | 7998/10000 [31:21:57<7:43:14, 13.88s/it] 80%|███████▉ | 7999/10000 [31:22:11<7:45:22, 13.95s/it] {'loss': 0.0036, 'learning_rate': 1.005e-05, 'epoch': 10.47} 80%|███████▉ | 7999/10000 [31:22:11<7:45:22, 13.95s/it] 80%|████████ | 8000/10000 [31:22:25<7:43:09, 13.89s/it] {'loss': 0.0053, 'learning_rate': 1.0045e-05, 'epoch': 10.47} 80%|████████ | 8000/10000 [31:22:25<7:43:09, 13.89s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-05 03:40:33,198 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-8000/config.json
[INFO|configuration_utils.py:364] 2024-11-05 03:40:33,200 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-8000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-05 03:41:21,727 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-8000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-05 03:41:21,730 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-8000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-05 03:41:21,731 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-8000/special_tokens_map.json
[2024-11-05 03:41:21,748] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is about to be saved!
[2024-11-05 03:41:21,783] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt
[2024-11-05 03:41:21,783] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt...
[2024-11-05 03:42:11,723] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/mp_rank_00_model_states.pt.
[2024-11-05 03:42:11,819] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-05 03:44:08,364] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-05 03:44:08,455] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-8000/global_step8000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-05 03:44:08,455] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step8000 is ready now!
80%|████████ | 8001/10000 [31:26:14<43:36:24, 78.53s/it] {'loss': 0.0055, 'learning_rate': 1.004e-05, 'epoch': 10.47} 80%|████████ | 8001/10000 [31:26:14<43:36:24, 78.53s/it] 80%|████████ | 8002/10000 [31:26:28<32:46:41, 59.06s/it] {'loss': 0.0038, 'learning_rate': 1.0035e-05, 'epoch': 10.47} 80%|████████ | 8002/10000 [31:26:28<32:46:41, 59.06s/it] 80%|████████ | 8003/10000 [31:26:42<25:14:00, 45.49s/it] {'loss': 0.003, 'learning_rate': 1.003e-05, 'epoch': 10.48} 80%|████████ | 8003/10000 [31:26:42<25:14:00, 45.49s/it] 80%|████████ | 8004/10000 [31:26:55<19:56:09, 35.96s/it] {'loss': 0.0038, 'learning_rate': 1.0025000000000001e-05, 'epoch': 10.48} 80%|████████ | 8004/10000 [31:26:55<19:56:09, 35.96s/it] 80%|████████ | 8005/10000 [31:27:09<16:14:02, 29.29s/it] {'loss': 0.0031, 'learning_rate': 1.002e-05, 'epoch': 10.48} 80%|████████ | 8005/10000 [31:27:09<16:14:02, 29.29s/it] 80%|████████ | 8006/10000 [31:27:23<13:39:44, 24.67s/it] {'loss': 0.0031, 'learning_rate': 1.0015000000000002e-05, 'epoch': 10.48} 80%|████████ | 8006/10000 [31:27:23<13:39:44, 24.67s/it] 80%|████████ | 8007/10000 [31:27:37<11:53:12, 21.47s/it] {'loss': 0.0039, 'learning_rate': 1.001e-05, 'epoch': 10.48} 80%|████████ | 8007/10000 [31:27:37<11:53:12, 21.47s/it] 80%|████████ | 8008/10000 [31:27:51<10:36:11, 19.16s/it] {'loss': 0.0049, 'learning_rate': 1.0005e-05, 'epoch': 10.48} 80%|████████ | 8008/10000 [31:27:51<10:36:11, 19.16s/it] 80%|████████ | 8009/10000 [31:28:05<9:43:41, 17.59s/it] {'loss': 0.0038, 'learning_rate': 1e-05, 'epoch': 10.48} 80%|████████ | 8009/10000 [31:28:05<9:43:41, 17.59s/it] 80%|████████ | 8010/10000 [31:28:19<9:06:55, 16.49s/it] {'loss': 0.0028, 'learning_rate': 9.995e-06, 'epoch': 10.48} 80%|████████ | 8010/10000 [31:28:19<9:06:55, 16.49s/it] 80%|████████ | 8011/10000 [31:28:33<8:41:14, 15.72s/it] {'loss': 0.0045, 'learning_rate': 9.990000000000001e-06, 'epoch': 10.49} 80%|████████ | 8011/10000 [31:28:33<8:41:14, 15.72s/it] 80%|████████ | 8012/10000 [31:28:46<8:23:06, 
15.18s/it] {'loss': 0.0039, 'learning_rate': 9.985e-06, 'epoch': 10.49} 80%|████████ | 8012/10000 [31:28:47<8:23:06, 15.18s/it] 80%|████████ | 8013/10000 [31:29:00<8:10:43, 14.82s/it] {'loss': 0.0034, 'learning_rate': 9.980000000000001e-06, 'epoch': 10.49} 80%|████████ | 8013/10000 [31:29:00<8:10:43, 14.82s/it] 80%|████████ | 8014/10000 [31:29:14<8:00:45, 14.52s/it] {'loss': 0.0039, 'learning_rate': 9.975e-06, 'epoch': 10.49} 80%|████████ | 8014/10000 [31:29:14<8:00:45, 14.52s/it] 80%|████████ | 8015/10000 [31:29:28<7:53:47, 14.32s/it] {'loss': 0.0041, 'learning_rate': 9.97e-06, 'epoch': 10.49} 80%|████████ | 8015/10000 [31:29:28<7:53:47, 14.32s/it] 80%|████████ | 8016/10000 [31:29:42<7:49:31, 14.20s/it] {'loss': 0.0031, 'learning_rate': 9.965e-06, 'epoch': 10.49} 80%|████████ | 8016/10000 [31:29:42<7:49:31, 14.20s/it] 80%|████████ | 8017/10000 [31:29:56<7:46:03, 14.10s/it] {'loss': 0.0043, 'learning_rate': 9.96e-06, 'epoch': 10.49} 80%|████████ | 8017/10000 [31:29:56<7:46:03, 14.10s/it] 80%|████████ | 8018/10000 [31:30:10<7:44:36, 14.06s/it] {'loss': 0.0048, 'learning_rate': 9.955e-06, 'epoch': 10.49} 80%|████████ | 8018/10000 [31:30:10<7:44:36, 14.06s/it] 80%|████████ | 8019/10000 [31:30:24<7:44:29, 14.07s/it] {'loss': 0.0023, 'learning_rate': 9.950000000000001e-06, 'epoch': 10.5} 80%|████████ | 8019/10000 [31:30:24<7:44:29, 14.07s/it] 80%|████████ | 8020/10000 [31:30:38<7:42:52, 14.03s/it] {'loss': 0.0022, 'learning_rate': 9.945e-06, 'epoch': 10.5} 80%|████████ | 8020/10000 [31:30:38<7:42:52, 14.03s/it] 80%|████████ | 8021/10000 [31:30:52<7:42:08, 14.01s/it] {'loss': 0.0034, 'learning_rate': 9.940000000000001e-06, 'epoch': 10.5} 80%|████████ | 8021/10000 [31:30:52<7:42:08, 14.01s/it] 80%|████████ | 8022/10000 [31:31:06<7:40:56, 13.98s/it] {'loss': 0.0047, 'learning_rate': 9.935e-06, 'epoch': 10.5} 80%|████████ | 8022/10000 [31:31:06<7:40:56, 13.98s/it] 80%|████████ | 8023/10000 [31:31:20<7:40:11, 13.97s/it] {'loss': 0.004, 'learning_rate': 9.93e-06, 'epoch': 
10.5} 80%|████████ | 8023/10000 [31:31:20<7:40:11, 13.97s/it] 80%|████████ | 8024/10000 [31:31:34<7:39:32, 13.95s/it] {'loss': 0.0051, 'learning_rate': 9.925e-06, 'epoch': 10.5} 80%|████████ | 8024/10000 [31:31:34<7:39:32, 13.95s/it] 80%|████████ | 8025/10000 [31:31:48<7:39:34, 13.96s/it] {'loss': 0.0046, 'learning_rate': 9.92e-06, 'epoch': 10.5} 80%|████████ | 8025/10000 [31:31:48<7:39:34, 13.96s/it] 80%|████████ | 8026/10000 [31:32:02<7:39:05, 13.95s/it] {'loss': 0.0047, 'learning_rate': 9.915e-06, 'epoch': 10.51} 80%|████████ | 8026/10000 [31:32:02<7:39:05, 13.95s/it] 80%|████████ | 8027/10000 [31:32:16<7:40:56, 14.02s/it] {'loss': 0.0021, 'learning_rate': 9.91e-06, 'epoch': 10.51} 80%|████████ | 8027/10000 [31:32:16<7:40:56, 14.02s/it] 80%|████████ | 8028/10000 [31:32:30<7:39:29, 13.98s/it] {'loss': 0.0044, 'learning_rate': 9.905000000000001e-06, 'epoch': 10.51} 80%|████████ | 8028/10000 [31:32:30<7:39:29, 13.98s/it] 80%|████████ | 8029/10000 [31:32:44<7:39:32, 13.99s/it] {'loss': 0.0037, 'learning_rate': 9.900000000000002e-06, 'epoch': 10.51} 80%|████████ | 8029/10000 [31:32:44<7:39:32, 13.99s/it] 80%|████████ | 8030/10000 [31:32:58<7:38:22, 13.96s/it] {'loss': 0.0035, 'learning_rate': 9.895e-06, 'epoch': 10.51} 80%|████████ | 8030/10000 [31:32:58<7:38:22, 13.96s/it] 80%|████████ | 8031/10000 [31:33:11<7:37:15, 13.93s/it] {'loss': 0.0055, 'learning_rate': 9.89e-06, 'epoch': 10.51} 80%|████████ | 8031/10000 [31:33:11<7:37:15, 13.93s/it] 80%|████████ | 8032/10000 [31:33:25<7:36:21, 13.91s/it] {'loss': 0.0041, 'learning_rate': 9.885e-06, 'epoch': 10.51} 80%|████████ | 8032/10000 [31:33:25<7:36:21, 13.91s/it] 80%|████████ | 8033/10000 [31:33:39<7:37:24, 13.95s/it] {'loss': 0.0039, 'learning_rate': 9.88e-06, 'epoch': 10.51} 80%|████████ | 8033/10000 [31:33:39<7:37:24, 13.95s/it] 80%|████████ | 8034/10000 [31:33:53<7:37:17, 13.96s/it] {'loss': 0.0039, 'learning_rate': 9.875000000000001e-06, 'epoch': 10.52} 80%|████████ | 8034/10000 [31:33:53<7:37:17, 13.96s/it] 
80%|████████ | 8035/10000 [31:34:07<7:37:24, 13.97s/it] {'loss': 0.0035, 'learning_rate': 9.87e-06, 'epoch': 10.52} 80%|████████ | 8035/10000 [31:34:07<7:37:24, 13.97s/it] 80%|████████ | 8036/10000 [31:34:21<7:37:11, 13.97s/it] {'loss': 0.0039, 'learning_rate': 9.865000000000001e-06, 'epoch': 10.52} 80%|████████ | 8036/10000 [31:34:21<7:37:11, 13.97s/it] 80%|████████ | 8037/10000 [31:34:35<7:36:13, 13.94s/it] {'loss': 0.0044, 'learning_rate': 9.86e-06, 'epoch': 10.52} 80%|████████ | 8037/10000 [31:34:35<7:36:13, 13.94s/it] 80%|████████ | 8038/10000 [31:34:49<7:37:15, 13.98s/it] {'loss': 0.004, 'learning_rate': 9.855e-06, 'epoch': 10.52} 80%|████████ | 8038/10000 [31:34:49<7:37:15, 13.98s/it] 80%|████████ | 8039/10000 [31:35:03<7:37:10, 13.99s/it] {'loss': 0.0031, 'learning_rate': 9.85e-06, 'epoch': 10.52} 80%|████████ | 8039/10000 [31:35:03<7:37:10, 13.99s/it] 80%|████████ | 8040/10000 [31:35:17<7:35:26, 13.94s/it] {'loss': 0.0042, 'learning_rate': 9.845e-06, 'epoch': 10.52} 80%|████████ | 8040/10000 [31:35:17<7:35:26, 13.94s/it] 80%|████████ | 8041/10000 [31:35:31<7:34:57, 13.93s/it] {'loss': 0.0045, 'learning_rate': 9.84e-06, 'epoch': 10.52} 80%|████████ | 8041/10000 [31:35:31<7:34:57, 13.93s/it] 80%|████████ | 8042/10000 [31:35:45<7:34:35, 13.93s/it] {'loss': 0.0045, 'learning_rate': 9.835000000000002e-06, 'epoch': 10.53} 80%|████████ | 8042/10000 [31:35:45<7:34:35, 13.93s/it] 80%|████████ | 8043/10000 [31:35:59<7:34:35, 13.94s/it] {'loss': 0.0022, 'learning_rate': 9.83e-06, 'epoch': 10.53} 80%|████████ | 8043/10000 [31:35:59<7:34:35, 13.94s/it] 80%|████████ | 8044/10000 [31:36:13<7:32:49, 13.89s/it] {'loss': 0.0047, 'learning_rate': 9.825000000000002e-06, 'epoch': 10.53} 80%|████████ | 8044/10000 [31:36:13<7:32:49, 13.89s/it] 80%|████████ | 8045/10000 [31:36:27<7:33:05, 13.91s/it] {'loss': 0.0047, 'learning_rate': 9.820000000000001e-06, 'epoch': 10.53} 80%|████████ | 8045/10000 [31:36:27<7:33:05, 13.91s/it] 80%|████████ | 8046/10000 [31:36:40<7:32:27, 
13.89s/it] {'loss': 0.0028, 'learning_rate': 9.815e-06, 'epoch': 10.53} 80%|████████ | 8046/10000 [31:36:40<7:32:27, 13.89s/it] 80%|████████ | 8047/10000 [31:36:54<7:34:00, 13.95s/it] {'loss': 0.0036, 'learning_rate': 9.810000000000001e-06, 'epoch': 10.53} 80%|████████ | 8047/10000 [31:36:55<7:34:00, 13.95s/it] 80%|████████ | 8048/10000 [31:37:08<7:33:58, 13.95s/it] {'loss': 0.0027, 'learning_rate': 9.805e-06, 'epoch': 10.53} 80%|████████ | 8048/10000 [31:37:08<7:33:58, 13.95s/it] 80%|████████ | 8049/10000 [31:37:22<7:32:54, 13.93s/it] {'loss': 0.0033, 'learning_rate': 9.800000000000001e-06, 'epoch': 10.54} 80%|████████ | 8049/10000 [31:37:22<7:32:54, 13.93s/it] 80%|████████ | 8050/10000 [31:37:36<7:31:38, 13.90s/it] {'loss': 0.0047, 'learning_rate': 9.795e-06, 'epoch': 10.54} 80%|████████ | 8050/10000 [31:37:36<7:31:38, 13.90s/it] 81%|████████ | 8051/10000 [31:37:50<7:30:46, 13.88s/it] {'loss': 0.0039, 'learning_rate': 9.790000000000001e-06, 'epoch': 10.54} 81%|████████ | 8051/10000 [31:37:50<7:30:46, 13.88s/it] 81%|████████ | 8052/10000 [31:38:04<7:30:58, 13.89s/it] {'loss': 0.0028, 'learning_rate': 9.785e-06, 'epoch': 10.54} 81%|████████ | 8052/10000 [31:38:04<7:30:58, 13.89s/it] 81%|████████ | 8053/10000 [31:38:18<7:31:55, 13.93s/it] {'loss': 0.0034, 'learning_rate': 9.78e-06, 'epoch': 10.54} 81%|████████ | 8053/10000 [31:38:18<7:31:55, 13.93s/it] 81%|████████ | 8054/10000 [31:38:32<7:30:21, 13.89s/it] {'loss': 0.0034, 'learning_rate': 9.775e-06, 'epoch': 10.54} 81%|████████ | 8054/10000 [31:38:32<7:30:21, 13.89s/it] 81%|████████ | 8055/10000 [31:38:46<7:29:56, 13.88s/it] {'loss': 0.0039, 'learning_rate': 9.77e-06, 'epoch': 10.54} 81%|████████ | 8055/10000 [31:38:46<7:29:56, 13.88s/it] 81%|████████ | 8056/10000 [31:39:00<7:30:08, 13.89s/it] {'loss': 0.0031, 'learning_rate': 9.765e-06, 'epoch': 10.54} 81%|████████ | 8056/10000 [31:39:00<7:30:08, 13.89s/it] 81%|████████ | 8057/10000 [31:39:13<7:29:39, 13.89s/it] {'loss': 0.0038, 'learning_rate': 
9.760000000000001e-06, 'epoch': 10.55} 81%|████████ | 8057/10000 [31:39:13<7:29:39, 13.89s/it] 81%|████████ | 8058/10000 [31:39:27<7:30:53, 13.93s/it] {'loss': 0.0046, 'learning_rate': 9.755e-06, 'epoch': 10.55} 81%|████████ | 8058/10000 [31:39:27<7:30:53, 13.93s/it] 81%|████████ | 8059/10000 [31:39:41<7:29:44, 13.90s/it] {'loss': 0.0053, 'learning_rate': 9.750000000000002e-06, 'epoch': 10.55} 81%|████████ | 8059/10000 [31:39:41<7:29:44, 13.90s/it] 81%|████████ | 8060/10000 [31:39:55<7:31:18, 13.96s/it] {'loss': 0.0027, 'learning_rate': 9.745e-06, 'epoch': 10.55} 81%|████████ | 8060/10000 [31:39:55<7:31:18, 13.96s/it] 81%|████████ | 8061/10000 [31:40:09<7:30:23, 13.94s/it] {'loss': 0.0025, 'learning_rate': 9.74e-06, 'epoch': 10.55} 81%|████████ | 8061/10000 [31:40:09<7:30:23, 13.94s/it] 81%|████████ | 8062/10000 [31:40:23<7:29:22, 13.91s/it] {'loss': 0.0039, 'learning_rate': 9.735e-06, 'epoch': 10.55} 81%|████████ | 8062/10000 [31:40:23<7:29:22, 13.91s/it] 81%|████████ | 8063/10000 [31:40:37<7:30:12, 13.95s/it] {'loss': 0.0034, 'learning_rate': 9.73e-06, 'epoch': 10.55} 81%|████████ | 8063/10000 [31:40:37<7:30:12, 13.95s/it] 81%|████████ | 8064/10000 [31:40:51<7:28:51, 13.91s/it] {'loss': 0.0026, 'learning_rate': 9.725000000000001e-06, 'epoch': 10.55} 81%|████████ | 8064/10000 [31:40:51<7:28:51, 13.91s/it] 81%|████████ | 8065/10000 [31:41:05<7:28:47, 13.92s/it] {'loss': 0.0035, 'learning_rate': 9.72e-06, 'epoch': 10.56} 81%|████████ | 8065/10000 [31:41:05<7:28:47, 13.92s/it] 81%|████████ | 8066/10000 [31:41:19<7:28:54, 13.93s/it] {'loss': 0.0047, 'learning_rate': 9.715000000000001e-06, 'epoch': 10.56} 81%|████████ | 8066/10000 [31:41:19<7:28:54, 13.93s/it] 81%|████████ | 8067/10000 [31:41:33<7:30:51, 13.99s/it] {'loss': 0.0043, 'learning_rate': 9.71e-06, 'epoch': 10.56} 81%|████████ | 8067/10000 [31:41:33<7:30:51, 13.99s/it] 81%|████████ | 8068/10000 [31:41:47<7:29:38, 13.96s/it] {'loss': 0.0034, 'learning_rate': 9.705e-06, 'epoch': 10.56} 81%|████████ | 8068/10000 
[31:41:47<7:29:38, 13.96s/it] 81%|████████ | 8069/10000 [31:42:01<7:30:17, 13.99s/it] {'loss': 0.004, 'learning_rate': 9.7e-06, 'epoch': 10.56} 81%|████████ | 8069/10000 [31:42:01<7:30:17, 13.99s/it] 81%|████████ | 8070/10000 [31:42:15<7:29:56, 13.99s/it] {'loss': 0.0026, 'learning_rate': 9.695e-06, 'epoch': 10.56} 81%|████████ | 8070/10000 [31:42:15<7:29:56, 13.99s/it] 81%|████████ | 8071/10000 [31:42:29<7:27:56, 13.93s/it] {'loss': 0.0036, 'learning_rate': 9.69e-06, 'epoch': 10.56} 81%|████████ | 8071/10000 [31:42:29<7:27:56, 13.93s/it] 81%|████████ | 8072/10000 [31:42:43<7:27:44, 13.93s/it] {'loss': 0.0045, 'learning_rate': 9.685000000000001e-06, 'epoch': 10.57} 81%|████████ | 8072/10000 [31:42:43<7:27:44, 13.93s/it] 81%|████████ | 8073/10000 [31:42:56<7:26:52, 13.91s/it] {'loss': 0.0027, 'learning_rate': 9.68e-06, 'epoch': 10.57} 81%|████████ | 8073/10000 [31:42:57<7:26:52, 13.91s/it] 81%|████████ | 8074/10000 [31:43:10<7:25:41, 13.88s/it] {'loss': 0.0072, 'learning_rate': 9.675000000000001e-06, 'epoch': 10.57} 81%|████████ | 8074/10000 [31:43:10<7:25:41, 13.88s/it] 81%|████████ | 8075/10000 [31:43:24<7:26:31, 13.92s/it] {'loss': 0.0038, 'learning_rate': 9.67e-06, 'epoch': 10.57} 81%|████████ | 8075/10000 [31:43:24<7:26:31, 13.92s/it] 81%|████████ | 8076/10000 [31:43:38<7:27:26, 13.95s/it] {'loss': 0.0032, 'learning_rate': 9.665e-06, 'epoch': 10.57} 81%|████████ | 8076/10000 [31:43:38<7:27:26, 13.95s/it] 81%|████████ | 8077/10000 [31:43:52<7:27:11, 13.95s/it] {'loss': 0.0032, 'learning_rate': 9.66e-06, 'epoch': 10.57} 81%|████████ | 8077/10000 [31:43:52<7:27:11, 13.95s/it] 81%|████████ | 8078/10000 [31:44:06<7:26:12, 13.93s/it] {'loss': 0.0035, 'learning_rate': 9.655e-06, 'epoch': 10.57} 81%|████████ | 8078/10000 [31:44:06<7:26:12, 13.93s/it] 81%|████████ | 8079/10000 [31:44:20<7:25:23, 13.91s/it] {'loss': 0.0024, 'learning_rate': 9.65e-06, 'epoch': 10.57} 81%|████████ | 8079/10000 [31:44:20<7:25:23, 13.91s/it] 81%|████████ | 8080/10000 [31:44:34<7:26:00, 
13.94s/it] {'loss': 0.0025, 'learning_rate': 9.645e-06, 'epoch': 10.58} 81%|████████ | 8080/10000 [31:44:34<7:26:00, 13.94s/it] 81%|████████ | 8081/10000 [31:44:48<7:25:30, 13.93s/it] {'loss': 0.0038, 'learning_rate': 9.640000000000001e-06, 'epoch': 10.58} 81%|████████ | 8081/10000 [31:44:48<7:25:30, 13.93s/it] 81%|████████ | 8082/10000 [31:45:02<7:24:41, 13.91s/it] {'loss': 0.0042, 'learning_rate': 9.635000000000002e-06, 'epoch': 10.58} 81%|████████ | 8082/10000 [31:45:02<7:24:41, 13.91s/it] 81%|████████ | 8083/10000 [31:45:16<7:24:40, 13.92s/it] {'loss': 0.0035, 'learning_rate': 9.630000000000001e-06, 'epoch': 10.58} 81%|████████ | 8083/10000 [31:45:16<7:24:40, 13.92s/it] 81%|████████ | 8084/10000 [31:45:30<7:25:06, 13.94s/it] {'loss': 0.0036, 'learning_rate': 9.625e-06, 'epoch': 10.58} 81%|████████ | 8084/10000 [31:45:30<7:25:06, 13.94s/it] 81%|████████ | 8085/10000 [31:45:44<7:25:09, 13.95s/it] {'loss': 0.0066, 'learning_rate': 9.62e-06, 'epoch': 10.58} 81%|████████ | 8085/10000 [31:45:44<7:25:09, 13.95s/it] 81%|████████ | 8086/10000 [31:45:58<7:24:06, 13.92s/it] {'loss': 0.0033, 'learning_rate': 9.615e-06, 'epoch': 10.58} 81%|████████ | 8086/10000 [31:45:58<7:24:06, 13.92s/it] 81%|████████ | 8087/10000 [31:46:11<7:23:26, 13.91s/it] {'loss': 0.0041, 'learning_rate': 9.610000000000001e-06, 'epoch': 10.59} 81%|████████ | 8087/10000 [31:46:11<7:23:26, 13.91s/it] 81%|████████ | 8088/10000 [31:46:25<7:22:26, 13.88s/it] {'loss': 0.004, 'learning_rate': 9.605e-06, 'epoch': 10.59} 81%|████████ | 8088/10000 [31:46:25<7:22:26, 13.88s/it] 81%|████████ | 8089/10000 [31:46:39<7:23:30, 13.93s/it] {'loss': 0.0045, 'learning_rate': 9.600000000000001e-06, 'epoch': 10.59} 81%|████████ | 8089/10000 [31:46:39<7:23:30, 13.93s/it] 81%|████████ | 8090/10000 [31:46:53<7:22:09, 13.89s/it] {'loss': 0.0039, 'learning_rate': 9.595e-06, 'epoch': 10.59} 81%|████████ | 8090/10000 [31:46:53<7:22:09, 13.89s/it] 81%|████████ | 8091/10000 [31:47:07<7:22:17, 13.90s/it] {'loss': 0.0044, 
'learning_rate': 9.59e-06, 'epoch': 10.59} 81%|████████ | 8091/10000 [31:47:07<7:22:17, 13.90s/it] 81%|████████ | 8092/10000 [31:47:21<7:23:20, 13.94s/it] {'loss': 0.0027, 'learning_rate': 9.585e-06, 'epoch': 10.59} 81%|████████ | 8092/10000 [31:47:21<7:23:20, 13.94s/it] 81%|████████ | 8093/10000 [31:47:35<7:22:15, 13.91s/it] {'loss': 0.0037, 'learning_rate': 9.58e-06, 'epoch': 10.59} 81%|████████ | 8093/10000 [31:47:35<7:22:15, 13.91s/it] 81%|████████ | 8094/10000 [31:47:49<7:21:55, 13.91s/it] {'loss': 0.0047, 'learning_rate': 9.575e-06, 'epoch': 10.59} 81%|████████ | 8094/10000 [31:47:49<7:21:55, 13.91s/it] 81%|████████ | 8095/10000 [31:48:03<7:21:00, 13.89s/it] {'loss': 0.0036, 'learning_rate': 9.57e-06, 'epoch': 10.6} 81%|████████ | 8095/10000 [31:48:03<7:21:00, 13.89s/it] 81%|████████ | 8096/10000 [31:48:17<7:20:59, 13.90s/it] {'loss': 0.0036, 'learning_rate': 9.565e-06, 'epoch': 10.6} 81%|████████ | 8096/10000 [31:48:17<7:20:59, 13.90s/it] 81%|████████ | 8097/10000 [31:48:30<7:20:58, 13.90s/it] {'loss': 0.0029, 'learning_rate': 9.560000000000002e-06, 'epoch': 10.6} 81%|████████ | 8097/10000 [31:48:31<7:20:58, 13.90s/it] 81%|████████ | 8098/10000 [31:48:44<7:21:21, 13.92s/it] {'loss': 0.0027, 'learning_rate': 9.555e-06, 'epoch': 10.6} 81%|████████ | 8098/10000 [31:48:45<7:21:21, 13.92s/it] 81%|████████ | 8099/10000 [31:48:58<7:20:56, 13.92s/it] {'loss': 0.0049, 'learning_rate': 9.55e-06, 'epoch': 10.6} 81%|████████ | 8099/10000 [31:48:58<7:20:56, 13.92s/it] 81%|████████ | 8100/10000 [31:49:12<7:20:42, 13.92s/it] {'loss': 0.0044, 'learning_rate': 9.545e-06, 'epoch': 10.6} 81%|████████ | 8100/10000 [31:49:12<7:20:42, 13.92s/it] 81%|████████ | 8101/10000 [31:49:26<7:20:11, 13.91s/it] {'loss': 0.0043, 'learning_rate': 9.54e-06, 'epoch': 10.6} 81%|████████ | 8101/10000 [31:49:26<7:20:11, 13.91s/it] 81%|████████ | 8102/10000 [31:49:40<7:20:31, 13.93s/it] {'loss': 0.0039, 'learning_rate': 9.535000000000001e-06, 'epoch': 10.6} 81%|████████ | 8102/10000 
[31:49:40<7:20:31, 13.93s/it] 81%|████████ | 8103/10000 [31:49:54<7:19:42, 13.91s/it] {'loss': 0.0027, 'learning_rate': 9.53e-06, 'epoch': 10.61} 81%|████████ | 8103/10000 [31:49:54<7:19:42, 13.91s/it] 81%|████████ | 8104/10000 [31:50:08<7:19:04, 13.89s/it] {'loss': 0.0038, 'learning_rate': 9.525000000000001e-06, 'epoch': 10.61} 81%|████████ | 8104/10000 [31:50:08<7:19:04, 13.89s/it] 81%|████████ | 8105/10000 [31:50:22<7:18:09, 13.87s/it] {'loss': 0.0039, 'learning_rate': 9.52e-06, 'epoch': 10.61} 81%|████████ | 8105/10000 [31:50:22<7:18:09, 13.87s/it] 81%|████████ | 8106/10000 [31:50:36<7:18:57, 13.91s/it] {'loss': 0.0032, 'learning_rate': 9.515e-06, 'epoch': 10.61} 81%|████████ | 8106/10000 [31:50:36<7:18:57, 13.91s/it] 81%|████████ | 8107/10000 [31:50:50<7:18:42, 13.91s/it] {'loss': 0.004, 'learning_rate': 9.51e-06, 'epoch': 10.61} 81%|████████ | 8107/10000 [31:50:50<7:18:42, 13.91s/it] 81%|████████ | 8108/10000 [31:51:03<7:18:36, 13.91s/it] {'loss': 0.0034, 'learning_rate': 9.505e-06, 'epoch': 10.61} 81%|████████ | 8108/10000 [31:51:04<7:18:36, 13.91s/it] 81%|████████ | 8109/10000 [31:51:17<7:18:17, 13.91s/it] {'loss': 0.0043, 'learning_rate': 9.5e-06, 'epoch': 10.61} 81%|████████ | 8109/10000 [31:51:17<7:18:17, 13.91s/it] 81%|████████ | 8110/10000 [31:51:31<7:17:25, 13.89s/it] {'loss': 0.0048, 'learning_rate': 9.495000000000001e-06, 'epoch': 10.62} 81%|████████ | 8110/10000 [31:51:31<7:17:25, 13.89s/it] 81%|████████ | 8111/10000 [31:51:45<7:18:27, 13.93s/it] {'loss': 0.005, 'learning_rate': 9.49e-06, 'epoch': 10.62} 81%|████████ | 8111/10000 [31:51:45<7:18:27, 13.93s/it] 81%|████████ | 8112/10000 [31:51:59<7:18:39, 13.94s/it] {'loss': 0.004, 'learning_rate': 9.485000000000002e-06, 'epoch': 10.62} 81%|████████ | 8112/10000 [31:51:59<7:18:39, 13.94s/it] 81%|████████ | 8113/10000 [31:52:13<7:16:36, 13.88s/it] {'loss': 0.0042, 'learning_rate': 9.48e-06, 'epoch': 10.62} 81%|████████ | 8113/10000 [31:52:13<7:16:36, 13.88s/it] 81%|████████ | 8114/10000 
[31:52:27<7:16:11, 13.88s/it] {'loss': 0.007, 'learning_rate': 9.475e-06, 'epoch': 10.62} 81%|████████ | 8114/10000 [31:52:27<7:16:11, 13.88s/it] 81%|████████ | 8115/10000 [31:52:41<7:15:04, 13.85s/it] {'loss': 0.0034, 'learning_rate': 9.47e-06, 'epoch': 10.62} 81%|████████ | 8115/10000 [31:52:41<7:15:04, 13.85s/it] 81%|████████ | 8116/10000 [31:52:55<7:15:59, 13.88s/it] {'loss': 0.0053, 'learning_rate': 9.465e-06, 'epoch': 10.62} 81%|████████ | 8116/10000 [31:52:55<7:15:59, 13.88s/it] 81%|████████ | 8117/10000 [31:53:09<7:16:41, 13.91s/it] {'loss': 0.0055, 'learning_rate': 9.460000000000001e-06, 'epoch': 10.62} 81%|████████ | 8117/10000 [31:53:09<7:16:41, 13.91s/it] 81%|████████ | 8118/10000 [31:53:23<7:16:55, 13.93s/it] {'loss': 0.0049, 'learning_rate': 9.455e-06, 'epoch': 10.63} 81%|████████ | 8118/10000 [31:53:23<7:16:55, 13.93s/it] 81%|████████ | 8119/10000 [31:53:37<7:17:41, 13.96s/it] {'loss': 0.005, 'learning_rate': 9.450000000000001e-06, 'epoch': 10.63} 81%|████████ | 8119/10000 [31:53:37<7:17:41, 13.96s/it] 81%|████████ | 8120/10000 [31:53:50<7:16:50, 13.94s/it] {'loss': 0.0049, 'learning_rate': 9.445000000000002e-06, 'epoch': 10.63} 81%|████████ | 8120/10000 [31:53:50<7:16:50, 13.94s/it] 81%|████████ | 8121/10000 [31:54:04<7:16:21, 13.93s/it] {'loss': 0.0035, 'learning_rate': 9.44e-06, 'epoch': 10.63} 81%|████████ | 8121/10000 [31:54:04<7:16:21, 13.93s/it] 81%|████████ | 8122/10000 [31:54:18<7:16:48, 13.96s/it] {'loss': 0.0041, 'learning_rate': 9.435e-06, 'epoch': 10.63} 81%|████████ | 8122/10000 [31:54:18<7:16:48, 13.96s/it] 81%|████████ | 8123/10000 [31:54:32<7:15:30, 13.92s/it] {'loss': 0.0025, 'learning_rate': 9.43e-06, 'epoch': 10.63} 81%|████████ | 8123/10000 [31:54:32<7:15:30, 13.92s/it] 81%|████████ | 8124/10000 [31:54:46<7:14:46, 13.91s/it] {'loss': 0.0045, 'learning_rate': 9.425e-06, 'epoch': 10.63} 81%|████████ | 8124/10000 [31:54:46<7:14:46, 13.91s/it] 81%|████████▏ | 8125/10000 [31:55:00<7:14:09, 13.89s/it] {'loss': 0.0047, 'learning_rate': 
9.420000000000001e-06, 'epoch': 10.63} 81%|████████▏ | 8125/10000 [31:55:00<7:14:09, 13.89s/it] 81%|████████▏ | 8126/10000 [31:55:14<7:12:58, 13.86s/it] {'loss': 0.0056, 'learning_rate': 9.415e-06, 'epoch': 10.64} 81%|████████▏ | 8126/10000 [31:55:14<7:12:58, 13.86s/it] 81%|████████▏ | 8127/10000 [31:55:28<7:14:12, 13.91s/it] {'loss': 0.0046, 'learning_rate': 9.410000000000001e-06, 'epoch': 10.64} 81%|████████▏ | 8127/10000 [31:55:28<7:14:12, 13.91s/it] 81%|████████▏ | 8128/10000 [31:55:42<7:15:57, 13.97s/it] {'loss': 0.0044, 'learning_rate': 9.405e-06, 'epoch': 10.64} 81%|████████▏ | 8128/10000 [31:55:42<7:15:57, 13.97s/it] 81%|████████▏ | 8129/10000 [31:55:56<7:15:47, 13.98s/it] {'loss': 0.0059, 'learning_rate': 9.4e-06, 'epoch': 10.64} 81%|████████▏ | 8129/10000 [31:55:56<7:15:47, 13.98s/it] 81%|████████▏ | 8130/10000 [31:56:10<7:13:50, 13.92s/it] {'loss': 0.0042, 'learning_rate': 9.395e-06, 'epoch': 10.64} 81%|████████▏ | 8130/10000 [31:56:10<7:13:50, 13.92s/it] 81%|████████▏ | 8131/10000 [31:56:23<7:11:46, 13.86s/it] {'loss': 0.0093, 'learning_rate': 9.39e-06, 'epoch': 10.64} 81%|████████▏ | 8131/10000 [31:56:23<7:11:46, 13.86s/it] 81%|████████▏ | 8132/10000 [31:56:37<7:12:21, 13.89s/it] {'loss': 0.004, 'learning_rate': 9.385e-06, 'epoch': 10.64} 81%|████████▏ | 8132/10000 [31:56:37<7:12:21, 13.89s/it] 81%|████████▏ | 8133/10000 [31:56:51<7:13:26, 13.93s/it] {'loss': 0.0047, 'learning_rate': 9.38e-06, 'epoch': 10.65} 81%|████████▏ | 8133/10000 [31:56:51<7:13:26, 13.93s/it] 81%|████████▏ | 8134/10000 [31:57:05<7:14:09, 13.96s/it] {'loss': 0.0022, 'learning_rate': 9.375000000000001e-06, 'epoch': 10.65} 81%|████████▏ | 8134/10000 [31:57:05<7:14:09, 13.96s/it] 81%|████████▏ | 8135/10000 [31:57:19<7:14:15, 13.97s/it] {'loss': 0.0035, 'learning_rate': 9.370000000000002e-06, 'epoch': 10.65} 81%|████████▏ | 8135/10000 [31:57:19<7:14:15, 13.97s/it] 81%|████████▏ | 8136/10000 [31:57:33<7:14:12, 13.98s/it] {'loss': 0.0024, 'learning_rate': 9.365000000000001e-06, 'epoch': 
10.65} 81%|████████▏ | 8136/10000 [31:57:33<7:14:12, 13.98s/it] 81%|████████▏ | 8137/10000 [31:57:47<7:13:43, 13.97s/it] {'loss': 0.0051, 'learning_rate': 9.36e-06, 'epoch': 10.65} 81%|████████▏ | 8137/10000 [31:57:47<7:13:43, 13.97s/it] 81%|████████▏ | 8138/10000 [31:58:01<7:13:20, 13.96s/it] {'loss': 0.0033, 'learning_rate': 9.355e-06, 'epoch': 10.65} 81%|████████▏ | 8138/10000 [31:58:01<7:13:20, 13.96s/it] 81%|████████▏ | 8139/10000 [31:58:15<7:12:46, 13.95s/it] {'loss': 0.004, 'learning_rate': 9.35e-06, 'epoch': 10.65} 81%|████████▏ | 8139/10000 [31:58:15<7:12:46, 13.95s/it] 81%|████████▏ | 8140/10000 [31:58:29<7:12:06, 13.94s/it] {'loss': 0.0027, 'learning_rate': 9.345000000000001e-06, 'epoch': 10.65} 81%|████████▏ | 8140/10000 [31:58:29<7:12:06, 13.94s/it] 81%|████████▏ | 8141/10000 [31:58:43<7:12:01, 13.94s/it] {'loss': 0.0032, 'learning_rate': 9.34e-06, 'epoch': 10.66} 81%|████████▏ | 8141/10000 [31:58:43<7:12:01, 13.94s/it] 81%|████████▏ | 8142/10000 [31:58:57<7:12:25, 13.96s/it] {'loss': 0.0034, 'learning_rate': 9.335000000000001e-06, 'epoch': 10.66} 81%|████████▏ | 8142/10000 [31:58:57<7:12:25, 13.96s/it] 81%|████████▏ | 8143/10000 [31:59:11<7:11:53, 13.95s/it] {'loss': 0.0035, 'learning_rate': 9.33e-06, 'epoch': 10.66} 81%|████████▏ | 8143/10000 [31:59:11<7:11:53, 13.95s/it] 81%|████████▏ | 8144/10000 [31:59:25<7:12:08, 13.97s/it] {'loss': 0.0048, 'learning_rate': 9.325e-06, 'epoch': 10.66} 81%|████████▏ | 8144/10000 [31:59:25<7:12:08, 13.97s/it] 81%|████████▏ | 8145/10000 [31:59:39<7:12:29, 13.99s/it] {'loss': 0.0044, 'learning_rate': 9.32e-06, 'epoch': 10.66} 81%|████████▏ | 8145/10000 [31:59:39<7:12:29, 13.99s/it] 81%|████████▏ | 8146/10000 [31:59:53<7:11:50, 13.98s/it] {'loss': 0.0048, 'learning_rate': 9.315e-06, 'epoch': 10.66} 81%|████████▏ | 8146/10000 [31:59:53<7:11:50, 13.98s/it] 81%|████████▏ | 8147/10000 [32:00:07<7:11:10, 13.96s/it] {'loss': 0.0048, 'learning_rate': 9.31e-06, 'epoch': 10.66} 81%|████████▏ | 8147/10000 [32:00:07<7:11:10, 
13.96s/it] 81%|████████▏ | 8148/10000 [32:00:21<7:09:21, 13.91s/it] {'loss': 0.004, 'learning_rate': 9.305e-06, 'epoch': 10.66} 81%|████████▏ | 8148/10000 [32:00:21<7:09:21, 13.91s/it] 81%|████████▏ | 8149/10000 [32:00:35<7:10:32, 13.96s/it] {'loss': 0.004, 'learning_rate': 9.3e-06, 'epoch': 10.67} 81%|████████▏ | 8149/10000 [32:00:35<7:10:32, 13.96s/it] 82%|████████▏ | 8150/10000 [32:00:49<7:11:19, 13.99s/it] {'loss': 0.0043, 'learning_rate': 9.295000000000002e-06, 'epoch': 10.67} 82%|████████▏ | 8150/10000 [32:00:49<7:11:19, 13.99s/it] 82%|████████▏ | 8151/10000 [32:01:03<7:10:13, 13.96s/it] {'loss': 0.0024, 'learning_rate': 9.29e-06, 'epoch': 10.67} 82%|████████▏ | 8151/10000 [32:01:03<7:10:13, 13.96s/it] 82%|████████▏ | 8152/10000 [32:01:17<7:09:28, 13.94s/it] {'loss': 0.0038, 'learning_rate': 9.285e-06, 'epoch': 10.67} 82%|████████▏ | 8152/10000 [32:01:17<7:09:28, 13.94s/it] 82%|████████▏ | 8153/10000 [32:01:31<7:10:38, 13.99s/it] {'loss': 0.0052, 'learning_rate': 9.28e-06, 'epoch': 10.67} 82%|████████▏ | 8153/10000 [32:01:31<7:10:38, 13.99s/it] 82%|████████▏ | 8154/10000 [32:01:45<7:10:19, 13.99s/it] {'loss': 0.0049, 'learning_rate': 9.275e-06, 'epoch': 10.67} 82%|████████▏ | 8154/10000 [32:01:45<7:10:19, 13.99s/it] 82%|████████▏ | 8155/10000 [32:01:59<7:09:07, 13.96s/it] {'loss': 0.0058, 'learning_rate': 9.270000000000001e-06, 'epoch': 10.67} 82%|████████▏ | 8155/10000 [32:01:59<7:09:07, 13.96s/it] 82%|████████▏ | 8156/10000 [32:02:13<7:09:19, 13.97s/it] {'loss': 0.003, 'learning_rate': 9.265e-06, 'epoch': 10.68} 82%|████████▏ | 8156/10000 [32:02:13<7:09:19, 13.97s/it] 82%|████████▏ | 8157/10000 [32:02:27<7:09:09, 13.97s/it] {'loss': 0.0037, 'learning_rate': 9.260000000000001e-06, 'epoch': 10.68} 82%|████████▏ | 8157/10000 [32:02:27<7:09:09, 13.97s/it] 82%|████████▏ | 8158/10000 [32:02:40<7:07:22, 13.92s/it] {'loss': 0.0048, 'learning_rate': 9.255e-06, 'epoch': 10.68} 82%|████████▏ | 8158/10000 [32:02:40<7:07:22, 13.92s/it] 82%|████████▏ | 8159/10000 
[32:02:54<7:06:56, 13.91s/it] {'loss': 0.0047, 'learning_rate': 9.25e-06, 'epoch': 10.68} 82%|████████▏ | 8159/10000 [32:02:54<7:06:56, 13.91s/it] 82%|████████▏ | 8160/10000 [32:03:08<7:07:21, 13.94s/it] {'loss': 0.0028, 'learning_rate': 9.245e-06, 'epoch': 10.68} 82%|████████▏ | 8160/10000 [32:03:08<7:07:21, 13.94s/it] 82%|████████▏ | 8161/10000 [32:03:22<7:07:56, 13.96s/it] {'loss': 0.0034, 'learning_rate': 9.24e-06, 'epoch': 10.68} 82%|████████▏ | 8161/10000 [32:03:22<7:07:56, 13.96s/it] 82%|████████▏ | 8162/10000 [32:03:36<7:07:12, 13.95s/it] {'loss': 0.0055, 'learning_rate': 9.235e-06, 'epoch': 10.68} 82%|████████▏ | 8162/10000 [32:03:36<7:07:12, 13.95s/it] 82%|████████▏ | 8163/10000 [32:03:50<7:06:57, 13.95s/it] {'loss': 0.0043, 'learning_rate': 9.23e-06, 'epoch': 10.68} 82%|████████▏ | 8163/10000 [32:03:50<7:06:57, 13.95s/it] 82%|████████▏ | 8164/10000 [32:04:04<7:05:37, 13.91s/it] {'loss': 0.0052, 'learning_rate': 9.225e-06, 'epoch': 10.69} 82%|████████▏ | 8164/10000 [32:04:04<7:05:37, 13.91s/it] 82%|████████▏ | 8165/10000 [32:04:18<7:05:45, 13.92s/it] {'loss': 0.003, 'learning_rate': 9.220000000000002e-06, 'epoch': 10.69} 82%|████████▏ | 8165/10000 [32:04:18<7:05:45, 13.92s/it] 82%|████████▏ | 8166/10000 [32:04:32<7:04:38, 13.89s/it] {'loss': 0.0038, 'learning_rate': 9.215e-06, 'epoch': 10.69} 82%|████████▏ | 8166/10000 [32:04:32<7:04:38, 13.89s/it] 82%|████████▏ | 8167/10000 [32:04:46<7:04:20, 13.89s/it] {'loss': 0.0047, 'learning_rate': 9.21e-06, 'epoch': 10.69} 82%|████████▏ | 8167/10000 [32:04:46<7:04:20, 13.89s/it] 82%|████████▏ | 8168/10000 [32:05:00<7:04:23, 13.90s/it] {'loss': 0.0036, 'learning_rate': 9.205e-06, 'epoch': 10.69} 82%|████████▏ | 8168/10000 [32:05:00<7:04:23, 13.90s/it] 82%|████████▏ | 8169/10000 [32:05:13<7:03:41, 13.88s/it] {'loss': 0.006, 'learning_rate': 9.2e-06, 'epoch': 10.69} 82%|████████▏ | 8169/10000 [32:05:13<7:03:41, 13.88s/it] 82%|████████▏ | 8170/10000 [32:05:27<7:02:59, 13.87s/it] {'loss': 0.0032, 'learning_rate': 
9.195000000000001e-06, 'epoch': 10.69} 82%|████████▏ | 8170/10000 [32:05:27<7:02:59, 13.87s/it] 82%|████████▏ | 8171/10000 [32:05:41<7:03:46, 13.90s/it] {'loss': 0.0054, 'learning_rate': 9.19e-06, 'epoch': 10.7} 82%|████████▏ | 8171/10000 [32:05:41<7:03:46, 13.90s/it] 82%|████████▏ | 8172/10000 [32:05:55<7:04:37, 13.94s/it] {'loss': 0.0034, 'learning_rate': 9.185000000000001e-06, 'epoch': 10.7} 82%|████████▏ | 8172/10000 [32:05:55<7:04:37, 13.94s/it] 82%|████████▏ | 8173/10000 [32:06:09<7:04:40, 13.95s/it] {'loss': 0.0046, 'learning_rate': 9.180000000000002e-06, 'epoch': 10.7} 82%|████████▏ | 8173/10000 [32:06:09<7:04:40, 13.95s/it] 82%|████████▏ | 8174/10000 [32:06:23<7:04:20, 13.94s/it] {'loss': 0.0036, 'learning_rate': 9.175000000000001e-06, 'epoch': 10.7} 82%|████████▏ | 8174/10000 [32:06:23<7:04:20, 13.94s/it] 82%|████████▏ | 8175/10000 [32:06:37<7:05:40, 14.00s/it] {'loss': 0.0056, 'learning_rate': 9.17e-06, 'epoch': 10.7} 82%|████████▏ | 8175/10000 [32:06:37<7:05:40, 14.00s/it] 82%|████████▏ | 8176/10000 [32:06:51<7:02:52, 13.91s/it] {'loss': 0.0038, 'learning_rate': 9.165e-06, 'epoch': 10.7} 82%|████████▏ | 8176/10000 [32:06:51<7:02:52, 13.91s/it] 82%|████████▏ | 8177/10000 [32:07:05<7:02:47, 13.92s/it] {'loss': 0.0052, 'learning_rate': 9.16e-06, 'epoch': 10.7} 82%|████████▏ | 8177/10000 [32:07:05<7:02:47, 13.92s/it] 82%|████████▏ | 8178/10000 [32:07:19<7:02:10, 13.90s/it] {'loss': 0.0028, 'learning_rate': 9.155000000000001e-06, 'epoch': 10.7} 82%|████████▏ | 8178/10000 [32:07:19<7:02:10, 13.90s/it] 82%|████████▏ | 8179/10000 [32:07:33<7:02:34, 13.92s/it] {'loss': 0.0042, 'learning_rate': 9.15e-06, 'epoch': 10.71} 82%|████████▏ | 8179/10000 [32:07:33<7:02:34, 13.92s/it] 82%|████████▏ | 8180/10000 [32:07:47<7:02:16, 13.92s/it] {'loss': 0.004, 'learning_rate': 9.145000000000001e-06, 'epoch': 10.71} 82%|████████▏ | 8180/10000 [32:07:47<7:02:16, 13.92s/it] 82%|████████▏ | 8181/10000 [32:08:01<7:01:39, 13.91s/it] {'loss': 0.0051, 'learning_rate': 9.14e-06, 
'epoch': 10.71} 82%|████████▏ | 8181/10000 [32:08:01<7:01:39, 13.91s/it] 82%|████████▏ | 8182/10000 [32:08:14<7:01:10, 13.90s/it] {'loss': 0.0033, 'learning_rate': 9.135e-06, 'epoch': 10.71} 82%|████████▏ | 8182/10000 [32:08:14<7:01:10, 13.90s/it] 82%|████████▏ | 8183/10000 [32:08:28<7:00:28, 13.88s/it] {'loss': 0.0033, 'learning_rate': 9.13e-06, 'epoch': 10.71} 82%|████████▏ | 8183/10000 [32:08:28<7:00:28, 13.88s/it] 82%|████████▏ | 8184/10000 [32:08:42<7:00:55, 13.91s/it] {'loss': 0.0032, 'learning_rate': 9.125e-06, 'epoch': 10.71} 82%|████████▏ | 8184/10000 [32:08:42<7:00:55, 13.91s/it] 82%|████████▏ | 8185/10000 [32:08:56<7:00:59, 13.92s/it] {'loss': 0.0032, 'learning_rate': 9.12e-06, 'epoch': 10.71} 82%|████████▏ | 8185/10000 [32:08:56<7:00:59, 13.92s/it] 82%|████████▏ | 8186/10000 [32:09:10<7:01:21, 13.94s/it] {'loss': 0.0032, 'learning_rate': 9.115e-06, 'epoch': 10.71} 82%|████████▏ | 8186/10000 [32:09:10<7:01:21, 13.94s/it] 82%|████████▏ | 8187/10000 [32:09:24<7:01:09, 13.94s/it] {'loss': 0.0045, 'learning_rate': 9.110000000000001e-06, 'epoch': 10.72} 82%|████████▏ | 8187/10000 [32:09:24<7:01:09, 13.94s/it] 82%|████████▏ | 8188/10000 [32:09:38<7:00:29, 13.92s/it] {'loss': 0.0028, 'learning_rate': 9.105000000000002e-06, 'epoch': 10.72} 82%|████████▏ | 8188/10000 [32:09:38<7:00:29, 13.92s/it] 82%|████████▏ | 8189/10000 [32:09:52<6:59:38, 13.90s/it] {'loss': 0.0039, 'learning_rate': 9.100000000000001e-06, 'epoch': 10.72} 82%|████████▏ | 8189/10000 [32:09:52<6:59:38, 13.90s/it] 82%|████████▏ | 8190/10000 [32:10:06<6:59:20, 13.90s/it] {'loss': 0.0029, 'learning_rate': 9.095e-06, 'epoch': 10.72} 82%|████████▏ | 8190/10000 [32:10:06<6:59:20, 13.90s/it] 82%|████████▏ | 8191/10000 [32:10:20<6:59:45, 13.92s/it] {'loss': 0.004, 'learning_rate': 9.09e-06, 'epoch': 10.72} 82%|████████▏ | 8191/10000 [32:10:20<6:59:45, 13.92s/it] 82%|████████▏ | 8192/10000 [32:10:34<7:00:00, 13.94s/it] {'loss': 0.0032, 'learning_rate': 9.085e-06, 'epoch': 10.72} 82%|████████▏ | 8192/10000 
[32:10:34<7:00:00, 13.94s/it] 82%|████████▏ | 8193/10000 [32:10:48<6:59:45, 13.94s/it] {'loss': 0.0032, 'learning_rate': 9.080000000000001e-06, 'epoch': 10.72} 82%|████████▏ | 8193/10000 [32:10:48<6:59:45, 13.94s/it] 82%|████████▏ | 8194/10000 [32:11:01<6:58:46, 13.91s/it] {'loss': 0.0034, 'learning_rate': 9.075e-06, 'epoch': 10.73} 82%|████████▏ | 8194/10000 [32:11:02<6:58:46, 13.91s/it] 82%|████████▏ | 8195/10000 [32:11:15<6:58:37, 13.92s/it] {'loss': 0.004, 'learning_rate': 9.070000000000001e-06, 'epoch': 10.73} 82%|████████▏ | 8195/10000 [32:11:15<6:58:37, 13.92s/it] 82%|████████▏ | 8196/10000 [32:11:29<6:58:12, 13.91s/it] {'loss': 0.0034, 'learning_rate': 9.065e-06, 'epoch': 10.73} 82%|████████▏ | 8196/10000 [32:11:29<6:58:12, 13.91s/it] 82%|████████▏ | 8197/10000 [32:11:43<6:57:14, 13.89s/it] {'loss': 0.0039, 'learning_rate': 9.06e-06, 'epoch': 10.73} 82%|████████▏ | 8197/10000 [32:11:43<6:57:14, 13.89s/it] 82%|████████▏ | 8198/10000 [32:11:57<6:57:08, 13.89s/it] {'loss': 0.0466, 'learning_rate': 9.055e-06, 'epoch': 10.73} 82%|████████▏ | 8198/10000 [32:11:57<6:57:08, 13.89s/it] 82%|████████▏ | 8199/10000 [32:12:11<6:56:09, 13.86s/it] {'loss': 0.003, 'learning_rate': 9.05e-06, 'epoch': 10.73} 82%|████████▏ | 8199/10000 [32:12:11<6:56:09, 13.86s/it] 82%|████████▏ | 8200/10000 [32:12:25<6:56:46, 13.89s/it] {'loss': 0.0037, 'learning_rate': 9.045e-06, 'epoch': 10.73} 82%|████████▏ | 8200/10000 [32:12:25<6:56:46, 13.89s/it] 82%|████████▏ | 8201/10000 [32:12:39<6:56:15, 13.88s/it] {'loss': 0.0034, 'learning_rate': 9.04e-06, 'epoch': 10.73} 82%|████████▏ | 8201/10000 [32:12:39<6:56:15, 13.88s/it] 82%|████████▏ | 8202/10000 [32:12:53<6:57:04, 13.92s/it] {'loss': 0.0031, 'learning_rate': 9.035e-06, 'epoch': 10.74} 82%|████████▏ | 8202/10000 [32:12:53<6:57:04, 13.92s/it] 82%|████████▏ | 8203/10000 [32:13:07<6:57:11, 13.93s/it] {'loss': 0.0046, 'learning_rate': 9.030000000000002e-06, 'epoch': 10.74} 82%|████████▏ | 8203/10000 [32:13:07<6:57:11, 13.93s/it] 82%|████████▏ 
| 8204/10000 [32:13:21<6:57:21, 13.94s/it] {'loss': 0.0057, 'learning_rate': 9.025e-06, 'epoch': 10.74} 82%|████████▏ | 8204/10000 [32:13:21<6:57:21, 13.94s/it] 82%|████████▏ | 8205/10000 [32:13:34<6:55:32, 13.89s/it] {'loss': 0.0039, 'learning_rate': 9.02e-06, 'epoch': 10.74} 82%|████████▏ | 8205/10000 [32:13:34<6:55:32, 13.89s/it] 82%|████████▏ | 8206/10000 [32:13:48<6:56:20, 13.92s/it] {'loss': 0.0037, 'learning_rate': 9.015e-06, 'epoch': 10.74} 82%|████████▏ | 8206/10000 [32:13:48<6:56:20, 13.92s/it] 82%|████████▏ | 8207/10000 [32:14:02<6:57:10, 13.96s/it] {'loss': 0.0045, 'learning_rate': 9.01e-06, 'epoch': 10.74} 82%|████████▏ | 8207/10000 [32:14:02<6:57:10, 13.96s/it] 82%|████████▏ | 8208/10000 [32:14:17<6:58:33, 14.01s/it] {'loss': 0.0049, 'learning_rate': 9.005000000000001e-06, 'epoch': 10.74} 82%|████████▏ | 8208/10000 [32:14:17<6:58:33, 14.01s/it] 82%|████████▏ | 8209/10000 [32:14:30<6:56:09, 13.94s/it] {'loss': 0.0046, 'learning_rate': 9e-06, 'epoch': 10.74} 82%|████████▏ | 8209/10000 [32:14:30<6:56:09, 13.94s/it] 82%|████████▏ | 8210/10000 [32:14:44<6:54:21, 13.89s/it] {'loss': 0.004, 'learning_rate': 8.995000000000001e-06, 'epoch': 10.75} 82%|████████▏ | 8210/10000 [32:14:44<6:54:21, 13.89s/it] 82%|████████▏ | 8211/10000 [32:14:58<6:55:43, 13.94s/it] {'loss': 0.0038, 'learning_rate': 8.99e-06, 'epoch': 10.75} 82%|████████▏ | 8211/10000 [32:14:58<6:55:43, 13.94s/it] 82%|████████▏ | 8212/10000 [32:15:12<6:56:11, 13.97s/it] {'loss': 0.004, 'learning_rate': 8.985e-06, 'epoch': 10.75} 82%|████████▏ | 8212/10000 [32:15:12<6:56:11, 13.97s/it] 82%|████████▏ | 8213/10000 [32:15:26<6:54:53, 13.93s/it] {'loss': 0.0036, 'learning_rate': 8.98e-06, 'epoch': 10.75} 82%|████████▏ | 8213/10000 [32:15:26<6:54:53, 13.93s/it] 82%|████████▏ | 8214/10000 [32:15:40<6:56:25, 13.99s/it] {'loss': 0.0061, 'learning_rate': 8.975e-06, 'epoch': 10.75} 82%|████████▏ | 8214/10000 [32:15:40<6:56:25, 13.99s/it] 82%|████████▏ | 8215/10000 [32:15:54<6:53:50, 13.91s/it] {'loss': 0.0074, 
'learning_rate': 8.97e-06, 'epoch': 10.75} 82%|████████▏ | 8215/10000 [32:15:54<6:53:50, 13.91s/it] 82%|████████▏ | 8216/10000 [32:16:08<6:54:32, 13.94s/it] {'loss': 0.0034, 'learning_rate': 8.965e-06, 'epoch': 10.75} 82%|████████▏ | 8216/10000 [32:16:08<6:54:32, 13.94s/it] 82%|████████▏ | 8217/10000 [32:16:22<6:53:55, 13.93s/it] {'loss': 0.0025, 'learning_rate': 8.96e-06, 'epoch': 10.76} 82%|████████▏ | 8217/10000 [32:16:22<6:53:55, 13.93s/it] 82%|████████▏ | 8218/10000 [32:16:36<6:53:48, 13.93s/it] {'loss': 0.0046, 'learning_rate': 8.955000000000002e-06, 'epoch': 10.76} 82%|████████▏ | 8218/10000 [32:16:36<6:53:48, 13.93s/it] 82%|████████▏ | 8219/10000 [32:16:50<6:53:46, 13.94s/it] {'loss': 0.0027, 'learning_rate': 8.95e-06, 'epoch': 10.76} 82%|████████▏ | 8219/10000 [32:16:50<6:53:46, 13.94s/it] 82%|████████▏ | 8220/10000 [32:17:04<6:54:32, 13.97s/it] {'loss': 0.0036, 'learning_rate': 8.945e-06, 'epoch': 10.76} 82%|████████▏ | 8220/10000 [32:17:04<6:54:32, 13.97s/it] 82%|████████▏ | 8221/10000 [32:17:18<6:53:26, 13.94s/it] {'loss': 0.0059, 'learning_rate': 8.939999999999999e-06, 'epoch': 10.76} 82%|████████▏ | 8221/10000 [32:17:18<6:53:26, 13.94s/it] 82%|████████▏ | 8222/10000 [32:17:32<6:53:14, 13.95s/it] {'loss': 0.0051, 'learning_rate': 8.935e-06, 'epoch': 10.76} 82%|████████▏ | 8222/10000 [32:17:32<6:53:14, 13.95s/it] 82%|████████▏ | 8223/10000 [32:17:46<6:54:46, 14.00s/it] {'loss': 0.0032, 'learning_rate': 8.930000000000001e-06, 'epoch': 10.76} 82%|████████▏ | 8223/10000 [32:17:46<6:54:46, 14.00s/it] 82%|████████▏ | 8224/10000 [32:18:00<6:53:32, 13.97s/it] {'loss': 0.0027, 'learning_rate': 8.925e-06, 'epoch': 10.76} 82%|████████▏ | 8224/10000 [32:18:00<6:53:32, 13.97s/it] 82%|████████▏ | 8225/10000 [32:18:14<6:53:16, 13.97s/it] {'loss': 0.004, 'learning_rate': 8.920000000000001e-06, 'epoch': 10.77} 82%|████████▏ | 8225/10000 [32:18:14<6:53:16, 13.97s/it] 82%|████████▏ | 8226/10000 [32:18:27<6:52:02, 13.94s/it] {'loss': 0.0065, 'learning_rate': 8.915e-06, 
'epoch': 10.77} 82%|████████▏ | 8226/10000 [32:18:27<6:52:02, 13.94s/it] 82%|████████▏ | 8227/10000 [32:18:41<6:51:20, 13.92s/it] {'loss': 0.0051, 'learning_rate': 8.910000000000001e-06, 'epoch': 10.77} 82%|████████▏ | 8227/10000 [32:18:41<6:51:20, 13.92s/it] 82%|████████▏ | 8228/10000 [32:18:55<6:50:19, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.905e-06, 'epoch': 10.77} 82%|████████▏ | 8228/10000 [32:18:55<6:50:19, 13.89s/it] 82%|████████▏ | 8229/10000 [32:19:09<6:49:58, 13.89s/it] {'loss': 0.0044, 'learning_rate': 8.9e-06, 'epoch': 10.77} 82%|████████▏ | 8229/10000 [32:19:09<6:49:58, 13.89s/it] 82%|████████▏ | 8230/10000 [32:19:23<6:50:58, 13.93s/it] {'loss': 0.0045, 'learning_rate': 8.895e-06, 'epoch': 10.77} 82%|████████▏ | 8230/10000 [32:19:23<6:50:58, 13.93s/it] 82%|████████▏ | 8231/10000 [32:19:37<6:50:24, 13.92s/it] {'loss': 0.0029, 'learning_rate': 8.890000000000001e-06, 'epoch': 10.77} 82%|████████▏ | 8231/10000 [32:19:37<6:50:24, 13.92s/it] 82%|████████▏ | 8232/10000 [32:19:51<6:49:45, 13.91s/it] {'loss': 0.0037, 'learning_rate': 8.885e-06, 'epoch': 10.77} 82%|████████▏ | 8232/10000 [32:19:51<6:49:45, 13.91s/it] 82%|████████▏ | 8233/10000 [32:20:05<6:49:50, 13.92s/it] {'loss': 0.0046, 'learning_rate': 8.880000000000001e-06, 'epoch': 10.78} 82%|████████▏ | 8233/10000 [32:20:05<6:49:50, 13.92s/it] 82%|████████▏ | 8234/10000 [32:20:19<6:49:04, 13.90s/it] {'loss': 0.0034, 'learning_rate': 8.875e-06, 'epoch': 10.78} 82%|████████▏ | 8234/10000 [32:20:19<6:49:04, 13.90s/it] 82%|████████▏ | 8235/10000 [32:20:32<6:48:41, 13.89s/it] {'loss': 0.0059, 'learning_rate': 8.87e-06, 'epoch': 10.78} 82%|████████▏ | 8235/10000 [32:20:33<6:48:41, 13.89s/it] 82%|████████▏ | 8236/10000 [32:20:46<6:48:26, 13.89s/it] {'loss': 0.0037, 'learning_rate': 8.865e-06, 'epoch': 10.78} 82%|████████▏ | 8236/10000 [32:20:46<6:48:26, 13.89s/it] 82%|████████▏ | 8237/10000 [32:21:00<6:49:24, 13.93s/it] {'loss': 0.0026, 'learning_rate': 8.86e-06, 'epoch': 10.78} 82%|████████▏ | 8237/10000 
[32:21:00<6:49:24, 13.93s/it] 82%|████████▏ | 8238/10000 [32:21:14<6:48:48, 13.92s/it] {'loss': 0.0044, 'learning_rate': 8.855e-06, 'epoch': 10.78} 82%|████████▏ | 8238/10000 [32:21:14<6:48:48, 13.92s/it] 82%|████████▏ | 8239/10000 [32:21:28<6:49:17, 13.95s/it] {'loss': 0.0025, 'learning_rate': 8.85e-06, 'epoch': 10.78} 82%|████████▏ | 8239/10000 [32:21:28<6:49:17, 13.95s/it] 82%|████████▏ | 8240/10000 [32:21:42<6:47:43, 13.90s/it] {'loss': 0.0031, 'learning_rate': 8.845000000000001e-06, 'epoch': 10.79} 82%|████████▏ | 8240/10000 [32:21:42<6:47:43, 13.90s/it] 82%|████████▏ | 8241/10000 [32:21:56<6:48:14, 13.93s/it] {'loss': 0.0051, 'learning_rate': 8.840000000000002e-06, 'epoch': 10.79} 82%|████████▏ | 8241/10000 [32:21:56<6:48:14, 13.93s/it] 82%|████████▏ | 8242/10000 [32:22:10<6:48:23, 13.94s/it] {'loss': 0.0063, 'learning_rate': 8.835000000000001e-06, 'epoch': 10.79} 82%|████████▏ | 8242/10000 [32:22:10<6:48:23, 13.94s/it] 82%|████████▏ | 8243/10000 [32:22:24<6:48:09, 13.94s/it] {'loss': 0.0034, 'learning_rate': 8.83e-06, 'epoch': 10.79} 82%|████████▏ | 8243/10000 [32:22:24<6:48:09, 13.94s/it] 82%|████████▏ | 8244/10000 [32:22:38<6:47:21, 13.92s/it] {'loss': 0.0036, 'learning_rate': 8.825e-06, 'epoch': 10.79} 82%|████████▏ | 8244/10000 [32:22:38<6:47:21, 13.92s/it] 82%|████████▏ | 8245/10000 [32:22:52<6:48:21, 13.96s/it] {'loss': 0.0039, 'learning_rate': 8.82e-06, 'epoch': 10.79} 82%|████████▏ | 8245/10000 [32:22:52<6:48:21, 13.96s/it] 82%|████████▏ | 8246/10000 [32:23:06<6:48:47, 13.98s/it] {'loss': 0.0035, 'learning_rate': 8.815000000000001e-06, 'epoch': 10.79} 82%|████████▏ | 8246/10000 [32:23:06<6:48:47, 13.98s/it] 82%|████████▏ | 8247/10000 [32:23:20<6:48:31, 13.98s/it] {'loss': 0.0025, 'learning_rate': 8.81e-06, 'epoch': 10.79} 82%|████████▏ | 8247/10000 [32:23:20<6:48:31, 13.98s/it] 82%|████████▏ | 8248/10000 [32:23:34<6:47:29, 13.96s/it] {'loss': 0.0044, 'learning_rate': 8.805000000000001e-06, 'epoch': 10.8} 82%|████████▏ | 8248/10000 [32:23:34<6:47:29, 
13.96s/it] 82%|████████▏ | 8249/10000 [32:23:48<6:46:02, 13.91s/it] {'loss': 0.0042, 'learning_rate': 8.8e-06, 'epoch': 10.8} 82%|████████▏ | 8249/10000 [32:23:48<6:46:02, 13.91s/it] 82%|████████▎ | 8250/10000 [32:24:01<6:45:03, 13.89s/it] {'loss': 0.0043, 'learning_rate': 8.795e-06, 'epoch': 10.8} 82%|████████▎ | 8250/10000 [32:24:01<6:45:03, 13.89s/it] 83%|████████▎ | 8251/10000 [32:24:15<6:44:56, 13.89s/it] {'loss': 0.0033, 'learning_rate': 8.79e-06, 'epoch': 10.8} 83%|████████▎ | 8251/10000 [32:24:15<6:44:56, 13.89s/it] 83%|████████▎ | 8252/10000 [32:24:29<6:43:52, 13.86s/it] {'loss': 0.008, 'learning_rate': 8.785e-06, 'epoch': 10.8} 83%|████████▎ | 8252/10000 [32:24:29<6:43:52, 13.86s/it] 83%|████████▎ | 8253/10000 [32:24:43<6:43:59, 13.88s/it] {'loss': 0.003, 'learning_rate': 8.78e-06, 'epoch': 10.8} 83%|████████▎ | 8253/10000 [32:24:43<6:43:59, 13.88s/it] 83%|████████▎ | 8254/10000 [32:24:57<6:43:14, 13.86s/it] {'loss': 0.0034, 'learning_rate': 8.775e-06, 'epoch': 10.8} 83%|████████▎ | 8254/10000 [32:24:57<6:43:14, 13.86s/it] 83%|████████▎ | 8255/10000 [32:25:11<6:43:00, 13.86s/it] {'loss': 0.0045, 'learning_rate': 8.77e-06, 'epoch': 10.8} 83%|████████▎ | 8255/10000 [32:25:11<6:43:00, 13.86s/it] 83%|████████▎ | 8256/10000 [32:25:24<6:41:58, 13.83s/it] {'loss': 0.0031, 'learning_rate': 8.765000000000002e-06, 'epoch': 10.81} 83%|████████▎ | 8256/10000 [32:25:25<6:41:58, 13.83s/it] 83%|████████▎ | 8257/10000 [32:25:38<6:41:52, 13.83s/it] {'loss': 0.0041, 'learning_rate': 8.76e-06, 'epoch': 10.81} 83%|████████▎ | 8257/10000 [32:25:38<6:41:52, 13.83s/it] 83%|████████▎ | 8258/10000 [32:25:52<6:42:55, 13.88s/it] {'loss': 0.0031, 'learning_rate': 8.755e-06, 'epoch': 10.81} 83%|████████▎ | 8258/10000 [32:25:52<6:42:55, 13.88s/it] 83%|████████▎ | 8259/10000 [32:26:06<6:43:21, 13.90s/it] {'loss': 0.0043, 'learning_rate': 8.75e-06, 'epoch': 10.81} 83%|████████▎ | 8259/10000 [32:26:06<6:43:21, 13.90s/it] 83%|████████▎ | 8260/10000 [32:26:20<6:44:51, 13.96s/it] {'loss': 
0.0044, 'learning_rate': 8.745e-06, 'epoch': 10.81} 83%|████████▎ | 8260/10000 [32:26:20<6:44:51, 13.96s/it] 83%|████████▎ | 8261/10000 [32:26:34<6:43:33, 13.92s/it] {'loss': 0.0033, 'learning_rate': 8.740000000000001e-06, 'epoch': 10.81} 83%|████████▎ | 8261/10000 [32:26:34<6:43:33, 13.92s/it] 83%|████████▎ | 8262/10000 [32:26:48<6:42:51, 13.91s/it] {'loss': 0.004, 'learning_rate': 8.735e-06, 'epoch': 10.81} 83%|████████▎ | 8262/10000 [32:26:48<6:42:51, 13.91s/it] 83%|████████▎ | 8263/10000 [32:27:02<6:42:44, 13.91s/it] {'loss': 0.0034, 'learning_rate': 8.730000000000001e-06, 'epoch': 10.82} 83%|████████▎ | 8263/10000 [32:27:02<6:42:44, 13.91s/it] 83%|████████▎ | 8264/10000 [32:27:16<6:42:40, 13.92s/it] {'loss': 0.0046, 'learning_rate': 8.725e-06, 'epoch': 10.82} 83%|████████▎ | 8264/10000 [32:27:16<6:42:40, 13.92s/it] 83%|████████▎ | 8265/10000 [32:27:30<6:42:16, 13.91s/it] {'loss': 0.0038, 'learning_rate': 8.720000000000001e-06, 'epoch': 10.82} 83%|████████▎ | 8265/10000 [32:27:30<6:42:16, 13.91s/it] 83%|████████▎ | 8266/10000 [32:27:44<6:41:44, 13.90s/it] {'loss': 0.0037, 'learning_rate': 8.715e-06, 'epoch': 10.82} 83%|████████▎ | 8266/10000 [32:27:44<6:41:44, 13.90s/it] 83%|████████▎ | 8267/10000 [32:27:58<6:42:35, 13.94s/it] {'loss': 0.0028, 'learning_rate': 8.71e-06, 'epoch': 10.82} 83%|████████▎ | 8267/10000 [32:27:58<6:42:35, 13.94s/it] 83%|████████▎ | 8268/10000 [32:28:12<6:41:24, 13.91s/it] {'loss': 0.0037, 'learning_rate': 8.705e-06, 'epoch': 10.82} 83%|████████▎ | 8268/10000 [32:28:12<6:41:24, 13.91s/it] 83%|████████▎ | 8269/10000 [32:28:26<6:41:36, 13.92s/it] {'loss': 0.0044, 'learning_rate': 8.7e-06, 'epoch': 10.82} 83%|████████▎ | 8269/10000 [32:28:26<6:41:36, 13.92s/it] 83%|████████▎ | 8270/10000 [32:28:39<6:41:10, 13.91s/it] {'loss': 0.0056, 'learning_rate': 8.695e-06, 'epoch': 10.82} 83%|████████▎ | 8270/10000 [32:28:39<6:41:10, 13.91s/it] 83%|████████▎ | 8271/10000 [32:28:53<6:40:42, 13.91s/it] {'loss': 0.0039, 'learning_rate': 
8.690000000000002e-06, 'epoch': 10.83} 83%|████████▎ | 8271/10000 [32:28:53<6:40:42, 13.91s/it] 83%|████████▎ | 8272/10000 [32:29:07<6:40:37, 13.91s/it] {'loss': 0.0029, 'learning_rate': 8.685e-06, 'epoch': 10.83} 83%|████████▎ | 8272/10000 [32:29:07<6:40:37, 13.91s/it] 83%|████████▎ | 8273/10000 [32:29:21<6:39:41, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.68e-06, 'epoch': 10.83} 83%|████████▎ | 8273/10000 [32:29:21<6:39:41, 13.89s/it] 83%|████████▎ | 8274/10000 [32:29:35<6:39:36, 13.89s/it] {'loss': 0.0026, 'learning_rate': 8.674999999999999e-06, 'epoch': 10.83} 83%|████████▎ | 8274/10000 [32:29:35<6:39:36, 13.89s/it] 83%|████████▎ | 8275/10000 [32:29:49<6:39:26, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.67e-06, 'epoch': 10.83} 83%|████████▎ | 8275/10000 [32:29:49<6:39:26, 13.89s/it] 83%|████████▎ | 8276/10000 [32:30:03<6:38:58, 13.89s/it] {'loss': 0.0046, 'learning_rate': 8.665000000000001e-06, 'epoch': 10.83} 83%|████████▎ | 8276/10000 [32:30:03<6:38:58, 13.89s/it] 83%|████████▎ | 8277/10000 [32:30:17<6:38:40, 13.88s/it] {'loss': 0.0036, 'learning_rate': 8.66e-06, 'epoch': 10.83} 83%|████████▎ | 8277/10000 [32:30:17<6:38:40, 13.88s/it] 83%|████████▎ | 8278/10000 [32:30:31<6:38:45, 13.89s/it] {'loss': 0.0041, 'learning_rate': 8.655000000000001e-06, 'epoch': 10.84} 83%|████████▎ | 8278/10000 [32:30:31<6:38:45, 13.89s/it] 83%|████████▎ | 8279/10000 [32:30:44<6:38:11, 13.88s/it] {'loss': 0.0044, 'learning_rate': 8.65e-06, 'epoch': 10.84} 83%|████████▎ | 8279/10000 [32:30:44<6:38:11, 13.88s/it] 83%|████████▎ | 8280/10000 [32:30:58<6:37:09, 13.85s/it] {'loss': 0.0048, 'learning_rate': 8.645000000000001e-06, 'epoch': 10.84} 83%|████████▎ | 8280/10000 [32:30:58<6:37:09, 13.85s/it] 83%|████████▎ | 8281/10000 [32:31:12<6:37:12, 13.86s/it] {'loss': 0.0057, 'learning_rate': 8.64e-06, 'epoch': 10.84} 83%|████████▎ | 8281/10000 [32:31:12<6:37:12, 13.86s/it] 83%|████████▎ | 8282/10000 [32:31:26<6:38:39, 13.92s/it] {'loss': 0.0043, 'learning_rate': 8.635e-06, 'epoch': 
10.84} 83%|████████▎ | 8282/10000 [32:31:26<6:38:39, 13.92s/it] 83%|████████▎ | 8283/10000 [32:31:40<6:38:17, 13.92s/it] {'loss': 0.0022, 'learning_rate': 8.63e-06, 'epoch': 10.84} 83%|████████▎ | 8283/10000 [32:31:40<6:38:17, 13.92s/it] 83%|████████▎ | 8284/10000 [32:31:54<6:39:01, 13.95s/it] {'loss': 0.0029, 'learning_rate': 8.625e-06, 'epoch': 10.84} 83%|████████▎ | 8284/10000 [32:31:54<6:39:01, 13.95s/it] 83%|████████▎ | 8285/10000 [32:32:08<6:39:13, 13.97s/it] {'loss': 0.0031, 'learning_rate': 8.62e-06, 'epoch': 10.84} 83%|████████▎ | 8285/10000 [32:32:08<6:39:13, 13.97s/it] 83%|████████▎ | 8286/10000 [32:32:22<6:38:39, 13.96s/it] {'loss': 0.0076, 'learning_rate': 8.615000000000001e-06, 'epoch': 10.85} 83%|████████▎ | 8286/10000 [32:32:22<6:38:39, 13.96s/it] 83%|████████▎ | 8287/10000 [32:32:36<6:38:02, 13.94s/it] {'loss': 0.0043, 'learning_rate': 8.61e-06, 'epoch': 10.85} 83%|████████▎ | 8287/10000 [32:32:36<6:38:02, 13.94s/it] 83%|████████▎ | 8288/10000 [32:32:50<6:37:02, 13.92s/it] {'loss': 0.0035, 'learning_rate': 8.605e-06, 'epoch': 10.85} 83%|████████▎ | 8288/10000 [32:32:50<6:37:02, 13.92s/it] 83%|████████▎ | 8289/10000 [32:33:04<6:35:58, 13.89s/it] {'loss': 0.0025, 'learning_rate': 8.599999999999999e-06, 'epoch': 10.85} 83%|████████▎ | 8289/10000 [32:33:04<6:35:58, 13.89s/it] 83%|████████▎ | 8290/10000 [32:33:17<6:36:01, 13.90s/it] {'loss': 0.0037, 'learning_rate': 8.595e-06, 'epoch': 10.85} 83%|████████▎ | 8290/10000 [32:33:18<6:36:01, 13.90s/it] 83%|████████▎ | 8291/10000 [32:33:32<6:37:32, 13.96s/it] {'loss': 0.0032, 'learning_rate': 8.59e-06, 'epoch': 10.85} 83%|████████▎ | 8291/10000 [32:33:32<6:37:32, 13.96s/it] 83%|████████▎ | 8292/10000 [32:33:46<6:38:50, 14.01s/it] {'loss': 0.0021, 'learning_rate': 8.585e-06, 'epoch': 10.85} 83%|████████▎ | 8292/10000 [32:33:46<6:38:50, 14.01s/it] 83%|████████▎ | 8293/10000 [32:34:00<6:37:47, 13.98s/it] {'loss': 0.0027, 'learning_rate': 8.580000000000001e-06, 'epoch': 10.85} 83%|████████▎ | 8293/10000 
[32:34:00<6:37:47, 13.98s/it] 83%|████████▎ | 8294/10000 [32:34:14<6:37:18, 13.97s/it] {'loss': 0.0047, 'learning_rate': 8.575000000000002e-06, 'epoch': 10.86} 83%|████████▎ | 8294/10000 [32:34:14<6:37:18, 13.97s/it] 83%|████████▎ | 8295/10000 [32:34:28<6:38:19, 14.02s/it] {'loss': 0.004, 'learning_rate': 8.570000000000001e-06, 'epoch': 10.86} 83%|████████▎ | 8295/10000 [32:34:28<6:38:19, 14.02s/it] 83%|████████▎ | 8296/10000 [32:34:41<6:35:53, 13.94s/it] {'loss': 0.0051, 'learning_rate': 8.565e-06, 'epoch': 10.86} 83%|████████▎ | 8296/10000 [32:34:42<6:35:53, 13.94s/it] 83%|████████▎ | 8297/10000 [32:34:56<6:36:59, 13.99s/it] {'loss': 0.0081, 'learning_rate': 8.56e-06, 'epoch': 10.86} 83%|████████▎ | 8297/10000 [32:34:56<6:36:59, 13.99s/it] 83%|████████▎ | 8298/10000 [32:35:10<6:36:28, 13.98s/it] {'loss': 0.0045, 'learning_rate': 8.555e-06, 'epoch': 10.86} 83%|████████▎ | 8298/10000 [32:35:10<6:36:28, 13.98s/it] 83%|████████▎ | 8299/10000 [32:35:23<6:36:11, 13.98s/it] {'loss': 0.0033, 'learning_rate': 8.550000000000001e-06, 'epoch': 10.86} 83%|████████▎ | 8299/10000 [32:35:24<6:36:11, 13.98s/it] 83%|████████▎ | 8300/10000 [32:35:37<6:35:06, 13.94s/it] {'loss': 0.005, 'learning_rate': 8.545e-06, 'epoch': 10.86} 83%|████████▎ | 8300/10000 [32:35:37<6:35:06, 13.94s/it] 83%|████████▎ | 8301/10000 [32:35:51<6:34:07, 13.92s/it] {'loss': 0.0036, 'learning_rate': 8.540000000000001e-06, 'epoch': 10.87} 83%|████████▎ | 8301/10000 [32:35:51<6:34:07, 13.92s/it] 83%|████████▎ | 8302/10000 [32:36:05<6:34:23, 13.94s/it] {'loss': 0.0054, 'learning_rate': 8.535e-06, 'epoch': 10.87} 83%|████████▎ | 8302/10000 [32:36:05<6:34:23, 13.94s/it] 83%|████████▎ | 8303/10000 [32:36:19<6:34:00, 13.93s/it] {'loss': 0.0032, 'learning_rate': 8.53e-06, 'epoch': 10.87} 83%|████████▎ | 8303/10000 [32:36:19<6:34:00, 13.93s/it] 83%|████████▎ | 8304/10000 [32:36:33<6:35:32, 13.99s/it] {'loss': 0.0045, 'learning_rate': 8.525e-06, 'epoch': 10.87} 83%|████████▎ | 8304/10000 [32:36:33<6:35:32, 13.99s/it] 
83%|████████▎ | 8305/10000 [32:36:47<6:33:52, 13.94s/it] {'loss': 0.0044, 'learning_rate': 8.52e-06, 'epoch': 10.87} 83%|████████▎ | 8305/10000 [32:36:47<6:33:52, 13.94s/it] 83%|████████▎ | 8306/10000 [32:37:01<6:33:37, 13.94s/it] {'loss': 0.0027, 'learning_rate': 8.515e-06, 'epoch': 10.87} 83%|████████▎ | 8306/10000 [32:37:01<6:33:37, 13.94s/it] 83%|████████▎ | 8307/10000 [32:37:15<6:33:24, 13.94s/it] {'loss': 0.004, 'learning_rate': 8.51e-06, 'epoch': 10.87} 83%|████████▎ | 8307/10000 [32:37:15<6:33:24, 13.94s/it] 83%|████████▎ | 8308/10000 [32:37:29<6:32:57, 13.93s/it] {'loss': 0.0048, 'learning_rate': 8.505e-06, 'epoch': 10.87} 83%|████████▎ | 8308/10000 [32:37:29<6:32:57, 13.93s/it] 83%|████████▎ | 8309/10000 [32:37:43<6:33:00, 13.94s/it] {'loss': 0.0035, 'learning_rate': 8.500000000000002e-06, 'epoch': 10.88} 83%|████████▎ | 8309/10000 [32:37:43<6:33:00, 13.94s/it] 83%|████████▎ | 8310/10000 [32:37:57<6:32:28, 13.93s/it] {'loss': 0.004, 'learning_rate': 8.495e-06, 'epoch': 10.88} 83%|████████▎ | 8310/10000 [32:37:57<6:32:28, 13.93s/it] 83%|████████▎ | 8311/10000 [32:38:11<6:31:54, 13.92s/it] {'loss': 0.004, 'learning_rate': 8.49e-06, 'epoch': 10.88} 83%|████████▎ | 8311/10000 [32:38:11<6:31:54, 13.92s/it] 83%|████████▎ | 8312/10000 [32:38:25<6:31:24, 13.91s/it] {'loss': 0.0032, 'learning_rate': 8.485e-06, 'epoch': 10.88} 83%|████████▎ | 8312/10000 [32:38:25<6:31:24, 13.91s/it] 83%|████████▎ | 8313/10000 [32:38:38<6:30:43, 13.90s/it] {'loss': 0.004, 'learning_rate': 8.48e-06, 'epoch': 10.88} 83%|████████▎ | 8313/10000 [32:38:38<6:30:43, 13.90s/it] 83%|████████▎ | 8314/10000 [32:38:52<6:31:04, 13.92s/it] {'loss': 0.0031, 'learning_rate': 8.475000000000001e-06, 'epoch': 10.88} 83%|████████▎ | 8314/10000 [32:38:52<6:31:04, 13.92s/it] 83%|████████▎ | 8315/10000 [32:39:06<6:31:45, 13.95s/it] {'loss': 0.0036, 'learning_rate': 8.47e-06, 'epoch': 10.88} 83%|████████▎ | 8315/10000 [32:39:06<6:31:45, 13.95s/it] 83%|████████▎ | 8316/10000 [32:39:20<6:30:19, 13.91s/it] 
{'loss': 0.0047, 'learning_rate': 8.465000000000001e-06, 'epoch': 10.88} 83%|████████▎ | 8316/10000 [32:39:20<6:30:19, 13.91s/it] 83%|████████▎ | 8317/10000 [32:39:34<6:29:36, 13.89s/it] {'loss': 0.0031, 'learning_rate': 8.46e-06, 'epoch': 10.89} 83%|████████▎ | 8317/10000 [32:39:34<6:29:36, 13.89s/it] 83%|████████▎ | 8318/10000 [32:39:48<6:29:43, 13.90s/it] {'loss': 0.0046, 'learning_rate': 8.455000000000001e-06, 'epoch': 10.89} 83%|████████▎ | 8318/10000 [32:39:48<6:29:43, 13.90s/it] 83%|████████▎ | 8319/10000 [32:40:02<6:29:29, 13.90s/it] {'loss': 0.0116, 'learning_rate': 8.45e-06, 'epoch': 10.89} 83%|████████▎ | 8319/10000 [32:40:02<6:29:29, 13.90s/it] 83%|████████▎ | 8320/10000 [32:40:16<6:29:42, 13.92s/it] {'loss': 0.004, 'learning_rate': 8.445e-06, 'epoch': 10.89} 83%|████████▎ | 8320/10000 [32:40:16<6:29:42, 13.92s/it] 83%|████████▎ | 8321/10000 [32:40:30<6:29:28, 13.92s/it] {'loss': 0.003, 'learning_rate': 8.44e-06, 'epoch': 10.89} 83%|████████▎ | 8321/10000 [32:40:30<6:29:28, 13.92s/it] 83%|████████▎ | 8322/10000 [32:40:44<6:29:13, 13.92s/it] {'loss': 0.0032, 'learning_rate': 8.435e-06, 'epoch': 10.89} 83%|████████▎ | 8322/10000 [32:40:44<6:29:13, 13.92s/it] 83%|████████▎ | 8323/10000 [32:40:58<6:28:33, 13.90s/it] {'loss': 0.0045, 'learning_rate': 8.43e-06, 'epoch': 10.89} 83%|████████▎ | 8323/10000 [32:40:58<6:28:33, 13.90s/it] 83%|████████▎ | 8324/10000 [32:41:11<6:28:38, 13.91s/it] {'loss': 0.0043, 'learning_rate': 8.425000000000001e-06, 'epoch': 10.9} 83%|████████▎ | 8324/10000 [32:41:12<6:28:38, 13.91s/it] 83%|████████▎ | 8325/10000 [32:41:25<6:28:47, 13.93s/it] {'loss': 0.0031, 'learning_rate': 8.42e-06, 'epoch': 10.9} 83%|████████▎ | 8325/10000 [32:41:25<6:28:47, 13.93s/it] 83%|████████▎ | 8326/10000 [32:41:39<6:28:32, 13.93s/it] {'loss': 0.0034, 'learning_rate': 8.415e-06, 'epoch': 10.9} 83%|████████▎ | 8326/10000 [32:41:39<6:28:32, 13.93s/it] 83%|████████▎ | 8327/10000 [32:41:53<6:29:53, 13.98s/it] {'loss': 0.0039, 'learning_rate': 
8.409999999999999e-06, 'epoch': 10.9} 83%|████████▎ | 8327/10000 [32:41:54<6:29:53, 13.98s/it] 83%|████████▎ | 8328/10000 [32:42:07<6:29:52, 13.99s/it] {'loss': 0.0038, 'learning_rate': 8.405e-06, 'epoch': 10.9} 83%|████████▎ | 8328/10000 [32:42:08<6:29:52, 13.99s/it] 83%|████████▎ | 8329/10000 [32:42:21<6:29:13, 13.98s/it] {'loss': 0.0044, 'learning_rate': 8.400000000000001e-06, 'epoch': 10.9} 83%|████████▎ | 8329/10000 [32:42:21<6:29:13, 13.98s/it] 83%|████████▎ | 8330/10000 [32:42:35<6:28:44, 13.97s/it] {'loss': 0.0031, 'learning_rate': 8.395e-06, 'epoch': 10.9} 83%|████████▎ | 8330/10000 [32:42:35<6:28:44, 13.97s/it] 83%|████████▎ | 8331/10000 [32:42:49<6:27:29, 13.93s/it] {'loss': 0.0034, 'learning_rate': 8.390000000000001e-06, 'epoch': 10.9} 83%|████████▎ | 8331/10000 [32:42:49<6:27:29, 13.93s/it] 83%|████████▎ | 8332/10000 [32:43:03<6:26:38, 13.91s/it] {'loss': 0.0046, 'learning_rate': 8.385e-06, 'epoch': 10.91} 83%|████████▎ | 8332/10000 [32:43:03<6:26:38, 13.91s/it] 83%|████████▎ | 8333/10000 [32:43:17<6:26:34, 13.91s/it] {'loss': 0.0045, 'learning_rate': 8.380000000000001e-06, 'epoch': 10.91} 83%|████████▎ | 8333/10000 [32:43:17<6:26:34, 13.91s/it] 83%|████████▎ | 8334/10000 [32:43:31<6:26:24, 13.92s/it] {'loss': 0.0056, 'learning_rate': 8.375e-06, 'epoch': 10.91} 83%|████████▎ | 8334/10000 [32:43:31<6:26:24, 13.92s/it] 83%|████████▎ | 8335/10000 [32:43:45<6:25:35, 13.90s/it] {'loss': 0.0053, 'learning_rate': 8.37e-06, 'epoch': 10.91} 83%|████████▎ | 8335/10000 [32:43:45<6:25:35, 13.90s/it] 83%|████████▎ | 8336/10000 [32:43:59<6:25:52, 13.91s/it] {'loss': 0.0033, 'learning_rate': 8.365e-06, 'epoch': 10.91} 83%|████████▎ | 8336/10000 [32:43:59<6:25:52, 13.91s/it] 83%|████████▎ | 8337/10000 [32:44:13<6:26:00, 13.93s/it] {'loss': 0.004, 'learning_rate': 8.36e-06, 'epoch': 10.91} 83%|████████▎ | 8337/10000 [32:44:13<6:26:00, 13.93s/it] 83%|████████▎ | 8338/10000 [32:44:26<6:24:57, 13.90s/it] {'loss': 0.0027, 'learning_rate': 8.355e-06, 'epoch': 10.91} 
83%|████████▎ | 8338/10000 [32:44:27<6:24:57, 13.90s/it] 83%|████████▎ | 8339/10000 [32:44:40<6:24:35, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.350000000000001e-06, 'epoch': 10.91} 83%|████████▎ | 8339/10000 [32:44:40<6:24:35, 13.89s/it] 83%|████████▎ | 8340/10000 [32:44:54<6:24:53, 13.91s/it] {'loss': 0.0045, 'learning_rate': 8.345e-06, 'epoch': 10.92} 83%|████████▎ | 8340/10000 [32:44:54<6:24:53, 13.91s/it] 83%|████████▎ | 8341/10000 [32:45:08<6:24:07, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.34e-06, 'epoch': 10.92} 83%|████████▎ | 8341/10000 [32:45:08<6:24:07, 13.89s/it] 83%|████████▎ | 8342/10000 [32:45:22<6:25:10, 13.94s/it] {'loss': 0.0035, 'learning_rate': 8.334999999999999e-06, 'epoch': 10.92} 83%|████████▎ | 8342/10000 [32:45:22<6:25:10, 13.94s/it] 83%|████████▎ | 8343/10000 [32:45:36<6:24:31, 13.92s/it] {'loss': 0.0069, 'learning_rate': 8.33e-06, 'epoch': 10.92} 83%|████████▎ | 8343/10000 [32:45:36<6:24:31, 13.92s/it] 83%|████████▎ | 8344/10000 [32:45:50<6:24:15, 13.92s/it] {'loss': 0.0047, 'learning_rate': 8.325e-06, 'epoch': 10.92} 83%|████████▎ | 8344/10000 [32:45:50<6:24:15, 13.92s/it] 83%|████████▎ | 8345/10000 [32:46:04<6:23:32, 13.90s/it] {'loss': 0.0033, 'learning_rate': 8.32e-06, 'epoch': 10.92} 83%|████████▎ | 8345/10000 [32:46:04<6:23:32, 13.90s/it] 83%|████████▎ | 8346/10000 [32:46:18<6:22:48, 13.89s/it] {'loss': 0.0044, 'learning_rate': 8.315000000000001e-06, 'epoch': 10.92} 83%|████████▎ | 8346/10000 [32:46:18<6:22:48, 13.89s/it] 83%|████████▎ | 8347/10000 [32:46:32<6:23:15, 13.91s/it] {'loss': 0.0024, 'learning_rate': 8.31e-06, 'epoch': 10.93} 83%|████████▎ | 8347/10000 [32:46:32<6:23:15, 13.91s/it] 83%|████████▎ | 8348/10000 [32:46:46<6:23:27, 13.93s/it] {'loss': 0.0034, 'learning_rate': 8.305000000000001e-06, 'epoch': 10.93} 83%|████████▎ | 8348/10000 [32:46:46<6:23:27, 13.93s/it] 83%|████████▎ | 8349/10000 [32:47:00<6:24:05, 13.96s/it] {'loss': 0.0038, 'learning_rate': 8.3e-06, 'epoch': 10.93} 83%|████████▎ | 8349/10000 
[32:47:00<6:24:05, 13.96s/it] 84%|████████▎ | 8350/10000 [32:47:14<6:24:50, 13.99s/it] {'loss': 0.0029, 'learning_rate': 8.295e-06, 'epoch': 10.93} 84%|████████▎ | 8350/10000 [32:47:14<6:24:50, 13.99s/it] 84%|████████▎ | 8351/10000 [32:47:28<6:24:22, 13.99s/it] {'loss': 0.0034, 'learning_rate': 8.29e-06, 'epoch': 10.93} 84%|████████▎ | 8351/10000 [32:47:28<6:24:22, 13.99s/it] 84%|████████▎ | 8352/10000 [32:47:42<6:23:54, 13.98s/it] {'loss': 0.0036, 'learning_rate': 8.285e-06, 'epoch': 10.93} 84%|████████▎ | 8352/10000 [32:47:42<6:23:54, 13.98s/it] 84%|████████▎ | 8353/10000 [32:47:56<6:25:04, 14.03s/it] {'loss': 0.0037, 'learning_rate': 8.28e-06, 'epoch': 10.93} 84%|████████▎ | 8353/10000 [32:47:56<6:25:04, 14.03s/it] 84%|████████▎ | 8354/10000 [32:48:10<6:23:25, 13.98s/it] {'loss': 0.0025, 'learning_rate': 8.275000000000001e-06, 'epoch': 10.93} 84%|████████▎ | 8354/10000 [32:48:10<6:23:25, 13.98s/it] 84%|████████▎ | 8355/10000 [32:48:24<6:22:23, 13.95s/it] {'loss': 0.0053, 'learning_rate': 8.27e-06, 'epoch': 10.94} 84%|████████▎ | 8355/10000 [32:48:24<6:22:23, 13.95s/it] 84%|████████▎ | 8356/10000 [32:48:38<6:22:42, 13.97s/it] {'loss': 0.0024, 'learning_rate': 8.265000000000001e-06, 'epoch': 10.94} 84%|████████▎ | 8356/10000 [32:48:38<6:22:42, 13.97s/it] 84%|████████▎ | 8357/10000 [32:48:51<6:21:01, 13.91s/it] {'loss': 0.0028, 'learning_rate': 8.26e-06, 'epoch': 10.94} 84%|████████▎ | 8357/10000 [32:48:51<6:21:01, 13.91s/it] 84%|████████▎ | 8358/10000 [32:49:05<6:20:52, 13.92s/it] {'loss': 0.0028, 'learning_rate': 8.255e-06, 'epoch': 10.94} 84%|████████▎ | 8358/10000 [32:49:05<6:20:52, 13.92s/it] 84%|████████▎ | 8359/10000 [32:49:19<6:20:26, 13.91s/it] {'loss': 0.0036, 'learning_rate': 8.25e-06, 'epoch': 10.94} 84%|████████▎ | 8359/10000 [32:49:19<6:20:26, 13.91s/it] 84%|████████▎ | 8360/10000 [32:49:33<6:21:29, 13.96s/it] {'loss': 0.0048, 'learning_rate': 8.245e-06, 'epoch': 10.94} 84%|████████▎ | 8360/10000 [32:49:33<6:21:29, 13.96s/it] 84%|████████▎ | 
8361/10000 [32:49:47<6:21:33, 13.97s/it] {'loss': 0.0037, 'learning_rate': 8.24e-06, 'epoch': 10.94} 84%|████████▎ | 8361/10000 [32:49:47<6:21:33, 13.97s/it] 84%|████████▎ | 8362/10000 [32:50:01<6:20:49, 13.95s/it] {'loss': 0.0031, 'learning_rate': 8.235000000000002e-06, 'epoch': 10.95} 84%|████████▎ | 8362/10000 [32:50:01<6:20:49, 13.95s/it] 84%|████████▎ | 8363/10000 [32:50:15<6:21:08, 13.97s/it] {'loss': 0.0037, 'learning_rate': 8.23e-06, 'epoch': 10.95} 84%|████████▎ | 8363/10000 [32:50:15<6:21:08, 13.97s/it] 84%|████████▎ | 8364/10000 [32:50:29<6:20:59, 13.97s/it] {'loss': 0.0028, 'learning_rate': 8.225e-06, 'epoch': 10.95} 84%|████████▎ | 8364/10000 [32:50:29<6:20:59, 13.97s/it] 84%|████████▎ | 8365/10000 [32:50:43<6:20:29, 13.96s/it] {'loss': 0.0039, 'learning_rate': 8.22e-06, 'epoch': 10.95} 84%|████████▎ | 8365/10000 [32:50:43<6:20:29, 13.96s/it] 84%|████████▎ | 8366/10000 [32:50:57<6:21:04, 13.99s/it] {'loss': 0.0018, 'learning_rate': 8.215e-06, 'epoch': 10.95} 84%|████████▎ | 8366/10000 [32:50:57<6:21:04, 13.99s/it] 84%|████████▎ | 8367/10000 [32:51:11<6:19:37, 13.95s/it] {'loss': 0.0036, 'learning_rate': 8.210000000000001e-06, 'epoch': 10.95} 84%|████████▎ | 8367/10000 [32:51:11<6:19:37, 13.95s/it] 84%|████████▎ | 8368/10000 [32:51:25<6:18:22, 13.91s/it] {'loss': 0.0034, 'learning_rate': 8.205e-06, 'epoch': 10.95} 84%|████████▎ | 8368/10000 [32:51:25<6:18:22, 13.91s/it] 84%|████████▎ | 8369/10000 [32:51:39<6:19:34, 13.96s/it] {'loss': 0.003, 'learning_rate': 8.200000000000001e-06, 'epoch': 10.95} 84%|████████▎ | 8369/10000 [32:51:39<6:19:34, 13.96s/it] 84%|████████▎ | 8370/10000 [32:51:53<6:18:25, 13.93s/it] {'loss': 0.004, 'learning_rate': 8.195e-06, 'epoch': 10.96} 84%|████████▎ | 8370/10000 [32:51:53<6:18:25, 13.93s/it] 84%|████████▎ | 8371/10000 [32:52:07<6:18:49, 13.95s/it] {'loss': 0.0054, 'learning_rate': 8.190000000000001e-06, 'epoch': 10.96} 84%|████████▎ | 8371/10000 [32:52:07<6:18:49, 13.95s/it] 84%|████████▎ | 8372/10000 [32:52:21<6:18:35, 
13.95s/it] {'loss': 0.003, 'learning_rate': 8.185e-06, 'epoch': 10.96} 84%|████████▎ | 8372/10000 [32:52:21<6:18:35, 13.95s/it] 84%|████████▎ | 8373/10000 [32:52:35<6:17:53, 13.94s/it] {'loss': 0.004, 'learning_rate': 8.18e-06, 'epoch': 10.96} 84%|████████▎ | 8373/10000 [32:52:35<6:17:53, 13.94s/it] 84%|████████▎ | 8374/10000 [32:52:48<6:16:44, 13.90s/it] {'loss': 0.0033, 'learning_rate': 8.175e-06, 'epoch': 10.96} 84%|████████▎ | 8374/10000 [32:52:49<6:16:44, 13.90s/it] 84%|████████▍ | 8375/10000 [32:53:02<6:17:13, 13.93s/it] {'loss': 0.0041, 'learning_rate': 8.17e-06, 'epoch': 10.96} 84%|████████▍ | 8375/10000 [32:53:03<6:17:13, 13.93s/it] 84%|████████▍ | 8376/10000 [32:53:16<6:16:13, 13.90s/it] {'loss': 0.0032, 'learning_rate': 8.165e-06, 'epoch': 10.96} 84%|████████▍ | 8376/10000 [32:53:16<6:16:13, 13.90s/it] 84%|████████▍ | 8377/10000 [32:53:30<6:16:31, 13.92s/it] {'loss': 0.0037, 'learning_rate': 8.160000000000001e-06, 'epoch': 10.96} 84%|████████▍ | 8377/10000 [32:53:30<6:16:31, 13.92s/it] 84%|████████▍ | 8378/10000 [32:53:44<6:17:00, 13.95s/it] {'loss': 0.0024, 'learning_rate': 8.155e-06, 'epoch': 10.97} 84%|████████▍ | 8378/10000 [32:53:44<6:17:00, 13.95s/it] 84%|████████▍ | 8379/10000 [32:53:58<6:15:46, 13.91s/it] {'loss': 0.0024, 'learning_rate': 8.15e-06, 'epoch': 10.97} 84%|████████▍ | 8379/10000 [32:53:58<6:15:46, 13.91s/it] 84%|████████▍ | 8380/10000 [32:54:12<6:14:52, 13.88s/it] {'loss': 0.0033, 'learning_rate': 8.144999999999999e-06, 'epoch': 10.97} 84%|████████▍ | 8380/10000 [32:54:12<6:14:52, 13.88s/it] 84%|████████▍ | 8381/10000 [32:54:26<6:14:47, 13.89s/it] {'loss': 0.0035, 'learning_rate': 8.14e-06, 'epoch': 10.97} 84%|████████▍ | 8381/10000 [32:54:26<6:14:47, 13.89s/it] 84%|████████▍ | 8382/10000 [32:54:40<6:15:18, 13.92s/it] {'loss': 0.0042, 'learning_rate': 8.135000000000001e-06, 'epoch': 10.97} 84%|████████▍ | 8382/10000 [32:54:40<6:15:18, 13.92s/it] 84%|████████▍ | 8383/10000 [32:54:54<6:14:13, 13.89s/it] {'loss': 0.0031, 'learning_rate': 
8.13e-06, 'epoch': 10.97} 84%|████████▍ | 8383/10000 [32:54:54<6:14:13, 13.89s/it] 84%|████████▍ | 8384/10000 [32:55:07<6:13:43, 13.88s/it] {'loss': 0.0022, 'learning_rate': 8.125000000000001e-06, 'epoch': 10.97} 84%|████████▍ | 8384/10000 [32:55:07<6:13:43, 13.88s/it] 84%|████████▍ | 8385/10000 [32:55:21<6:14:20, 13.91s/it] {'loss': 0.0032, 'learning_rate': 8.12e-06, 'epoch': 10.98} 84%|████████▍ | 8385/10000 [32:55:21<6:14:20, 13.91s/it] 84%|████████▍ | 8386/10000 [32:55:35<6:15:00, 13.94s/it] {'loss': 0.0025, 'learning_rate': 8.115000000000001e-06, 'epoch': 10.98} 84%|████████▍ | 8386/10000 [32:55:36<6:15:00, 13.94s/it] 84%|████████▍ | 8387/10000 [32:55:49<6:14:27, 13.93s/it] {'loss': 0.0039, 'learning_rate': 8.11e-06, 'epoch': 10.98} 84%|████████▍ | 8387/10000 [32:55:49<6:14:27, 13.93s/it] 84%|████████▍ | 8388/10000 [32:56:03<6:14:47, 13.95s/it] {'loss': 0.0032, 'learning_rate': 8.105e-06, 'epoch': 10.98} 84%|████████▍ | 8388/10000 [32:56:03<6:14:47, 13.95s/it] 84%|████████▍ | 8389/10000 [32:56:17<6:14:23, 13.94s/it] {'loss': 0.003, 'learning_rate': 8.1e-06, 'epoch': 10.98} 84%|████████▍ | 8389/10000 [32:56:17<6:14:23, 13.94s/it] 84%|████████▍ | 8390/10000 [32:56:31<6:14:22, 13.95s/it] {'loss': 0.0043, 'learning_rate': 8.095e-06, 'epoch': 10.98} 84%|████████▍ | 8390/10000 [32:56:31<6:14:22, 13.95s/it] 84%|████████▍ | 8391/10000 [32:56:45<6:13:55, 13.94s/it] {'loss': 0.0036, 'learning_rate': 8.09e-06, 'epoch': 10.98} 84%|████████▍ | 8391/10000 [32:56:45<6:13:55, 13.94s/it] 84%|████████▍ | 8392/10000 [32:56:59<6:13:21, 13.93s/it] {'loss': 0.0051, 'learning_rate': 8.085000000000001e-06, 'epoch': 10.98} 84%|████████▍ | 8392/10000 [32:56:59<6:13:21, 13.93s/it] 84%|████████▍ | 8393/10000 [32:57:13<6:12:18, 13.90s/it] {'loss': 0.0045, 'learning_rate': 8.08e-06, 'epoch': 10.99} 84%|████████▍ | 8393/10000 [32:57:13<6:12:18, 13.90s/it] 84%|████████▍ | 8394/10000 [32:57:27<6:13:24, 13.95s/it] {'loss': 0.0034, 'learning_rate': 8.075000000000001e-06, 'epoch': 10.99} 
84%|████████▍ | 8394/10000 [32:57:27<6:13:24, 13.95s/it] 84%|████████▍ | 8395/10000 [32:57:41<6:13:42, 13.97s/it] {'loss': 0.0038, 'learning_rate': 8.069999999999999e-06, 'epoch': 10.99} 84%|████████▍ | 8395/10000 [32:57:41<6:13:42, 13.97s/it] 84%|████████▍ | 8396/10000 [32:57:55<6:13:18, 13.96s/it] {'loss': 0.0036, 'learning_rate': 8.065e-06, 'epoch': 10.99} 84%|████████▍ | 8396/10000 [32:57:55<6:13:18, 13.96s/it] 84%|████████▍ | 8397/10000 [32:58:09<6:13:18, 13.97s/it] {'loss': 0.0044, 'learning_rate': 8.06e-06, 'epoch': 10.99} 84%|████████▍ | 8397/10000 [32:58:09<6:13:18, 13.97s/it] 84%|████████▍ | 8398/10000 [32:58:23<6:13:10, 13.98s/it] {'loss': 0.0028, 'learning_rate': 8.055e-06, 'epoch': 10.99} 84%|████████▍ | 8398/10000 [32:58:23<6:13:10, 13.98s/it] 84%|████████▍ | 8399/10000 [32:58:37<6:12:57, 13.98s/it] {'loss': 0.0028, 'learning_rate': 8.050000000000001e-06, 'epoch': 10.99} 84%|████████▍ | 8399/10000 [32:58:37<6:12:57, 13.98s/it] 84%|████████▍ | 8400/10000 [32:58:51<6:12:00, 13.95s/it] {'loss': 0.0046, 'learning_rate': 8.045e-06, 'epoch': 10.99} 84%|████████▍ | 8400/10000 [32:58:51<6:12:00, 13.95s/it] 84%|████████▍ | 8401/10000 [32:59:05<6:11:45, 13.95s/it] {'loss': 0.0029, 'learning_rate': 8.040000000000001e-06, 'epoch': 11.0} 84%|████████▍ | 8401/10000 [32:59:05<6:11:45, 13.95s/it] 84%|████████▍ | 8402/10000 [32:59:19<6:10:36, 13.91s/it] {'loss': 0.002, 'learning_rate': 8.035e-06, 'epoch': 11.0} 84%|████████▍ | 8402/10000 [32:59:19<6:10:36, 13.91s/it] 84%|████████▍ | 8403/10000 [32:59:33<6:10:31, 13.92s/it] {'loss': 0.0032, 'learning_rate': 8.03e-06, 'epoch': 11.0} 84%|████████▍ | 8403/10000 [32:59:33<6:10:31, 13.92s/it] 84%|████████▍ | 8404/10000 [32:59:45<5:59:38, 13.52s/it] {'loss': 0.0036, 'learning_rate': 8.025e-06, 'epoch': 11.0} 84%|████████▍ | 8404/10000 [32:59:45<5:59:38, 13.52s/it] 84%|████████▍ | 8405/10000 [32:59:59<6:02:11, 13.63s/it] {'loss': 0.0034, 'learning_rate': 8.02e-06, 'epoch': 11.0} 84%|████████▍ | 8405/10000 [32:59:59<6:02:11, 
13.63s/it] 84%|████████▍ | 8406/10000 [33:00:13<6:04:05, 13.70s/it] {'loss': 0.0039, 'learning_rate': 8.015e-06, 'epoch': 11.0} 84%|████████▍ | 8406/10000 [33:00:13<6:04:05, 13.70s/it] 84%|████████▍ | 8407/10000 [33:00:27<6:06:21, 13.80s/it] {'loss': 0.0027, 'learning_rate': 8.010000000000001e-06, 'epoch': 11.0} 84%|████████▍ | 8407/10000 [33:00:27<6:06:21, 13.80s/it] 84%|████████▍ | 8408/10000 [33:00:41<6:06:25, 13.81s/it] {'loss': 0.0034, 'learning_rate': 8.005e-06, 'epoch': 11.01} 84%|████████▍ | 8408/10000 [33:00:41<6:06:25, 13.81s/it] 84%|████████▍ | 8409/10000 [33:00:55<6:06:12, 13.81s/it] {'loss': 0.0032, 'learning_rate': 8.000000000000001e-06, 'epoch': 11.01} 84%|████████▍ | 8409/10000 [33:00:55<6:06:12, 13.81s/it] 84%|████████▍ | 8410/10000 [33:01:08<6:06:26, 13.83s/it] {'loss': 0.0032, 'learning_rate': 7.995e-06, 'epoch': 11.01} 84%|████████▍ | 8410/10000 [33:01:08<6:06:26, 13.83s/it] 84%|████████▍ | 8411/10000 [33:01:22<6:08:10, 13.90s/it] {'loss': 0.003, 'learning_rate': 7.99e-06, 'epoch': 11.01} 84%|████████▍ | 8411/10000 [33:01:23<6:08:10, 13.90s/it] 84%|████████▍ | 8412/10000 [33:01:36<6:07:57, 13.90s/it] {'loss': 0.003, 'learning_rate': 7.985e-06, 'epoch': 11.01} 84%|████████▍ | 8412/10000 [33:01:36<6:07:57, 13.90s/it] 84%|████████▍ | 8413/10000 [33:01:50<6:08:13, 13.92s/it] {'loss': 0.0017, 'learning_rate': 7.98e-06, 'epoch': 11.01} 84%|████████▍ | 8413/10000 [33:01:50<6:08:13, 13.92s/it] 84%|████████▍ | 8414/10000 [33:02:04<6:08:40, 13.95s/it] {'loss': 0.0037, 'learning_rate': 7.975e-06, 'epoch': 11.01} 84%|████████▍ | 8414/10000 [33:02:04<6:08:40, 13.95s/it] 84%|████████▍ | 8415/10000 [33:02:18<6:07:47, 13.92s/it] {'loss': 0.0035, 'learning_rate': 7.97e-06, 'epoch': 11.01} 84%|████████▍ | 8415/10000 [33:02:18<6:07:47, 13.92s/it] 84%|████████▍ | 8416/10000 [33:02:32<6:07:06, 13.91s/it] {'loss': 0.0016, 'learning_rate': 7.965e-06, 'epoch': 11.02} 84%|████████▍ | 8416/10000 [33:02:32<6:07:06, 13.91s/it] 84%|████████▍ | 8417/10000 [33:02:46<6:07:41, 
13.94s/it] {'loss': 0.0024, 'learning_rate': 7.96e-06, 'epoch': 11.02} 84%|████████▍ | 8417/10000 [33:02:46<6:07:41, 13.94s/it] 84%|████████▍ | 8418/10000 [33:03:00<6:06:59, 13.92s/it] {'loss': 0.0027, 'learning_rate': 7.955e-06, 'epoch': 11.02} 84%|████████▍ | 8418/10000 [33:03:00<6:06:59, 13.92s/it] 84%|████████▍ | 8419/10000 [33:03:14<6:06:18, 13.90s/it] {'loss': 0.0025, 'learning_rate': 7.95e-06, 'epoch': 11.02} 84%|████████▍ | 8419/10000 [33:03:14<6:06:18, 13.90s/it] 84%|████████▍ | 8420/10000 [33:03:28<6:06:01, 13.90s/it] {'loss': 0.0023, 'learning_rate': 7.945000000000001e-06, 'epoch': 11.02} 84%|████████▍ | 8420/10000 [33:03:28<6:06:01, 13.90s/it] 84%|████████▍ | 8421/10000 [33:03:42<6:06:06, 13.91s/it] {'loss': 0.0023, 'learning_rate': 7.94e-06, 'epoch': 11.02} 84%|████████▍ | 8421/10000 [33:03:42<6:06:06, 13.91s/it] 84%|████████▍ | 8422/10000 [33:03:56<6:06:12, 13.92s/it] {'loss': 0.0034, 'learning_rate': 7.935000000000001e-06, 'epoch': 11.02} 84%|████████▍ | 8422/10000 [33:03:56<6:06:12, 13.92s/it] 84%|████████▍ | 8423/10000 [33:04:09<6:05:25, 13.90s/it] {'loss': 0.0026, 'learning_rate': 7.93e-06, 'epoch': 11.02} 84%|████████▍ | 8423/10000 [33:04:10<6:05:25, 13.90s/it] 84%|████████▍ | 8424/10000 [33:04:23<6:05:37, 13.92s/it] {'loss': 0.0021, 'learning_rate': 7.925000000000001e-06, 'epoch': 11.03} 84%|████████▍ | 8424/10000 [33:04:23<6:05:37, 13.92s/it] 84%|████████▍ | 8425/10000 [33:04:37<6:05:29, 13.92s/it] {'loss': 0.0037, 'learning_rate': 7.92e-06, 'epoch': 11.03} 84%|████████▍ | 8425/10000 [33:04:37<6:05:29, 13.92s/it] 84%|████████▍ | 8426/10000 [33:04:51<6:04:47, 13.91s/it] {'loss': 0.0027, 'learning_rate': 7.915e-06, 'epoch': 11.03} 84%|████████▍ | 8426/10000 [33:04:51<6:04:47, 13.91s/it] 84%|████████▍ | 8427/10000 [33:05:05<6:04:42, 13.91s/it] {'loss': 0.0033, 'learning_rate': 7.91e-06, 'epoch': 11.03} 84%|████████▍ | 8427/10000 [33:05:05<6:04:42, 13.91s/it] 84%|████████▍ | 8428/10000 [33:05:19<6:05:19, 13.94s/it] {'loss': 0.0035, 'learning_rate': 
7.905e-06, 'epoch': 11.03} 84%|████████▍ | 8428/10000 [33:05:19<6:05:19, 13.94s/it] 84%|████████▍ | 8429/10000 [33:05:33<6:04:14, 13.91s/it] {'loss': 0.003, 'learning_rate': 7.9e-06, 'epoch': 11.03} 84%|████████▍ | 8429/10000 [33:05:33<6:04:14, 13.91s/it] 84%|████████▍ | 8430/10000 [33:05:47<6:03:33, 13.89s/it] {'loss': 0.0028, 'learning_rate': 7.895000000000001e-06, 'epoch': 11.03} 84%|████████▍ | 8430/10000 [33:05:47<6:03:33, 13.89s/it] 84%|████████▍ | 8431/10000 [33:06:01<6:02:46, 13.87s/it] {'loss': 0.0035, 'learning_rate': 7.89e-06, 'epoch': 11.04} 84%|████████▍ | 8431/10000 [33:06:01<6:02:46, 13.87s/it] 84%|████████▍ | 8432/10000 [33:06:15<6:02:53, 13.89s/it] {'loss': 0.0028, 'learning_rate': 7.885e-06, 'epoch': 11.04} 84%|████████▍ | 8432/10000 [33:06:15<6:02:53, 13.89s/it] 84%|████████▍ | 8433/10000 [33:06:28<6:02:49, 13.89s/it] {'loss': 0.0022, 'learning_rate': 7.879999999999999e-06, 'epoch': 11.04} 84%|████████▍ | 8433/10000 [33:06:29<6:02:49, 13.89s/it] 84%|████████▍ | 8434/10000 [33:06:42<6:03:24, 13.92s/it] {'loss': 0.0024, 'learning_rate': 7.875e-06, 'epoch': 11.04} 84%|████████▍ | 8434/10000 [33:06:43<6:03:24, 13.92s/it] 84%|████████▍ | 8435/10000 [33:06:56<6:02:55, 13.91s/it] {'loss': 0.0032, 'learning_rate': 7.870000000000001e-06, 'epoch': 11.04} 84%|████████▍ | 8435/10000 [33:06:56<6:02:55, 13.91s/it] 84%|████████▍ | 8436/10000 [33:07:10<6:02:44, 13.92s/it] {'loss': 0.0017, 'learning_rate': 7.865e-06, 'epoch': 11.04} 84%|████████▍ | 8436/10000 [33:07:10<6:02:44, 13.92s/it] 84%|████████▍ | 8437/10000 [33:07:24<6:01:55, 13.89s/it] {'loss': 0.0024, 'learning_rate': 7.860000000000001e-06, 'epoch': 11.04} 84%|████████▍ | 8437/10000 [33:07:24<6:01:55, 13.89s/it] 84%|████████▍ | 8438/10000 [33:07:38<6:01:23, 13.88s/it] {'loss': 0.0041, 'learning_rate': 7.855e-06, 'epoch': 11.04} 84%|████████▍ | 8438/10000 [33:07:38<6:01:23, 13.88s/it] 84%|████████▍ | 8439/10000 [33:07:52<6:01:01, 13.88s/it] {'loss': 0.0055, 'learning_rate': 7.850000000000001e-06, 
'epoch': 11.05} 84%|████████▍ | 8439/10000 [33:07:52<6:01:01, 13.88s/it] 84%|████████▍ | 8440/10000 [33:08:06<6:00:37, 13.87s/it] {'loss': 0.0027, 'learning_rate': 7.845e-06, 'epoch': 11.05} 84%|████████▍ | 8440/10000 [33:08:06<6:00:37, 13.87s/it] 84%|████████▍ | 8441/10000 [33:08:20<6:00:25, 13.87s/it] {'loss': 0.0028, 'learning_rate': 7.84e-06, 'epoch': 11.05} 84%|████████▍ | 8441/10000 [33:08:20<6:00:25, 13.87s/it] 84%|████████▍ | 8442/10000 [33:08:33<6:00:07, 13.87s/it] {'loss': 0.0038, 'learning_rate': 7.835e-06, 'epoch': 11.05} 84%|████████▍ | 8442/10000 [33:08:33<6:00:07, 13.87s/it] 84%|████████▍ | 8443/10000 [33:08:47<5:59:52, 13.87s/it] {'loss': 0.0037, 'learning_rate': 7.83e-06, 'epoch': 11.05} 84%|████████▍ | 8443/10000 [33:08:47<5:59:52, 13.87s/it] 84%|████████▍ | 8444/10000 [33:09:01<6:00:03, 13.88s/it] {'loss': 0.0032, 'learning_rate': 7.825e-06, 'epoch': 11.05} 84%|████████▍ | 8444/10000 [33:09:01<6:00:03, 13.88s/it] 84%|████████▍ | 8445/10000 [33:09:15<6:00:34, 13.91s/it] {'loss': 0.0028, 'learning_rate': 7.820000000000001e-06, 'epoch': 11.05} 84%|████████▍ | 8445/10000 [33:09:15<6:00:34, 13.91s/it] 84%|████████▍ | 8446/10000 [33:09:29<6:01:05, 13.94s/it] {'loss': 0.0035, 'learning_rate': 7.815e-06, 'epoch': 11.05} 84%|████████▍ | 8446/10000 [33:09:29<6:01:05, 13.94s/it] 84%|████████▍ | 8447/10000 [33:09:43<6:00:09, 13.91s/it] {'loss': 0.0024, 'learning_rate': 7.810000000000001e-06, 'epoch': 11.06} 84%|████████▍ | 8447/10000 [33:09:43<6:00:09, 13.91s/it] 84%|████████▍ | 8448/10000 [33:09:57<5:59:41, 13.91s/it] {'loss': 0.0027, 'learning_rate': 7.805e-06, 'epoch': 11.06} 84%|████████▍ | 8448/10000 [33:09:57<5:59:41, 13.91s/it] 84%|████████▍ | 8449/10000 [33:10:11<5:58:54, 13.88s/it] {'loss': 0.0021, 'learning_rate': 7.8e-06, 'epoch': 11.06} 84%|████████▍ | 8449/10000 [33:10:11<5:58:54, 13.88s/it] 84%|████████▍ | 8450/10000 [33:10:25<5:59:12, 13.90s/it] {'loss': 0.0023, 'learning_rate': 7.795e-06, 'epoch': 11.06} 84%|████████▍ | 8450/10000 
[33:10:25<5:59:12, 13.90s/it] 85%|████████▍ | 8451/10000 [33:10:39<6:00:49, 13.98s/it] {'loss': 0.0025, 'learning_rate': 7.79e-06, 'epoch': 11.06} 85%|████████▍ | 8451/10000 [33:10:39<6:00:49, 13.98s/it] 85%|████████▍ | 8452/10000 [33:10:53<6:00:10, 13.96s/it] {'loss': 0.0025, 'learning_rate': 7.785000000000001e-06, 'epoch': 11.06} 85%|████████▍ | 8452/10000 [33:10:53<6:00:10, 13.96s/it] 85%|████████▍ | 8453/10000 [33:11:07<6:00:10, 13.97s/it] {'loss': 0.0032, 'learning_rate': 7.78e-06, 'epoch': 11.06} 85%|████████▍ | 8453/10000 [33:11:07<6:00:10, 13.97s/it] 85%|████████▍ | 8454/10000 [33:11:21<5:59:11, 13.94s/it] {'loss': 0.002, 'learning_rate': 7.775000000000001e-06, 'epoch': 11.07} 85%|████████▍ | 8454/10000 [33:11:21<5:59:11, 13.94s/it] 85%|████████▍ | 8455/10000 [33:11:35<5:59:04, 13.94s/it] {'loss': 0.0046, 'learning_rate': 7.77e-06, 'epoch': 11.07} 85%|████████▍ | 8455/10000 [33:11:35<5:59:04, 13.94s/it] 85%|████████▍ | 8456/10000 [33:11:49<5:58:34, 13.93s/it] {'loss': 0.0023, 'learning_rate': 7.765e-06, 'epoch': 11.07} 85%|████████▍ | 8456/10000 [33:11:49<5:58:34, 13.93s/it] 85%|████████▍ | 8457/10000 [33:12:02<5:58:11, 13.93s/it] {'loss': 0.0028, 'learning_rate': 7.76e-06, 'epoch': 11.07} 85%|████████▍ | 8457/10000 [33:12:03<5:58:11, 13.93s/it] 85%|████████▍ | 8458/10000 [33:12:16<5:56:58, 13.89s/it] {'loss': 0.0027, 'learning_rate': 7.755e-06, 'epoch': 11.07} 85%|████████▍ | 8458/10000 [33:12:16<5:56:58, 13.89s/it] 85%|████████▍ | 8459/10000 [33:12:30<5:57:13, 13.91s/it] {'loss': 0.0022, 'learning_rate': 7.75e-06, 'epoch': 11.07} 85%|████████▍ | 8459/10000 [33:12:30<5:57:13, 13.91s/it] 85%|████████▍ | 8460/10000 [33:12:44<5:56:23, 13.89s/it] {'loss': 0.0041, 'learning_rate': 7.745000000000001e-06, 'epoch': 11.07} 85%|████████▍ | 8460/10000 [33:12:44<5:56:23, 13.89s/it] 85%|████████▍ | 8461/10000 [33:12:58<5:56:06, 13.88s/it] {'loss': 0.0021, 'learning_rate': 7.74e-06, 'epoch': 11.07} 85%|████████▍ | 8461/10000 [33:12:58<5:56:06, 13.88s/it] 85%|████████▍ | 
8462/10000 [33:13:12<5:56:58, 13.93s/it] {'loss': 0.0021, 'learning_rate': 7.735000000000001e-06, 'epoch': 11.08} 85%|████████▍ | 8462/10000 [33:13:12<5:56:58, 13.93s/it] 85%|████████▍ | 8463/10000 [33:13:26<5:56:00, 13.90s/it] {'loss': 0.0044, 'learning_rate': 7.73e-06, 'epoch': 11.08} 85%|████████▍ | 8463/10000 [33:13:26<5:56:00, 13.90s/it] 85%|████████▍ | 8464/10000 [33:13:40<5:55:29, 13.89s/it] {'loss': 0.0022, 'learning_rate': 7.725e-06, 'epoch': 11.08} 85%|████████▍ | 8464/10000 [33:13:40<5:55:29, 13.89s/it] 85%|████████▍ | 8465/10000 [33:13:53<5:54:50, 13.87s/it] {'loss': 0.0036, 'learning_rate': 7.72e-06, 'epoch': 11.08} 85%|████████▍ | 8465/10000 [33:13:54<5:54:50, 13.87s/it] 85%|████████▍ | 8466/10000 [33:14:07<5:55:36, 13.91s/it] {'loss': 0.0029, 'learning_rate': 7.715e-06, 'epoch': 11.08} 85%|████████▍ | 8466/10000 [33:14:08<5:55:36, 13.91s/it] 85%|████████▍ | 8467/10000 [33:14:21<5:55:25, 13.91s/it] {'loss': 0.0034, 'learning_rate': 7.71e-06, 'epoch': 11.08} 85%|████████▍ | 8467/10000 [33:14:21<5:55:25, 13.91s/it] 85%|████████▍ | 8468/10000 [33:14:35<5:54:27, 13.88s/it] {'loss': 0.0028, 'learning_rate': 7.705e-06, 'epoch': 11.08} 85%|████████▍ | 8468/10000 [33:14:35<5:54:27, 13.88s/it] 85%|████████▍ | 8469/10000 [33:14:49<5:55:09, 13.92s/it] {'loss': 0.0026, 'learning_rate': 7.7e-06, 'epoch': 11.09} 85%|████████▍ | 8469/10000 [33:14:49<5:55:09, 13.92s/it] 85%|████████▍ | 8470/10000 [33:15:03<5:53:46, 13.87s/it] {'loss': 0.0019, 'learning_rate': 7.695e-06, 'epoch': 11.09} 85%|████████▍ | 8470/10000 [33:15:03<5:53:46, 13.87s/it] 85%|████████▍ | 8471/10000 [33:15:17<5:54:41, 13.92s/it] {'loss': 0.0028, 'learning_rate': 7.69e-06, 'epoch': 11.09} 85%|████████▍ | 8471/10000 [33:15:17<5:54:41, 13.92s/it] 85%|████████▍ | 8472/10000 [33:15:31<5:55:03, 13.94s/it] {'loss': 0.0024, 'learning_rate': 7.685e-06, 'epoch': 11.09} 85%|████████▍ | 8472/10000 [33:15:31<5:55:03, 13.94s/it] 85%|████████▍ | 8473/10000 [33:15:45<5:54:33, 13.93s/it] {'loss': 0.0024, 
'learning_rate': 7.68e-06, 'epoch': 11.09} 85%|████████▍ | 8473/10000 [33:15:45<5:54:33, 13.93s/it] 85%|████████▍ | 8474/10000 [33:15:59<5:53:18, 13.89s/it] {'loss': 0.0031, 'learning_rate': 7.675e-06, 'epoch': 11.09} 85%|████████▍ | 8474/10000 [33:15:59<5:53:18, 13.89s/it] 85%|████████▍ | 8475/10000 [33:16:13<5:52:28, 13.87s/it] {'loss': 0.0019, 'learning_rate': 7.670000000000001e-06, 'epoch': 11.09} 85%|████████▍ | 8475/10000 [33:16:13<5:52:28, 13.87s/it] 85%|████████▍ | 8476/10000 [33:16:26<5:52:26, 13.88s/it] {'loss': 0.0029, 'learning_rate': 7.665e-06, 'epoch': 11.09} 85%|████████▍ | 8476/10000 [33:16:26<5:52:26, 13.88s/it] 85%|████████▍ | 8477/10000 [33:16:40<5:51:35, 13.85s/it] {'loss': 0.0027, 'learning_rate': 7.660000000000001e-06, 'epoch': 11.1} 85%|████████▍ | 8477/10000 [33:16:40<5:51:35, 13.85s/it] 85%|████████▍ | 8478/10000 [33:16:54<5:50:35, 13.82s/it] {'loss': 0.003, 'learning_rate': 7.655e-06, 'epoch': 11.1} 85%|████████▍ | 8478/10000 [33:16:54<5:50:35, 13.82s/it] 85%|████████▍ | 8479/10000 [33:17:08<5:51:45, 13.88s/it] {'loss': 0.0036, 'learning_rate': 7.65e-06, 'epoch': 11.1} 85%|████████▍ | 8479/10000 [33:17:08<5:51:45, 13.88s/it] 85%|████████▍ | 8480/10000 [33:17:22<5:51:56, 13.89s/it] {'loss': 0.0032, 'learning_rate': 7.645e-06, 'epoch': 11.1} 85%|████████▍ | 8480/10000 [33:17:22<5:51:56, 13.89s/it] 85%|████████▍ | 8481/10000 [33:17:36<5:51:44, 13.89s/it] {'loss': 0.0018, 'learning_rate': 7.64e-06, 'epoch': 11.1} 85%|████████▍ | 8481/10000 [33:17:36<5:51:44, 13.89s/it] 85%|████████▍ | 8482/10000 [33:17:50<5:50:50, 13.87s/it] {'loss': 0.0031, 'learning_rate': 7.635e-06, 'epoch': 11.1} 85%|████████▍ | 8482/10000 [33:17:50<5:50:50, 13.87s/it] 85%|████████▍ | 8483/10000 [33:18:04<5:50:59, 13.88s/it] {'loss': 0.0038, 'learning_rate': 7.630000000000001e-06, 'epoch': 11.1} 85%|████████▍ | 8483/10000 [33:18:04<5:50:59, 13.88s/it] 85%|████████▍ | 8484/10000 [33:18:18<5:51:48, 13.92s/it] {'loss': 0.0024, 'learning_rate': 7.625e-06, 'epoch': 11.1} 
85%|████████▍ | 8484/10000 [33:18:18<5:51:48, 13.92s/it] 85%|████████▍ | 8485/10000 [33:18:31<5:51:15, 13.91s/it] {'loss': 0.0023, 'learning_rate': 7.620000000000001e-06, 'epoch': 11.11} 85%|████████▍ | 8485/10000 [33:18:31<5:51:15, 13.91s/it] 85%|████████▍ | 8486/10000 [33:18:45<5:50:12, 13.88s/it] {'loss': 0.0025, 'learning_rate': 7.615e-06, 'epoch': 11.11} 85%|████████▍ | 8486/10000 [33:18:45<5:50:12, 13.88s/it] 85%|████████▍ | 8487/10000 [33:18:59<5:50:17, 13.89s/it] {'loss': 0.0038, 'learning_rate': 7.610000000000001e-06, 'epoch': 11.11} 85%|████████▍ | 8487/10000 [33:18:59<5:50:17, 13.89s/it] 85%|████████▍ | 8488/10000 [33:19:13<5:50:00, 13.89s/it] {'loss': 0.003, 'learning_rate': 7.605000000000001e-06, 'epoch': 11.11} 85%|████████▍ | 8488/10000 [33:19:13<5:50:00, 13.89s/it] 85%|████████▍ | 8489/10000 [33:19:27<5:48:58, 13.86s/it] {'loss': 0.0038, 'learning_rate': 7.6e-06, 'epoch': 11.11} 85%|████████▍ | 8489/10000 [33:19:27<5:48:58, 13.86s/it] 85%|████████▍ | 8490/10000 [33:19:41<5:50:11, 13.92s/it] {'loss': 0.0029, 'learning_rate': 7.595000000000001e-06, 'epoch': 11.11} 85%|████████▍ | 8490/10000 [33:19:41<5:50:11, 13.92s/it] 85%|████████▍ | 8491/10000 [33:19:55<5:48:50, 13.87s/it] {'loss': 0.0043, 'learning_rate': 7.59e-06, 'epoch': 11.11} 85%|████████▍ | 8491/10000 [33:19:55<5:48:50, 13.87s/it] 85%|████████▍ | 8492/10000 [33:20:08<5:48:37, 13.87s/it] {'loss': 0.0028, 'learning_rate': 7.585e-06, 'epoch': 11.12} 85%|████████▍ | 8492/10000 [33:20:09<5:48:37, 13.87s/it] 85%|████████▍ | 8493/10000 [33:20:22<5:48:27, 13.87s/it] {'loss': 0.0034, 'learning_rate': 7.580000000000001e-06, 'epoch': 11.12} 85%|████████▍ | 8493/10000 [33:20:22<5:48:27, 13.87s/it] 85%|████████▍ | 8494/10000 [33:20:36<5:49:35, 13.93s/it] {'loss': 0.0036, 'learning_rate': 7.575e-06, 'epoch': 11.12} 85%|████████▍ | 8494/10000 [33:20:36<5:49:35, 13.93s/it] 85%|████████▍ | 8495/10000 [33:20:50<5:49:44, 13.94s/it] {'loss': 0.0024, 'learning_rate': 7.57e-06, 'epoch': 11.12} 85%|████████▍ | 
8495/10000 [33:20:50<5:49:44, 13.94s/it] 85%|████████▍ | 8496/10000 [33:21:04<5:49:26, 13.94s/it] {'loss': 0.0031, 'learning_rate': 7.5649999999999996e-06, 'epoch': 11.12} 85%|████████▍ | 8496/10000 [33:21:04<5:49:26, 13.94s/it] 85%|████████▍ | 8497/10000 [33:21:18<5:48:55, 13.93s/it] {'loss': 0.0037, 'learning_rate': 7.5600000000000005e-06, 'epoch': 11.12} 85%|████████▍ | 8497/10000 [33:21:18<5:48:55, 13.93s/it] 85%|████████▍ | 8498/10000 [33:21:32<5:48:57, 13.94s/it] {'loss': 0.0033, 'learning_rate': 7.555000000000001e-06, 'epoch': 11.12} 85%|████████▍ | 8498/10000 [33:21:32<5:48:57, 13.94s/it] 85%|████████▍ | 8499/10000 [33:21:46<5:48:20, 13.92s/it] {'loss': 0.0023, 'learning_rate': 7.55e-06, 'epoch': 11.12} 85%|████████▍ | 8499/10000 [33:21:46<5:48:20, 13.92s/it] 85%|████████▌ | 8500/10000 [33:22:00<5:46:57, 13.88s/it] {'loss': 0.0017, 'learning_rate': 7.545000000000001e-06, 'epoch': 11.13} 85%|████████▌ | 8500/10000 [33:22:00<5:46:57, 13.88s/it] 85%|████████▌ | 8501/10000 [33:22:14<5:46:35, 13.87s/it] {'loss': 0.004, 'learning_rate': 7.54e-06, 'epoch': 11.13} 85%|████████▌ | 8501/10000 [33:22:14<5:46:35, 13.87s/it] 85%|████████▌ | 8502/10000 [33:22:28<5:46:41, 13.89s/it] {'loss': 0.0036, 'learning_rate': 7.535000000000001e-06, 'epoch': 11.13} 85%|████████▌ | 8502/10000 [33:22:28<5:46:41, 13.89s/it] 85%|████████▌ | 8503/10000 [33:22:42<5:46:37, 13.89s/it] {'loss': 0.0022, 'learning_rate': 7.530000000000001e-06, 'epoch': 11.13} 85%|████████▌ | 8503/10000 [33:22:42<5:46:37, 13.89s/it] 85%|████████▌ | 8504/10000 [33:22:55<5:45:52, 13.87s/it] {'loss': 0.0033, 'learning_rate': 7.525e-06, 'epoch': 11.13} 85%|████████▌ | 8504/10000 [33:22:55<5:45:52, 13.87s/it] 85%|████████▌ | 8505/10000 [33:23:09<5:45:16, 13.86s/it] {'loss': 0.0028, 'learning_rate': 7.520000000000001e-06, 'epoch': 11.13} 85%|████████▌ | 8505/10000 [33:23:09<5:45:16, 13.86s/it] 85%|████████▌ | 8506/10000 [33:23:23<5:46:33, 13.92s/it] {'loss': 0.002, 'learning_rate': 7.515e-06, 'epoch': 11.13} 
85%|████████▌ | 8506/10000 [33:23:23<5:46:33, 13.92s/it] 85%|████████▌ | 8507/10000 [33:23:37<5:46:01, 13.91s/it] {'loss': 0.0021, 'learning_rate': 7.51e-06, 'epoch': 11.13} 85%|████████▌ | 8507/10000 [33:23:37<5:46:01, 13.91s/it] 85%|████████▌ | 8508/10000 [33:23:51<5:45:51, 13.91s/it] {'loss': 0.0029, 'learning_rate': 7.505000000000001e-06, 'epoch': 11.14} 85%|████████▌ | 8508/10000 [33:23:51<5:45:51, 13.91s/it] 85%|████████▌ | 8509/10000 [33:24:05<5:45:37, 13.91s/it] {'loss': 0.0026, 'learning_rate': 7.5e-06, 'epoch': 11.14} 85%|████████▌ | 8509/10000 [33:24:05<5:45:37, 13.91s/it] 85%|████████▌ | 8510/10000 [33:24:19<5:45:46, 13.92s/it] {'loss': 0.0024, 'learning_rate': 7.495e-06, 'epoch': 11.14} 85%|████████▌ | 8510/10000 [33:24:19<5:45:46, 13.92s/it] 85%|████████▌ | 8511/10000 [33:24:33<5:45:32, 13.92s/it] {'loss': 0.0043, 'learning_rate': 7.4899999999999994e-06, 'epoch': 11.14} 85%|████████▌ | 8511/10000 [33:24:33<5:45:32, 13.92s/it] 85%|████████▌ | 8512/10000 [33:24:47<5:45:26, 13.93s/it] {'loss': 0.0021, 'learning_rate': 7.485e-06, 'epoch': 11.14} 85%|████████▌ | 8512/10000 [33:24:47<5:45:26, 13.93s/it] 85%|████████▌ | 8513/10000 [33:25:01<5:45:11, 13.93s/it] {'loss': 0.0029, 'learning_rate': 7.480000000000001e-06, 'epoch': 11.14} 85%|████████▌ | 8513/10000 [33:25:01<5:45:11, 13.93s/it] 85%|████████▌ | 8514/10000 [33:25:15<5:45:40, 13.96s/it] {'loss': 0.0035, 'learning_rate': 7.4750000000000004e-06, 'epoch': 11.14} 85%|████████▌ | 8514/10000 [33:25:15<5:45:40, 13.96s/it] 85%|████████▌ | 8515/10000 [33:25:29<5:45:20, 13.95s/it] {'loss': 0.0035, 'learning_rate': 7.4700000000000005e-06, 'epoch': 11.15} 85%|████████▌ | 8515/10000 [33:25:29<5:45:20, 13.95s/it] 85%|████████▌ | 8516/10000 [33:25:43<5:44:24, 13.93s/it] {'loss': 0.0018, 'learning_rate': 7.465e-06, 'epoch': 11.15} 85%|████████▌ | 8516/10000 [33:25:43<5:44:24, 13.93s/it] 85%|████████▌ | 8517/10000 [33:25:56<5:43:58, 13.92s/it] {'loss': 0.0037, 'learning_rate': 7.4600000000000006e-06, 'epoch': 11.15} 
85%|████████▌ | 8517/10000 [33:25:56<5:43:58, 13.92s/it] 85%|████████▌ | 8518/10000 [33:26:10<5:43:26, 13.90s/it] {'loss': 0.0041, 'learning_rate': 7.455000000000001e-06, 'epoch': 11.15} 85%|████████▌ | 8518/10000 [33:26:10<5:43:26, 13.90s/it] 85%|████████▌ | 8519/10000 [33:26:24<5:42:36, 13.88s/it] {'loss': 0.0031, 'learning_rate': 7.45e-06, 'epoch': 11.15} 85%|████████▌ | 8519/10000 [33:26:24<5:42:36, 13.88s/it] 85%|████████▌ | 8520/10000 [33:26:38<5:41:49, 13.86s/it] {'loss': 0.0018, 'learning_rate': 7.445000000000001e-06, 'epoch': 11.15} 85%|████████▌ | 8520/10000 [33:26:38<5:41:49, 13.86s/it] 85%|████████▌ | 8521/10000 [33:26:52<5:40:34, 13.82s/it] {'loss': 0.002, 'learning_rate': 7.44e-06, 'epoch': 11.15} 85%|████████▌ | 8521/10000 [33:26:52<5:40:34, 13.82s/it] 85%|████████▌ | 8522/10000 [33:27:05<5:40:01, 13.80s/it] {'loss': 0.0025, 'learning_rate': 7.435e-06, 'epoch': 11.15} 85%|████████▌ | 8522/10000 [33:27:05<5:40:01, 13.80s/it] 85%|████████▌ | 8523/10000 [33:27:19<5:39:57, 13.81s/it] {'loss': 0.0027, 'learning_rate': 7.430000000000001e-06, 'epoch': 11.16} 85%|████████▌ | 8523/10000 [33:27:19<5:39:57, 13.81s/it] 85%|████████▌ | 8524/10000 [33:27:33<5:40:56, 13.86s/it] {'loss': 0.0041, 'learning_rate': 7.425e-06, 'epoch': 11.16} 85%|████████▌ | 8524/10000 [33:27:33<5:40:56, 13.86s/it] 85%|████████▌ | 8525/10000 [33:27:47<5:39:23, 13.81s/it] {'loss': 0.0024, 'learning_rate': 7.420000000000001e-06, 'epoch': 11.16} 85%|████████▌ | 8525/10000 [33:27:47<5:39:23, 13.81s/it] 85%|████████▌ | 8526/10000 [33:28:01<5:39:37, 13.82s/it] {'loss': 0.0035, 'learning_rate': 7.414999999999999e-06, 'epoch': 11.16} 85%|████████▌ | 8526/10000 [33:28:01<5:39:37, 13.82s/it] 85%|████████▌ | 8527/10000 [33:28:15<5:39:48, 13.84s/it] {'loss': 0.0035, 'learning_rate': 7.41e-06, 'epoch': 11.16} 85%|████████▌ | 8527/10000 [33:28:15<5:39:48, 13.84s/it] 85%|████████▌ | 8528/10000 [33:28:29<5:40:01, 13.86s/it] {'loss': 0.0024, 'learning_rate': 7.405000000000001e-06, 'epoch': 11.16} 
85%|████████▌ | 8528/10000 [33:28:29<5:40:01, 13.86s/it] 85%|████████▌ | 8529/10000 [33:28:42<5:39:59, 13.87s/it] {'loss': 0.0038, 'learning_rate': 7.4e-06, 'epoch': 11.16} 85%|████████▌ | 8529/10000 [33:28:43<5:39:59, 13.87s/it] 85%|████████▌ | 8530/10000 [33:28:56<5:39:28, 13.86s/it] {'loss': 0.003, 'learning_rate': 7.395e-06, 'epoch': 11.16} 85%|████████▌ | 8530/10000 [33:28:56<5:39:28, 13.86s/it] 85%|████████▌ | 8531/10000 [33:29:10<5:40:25, 13.90s/it] {'loss': 0.0029, 'learning_rate': 7.3899999999999995e-06, 'epoch': 11.17} 85%|████████▌ | 8531/10000 [33:29:10<5:40:25, 13.90s/it] 85%|████████▌ | 8532/10000 [33:29:24<5:40:23, 13.91s/it] {'loss': 0.0022, 'learning_rate': 7.3850000000000004e-06, 'epoch': 11.17} 85%|████████▌ | 8532/10000 [33:29:24<5:40:23, 13.91s/it] 85%|████████▌ | 8533/10000 [33:29:38<5:39:10, 13.87s/it] {'loss': 0.0022, 'learning_rate': 7.3800000000000005e-06, 'epoch': 11.17} 85%|████████▌ | 8533/10000 [33:29:38<5:39:10, 13.87s/it] 85%|████████▌ | 8534/10000 [33:29:52<5:39:37, 13.90s/it] {'loss': 0.0036, 'learning_rate': 7.375e-06, 'epoch': 11.17} 85%|████████▌ | 8534/10000 [33:29:52<5:39:37, 13.90s/it] 85%|████████▌ | 8535/10000 [33:30:06<5:38:13, 13.85s/it] {'loss': 0.002, 'learning_rate': 7.370000000000001e-06, 'epoch': 11.17} 85%|████████▌ | 8535/10000 [33:30:06<5:38:13, 13.85s/it] 85%|████████▌ | 8536/10000 [33:30:20<5:37:38, 13.84s/it] {'loss': 0.0029, 'learning_rate': 7.365e-06, 'epoch': 11.17} 85%|████████▌ | 8536/10000 [33:30:20<5:37:38, 13.84s/it] 85%|████████▌ | 8537/10000 [33:30:33<5:37:33, 13.84s/it] {'loss': 0.0018, 'learning_rate': 7.36e-06, 'epoch': 11.17} 85%|████████▌ | 8537/10000 [33:30:33<5:37:33, 13.84s/it] 85%|████████▌ | 8538/10000 [33:30:47<5:36:51, 13.82s/it] {'loss': 0.0027, 'learning_rate': 7.355000000000001e-06, 'epoch': 11.18} 85%|████████▌ | 8538/10000 [33:30:47<5:36:51, 13.82s/it] 85%|████████▌ | 8539/10000 [33:31:01<5:36:58, 13.84s/it] {'loss': 0.0028, 'learning_rate': 7.35e-06, 'epoch': 11.18} 85%|████████▌ | 
8539/10000 [33:31:01<5:36:58, 13.84s/it] 85%|████████▌ | 8540/10000 [33:31:15<5:36:25, 13.83s/it] {'loss': 0.0032, 'learning_rate': 7.345000000000001e-06, 'epoch': 11.18} 85%|████████▌ | 8540/10000 [33:31:15<5:36:25, 13.83s/it] 85%|████████▌ | 8541/10000 [33:31:29<5:36:24, 13.83s/it] {'loss': 0.0019, 'learning_rate': 7.340000000000001e-06, 'epoch': 11.18} 85%|████████▌ | 8541/10000 [33:31:29<5:36:24, 13.83s/it] 85%|████████▌ | 8542/10000 [33:31:43<5:36:57, 13.87s/it] {'loss': 0.0027, 'learning_rate': 7.335e-06, 'epoch': 11.18} 85%|████████▌ | 8542/10000 [33:31:43<5:36:57, 13.87s/it] 85%|████████▌ | 8543/10000 [33:31:57<5:37:01, 13.88s/it] {'loss': 0.002, 'learning_rate': 7.330000000000001e-06, 'epoch': 11.18} 85%|████████▌ | 8543/10000 [33:31:57<5:37:01, 13.88s/it] 85%|████████▌ | 8544/10000 [33:32:10<5:36:23, 13.86s/it] {'loss': 0.0021, 'learning_rate': 7.325e-06, 'epoch': 11.18} 85%|████████▌ | 8544/10000 [33:32:10<5:36:23, 13.86s/it] 85%|████████▌ | 8545/10000 [33:32:24<5:36:27, 13.87s/it] {'loss': 0.0026, 'learning_rate': 7.32e-06, 'epoch': 11.18} 85%|████████▌ | 8545/10000 [33:32:24<5:36:27, 13.87s/it] 85%|████████▌ | 8546/10000 [33:32:38<5:35:45, 13.85s/it] {'loss': 0.0021, 'learning_rate': 7.315000000000001e-06, 'epoch': 11.19} 85%|████████▌ | 8546/10000 [33:32:38<5:35:45, 13.85s/it] 85%|████████▌ | 8547/10000 [33:32:52<5:35:26, 13.85s/it] {'loss': 0.0018, 'learning_rate': 7.31e-06, 'epoch': 11.19} 85%|████████▌ | 8547/10000 [33:32:52<5:35:26, 13.85s/it] 85%|████████▌ | 8548/10000 [33:33:06<5:36:01, 13.89s/it] {'loss': 0.003, 'learning_rate': 7.305e-06, 'epoch': 11.19} 85%|████████▌ | 8548/10000 [33:33:06<5:36:01, 13.89s/it] 85%|████████▌ | 8549/10000 [33:33:20<5:35:12, 13.86s/it] {'loss': 0.0018, 'learning_rate': 7.2999999999999996e-06, 'epoch': 11.19} 85%|████████▌ | 8549/10000 [33:33:20<5:35:12, 13.86s/it] 86%|████████▌ | 8550/10000 [33:33:33<5:34:28, 13.84s/it] {'loss': 0.0029, 'learning_rate': 7.2950000000000005e-06, 'epoch': 11.19} 86%|████████▌ | 
8550/10000 [33:33:34<5:34:28, 13.84s/it] 86%|████████▌ | 8551/10000 [33:33:47<5:34:38, 13.86s/it] {'loss': 0.0031, 'learning_rate': 7.290000000000001e-06, 'epoch': 11.19} 86%|████████▌ | 8551/10000 [33:33:47<5:34:38, 13.86s/it] 86%|████████▌ | 8552/10000 [33:34:01<5:34:36, 13.87s/it] {'loss': 0.0022, 'learning_rate': 7.2850000000000006e-06, 'epoch': 11.19} 86%|████████▌ | 8552/10000 [33:34:01<5:34:36, 13.87s/it] 86%|████████▌ | 8553/10000 [33:34:15<5:33:17, 13.82s/it] {'loss': 0.0025, 'learning_rate': 7.280000000000001e-06, 'epoch': 11.2} 86%|████████▌ | 8553/10000 [33:34:15<5:33:17, 13.82s/it] 86%|████████▌ | 8554/10000 [33:34:29<5:33:51, 13.85s/it] {'loss': 0.0024, 'learning_rate': 7.275e-06, 'epoch': 11.2} 86%|████████▌ | 8554/10000 [33:34:29<5:33:51, 13.85s/it] 86%|████████▌ | 8555/10000 [33:34:43<5:33:23, 13.84s/it] {'loss': 0.0017, 'learning_rate': 7.270000000000001e-06, 'epoch': 11.2} 86%|████████▌ | 8555/10000 [33:34:43<5:33:23, 13.84s/it] 86%|████████▌ | 8556/10000 [33:34:57<5:34:43, 13.91s/it] {'loss': 0.0027, 'learning_rate': 7.265000000000001e-06, 'epoch': 11.2} 86%|████████▌ | 8556/10000 [33:34:57<5:34:43, 13.91s/it] 86%|████████▌ | 8557/10000 [33:35:11<5:33:44, 13.88s/it] {'loss': 0.0026, 'learning_rate': 7.26e-06, 'epoch': 11.2} 86%|████████▌ | 8557/10000 [33:35:11<5:33:44, 13.88s/it] 86%|████████▌ | 8558/10000 [33:35:24<5:33:21, 13.87s/it] {'loss': 0.0027, 'learning_rate': 7.255000000000001e-06, 'epoch': 11.2} 86%|████████▌ | 8558/10000 [33:35:24<5:33:21, 13.87s/it] 86%|████████▌ | 8559/10000 [33:35:38<5:33:39, 13.89s/it] {'loss': 0.0031, 'learning_rate': 7.25e-06, 'epoch': 11.2} 86%|████████▌ | 8559/10000 [33:35:38<5:33:39, 13.89s/it] 86%|████████▌ | 8560/10000 [33:35:52<5:33:26, 13.89s/it] {'loss': 0.0025, 'learning_rate': 7.245e-06, 'epoch': 11.2} 86%|████████▌ | 8560/10000 [33:35:52<5:33:26, 13.89s/it] 86%|████████▌ | 8561/10000 [33:36:06<5:33:00, 13.88s/it] {'loss': 0.0022, 'learning_rate': 7.240000000000001e-06, 'epoch': 11.21} 86%|████████▌ | 
8561/10000 [33:36:06<5:33:00, 13.88s/it] 86%|████████▌ | 8562/10000 [33:36:20<5:32:15, 13.86s/it] {'loss': 0.0022, 'learning_rate': 7.235e-06, 'epoch': 11.21} 86%|████████▌ | 8562/10000 [33:36:20<5:32:15, 13.86s/it] 86%|████████▌ | 8563/10000 [33:36:34<5:32:49, 13.90s/it] {'loss': 0.0022, 'learning_rate': 7.230000000000001e-06, 'epoch': 11.21} 86%|████████▌ | 8563/10000 [33:36:34<5:32:49, 13.90s/it] 86%|████████▌ | 8564/10000 [33:36:48<5:33:35, 13.94s/it] {'loss': 0.0027, 'learning_rate': 7.2249999999999994e-06, 'epoch': 11.21} 86%|████████▌ | 8564/10000 [33:36:48<5:33:35, 13.94s/it] 86%|████████▌ | 8565/10000 [33:37:02<5:32:42, 13.91s/it] {'loss': 0.0018, 'learning_rate': 7.22e-06, 'epoch': 11.21} 86%|████████▌ | 8565/10000 [33:37:02<5:32:42, 13.91s/it] 86%|████████▌ | 8566/10000 [33:37:16<5:33:05, 13.94s/it] {'loss': 0.0026, 'learning_rate': 7.215000000000001e-06, 'epoch': 11.21} 86%|████████▌ | 8566/10000 [33:37:16<5:33:05, 13.94s/it] 86%|████████▌ | 8567/10000 [33:37:30<5:32:37, 13.93s/it] {'loss': 0.0033, 'learning_rate': 7.2100000000000004e-06, 'epoch': 11.21} 86%|████████▌ | 8567/10000 [33:37:30<5:32:37, 13.93s/it] 86%|████████▌ | 8568/10000 [33:37:44<5:32:32, 13.93s/it] {'loss': 0.0039, 'learning_rate': 7.2050000000000005e-06, 'epoch': 11.21} 86%|████████▌ | 8568/10000 [33:37:44<5:32:32, 13.93s/it] 86%|████████▌ | 8569/10000 [33:37:58<5:31:56, 13.92s/it] {'loss': 0.0028, 'learning_rate': 7.2e-06, 'epoch': 11.22} 86%|████████▌ | 8569/10000 [33:37:58<5:31:56, 13.92s/it] 86%|████████▌ | 8570/10000 [33:38:11<5:31:44, 13.92s/it] {'loss': 0.0034, 'learning_rate': 7.1950000000000006e-06, 'epoch': 11.22} 86%|████████▌ | 8570/10000 [33:38:12<5:31:44, 13.92s/it] 86%|████████▌ | 8571/10000 [33:38:26<5:32:39, 13.97s/it] {'loss': 0.0015, 'learning_rate': 7.190000000000001e-06, 'epoch': 11.22} 86%|████████▌ | 8571/10000 [33:38:26<5:32:39, 13.97s/it] 86%|████████▌ | 8572/10000 [33:38:39<5:31:20, 13.92s/it] {'loss': 0.0018, 'learning_rate': 7.185e-06, 'epoch': 11.22} 
86%|████████▌ | 8572/10000 [33:38:39<5:31:20, 13.92s/it] 86%|████████▌ | 8573/10000 [33:38:53<5:30:57, 13.92s/it] {'loss': 0.0038, 'learning_rate': 7.180000000000001e-06, 'epoch': 11.22} 86%|████████▌ | 8573/10000 [33:38:53<5:30:57, 13.92s/it] 86%|████████▌ | 8574/10000 [33:39:07<5:30:12, 13.89s/it] {'loss': 0.0027, 'learning_rate': 7.175e-06, 'epoch': 11.22} 86%|████████▌ | 8574/10000 [33:39:07<5:30:12, 13.89s/it] 86%|████████▌ | 8575/10000 [33:39:21<5:30:40, 13.92s/it] {'loss': 0.0017, 'learning_rate': 7.17e-06, 'epoch': 11.22} 86%|████████▌ | 8575/10000 [33:39:21<5:30:40, 13.92s/it] 86%|████████▌ | 8576/10000 [33:39:35<5:30:18, 13.92s/it] {'loss': 0.005, 'learning_rate': 7.165000000000001e-06, 'epoch': 11.23} 86%|████████▌ | 8576/10000 [33:39:35<5:30:18, 13.92s/it] 86%|████████▌ | 8577/10000 [33:39:49<5:29:52, 13.91s/it] {'loss': 0.0026, 'learning_rate': 7.16e-06, 'epoch': 11.23} 86%|████████▌ | 8577/10000 [33:39:49<5:29:52, 13.91s/it] 86%|████████▌ | 8578/10000 [33:40:03<5:29:51, 13.92s/it] {'loss': 0.0039, 'learning_rate': 7.155000000000001e-06, 'epoch': 11.23} 86%|████████▌ | 8578/10000 [33:40:03<5:29:51, 13.92s/it] 86%|████████▌ | 8579/10000 [33:40:17<5:29:55, 13.93s/it] {'loss': 0.0032, 'learning_rate': 7.15e-06, 'epoch': 11.23} 86%|████████▌ | 8579/10000 [33:40:17<5:29:55, 13.93s/it] 86%|████████▌ | 8580/10000 [33:40:31<5:29:28, 13.92s/it] {'loss': 0.0024, 'learning_rate': 7.145e-06, 'epoch': 11.23} 86%|████████▌ | 8580/10000 [33:40:31<5:29:28, 13.92s/it] 86%|████████▌ | 8581/10000 [33:40:45<5:29:53, 13.95s/it] {'loss': 0.0041, 'learning_rate': 7.140000000000001e-06, 'epoch': 11.23} 86%|████████▌ | 8581/10000 [33:40:45<5:29:53, 13.95s/it] 86%|████████▌ | 8582/10000 [33:40:59<5:29:37, 13.95s/it] {'loss': 0.0032, 'learning_rate': 7.135e-06, 'epoch': 11.23} 86%|████████▌ | 8582/10000 [33:40:59<5:29:37, 13.95s/it] 86%|████████▌ | 8583/10000 [33:41:12<5:28:25, 13.91s/it] {'loss': 0.0015, 'learning_rate': 7.13e-06, 'epoch': 11.23} 86%|████████▌ | 8583/10000 
[33:41:13<5:28:25, 13.91s/it] 86%|████████▌ | 8584/10000 [33:41:26<5:27:43, 13.89s/it] {'loss': 0.003, 'learning_rate': 7.1249999999999995e-06, 'epoch': 11.24} 86%|████████▌ | 8584/10000 [33:41:26<5:27:43, 13.89s/it] 86%|████████▌ | 8585/10000 [33:41:40<5:27:21, 13.88s/it] {'loss': 0.0046, 'learning_rate': 7.1200000000000004e-06, 'epoch': 11.24} 86%|████████▌ | 8585/10000 [33:41:40<5:27:21, 13.88s/it] 86%|████████▌ | 8586/10000 [33:41:54<5:26:52, 13.87s/it] {'loss': 0.0034, 'learning_rate': 7.1150000000000005e-06, 'epoch': 11.24} 86%|████████▌ | 8586/10000 [33:41:54<5:26:52, 13.87s/it] 86%|████████▌ | 8587/10000 [33:42:08<5:27:02, 13.89s/it] {'loss': 0.0023, 'learning_rate': 7.11e-06, 'epoch': 11.24} 86%|████████▌ | 8587/10000 [33:42:08<5:27:02, 13.89s/it] 86%|████████▌ | 8588/10000 [33:42:22<5:26:59, 13.89s/it] {'loss': 0.0022, 'learning_rate': 7.105000000000001e-06, 'epoch': 11.24} 86%|████████▌ | 8588/10000 [33:42:22<5:26:59, 13.89s/it] 86%|████████▌ | 8589/10000 [33:42:36<5:26:52, 13.90s/it] {'loss': 0.003, 'learning_rate': 7.1e-06, 'epoch': 11.24} 86%|████████▌ | 8589/10000 [33:42:36<5:26:52, 13.90s/it] 86%|████████▌ | 8590/10000 [33:42:50<5:26:40, 13.90s/it] {'loss': 0.0021, 'learning_rate': 7.095000000000001e-06, 'epoch': 11.24} 86%|████████▌ | 8590/10000 [33:42:50<5:26:40, 13.90s/it] 86%|████████▌ | 8591/10000 [33:43:04<5:26:31, 13.90s/it] {'loss': 0.0025, 'learning_rate': 7.090000000000001e-06, 'epoch': 11.24} 86%|████████▌ | 8591/10000 [33:43:04<5:26:31, 13.90s/it] 86%|████████▌ | 8592/10000 [33:43:17<5:25:55, 13.89s/it] {'loss': 0.0033, 'learning_rate': 7.085e-06, 'epoch': 11.25} 86%|████████▌ | 8592/10000 [33:43:17<5:25:55, 13.89s/it] 86%|████████▌ | 8593/10000 [33:43:31<5:25:35, 13.88s/it] {'loss': 0.0026, 'learning_rate': 7.080000000000001e-06, 'epoch': 11.25} 86%|████████▌ | 8593/10000 [33:43:31<5:25:35, 13.88s/it] 86%|████████▌ | 8594/10000 [33:43:45<5:25:08, 13.88s/it] {'loss': 0.0034, 'learning_rate': 7.075e-06, 'epoch': 11.25} 86%|████████▌ | 
8594/10000 [33:43:45<5:25:08, 13.88s/it] 86%|████████▌ | 8595/10000 [33:43:59<5:24:37, 13.86s/it] {'loss': 0.0029, 'learning_rate': 7.07e-06, 'epoch': 11.25} 86%|████████▌ | 8595/10000 [33:43:59<5:24:37, 13.86s/it] 86%|████████▌ | 8596/10000 [33:44:13<5:25:16, 13.90s/it] {'loss': 0.0025, 'learning_rate': 7.065000000000001e-06, 'epoch': 11.25} 86%|████████▌ | 8596/10000 [33:44:13<5:25:16, 13.90s/it] 86%|████████▌ | 8597/10000 [33:44:27<5:24:30, 13.88s/it] {'loss': 0.0034, 'learning_rate': 7.06e-06, 'epoch': 11.25} 86%|████████▌ | 8597/10000 [33:44:27<5:24:30, 13.88s/it] 86%|████████▌ | 8598/10000 [33:44:41<5:23:57, 13.86s/it] {'loss': 0.0032, 'learning_rate': 7.055e-06, 'epoch': 11.25} 86%|████████▌ | 8598/10000 [33:44:41<5:23:57, 13.86s/it] 86%|████████▌ | 8599/10000 [33:44:55<5:24:43, 13.91s/it] {'loss': 0.002, 'learning_rate': 7.049999999999999e-06, 'epoch': 11.26} 86%|████████▌ | 8599/10000 [33:44:55<5:24:43, 13.91s/it] 86%|████████▌ | 8600/10000 [33:45:08<5:24:04, 13.89s/it] {'loss': 0.0034, 'learning_rate': 7.045e-06, 'epoch': 11.26} 86%|████████▌ | 8600/10000 [33:45:09<5:24:04, 13.89s/it] 86%|████████▌ | 8601/10000 [33:45:23<5:25:15, 13.95s/it] {'loss': 0.0026, 'learning_rate': 7.04e-06, 'epoch': 11.26} 86%|████████▌ | 8601/10000 [33:45:23<5:25:15, 13.95s/it] 86%|████████▌ | 8602/10000 [33:45:36<5:23:26, 13.88s/it] {'loss': 0.0039, 'learning_rate': 7.0349999999999996e-06, 'epoch': 11.26} 86%|████████▌ | 8602/10000 [33:45:36<5:23:26, 13.88s/it] 86%|████████▌ | 8603/10000 [33:45:50<5:22:50, 13.87s/it] {'loss': 0.0032, 'learning_rate': 7.0300000000000005e-06, 'epoch': 11.26} 86%|████████▌ | 8603/10000 [33:45:50<5:22:50, 13.87s/it] 86%|████████▌ | 8604/10000 [33:46:04<5:22:11, 13.85s/it] {'loss': 0.0037, 'learning_rate': 7.025000000000001e-06, 'epoch': 11.26} 86%|████████▌ | 8604/10000 [33:46:04<5:22:11, 13.85s/it] 86%|████████▌ | 8605/10000 [33:46:18<5:21:58, 13.85s/it] {'loss': 0.0021, 'learning_rate': 7.0200000000000006e-06, 'epoch': 11.26} 86%|████████▌ | 
8605/10000 [33:46:18<5:21:58, 13.85s/it] 86%|████████▌ | 8606/10000 [33:46:32<5:22:09, 13.87s/it] {'loss': 0.0022, 'learning_rate': 7.015000000000001e-06, 'epoch': 11.26} 86%|████████▌ | 8606/10000 [33:46:32<5:22:09, 13.87s/it] 86%|████████▌ | 8607/10000 [33:46:45<5:21:22, 13.84s/it] {'loss': 0.0018, 'learning_rate': 7.01e-06, 'epoch': 11.27} 86%|████████▌ | 8607/10000 [33:46:46<5:21:22, 13.84s/it] 86%|████████▌ | 8608/10000 [33:46:59<5:21:36, 13.86s/it] {'loss': 0.0024, 'learning_rate': 7.005000000000001e-06, 'epoch': 11.27} 86%|████████▌ | 8608/10000 [33:46:59<5:21:36, 13.86s/it] 86%|████████▌ | 8609/10000 [33:47:13<5:23:03, 13.93s/it] {'loss': 0.0027, 'learning_rate': 7.000000000000001e-06, 'epoch': 11.27} 86%|████████▌ | 8609/10000 [33:47:14<5:23:03, 13.93s/it] 86%|████████▌ | 8610/10000 [33:47:27<5:22:31, 13.92s/it] {'loss': 0.0032, 'learning_rate': 6.995e-06, 'epoch': 11.27} 86%|████████▌ | 8610/10000 [33:47:27<5:22:31, 13.92s/it] 86%|████████▌ | 8611/10000 [33:47:41<5:22:12, 13.92s/it] {'loss': 0.0033, 'learning_rate': 6.990000000000001e-06, 'epoch': 11.27} 86%|████████▌ | 8611/10000 [33:47:41<5:22:12, 13.92s/it] 86%|████████▌ | 8612/10000 [33:47:55<5:21:42, 13.91s/it] {'loss': 0.0043, 'learning_rate': 6.985e-06, 'epoch': 11.27} 86%|████████▌ | 8612/10000 [33:47:55<5:21:42, 13.91s/it] 86%|████████▌ | 8613/10000 [33:48:09<5:21:00, 13.89s/it] {'loss': 0.0043, 'learning_rate': 6.98e-06, 'epoch': 11.27} 86%|████████▌ | 8613/10000 [33:48:09<5:21:00, 13.89s/it] 86%|████████▌ | 8614/10000 [33:48:23<5:20:44, 13.88s/it] {'loss': 0.0019, 'learning_rate': 6.975000000000001e-06, 'epoch': 11.27} 86%|████████▌ | 8614/10000 [33:48:23<5:20:44, 13.88s/it] 86%|████████▌ | 8615/10000 [33:48:37<5:19:56, 13.86s/it] {'loss': 0.0016, 'learning_rate': 6.97e-06, 'epoch': 11.28} 86%|████████▌ | 8615/10000 [33:48:37<5:19:56, 13.86s/it] 86%|████████▌ | 8616/10000 [33:48:51<5:19:34, 13.85s/it] {'loss': 0.0036, 'learning_rate': 6.965000000000001e-06, 'epoch': 11.28} 86%|████████▌ | 
8616/10000 [33:48:51<5:19:34, 13.85s/it] 86%|████████▌ | 8617/10000 [33:49:04<5:18:49, 13.83s/it] {'loss': 0.0034, 'learning_rate': 6.9599999999999994e-06, 'epoch': 11.28} 86%|████████▌ | 8617/10000 [33:49:04<5:18:49, 13.83s/it] 86%|████████▌ | 8618/10000 [33:49:18<5:19:12, 13.86s/it] {'loss': 0.002, 'learning_rate': 6.955e-06, 'epoch': 11.28} 86%|████████▌ | 8618/10000 [33:49:18<5:19:12, 13.86s/it] 86%|████████▌ | 8619/10000 [33:49:32<5:18:31, 13.84s/it] {'loss': 0.0022, 'learning_rate': 6.950000000000001e-06, 'epoch': 11.28} 86%|████████▌ | 8619/10000 [33:49:32<5:18:31, 13.84s/it] 86%|████████▌ | 8620/10000 [33:49:46<5:18:07, 13.83s/it] {'loss': 0.003, 'learning_rate': 6.945e-06, 'epoch': 11.28} 86%|████████▌ | 8620/10000 [33:49:46<5:18:07, 13.83s/it] 86%|████████▌ | 8621/10000 [33:50:00<5:18:15, 13.85s/it] {'loss': 0.0018, 'learning_rate': 6.9400000000000005e-06, 'epoch': 11.28} 86%|████████▌ | 8621/10000 [33:50:00<5:18:15, 13.85s/it] 86%|████████▌ | 8622/10000 [33:50:14<5:18:12, 13.86s/it] {'loss': 0.0024, 'learning_rate': 6.935e-06, 'epoch': 11.29} 86%|████████▌ | 8622/10000 [33:50:14<5:18:12, 13.86s/it] 86%|████████▌ | 8623/10000 [33:50:28<5:18:22, 13.87s/it] {'loss': 0.0045, 'learning_rate': 6.9300000000000006e-06, 'epoch': 11.29} 86%|████████▌ | 8623/10000 [33:50:28<5:18:22, 13.87s/it] 86%|████████▌ | 8624/10000 [33:50:41<5:17:43, 13.85s/it] {'loss': 0.003, 'learning_rate': 6.925000000000001e-06, 'epoch': 11.29} 86%|████████▌ | 8624/10000 [33:50:41<5:17:43, 13.85s/it] 86%|████████▋ | 8625/10000 [33:50:55<5:17:27, 13.85s/it] {'loss': 0.0048, 'learning_rate': 6.92e-06, 'epoch': 11.29} 86%|████████▋ | 8625/10000 [33:50:55<5:17:27, 13.85s/it] 86%|████████▋ | 8626/10000 [33:51:09<5:16:32, 13.82s/it] {'loss': 0.002, 'learning_rate': 6.915000000000001e-06, 'epoch': 11.29} 86%|████████▋ | 8626/10000 [33:51:09<5:16:32, 13.82s/it] 86%|████████▋ | 8627/10000 [33:51:23<5:16:15, 13.82s/it] {'loss': 0.0019, 'learning_rate': 6.91e-06, 'epoch': 11.29} 86%|████████▋ | 
8627/10000 [33:51:23<5:16:15, 13.82s/it] 86%|████████▋ | 8628/10000 [33:51:37<5:16:31, 13.84s/it] {'loss': 0.0027, 'learning_rate': 6.905e-06, 'epoch': 11.29} 86%|████████▋ | 8628/10000 [33:51:37<5:16:31, 13.84s/it] 86%|████████▋ | 8629/10000 [33:51:51<5:17:38, 13.90s/it] {'loss': 0.0024, 'learning_rate': 6.900000000000001e-06, 'epoch': 11.29} 86%|████████▋ | 8629/10000 [33:51:51<5:17:38, 13.90s/it] 86%|████████▋ | 8630/10000 [33:52:05<5:17:34, 13.91s/it] {'loss': 0.0028, 'learning_rate': 6.895e-06, 'epoch': 11.3} 86%|████████▋ | 8630/10000 [33:52:05<5:17:34, 13.91s/it] 86%|████████▋ | 8631/10000 [33:52:19<5:18:01, 13.94s/it] {'loss': 0.0024, 'learning_rate': 6.890000000000001e-06, 'epoch': 11.3} 86%|████████▋ | 8631/10000 [33:52:19<5:18:01, 13.94s/it] 86%|████████▋ | 8632/10000 [33:52:33<5:17:29, 13.93s/it] {'loss': 0.0032, 'learning_rate': 6.885e-06, 'epoch': 11.3} 86%|████████▋ | 8632/10000 [33:52:33<5:17:29, 13.93s/it] 86%|████████▋ | 8633/10000 [33:52:46<5:16:38, 13.90s/it] {'loss': 0.0025, 'learning_rate': 6.88e-06, 'epoch': 11.3} 86%|████████▋ | 8633/10000 [33:52:46<5:16:38, 13.90s/it] 86%|████████▋ | 8634/10000 [33:53:00<5:16:24, 13.90s/it] {'loss': 0.0037, 'learning_rate': 6.875000000000001e-06, 'epoch': 11.3} 86%|████████▋ | 8634/10000 [33:53:00<5:16:24, 13.90s/it] 86%|████████▋ | 8635/10000 [33:53:14<5:15:50, 13.88s/it] {'loss': 0.0039, 'learning_rate': 6.87e-06, 'epoch': 11.3} 86%|████████▋ | 8635/10000 [33:53:14<5:15:50, 13.88s/it] 86%|████████▋ | 8636/10000 [33:53:28<5:16:14, 13.91s/it] {'loss': 0.0023, 'learning_rate': 6.865e-06, 'epoch': 11.3} 86%|████████▋ | 8636/10000 [33:53:28<5:16:14, 13.91s/it] 86%|████████▋ | 8637/10000 [33:53:42<5:15:44, 13.90s/it] {'loss': 0.0039, 'learning_rate': 6.8599999999999995e-06, 'epoch': 11.3} 86%|████████▋ | 8637/10000 [33:53:42<5:15:44, 13.90s/it] 86%|████████▋ | 8638/10000 [33:53:56<5:14:59, 13.88s/it] {'loss': 0.0026, 'learning_rate': 6.8550000000000004e-06, 'epoch': 11.31} 86%|████████▋ | 8638/10000 
[33:53:56<5:14:59, 13.88s/it] 86%|████████▋ | 8639/10000 [33:54:10<5:16:03, 13.93s/it] {'loss': 0.002, 'learning_rate': 6.8500000000000005e-06, 'epoch': 11.31} 86%|████████▋ | 8639/10000 [33:54:10<5:16:03, 13.93s/it] 86%|████████▋ | 8640/10000 [33:54:24<5:15:31, 13.92s/it] {'loss': 0.0024, 'learning_rate': 6.845e-06, 'epoch': 11.31} 86%|████████▋ | 8640/10000 [33:54:24<5:15:31, 13.92s/it] 86%|████████▋ | 8641/10000 [33:54:38<5:15:25, 13.93s/it] {'loss': 0.0027, 'learning_rate': 6.840000000000001e-06, 'epoch': 11.31} 86%|████████▋ | 8641/10000 [33:54:38<5:15:25, 13.93s/it] 86%|████████▋ | 8642/10000 [33:54:52<5:15:07, 13.92s/it] {'loss': 0.0038, 'learning_rate': 6.835e-06, 'epoch': 11.31} 86%|████████▋ | 8642/10000 [33:54:52<5:15:07, 13.92s/it] 86%|████████▋ | 8643/10000 [33:55:05<5:13:40, 13.87s/it] {'loss': 0.0043, 'learning_rate': 6.830000000000001e-06, 'epoch': 11.31} 86%|████████▋ | 8643/10000 [33:55:05<5:13:40, 13.87s/it] 86%|████████▋ | 8644/10000 [33:55:19<5:14:14, 13.90s/it] {'loss': 0.0027, 'learning_rate': 6.825000000000001e-06, 'epoch': 11.31} 86%|████████▋ | 8644/10000 [33:55:19<5:14:14, 13.90s/it] 86%|████████▋ | 8645/10000 [33:55:33<5:14:27, 13.92s/it] {'loss': 0.0014, 'learning_rate': 6.82e-06, 'epoch': 11.32} 86%|████████▋ | 8645/10000 [33:55:33<5:14:27, 13.92s/it] 86%|████████▋ | 8646/10000 [33:55:47<5:14:20, 13.93s/it] {'loss': 0.0026, 'learning_rate': 6.815000000000001e-06, 'epoch': 11.32} 86%|████████▋ | 8646/10000 [33:55:47<5:14:20, 13.93s/it] 86%|████████▋ | 8647/10000 [33:56:01<5:12:47, 13.87s/it] {'loss': 0.0026, 'learning_rate': 6.81e-06, 'epoch': 11.32} 86%|████████▋ | 8647/10000 [33:56:01<5:12:47, 13.87s/it] 86%|████████▋ | 8648/10000 [33:56:15<5:12:44, 13.88s/it] {'loss': 0.0012, 'learning_rate': 6.805e-06, 'epoch': 11.32} 86%|████████▋ | 8648/10000 [33:56:15<5:12:44, 13.88s/it] 86%|████████▋ | 8649/10000 [33:56:29<5:11:57, 13.85s/it] {'loss': 0.0016, 'learning_rate': 6.800000000000001e-06, 'epoch': 11.32} 86%|████████▋ | 8649/10000 
[33:56:29<5:11:57, 13.85s/it] 86%|████████▋ | 8650/10000 [33:56:43<5:12:54, 13.91s/it] {'loss': 0.0016, 'learning_rate': 6.795e-06, 'epoch': 11.32} 86%|████████▋ | 8650/10000 [33:56:43<5:12:54, 13.91s/it] 87%|████████▋ | 8651/10000 [33:56:57<5:12:27, 13.90s/it] {'loss': 0.0034, 'learning_rate': 6.79e-06, 'epoch': 11.32} 87%|████████▋ | 8651/10000 [33:56:57<5:12:27, 13.90s/it] 87%|████████▋ | 8652/10000 [33:57:11<5:13:09, 13.94s/it] {'loss': 0.0015, 'learning_rate': 6.784999999999999e-06, 'epoch': 11.32} 87%|████████▋ | 8652/10000 [33:57:11<5:13:09, 13.94s/it] 87%|████████▋ | 8653/10000 [33:57:24<5:12:40, 13.93s/it] {'loss': 0.0023, 'learning_rate': 6.78e-06, 'epoch': 11.33} 87%|████████▋ | 8653/10000 [33:57:25<5:12:40, 13.93s/it] 87%|████████▋ | 8654/10000 [33:57:38<5:11:48, 13.90s/it] {'loss': 0.0025, 'learning_rate': 6.775000000000001e-06, 'epoch': 11.33} 87%|████████▋ | 8654/10000 [33:57:38<5:11:48, 13.90s/it] 87%|████████▋ | 8655/10000 [33:57:52<5:11:43, 13.91s/it] {'loss': 0.003, 'learning_rate': 6.7699999999999996e-06, 'epoch': 11.33} 87%|████████▋ | 8655/10000 [33:57:52<5:11:43, 13.91s/it] 87%|████████▋ | 8656/10000 [33:58:06<5:10:56, 13.88s/it] {'loss': 0.0028, 'learning_rate': 6.7650000000000005e-06, 'epoch': 11.33} 87%|████████▋ | 8656/10000 [33:58:06<5:10:56, 13.88s/it] 87%|████████▋ | 8657/10000 [33:58:20<5:10:15, 13.86s/it] {'loss': 0.0029, 'learning_rate': 6.76e-06, 'epoch': 11.33} 87%|████████▋ | 8657/10000 [33:58:20<5:10:15, 13.86s/it] 87%|████████▋ | 8658/10000 [33:58:34<5:10:09, 13.87s/it] {'loss': 0.0018, 'learning_rate': 6.7550000000000005e-06, 'epoch': 11.33} 87%|████████▋ | 8658/10000 [33:58:34<5:10:09, 13.87s/it] 87%|████████▋ | 8659/10000 [33:58:48<5:09:29, 13.85s/it] {'loss': 0.0026, 'learning_rate': 6.750000000000001e-06, 'epoch': 11.33} 87%|████████▋ | 8659/10000 [33:58:48<5:09:29, 13.85s/it] 87%|████████▋ | 8660/10000 [33:59:01<5:09:23, 13.85s/it] {'loss': 0.0024, 'learning_rate': 6.745e-06, 'epoch': 11.34} 87%|████████▋ | 8660/10000 
[33:59:01<5:09:23, 13.85s/it] 87%|████████▋ | 8661/10000 [33:59:15<5:09:03, 13.85s/it] {'loss': 0.0032, 'learning_rate': 6.740000000000001e-06, 'epoch': 11.34} 87%|████████▋ | 8661/10000 [33:59:15<5:09:03, 13.85s/it] 87%|████████▋ | 8662/10000 [33:59:29<5:08:48, 13.85s/it] {'loss': 0.0054, 'learning_rate': 6.735e-06, 'epoch': 11.34} 87%|████████▋ | 8662/10000 [33:59:29<5:08:48, 13.85s/it] 87%|████████▋ | 8663/10000 [33:59:43<5:07:47, 13.81s/it] {'loss': 0.003, 'learning_rate': 6.73e-06, 'epoch': 11.34} 87%|████████▋ | 8663/10000 [33:59:43<5:07:47, 13.81s/it] 87%|████████▋ | 8664/10000 [33:59:57<5:09:22, 13.89s/it] {'loss': 0.0022, 'learning_rate': 6.725000000000001e-06, 'epoch': 11.34} 87%|████████▋ | 8664/10000 [33:59:57<5:09:22, 13.89s/it] 87%|████████▋ | 8665/10000 [34:00:11<5:08:51, 13.88s/it] {'loss': 0.0027, 'learning_rate': 6.72e-06, 'epoch': 11.34} 87%|████████▋ | 8665/10000 [34:00:11<5:08:51, 13.88s/it] 87%|████████▋ | 8666/10000 [34:00:25<5:08:44, 13.89s/it] {'loss': 0.0032, 'learning_rate': 6.715e-06, 'epoch': 11.34} 87%|████████▋ | 8666/10000 [34:00:25<5:08:44, 13.89s/it] 87%|████████▋ | 8667/10000 [34:00:39<5:08:20, 13.88s/it] {'loss': 0.0027, 'learning_rate': 6.710000000000001e-06, 'epoch': 11.34} 87%|████████▋ | 8667/10000 [34:00:39<5:08:20, 13.88s/it] 87%|████████▋ | 8668/10000 [34:00:52<5:07:16, 13.84s/it] {'loss': 0.0025, 'learning_rate': 6.705e-06, 'epoch': 11.35} 87%|████████▋ | 8668/10000 [34:00:52<5:07:16, 13.84s/it] 87%|████████▋ | 8669/10000 [34:01:06<5:08:42, 13.92s/it] {'loss': 0.0021, 'learning_rate': 6.700000000000001e-06, 'epoch': 11.35} 87%|████████▋ | 8669/10000 [34:01:06<5:08:42, 13.92s/it] 87%|████████▋ | 8670/10000 [34:01:20<5:08:17, 13.91s/it] {'loss': 0.0025, 'learning_rate': 6.695e-06, 'epoch': 11.35} 87%|████████▋ | 8670/10000 [34:01:20<5:08:17, 13.91s/it] 87%|████████▋ | 8671/10000 [34:01:34<5:07:37, 13.89s/it] {'loss': 0.0089, 'learning_rate': 6.69e-06, 'epoch': 11.35} 87%|████████▋ | 8671/10000 [34:01:34<5:07:37, 13.89s/it] 
87%|████████▋ | 8672/10000 [34:01:48<5:07:26, 13.89s/it] {'loss': 0.0031, 'learning_rate': 6.685000000000001e-06, 'epoch': 11.35} 87%|████████▋ | 8672/10000 [34:01:48<5:07:26, 13.89s/it] 87%|████████▋ | 8673/10000 [34:02:02<5:06:16, 13.85s/it] {'loss': 0.0027, 'learning_rate': 6.68e-06, 'epoch': 11.35} 87%|████████▋ | 8673/10000 [34:02:02<5:06:16, 13.85s/it] 87%|████████▋ | 8674/10000 [34:02:16<5:06:29, 13.87s/it] {'loss': 0.0022, 'learning_rate': 6.6750000000000005e-06, 'epoch': 11.35} 87%|████████▋ | 8674/10000 [34:02:16<5:06:29, 13.87s/it] 87%|████████▋ | 8675/10000 [34:02:29<5:05:58, 13.86s/it] {'loss': 0.0018, 'learning_rate': 6.67e-06, 'epoch': 11.35} 87%|████████▋ | 8675/10000 [34:02:30<5:05:58, 13.86s/it] 87%|████████▋ | 8676/10000 [34:02:43<5:05:51, 13.86s/it] {'loss': 0.0024, 'learning_rate': 6.6650000000000006e-06, 'epoch': 11.36} 87%|████████▋ | 8676/10000 [34:02:43<5:05:51, 13.86s/it] 87%|████████▋ | 8677/10000 [34:02:57<5:05:57, 13.88s/it] {'loss': 0.0018, 'learning_rate': 6.660000000000001e-06, 'epoch': 11.36} 87%|████████▋ | 8677/10000 [34:02:57<5:05:57, 13.88s/it]
[2024-11-05 06:21:18,443] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
87%|████████▋ | 8678/10000 [34:03:10<4:59:21, 13.59s/it] {'loss': 0.0029, 'learning_rate': 6.660000000000001e-06, 'epoch': 11.36} 87%|████████▋ | 8678/10000 [34:03:10<4:59:21, 13.59s/it]
[2024-11-05 06:21:31,349] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
87%|████████▋ | 8679/10000 [34:03:23<4:54:45, 13.39s/it] {'loss': 0.0027, 'learning_rate': 6.660000000000001e-06, 'epoch': 11.36} 87%|████████▋ | 8679/10000 [34:03:23<4:54:45, 13.39s/it] 87%|████████▋ | 8680/10000 [34:03:37<4:57:59, 13.55s/it] {'loss': 0.0033, 'learning_rate': 6.655e-06, 'epoch': 11.36} 87%|████████▋ | 8680/10000 [34:03:37<4:57:59, 13.55s/it] 87%|████████▋ | 8681/10000 [34:03:51<4:59:42, 13.63s/it] {'loss': 0.0031, 'learning_rate': 6.650000000000001e-06, 'epoch': 11.36} 87%|████████▋ | 8681/10000 [34:03:51<4:59:42, 13.63s/it] 87%|████████▋ | 8682/10000 [34:04:05<5:00:44, 13.69s/it] {'loss': 0.0023, 'learning_rate': 6.645e-06, 'epoch': 11.36} 87%|████████▋ | 8682/10000 [34:04:05<5:00:44, 13.69s/it] 87%|████████▋ | 8683/10000 [34:04:19<5:02:25, 13.78s/it] {'loss': 0.0025, 'learning_rate': 6.640000000000001e-06, 'epoch': 11.37} 87%|████████▋ | 8683/10000 [34:04:19<5:02:25, 13.78s/it] 87%|████████▋ | 8684/10000 [34:04:33<5:02:45, 13.80s/it] {'loss': 0.0023, 'learning_rate': 6.635000000000001e-06, 'epoch': 11.37} 87%|████████▋ | 8684/10000 [34:04:33<5:02:45, 13.80s/it] 87%|████████▋ | 8685/10000 [34:04:46<5:03:17, 13.84s/it] {'loss': 0.0014, 'learning_rate': 6.63e-06, 'epoch': 11.37} 87%|████████▋ | 8685/10000 [34:04:46<5:03:17, 13.84s/it] 87%|████████▋ | 8686/10000 [34:05:00<5:02:31, 13.81s/it] {'loss': 0.0024, 'learning_rate': 6.625000000000001e-06, 'epoch': 11.37} 87%|████████▋ | 8686/10000 [34:05:00<5:02:31, 13.81s/it] 87%|████████▋ | 8687/10000 [34:05:14<5:03:13, 13.86s/it] {'loss': 0.0028, 'learning_rate': 6.62e-06, 'epoch': 11.37} 87%|████████▋ | 8687/10000 [34:05:14<5:03:13, 13.86s/it] 87%|████████▋ | 8688/10000 [34:05:28<5:02:45, 13.85s/it] {'loss': 0.0027, 'learning_rate': 6.615e-06, 'epoch': 11.37} 87%|████████▋ | 8688/10000 [34:05:28<5:02:45, 13.85s/it] 87%|████████▋ | 8689/10000 [34:05:42<5:02:51, 13.86s/it] {'loss': 0.0017, 'learning_rate': 6.610000000000001e-06, 'epoch': 11.37} 87%|████████▋ 
| 8689/10000 [34:05:42<5:02:51, 13.86s/it] 87%|████████▋ | 8690/10000 [34:05:56<5:02:15, 13.84s/it] {'loss': 0.0031, 'learning_rate': 6.605e-06, 'epoch': 11.37} 87%|████████▋ | 8690/10000 [34:05:56<5:02:15, 13.84s/it] 87%|████████▋ | 8691/10000 [34:06:10<5:02:08, 13.85s/it] {'loss': 0.0068, 'learning_rate': 6.6e-06, 'epoch': 11.38} 87%|████████▋ | 8691/10000 [34:06:10<5:02:08, 13.85s/it] 87%|████████▋ | 8692/10000 [34:06:23<5:01:57, 13.85s/it] {'loss': 0.003, 'learning_rate': 6.5949999999999995e-06, 'epoch': 11.38} 87%|████████▋ | 8692/10000 [34:06:23<5:01:57, 13.85s/it] 87%|████████▋ | 8693/10000 [34:06:37<5:02:07, 13.87s/it] {'loss': 0.003, 'learning_rate': 6.5900000000000004e-06, 'epoch': 11.38} 87%|████████▋ | 8693/10000 [34:06:37<5:02:07, 13.87s/it] 87%|████████▋ | 8694/10000 [34:06:51<5:02:07, 13.88s/it] {'loss': 0.0026, 'learning_rate': 6.5850000000000005e-06, 'epoch': 11.38} 87%|████████▋ | 8694/10000 [34:06:51<5:02:07, 13.88s/it] 87%|████████▋ | 8695/10000 [34:07:05<5:02:13, 13.90s/it] {'loss': 0.0025, 'learning_rate': 6.58e-06, 'epoch': 11.38} 87%|████████▋ | 8695/10000 [34:07:05<5:02:13, 13.90s/it] 87%|████████▋ | 8696/10000 [34:07:19<5:01:25, 13.87s/it] {'loss': 0.0031, 'learning_rate': 6.5750000000000006e-06, 'epoch': 11.38} 87%|████████▋ | 8696/10000 [34:07:19<5:01:25, 13.87s/it] 87%|████████▋ | 8697/10000 [34:07:33<5:02:26, 13.93s/it] {'loss': 0.0026, 'learning_rate': 6.57e-06, 'epoch': 11.38} 87%|████████▋ | 8697/10000 [34:07:33<5:02:26, 13.93s/it] 87%|████████▋ | 8698/10000 [34:07:47<5:02:08, 13.92s/it] {'loss': 0.0045, 'learning_rate': 6.565000000000001e-06, 'epoch': 11.38} 87%|████████▋ | 8698/10000 [34:07:47<5:02:08, 13.92s/it] 87%|████████▋ | 8699/10000 [34:08:01<5:00:26, 13.86s/it] {'loss': 0.0024, 'learning_rate': 6.560000000000001e-06, 'epoch': 11.39} 87%|████████▋ | 8699/10000 [34:08:01<5:00:26, 13.86s/it] 87%|████████▋ | 8700/10000 [34:08:15<5:00:28, 13.87s/it] {'loss': 0.0034, 'learning_rate': 6.555e-06, 'epoch': 11.39} 87%|████████▋ | 
8700/10000 [34:08:15<5:00:28, 13.87s/it] 87%|████████▋ | 8701/10000 [34:08:28<5:00:13, 13.87s/it] {'loss': 0.009, 'learning_rate': 6.550000000000001e-06, 'epoch': 11.39} 87%|████████▋ | 8701/10000 [34:08:28<5:00:13, 13.87s/it] 87%|████████▋ | 8702/10000 [34:08:42<5:00:34, 13.89s/it] {'loss': 0.0021, 'learning_rate': 6.545e-06, 'epoch': 11.39} 87%|████████▋ | 8702/10000 [34:08:42<5:00:34, 13.89s/it] 87%|████████▋ | 8703/10000 [34:08:56<5:00:05, 13.88s/it] {'loss': 0.0055, 'learning_rate': 6.54e-06, 'epoch': 11.39} 87%|████████▋ | 8703/10000 [34:08:56<5:00:05, 13.88s/it] 87%|████████▋ | 8704/10000 [34:09:10<4:59:16, 13.86s/it] {'loss': 0.0034, 'learning_rate': 6.535000000000001e-06, 'epoch': 11.39} 87%|████████▋ | 8704/10000 [34:09:10<4:59:16, 13.86s/it] 87%|████████▋ | 8705/10000 [34:09:24<4:59:47, 13.89s/it] {'loss': 0.0036, 'learning_rate': 6.53e-06, 'epoch': 11.39} 87%|████████▋ | 8705/10000 [34:09:24<4:59:47, 13.89s/it] 87%|████████▋ | 8706/10000 [34:09:38<5:00:36, 13.94s/it] {'loss': 0.0036, 'learning_rate': 6.525e-06, 'epoch': 11.4} 87%|████████▋ | 8706/10000 [34:09:38<5:00:36, 13.94s/it] 87%|████████▋ | 8707/10000 [34:09:52<4:59:05, 13.88s/it] {'loss': 0.0023, 'learning_rate': 6.519999999999999e-06, 'epoch': 11.4} 87%|████████▋ | 8707/10000 [34:09:52<4:59:05, 13.88s/it] 87%|████████▋ | 8708/10000 [34:10:06<4:57:59, 13.84s/it] {'loss': 0.0017, 'learning_rate': 6.515e-06, 'epoch': 11.4} 87%|████████▋ | 8708/10000 [34:10:06<4:57:59, 13.84s/it] 87%|████████▋ | 8709/10000 [34:10:20<4:59:36, 13.92s/it] {'loss': 0.0032, 'learning_rate': 6.510000000000001e-06, 'epoch': 11.4} 87%|████████▋ | 8709/10000 [34:10:20<4:59:36, 13.92s/it] 87%|████████▋ | 8710/10000 [34:10:34<4:59:28, 13.93s/it] {'loss': 0.0031, 'learning_rate': 6.505e-06, 'epoch': 11.4} 87%|████████▋ | 8710/10000 [34:10:34<4:59:28, 13.93s/it] 87%|████████▋ | 8711/10000 [34:10:47<4:59:08, 13.92s/it] {'loss': 0.0041, 'learning_rate': 6.5000000000000004e-06, 'epoch': 11.4} 87%|████████▋ | 8711/10000 
[34:10:48<4:59:08, 13.92s/it] 87%|████████▋ | 8712/10000 [34:11:01<4:58:30, 13.91s/it] {'loss': 0.0029, 'learning_rate': 6.495e-06, 'epoch': 11.4} 87%|████████▋ | 8712/10000 [34:11:01<4:58:30, 13.91s/it] 87%|████████▋ | 8713/10000 [34:11:15<4:57:55, 13.89s/it] {'loss': 0.0024, 'learning_rate': 6.4900000000000005e-06, 'epoch': 11.4} 87%|████████▋ | 8713/10000 [34:11:15<4:57:55, 13.89s/it] 87%|████████▋ | 8714/10000 [34:11:29<4:57:45, 13.89s/it] {'loss': 0.0029, 'learning_rate': 6.485000000000001e-06, 'epoch': 11.41} 87%|████████▋ | 8714/10000 [34:11:29<4:57:45, 13.89s/it] 87%|████████▋ | 8715/10000 [34:11:43<4:57:12, 13.88s/it] {'loss': 0.0026, 'learning_rate': 6.48e-06, 'epoch': 11.41} 87%|████████▋ | 8715/10000 [34:11:43<4:57:12, 13.88s/it] 87%|████████▋ | 8716/10000 [34:11:57<4:57:12, 13.89s/it] {'loss': 0.0024, 'learning_rate': 6.475000000000001e-06, 'epoch': 11.41} 87%|████████▋ | 8716/10000 [34:11:57<4:57:12, 13.89s/it] 87%|████████▋ | 8717/10000 [34:12:11<4:57:04, 13.89s/it] {'loss': 0.0024, 'learning_rate': 6.47e-06, 'epoch': 11.41} 87%|████████▋ | 8717/10000 [34:12:11<4:57:04, 13.89s/it] 87%|████████▋ | 8718/10000 [34:12:25<4:56:47, 13.89s/it] {'loss': 0.0016, 'learning_rate': 6.465e-06, 'epoch': 11.41} 87%|████████▋ | 8718/10000 [34:12:25<4:56:47, 13.89s/it] 87%|████████▋ | 8719/10000 [34:12:39<4:56:29, 13.89s/it] {'loss': 0.0033, 'learning_rate': 6.460000000000001e-06, 'epoch': 11.41} 87%|████████▋ | 8719/10000 [34:12:39<4:56:29, 13.89s/it] 87%|████████▋ | 8720/10000 [34:12:53<4:57:00, 13.92s/it] {'loss': 0.0024, 'learning_rate': 6.455e-06, 'epoch': 11.41} 87%|████████▋ | 8720/10000 [34:12:53<4:57:00, 13.92s/it] 87%|████████▋ | 8721/10000 [34:13:06<4:57:03, 13.94s/it] {'loss': 0.0031, 'learning_rate': 6.45e-06, 'epoch': 11.41} 87%|████████▋ | 8721/10000 [34:13:07<4:57:03, 13.94s/it] 87%|████████▋ | 8722/10000 [34:13:20<4:56:31, 13.92s/it] {'loss': 0.0019, 'learning_rate': 6.444999999999999e-06, 'epoch': 11.42} 87%|████████▋ | 8722/10000 [34:13:20<4:56:31, 
13.92s/it] 87%|████████▋ | 8723/10000 [34:13:34<4:56:11, 13.92s/it] {'loss': 0.0044, 'learning_rate': 6.44e-06, 'epoch': 11.42} 87%|████████▋ | 8723/10000 [34:13:34<4:56:11, 13.92s/it] 87%|████████▋ | 8724/10000 [34:13:48<4:56:13, 13.93s/it] {'loss': 0.0028, 'learning_rate': 6.435000000000001e-06, 'epoch': 11.42} 87%|████████▋ | 8724/10000 [34:13:48<4:56:13, 13.93s/it] 87%|████████▋ | 8725/10000 [34:14:02<4:55:37, 13.91s/it] {'loss': 0.0024, 'learning_rate': 6.43e-06, 'epoch': 11.42} 87%|████████▋ | 8725/10000 [34:14:02<4:55:37, 13.91s/it] 87%|████████▋ | 8726/10000 [34:14:16<4:55:09, 13.90s/it] {'loss': 0.0023, 'learning_rate': 6.425e-06, 'epoch': 11.42} 87%|████████▋ | 8726/10000 [34:14:16<4:55:09, 13.90s/it] 87%|████████▋ | 8727/10000 [34:14:30<4:54:33, 13.88s/it] {'loss': 0.0057, 'learning_rate': 6.4199999999999995e-06, 'epoch': 11.42} 87%|████████▋ | 8727/10000 [34:14:30<4:54:33, 13.88s/it] 87%|████████▋ | 8728/10000 [34:14:44<4:54:51, 13.91s/it] {'loss': 0.0014, 'learning_rate': 6.415e-06, 'epoch': 11.42} 87%|████████▋ | 8728/10000 [34:14:44<4:54:51, 13.91s/it] 87%|████████▋ | 8729/10000 [34:14:58<4:54:05, 13.88s/it] {'loss': 0.0045, 'learning_rate': 6.4100000000000005e-06, 'epoch': 11.43} 87%|████████▋ | 8729/10000 [34:14:58<4:54:05, 13.88s/it] 87%|████████▋ | 8730/10000 [34:15:12<4:54:18, 13.90s/it] {'loss': 0.0022, 'learning_rate': 6.405e-06, 'epoch': 11.43} 87%|████████▋ | 8730/10000 [34:15:12<4:54:18, 13.90s/it] 87%|████████▋ | 8731/10000 [34:15:25<4:53:11, 13.86s/it] {'loss': 0.0029, 'learning_rate': 6.4000000000000006e-06, 'epoch': 11.43} 87%|████████▋ | 8731/10000 [34:15:25<4:53:11, 13.86s/it] 87%|████████▋ | 8732/10000 [34:15:39<4:53:46, 13.90s/it] {'loss': 0.003, 'learning_rate': 6.395000000000001e-06, 'epoch': 11.43} 87%|████████▋ | 8732/10000 [34:15:39<4:53:46, 13.90s/it] 87%|████████▋ | 8733/10000 [34:15:53<4:53:40, 13.91s/it] {'loss': 0.0017, 'learning_rate': 6.39e-06, 'epoch': 11.43} 87%|████████▋ | 8733/10000 [34:15:53<4:53:40, 13.91s/it] 
87%|████████▋ | 8734/10000 [34:16:07<4:53:05, 13.89s/it] {'loss': 0.0033, 'learning_rate': 6.385000000000001e-06, 'epoch': 11.43} 87%|████████▋ | 8734/10000 [34:16:07<4:53:05, 13.89s/it] 87%|████████▋ | 8735/10000 [34:16:21<4:52:25, 13.87s/it] {'loss': 0.0017, 'learning_rate': 6.38e-06, 'epoch': 11.43} 87%|████████▋ | 8735/10000 [34:16:21<4:52:25, 13.87s/it] 87%|████████▋ | 8736/10000 [34:16:35<4:53:18, 13.92s/it] {'loss': 0.0031, 'learning_rate': 6.375000000000001e-06, 'epoch': 11.43} 87%|████████▋ | 8736/10000 [34:16:35<4:53:18, 13.92s/it] 87%|████████▋ | 8737/10000 [34:16:49<4:52:03, 13.87s/it] {'loss': 0.0061, 'learning_rate': 6.370000000000001e-06, 'epoch': 11.44} 87%|████████▋ | 8737/10000 [34:16:49<4:52:03, 13.87s/it] 87%|████████▋ | 8738/10000 [34:17:03<4:51:39, 13.87s/it] {'loss': 0.004, 'learning_rate': 6.365e-06, 'epoch': 11.44} 87%|████████▋ | 8738/10000 [34:17:03<4:51:39, 13.87s/it] 87%|████████▋ | 8739/10000 [34:17:17<4:51:54, 13.89s/it] {'loss': 0.003, 'learning_rate': 6.360000000000001e-06, 'epoch': 11.44} 87%|████████▋ | 8739/10000 [34:17:17<4:51:54, 13.89s/it] 87%|████████▋ | 8740/10000 [34:17:30<4:51:19, 13.87s/it] {'loss': 0.0018, 'learning_rate': 6.355e-06, 'epoch': 11.44} 87%|████████▋ | 8740/10000 [34:17:30<4:51:19, 13.87s/it] 87%|████████▋ | 8741/10000 [34:17:44<4:51:18, 13.88s/it] {'loss': 0.0023, 'learning_rate': 6.35e-06, 'epoch': 11.44} 87%|████████▋ | 8741/10000 [34:17:44<4:51:18, 13.88s/it] 87%|████████▋ | 8742/10000 [34:17:58<4:51:08, 13.89s/it] {'loss': 0.0021, 'learning_rate': 6.345000000000001e-06, 'epoch': 11.44} 87%|████████▋ | 8742/10000 [34:17:58<4:51:08, 13.89s/it] 87%|████████▋ | 8743/10000 [34:18:12<4:50:17, 13.86s/it] {'loss': 0.0037, 'learning_rate': 6.34e-06, 'epoch': 11.44} 87%|████████▋ | 8743/10000 [34:18:12<4:50:17, 13.86s/it] 87%|████████▋ | 8744/10000 [34:18:26<4:49:56, 13.85s/it] {'loss': 0.0024, 'learning_rate': 6.335e-06, 'epoch': 11.45} 87%|████████▋ | 8744/10000 [34:18:26<4:49:56, 13.85s/it] 87%|████████▋ | 
8745/10000 [34:18:40<4:49:56, 13.86s/it] {'loss': 0.0023, 'learning_rate': 6.3299999999999995e-06, 'epoch': 11.45} 87%|████████▋ | 8745/10000 [34:18:40<4:49:56, 13.86s/it] 87%|████████▋ | 8746/10000 [34:18:54<4:50:14, 13.89s/it] {'loss': 0.0051, 'learning_rate': 6.3250000000000004e-06, 'epoch': 11.45} 87%|████████▋ | 8746/10000 [34:18:54<4:50:14, 13.89s/it] 87%|████████▋ | 8747/10000 [34:19:08<4:50:06, 13.89s/it] {'loss': 0.0034, 'learning_rate': 6.320000000000001e-06, 'epoch': 11.45} 87%|████████▋ | 8747/10000 [34:19:08<4:50:06, 13.89s/it] 87%|████████▋ | 8748/10000 [34:19:21<4:49:30, 13.87s/it] {'loss': 0.0026, 'learning_rate': 6.315e-06, 'epoch': 11.45} 87%|████████▋ | 8748/10000 [34:19:21<4:49:30, 13.87s/it] 87%|████████▋ | 8749/10000 [34:19:35<4:49:27, 13.88s/it] {'loss': 0.002, 'learning_rate': 6.3100000000000006e-06, 'epoch': 11.45} 87%|████████▋ | 8749/10000 [34:19:35<4:49:27, 13.88s/it] 88%|████████▊ | 8750/10000 [34:19:49<4:49:20, 13.89s/it] {'loss': 0.0035, 'learning_rate': 6.305e-06, 'epoch': 11.45} 88%|████████▊ | 8750/10000 [34:19:49<4:49:20, 13.89s/it] 88%|████████▊ | 8751/10000 [34:20:03<4:49:00, 13.88s/it] {'loss': 0.0032, 'learning_rate': 6.300000000000001e-06, 'epoch': 11.45} 88%|████████▊ | 8751/10000 [34:20:03<4:49:00, 13.88s/it] 88%|████████▊ | 8752/10000 [34:20:17<4:48:46, 13.88s/it] {'loss': 0.0029, 'learning_rate': 6.295000000000001e-06, 'epoch': 11.46} 88%|████████▊ | 8752/10000 [34:20:17<4:48:46, 13.88s/it] 88%|████████▊ | 8753/10000 [34:20:31<4:48:05, 13.86s/it] {'loss': 0.0024, 'learning_rate': 6.29e-06, 'epoch': 11.46} 88%|████████▊ | 8753/10000 [34:20:31<4:48:05, 13.86s/it] 88%|████████▊ | 8754/10000 [34:20:45<4:48:18, 13.88s/it] {'loss': 0.0026, 'learning_rate': 6.285000000000001e-06, 'epoch': 11.46} 88%|████████▊ | 8754/10000 [34:20:45<4:48:18, 13.88s/it] 88%|████████▊ | 8755/10000 [34:20:59<4:48:00, 13.88s/it] {'loss': 0.0023, 'learning_rate': 6.28e-06, 'epoch': 11.46} 88%|████████▊ | 8755/10000 [34:20:59<4:48:00, 13.88s/it] 
88%|████████▊ | 8756/10000 [34:21:12<4:47:30, 13.87s/it] {'loss': 0.0046, 'learning_rate': 6.275e-06, 'epoch': 11.46} 88%|████████▊ | 8756/10000 [34:21:12<4:47:30, 13.87s/it] 88%|████████▊ | 8757/10000 [34:21:26<4:47:30, 13.88s/it] {'loss': 0.002, 'learning_rate': 6.270000000000001e-06, 'epoch': 11.46} 88%|████████▊ | 8757/10000 [34:21:26<4:47:30, 13.88s/it] 88%|████████▊ | 8758/10000 [34:21:40<4:47:15, 13.88s/it] {'loss': 0.0025, 'learning_rate': 6.265e-06, 'epoch': 11.46} 88%|████████▊ | 8758/10000 [34:21:40<4:47:15, 13.88s/it] 88%|████████▊ | 8759/10000 [34:21:54<4:46:25, 13.85s/it] {'loss': 0.0022, 'learning_rate': 6.26e-06, 'epoch': 11.46} 88%|████████▊ | 8759/10000 [34:21:54<4:46:25, 13.85s/it] 88%|████████▊ | 8760/10000 [34:22:08<4:46:53, 13.88s/it] {'loss': 0.0022, 'learning_rate': 6.254999999999999e-06, 'epoch': 11.47} 88%|████████▊ | 8760/10000 [34:22:08<4:46:53, 13.88s/it] 88%|████████▊ | 8761/10000 [34:22:22<4:46:17, 13.86s/it] {'loss': 0.0029, 'learning_rate': 6.25e-06, 'epoch': 11.47} 88%|████████▊ | 8761/10000 [34:22:22<4:46:17, 13.86s/it] 88%|████████▊ | 8762/10000 [34:22:36<4:46:52, 13.90s/it] {'loss': 0.0025, 'learning_rate': 6.245e-06, 'epoch': 11.47} 88%|████████▊ | 8762/10000 [34:22:36<4:46:52, 13.90s/it] 88%|████████▊ | 8763/10000 [34:22:50<4:47:04, 13.92s/it] {'loss': 0.0013, 'learning_rate': 6.24e-06, 'epoch': 11.47} 88%|████████▊ | 8763/10000 [34:22:50<4:47:04, 13.92s/it] 88%|████████▊ | 8764/10000 [34:23:03<4:45:58, 13.88s/it] {'loss': 0.0031, 'learning_rate': 6.2350000000000004e-06, 'epoch': 11.47} 88%|████████▊ | 8764/10000 [34:23:04<4:45:58, 13.88s/it] 88%|████████▊ | 8765/10000 [34:23:17<4:45:58, 13.89s/it] {'loss': 0.0025, 'learning_rate': 6.2300000000000005e-06, 'epoch': 11.47} 88%|████████▊ | 8765/10000 [34:23:17<4:45:58, 13.89s/it] 88%|████████▊ | 8766/10000 [34:23:31<4:45:56, 13.90s/it] {'loss': 0.002, 'learning_rate': 6.2250000000000005e-06, 'epoch': 11.47} 88%|████████▊ | 8766/10000 [34:23:31<4:45:56, 13.90s/it] 88%|████████▊ | 
8767/10000 [34:23:45<4:45:42, 13.90s/it] {'loss': 0.0028, 'learning_rate': 6.22e-06, 'epoch': 11.48} 88%|████████▊ | 8767/10000 [34:23:45<4:45:42, 13.90s/it] 88%|████████▊ | 8768/10000 [34:23:59<4:45:18, 13.89s/it] {'loss': 0.0039, 'learning_rate': 6.215e-06, 'epoch': 11.48} 88%|████████▊ | 8768/10000 [34:23:59<4:45:18, 13.89s/it] 88%|████████▊ | 8769/10000 [34:24:13<4:45:35, 13.92s/it] {'loss': 0.0024, 'learning_rate': 6.210000000000001e-06, 'epoch': 11.48} 88%|████████▊ | 8769/10000 [34:24:13<4:45:35, 13.92s/it] 88%|████████▊ | 8770/10000 [34:24:27<4:44:42, 13.89s/it] {'loss': 0.0042, 'learning_rate': 6.205000000000001e-06, 'epoch': 11.48} 88%|████████▊ | 8770/10000 [34:24:27<4:44:42, 13.89s/it] 88%|████████▊ | 8771/10000 [34:24:41<4:44:38, 13.90s/it] {'loss': 0.0024, 'learning_rate': 6.2e-06, 'epoch': 11.48} 88%|████████▊ | 8771/10000 [34:24:41<4:44:38, 13.90s/it] 88%|████████▊ | 8772/10000 [34:24:55<4:44:13, 13.89s/it] {'loss': 0.0012, 'learning_rate': 6.195e-06, 'epoch': 11.48} 88%|████████▊ | 8772/10000 [34:24:55<4:44:13, 13.89s/it] 88%|████████▊ | 8773/10000 [34:25:09<4:43:55, 13.88s/it] {'loss': 0.0019, 'learning_rate': 6.19e-06, 'epoch': 11.48} 88%|████████▊ | 8773/10000 [34:25:09<4:43:55, 13.88s/it] 88%|████████▊ | 8774/10000 [34:25:22<4:43:22, 13.87s/it] {'loss': 0.0031, 'learning_rate': 6.185000000000001e-06, 'epoch': 11.48} 88%|████████▊ | 8774/10000 [34:25:22<4:43:22, 13.87s/it] 88%|████████▊ | 8775/10000 [34:25:36<4:42:59, 13.86s/it] {'loss': 0.0028, 'learning_rate': 6.18e-06, 'epoch': 11.49} 88%|████████▊ | 8775/10000 [34:25:36<4:42:59, 13.86s/it] 88%|████████▊ | 8776/10000 [34:25:50<4:43:16, 13.89s/it] {'loss': 0.0036, 'learning_rate': 6.175e-06, 'epoch': 11.49} 88%|████████▊ | 8776/10000 [34:25:50<4:43:16, 13.89s/it] 88%|████████▊ | 8777/10000 [34:26:04<4:42:57, 13.88s/it] {'loss': 0.0027, 'learning_rate': 6.17e-06, 'epoch': 11.49} 88%|████████▊ | 8777/10000 [34:26:04<4:42:57, 13.88s/it] 88%|████████▊ | 8778/10000 [34:26:18<4:42:59, 13.89s/it] 
{'loss': 0.0028, 'learning_rate': 6.165e-06, 'epoch': 11.49} 88%|████████▊ | 8778/10000 [34:26:18<4:42:59, 13.89s/it] 88%|████████▊ | 8779/10000 [34:26:32<4:42:42, 13.89s/it] {'loss': 0.0031, 'learning_rate': 6.16e-06, 'epoch': 11.49} 88%|████████▊ | 8779/10000 [34:26:32<4:42:42, 13.89s/it] 88%|████████▊ | 8780/10000 [34:26:46<4:42:43, 13.90s/it] {'loss': 0.0025, 'learning_rate': 6.155e-06, 'epoch': 11.49} 88%|████████▊ | 8780/10000 [34:26:46<4:42:43, 13.90s/it] 88%|████████▊ | 8781/10000 [34:27:00<4:42:43, 13.92s/it] {'loss': 0.0024, 'learning_rate': 6.15e-06, 'epoch': 11.49} 88%|████████▊ | 8781/10000 [34:27:00<4:42:43, 13.92s/it] 88%|████████▊ | 8782/10000 [34:27:14<4:42:43, 13.93s/it] {'loss': 0.0028, 'learning_rate': 6.1450000000000005e-06, 'epoch': 11.49} 88%|████████▊ | 8782/10000 [34:27:14<4:42:43, 13.93s/it] 88%|████████▊ | 8783/10000 [34:27:28<4:43:01, 13.95s/it] {'loss': 0.0028, 'learning_rate': 6.1400000000000005e-06, 'epoch': 11.5} 88%|████████▊ | 8783/10000 [34:27:28<4:43:01, 13.95s/it] 88%|████████▊ | 8784/10000 [34:27:42<4:42:26, 13.94s/it] {'loss': 0.0024, 'learning_rate': 6.1350000000000006e-06, 'epoch': 11.5} 88%|████████▊ | 8784/10000 [34:27:42<4:42:26, 13.94s/it] 88%|████████▊ | 8785/10000 [34:27:55<4:41:47, 13.92s/it] {'loss': 0.0026, 'learning_rate': 6.130000000000001e-06, 'epoch': 11.5} 88%|████████▊ | 8785/10000 [34:27:56<4:41:47, 13.92s/it] 88%|████████▊ | 8786/10000 [34:28:09<4:41:25, 13.91s/it] {'loss': 0.0034, 'learning_rate': 6.125e-06, 'epoch': 11.5} 88%|████████▊ | 8786/10000 [34:28:09<4:41:25, 13.91s/it] 88%|████████▊ | 8787/10000 [34:28:23<4:41:31, 13.93s/it] {'loss': 0.0028, 'learning_rate': 6.12e-06, 'epoch': 11.5} 88%|████████▊ | 8787/10000 [34:28:23<4:41:31, 13.93s/it] 88%|████████▊ | 8788/10000 [34:28:37<4:41:44, 13.95s/it] {'loss': 0.0032, 'learning_rate': 6.115000000000001e-06, 'epoch': 11.5} 88%|████████▊ | 8788/10000 [34:28:37<4:41:44, 13.95s/it] 88%|████████▊ | 8789/10000 [34:28:51<4:40:42, 13.91s/it] {'loss': 0.0024, 
'learning_rate': 6.110000000000001e-06, 'epoch': 11.5} 88%|████████▊ | 8789/10000 [34:28:51<4:40:42, 13.91s/it] 88%|████████▊ | 8790/10000 [34:29:05<4:40:54, 13.93s/it] {'loss': 0.0024, 'learning_rate': 6.105e-06, 'epoch': 11.51} 88%|████████▊ | 8790/10000 [34:29:05<4:40:54, 13.93s/it] 88%|████████▊ | 8791/10000 [34:29:19<4:40:28, 13.92s/it] {'loss': 0.0017, 'learning_rate': 6.1e-06, 'epoch': 11.51} 88%|████████▊ | 8791/10000 [34:29:19<4:40:28, 13.92s/it] 88%|████████▊ | 8792/10000 [34:29:33<4:39:24, 13.88s/it] {'loss': 0.0036, 'learning_rate': 6.095e-06, 'epoch': 11.51} 88%|████████▊ | 8792/10000 [34:29:33<4:39:24, 13.88s/it] 88%|████████▊ | 8793/10000 [34:29:47<4:38:47, 13.86s/it] {'loss': 0.0034, 'learning_rate': 6.090000000000001e-06, 'epoch': 11.51} 88%|████████▊ | 8793/10000 [34:29:47<4:38:47, 13.86s/it] 88%|████████▊ | 8794/10000 [34:30:01<4:38:57, 13.88s/it] {'loss': 0.0028, 'learning_rate': 6.085e-06, 'epoch': 11.51} 88%|████████▊ | 8794/10000 [34:30:01<4:38:57, 13.88s/it] 88%|████████▊ | 8795/10000 [34:30:14<4:38:44, 13.88s/it] {'loss': 0.0026, 'learning_rate': 6.08e-06, 'epoch': 11.51} 88%|████████▊ | 8795/10000 [34:30:14<4:38:44, 13.88s/it] 88%|████████▊ | 8796/10000 [34:30:28<4:38:14, 13.87s/it] {'loss': 0.0035, 'learning_rate': 6.075e-06, 'epoch': 11.51} 88%|████████▊ | 8796/10000 [34:30:28<4:38:14, 13.87s/it] 88%|████████▊ | 8797/10000 [34:30:42<4:38:10, 13.87s/it] {'loss': 0.0038, 'learning_rate': 6.07e-06, 'epoch': 11.51} 88%|████████▊ | 8797/10000 [34:30:42<4:38:10, 13.87s/it] 88%|████████▊ | 8798/10000 [34:30:56<4:38:47, 13.92s/it] {'loss': 0.0022, 'learning_rate': 6.065e-06, 'epoch': 11.52} 88%|████████▊ | 8798/10000 [34:30:56<4:38:47, 13.92s/it] 88%|████████▊ | 8799/10000 [34:31:10<4:37:50, 13.88s/it] {'loss': 0.0021, 'learning_rate': 6.0600000000000004e-06, 'epoch': 11.52} 88%|████████▊ | 8799/10000 [34:31:10<4:37:50, 13.88s/it] 88%|████████▊ | 8800/10000 [34:31:24<4:37:52, 13.89s/it] {'loss': 0.0021, 'learning_rate': 6.0550000000000005e-06, 
'epoch': 11.52} 88%|████████▊ | 8800/10000 [34:31:24<4:37:52, 13.89s/it] 88%|████████▊ | 8801/10000 [34:31:38<4:37:20, 13.88s/it] {'loss': 0.0033, 'learning_rate': 6.0500000000000005e-06, 'epoch': 11.52} 88%|████████▊ | 8801/10000 [34:31:38<4:37:20, 13.88s/it] 88%|████████▊ | 8802/10000 [34:31:52<4:36:41, 13.86s/it] {'loss': 0.004, 'learning_rate': 6.045e-06, 'epoch': 11.52} 88%|████████▊ | 8802/10000 [34:31:52<4:36:41, 13.86s/it] 88%|████████▊ | 8803/10000 [34:32:05<4:36:26, 13.86s/it] {'loss': 0.0023, 'learning_rate': 6.040000000000001e-06, 'epoch': 11.52} 88%|████████▊ | 8803/10000 [34:32:05<4:36:26, 13.86s/it] 88%|████████▊ | 8804/10000 [34:32:19<4:35:52, 13.84s/it] {'loss': 0.002, 'learning_rate': 6.035000000000001e-06, 'epoch': 11.52} 88%|████████▊ | 8804/10000 [34:32:19<4:35:52, 13.84s/it] 88%|████████▊ | 8805/10000 [34:32:33<4:35:24, 13.83s/it] {'loss': 0.0048, 'learning_rate': 6.03e-06, 'epoch': 11.52} 88%|████████▊ | 8805/10000 [34:32:33<4:35:24, 13.83s/it] 88%|████████▊ | 8806/10000 [34:32:47<4:35:47, 13.86s/it] {'loss': 0.0024, 'learning_rate': 6.025e-06, 'epoch': 11.53} 88%|████████▊ | 8806/10000 [34:32:47<4:35:47, 13.86s/it] 88%|████████▊ | 8807/10000 [34:33:01<4:35:32, 13.86s/it] {'loss': 0.0016, 'learning_rate': 6.02e-06, 'epoch': 11.53} 88%|████████▊ | 8807/10000 [34:33:01<4:35:32, 13.86s/it] 88%|████████▊ | 8808/10000 [34:33:15<4:34:56, 13.84s/it] {'loss': 0.0029, 'learning_rate': 6.015000000000001e-06, 'epoch': 11.53} 88%|████████▊ | 8808/10000 [34:33:15<4:34:56, 13.84s/it] 88%|████████▊ | 8809/10000 [34:33:28<4:34:37, 13.83s/it] {'loss': 0.0022, 'learning_rate': 6.01e-06, 'epoch': 11.53} 88%|████████▊ | 8809/10000 [34:33:28<4:34:37, 13.83s/it] 88%|████████▊ | 8810/10000 [34:33:42<4:34:34, 13.84s/it] {'loss': 0.0025, 'learning_rate': 6.005e-06, 'epoch': 11.53} 88%|████████▊ | 8810/10000 [34:33:42<4:34:34, 13.84s/it] 88%|████████▊ | 8811/10000 [34:33:56<4:34:31, 13.85s/it] {'loss': 0.0048, 'learning_rate': 6e-06, 'epoch': 11.53} 88%|████████▊ | 
8811/10000 [34:33:56<4:34:31, 13.85s/it] 88%|████████▊ | 8812/10000 [34:34:10<4:34:07, 13.84s/it] {'loss': 0.0036, 'learning_rate': 5.995e-06, 'epoch': 11.53} 88%|████████▊ | 8812/10000 [34:34:10<4:34:07, 13.84s/it] 88%|████████▊ | 8813/10000 [34:34:24<4:33:53, 13.84s/it] {'loss': 0.0049, 'learning_rate': 5.99e-06, 'epoch': 11.54} 88%|████████▊ | 8813/10000 [34:34:24<4:33:53, 13.84s/it] 88%|████████▊ | 8814/10000 [34:34:38<4:34:00, 13.86s/it] {'loss': 0.002, 'learning_rate': 5.985e-06, 'epoch': 11.54} 88%|████████▊ | 8814/10000 [34:34:38<4:34:00, 13.86s/it] 88%|████████▊ | 8815/10000 [34:34:52<4:33:54, 13.87s/it] {'loss': 0.0031, 'learning_rate': 5.98e-06, 'epoch': 11.54} 88%|████████▊ | 8815/10000 [34:34:52<4:33:54, 13.87s/it] 88%|████████▊ | 8816/10000 [34:35:05<4:33:53, 13.88s/it] {'loss': 0.0018, 'learning_rate': 5.975e-06, 'epoch': 11.54} 88%|████████▊ | 8816/10000 [34:35:06<4:33:53, 13.88s/it] 88%|████████▊ | 8817/10000 [34:35:19<4:34:10, 13.91s/it] {'loss': 0.0043, 'learning_rate': 5.9700000000000004e-06, 'epoch': 11.54} 88%|████████▊ | 8817/10000 [34:35:19<4:34:10, 13.91s/it] 88%|████████▊ | 8818/10000 [34:35:33<4:34:13, 13.92s/it] {'loss': 0.0022, 'learning_rate': 5.9650000000000005e-06, 'epoch': 11.54} 88%|████████▊ | 8818/10000 [34:35:33<4:34:13, 13.92s/it] 88%|████████▊ | 8819/10000 [34:35:47<4:33:40, 13.90s/it] {'loss': 0.0023, 'learning_rate': 5.9600000000000005e-06, 'epoch': 11.54} 88%|████████▊ | 8819/10000 [34:35:47<4:33:40, 13.90s/it] 88%|████████▊ | 8820/10000 [34:36:01<4:33:13, 13.89s/it] {'loss': 0.0027, 'learning_rate': 5.955000000000001e-06, 'epoch': 11.54} 88%|████████▊ | 8820/10000 [34:36:01<4:33:13, 13.89s/it] 88%|████████▊ | 8821/10000 [34:36:15<4:33:35, 13.92s/it] {'loss': 0.0014, 'learning_rate': 5.95e-06, 'epoch': 11.55} 88%|████████▊ | 8821/10000 [34:36:15<4:33:35, 13.92s/it] 88%|████████▊ | 8822/10000 [34:36:29<4:33:04, 13.91s/it] {'loss': 0.0028, 'learning_rate': 5.945000000000001e-06, 'epoch': 11.55} 88%|████████▊ | 8822/10000 
[34:36:29<4:33:04, 13.91s/it] 88%|████████▊ | 8823/10000 [34:36:43<4:32:07, 13.87s/it] {'loss': 0.0034, 'learning_rate': 5.940000000000001e-06, 'epoch': 11.55} 88%|████████▊ | 8823/10000 [34:36:43<4:32:07, 13.87s/it] 88%|████████▊ | 8824/10000 [34:36:57<4:32:08, 13.89s/it] {'loss': 0.0028, 'learning_rate': 5.935e-06, 'epoch': 11.55} 88%|████████▊ | 8824/10000 [34:36:57<4:32:08, 13.89s/it] 88%|████████▊ | 8825/10000 [34:37:11<4:31:42, 13.87s/it] {'loss': 0.0037, 'learning_rate': 5.93e-06, 'epoch': 11.55} 88%|████████▊ | 8825/10000 [34:37:11<4:31:42, 13.87s/it] 88%|████████▊ | 8826/10000 [34:37:24<4:31:19, 13.87s/it] {'loss': 0.0025, 'learning_rate': 5.925e-06, 'epoch': 11.55} 88%|████████▊ | 8826/10000 [34:37:24<4:31:19, 13.87s/it] 88%|████████▊ | 8827/10000 [34:37:38<4:31:16, 13.88s/it] {'loss': 0.0035, 'learning_rate': 5.920000000000001e-06, 'epoch': 11.55} 88%|████████▊ | 8827/10000 [34:37:38<4:31:16, 13.88s/it] 88%|████████▊ | 8828/10000 [34:37:52<4:31:13, 13.89s/it] {'loss': 0.0042, 'learning_rate': 5.915e-06, 'epoch': 11.55} 88%|████████▊ | 8828/10000 [34:37:52<4:31:13, 13.89s/it] 88%|████████▊ | 8829/10000 [34:38:06<4:31:06, 13.89s/it] {'loss': 0.0034, 'learning_rate': 5.91e-06, 'epoch': 11.56} 88%|████████▊ | 8829/10000 [34:38:06<4:31:06, 13.89s/it] 88%|████████▊ | 8830/10000 [34:38:20<4:30:43, 13.88s/it] {'loss': 0.0028, 'learning_rate': 5.905e-06, 'epoch': 11.56} 88%|████████▊ | 8830/10000 [34:38:20<4:30:43, 13.88s/it] 88%|████████▊ | 8831/10000 [34:38:34<4:31:00, 13.91s/it] {'loss': 0.0026, 'learning_rate': 5.9e-06, 'epoch': 11.56} 88%|████████▊ | 8831/10000 [34:38:34<4:31:00, 13.91s/it] 88%|████████▊ | 8832/10000 [34:38:48<4:30:41, 13.91s/it] {'loss': 0.0017, 'learning_rate': 5.895e-06, 'epoch': 11.56} 88%|████████▊ | 8832/10000 [34:38:48<4:30:41, 13.91s/it] 88%|████████▊ | 8833/10000 [34:39:02<4:31:03, 13.94s/it] {'loss': 0.002, 'learning_rate': 5.89e-06, 'epoch': 11.56} 88%|████████▊ | 8833/10000 [34:39:02<4:31:03, 13.94s/it] 88%|████████▊ | 8834/10000 
[34:39:16<4:30:22, 13.91s/it] {'loss': 0.0041, 'learning_rate': 5.885e-06, 'epoch': 11.56} 88%|████████▊ | 8834/10000 [34:39:16<4:30:22, 13.91s/it] 88%|████████▊ | 8835/10000 [34:39:30<4:30:52, 13.95s/it] {'loss': 0.0025, 'learning_rate': 5.8800000000000005e-06, 'epoch': 11.56} 88%|████████▊ | 8835/10000 [34:39:30<4:30:52, 13.95s/it] 88%|████████▊ | 8836/10000 [34:39:44<4:30:29, 13.94s/it] {'loss': 0.0019, 'learning_rate': 5.875e-06, 'epoch': 11.57} 88%|████████▊ | 8836/10000 [34:39:44<4:30:29, 13.94s/it] 88%|████████▊ | 8837/10000 [34:39:58<4:29:51, 13.92s/it] {'loss': 0.0026, 'learning_rate': 5.8700000000000005e-06, 'epoch': 11.57} 88%|████████▊ | 8837/10000 [34:39:58<4:29:51, 13.92s/it] 88%|████████▊ | 8838/10000 [34:40:11<4:29:46, 13.93s/it] {'loss': 0.0025, 'learning_rate': 5.865000000000001e-06, 'epoch': 11.57} 88%|████████▊ | 8838/10000 [34:40:12<4:29:46, 13.93s/it] 88%|████████▊ | 8839/10000 [34:40:25<4:28:59, 13.90s/it] {'loss': 0.0034, 'learning_rate': 5.86e-06, 'epoch': 11.57} 88%|████████▊ | 8839/10000 [34:40:25<4:28:59, 13.90s/it] 88%|████████▊ | 8840/10000 [34:40:39<4:28:37, 13.89s/it] {'loss': 0.0021, 'learning_rate': 5.855e-06, 'epoch': 11.57} 88%|████████▊ | 8840/10000 [34:40:39<4:28:37, 13.89s/it] 88%|████████▊ | 8841/10000 [34:40:53<4:29:03, 13.93s/it] {'loss': 0.0029, 'learning_rate': 5.850000000000001e-06, 'epoch': 11.57} 88%|████████▊ | 8841/10000 [34:40:53<4:29:03, 13.93s/it] 88%|████████▊ | 8842/10000 [34:41:07<4:28:41, 13.92s/it] {'loss': 0.002, 'learning_rate': 5.845000000000001e-06, 'epoch': 11.57} 88%|████████▊ | 8842/10000 [34:41:07<4:28:41, 13.92s/it] 88%|████████▊ | 8843/10000 [34:41:21<4:28:19, 13.92s/it] {'loss': 0.0018, 'learning_rate': 5.84e-06, 'epoch': 11.57} 88%|████████▊ | 8843/10000 [34:41:21<4:28:19, 13.92s/it] 88%|████████▊ | 8844/10000 [34:41:35<4:28:28, 13.93s/it] {'loss': 0.0014, 'learning_rate': 5.835e-06, 'epoch': 11.58} 88%|████████▊ | 8844/10000 [34:41:35<4:28:28, 13.93s/it] 88%|████████▊ | 8845/10000 
[34:41:49<4:27:11, 13.88s/it] {'loss': 0.0018, 'learning_rate': 5.83e-06, 'epoch': 11.58} 88%|████████▊ | 8845/10000 [34:41:49<4:27:11, 13.88s/it] 88%|████████▊ | 8846/10000 [34:42:03<4:27:02, 13.88s/it] {'loss': 0.0025, 'learning_rate': 5.825000000000001e-06, 'epoch': 11.58} 88%|████████▊ | 8846/10000 [34:42:03<4:27:02, 13.88s/it] 88%|████████▊ | 8847/10000 [34:42:17<4:27:00, 13.89s/it] {'loss': 0.003, 'learning_rate': 5.82e-06, 'epoch': 11.58} 88%|████████▊ | 8847/10000 [34:42:17<4:27:00, 13.89s/it] 88%|████████▊ | 8848/10000 [34:42:31<4:27:05, 13.91s/it] {'loss': 0.0018, 'learning_rate': 5.815e-06, 'epoch': 11.58} 88%|████████▊ | 8848/10000 [34:42:31<4:27:05, 13.91s/it] 88%|████████▊ | 8849/10000 [34:42:44<4:26:32, 13.89s/it] {'loss': 0.0029, 'learning_rate': 5.81e-06, 'epoch': 11.58} 88%|████████▊ | 8849/10000 [34:42:44<4:26:32, 13.89s/it] 88%|████████▊ | 8850/10000 [34:42:58<4:26:14, 13.89s/it] {'loss': 0.0032, 'learning_rate': 5.805e-06, 'epoch': 11.58} 88%|████████▊ | 8850/10000 [34:42:58<4:26:14, 13.89s/it] 89%|████████▊ | 8851/10000 [34:43:12<4:26:10, 13.90s/it] {'loss': 0.0035, 'learning_rate': 5.8e-06, 'epoch': 11.59} 89%|████████▊ | 8851/10000 [34:43:12<4:26:10, 13.90s/it] 89%|████████▊ | 8852/10000 [34:43:26<4:25:41, 13.89s/it] {'loss': 0.0019, 'learning_rate': 5.795e-06, 'epoch': 11.59} 89%|████████▊ | 8852/10000 [34:43:26<4:25:41, 13.89s/it] 89%|████████▊ | 8853/10000 [34:43:40<4:25:17, 13.88s/it] {'loss': 0.0047, 'learning_rate': 5.7900000000000005e-06, 'epoch': 11.59} 89%|████████▊ | 8853/10000 [34:43:40<4:25:17, 13.88s/it] 89%|████████▊ | 8854/10000 [34:43:54<4:25:33, 13.90s/it] {'loss': 0.0032, 'learning_rate': 5.7850000000000005e-06, 'epoch': 11.59} 89%|████████▊ | 8854/10000 [34:43:54<4:25:33, 13.90s/it] 89%|████████▊ | 8855/10000 [34:44:08<4:25:22, 13.91s/it] {'loss': 0.0014, 'learning_rate': 5.78e-06, 'epoch': 11.59} 89%|████████▊ | 8855/10000 [34:44:08<4:25:22, 13.91s/it] 89%|████████▊ | 8856/10000 [34:44:22<4:24:19, 13.86s/it] {'loss': 
0.0036, 'learning_rate': 5.775000000000001e-06, 'epoch': 11.59} 89%|████████▊ | 8856/10000 [34:44:22<4:24:19, 13.86s/it] 89%|████████▊ | 8857/10000 [34:44:35<4:24:00, 13.86s/it] {'loss': 0.0029, 'learning_rate': 5.770000000000001e-06, 'epoch': 11.59} 89%|████████▊ | 8857/10000 [34:44:35<4:24:00, 13.86s/it] 89%|████████▊ | 8858/10000 [34:44:49<4:24:16, 13.89s/it] {'loss': 0.0023, 'learning_rate': 5.765e-06, 'epoch': 11.59} 89%|████████▊ | 8858/10000 [34:44:49<4:24:16, 13.89s/it] 89%|████████▊ | 8859/10000 [34:45:03<4:23:47, 13.87s/it] {'loss': 0.0028, 'learning_rate': 5.76e-06, 'epoch': 11.6} 89%|████████▊ | 8859/10000 [34:45:03<4:23:47, 13.87s/it] 89%|████████▊ | 8860/10000 [34:45:17<4:23:37, 13.88s/it] {'loss': 0.0043, 'learning_rate': 5.755e-06, 'epoch': 11.6} 89%|████████▊ | 8860/10000 [34:45:17<4:23:37, 13.88s/it] 89%|████████▊ | 8861/10000 [34:45:31<4:23:09, 13.86s/it] {'loss': 0.0024, 'learning_rate': 5.750000000000001e-06, 'epoch': 11.6} 89%|████████▊ | 8861/10000 [34:45:31<4:23:09, 13.86s/it] 89%|████████▊ | 8862/10000 [34:45:45<4:22:22, 13.83s/it] {'loss': 0.0024, 'learning_rate': 5.745e-06, 'epoch': 11.6} 89%|████████▊ | 8862/10000 [34:45:45<4:22:22, 13.83s/it] 89%|████████▊ | 8863/10000 [34:45:59<4:22:33, 13.86s/it] {'loss': 0.0028, 'learning_rate': 5.74e-06, 'epoch': 11.6} 89%|████████▊ | 8863/10000 [34:45:59<4:22:33, 13.86s/it] 89%|████████▊ | 8864/10000 [34:46:12<4:22:49, 13.88s/it] {'loss': 0.0029, 'learning_rate': 5.735e-06, 'epoch': 11.6} 89%|████████▊ | 8864/10000 [34:46:13<4:22:49, 13.88s/it] 89%|████████▊ | 8865/10000 [34:46:26<4:22:39, 13.89s/it] {'loss': 0.0038, 'learning_rate': 5.73e-06, 'epoch': 11.6} 89%|████████▊ | 8865/10000 [34:46:26<4:22:39, 13.89s/it] 89%|████████▊ | 8866/10000 [34:46:40<4:22:19, 13.88s/it] {'loss': 0.0031, 'learning_rate': 5.725e-06, 'epoch': 11.6} 89%|████████▊ | 8866/10000 [34:46:40<4:22:19, 13.88s/it] 89%|████████▊ | 8867/10000 [34:46:54<4:22:33, 13.90s/it] {'loss': 0.0024, 'learning_rate': 5.72e-06, 'epoch': 
11.61} 89%|████████▊ | 8867/10000 [34:46:54<4:22:33, 13.90s/it] 89%|████████▊ | 8868/10000 [34:47:08<4:22:01, 13.89s/it] {'loss': 0.0025, 'learning_rate': 5.715e-06, 'epoch': 11.61} 89%|████████▊ | 8868/10000 [34:47:08<4:22:01, 13.89s/it] 89%|████████▊ | 8869/10000 [34:47:22<4:21:47, 13.89s/it] {'loss': 0.0025, 'learning_rate': 5.71e-06, 'epoch': 11.61} 89%|████████▊ | 8869/10000 [34:47:22<4:21:47, 13.89s/it] 89%|████████▊ | 8870/10000 [34:47:36<4:21:40, 13.89s/it] {'loss': 0.0022, 'learning_rate': 5.705e-06, 'epoch': 11.61} 89%|████████▊ | 8870/10000 [34:47:36<4:21:40, 13.89s/it] 89%|████████▊ | 8871/10000 [34:47:50<4:21:17, 13.89s/it] {'loss': 0.0059, 'learning_rate': 5.7000000000000005e-06, 'epoch': 11.61} 89%|████████▊ | 8871/10000 [34:47:50<4:21:17, 13.89s/it] 89%|████████▊ | 8872/10000 [34:48:04<4:22:27, 13.96s/it] {'loss': 0.0018, 'learning_rate': 5.6950000000000005e-06, 'epoch': 11.61} 89%|████████▊ | 8872/10000 [34:48:04<4:22:27, 13.96s/it] 89%|████████▊ | 8873/10000 [34:48:18<4:21:02, 13.90s/it] {'loss': 0.0028, 'learning_rate': 5.690000000000001e-06, 'epoch': 11.61} 89%|████████▊ | 8873/10000 [34:48:18<4:21:02, 13.90s/it] 89%|████████▊ | 8874/10000 [34:48:32<4:21:06, 13.91s/it] {'loss': 0.0028, 'learning_rate': 5.685e-06, 'epoch': 11.62} 89%|████████▊ | 8874/10000 [34:48:32<4:21:06, 13.91s/it] 89%|████████▉ | 8875/10000 [34:48:45<4:20:26, 13.89s/it] {'loss': 0.0026, 'learning_rate': 5.680000000000001e-06, 'epoch': 11.62} 89%|████████▉ | 8875/10000 [34:48:45<4:20:26, 13.89s/it] 89%|████████▉ | 8876/10000 [34:48:59<4:19:48, 13.87s/it] {'loss': 0.0024, 'learning_rate': 5.675000000000001e-06, 'epoch': 11.62} 89%|████████▉ | 8876/10000 [34:48:59<4:19:48, 13.87s/it] 89%|████████▉ | 8877/10000 [34:49:13<4:19:50, 13.88s/it] {'loss': 0.0026, 'learning_rate': 5.67e-06, 'epoch': 11.62} 89%|████████▉ | 8877/10000 [34:49:13<4:19:50, 13.88s/it] 89%|████████▉ | 8878/10000 [34:49:27<4:19:36, 13.88s/it] {'loss': 0.0022, 'learning_rate': 5.665e-06, 'epoch': 11.62} 
89%|████████▉ | 8878/10000 [34:49:27<4:19:36, 13.88s/it] 89%|████████▉ | 8879/10000 [34:49:41<4:19:14, 13.88s/it] {'loss': 0.0028, 'learning_rate': 5.66e-06, 'epoch': 11.62} 89%|████████▉ | 8879/10000 [34:49:41<4:19:14, 13.88s/it] 89%|████████▉ | 8880/10000 [34:49:55<4:18:21, 13.84s/it] {'loss': 0.0047, 'learning_rate': 5.655000000000001e-06, 'epoch': 11.62} 89%|████████▉ | 8880/10000 [34:49:55<4:18:21, 13.84s/it] 89%|████████▉ | 8881/10000 [34:50:09<4:18:46, 13.88s/it] {'loss': 0.0042, 'learning_rate': 5.65e-06, 'epoch': 11.62} 89%|████████▉ | 8881/10000 [34:50:09<4:18:46, 13.88s/it] 89%|████████▉ | 8882/10000 [34:50:23<4:18:44, 13.89s/it] {'loss': 0.0041, 'learning_rate': 5.645e-06, 'epoch': 11.63} 89%|████████▉ | 8882/10000 [34:50:23<4:18:44, 13.89s/it] 89%|████████▉ | 8883/10000 [34:50:36<4:18:45, 13.90s/it] {'loss': 0.0023, 'learning_rate': 5.64e-06, 'epoch': 11.63} 89%|████████▉ | 8883/10000 [34:50:36<4:18:45, 13.90s/it] 89%|████████▉ | 8884/10000 [34:50:50<4:18:14, 13.88s/it] {'loss': 0.0025, 'learning_rate': 5.635e-06, 'epoch': 11.63} 89%|████████▉ | 8884/10000 [34:50:50<4:18:14, 13.88s/it] 89%|████████▉ | 8885/10000 [34:51:04<4:17:55, 13.88s/it] {'loss': 0.0031, 'learning_rate': 5.63e-06, 'epoch': 11.63} 89%|████████▉ | 8885/10000 [34:51:04<4:17:55, 13.88s/it] 89%|████████▉ | 8886/10000 [34:51:18<4:18:07, 13.90s/it] {'loss': 0.003, 'learning_rate': 5.625e-06, 'epoch': 11.63} 89%|████████▉ | 8886/10000 [34:51:18<4:18:07, 13.90s/it] 89%|████████▉ | 8887/10000 [34:51:32<4:17:55, 13.90s/it] {'loss': 0.0024, 'learning_rate': 5.62e-06, 'epoch': 11.63} 89%|████████▉ | 8887/10000 [34:51:32<4:17:55, 13.90s/it] 89%|████████▉ | 8888/10000 [34:51:46<4:18:07, 13.93s/it] {'loss': 0.0025, 'learning_rate': 5.6150000000000005e-06, 'epoch': 11.63} 89%|████████▉ | 8888/10000 [34:51:46<4:18:07, 13.93s/it] 89%|████████▉ | 8889/10000 [34:52:00<4:17:37, 13.91s/it] {'loss': 0.0024, 'learning_rate': 5.61e-06, 'epoch': 11.63} 89%|████████▉ | 8889/10000 [34:52:00<4:17:37, 13.91s/it] 
89%|████████▉ | 8890/10000 [34:52:14<4:17:28, 13.92s/it] {'loss': 0.0026, 'learning_rate': 5.6050000000000005e-06, 'epoch': 11.64} 89%|████████▉ | 8890/10000 [34:52:14<4:17:28, 13.92s/it] 89%|████████▉ | 8891/10000 [34:52:28<4:16:56, 13.90s/it] {'loss': 0.0032, 'learning_rate': 5.600000000000001e-06, 'epoch': 11.64} 89%|████████▉ | 8891/10000 [34:52:28<4:16:56, 13.90s/it] 89%|████████▉ | 8892/10000 [34:52:42<4:17:09, 13.93s/it] {'loss': 0.0027, 'learning_rate': 5.595000000000001e-06, 'epoch': 11.64} 89%|████████▉ | 8892/10000 [34:52:42<4:17:09, 13.93s/it] 89%|████████▉ | 8893/10000 [34:52:56<4:16:34, 13.91s/it] {'loss': 0.0019, 'learning_rate': 5.59e-06, 'epoch': 11.64} 89%|████████▉ | 8893/10000 [34:52:56<4:16:34, 13.91s/it] 89%|████████▉ | 8894/10000 [34:53:09<4:15:58, 13.89s/it] {'loss': 0.0023, 'learning_rate': 5.585e-06, 'epoch': 11.64} 89%|████████▉ | 8894/10000 [34:53:09<4:15:58, 13.89s/it] 89%|████████▉ | 8895/10000 [34:53:23<4:15:47, 13.89s/it] {'loss': 0.0027, 'learning_rate': 5.580000000000001e-06, 'epoch': 11.64} 89%|████████▉ | 8895/10000 [34:53:23<4:15:47, 13.89s/it] 89%|████████▉ | 8896/10000 [34:53:37<4:15:08, 13.87s/it] {'loss': 0.0022, 'learning_rate': 5.575e-06, 'epoch': 11.64} 89%|████████▉ | 8896/10000 [34:53:37<4:15:08, 13.87s/it] 89%|████████▉ | 8897/10000 [34:53:51<4:15:14, 13.88s/it] {'loss': 0.002, 'learning_rate': 5.57e-06, 'epoch': 11.65} 89%|████████▉ | 8897/10000 [34:53:51<4:15:14, 13.88s/it] 89%|████████▉ | 8898/10000 [34:54:05<4:14:51, 13.88s/it] {'loss': 0.0051, 'learning_rate': 5.565e-06, 'epoch': 11.65} 89%|████████▉ | 8898/10000 [34:54:05<4:14:51, 13.88s/it] 89%|████████▉ | 8899/10000 [34:54:19<4:14:46, 13.88s/it] {'loss': 0.0036, 'learning_rate': 5.56e-06, 'epoch': 11.65} 89%|████████▉ | 8899/10000 [34:54:19<4:14:46, 13.88s/it] 89%|████████▉ | 8900/10000 [34:54:33<4:15:03, 13.91s/it] {'loss': 0.0031, 'learning_rate': 5.555e-06, 'epoch': 11.65} 89%|████████▉ | 8900/10000 [34:54:33<4:15:03, 13.91s/it] 89%|████████▉ | 8901/10000 
[34:54:47<4:14:10, 13.88s/it] {'loss': 0.0023, 'learning_rate': 5.55e-06, 'epoch': 11.65} 89%|████████▉ | 8901/10000 [34:54:47<4:14:10, 13.88s/it] 89%|████████▉ | 8902/10000 [34:55:01<4:15:10, 13.94s/it] {'loss': 0.002, 'learning_rate': 5.545e-06, 'epoch': 11.65} 89%|████████▉ | 8902/10000 [34:55:01<4:15:10, 13.94s/it] 89%|████████▉ | 8903/10000 [34:55:14<4:14:29, 13.92s/it] {'loss': 0.0019, 'learning_rate': 5.54e-06, 'epoch': 11.65} 89%|████████▉ | 8903/10000 [34:55:15<4:14:29, 13.92s/it] 89%|████████▉ | 8904/10000 [34:55:28<4:14:40, 13.94s/it] {'loss': 0.0014, 'learning_rate': 5.535e-06, 'epoch': 11.65} 89%|████████▉ | 8904/10000 [34:55:29<4:14:40, 13.94s/it] 89%|████████▉ | 8905/10000 [34:55:42<4:14:39, 13.95s/it] {'loss': 0.0067, 'learning_rate': 5.53e-06, 'epoch': 11.66} 89%|████████▉ | 8905/10000 [34:55:43<4:14:39, 13.95s/it] 89%|████████▉ | 8906/10000 [34:55:56<4:14:12, 13.94s/it] {'loss': 0.0024, 'learning_rate': 5.5250000000000005e-06, 'epoch': 11.66} 89%|████████▉ | 8906/10000 [34:55:56<4:14:12, 13.94s/it] 89%|████████▉ | 8907/10000 [34:56:10<4:14:09, 13.95s/it] {'loss': 0.0025, 'learning_rate': 5.5200000000000005e-06, 'epoch': 11.66} 89%|████████▉ | 8907/10000 [34:56:10<4:14:09, 13.95s/it] 89%|████████▉ | 8908/10000 [34:56:24<4:13:17, 13.92s/it] {'loss': 0.0023, 'learning_rate': 5.515e-06, 'epoch': 11.66} 89%|████████▉ | 8908/10000 [34:56:24<4:13:17, 13.92s/it] 89%|████████▉ | 8909/10000 [34:56:38<4:13:15, 13.93s/it] {'loss': 0.0025, 'learning_rate': 5.510000000000001e-06, 'epoch': 11.66} 89%|████████▉ | 8909/10000 [34:56:38<4:13:15, 13.93s/it] 89%|████████▉ | 8910/10000 [34:56:52<4:12:50, 13.92s/it] {'loss': 0.0021, 'learning_rate': 5.505000000000001e-06, 'epoch': 11.66} 89%|████████▉ | 8910/10000 [34:56:52<4:12:50, 13.92s/it] 89%|████████▉ | 8911/10000 [34:57:06<4:13:12, 13.95s/it] {'loss': 0.0032, 'learning_rate': 5.500000000000001e-06, 'epoch': 11.66} 89%|████████▉ | 8911/10000 [34:57:06<4:13:12, 13.95s/it] 89%|████████▉ | 8912/10000 
[34:57:20<4:11:59, 13.90s/it] {'loss': 0.0027, 'learning_rate': 5.495e-06, 'epoch': 11.66} 89%|████████▉ | 8912/10000 [34:57:20<4:11:59, 13.90s/it] 89%|████████▉ | 8913/10000 [34:57:34<4:11:34, 13.89s/it] {'loss': 0.0031, 'learning_rate': 5.49e-06, 'epoch': 11.67} 89%|████████▉ | 8913/10000 [34:57:34<4:11:34, 13.89s/it] 89%|████████▉ | 8914/10000 [34:57:47<4:10:50, 13.86s/it] {'loss': 0.0019, 'learning_rate': 5.485000000000001e-06, 'epoch': 11.67} 89%|████████▉ | 8914/10000 [34:57:48<4:10:50, 13.86s/it] 89%|████████▉ | 8915/10000 [34:58:01<4:10:55, 13.88s/it] {'loss': 0.0013, 'learning_rate': 5.48e-06, 'epoch': 11.67} 89%|████████▉ | 8915/10000 [34:58:01<4:10:55, 13.88s/it] 89%|████████▉ | 8916/10000 [34:58:15<4:10:52, 13.89s/it] {'loss': 0.0019, 'learning_rate': 5.475e-06, 'epoch': 11.67} 89%|████████▉ | 8916/10000 [34:58:15<4:10:52, 13.89s/it] 89%|████████▉ | 8917/10000 [34:58:29<4:09:44, 13.84s/it] {'loss': 0.0029, 'learning_rate': 5.47e-06, 'epoch': 11.67} 89%|████████▉ | 8917/10000 [34:58:29<4:09:44, 13.84s/it] 89%|████████▉ | 8918/10000 [34:58:43<4:10:20, 13.88s/it] {'loss': 0.0027, 'learning_rate': 5.465e-06, 'epoch': 11.67} 89%|████████▉ | 8918/10000 [34:58:43<4:10:20, 13.88s/it] 89%|████████▉ | 8919/10000 [34:58:57<4:10:13, 13.89s/it] {'loss': 0.0035, 'learning_rate': 5.46e-06, 'epoch': 11.67} 89%|████████▉ | 8919/10000 [34:58:57<4:10:13, 13.89s/it] 89%|████████▉ | 8920/10000 [34:59:11<4:09:48, 13.88s/it] {'loss': 0.0026, 'learning_rate': 5.455e-06, 'epoch': 11.68} 89%|████████▉ | 8920/10000 [34:59:11<4:09:48, 13.88s/it] 89%|████████▉ | 8921/10000 [34:59:25<4:09:02, 13.85s/it] {'loss': 0.0018, 'learning_rate': 5.45e-06, 'epoch': 11.68} 89%|████████▉ | 8921/10000 [34:59:25<4:09:02, 13.85s/it] 89%|████████▉ | 8922/10000 [34:59:38<4:08:52, 13.85s/it] {'loss': 0.0027, 'learning_rate': 5.445e-06, 'epoch': 11.68} 89%|████████▉ | 8922/10000 [34:59:38<4:08:52, 13.85s/it] 89%|████████▉ | 8923/10000 [34:59:52<4:09:22, 13.89s/it] {'loss': 0.0017, 'learning_rate': 
5.44e-06, 'epoch': 11.68} 89%|████████▉ | 8923/10000 [34:59:52<4:09:22, 13.89s/it] 89%|████████▉ | 8924/10000 [35:00:06<4:08:47, 13.87s/it] {'loss': 0.0024, 'learning_rate': 5.4350000000000005e-06, 'epoch': 11.68} 89%|████████▉ | 8924/10000 [35:00:06<4:08:47, 13.87s/it] 89%|████████▉ | 8925/10000 [35:00:20<4:08:40, 13.88s/it] {'loss': 0.0051, 'learning_rate': 5.4300000000000005e-06, 'epoch': 11.68} 89%|████████▉ | 8925/10000 [35:00:20<4:08:40, 13.88s/it] 89%|████████▉ | 8926/10000 [35:00:34<4:08:03, 13.86s/it] {'loss': 0.0021, 'learning_rate': 5.4250000000000006e-06, 'epoch': 11.68} 89%|████████▉ | 8926/10000 [35:00:34<4:08:03, 13.86s/it] 89%|████████▉ | 8927/10000 [35:00:48<4:08:45, 13.91s/it] {'loss': 0.0078, 'learning_rate': 5.42e-06, 'epoch': 11.68} 89%|████████▉ | 8927/10000 [35:00:48<4:08:45, 13.91s/it] 89%|████████▉ | 8928/10000 [35:01:02<4:08:54, 13.93s/it] {'loss': 0.0024, 'learning_rate': 5.415e-06, 'epoch': 11.69} 89%|████████▉ | 8928/10000 [35:01:02<4:08:54, 13.93s/it] 89%|████████▉ | 8929/10000 [35:01:16<4:08:33, 13.92s/it] {'loss': 0.002, 'learning_rate': 5.410000000000001e-06, 'epoch': 11.69} 89%|████████▉ | 8929/10000 [35:01:16<4:08:33, 13.92s/it] 89%|████████▉ | 8930/10000 [35:01:30<4:07:54, 13.90s/it] {'loss': 0.0027, 'learning_rate': 5.405e-06, 'epoch': 11.69} 89%|████████▉ | 8930/10000 [35:01:30<4:07:54, 13.90s/it] 89%|████████▉ | 8931/10000 [35:01:44<4:07:25, 13.89s/it] {'loss': 0.0019, 'learning_rate': 5.4e-06, 'epoch': 11.69} 89%|████████▉ | 8931/10000 [35:01:44<4:07:25, 13.89s/it] 89%|████████▉ | 8932/10000 [35:01:57<4:07:26, 13.90s/it] {'loss': 0.0026, 'learning_rate': 5.395e-06, 'epoch': 11.69} 89%|████████▉ | 8932/10000 [35:01:58<4:07:26, 13.90s/it] 89%|████████▉ | 8933/10000 [35:02:11<4:07:09, 13.90s/it] {'loss': 0.0029, 'learning_rate': 5.390000000000001e-06, 'epoch': 11.69} 89%|████████▉ | 8933/10000 [35:02:11<4:07:09, 13.90s/it] 89%|████████▉ | 8934/10000 [35:02:25<4:06:32, 13.88s/it] {'loss': 0.0038, 'learning_rate': 5.385e-06, 
'epoch': 11.69} 89%|████████▉ | 8934/10000 [35:02:25<4:06:32, 13.88s/it] 89%|████████▉ | 8935/10000 [35:02:39<4:06:32, 13.89s/it] {'loss': 0.0025, 'learning_rate': 5.38e-06, 'epoch': 11.7} 89%|████████▉ | 8935/10000 [35:02:39<4:06:32, 13.89s/it] 89%|████████▉ | 8936/10000 [35:02:53<4:06:28, 13.90s/it] {'loss': 0.0027, 'learning_rate': 5.375e-06, 'epoch': 11.7} 89%|████████▉ | 8936/10000 [35:02:53<4:06:28, 13.90s/it] 89%|████████▉ | 8937/10000 [35:03:07<4:05:52, 13.88s/it] {'loss': 0.0033, 'learning_rate': 5.37e-06, 'epoch': 11.7} 89%|████████▉ | 8937/10000 [35:03:07<4:05:52, 13.88s/it] 89%|████████▉ | 8938/10000 [35:03:21<4:05:43, 13.88s/it] {'loss': 0.0026, 'learning_rate': 5.365e-06, 'epoch': 11.7} 89%|████████▉ | 8938/10000 [35:03:21<4:05:43, 13.88s/it] 89%|████████▉ | 8939/10000 [35:03:35<4:05:35, 13.89s/it] {'loss': 0.0149, 'learning_rate': 5.36e-06, 'epoch': 11.7} 89%|████████▉ | 8939/10000 [35:03:35<4:05:35, 13.89s/it] 89%|████████▉ | 8940/10000 [35:03:49<4:05:07, 13.88s/it] {'loss': 0.002, 'learning_rate': 5.355e-06, 'epoch': 11.7} 89%|████████▉ | 8940/10000 [35:03:49<4:05:07, 13.88s/it] 89%|████████▉ | 8941/10000 [35:04:02<4:04:47, 13.87s/it] {'loss': 0.0017, 'learning_rate': 5.3500000000000004e-06, 'epoch': 11.7} 89%|████████▉ | 8941/10000 [35:04:02<4:04:47, 13.87s/it] 89%|████████▉ | 8942/10000 [35:04:16<4:04:33, 13.87s/it] {'loss': 0.0028, 'learning_rate': 5.345e-06, 'epoch': 11.7} 89%|████████▉ | 8942/10000 [35:04:16<4:04:33, 13.87s/it] 89%|████████▉ | 8943/10000 [35:04:30<4:04:41, 13.89s/it] {'loss': 0.0016, 'learning_rate': 5.3400000000000005e-06, 'epoch': 11.71} 89%|████████▉ | 8943/10000 [35:04:30<4:04:41, 13.89s/it] 89%|████████▉ | 8944/10000 [35:04:44<4:04:08, 13.87s/it] {'loss': 0.0022, 'learning_rate': 5.335000000000001e-06, 'epoch': 11.71} 89%|████████▉ | 8944/10000 [35:04:44<4:04:08, 13.87s/it] 89%|████████▉ | 8945/10000 [35:04:58<4:04:04, 13.88s/it] {'loss': 0.0026, 'learning_rate': 5.330000000000001e-06, 'epoch': 11.71} 89%|████████▉ | 
8945/10000 [35:04:58<4:04:04, 13.88s/it] 89%|████████▉ | 8946/10000 [35:05:12<4:03:45, 13.88s/it] {'loss': 0.0025, 'learning_rate': 5.325e-06, 'epoch': 11.71} 89%|████████▉ | 8946/10000 [35:05:12<4:03:45, 13.88s/it] 89%|████████▉ | 8947/10000 [35:05:26<4:04:02, 13.91s/it] {'loss': 0.0024, 'learning_rate': 5.32e-06, 'epoch': 11.71} 89%|████████▉ | 8947/10000 [35:05:26<4:04:02, 13.91s/it] 89%|████████▉ | 8948/10000 [35:05:40<4:03:24, 13.88s/it] {'loss': 0.0028, 'learning_rate': 5.315000000000001e-06, 'epoch': 11.71} 89%|████████▉ | 8948/10000 [35:05:40<4:03:24, 13.88s/it] 89%|████████▉ | 8949/10000 [35:05:53<4:02:58, 13.87s/it] {'loss': 0.0036, 'learning_rate': 5.31e-06, 'epoch': 11.71} 89%|████████▉ | 8949/10000 [35:05:53<4:02:58, 13.87s/it] 90%|████████▉ | 8950/10000 [35:06:07<4:02:33, 13.86s/it] {'loss': 0.0016, 'learning_rate': 5.305e-06, 'epoch': 11.71} 90%|████████▉ | 8950/10000 [35:06:07<4:02:33, 13.86s/it] 90%|████████▉ | 8951/10000 [35:06:21<4:02:34, 13.87s/it] {'loss': 0.004, 'learning_rate': 5.3e-06, 'epoch': 11.72} 90%|████████▉ | 8951/10000 [35:06:21<4:02:34, 13.87s/it] 90%|████████▉ | 8952/10000 [35:06:35<4:02:10, 13.86s/it] {'loss': 0.003, 'learning_rate': 5.295e-06, 'epoch': 11.72} 90%|████████▉ | 8952/10000 [35:06:35<4:02:10, 13.86s/it] 90%|████████▉ | 8953/10000 [35:06:49<4:02:04, 13.87s/it] {'loss': 0.0021, 'learning_rate': 5.29e-06, 'epoch': 11.72} 90%|████████▉ | 8953/10000 [35:06:49<4:02:04, 13.87s/it] 90%|████████▉ | 8954/10000 [35:07:03<4:01:54, 13.88s/it] {'loss': 0.0025, 'learning_rate': 5.285e-06, 'epoch': 11.72} 90%|████████▉ | 8954/10000 [35:07:03<4:01:54, 13.88s/it] 90%|████████▉ | 8955/10000 [35:07:17<4:02:11, 13.91s/it] {'loss': 0.0035, 'learning_rate': 5.28e-06, 'epoch': 11.72} 90%|████████▉ | 8955/10000 [35:07:17<4:02:11, 13.91s/it] 90%|████████▉ | 8956/10000 [35:07:31<4:01:47, 13.90s/it] {'loss': 0.003, 'learning_rate': 5.275e-06, 'epoch': 11.72} 90%|████████▉ | 8956/10000 [35:07:31<4:01:47, 13.90s/it] 90%|████████▉ | 8957/10000 
[35:07:45<4:01:29, 13.89s/it] {'loss': 0.0019, 'learning_rate': 5.2699999999999995e-06, 'epoch': 11.72} 90%|████████▉ | 8957/10000 [35:07:45<4:01:29, 13.89s/it] 90%|████████▉ | 8958/10000 [35:07:58<4:01:28, 13.90s/it] {'loss': 0.0037, 'learning_rate': 5.265e-06, 'epoch': 11.73} 90%|████████▉ | 8958/10000 [35:07:58<4:01:28, 13.90s/it] 90%|████████▉ | 8959/10000 [35:08:12<4:00:54, 13.89s/it] {'loss': 0.0032, 'learning_rate': 5.2600000000000005e-06, 'epoch': 11.73} 90%|████████▉ | 8959/10000 [35:08:12<4:00:54, 13.89s/it] 90%|████████▉ | 8960/10000 [35:08:26<4:00:03, 13.85s/it] {'loss': 0.002, 'learning_rate': 5.2550000000000005e-06, 'epoch': 11.73} 90%|████████▉ | 8960/10000 [35:08:26<4:00:03, 13.85s/it] 90%|████████▉ | 8961/10000 [35:08:40<3:59:42, 13.84s/it] {'loss': 0.0023, 'learning_rate': 5.25e-06, 'epoch': 11.73} 90%|████████▉ | 8961/10000 [35:08:40<3:59:42, 13.84s/it] 90%|████████▉ | 8962/10000 [35:08:54<3:59:45, 13.86s/it] {'loss': 0.0026, 'learning_rate': 5.245e-06, 'epoch': 11.73} 90%|████████▉ | 8962/10000 [35:08:54<3:59:45, 13.86s/it] 90%|████████▉ | 8963/10000 [35:09:08<3:59:57, 13.88s/it] {'loss': 0.0026, 'learning_rate': 5.240000000000001e-06, 'epoch': 11.73} 90%|████████▉ | 8963/10000 [35:09:08<3:59:57, 13.88s/it] 90%|████████▉ | 8964/10000 [35:09:22<3:59:40, 13.88s/it] {'loss': 0.0052, 'learning_rate': 5.235000000000001e-06, 'epoch': 11.73} 90%|████████▉ | 8964/10000 [35:09:22<3:59:40, 13.88s/it] 90%|████████▉ | 8965/10000 [35:09:36<3:59:57, 13.91s/it] {'loss': 0.0031, 'learning_rate': 5.23e-06, 'epoch': 11.73} 90%|████████▉ | 8965/10000 [35:09:36<3:59:57, 13.91s/it] 90%|████████▉ | 8966/10000 [35:09:49<3:59:15, 13.88s/it] {'loss': 0.0029, 'learning_rate': 5.225e-06, 'epoch': 11.74} 90%|████████▉ | 8966/10000 [35:09:49<3:59:15, 13.88s/it] 90%|████████▉ | 8967/10000 [35:10:03<3:58:58, 13.88s/it] {'loss': 0.0027, 'learning_rate': 5.220000000000001e-06, 'epoch': 11.74} 90%|████████▉ | 8967/10000 [35:10:03<3:58:58, 13.88s/it] 90%|████████▉ | 8968/10000 
[35:10:17<3:58:45, 13.88s/it] {'loss': 0.0027, 'learning_rate': 5.215e-06, 'epoch': 11.74} 90%|████████▉ | 8968/10000 [35:10:17<3:58:45, 13.88s/it] 90%|████████▉ | 8969/10000 [35:10:31<3:58:40, 13.89s/it] {'loss': 0.0022, 'learning_rate': 5.21e-06, 'epoch': 11.74} 90%|████████▉ | 8969/10000 [35:10:31<3:58:40, 13.89s/it] 90%|████████▉ | 8970/10000 [35:10:45<3:58:26, 13.89s/it] {'loss': 0.0031, 'learning_rate': 5.205e-06, 'epoch': 11.74} 90%|████████▉ | 8970/10000 [35:10:45<3:58:26, 13.89s/it] 90%|████████▉ | 8971/10000 [35:10:59<3:58:16, 13.89s/it] {'loss': 0.0018, 'learning_rate': 5.2e-06, 'epoch': 11.74} 90%|████████▉ | 8971/10000 [35:10:59<3:58:16, 13.89s/it] 90%|████████▉ | 8972/10000 [35:11:13<3:57:30, 13.86s/it] {'loss': 0.0035, 'learning_rate': 5.195e-06, 'epoch': 11.74} 90%|████████▉ | 8972/10000 [35:11:13<3:57:30, 13.86s/it] 90%|████████▉ | 8973/10000 [35:11:26<3:57:10, 13.86s/it] {'loss': 0.0022, 'learning_rate': 5.19e-06, 'epoch': 11.74} 90%|████████▉ | 8973/10000 [35:11:27<3:57:10, 13.86s/it] 90%|████████▉ | 8974/10000 [35:11:40<3:56:56, 13.86s/it] {'loss': 0.0025, 'learning_rate': 5.185e-06, 'epoch': 11.75} 90%|████████▉ | 8974/10000 [35:11:40<3:56:56, 13.86s/it] 90%|████████▉ | 8975/10000 [35:11:54<3:57:29, 13.90s/it] {'loss': 0.0016, 'learning_rate': 5.18e-06, 'epoch': 11.75} 90%|████████▉ | 8975/10000 [35:11:54<3:57:29, 13.90s/it] 90%|████████▉ | 8976/10000 [35:12:08<3:56:45, 13.87s/it] {'loss': 0.0019, 'learning_rate': 5.175e-06, 'epoch': 11.75} 90%|████████▉ | 8976/10000 [35:12:08<3:56:45, 13.87s/it] 90%|████████▉ | 8977/10000 [35:12:22<3:56:37, 13.88s/it] {'loss': 0.004, 'learning_rate': 5.1700000000000005e-06, 'epoch': 11.75} 90%|████████▉ | 8977/10000 [35:12:22<3:56:37, 13.88s/it] 90%|████████▉ | 8978/10000 [35:12:36<3:56:04, 13.86s/it] {'loss': 0.0031, 'learning_rate': 5.1650000000000005e-06, 'epoch': 11.75} 90%|████████▉ | 8978/10000 [35:12:36<3:56:04, 13.86s/it] 90%|████████▉ | 8979/10000 [35:12:50<3:55:54, 13.86s/it] {'loss': 0.0031, 
'learning_rate': 5.1600000000000006e-06, 'epoch': 11.75} 90%|████████▉ | 8979/10000 [35:12:50<3:55:54, 13.86s/it] 90%|████████▉ | 8980/10000 [35:13:04<3:55:22, 13.85s/it] {'loss': 0.0034, 'learning_rate': 5.155e-06, 'epoch': 11.75} 90%|████████▉ | 8980/10000 [35:13:04<3:55:22, 13.85s/it] 90%|████████▉ | 8981/10000 [35:13:17<3:55:14, 13.85s/it] {'loss': 0.0015, 'learning_rate': 5.15e-06, 'epoch': 11.76} 90%|████████▉ | 8981/10000 [35:13:17<3:55:14, 13.85s/it] 90%|████████▉ | 8982/10000 [35:13:31<3:54:56, 13.85s/it] {'loss': 0.0025, 'learning_rate': 5.145000000000001e-06, 'epoch': 11.76} 90%|████████▉ | 8982/10000 [35:13:31<3:54:56, 13.85s/it] 90%|████████▉ | 8983/10000 [35:13:45<3:55:07, 13.87s/it] {'loss': 0.0051, 'learning_rate': 5.140000000000001e-06, 'epoch': 11.76} 90%|████████▉ | 8983/10000 [35:13:45<3:55:07, 13.87s/it] 90%|████████▉ | 8984/10000 [35:13:59<3:54:24, 13.84s/it] {'loss': 0.0047, 'learning_rate': 5.135e-06, 'epoch': 11.76} 90%|████████▉ | 8984/10000 [35:13:59<3:54:24, 13.84s/it] 90%|████████▉ | 8985/10000 [35:14:13<3:55:24, 13.92s/it] {'loss': 0.002, 'learning_rate': 5.13e-06, 'epoch': 11.76} 90%|████████▉ | 8985/10000 [35:14:13<3:55:24, 13.92s/it] 90%|████████▉ | 8986/10000 [35:14:27<3:55:07, 13.91s/it] {'loss': 0.0025, 'learning_rate': 5.125e-06, 'epoch': 11.76} 90%|████████▉ | 8986/10000 [35:14:27<3:55:07, 13.91s/it] 90%|████████▉ | 8987/10000 [35:14:41<3:55:04, 13.92s/it] {'loss': 0.0038, 'learning_rate': 5.12e-06, 'epoch': 11.76} 90%|████████▉ | 8987/10000 [35:14:41<3:55:04, 13.92s/it] 90%|████████▉ | 8988/10000 [35:14:55<3:54:49, 13.92s/it] {'loss': 0.0028, 'learning_rate': 5.115e-06, 'epoch': 11.76} 90%|████████▉ | 8988/10000 [35:14:55<3:54:49, 13.92s/it] 90%|████████▉ | 8989/10000 [35:15:09<3:54:10, 13.90s/it] {'loss': 0.0037, 'learning_rate': 5.11e-06, 'epoch': 11.77} 90%|████████▉ | 8989/10000 [35:15:09<3:54:10, 13.90s/it] 90%|████████▉ | 8990/10000 [35:15:23<3:54:16, 13.92s/it] {'loss': 0.002, 'learning_rate': 5.105e-06, 'epoch': 11.77} 
90%|████████▉ | 8990/10000 [35:15:23<3:54:16, 13.92s/it] 90%|████████▉ | 8991/10000 [35:15:36<3:53:43, 13.90s/it] {'loss': 0.0014, 'learning_rate': 5.1e-06, 'epoch': 11.77} 90%|████████▉ | 8991/10000 [35:15:37<3:53:43, 13.90s/it] 90%|████████▉ | 8992/10000 [35:15:50<3:53:55, 13.92s/it] {'loss': 0.0051, 'learning_rate': 5.095e-06, 'epoch': 11.77} 90%|████████▉ | 8992/10000 [35:15:50<3:53:55, 13.92s/it] 90%|████████▉ | 8993/10000 [35:16:04<3:53:36, 13.92s/it] {'loss': 0.0024, 'learning_rate': 5.09e-06, 'epoch': 11.77} 90%|████████▉ | 8993/10000 [35:16:04<3:53:36, 13.92s/it] 90%|████████▉ | 8994/10000 [35:16:18<3:53:06, 13.90s/it] {'loss': 0.0028, 'learning_rate': 5.0850000000000004e-06, 'epoch': 11.77} 90%|████████▉ | 8994/10000 [35:16:18<3:53:06, 13.90s/it] 90%|████████▉ | 8995/10000 [35:16:32<3:52:50, 13.90s/it] {'loss': 0.0024, 'learning_rate': 5.08e-06, 'epoch': 11.77} 90%|████████▉ | 8995/10000 [35:16:32<3:52:50, 13.90s/it] 90%|████████▉ | 8996/10000 [35:16:46<3:52:24, 13.89s/it] {'loss': 0.0026, 'learning_rate': 5.0750000000000005e-06, 'epoch': 11.77} 90%|████████▉ | 8996/10000 [35:16:46<3:52:24, 13.89s/it] 90%|████████▉ | 8997/10000 [35:17:00<3:52:19, 13.90s/it] {'loss': 0.0022, 'learning_rate': 5.070000000000001e-06, 'epoch': 11.78} 90%|████████▉ | 8997/10000 [35:17:00<3:52:19, 13.90s/it] 90%|████████▉ | 8998/10000 [35:17:14<3:52:02, 13.90s/it] {'loss': 0.0016, 'learning_rate': 5.065000000000001e-06, 'epoch': 11.78} 90%|████████▉ | 8998/10000 [35:17:14<3:52:02, 13.90s/it] 90%|████████▉ | 8999/10000 [35:17:27<3:50:40, 13.83s/it] {'loss': 0.004, 'learning_rate': 5.06e-06, 'epoch': 11.78} 90%|████████▉ | 8999/10000 [35:17:28<3:50:40, 13.83s/it] 90%|█████████ | 9000/10000 [35:17:41<3:50:51, 13.85s/it] {'loss': 0.0029, 'learning_rate': 5.055e-06, 'epoch': 11.78} 90%|█████████ | 9000/10000 [35:17:41<3:50:51, 13.85s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-05 07:35:49,686 >> Configuration saved in 
output/echo28-20241103-201128-1e-4/checkpoint-9000/config.json
[INFO|configuration_utils.py:364] 2024-11-05 07:35:49,688 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-9000/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-05 07:36:37,110 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-9000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-05 07:36:37,112 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-9000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-05 07:36:37,113 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-9000/special_tokens_map.json
[2024-11-05 07:36:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is about to be saved!
[2024-11-05 07:36:37,161] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt
[2024-11-05 07:36:37,161] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt...
[2024-11-05 07:37:24,437] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/mp_rank_00_model_states.pt.
[2024-11-05 07:37:24,559] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-05 07:37:59,777] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-05 07:38:00,031] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-9000/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-11-05 07:38:00,031] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step9000 is ready now! 90%|█████████ | 9001/10000 [35:21:23<21:06:34, 76.07s/it] {'loss': 0.0027, 'learning_rate': 5.050000000000001e-06, 'epoch': 11.78} 90%|█████████ | 9001/10000 [35:21:23<21:06:34, 76.07s/it] 90%|█████████ | 9002/10000 [35:21:36<15:54:28, 57.38s/it] {'loss': 0.0032, 'learning_rate': 5.045000000000001e-06, 'epoch': 11.78} 90%|█████████ | 9002/10000 [35:21:36<15:54:28, 57.38s/it] 90%|█████████ | 9003/10000 [35:21:50<12:15:53, 44.29s/it] {'loss': 0.003, 'learning_rate': 5.04e-06, 'epoch': 11.78} 90%|█████████ | 9003/10000 [35:21:50<12:15:53, 44.29s/it] 90%|█████████ | 9004/10000 [35:22:04<9:43:26, 35.15s/it] {'loss': 0.0026, 'learning_rate': 5.035e-06, 'epoch': 11.79} 90%|█████████ | 9004/10000 [35:22:04<9:43:26, 35.15s/it] 90%|█████████ | 9005/10000 [35:22:18<7:56:38, 28.74s/it] {'loss': 0.0036, 'learning_rate': 5.03e-06, 'epoch': 11.79} 90%|█████████ | 9005/10000 [35:22:18<7:56:38, 28.74s/it] 90%|█████████ | 9006/10000 [35:22:32<6:42:12, 24.28s/it] {'loss': 0.0018, 'learning_rate': 5.025e-06, 'epoch': 11.79} 90%|█████████ | 9006/10000 [35:22:32<6:42:12, 24.28s/it] 90%|█████████ | 9007/10000 [35:22:46<5:50:32, 21.18s/it] {'loss': 0.0022, 'learning_rate': 5.02e-06, 'epoch': 11.79} 90%|█████████ | 9007/10000 [35:22:46<5:50:32, 21.18s/it] 90%|█████████ | 9008/10000 [35:22:59<5:14:12, 19.00s/it] {'loss': 0.0033, 'learning_rate': 5.015e-06, 'epoch': 11.79} 90%|█████████ | 9008/10000 [35:23:00<5:14:12, 19.00s/it] 90%|█████████ | 9009/10000 [35:23:13<4:48:54, 17.49s/it] {'loss': 0.0032, 'learning_rate': 5.01e-06, 'epoch': 11.79} 90%|█████████ | 9009/10000 [35:23:13<4:48:54, 17.49s/it] 90%|█████████ | 9010/10000 [35:23:27<4:30:59, 16.42s/it] {'loss': 0.0021, 
'learning_rate': 5.005e-06, 'epoch': 11.79} 90%|█████████ | 9010/10000 [35:23:27<4:30:59, 16.42s/it] 90%|█████████ | 9011/10000 [35:23:41<4:18:20, 15.67s/it] {'loss': 0.0031, 'learning_rate': 5e-06, 'epoch': 11.79} 90%|█████████ | 9011/10000 [35:23:41<4:18:20, 15.67s/it] 90%|█████████ | 9012/10000 [35:23:55<4:10:16, 15.20s/it] {'loss': 0.0034, 'learning_rate': 4.9950000000000005e-06, 'epoch': 11.8} 90%|█████████ | 9012/10000 [35:23:55<4:10:16, 15.20s/it] 90%|█████████ | 9013/10000 [35:24:09<4:03:16, 14.79s/it] {'loss': 0.0031, 'learning_rate': 4.9900000000000005e-06, 'epoch': 11.8} 90%|█████████ | 9013/10000 [35:24:09<4:03:16, 14.79s/it] 90%|█████████ | 9014/10000 [35:24:23<3:58:38, 14.52s/it] {'loss': 0.0032, 'learning_rate': 4.985e-06, 'epoch': 11.8} 90%|█████████ | 9014/10000 [35:24:23<3:58:38, 14.52s/it] 90%|█████████ | 9015/10000 [35:24:37<3:55:45, 14.36s/it] {'loss': 0.0026, 'learning_rate': 4.98e-06, 'epoch': 11.8} 90%|█████████ | 9015/10000 [35:24:37<3:55:45, 14.36s/it] 90%|█████████ | 9016/10000 [35:24:51<3:53:23, 14.23s/it] {'loss': 0.002, 'learning_rate': 4.975000000000001e-06, 'epoch': 11.8} 90%|█████████ | 9016/10000 [35:24:51<3:53:23, 14.23s/it] 90%|█████████ | 9017/10000 [35:25:05<3:51:58, 14.16s/it] {'loss': 0.0016, 'learning_rate': 4.970000000000001e-06, 'epoch': 11.8} 90%|█████████ | 9017/10000 [35:25:05<3:51:58, 14.16s/it] 90%|█████████ | 9018/10000 [35:25:19<3:50:37, 14.09s/it] {'loss': 0.0025, 'learning_rate': 4.965e-06, 'epoch': 11.8} 90%|█████████ | 9018/10000 [35:25:19<3:50:37, 14.09s/it] 90%|█████████ | 9019/10000 [35:25:33<3:49:29, 14.04s/it] {'loss': 0.0039, 'learning_rate': 4.96e-06, 'epoch': 11.8} 90%|█████████ | 9019/10000 [35:25:33<3:49:29, 14.04s/it] 90%|█████████ | 9020/10000 [35:25:47<3:48:31, 13.99s/it] {'loss': 0.0021, 'learning_rate': 4.955e-06, 'epoch': 11.81} 90%|█████████ | 9020/10000 [35:25:47<3:48:31, 13.99s/it] 90%|█████████ | 9021/10000 [35:26:01<3:47:28, 13.94s/it] {'loss': 0.0027, 'learning_rate': 4.950000000000001e-06, 
'epoch': 11.81} 90%|█████████ | 9021/10000 [35:26:01<3:47:28, 13.94s/it] 90%|█████████ | 9022/10000 [35:26:15<3:47:11, 13.94s/it] {'loss': 0.0033, 'learning_rate': 4.945e-06, 'epoch': 11.81} 90%|█████████ | 9022/10000 [35:26:15<3:47:11, 13.94s/it] 90%|█████████ | 9023/10000 [35:26:28<3:46:48, 13.93s/it] {'loss': 0.0033, 'learning_rate': 4.94e-06, 'epoch': 11.81} 90%|█████████ | 9023/10000 [35:26:28<3:46:48, 13.93s/it] 90%|█████████ | 9024/10000 [35:26:42<3:45:58, 13.89s/it] {'loss': 0.0029, 'learning_rate': 4.935e-06, 'epoch': 11.81} 90%|█████████ | 9024/10000 [35:26:42<3:45:58, 13.89s/it] 90%|█████████ | 9025/10000 [35:26:56<3:46:39, 13.95s/it] {'loss': 0.003, 'learning_rate': 4.93e-06, 'epoch': 11.81} 90%|█████████ | 9025/10000 [35:26:56<3:46:39, 13.95s/it] 90%|█████████ | 9026/10000 [35:27:10<3:46:14, 13.94s/it] {'loss': 0.0028, 'learning_rate': 4.925e-06, 'epoch': 11.81} 90%|█████████ | 9026/10000 [35:27:10<3:46:14, 13.94s/it] 90%|█████████ | 9027/10000 [35:27:24<3:45:45, 13.92s/it] {'loss': 0.0036, 'learning_rate': 4.92e-06, 'epoch': 11.82} 90%|█████████ | 9027/10000 [35:27:24<3:45:45, 13.92s/it] 90%|█████████ | 9028/10000 [35:27:38<3:46:08, 13.96s/it] {'loss': 0.002, 'learning_rate': 4.915e-06, 'epoch': 11.82} 90%|█████████ | 9028/10000 [35:27:38<3:46:08, 13.96s/it] 90%|█████████ | 9029/10000 [35:27:52<3:46:18, 13.98s/it] {'loss': 0.0028, 'learning_rate': 4.9100000000000004e-06, 'epoch': 11.82} 90%|█████████ | 9029/10000 [35:27:52<3:46:18, 13.98s/it] 90%|█████████ | 9030/10000 [35:28:06<3:45:37, 13.96s/it] {'loss': 0.0023, 'learning_rate': 4.9050000000000005e-06, 'epoch': 11.82} 90%|█████████ | 9030/10000 [35:28:06<3:45:37, 13.96s/it] 90%|█████████ | 9031/10000 [35:28:20<3:44:55, 13.93s/it] {'loss': 0.0031, 'learning_rate': 4.9000000000000005e-06, 'epoch': 11.82} 90%|█████████ | 9031/10000 [35:28:20<3:44:55, 13.93s/it] 90%|█████████ | 9032/10000 [35:28:34<3:44:28, 13.91s/it] {'loss': 0.0019, 'learning_rate': 4.8950000000000006e-06, 'epoch': 11.82} 
90%|█████████ | 9032/10000 [35:28:34<3:44:28, 13.91s/it] 90%|█████████ | 9033/10000 [35:28:48<3:44:14, 13.91s/it] {'loss': 0.0029, 'learning_rate': 4.89e-06, 'epoch': 11.82} 90%|█████████ | 9033/10000 [35:28:48<3:44:14, 13.91s/it] 90%|█████████ | 9034/10000 [35:29:02<3:43:58, 13.91s/it] {'loss': 0.0017, 'learning_rate': 4.885e-06, 'epoch': 11.82} 90%|█████████ | 9034/10000 [35:29:02<3:43:58, 13.91s/it] 90%|█████████ | 9035/10000 [35:29:16<3:43:36, 13.90s/it] {'loss': 0.0028, 'learning_rate': 4.880000000000001e-06, 'epoch': 11.83} 90%|█████████ | 9035/10000 [35:29:16<3:43:36, 13.90s/it] 90%|█████████ | 9036/10000 [35:29:29<3:43:09, 13.89s/it] {'loss': 0.0035, 'learning_rate': 4.875000000000001e-06, 'epoch': 11.83} 90%|█████████ | 9036/10000 [35:29:29<3:43:09, 13.89s/it] 90%|█████████ | 9037/10000 [35:29:43<3:43:04, 13.90s/it] {'loss': 0.0025, 'learning_rate': 4.87e-06, 'epoch': 11.83} 90%|█████████ | 9037/10000 [35:29:43<3:43:04, 13.90s/it] 90%|█████████ | 9038/10000 [35:29:57<3:42:53, 13.90s/it] {'loss': 0.0031, 'learning_rate': 4.865e-06, 'epoch': 11.83} 90%|█████████ | 9038/10000 [35:29:57<3:42:53, 13.90s/it] 90%|█████████ | 9039/10000 [35:30:11<3:42:10, 13.87s/it] {'loss': 0.0024, 'learning_rate': 4.86e-06, 'epoch': 11.83} 90%|█████████ | 9039/10000 [35:30:11<3:42:10, 13.87s/it] 90%|█████████ | 9040/10000 [35:30:25<3:42:38, 13.91s/it] {'loss': 0.0024, 'learning_rate': 4.855e-06, 'epoch': 11.83} 90%|█████████ | 9040/10000 [35:30:25<3:42:38, 13.91s/it] 90%|█████████ | 9041/10000 [35:30:39<3:41:54, 13.88s/it] {'loss': 0.0031, 'learning_rate': 4.85e-06, 'epoch': 11.83} 90%|█████████ | 9041/10000 [35:30:39<3:41:54, 13.88s/it] 90%|█████████ | 9042/10000 [35:30:53<3:41:44, 13.89s/it] {'loss': 0.0031, 'learning_rate': 4.845e-06, 'epoch': 11.84} 90%|█████████ | 9042/10000 [35:30:53<3:41:44, 13.89s/it] 90%|█████████ | 9043/10000 [35:31:07<3:41:22, 13.88s/it] {'loss': 0.0021, 'learning_rate': 4.84e-06, 'epoch': 11.84} 90%|█████████ | 9043/10000 [35:31:07<3:41:22, 
13.88s/it] 90%|█████████ | 9044/10000 [35:31:20<3:40:50, 13.86s/it] {'loss': 0.003, 'learning_rate': 4.835e-06, 'epoch': 11.84} 90%|█████████ | 9044/10000 [35:31:20<3:40:50, 13.86s/it] 90%|█████████ | 9045/10000 [35:31:34<3:41:09, 13.89s/it] {'loss': 0.0029, 'learning_rate': 4.83e-06, 'epoch': 11.84} 90%|█████████ | 9045/10000 [35:31:34<3:41:09, 13.89s/it] 90%|█████████ | 9046/10000 [35:31:48<3:41:08, 13.91s/it] {'loss': 0.0033, 'learning_rate': 4.825e-06, 'epoch': 11.84} 90%|█████████ | 9046/10000 [35:31:48<3:41:08, 13.91s/it] 90%|█████████ | 9047/10000 [35:32:02<3:40:50, 13.90s/it] {'loss': 0.0024, 'learning_rate': 4.8200000000000004e-06, 'epoch': 11.84} 90%|█████████ | 9047/10000 [35:32:02<3:40:50, 13.90s/it] 90%|█████████ | 9048/10000 [35:32:16<3:40:31, 13.90s/it] {'loss': 0.0034, 'learning_rate': 4.8150000000000005e-06, 'epoch': 11.84} 90%|█████████ | 9048/10000 [35:32:16<3:40:31, 13.90s/it] 90%|█████████ | 9049/10000 [35:32:30<3:40:23, 13.91s/it] {'loss': 0.0029, 'learning_rate': 4.81e-06, 'epoch': 11.84} 90%|█████████ | 9049/10000 [35:32:30<3:40:23, 13.91s/it] 90%|█████████ | 9050/10000 [35:32:44<3:40:11, 13.91s/it] {'loss': 0.0018, 'learning_rate': 4.805000000000001e-06, 'epoch': 11.85} 90%|█████████ | 9050/10000 [35:32:44<3:40:11, 13.91s/it] 91%|█████████ | 9051/10000 [35:32:58<3:40:25, 13.94s/it] {'loss': 0.0029, 'learning_rate': 4.800000000000001e-06, 'epoch': 11.85} 91%|█████████ | 9051/10000 [35:32:58<3:40:25, 13.94s/it] 91%|█████████ | 9052/10000 [35:33:12<3:39:42, 13.91s/it] {'loss': 0.0022, 'learning_rate': 4.795e-06, 'epoch': 11.85} 91%|█████████ | 9052/10000 [35:33:12<3:39:42, 13.91s/it] 91%|█████████ | 9053/10000 [35:33:26<3:39:39, 13.92s/it] {'loss': 0.003, 'learning_rate': 4.79e-06, 'epoch': 11.85} 91%|█████████ | 9053/10000 [35:33:26<3:39:39, 13.92s/it] 91%|█████████ | 9054/10000 [35:33:40<3:39:42, 13.94s/it] {'loss': 0.004, 'learning_rate': 4.785e-06, 'epoch': 11.85} 91%|█████████ | 9054/10000 [35:33:40<3:39:42, 13.94s/it] 91%|█████████ | 
9055/10000 [35:33:54<3:39:19, 13.93s/it] {'loss': 0.0022, 'learning_rate': 4.780000000000001e-06, 'epoch': 11.85} 91%|█████████ | 9055/10000 [35:33:54<3:39:19, 13.93s/it] 91%|█████████ | 9056/10000 [35:34:08<3:39:01, 13.92s/it] {'loss': 0.0023, 'learning_rate': 4.775e-06, 'epoch': 11.85} 91%|█████████ | 9056/10000 [35:34:08<3:39:01, 13.92s/it] 91%|█████████ | 9057/10000 [35:34:21<3:38:51, 13.93s/it] {'loss': 0.0019, 'learning_rate': 4.77e-06, 'epoch': 11.85} 91%|█████████ | 9057/10000 [35:34:21<3:38:51, 13.93s/it] 91%|█████████ | 9058/10000 [35:34:35<3:38:48, 13.94s/it] {'loss': 0.0026, 'learning_rate': 4.765e-06, 'epoch': 11.86} 91%|█████████ | 9058/10000 [35:34:35<3:38:48, 13.94s/it] 91%|█████████ | 9059/10000 [35:34:49<3:38:29, 13.93s/it] {'loss': 0.0034, 'learning_rate': 4.76e-06, 'epoch': 11.86} 91%|█████████ | 9059/10000 [35:34:49<3:38:29, 13.93s/it] 91%|█████████ | 9060/10000 [35:35:03<3:38:31, 13.95s/it] {'loss': 0.0028, 'learning_rate': 4.755e-06, 'epoch': 11.86} 91%|█████████ | 9060/10000 [35:35:03<3:38:31, 13.95s/it] 91%|█████████ | 9061/10000 [35:35:17<3:38:00, 13.93s/it] {'loss': 0.0031, 'learning_rate': 4.75e-06, 'epoch': 11.86} 91%|█████████ | 9061/10000 [35:35:17<3:38:00, 13.93s/it] 91%|█████████ | 9062/10000 [35:35:31<3:38:05, 13.95s/it] {'loss': 0.0031, 'learning_rate': 4.745e-06, 'epoch': 11.86} 91%|█████████ | 9062/10000 [35:35:31<3:38:05, 13.95s/it] 91%|█████████ | 9063/10000 [35:35:45<3:37:46, 13.95s/it] {'loss': 0.0027, 'learning_rate': 4.74e-06, 'epoch': 11.86} 91%|█████████ | 9063/10000 [35:35:45<3:37:46, 13.95s/it] 91%|█████████ | 9064/10000 [35:35:59<3:37:12, 13.92s/it] {'loss': 0.0028, 'learning_rate': 4.735e-06, 'epoch': 11.86} 91%|█████████ | 9064/10000 [35:35:59<3:37:12, 13.92s/it] 91%|█████████ | 9065/10000 [35:36:13<3:36:31, 13.89s/it] {'loss': 0.0021, 'learning_rate': 4.7300000000000005e-06, 'epoch': 11.87} 91%|█████████ | 9065/10000 [35:36:13<3:36:31, 13.89s/it] 91%|█████████ | 9066/10000 [35:36:27<3:36:19, 13.90s/it] {'loss': 
0.0034, 'learning_rate': 4.7250000000000005e-06, 'epoch': 11.87} 91%|█████████ | 9066/10000 [35:36:27<3:36:19, 13.90s/it] 91%|█████████ | 9067/10000 [35:36:41<3:36:25, 13.92s/it] {'loss': 0.0016, 'learning_rate': 4.72e-06, 'epoch': 11.87} 91%|█████████ | 9067/10000 [35:36:41<3:36:25, 13.92s/it] 91%|█████████ | 9068/10000 [35:36:55<3:36:10, 13.92s/it] {'loss': 0.0029, 'learning_rate': 4.715e-06, 'epoch': 11.87} 91%|█████████ | 9068/10000 [35:36:55<3:36:10, 13.92s/it] 91%|█████████ | 9069/10000 [35:37:09<3:35:55, 13.92s/it] {'loss': 0.0021, 'learning_rate': 4.710000000000001e-06, 'epoch': 11.87} 91%|█████████ | 9069/10000 [35:37:09<3:35:55, 13.92s/it] 91%|█████████ | 9070/10000 [35:37:23<3:36:25, 13.96s/it] {'loss': 0.0031, 'learning_rate': 4.705000000000001e-06, 'epoch': 11.87} 91%|█████████ | 9070/10000 [35:37:23<3:36:25, 13.96s/it] 91%|█████████ | 9071/10000 [35:37:37<3:36:00, 13.95s/it] {'loss': 0.0019, 'learning_rate': 4.7e-06, 'epoch': 11.87} 91%|█████████ | 9071/10000 [35:37:37<3:36:00, 13.95s/it] 91%|█████████ | 9072/10000 [35:37:51<3:36:17, 13.98s/it] {'loss': 0.0023, 'learning_rate': 4.695e-06, 'epoch': 11.87} 91%|█████████ | 9072/10000 [35:37:51<3:36:17, 13.98s/it] 91%|█████████ | 9073/10000 [35:38:04<3:35:36, 13.96s/it] {'loss': 0.0031, 'learning_rate': 4.69e-06, 'epoch': 11.88} 91%|█████████ | 9073/10000 [35:38:05<3:35:36, 13.96s/it] 91%|█████████ | 9074/10000 [35:38:18<3:35:00, 13.93s/it] {'loss': 0.0031, 'learning_rate': 4.685000000000001e-06, 'epoch': 11.88} 91%|█████████ | 9074/10000 [35:38:18<3:35:00, 13.93s/it] 91%|█████████ | 9075/10000 [35:38:32<3:34:32, 13.92s/it] {'loss': 0.0025, 'learning_rate': 4.68e-06, 'epoch': 11.88} 91%|█████████ | 9075/10000 [35:38:32<3:34:32, 13.92s/it] 91%|█████████ | 9076/10000 [35:38:46<3:34:17, 13.92s/it] {'loss': 0.0029, 'learning_rate': 4.675e-06, 'epoch': 11.88} 91%|█████████ | 9076/10000 [35:38:46<3:34:17, 13.92s/it] 91%|█████████ | 9077/10000 [35:39:00<3:33:53, 13.90s/it] {'loss': 0.0036, 'learning_rate': 
4.67e-06, 'epoch': 11.88} 91%|█████████ | 9077/10000 [35:39:00<3:33:53, 13.90s/it] 91%|█████████ | 9078/10000 [35:39:14<3:33:46, 13.91s/it] {'loss': 0.0013, 'learning_rate': 4.665e-06, 'epoch': 11.88} 91%|█████████ | 9078/10000 [35:39:14<3:33:46, 13.91s/it] 91%|█████████ | 9079/10000 [35:39:28<3:33:05, 13.88s/it] {'loss': 0.0016, 'learning_rate': 4.66e-06, 'epoch': 11.88} 91%|█████████ | 9079/10000 [35:39:28<3:33:05, 13.88s/it] 91%|█████████ | 9080/10000 [35:39:42<3:32:38, 13.87s/it] {'loss': 0.0023, 'learning_rate': 4.655e-06, 'epoch': 11.88} 91%|█████████ | 9080/10000 [35:39:42<3:32:38, 13.87s/it] 91%|█████████ | 9081/10000 [35:39:56<3:33:11, 13.92s/it] {'loss': 0.0015, 'learning_rate': 4.65e-06, 'epoch': 11.89} 91%|█████████ | 9081/10000 [35:39:56<3:33:11, 13.92s/it] 91%|█████████ | 9082/10000 [35:40:09<3:32:41, 13.90s/it] {'loss': 0.0019, 'learning_rate': 4.645e-06, 'epoch': 11.89} 91%|█████████ | 9082/10000 [35:40:10<3:32:41, 13.90s/it] 91%|█████████ | 9083/10000 [35:40:23<3:32:42, 13.92s/it] {'loss': 0.001, 'learning_rate': 4.64e-06, 'epoch': 11.89} 91%|█████████ | 9083/10000 [35:40:24<3:32:42, 13.92s/it] 91%|█████████ | 9084/10000 [35:40:37<3:32:45, 13.94s/it] {'loss': 0.0026, 'learning_rate': 4.6350000000000005e-06, 'epoch': 11.89} 91%|█████████ | 9084/10000 [35:40:37<3:32:45, 13.94s/it] 91%|█████████ | 9085/10000 [35:40:51<3:32:07, 13.91s/it] {'loss': 0.0036, 'learning_rate': 4.6300000000000006e-06, 'epoch': 11.89} 91%|█████████ | 9085/10000 [35:40:51<3:32:07, 13.91s/it] 91%|█████████ | 9086/10000 [35:41:05<3:32:22, 13.94s/it] {'loss': 0.0027, 'learning_rate': 4.625e-06, 'epoch': 11.89} 91%|█████████ | 9086/10000 [35:41:05<3:32:22, 13.94s/it] 91%|█████████ | 9087/10000 [35:41:19<3:32:06, 13.94s/it] {'loss': 0.0041, 'learning_rate': 4.62e-06, 'epoch': 11.89} 91%|█████████ | 9087/10000 [35:41:19<3:32:06, 13.94s/it] 91%|█████████ | 9088/10000 [35:41:33<3:31:21, 13.90s/it] {'loss': 0.0038, 'learning_rate': 4.615e-06, 'epoch': 11.9} 91%|█████████ | 9088/10000 
[35:41:33<3:31:21, 13.90s/it] 91%|█████████ | 9089/10000 [35:41:47<3:30:57, 13.89s/it] {'loss': 0.0028, 'learning_rate': 4.610000000000001e-06, 'epoch': 11.9} 91%|█████████ | 9089/10000 [35:41:47<3:30:57, 13.89s/it] 91%|█████████ | 9090/10000 [35:42:01<3:30:18, 13.87s/it] {'loss': 0.0028, 'learning_rate': 4.605e-06, 'epoch': 11.9} 91%|█████████ | 9090/10000 [35:42:01<3:30:18, 13.87s/it] 91%|█████████ | 9091/10000 [35:42:15<3:30:45, 13.91s/it] {'loss': 0.0028, 'learning_rate': 4.6e-06, 'epoch': 11.9} 91%|█████████ | 9091/10000 [35:42:15<3:30:45, 13.91s/it] 91%|█████████ | 9092/10000 [35:42:29<3:30:13, 13.89s/it] {'loss': 0.0035, 'learning_rate': 4.595e-06, 'epoch': 11.9} 91%|█████████ | 9092/10000 [35:42:29<3:30:13, 13.89s/it] 91%|█████████ | 9093/10000 [35:42:43<3:30:17, 13.91s/it] {'loss': 0.0022, 'learning_rate': 4.590000000000001e-06, 'epoch': 11.9} 91%|█████████ | 9093/10000 [35:42:43<3:30:17, 13.91s/it] 91%|█████████ | 9094/10000 [35:42:56<3:30:14, 13.92s/it] {'loss': 0.002, 'learning_rate': 4.585e-06, 'epoch': 11.9} 91%|█████████ | 9094/10000 [35:42:57<3:30:14, 13.92s/it] 91%|█████████ | 9095/10000 [35:43:10<3:30:09, 13.93s/it] {'loss': 0.0025, 'learning_rate': 4.58e-06, 'epoch': 11.9} 91%|█████████ | 9095/10000 [35:43:11<3:30:09, 13.93s/it] 91%|█████████ | 9096/10000 [35:43:24<3:29:44, 13.92s/it] {'loss': 0.0021, 'learning_rate': 4.575e-06, 'epoch': 11.91} 91%|█████████ | 9096/10000 [35:43:24<3:29:44, 13.92s/it] 91%|█████████ | 9097/10000 [35:43:38<3:29:14, 13.90s/it] {'loss': 0.0033, 'learning_rate': 4.57e-06, 'epoch': 11.91} 91%|█████████ | 9097/10000 [35:43:38<3:29:14, 13.90s/it] 91%|█████████ | 9098/10000 [35:43:52<3:29:18, 13.92s/it] {'loss': 0.0045, 'learning_rate': 4.565e-06, 'epoch': 11.91} 91%|█████████ | 9098/10000 [35:43:52<3:29:18, 13.92s/it] 91%|█████████ | 9099/10000 [35:44:06<3:28:55, 13.91s/it] {'loss': 0.0027, 'learning_rate': 4.56e-06, 'epoch': 11.91} 91%|█████████ | 9099/10000 [35:44:06<3:28:55, 13.91s/it] 91%|█████████ | 9100/10000 
[35:44:20<3:28:40, 13.91s/it] {'loss': 0.0034, 'learning_rate': 4.5550000000000004e-06, 'epoch': 11.91} 91%|█████████ | 9100/10000 [35:44:20<3:28:40, 13.91s/it] 91%|█████████ | 9101/10000 [35:44:34<3:27:42, 13.86s/it] {'loss': 0.0013, 'learning_rate': 4.5500000000000005e-06, 'epoch': 11.91} 91%|█████████ | 9101/10000 [35:44:34<3:27:42, 13.86s/it] 91%|█████████ | 9102/10000 [35:44:48<3:27:30, 13.86s/it] {'loss': 0.0024, 'learning_rate': 4.545e-06, 'epoch': 11.91} 91%|█████████ | 9102/10000 [35:44:48<3:27:30, 13.86s/it] 91%|█████████ | 9103/10000 [35:45:01<3:26:37, 13.82s/it] {'loss': 0.0015, 'learning_rate': 4.540000000000001e-06, 'epoch': 11.91} 91%|█████████ | 9103/10000 [35:45:01<3:26:37, 13.82s/it] 91%|█████████ | 9104/10000 [35:45:15<3:27:12, 13.88s/it] {'loss': 0.0022, 'learning_rate': 4.535000000000001e-06, 'epoch': 11.92} 91%|█████████ | 9104/10000 [35:45:15<3:27:12, 13.88s/it] 91%|█████████ | 9105/10000 [35:45:29<3:26:45, 13.86s/it] {'loss': 0.0023, 'learning_rate': 4.53e-06, 'epoch': 11.92} 91%|█████████ | 9105/10000 [35:45:29<3:26:45, 13.86s/it] 91%|█████████ | 9106/10000 [35:45:43<3:26:36, 13.87s/it] {'loss': 0.0022, 'learning_rate': 4.525e-06, 'epoch': 11.92} 91%|█████████ | 9106/10000 [35:45:43<3:26:36, 13.87s/it] 91%|█████████ | 9107/10000 [35:45:57<3:27:09, 13.92s/it] {'loss': 0.0032, 'learning_rate': 4.52e-06, 'epoch': 11.92} 91%|█████████ | 9107/10000 [35:45:57<3:27:09, 13.92s/it] 91%|█████████ | 9108/10000 [35:46:11<3:26:40, 13.90s/it] {'loss': 0.0022, 'learning_rate': 4.515000000000001e-06, 'epoch': 11.92} 91%|█████████ | 9108/10000 [35:46:11<3:26:40, 13.90s/it] 91%|█████████ | 9109/10000 [35:46:25<3:27:06, 13.95s/it] {'loss': 0.0023, 'learning_rate': 4.51e-06, 'epoch': 11.92} 91%|█████████ | 9109/10000 [35:46:25<3:27:06, 13.95s/it] 91%|█████████ | 9110/10000 [35:46:39<3:27:12, 13.97s/it] {'loss': 0.0024, 'learning_rate': 4.505e-06, 'epoch': 11.92} 91%|█████████ | 9110/10000 [35:46:39<3:27:12, 13.97s/it] 91%|█████████ | 9111/10000 
[35:46:53<3:26:48, 13.96s/it] {'loss': 0.0045, 'learning_rate': 4.5e-06, 'epoch': 11.93} 91%|█████████ | 9111/10000 [35:46:53<3:26:48, 13.96s/it] 91%|█████████ | 9112/10000 [35:47:07<3:26:47, 13.97s/it] {'loss': 0.0024, 'learning_rate': 4.495e-06, 'epoch': 11.93} 91%|█████████ | 9112/10000 [35:47:07<3:26:47, 13.97s/it] 91%|█████████ | 9113/10000 [35:47:21<3:26:02, 13.94s/it] {'loss': 0.0024, 'learning_rate': 4.49e-06, 'epoch': 11.93} 91%|█████████ | 9113/10000 [35:47:21<3:26:02, 13.94s/it] 91%|█████████ | 9114/10000 [35:47:35<3:25:37, 13.93s/it] {'loss': 0.0026, 'learning_rate': 4.485e-06, 'epoch': 11.93} 91%|█████████ | 9114/10000 [35:47:35<3:25:37, 13.93s/it] 91%|█████████ | 9115/10000 [35:47:49<3:24:56, 13.89s/it] {'loss': 0.0026, 'learning_rate': 4.48e-06, 'epoch': 11.93} 91%|█████████ | 9115/10000 [35:47:49<3:24:56, 13.89s/it] 91%|█████████ | 9116/10000 [35:48:02<3:24:25, 13.87s/it] {'loss': 0.0036, 'learning_rate': 4.475e-06, 'epoch': 11.93} 91%|█████████ | 9116/10000 [35:48:02<3:24:25, 13.87s/it] 91%|█████████ | 9117/10000 [35:48:16<3:23:59, 13.86s/it] {'loss': 0.0028, 'learning_rate': 4.4699999999999996e-06, 'epoch': 11.93} 91%|█████████ | 9117/10000 [35:48:16<3:23:59, 13.86s/it] 91%|█████████ | 9118/10000 [35:48:30<3:23:54, 13.87s/it] {'loss': 0.0036, 'learning_rate': 4.4650000000000004e-06, 'epoch': 11.93} 91%|█████████ | 9118/10000 [35:48:30<3:23:54, 13.87s/it] 91%|█████████ | 9119/10000 [35:48:44<3:23:49, 13.88s/it] {'loss': 0.0015, 'learning_rate': 4.4600000000000005e-06, 'epoch': 11.94} 91%|█████████ | 9119/10000 [35:48:44<3:23:49, 13.88s/it] 91%|█████████ | 9120/10000 [35:48:58<3:24:04, 13.91s/it] {'loss': 0.003, 'learning_rate': 4.4550000000000005e-06, 'epoch': 11.94} 91%|█████████ | 9120/10000 [35:48:58<3:24:04, 13.91s/it] 91%|█████████ | 9121/10000 [35:49:12<3:23:58, 13.92s/it] {'loss': 0.0028, 'learning_rate': 4.45e-06, 'epoch': 11.94} 91%|█████████ | 9121/10000 [35:49:12<3:23:58, 13.92s/it] 91%|█████████ | 9122/10000 [35:49:26<3:23:27, 
13.90s/it] {'loss': 0.0021, 'learning_rate': 4.445000000000001e-06, 'epoch': 11.94} 91%|█████████ | 9122/10000 [35:49:26<3:23:27, 13.90s/it] 91%|█████████ | 9123/10000 [35:49:40<3:23:12, 13.90s/it] {'loss': 0.0034, 'learning_rate': 4.440000000000001e-06, 'epoch': 11.94} 91%|█████████ | 9123/10000 [35:49:40<3:23:12, 13.90s/it] 91%|█████████ | 9124/10000 [35:49:54<3:22:56, 13.90s/it] {'loss': 0.0041, 'learning_rate': 4.435e-06, 'epoch': 11.94} 91%|█████████ | 9124/10000 [35:49:54<3:22:56, 13.90s/it] 91%|█████████▏| 9125/10000 [35:50:07<3:22:45, 13.90s/it] {'loss': 0.0021, 'learning_rate': 4.43e-06, 'epoch': 11.94} 91%|█████████▏| 9125/10000 [35:50:08<3:22:45, 13.90s/it] 91%|█████████▏| 9126/10000 [35:50:22<3:23:13, 13.95s/it] {'loss': 0.0021, 'learning_rate': 4.425e-06, 'epoch': 11.95} 91%|█████████▏| 9126/10000 [35:50:22<3:23:13, 13.95s/it] 91%|█████████▏| 9127/10000 [35:50:35<3:22:48, 13.94s/it] {'loss': 0.003, 'learning_rate': 4.420000000000001e-06, 'epoch': 11.95} 91%|█████████▏| 9127/10000 [35:50:35<3:22:48, 13.94s/it] 91%|█████████▏| 9128/10000 [35:50:49<3:22:46, 13.95s/it] {'loss': 0.0025, 'learning_rate': 4.415e-06, 'epoch': 11.95} 91%|█████████▏| 9128/10000 [35:50:49<3:22:46, 13.95s/it] 91%|█████████▏| 9129/10000 [35:51:03<3:22:33, 13.95s/it] {'loss': 0.0043, 'learning_rate': 4.41e-06, 'epoch': 11.95} 91%|█████████▏| 9129/10000 [35:51:03<3:22:33, 13.95s/it] 91%|█████████▏| 9130/10000 [35:51:17<3:22:01, 13.93s/it] {'loss': 0.0022, 'learning_rate': 4.405e-06, 'epoch': 11.95} 91%|█████████▏| 9130/10000 [35:51:17<3:22:01, 13.93s/it] 91%|█████████▏| 9131/10000 [35:51:31<3:21:42, 13.93s/it] {'loss': 0.0035, 'learning_rate': 4.4e-06, 'epoch': 11.95} 91%|█████████▏| 9131/10000 [35:51:31<3:21:42, 13.93s/it] 91%|█████████▏| 9132/10000 [35:51:45<3:21:19, 13.92s/it] {'loss': 0.0021, 'learning_rate': 4.395e-06, 'epoch': 11.95} 91%|█████████▏| 9132/10000 [35:51:45<3:21:19, 13.92s/it] 91%|█████████▏| 9133/10000 [35:51:59<3:20:36, 13.88s/it] {'loss': 0.0022, 
'learning_rate': 4.39e-06, 'epoch': 11.95} 91%|█████████▏| 9133/10000 [35:51:59<3:20:36, 13.88s/it] 91%|█████████▏| 9134/10000 [35:52:13<3:20:16, 13.88s/it] {'loss': 0.0014, 'learning_rate': 4.385e-06, 'epoch': 11.96} 91%|█████████▏| 9134/10000 [35:52:13<3:20:16, 13.88s/it] 91%|█████████▏| 9135/10000 [35:52:27<3:20:02, 13.88s/it] {'loss': 0.0031, 'learning_rate': 4.38e-06, 'epoch': 11.96} 91%|█████████▏| 9135/10000 [35:52:27<3:20:02, 13.88s/it] 91%|█████████▏| 9136/10000 [35:52:41<3:20:29, 13.92s/it] {'loss': 0.0014, 'learning_rate': 4.375e-06, 'epoch': 11.96} 91%|█████████▏| 9136/10000 [35:52:41<3:20:29, 13.92s/it] 91%|█████████▏| 9137/10000 [35:52:55<3:20:18, 13.93s/it] {'loss': 0.0026, 'learning_rate': 4.3700000000000005e-06, 'epoch': 11.96} 91%|█████████▏| 9137/10000 [35:52:55<3:20:18, 13.93s/it] 91%|█████████▏| 9138/10000 [35:53:08<3:19:52, 13.91s/it] {'loss': 0.0025, 'learning_rate': 4.3650000000000006e-06, 'epoch': 11.96} 91%|█████████▏| 9138/10000 [35:53:09<3:19:52, 13.91s/it] 91%|█████████▏| 9139/10000 [35:53:22<3:19:22, 13.89s/it] {'loss': 0.0042, 'learning_rate': 4.360000000000001e-06, 'epoch': 11.96} 91%|█████████▏| 9139/10000 [35:53:22<3:19:22, 13.89s/it] 91%|█████████▏| 9140/10000 [35:53:36<3:19:14, 13.90s/it] {'loss': 0.0021, 'learning_rate': 4.355e-06, 'epoch': 11.96} 91%|█████████▏| 9140/10000 [35:53:36<3:19:14, 13.90s/it] 91%|█████████▏| 9141/10000 [35:53:50<3:19:58, 13.97s/it] {'loss': 0.0022, 'learning_rate': 4.35e-06, 'epoch': 11.96} 91%|█████████▏| 9141/10000 [35:53:50<3:19:58, 13.97s/it] 91%|█████████▏| 9142/10000 [35:54:04<3:19:42, 13.97s/it] {'loss': 0.004, 'learning_rate': 4.345000000000001e-06, 'epoch': 11.97} 91%|█████████▏| 9142/10000 [35:54:04<3:19:42, 13.97s/it] 91%|█████████▏| 9143/10000 [35:54:18<3:19:08, 13.94s/it] {'loss': 0.0027, 'learning_rate': 4.34e-06, 'epoch': 11.97} 91%|█████████▏| 9143/10000 [35:54:18<3:19:08, 13.94s/it] 91%|█████████▏| 9144/10000 [35:54:32<3:18:15, 13.90s/it] {'loss': 0.0038, 'learning_rate': 4.335e-06, 
'epoch': 11.97} 91%|█████████▏| 9144/10000 [35:54:32<3:18:15, 13.90s/it] 91%|█████████▏| 9145/10000 [35:54:46<3:18:29, 13.93s/it] {'loss': 0.0024, 'learning_rate': 4.33e-06, 'epoch': 11.97} 91%|█████████▏| 9145/10000 [35:54:46<3:18:29, 13.93s/it] 91%|█████████▏| 9146/10000 [35:55:00<3:18:44, 13.96s/it] {'loss': 0.0021, 'learning_rate': 4.325e-06, 'epoch': 11.97} 91%|█████████▏| 9146/10000 [35:55:00<3:18:44, 13.96s/it] 91%|█████████▏| 9147/10000 [35:55:14<3:18:14, 13.94s/it] {'loss': 0.0042, 'learning_rate': 4.32e-06, 'epoch': 11.97} 91%|█████████▏| 9147/10000 [35:55:14<3:18:14, 13.94s/it] 91%|█████████▏| 9148/10000 [35:55:28<3:17:46, 13.93s/it] {'loss': 0.0016, 'learning_rate': 4.315e-06, 'epoch': 11.97} 91%|█████████▏| 9148/10000 [35:55:28<3:17:46, 13.93s/it] 91%|█████████▏| 9149/10000 [35:55:42<3:17:52, 13.95s/it] {'loss': 0.0037, 'learning_rate': 4.31e-06, 'epoch': 11.98} 91%|█████████▏| 9149/10000 [35:55:42<3:17:52, 13.95s/it] 92%|█████████▏| 9150/10000 [35:55:56<3:17:06, 13.91s/it] {'loss': 0.0019, 'learning_rate': 4.305e-06, 'epoch': 11.98} 92%|█████████▏| 9150/10000 [35:55:56<3:17:06, 13.91s/it] 92%|█████████▏| 9151/10000 [35:56:10<3:17:09, 13.93s/it] {'loss': 0.0026, 'learning_rate': 4.2999999999999995e-06, 'epoch': 11.98} 92%|█████████▏| 9151/10000 [35:56:10<3:17:09, 13.93s/it] 92%|█████████▏| 9152/10000 [35:56:23<3:16:24, 13.90s/it] {'loss': 0.0042, 'learning_rate': 4.295e-06, 'epoch': 11.98} 92%|█████████▏| 9152/10000 [35:56:23<3:16:24, 13.90s/it] 92%|█████████▏| 9153/10000 [35:56:37<3:16:19, 13.91s/it] {'loss': 0.0029, 'learning_rate': 4.2900000000000004e-06, 'epoch': 11.98} 92%|█████████▏| 9153/10000 [35:56:37<3:16:19, 13.91s/it] 92%|█████████▏| 9154/10000 [35:56:51<3:16:14, 13.92s/it] {'loss': 0.0031, 'learning_rate': 4.2850000000000005e-06, 'epoch': 11.98} 92%|█████████▏| 9154/10000 [35:56:51<3:16:14, 13.92s/it] 92%|█████████▏| 9155/10000 [35:57:05<3:15:35, 13.89s/it] {'loss': 0.0038, 'learning_rate': 4.28e-06, 'epoch': 11.98} 92%|█████████▏| 
9155/10000 [35:57:05<3:15:35, 13.89s/it] 92%|█████████▏| 9156/10000 [35:57:19<3:15:36, 13.91s/it] {'loss': 0.002, 'learning_rate': 4.2750000000000006e-06, 'epoch': 11.98} 92%|█████████▏| 9156/10000 [35:57:19<3:15:36, 13.91s/it] 92%|█████████▏| 9157/10000 [35:57:33<3:15:20, 13.90s/it] {'loss': 0.0022, 'learning_rate': 4.270000000000001e-06, 'epoch': 11.99} 92%|█████████▏| 9157/10000 [35:57:33<3:15:20, 13.90s/it] 92%|█████████▏| 9158/10000 [35:57:47<3:14:37, 13.87s/it] {'loss': 0.0043, 'learning_rate': 4.265e-06, 'epoch': 11.99} 92%|█████████▏| 9158/10000 [35:57:47<3:14:37, 13.87s/it] 92%|█████████▏| 9159/10000 [35:58:01<3:14:45, 13.89s/it] {'loss': 0.0025, 'learning_rate': 4.26e-06, 'epoch': 11.99} 92%|█████████▏| 9159/10000 [35:58:01<3:14:45, 13.89s/it] 92%|█████████▏| 9160/10000 [35:58:15<3:15:03, 13.93s/it] {'loss': 0.0025, 'learning_rate': 4.255e-06, 'epoch': 11.99} 92%|█████████▏| 9160/10000 [35:58:15<3:15:03, 13.93s/it] 92%|█████████▏| 9161/10000 [35:58:29<3:14:36, 13.92s/it] {'loss': 0.0024, 'learning_rate': 4.250000000000001e-06, 'epoch': 11.99} 92%|█████████▏| 9161/10000 [35:58:29<3:14:36, 13.92s/it] 92%|█████████▏| 9162/10000 [35:58:43<3:14:24, 13.92s/it] {'loss': 0.0027, 'learning_rate': 4.245e-06, 'epoch': 11.99} 92%|█████████▏| 9162/10000 [35:58:43<3:14:24, 13.92s/it] 92%|█████████▏| 9163/10000 [35:58:56<3:14:01, 13.91s/it] {'loss': 0.0026, 'learning_rate': 4.24e-06, 'epoch': 11.99} 92%|█████████▏| 9163/10000 [35:58:56<3:14:01, 13.91s/it] 92%|█████████▏| 9164/10000 [35:59:10<3:13:36, 13.90s/it] {'loss': 0.003, 'learning_rate': 4.235e-06, 'epoch': 11.99} 92%|█████████▏| 9164/10000 [35:59:10<3:13:36, 13.90s/it] 92%|█████████▏| 9165/10000 [35:59:24<3:13:35, 13.91s/it] {'loss': 0.0035, 'learning_rate': 4.23e-06, 'epoch': 12.0} 92%|█████████▏| 9165/10000 [35:59:24<3:13:35, 13.91s/it] 92%|█████████▏| 9166/10000 [35:59:38<3:13:46, 13.94s/it] {'loss': 0.0032, 'learning_rate': 4.225e-06, 'epoch': 12.0} 92%|█████████▏| 9166/10000 [35:59:38<3:13:46, 13.94s/it] 
92%|█████████▏| 9167/10000 [35:59:52<3:13:21, 13.93s/it] {'loss': 0.0026, 'learning_rate': 4.22e-06, 'epoch': 12.0} 92%|█████████▏| 9167/10000 [35:59:52<3:13:21, 13.93s/it] 92%|█████████▏| 9168/10000 [36:00:05<3:07:16, 13.51s/it] {'loss': 0.0026, 'learning_rate': 4.215e-06, 'epoch': 12.0} 92%|█████████▏| 9168/10000 [36:00:05<3:07:16, 13.51s/it] 92%|█████████▏| 9169/10000 [36:00:19<3:09:05, 13.65s/it] {'loss': 0.0013, 'learning_rate': 4.21e-06, 'epoch': 12.0} 92%|█████████▏| 9169/10000 [36:00:19<3:09:05, 13.65s/it] 92%|█████████▏| 9170/10000 [36:00:33<3:10:30, 13.77s/it] {'loss': 0.003, 'learning_rate': 4.2049999999999996e-06, 'epoch': 12.0} 92%|█████████▏| 9170/10000 [36:00:33<3:10:30, 13.77s/it] 92%|█████████▏| 9171/10000 [36:00:47<3:10:32, 13.79s/it] {'loss': 0.0039, 'learning_rate': 4.2000000000000004e-06, 'epoch': 12.0} 92%|█████████▏| 9171/10000 [36:00:47<3:10:32, 13.79s/it] 92%|█████████▏| 9172/10000 [36:01:00<3:10:42, 13.82s/it] {'loss': 0.0034, 'learning_rate': 4.1950000000000005e-06, 'epoch': 12.01} 92%|█████████▏| 9172/10000 [36:01:00<3:10:42, 13.82s/it] 92%|█████████▏| 9173/10000 [36:01:14<3:10:48, 13.84s/it] {'loss': 0.0018, 'learning_rate': 4.1900000000000005e-06, 'epoch': 12.01} 92%|█████████▏| 9173/10000 [36:01:14<3:10:48, 13.84s/it] 92%|█████████▏| 9174/10000 [36:01:28<3:11:37, 13.92s/it] {'loss': 0.0011, 'learning_rate': 4.185e-06, 'epoch': 12.01} 92%|█████████▏| 9174/10000 [36:01:28<3:11:37, 13.92s/it] 92%|█████████▏| 9175/10000 [36:01:42<3:11:23, 13.92s/it] {'loss': 0.0024, 'learning_rate': 4.18e-06, 'epoch': 12.01} 92%|█████████▏| 9175/10000 [36:01:42<3:11:23, 13.92s/it] 92%|█████████▏| 9176/10000 [36:01:56<3:10:34, 13.88s/it] {'loss': 0.0031, 'learning_rate': 4.175000000000001e-06, 'epoch': 12.01} 92%|█████████▏| 9176/10000 [36:01:56<3:10:34, 13.88s/it] 92%|█████████▏| 9177/10000 [36:02:10<3:10:00, 13.85s/it] {'loss': 0.002, 'learning_rate': 4.17e-06, 'epoch': 12.01} 92%|█████████▏| 9177/10000 [36:02:10<3:10:00, 13.85s/it] 92%|█████████▏| 
9178/10000 [36:02:24<3:09:39, 13.84s/it] {'loss': 0.0012, 'learning_rate': 4.165e-06, 'epoch': 12.01} 92%|█████████▏| 9178/10000 [36:02:24<3:09:39, 13.84s/it] 92%|█████████▏| 9179/10000 [36:02:38<3:09:12, 13.83s/it] {'loss': 0.003, 'learning_rate': 4.16e-06, 'epoch': 12.01} 92%|█████████▏| 9179/10000 [36:02:38<3:09:12, 13.83s/it] 92%|█████████▏| 9180/10000 [36:02:51<3:09:17, 13.85s/it] {'loss': 0.0014, 'learning_rate': 4.155e-06, 'epoch': 12.02} 92%|█████████▏| 9180/10000 [36:02:51<3:09:17, 13.85s/it] 92%|█████████▏| 9181/10000 [36:03:05<3:08:26, 13.81s/it] {'loss': 0.0034, 'learning_rate': 4.15e-06, 'epoch': 12.02} 92%|█████████▏| 9181/10000 [36:03:05<3:08:26, 13.81s/it] 92%|█████████▏| 9182/10000 [36:03:19<3:08:17, 13.81s/it] {'loss': 0.0015, 'learning_rate': 4.145e-06, 'epoch': 12.02} 92%|█████████▏| 9182/10000 [36:03:19<3:08:17, 13.81s/it] 92%|█████████▏| 9183/10000 [36:03:33<3:08:21, 13.83s/it] {'loss': 0.0022, 'learning_rate': 4.14e-06, 'epoch': 12.02} 92%|█████████▏| 9183/10000 [36:03:33<3:08:21, 13.83s/it] 92%|█████████▏| 9184/10000 [36:03:47<3:08:31, 13.86s/it] {'loss': 0.0032, 'learning_rate': 4.135e-06, 'epoch': 12.02} 92%|█████████▏| 9184/10000 [36:03:47<3:08:31, 13.86s/it] 92%|█████████▏| 9185/10000 [36:04:01<3:08:04, 13.85s/it] {'loss': 0.0026, 'learning_rate': 4.13e-06, 'epoch': 12.02} 92%|█████████▏| 9185/10000 [36:04:01<3:08:04, 13.85s/it] 92%|█████████▏| 9186/10000 [36:04:14<3:08:00, 13.86s/it] {'loss': 0.0018, 'learning_rate': 4.125e-06, 'epoch': 12.02} 92%|█████████▏| 9186/10000 [36:04:15<3:08:00, 13.86s/it] 92%|█████████▏| 9187/10000 [36:04:28<3:07:58, 13.87s/it] {'loss': 0.002, 'learning_rate': 4.12e-06, 'epoch': 12.02} 92%|█████████▏| 9187/10000 [36:04:28<3:07:58, 13.87s/it] 92%|█████████▏| 9188/10000 [36:04:42<3:07:44, 13.87s/it] {'loss': 0.0025, 'learning_rate': 4.115e-06, 'epoch': 12.03} 92%|█████████▏| 9188/10000 [36:04:42<3:07:44, 13.87s/it] 92%|█████████▏| 9189/10000 [36:04:56<3:07:33, 13.88s/it] {'loss': 0.0023, 'learning_rate': 
4.11e-06, 'epoch': 12.03} 92%|█████████▏| 9189/10000 [36:04:56<3:07:33, 13.88s/it] 92%|█████████▏| 9190/10000 [36:05:10<3:06:52, 13.84s/it] {'loss': 0.0014, 'learning_rate': 4.1050000000000005e-06, 'epoch': 12.03} 92%|█████████▏| 9190/10000 [36:05:10<3:06:52, 13.84s/it] 92%|█████████▏| 9191/10000 [36:05:24<3:06:37, 13.84s/it] {'loss': 0.0028, 'learning_rate': 4.1000000000000006e-06, 'epoch': 12.03} 92%|█████████▏| 9191/10000 [36:05:24<3:06:37, 13.84s/it] 92%|█████████▏| 9192/10000 [36:05:38<3:06:44, 13.87s/it] {'loss': 0.0018, 'learning_rate': 4.095000000000001e-06, 'epoch': 12.03} 92%|█████████▏| 9192/10000 [36:05:38<3:06:44, 13.87s/it] 92%|█████████▏| 9193/10000 [36:05:52<3:06:49, 13.89s/it] {'loss': 0.0025, 'learning_rate': 4.09e-06, 'epoch': 12.03} 92%|█████████▏| 9193/10000 [36:05:52<3:06:49, 13.89s/it] 92%|█████████▏| 9194/10000 [36:06:06<3:06:57, 13.92s/it] {'loss': 0.001, 'learning_rate': 4.085e-06, 'epoch': 12.03} 92%|█████████▏| 9194/10000 [36:06:06<3:06:57, 13.92s/it] 92%|█████████▏| 9195/10000 [36:06:19<3:06:31, 13.90s/it] {'loss': 0.0029, 'learning_rate': 4.080000000000001e-06, 'epoch': 12.04} 92%|█████████▏| 9195/10000 [36:06:20<3:06:31, 13.90s/it] 92%|█████████▏| 9196/10000 [36:06:33<3:06:07, 13.89s/it] {'loss': 0.0016, 'learning_rate': 4.075e-06, 'epoch': 12.04} 92%|█████████▏| 9196/10000 [36:06:33<3:06:07, 13.89s/it] 92%|█████████▏| 9197/10000 [36:06:47<3:05:52, 13.89s/it] {'loss': 0.0022, 'learning_rate': 4.07e-06, 'epoch': 12.04} 92%|█████████▏| 9197/10000 [36:06:47<3:05:52, 13.89s/it] 92%|█████████▏| 9198/10000 [36:07:01<3:05:14, 13.86s/it] {'loss': 0.0022, 'learning_rate': 4.065e-06, 'epoch': 12.04} 92%|█████████▏| 9198/10000 [36:07:01<3:05:14, 13.86s/it] 92%|█████████▏| 9199/10000 [36:07:15<3:05:00, 13.86s/it] {'loss': 0.0027, 'learning_rate': 4.06e-06, 'epoch': 12.04} 92%|█████████▏| 9199/10000 [36:07:15<3:05:00, 13.86s/it] 92%|█████████▏| 9200/10000 [36:07:29<3:04:25, 13.83s/it] {'loss': 0.0028, 'learning_rate': 4.055e-06, 'epoch': 12.04} 
92%|█████████▏| 9200/10000 [36:07:29<3:04:25, 13.83s/it] 92%|█████████▏| 9201/10000 [36:07:42<3:04:08, 13.83s/it] {'loss': 0.0043, 'learning_rate': 4.05e-06, 'epoch': 12.04} 92%|█████████▏| 9201/10000 [36:07:42<3:04:08, 13.83s/it] 92%|█████████▏| 9202/10000 [36:07:56<3:03:48, 13.82s/it] {'loss': 0.0022, 'learning_rate': 4.045e-06, 'epoch': 12.04} 92%|█████████▏| 9202/10000 [36:07:56<3:03:48, 13.82s/it] 92%|█████████▏| 9203/10000 [36:08:10<3:03:52, 13.84s/it] {'loss': 0.0015, 'learning_rate': 4.04e-06, 'epoch': 12.05} 92%|█████████▏| 9203/10000 [36:08:10<3:03:52, 13.84s/it] 92%|█████████▏| 9204/10000 [36:08:24<3:03:35, 13.84s/it] {'loss': 0.0024, 'learning_rate': 4.0349999999999995e-06, 'epoch': 12.05} 92%|█████████▏| 9204/10000 [36:08:24<3:03:35, 13.84s/it] 92%|█████████▏| 9205/10000 [36:08:38<3:03:19, 13.84s/it] {'loss': 0.0013, 'learning_rate': 4.03e-06, 'epoch': 12.05} 92%|█████████▏| 9205/10000 [36:08:38<3:03:19, 13.84s/it] 92%|█████████▏| 9206/10000 [36:08:52<3:03:53, 13.90s/it] {'loss': 0.002, 'learning_rate': 4.0250000000000004e-06, 'epoch': 12.05} 92%|█████████▏| 9206/10000 [36:08:52<3:03:53, 13.90s/it] 92%|█████████▏| 9207/10000 [36:09:06<3:03:09, 13.86s/it] {'loss': 0.0028, 'learning_rate': 4.0200000000000005e-06, 'epoch': 12.05} 92%|█████████▏| 9207/10000 [36:09:06<3:03:09, 13.86s/it] 92%|█████████▏| 9208/10000 [36:09:19<3:03:03, 13.87s/it] {'loss': 0.002, 'learning_rate': 4.015e-06, 'epoch': 12.05} 92%|█████████▏| 9208/10000 [36:09:20<3:03:03, 13.87s/it] 92%|█████████▏| 9209/10000 [36:09:33<3:02:27, 13.84s/it] {'loss': 0.0023, 'learning_rate': 4.01e-06, 'epoch': 12.05} 92%|█████████▏| 9209/10000 [36:09:33<3:02:27, 13.84s/it] 92%|█████████▏| 9210/10000 [36:09:47<3:02:17, 13.84s/it] {'loss': 0.0008, 'learning_rate': 4.005000000000001e-06, 'epoch': 12.05} 92%|█████████▏| 9210/10000 [36:09:47<3:02:17, 13.84s/it] 92%|█████████▏| 9211/10000 [36:10:01<3:01:35, 13.81s/it] {'loss': 0.0027, 'learning_rate': 4.000000000000001e-06, 'epoch': 12.06} 92%|█████████▏| 
9211/10000 [36:10:01<3:01:35, 13.81s/it] 92%|█████████▏| 9212/10000 [36:10:15<3:01:56, 13.85s/it] {'loss': 0.0039, 'learning_rate': 3.995e-06, 'epoch': 12.06} 92%|█████████▏| 9212/10000 [36:10:15<3:01:56, 13.85s/it] 92%|█████████▏| 9213/10000 [36:10:29<3:01:58, 13.87s/it] {'loss': 0.0036, 'learning_rate': 3.99e-06, 'epoch': 12.06} 92%|█████████▏| 9213/10000 [36:10:29<3:01:58, 13.87s/it] 92%|█████████▏| 9214/10000 [36:10:43<3:01:30, 13.86s/it] {'loss': 0.0017, 'learning_rate': 3.985e-06, 'epoch': 12.06} 92%|█████████▏| 9214/10000 [36:10:43<3:01:30, 13.86s/it] 92%|█████████▏| 9215/10000 [36:10:56<3:01:24, 13.87s/it] {'loss': 0.0023, 'learning_rate': 3.98e-06, 'epoch': 12.06} 92%|█████████▏| 9215/10000 [36:10:56<3:01:24, 13.87s/it] 92%|█████████▏| 9216/10000 [36:11:10<3:01:05, 13.86s/it] {'loss': 0.0027, 'learning_rate': 3.975e-06, 'epoch': 12.06} 92%|█████████▏| 9216/10000 [36:11:10<3:01:05, 13.86s/it] 92%|█████████▏| 9217/10000 [36:11:24<3:00:57, 13.87s/it] {'loss': 0.0015, 'learning_rate': 3.97e-06, 'epoch': 12.06} 92%|█████████▏| 9217/10000 [36:11:24<3:00:57, 13.87s/it] 92%|█████████▏| 9218/10000 [36:11:38<3:00:41, 13.86s/it] {'loss': 0.0018, 'learning_rate': 3.965e-06, 'epoch': 12.07} 92%|█████████▏| 9218/10000 [36:11:38<3:00:41, 13.86s/it] 92%|█████████▏| 9219/10000 [36:11:52<3:00:29, 13.87s/it] {'loss': 0.0013, 'learning_rate': 3.96e-06, 'epoch': 12.07} 92%|█████████▏| 9219/10000 [36:11:52<3:00:29, 13.87s/it] 92%|█████████▏| 9220/10000 [36:12:06<3:00:29, 13.88s/it] {'loss': 0.0041, 'learning_rate': 3.955e-06, 'epoch': 12.07} 92%|█████████▏| 9220/10000 [36:12:06<3:00:29, 13.88s/it] 92%|█████████▏| 9221/10000 [36:12:20<2:59:52, 13.85s/it] {'loss': 0.0021, 'learning_rate': 3.95e-06, 'epoch': 12.07} 92%|█████████▏| 9221/10000 [36:12:20<2:59:52, 13.85s/it] 92%|█████████▏| 9222/10000 [36:12:34<2:59:51, 13.87s/it] {'loss': 0.0029, 'learning_rate': 3.945e-06, 'epoch': 12.07} 92%|█████████▏| 9222/10000 [36:12:34<2:59:51, 13.87s/it] 92%|█████████▏| 9223/10000 
[36:12:47<2:59:45, 13.88s/it] {'loss': 0.0027, 'learning_rate': 3.9399999999999995e-06, 'epoch': 12.07} 92%|█████████▏| 9223/10000 [36:12:47<2:59:45, 13.88s/it] 92%|█████████▏| 9224/10000 [36:13:01<2:59:39, 13.89s/it] {'loss': 0.0023, 'learning_rate': 3.9350000000000004e-06, 'epoch': 12.07} 92%|█████████▏| 9224/10000 [36:13:01<2:59:39, 13.89s/it] 92%|█████████▏| 9225/10000 [36:13:15<2:59:06, 13.87s/it] {'loss': 0.0019, 'learning_rate': 3.9300000000000005e-06, 'epoch': 12.07} 92%|█████████▏| 9225/10000 [36:13:15<2:59:06, 13.87s/it] 92%|█████████▏| 9226/10000 [36:13:29<2:59:00, 13.88s/it] {'loss': 0.0024, 'learning_rate': 3.9250000000000005e-06, 'epoch': 12.08} 92%|█████████▏| 9226/10000 [36:13:29<2:59:00, 13.88s/it] 92%|█████████▏| 9227/10000 [36:13:43<2:58:39, 13.87s/it] {'loss': 0.002, 'learning_rate': 3.92e-06, 'epoch': 12.08} 92%|█████████▏| 9227/10000 [36:13:43<2:58:39, 13.87s/it] 92%|█████████▏| 9228/10000 [36:13:57<2:58:13, 13.85s/it] {'loss': 0.0029, 'learning_rate': 3.915e-06, 'epoch': 12.08} 92%|█████████▏| 9228/10000 [36:13:57<2:58:13, 13.85s/it] 92%|█████████▏| 9229/10000 [36:14:11<2:58:03, 13.86s/it] {'loss': 0.0021, 'learning_rate': 3.910000000000001e-06, 'epoch': 12.08} 92%|█████████▏| 9229/10000 [36:14:11<2:58:03, 13.86s/it] 92%|█████████▏| 9230/10000 [36:14:24<2:57:59, 13.87s/it] {'loss': 0.0018, 'learning_rate': 3.905000000000001e-06, 'epoch': 12.08} 92%|█████████▏| 9230/10000 [36:14:25<2:57:59, 13.87s/it] 92%|█████████▏| 9231/10000 [36:14:39<2:58:25, 13.92s/it] {'loss': 0.0021, 'learning_rate': 3.9e-06, 'epoch': 12.08} 92%|█████████▏| 9231/10000 [36:14:39<2:58:25, 13.92s/it] 92%|█████████▏| 9232/10000 [36:14:52<2:58:03, 13.91s/it] {'loss': 0.0015, 'learning_rate': 3.895e-06, 'epoch': 12.08} 92%|█████████▏| 9232/10000 [36:14:52<2:58:03, 13.91s/it] 92%|█████████▏| 9233/10000 [36:15:06<2:57:53, 13.92s/it] {'loss': 0.0016, 'learning_rate': 3.89e-06, 'epoch': 12.09} 92%|█████████▏| 9233/10000 [36:15:06<2:57:53, 13.92s/it] 92%|█████████▏| 9234/10000 
[36:15:20<2:57:40, 13.92s/it] {'loss': 0.0019, 'learning_rate': 3.885e-06, 'epoch': 12.09} 92%|█████████▏| 9234/10000 [36:15:20<2:57:40, 13.92s/it] 92%|█████████▏| 9235/10000 [36:15:34<2:57:24, 13.92s/it] {'loss': 0.0036, 'learning_rate': 3.88e-06, 'epoch': 12.09} 92%|█████████▏| 9235/10000 [36:15:34<2:57:24, 13.92s/it] 92%|█████████▏| 9236/10000 [36:15:48<2:57:14, 13.92s/it] {'loss': 0.0041, 'learning_rate': 3.875e-06, 'epoch': 12.09} 92%|█████████▏| 9236/10000 [36:15:48<2:57:14, 13.92s/it] 92%|█████████▏| 9237/10000 [36:16:02<2:56:36, 13.89s/it] {'loss': 0.0025, 'learning_rate': 3.87e-06, 'epoch': 12.09} 92%|█████████▏| 9237/10000 [36:16:02<2:56:36, 13.89s/it] 92%|█████████▏| 9238/10000 [36:16:16<2:56:57, 13.93s/it] {'loss': 0.0025, 'learning_rate': 3.865e-06, 'epoch': 12.09} 92%|█████████▏| 9238/10000 [36:16:16<2:56:57, 13.93s/it] 92%|█████████▏| 9239/10000 [36:16:30<2:56:32, 13.92s/it] {'loss': 0.0023, 'learning_rate': 3.86e-06, 'epoch': 12.09} 92%|█████████▏| 9239/10000 [36:16:30<2:56:32, 13.92s/it] 92%|█████████▏| 9240/10000 [36:16:44<2:55:54, 13.89s/it] {'loss': 0.0028, 'learning_rate': 3.855e-06, 'epoch': 12.09} 92%|█████████▏| 9240/10000 [36:16:44<2:55:54, 13.89s/it] 92%|█████████▏| 9241/10000 [36:16:57<2:55:00, 13.83s/it] {'loss': 0.0017, 'learning_rate': 3.85e-06, 'epoch': 12.1} 92%|█████████▏| 9241/10000 [36:16:57<2:55:00, 13.83s/it] 92%|█████████▏| 9242/10000 [36:17:11<2:55:31, 13.89s/it] {'loss': 0.0036, 'learning_rate': 3.845e-06, 'epoch': 12.1} 92%|█████████▏| 9242/10000 [36:17:11<2:55:31, 13.89s/it] 92%|█████████▏| 9243/10000 [36:17:25<2:55:25, 13.90s/it] {'loss': 0.0018, 'learning_rate': 3.84e-06, 'epoch': 12.1} 92%|█████████▏| 9243/10000 [36:17:25<2:55:25, 13.90s/it] 92%|█████████▏| 9244/10000 [36:17:39<2:55:18, 13.91s/it] {'loss': 0.0021, 'learning_rate': 3.8350000000000006e-06, 'epoch': 12.1} 92%|█████████▏| 9244/10000 [36:17:39<2:55:18, 13.91s/it] 92%|█████████▏| 9245/10000 [36:17:53<2:54:58, 13.90s/it] {'loss': 0.0016, 'learning_rate': 
3.830000000000001e-06, 'epoch': 12.1} 92%|█████████▏| 9245/10000 [36:17:53<2:54:58, 13.90s/it] 92%|█████████▏| 9246/10000 [36:18:07<2:54:44, 13.91s/it] {'loss': 0.0021, 'learning_rate': 3.825e-06, 'epoch': 12.1} 92%|█████████▏| 9246/10000 [36:18:07<2:54:44, 13.91s/it] 92%|█████████▏| 9247/10000 [36:18:21<2:54:17, 13.89s/it] {'loss': 0.0029, 'learning_rate': 3.82e-06, 'epoch': 12.1} 92%|█████████▏| 9247/10000 [36:18:21<2:54:17, 13.89s/it] 92%|█████████▏| 9248/10000 [36:18:35<2:53:51, 13.87s/it] {'loss': 0.0026, 'learning_rate': 3.815000000000001e-06, 'epoch': 12.1} 92%|█████████▏| 9248/10000 [36:18:35<2:53:51, 13.87s/it] 92%|█████████▏| 9249/10000 [36:18:49<2:53:33, 13.87s/it] {'loss': 0.0017, 'learning_rate': 3.8100000000000004e-06, 'epoch': 12.11} 92%|█████████▏| 9249/10000 [36:18:49<2:53:33, 13.87s/it] 92%|█████████▎| 9250/10000 [36:19:02<2:53:29, 13.88s/it] {'loss': 0.0018, 'learning_rate': 3.8050000000000004e-06, 'epoch': 12.11} 92%|█████████▎| 9250/10000 [36:19:03<2:53:29, 13.88s/it] 93%|█████████▎| 9251/10000 [36:19:16<2:53:18, 13.88s/it] {'loss': 0.0022, 'learning_rate': 3.8e-06, 'epoch': 12.11} 93%|█████████▎| 9251/10000 [36:19:16<2:53:18, 13.88s/it] 93%|█████████▎| 9252/10000 [36:19:30<2:53:11, 13.89s/it] {'loss': 0.003, 'learning_rate': 3.795e-06, 'epoch': 12.11} 93%|█████████▎| 9252/10000 [36:19:30<2:53:11, 13.89s/it] 93%|█████████▎| 9253/10000 [36:19:44<2:53:06, 13.90s/it] {'loss': 0.0023, 'learning_rate': 3.7900000000000006e-06, 'epoch': 12.11} 93%|█████████▎| 9253/10000 [36:19:44<2:53:06, 13.90s/it] 93%|█████████▎| 9254/10000 [36:19:58<2:52:45, 13.89s/it] {'loss': 0.0026, 'learning_rate': 3.785e-06, 'epoch': 12.11} 93%|█████████▎| 9254/10000 [36:19:58<2:52:45, 13.89s/it] 93%|█████████▎| 9255/10000 [36:20:12<2:52:34, 13.90s/it] {'loss': 0.0021, 'learning_rate': 3.7800000000000002e-06, 'epoch': 12.11} 93%|█████████▎| 9255/10000 [36:20:12<2:52:34, 13.90s/it] 93%|█████████▎| 9256/10000 [36:20:26<2:52:25, 13.91s/it] {'loss': 0.0028, 'learning_rate': 
3.775e-06, 'epoch': 12.12} 93%|█████████▎| 9256/10000 [36:20:26<2:52:25, 13.91s/it] 93%|█████████▎| 9257/10000 [36:20:40<2:52:06, 13.90s/it] {'loss': 0.002, 'learning_rate': 3.77e-06, 'epoch': 12.12} 93%|█████████▎| 9257/10000 [36:20:40<2:52:06, 13.90s/it] 93%|█████████▎| 9258/10000 [36:20:54<2:51:14, 13.85s/it] {'loss': 0.0016, 'learning_rate': 3.7650000000000004e-06, 'epoch': 12.12} 93%|█████████▎| 9258/10000 [36:20:54<2:51:14, 13.85s/it] 93%|█████████▎| 9259/10000 [36:21:07<2:50:42, 13.82s/it] {'loss': 0.002, 'learning_rate': 3.7600000000000004e-06, 'epoch': 12.12} 93%|█████████▎| 9259/10000 [36:21:07<2:50:42, 13.82s/it] 93%|█████████▎| 9260/10000 [36:21:21<2:50:06, 13.79s/it] {'loss': 0.003, 'learning_rate': 3.755e-06, 'epoch': 12.12} 93%|█████████▎| 9260/10000 [36:21:21<2:50:06, 13.79s/it] 93%|█████████▎| 9261/10000 [36:21:35<2:49:54, 13.79s/it] {'loss': 0.0022, 'learning_rate': 3.75e-06, 'epoch': 12.12} 93%|█████████▎| 9261/10000 [36:21:35<2:49:54, 13.79s/it] 93%|█████████▎| 9262/10000 [36:21:49<2:49:46, 13.80s/it] {'loss': 0.0017, 'learning_rate': 3.7449999999999997e-06, 'epoch': 12.12} 93%|█████████▎| 9262/10000 [36:21:49<2:49:46, 13.80s/it] 93%|█████████▎| 9263/10000 [36:22:03<2:49:51, 13.83s/it] {'loss': 0.0023, 'learning_rate': 3.7400000000000006e-06, 'epoch': 12.12} 93%|█████████▎| 9263/10000 [36:22:03<2:49:51, 13.83s/it] 93%|█████████▎| 9264/10000 [36:22:17<2:50:23, 13.89s/it] {'loss': 0.0006, 'learning_rate': 3.7350000000000002e-06, 'epoch': 12.13} 93%|█████████▎| 9264/10000 [36:22:17<2:50:23, 13.89s/it] 93%|█████████▎| 9265/10000 [36:22:30<2:50:05, 13.89s/it] {'loss': 0.0023, 'learning_rate': 3.7300000000000003e-06, 'epoch': 12.13} 93%|█████████▎| 9265/10000 [36:22:30<2:50:05, 13.89s/it] 93%|█████████▎| 9266/10000 [36:22:44<2:50:08, 13.91s/it] {'loss': 0.0019, 'learning_rate': 3.725e-06, 'epoch': 12.13} 93%|█████████▎| 9266/10000 [36:22:44<2:50:08, 13.91s/it] 93%|█████████▎| 9267/10000 [36:22:58<2:49:33, 13.88s/it] {'loss': 0.0015, 'learning_rate': 
3.72e-06, 'epoch': 12.13} 93%|█████████▎| 9267/10000 [36:22:58<2:49:33, 13.88s/it] 93%|█████████▎| 9268/10000 [36:23:12<2:49:33, 13.90s/it] {'loss': 0.0015, 'learning_rate': 3.7150000000000004e-06, 'epoch': 12.13} 93%|█████████▎| 9268/10000 [36:23:12<2:49:33, 13.90s/it] 93%|█████████▎| 9269/10000 [36:23:26<2:49:13, 13.89s/it] {'loss': 0.0017, 'learning_rate': 3.7100000000000005e-06, 'epoch': 12.13} 93%|█████████▎| 9269/10000 [36:23:26<2:49:13, 13.89s/it] 93%|█████████▎| 9270/10000 [36:23:40<2:48:37, 13.86s/it] {'loss': 0.0021, 'learning_rate': 3.705e-06, 'epoch': 12.13} 93%|█████████▎| 9270/10000 [36:23:40<2:48:37, 13.86s/it] 93%|█████████▎| 9271/10000 [36:23:54<2:48:08, 13.84s/it] {'loss': 0.0018, 'learning_rate': 3.7e-06, 'epoch': 12.13} 93%|█████████▎| 9271/10000 [36:23:54<2:48:08, 13.84s/it] 93%|█████████▎| 9272/10000 [36:24:08<2:48:23, 13.88s/it] {'loss': 0.0025, 'learning_rate': 3.6949999999999998e-06, 'epoch': 12.14} 93%|█████████▎| 9272/10000 [36:24:08<2:48:23, 13.88s/it] 93%|█████████▎| 9273/10000 [36:24:22<2:48:37, 13.92s/it] {'loss': 0.002, 'learning_rate': 3.6900000000000002e-06, 'epoch': 12.14} 93%|█████████▎| 9273/10000 [36:24:22<2:48:37, 13.92s/it] 93%|█████████▎| 9274/10000 [36:24:35<2:48:16, 13.91s/it] {'loss': 0.0029, 'learning_rate': 3.6850000000000003e-06, 'epoch': 12.14} 93%|█████████▎| 9274/10000 [36:24:36<2:48:16, 13.91s/it] 93%|█████████▎| 9275/10000 [36:24:49<2:48:05, 13.91s/it] {'loss': 0.0029, 'learning_rate': 3.68e-06, 'epoch': 12.14} 93%|█████████▎| 9275/10000 [36:24:49<2:48:05, 13.91s/it] 93%|█████████▎| 9276/10000 [36:25:03<2:47:49, 13.91s/it] {'loss': 0.0024, 'learning_rate': 3.675e-06, 'epoch': 12.14} 93%|█████████▎| 9276/10000 [36:25:03<2:47:49, 13.91s/it] 93%|█████████▎| 9277/10000 [36:25:17<2:47:31, 13.90s/it] {'loss': 0.001, 'learning_rate': 3.6700000000000004e-06, 'epoch': 12.14} 93%|█████████▎| 9277/10000 [36:25:17<2:47:31, 13.90s/it] 93%|█████████▎| 9278/10000 [36:25:31<2:46:45, 13.86s/it] {'loss': 0.0027, 'learning_rate': 
3.6650000000000005e-06, 'epoch': 12.14} 93%|█████████▎| 9278/10000 [36:25:31<2:46:45, 13.86s/it] 93%|█████████▎| 9279/10000 [36:25:45<2:47:02, 13.90s/it] {'loss': 0.0018, 'learning_rate': 3.66e-06, 'epoch': 12.15} 93%|█████████▎| 9279/10000 [36:25:45<2:47:02, 13.90s/it] 93%|█████████▎| 9280/10000 [36:25:59<2:46:21, 13.86s/it] {'loss': 0.0023, 'learning_rate': 3.655e-06, 'epoch': 12.15} 93%|█████████▎| 9280/10000 [36:25:59<2:46:21, 13.86s/it] 93%|█████████▎| 9281/10000 [36:26:13<2:46:44, 13.91s/it] {'loss': 0.0017, 'learning_rate': 3.6499999999999998e-06, 'epoch': 12.15} 93%|█████████▎| 9281/10000 [36:26:13<2:46:44, 13.91s/it] 93%|█████████▎| 9282/10000 [36:26:27<2:46:35, 13.92s/it] {'loss': 0.0024, 'learning_rate': 3.6450000000000007e-06, 'epoch': 12.15} 93%|█████████▎| 9282/10000 [36:26:27<2:46:35, 13.92s/it] 93%|█████████▎| 9283/10000 [36:26:41<2:46:23, 13.92s/it] {'loss': 0.001, 'learning_rate': 3.6400000000000003e-06, 'epoch': 12.15} 93%|█████████▎| 9283/10000 [36:26:41<2:46:23, 13.92s/it] 93%|█████████▎| 9284/10000 [36:26:54<2:45:56, 13.91s/it] {'loss': 0.0016, 'learning_rate': 3.6350000000000003e-06, 'epoch': 12.15} 93%|█████████▎| 9284/10000 [36:26:55<2:45:56, 13.91s/it] 93%|█████████▎| 9285/10000 [36:27:08<2:45:41, 13.90s/it] {'loss': 0.0018, 'learning_rate': 3.63e-06, 'epoch': 12.15} 93%|█████████▎| 9285/10000 [36:27:08<2:45:41, 13.90s/it] 93%|█████████▎| 9286/10000 [36:27:22<2:45:39, 13.92s/it] {'loss': 0.0037, 'learning_rate': 3.625e-06, 'epoch': 12.15} 93%|█████████▎| 9286/10000 [36:27:22<2:45:39, 13.92s/it] 93%|█████████▎| 9287/10000 [36:27:36<2:44:56, 13.88s/it] {'loss': 0.002, 'learning_rate': 3.6200000000000005e-06, 'epoch': 12.16} 93%|█████████▎| 9287/10000 [36:27:36<2:44:56, 13.88s/it] 93%|█████████▎| 9288/10000 [36:27:50<2:44:27, 13.86s/it] {'loss': 0.0018, 'learning_rate': 3.6150000000000005e-06, 'epoch': 12.16} 93%|█████████▎| 9288/10000 [36:27:50<2:44:27, 13.86s/it] 93%|█████████▎| 9289/10000 [36:28:04<2:44:09, 13.85s/it] {'loss': 0.0017, 
'learning_rate': 3.61e-06, 'epoch': 12.16} 93%|█████████▎| 9289/10000 [36:28:04<2:44:09, 13.85s/it] 93%|█████████▎| 9290/10000 [36:28:18<2:43:39, 13.83s/it] {'loss': 0.0013, 'learning_rate': 3.6050000000000002e-06, 'epoch': 12.16} 93%|█████████▎| 9290/10000 [36:28:18<2:43:39, 13.83s/it] 93%|█████████▎| 9291/10000 [36:28:31<2:43:15, 13.82s/it] {'loss': 0.0028, 'learning_rate': 3.6e-06, 'epoch': 12.16} 93%|█████████▎| 9291/10000 [36:28:31<2:43:15, 13.82s/it] 93%|█████████▎| 9292/10000 [36:28:45<2:42:51, 13.80s/it] {'loss': 0.0023, 'learning_rate': 3.5950000000000003e-06, 'epoch': 12.16} 93%|█████████▎| 9292/10000 [36:28:45<2:42:51, 13.80s/it] 93%|█████████▎| 9293/10000 [36:28:59<2:43:19, 13.86s/it] {'loss': 0.0012, 'learning_rate': 3.5900000000000004e-06, 'epoch': 12.16} 93%|█████████▎| 9293/10000 [36:28:59<2:43:19, 13.86s/it] 93%|█████████▎| 9294/10000 [36:29:13<2:42:35, 13.82s/it] {'loss': 0.0025, 'learning_rate': 3.585e-06, 'epoch': 12.16} 93%|█████████▎| 9294/10000 [36:29:13<2:42:35, 13.82s/it] 93%|█████████▎| 9295/10000 [36:29:26<2:41:54, 13.78s/it] {'loss': 0.0019, 'learning_rate': 3.58e-06, 'epoch': 12.17} 93%|█████████▎| 9295/10000 [36:29:27<2:41:54, 13.78s/it] 93%|█████████▎| 9296/10000 [36:29:40<2:41:43, 13.78s/it] {'loss': 0.0031, 'learning_rate': 3.575e-06, 'epoch': 12.17} 93%|█████████▎| 9296/10000 [36:29:40<2:41:43, 13.78s/it] 93%|█████████▎| 9297/10000 [36:29:54<2:41:45, 13.81s/it] {'loss': 0.0018, 'learning_rate': 3.5700000000000005e-06, 'epoch': 12.17} 93%|█████████▎| 9297/10000 [36:29:54<2:41:45, 13.81s/it] 93%|█████████▎| 9298/10000 [36:30:08<2:41:06, 13.77s/it] {'loss': 0.0019, 'learning_rate': 3.565e-06, 'epoch': 12.17} 93%|█████████▎| 9298/10000 [36:30:08<2:41:06, 13.77s/it] 93%|█████████▎| 9299/10000 [36:30:22<2:40:48, 13.76s/it] {'loss': 0.002, 'learning_rate': 3.5600000000000002e-06, 'epoch': 12.17} 93%|█████████▎| 9299/10000 [36:30:22<2:40:48, 13.76s/it] 93%|█████████▎| 9300/10000 [36:30:36<2:41:10, 13.82s/it] {'loss': 0.002, 
'learning_rate': 3.555e-06, 'epoch': 12.17} 93%|█████████▎| 9300/10000 [36:30:36<2:41:10, 13.82s/it] 93%|█████████▎| 9301/10000 [36:30:49<2:40:46, 13.80s/it] {'loss': 0.0033, 'learning_rate': 3.55e-06, 'epoch': 12.17} 93%|█████████▎| 9301/10000 [36:30:49<2:40:46, 13.80s/it] 93%|█████████▎| 9302/10000 [36:31:03<2:40:49, 13.82s/it] {'loss': 0.0025, 'learning_rate': 3.5450000000000004e-06, 'epoch': 12.18} 93%|█████████▎| 9302/10000 [36:31:03<2:40:49, 13.82s/it] 93%|█████████▎| 9303/10000 [36:31:17<2:40:15, 13.80s/it] {'loss': 0.002, 'learning_rate': 3.5400000000000004e-06, 'epoch': 12.18} 93%|█████████▎| 9303/10000 [36:31:17<2:40:15, 13.80s/it] 93%|█████████▎| 9304/10000 [36:31:31<2:39:45, 13.77s/it] {'loss': 0.0013, 'learning_rate': 3.535e-06, 'epoch': 12.18} 93%|█████████▎| 9304/10000 [36:31:31<2:39:45, 13.77s/it] 93%|█████████▎| 9305/10000 [36:31:44<2:39:32, 13.77s/it] {'loss': 0.0026, 'learning_rate': 3.53e-06, 'epoch': 12.18} 93%|█████████▎| 9305/10000 [36:31:44<2:39:32, 13.77s/it] 93%|█████████▎| 9306/10000 [36:31:58<2:39:15, 13.77s/it] {'loss': 0.0019, 'learning_rate': 3.5249999999999997e-06, 'epoch': 12.18} 93%|█████████▎| 9306/10000 [36:31:58<2:39:15, 13.77s/it] 93%|█████████▎| 9307/10000 [36:32:12<2:39:08, 13.78s/it] {'loss': 0.0042, 'learning_rate': 3.52e-06, 'epoch': 12.18} 93%|█████████▎| 9307/10000 [36:32:12<2:39:08, 13.78s/it] 93%|█████████▎| 9308/10000 [36:32:26<2:38:45, 13.77s/it] {'loss': 0.0019, 'learning_rate': 3.5150000000000002e-06, 'epoch': 12.18} 93%|█████████▎| 9308/10000 [36:32:26<2:38:45, 13.77s/it] 93%|█████████▎| 9309/10000 [36:32:39<2:38:27, 13.76s/it] {'loss': 0.0018, 'learning_rate': 3.5100000000000003e-06, 'epoch': 12.18} 93%|█████████▎| 9309/10000 [36:32:39<2:38:27, 13.76s/it] 93%|█████████▎| 9310/10000 [36:32:53<2:38:22, 13.77s/it] {'loss': 0.003, 'learning_rate': 3.505e-06, 'epoch': 12.19} 93%|█████████▎| 9310/10000 [36:32:53<2:38:22, 13.77s/it] 93%|█████████▎| 9311/10000 [36:33:07<2:38:42, 13.82s/it] {'loss': 0.0021, 
'learning_rate': 3.5000000000000004e-06, 'epoch': 12.19} 93%|█████████▎| 9311/10000 [36:33:07<2:38:42, 13.82s/it] 93%|█████████▎| 9312/10000 [36:33:21<2:38:22, 13.81s/it] {'loss': 0.0031, 'learning_rate': 3.4950000000000004e-06, 'epoch': 12.19} 93%|█████████▎| 9312/10000 [36:33:21<2:38:22, 13.81s/it] 93%|█████████▎| 9313/10000 [36:33:35<2:38:08, 13.81s/it] {'loss': 0.0026, 'learning_rate': 3.49e-06, 'epoch': 12.19} 93%|█████████▎| 9313/10000 [36:33:35<2:38:08, 13.81s/it] 93%|█████████▎| 9314/10000 [36:33:49<2:38:09, 13.83s/it] {'loss': 0.0017, 'learning_rate': 3.485e-06, 'epoch': 12.19} 93%|█████████▎| 9314/10000 [36:33:49<2:38:09, 13.83s/it] 93%|█████████▎| 9315/10000 [36:34:02<2:37:40, 13.81s/it] {'loss': 0.0023, 'learning_rate': 3.4799999999999997e-06, 'epoch': 12.19} 93%|█████████▎| 9315/10000 [36:34:02<2:37:40, 13.81s/it] 93%|█████████▎| 9316/10000 [36:34:16<2:37:41, 13.83s/it] {'loss': 0.0027, 'learning_rate': 3.4750000000000006e-06, 'epoch': 12.19} 93%|█████████▎| 9316/10000 [36:34:16<2:37:41, 13.83s/it] 93%|█████████▎| 9317/10000 [36:34:30<2:37:25, 13.83s/it] {'loss': 0.0025, 'learning_rate': 3.4700000000000002e-06, 'epoch': 12.2} 93%|█████████▎| 9317/10000 [36:34:30<2:37:25, 13.83s/it] 93%|█████████▎| 9318/10000 [36:34:44<2:37:19, 13.84s/it] {'loss': 0.0018, 'learning_rate': 3.4650000000000003e-06, 'epoch': 12.2} 93%|█████████▎| 9318/10000 [36:34:44<2:37:19, 13.84s/it] 93%|█████████▎| 9319/10000 [36:34:58<2:36:22, 13.78s/it] {'loss': 0.0045, 'learning_rate': 3.46e-06, 'epoch': 12.2} 93%|█████████▎| 9319/10000 [36:34:58<2:36:22, 13.78s/it] 93%|█████████▎| 9320/10000 [36:35:11<2:36:29, 13.81s/it] {'loss': 0.0019, 'learning_rate': 3.455e-06, 'epoch': 12.2} 93%|█████████▎| 9320/10000 [36:35:12<2:36:29, 13.81s/it] 93%|█████████▎| 9321/10000 [36:35:25<2:36:31, 13.83s/it] {'loss': 0.0021, 'learning_rate': 3.4500000000000004e-06, 'epoch': 12.2} 93%|█████████▎| 9321/10000 [36:35:25<2:36:31, 13.83s/it] 93%|█████████▎| 9322/10000 [36:35:39<2:36:01, 13.81s/it] 
{'loss': 0.0017, 'learning_rate': 3.4450000000000005e-06, 'epoch': 12.2} 93%|█████████▎| 9322/10000 [36:35:39<2:36:01, 13.81s/it] 93%|█████████▎| 9323/10000 [36:35:53<2:35:46, 13.81s/it] {'loss': 0.0018, 'learning_rate': 3.44e-06, 'epoch': 12.2} 93%|█████████▎| 9323/10000 [36:35:53<2:35:46, 13.81s/it] 93%|█████████▎| 9324/10000 [36:36:07<2:35:52, 13.83s/it] {'loss': 0.0024, 'learning_rate': 3.435e-06, 'epoch': 12.2} 93%|█████████▎| 9324/10000 [36:36:07<2:35:52, 13.83s/it] 93%|█████████▎| 9325/10000 [36:36:21<2:35:24, 13.81s/it] {'loss': 0.0026, 'learning_rate': 3.4299999999999998e-06, 'epoch': 12.21} 93%|█████████▎| 9325/10000 [36:36:21<2:35:24, 13.81s/it] 93%|█████████▎| 9326/10000 [36:36:34<2:35:05, 13.81s/it] {'loss': 0.0016, 'learning_rate': 3.4250000000000002e-06, 'epoch': 12.21} 93%|█████████▎| 9326/10000 [36:36:34<2:35:05, 13.81s/it] 93%|█████████▎| 9327/10000 [36:36:48<2:34:38, 13.79s/it] {'loss': 0.0032, 'learning_rate': 3.4200000000000003e-06, 'epoch': 12.21} 93%|█████████▎| 9327/10000 [36:36:48<2:34:38, 13.79s/it] 93%|█████████▎| 9328/10000 [36:37:02<2:34:22, 13.78s/it] {'loss': 0.0018, 'learning_rate': 3.4150000000000003e-06, 'epoch': 12.21} 93%|█████████▎| 9328/10000 [36:37:02<2:34:22, 13.78s/it] 93%|█████████▎| 9329/10000 [36:37:16<2:33:58, 13.77s/it] {'loss': 0.0018, 'learning_rate': 3.41e-06, 'epoch': 12.21} 93%|█████████▎| 9329/10000 [36:37:16<2:33:58, 13.77s/it] 93%|█████████▎| 9330/10000 [36:37:29<2:33:59, 13.79s/it] {'loss': 0.0014, 'learning_rate': 3.405e-06, 'epoch': 12.21} 93%|█████████▎| 9330/10000 [36:37:30<2:33:59, 13.79s/it] 93%|█████████▎| 9331/10000 [36:37:43<2:33:59, 13.81s/it] {'loss': 0.0024, 'learning_rate': 3.4000000000000005e-06, 'epoch': 12.21} 93%|█████████▎| 9331/10000 [36:37:43<2:33:59, 13.81s/it] 93%|█████████▎| 9332/10000 [36:37:57<2:33:37, 13.80s/it] {'loss': 0.0011, 'learning_rate': 3.395e-06, 'epoch': 12.21} 93%|█████████▎| 9332/10000 [36:37:57<2:33:37, 13.80s/it] 93%|█████████▎| 9333/10000 [36:38:11<2:33:24, 13.80s/it] 
{'loss': 0.0021, 'learning_rate': 3.39e-06, 'epoch': 12.22} 93%|█████████▎| 9333/10000 [36:38:11<2:33:24, 13.80s/it] 93%|█████████▎| 9334/10000 [36:38:25<2:32:53, 13.77s/it] {'loss': 0.0021, 'learning_rate': 3.3849999999999998e-06, 'epoch': 12.22} 93%|█████████▎| 9334/10000 [36:38:25<2:32:53, 13.77s/it] 93%|█████████▎| 9335/10000 [36:38:38<2:32:35, 13.77s/it] {'loss': 0.0022, 'learning_rate': 3.38e-06, 'epoch': 12.22} 93%|█████████▎| 9335/10000 [36:38:38<2:32:35, 13.77s/it] 93%|█████████▎| 9336/10000 [36:38:52<2:31:58, 13.73s/it] {'loss': 0.0023, 'learning_rate': 3.3750000000000003e-06, 'epoch': 12.22} 93%|█████████▎| 9336/10000 [36:38:52<2:31:58, 13.73s/it] 93%|█████████▎| 9337/10000 [36:39:06<2:31:32, 13.71s/it] {'loss': 0.0022, 'learning_rate': 3.3700000000000003e-06, 'epoch': 12.22} 93%|█████████▎| 9337/10000 [36:39:06<2:31:32, 13.71s/it] 93%|█████████▎| 9338/10000 [36:39:20<2:31:37, 13.74s/it] {'loss': 0.0026, 'learning_rate': 3.365e-06, 'epoch': 12.22} 93%|█████████▎| 9338/10000 [36:39:20<2:31:37, 13.74s/it] 93%|█████████▎| 9339/10000 [36:39:33<2:31:51, 13.78s/it] {'loss': 0.0017, 'learning_rate': 3.36e-06, 'epoch': 12.22} 93%|█████████▎| 9339/10000 [36:39:33<2:31:51, 13.78s/it] 93%|█████████▎| 9340/10000 [36:39:47<2:31:26, 13.77s/it] {'loss': 0.0022, 'learning_rate': 3.3550000000000005e-06, 'epoch': 12.23} 93%|█████████▎| 9340/10000 [36:39:47<2:31:26, 13.77s/it] 93%|█████████▎| 9341/10000 [36:40:01<2:31:30, 13.79s/it] {'loss': 0.0025, 'learning_rate': 3.3500000000000005e-06, 'epoch': 12.23} 93%|█████████▎| 9341/10000 [36:40:01<2:31:30, 13.79s/it] 93%|█████████▎| 9342/10000 [36:40:15<2:31:30, 13.81s/it] {'loss': 0.0025, 'learning_rate': 3.345e-06, 'epoch': 12.23} 93%|█████████▎| 9342/10000 [36:40:15<2:31:30, 13.81s/it] 93%|█████████▎| 9343/10000 [36:40:29<2:31:05, 13.80s/it] {'loss': 0.0018, 'learning_rate': 3.34e-06, 'epoch': 12.23} 93%|█████████▎| 9343/10000 [36:40:29<2:31:05, 13.80s/it] 93%|█████████▎| 9344/10000 [36:40:42<2:30:51, 13.80s/it] {'loss': 
0.002, 'learning_rate': 3.335e-06, 'epoch': 12.23} 93%|█████████▎| 9344/10000 [36:40:42<2:30:51, 13.80s/it] 93%|█████████▎| 9345/10000 [36:40:56<2:30:31, 13.79s/it] {'loss': 0.0019, 'learning_rate': 3.3300000000000003e-06, 'epoch': 12.23} 93%|█████████▎| 9345/10000 [36:40:56<2:30:31, 13.79s/it] 93%|█████████▎| 9346/10000 [36:41:10<2:30:24, 13.80s/it] {'loss': 0.0023, 'learning_rate': 3.3250000000000004e-06, 'epoch': 12.23} 93%|█████████▎| 9346/10000 [36:41:10<2:30:24, 13.80s/it] 93%|█████████▎| 9347/10000 [36:41:24<2:30:04, 13.79s/it] {'loss': 0.0011, 'learning_rate': 3.3200000000000004e-06, 'epoch': 12.23} 93%|█████████▎| 9347/10000 [36:41:24<2:30:04, 13.79s/it] 93%|█████████▎| 9348/10000 [36:41:38<2:29:45, 13.78s/it] {'loss': 0.0014, 'learning_rate': 3.315e-06, 'epoch': 12.24} 93%|█████████▎| 9348/10000 [36:41:38<2:29:45, 13.78s/it] 93%|█████████▎| 9349/10000 [36:41:51<2:29:37, 13.79s/it] {'loss': 0.0092, 'learning_rate': 3.31e-06, 'epoch': 12.24} 93%|█████████▎| 9349/10000 [36:41:51<2:29:37, 13.79s/it] 94%|█████████▎| 9350/10000 [36:42:05<2:29:49, 13.83s/it] {'loss': 0.0016, 'learning_rate': 3.3050000000000005e-06, 'epoch': 12.24} 94%|█████████▎| 9350/10000 [36:42:05<2:29:49, 13.83s/it] 94%|█████████▎| 9351/10000 [36:42:19<2:29:55, 13.86s/it] {'loss': 0.0027, 'learning_rate': 3.3e-06, 'epoch': 12.24} 94%|█████████▎| 9351/10000 [36:42:19<2:29:55, 13.86s/it] 94%|█████████▎| 9352/10000 [36:42:33<2:29:25, 13.84s/it] {'loss': 0.0016, 'learning_rate': 3.2950000000000002e-06, 'epoch': 12.24} 94%|█████████▎| 9352/10000 [36:42:33<2:29:25, 13.84s/it] 94%|█████████▎| 9353/10000 [36:42:47<2:29:12, 13.84s/it] {'loss': 0.0011, 'learning_rate': 3.29e-06, 'epoch': 12.24} 94%|█████████▎| 9353/10000 [36:42:47<2:29:12, 13.84s/it] 94%|█████████▎| 9354/10000 [36:43:01<2:28:36, 13.80s/it] {'loss': 0.0012, 'learning_rate': 3.285e-06, 'epoch': 12.24} 94%|█████████▎| 9354/10000 [36:43:01<2:28:36, 13.80s/it] 94%|█████████▎| 9355/10000 [36:43:14<2:28:11, 13.79s/it] {'loss': 0.0027, 
'learning_rate': 3.2800000000000004e-06, 'epoch': 12.24} 94%|█████████▎| 9355/10000 [36:43:14<2:28:11, 13.79s/it] 94%|█████████▎| 9356/10000 [36:43:28<2:28:02, 13.79s/it] {'loss': 0.0018, 'learning_rate': 3.2750000000000004e-06, 'epoch': 12.25} 94%|█████████▎| 9356/10000 [36:43:28<2:28:02, 13.79s/it] 94%|█████████▎| 9357/10000 [36:43:42<2:27:50, 13.80s/it] {'loss': 0.0011, 'learning_rate': 3.27e-06, 'epoch': 12.25} 94%|█████████▎| 9357/10000 [36:43:42<2:27:50, 13.80s/it] 94%|█████████▎| 9358/10000 [36:43:56<2:27:28, 13.78s/it] {'loss': 0.0029, 'learning_rate': 3.265e-06, 'epoch': 12.25} 94%|█████████▎| 9358/10000 [36:43:56<2:27:28, 13.78s/it] 94%|█████████▎| 9359/10000 [36:44:09<2:26:52, 13.75s/it] {'loss': 0.0016, 'learning_rate': 3.2599999999999997e-06, 'epoch': 12.25} 94%|█████████▎| 9359/10000 [36:44:09<2:26:52, 13.75s/it] 94%|█████████▎| 9360/10000 [36:44:23<2:27:11, 13.80s/it] {'loss': 0.0037, 'learning_rate': 3.2550000000000006e-06, 'epoch': 12.25} 94%|█████████▎| 9360/10000 [36:44:23<2:27:11, 13.80s/it] 94%|█████████▎| 9361/10000 [36:44:37<2:27:06, 13.81s/it] {'loss': 0.0025, 'learning_rate': 3.2500000000000002e-06, 'epoch': 12.25} 94%|█████████▎| 9361/10000 [36:44:37<2:27:06, 13.81s/it] 94%|█████████▎| 9362/10000 [36:44:51<2:27:12, 13.84s/it] {'loss': 0.0028, 'learning_rate': 3.2450000000000003e-06, 'epoch': 12.25} 94%|█████████▎| 9362/10000 [36:44:51<2:27:12, 13.84s/it] 94%|█████████▎| 9363/10000 [36:45:05<2:26:31, 13.80s/it] {'loss': 0.0014, 'learning_rate': 3.24e-06, 'epoch': 12.26} 94%|█████████▎| 9363/10000 [36:45:05<2:26:31, 13.80s/it] 94%|█████████▎| 9364/10000 [36:45:18<2:26:03, 13.78s/it] {'loss': 0.0021, 'learning_rate': 3.235e-06, 'epoch': 12.26} 94%|█████████▎| 9364/10000 [36:45:18<2:26:03, 13.78s/it] 94%|█████████▎| 9365/10000 [36:45:32<2:25:48, 13.78s/it] {'loss': 0.0024, 'learning_rate': 3.2300000000000004e-06, 'epoch': 12.26} 94%|█████████▎| 9365/10000 [36:45:32<2:25:48, 13.78s/it] 94%|█████████▎| 9366/10000 [36:45:46<2:25:45, 13.79s/it] 
{'loss': 0.0013, 'learning_rate': 3.225e-06, 'epoch': 12.26} 94%|█████████▎| 9366/10000 [36:45:46<2:25:45, 13.79s/it] 94%|█████████▎| 9367/10000 [36:46:00<2:25:20, 13.78s/it] {'loss': 0.0015, 'learning_rate': 3.22e-06, 'epoch': 12.26} 94%|█████████▎| 9367/10000 [36:46:00<2:25:20, 13.78s/it] 94%|█████████▎| 9368/10000 [36:46:13<2:24:49, 13.75s/it] {'loss': 0.0019, 'learning_rate': 3.215e-06, 'epoch': 12.26} 94%|█████████▎| 9368/10000 [36:46:13<2:24:49, 13.75s/it] 94%|█████████▎| 9369/10000 [36:46:27<2:24:22, 13.73s/it] {'loss': 0.001, 'learning_rate': 3.2099999999999998e-06, 'epoch': 12.26} 94%|█████████▎| 9369/10000 [36:46:27<2:24:22, 13.73s/it] 94%|█████████▎| 9370/10000 [36:46:41<2:24:14, 13.74s/it] {'loss': 0.002, 'learning_rate': 3.2050000000000002e-06, 'epoch': 12.26} 94%|█████████▎| 9370/10000 [36:46:41<2:24:14, 13.74s/it] 94%|█████████▎| 9371/10000 [36:46:55<2:24:22, 13.77s/it] {'loss': 0.0026, 'learning_rate': 3.2000000000000003e-06, 'epoch': 12.27} 94%|█████████▎| 9371/10000 [36:46:55<2:24:22, 13.77s/it] 94%|█████████▎| 9372/10000 [36:47:09<2:24:29, 13.80s/it] {'loss': 0.0027, 'learning_rate': 3.195e-06, 'epoch': 12.27} 94%|█████████▎| 9372/10000 [36:47:09<2:24:29, 13.80s/it] 94%|█████████▎| 9373/10000 [36:47:22<2:24:15, 13.80s/it] {'loss': 0.0023, 'learning_rate': 3.19e-06, 'epoch': 12.27} 94%|█████████▎| 9373/10000 [36:47:22<2:24:15, 13.80s/it] 94%|█████████▎| 9374/10000 [36:47:36<2:24:12, 13.82s/it] {'loss': 0.0024, 'learning_rate': 3.1850000000000004e-06, 'epoch': 12.27} 94%|█████████▎| 9374/10000 [36:47:36<2:24:12, 13.82s/it] 94%|█████████▍| 9375/10000 [36:47:50<2:24:06, 13.83s/it] {'loss': 0.0026, 'learning_rate': 3.1800000000000005e-06, 'epoch': 12.27} 94%|█████████▍| 9375/10000 [36:47:50<2:24:06, 13.83s/it] 94%|█████████▍| 9376/10000 [36:48:04<2:23:35, 13.81s/it] {'loss': 0.0028, 'learning_rate': 3.175e-06, 'epoch': 12.27} 94%|█████████▍| 9376/10000 [36:48:04<2:23:35, 13.81s/it] 94%|█████████▍| 9377/10000 [36:48:18<2:23:17, 13.80s/it] {'loss': 
0.0024, 'learning_rate': 3.17e-06, 'epoch': 12.27} 94%|█████████▍| 9377/10000 [36:48:18<2:23:17, 13.80s/it] 94%|█████████▍| 9378/10000 [36:48:31<2:22:50, 13.78s/it] {'loss': 0.0017, 'learning_rate': 3.1649999999999998e-06, 'epoch': 12.27} 94%|█████████▍| 9378/10000 [36:48:31<2:22:50, 13.78s/it] 94%|█████████▍| 9379/10000 [36:48:45<2:22:34, 13.77s/it] {'loss': 0.0014, 'learning_rate': 3.1600000000000007e-06, 'epoch': 12.28} 94%|█████████▍| 9379/10000 [36:48:45<2:22:34, 13.77s/it] 94%|█████████▍| 9380/10000 [36:48:59<2:22:23, 13.78s/it] {'loss': 0.0013, 'learning_rate': 3.1550000000000003e-06, 'epoch': 12.28} 94%|█████████▍| 9380/10000 [36:48:59<2:22:23, 13.78s/it] 94%|█████████▍| 9381/10000 [36:49:13<2:22:22, 13.80s/it] {'loss': 0.0023, 'learning_rate': 3.1500000000000003e-06, 'epoch': 12.28} 94%|█████████▍| 9381/10000 [36:49:13<2:22:22, 13.80s/it] 94%|█████████▍| 9382/10000 [36:49:27<2:22:08, 13.80s/it] {'loss': 0.0023, 'learning_rate': 3.145e-06, 'epoch': 12.28} 94%|█████████▍| 9382/10000 [36:49:27<2:22:08, 13.80s/it] 94%|█████████▍| 9383/10000 [36:49:40<2:21:59, 13.81s/it] {'loss': 0.0012, 'learning_rate': 3.14e-06, 'epoch': 12.28} 94%|█████████▍| 9383/10000 [36:49:40<2:21:59, 13.81s/it] 94%|█████████▍| 9384/10000 [36:49:54<2:21:32, 13.79s/it] {'loss': 0.0015, 'learning_rate': 3.1350000000000005e-06, 'epoch': 12.28} 94%|█████████▍| 9384/10000 [36:49:54<2:21:32, 13.79s/it] 94%|█████████▍| 9385/10000 [36:50:08<2:21:19, 13.79s/it] {'loss': 0.0021, 'learning_rate': 3.13e-06, 'epoch': 12.28} 94%|█████████▍| 9385/10000 [36:50:08<2:21:19, 13.79s/it] 94%|█████████▍| 9386/10000 [36:50:22<2:21:50, 13.86s/it] {'loss': 0.0019, 'learning_rate': 3.125e-06, 'epoch': 12.29} 94%|█████████▍| 9386/10000 [36:50:22<2:21:50, 13.86s/it] 94%|█████████▍| 9387/10000 [36:50:36<2:21:06, 13.81s/it] {'loss': 0.0026, 'learning_rate': 3.12e-06, 'epoch': 12.29} 94%|█████████▍| 9387/10000 [36:50:36<2:21:06, 13.81s/it] 94%|█████████▍| 9388/10000 [36:50:49<2:20:47, 13.80s/it] {'loss': 0.0022, 
'learning_rate': 3.1150000000000002e-06, 'epoch': 12.29} 94%|█████████▍| 9388/10000 [36:50:49<2:20:47, 13.80s/it] 94%|█████████▍| 9389/10000 [36:51:03<2:20:37, 13.81s/it] {'loss': 0.0033, 'learning_rate': 3.11e-06, 'epoch': 12.29} 94%|█████████▍| 9389/10000 [36:51:03<2:20:37, 13.81s/it] 94%|█████████▍| 9390/10000 [36:51:17<2:20:22, 13.81s/it] {'loss': 0.0027, 'learning_rate': 3.1050000000000003e-06, 'epoch': 12.29} 94%|█████████▍| 9390/10000 [36:51:17<2:20:22, 13.81s/it] 94%|█████████▍| 9391/10000 [36:51:31<2:19:52, 13.78s/it] {'loss': 0.0015, 'learning_rate': 3.1e-06, 'epoch': 12.29} 94%|█████████▍| 9391/10000 [36:51:31<2:19:52, 13.78s/it] 94%|█████████▍| 9392/10000 [36:51:45<2:19:36, 13.78s/it] {'loss': 0.0012, 'learning_rate': 3.095e-06, 'epoch': 12.29} 94%|█████████▍| 9392/10000 [36:51:45<2:19:36, 13.78s/it] 94%|█████████▍| 9393/10000 [36:51:58<2:19:31, 13.79s/it] {'loss': 0.0035, 'learning_rate': 3.09e-06, 'epoch': 12.29} 94%|█████████▍| 9393/10000 [36:51:58<2:19:31, 13.79s/it] 94%|█████████▍| 9394/10000 [36:52:12<2:19:18, 13.79s/it] {'loss': 0.0015, 'learning_rate': 3.085e-06, 'epoch': 12.3} 94%|█████████▍| 9394/10000 [36:52:12<2:19:18, 13.79s/it] 94%|█████████▍| 9395/10000 [36:52:26<2:19:07, 13.80s/it] {'loss': 0.0018, 'learning_rate': 3.08e-06, 'epoch': 12.3} 94%|█████████▍| 9395/10000 [36:52:26<2:19:07, 13.80s/it] 94%|█████████▍| 9396/10000 [36:52:40<2:18:54, 13.80s/it] {'loss': 0.0017, 'learning_rate': 3.075e-06, 'epoch': 12.3} 94%|█████████▍| 9396/10000 [36:52:40<2:18:54, 13.80s/it] 94%|█████████▍| 9397/10000 [36:52:54<2:18:35, 13.79s/it] {'loss': 0.0015, 'learning_rate': 3.0700000000000003e-06, 'epoch': 12.3} 94%|█████████▍| 9397/10000 [36:52:54<2:18:35, 13.79s/it] 94%|█████████▍| 9398/10000 [36:53:07<2:18:18, 13.79s/it] {'loss': 0.0016, 'learning_rate': 3.0650000000000003e-06, 'epoch': 12.3} 94%|█████████▍| 9398/10000 [36:53:07<2:18:18, 13.79s/it] 94%|█████████▍| 9399/10000 [36:53:21<2:17:57, 13.77s/it] {'loss': 0.0023, 'learning_rate': 3.06e-06, 
'epoch': 12.3} 94%|█████████▍| 9399/10000 [36:53:21<2:17:57, 13.77s/it] 94%|█████████▍| 9400/10000 [36:53:35<2:18:04, 13.81s/it] {'loss': 0.0024, 'learning_rate': 3.0550000000000004e-06, 'epoch': 12.3} 94%|█████████▍| 9400/10000 [36:53:35<2:18:04, 13.81s/it] 94%|█████████▍| 9401/10000 [36:53:49<2:17:44, 13.80s/it] {'loss': 0.002, 'learning_rate': 3.05e-06, 'epoch': 12.3} 94%|█████████▍| 9401/10000 [36:53:49<2:17:44, 13.80s/it] 94%|█████████▍| 9402/10000 [36:54:03<2:17:31, 13.80s/it] {'loss': 0.002, 'learning_rate': 3.0450000000000005e-06, 'epoch': 12.31} 94%|█████████▍| 9402/10000 [36:54:03<2:17:31, 13.80s/it] 94%|█████████▍| 9403/10000 [36:54:16<2:17:15, 13.80s/it] {'loss': 0.0026, 'learning_rate': 3.04e-06, 'epoch': 12.31} 94%|█████████▍| 9403/10000 [36:54:16<2:17:15, 13.80s/it] 94%|█████████▍| 9404/10000 [36:54:30<2:17:13, 13.82s/it] {'loss': 0.002, 'learning_rate': 3.035e-06, 'epoch': 12.31} 94%|█████████▍| 9404/10000 [36:54:30<2:17:13, 13.82s/it] 94%|█████████▍| 9405/10000 [36:54:44<2:17:03, 13.82s/it] {'loss': 0.006, 'learning_rate': 3.0300000000000002e-06, 'epoch': 12.31} 94%|█████████▍| 9405/10000 [36:54:44<2:17:03, 13.82s/it] 94%|█████████▍| 9406/10000 [36:54:58<2:17:09, 13.85s/it] {'loss': 0.0021, 'learning_rate': 3.0250000000000003e-06, 'epoch': 12.31} 94%|█████████▍| 9406/10000 [36:54:58<2:17:09, 13.85s/it] 94%|█████████▍| 9407/10000 [36:55:12<2:17:01, 13.87s/it] {'loss': 0.0029, 'learning_rate': 3.0200000000000003e-06, 'epoch': 12.31} 94%|█████████▍| 9407/10000 [36:55:12<2:17:01, 13.87s/it] 94%|█████████▍| 9408/10000 [36:55:26<2:16:39, 13.85s/it] {'loss': 0.0018, 'learning_rate': 3.015e-06, 'epoch': 12.31} 94%|█████████▍| 9408/10000 [36:55:26<2:16:39, 13.85s/it] 94%|█████████▍| 9409/10000 [36:55:39<2:16:14, 13.83s/it] {'loss': 0.0024, 'learning_rate': 3.01e-06, 'epoch': 12.32} 94%|█████████▍| 9409/10000 [36:55:39<2:16:14, 13.83s/it] 94%|█████████▍| 9410/10000 [36:55:53<2:15:48, 13.81s/it] {'loss': 0.0023, 'learning_rate': 3.005e-06, 'epoch': 12.32} 
94%|█████████▍| 9410/10000 [36:55:53<2:15:48, 13.81s/it] 94%|█████████▍| 9411/10000 [36:56:07<2:15:41, 13.82s/it] {'loss': 0.0022, 'learning_rate': 3e-06, 'epoch': 12.32} 94%|█████████▍| 9411/10000 [36:56:07<2:15:41, 13.82s/it] 94%|█████████▍| 9412/10000 [36:56:21<2:15:32, 13.83s/it] {'loss': 0.0025, 'learning_rate': 2.995e-06, 'epoch': 12.32} 94%|█████████▍| 9412/10000 [36:56:21<2:15:32, 13.83s/it] 94%|█████████▍| 9413/10000 [36:56:35<2:15:02, 13.80s/it] {'loss': 0.0042, 'learning_rate': 2.99e-06, 'epoch': 12.32} 94%|█████████▍| 9413/10000 [36:56:35<2:15:02, 13.80s/it] 94%|█████████▍| 9414/10000 [36:56:48<2:14:37, 13.78s/it] {'loss': 0.0028, 'learning_rate': 2.9850000000000002e-06, 'epoch': 12.32} 94%|█████████▍| 9414/10000 [36:56:48<2:14:37, 13.78s/it] 94%|█████████▍| 9415/10000 [36:57:02<2:14:23, 13.78s/it] {'loss': 0.0015, 'learning_rate': 2.9800000000000003e-06, 'epoch': 12.32} 94%|█████████▍| 9415/10000 [36:57:02<2:14:23, 13.78s/it] 94%|█████████▍| 9416/10000 [36:57:16<2:14:24, 13.81s/it] {'loss': 0.0026, 'learning_rate': 2.975e-06, 'epoch': 12.32} 94%|█████████▍| 9416/10000 [36:57:16<2:14:24, 13.81s/it] 94%|█████████▍| 9417/10000 [36:57:30<2:14:16, 13.82s/it] {'loss': 0.0016, 'learning_rate': 2.9700000000000004e-06, 'epoch': 12.33} 94%|█████████▍| 9417/10000 [36:57:30<2:14:16, 13.82s/it] 94%|█████████▍| 9418/10000 [36:57:44<2:14:05, 13.82s/it] {'loss': 0.0025, 'learning_rate': 2.965e-06, 'epoch': 12.33} 94%|█████████▍| 9418/10000 [36:57:44<2:14:05, 13.82s/it] 94%|█████████▍| 9419/10000 [36:57:58<2:14:04, 13.85s/it] {'loss': 0.0023, 'learning_rate': 2.9600000000000005e-06, 'epoch': 12.33} 94%|█████████▍| 9419/10000 [36:57:58<2:14:04, 13.85s/it] 94%|█████████▍| 9420/10000 [36:58:12<2:13:56, 13.86s/it] {'loss': 0.0014, 'learning_rate': 2.955e-06, 'epoch': 12.33} 94%|█████████▍| 9420/10000 [36:58:12<2:13:56, 13.86s/it] 94%|█████████▍| 9421/10000 [36:58:25<2:13:34, 13.84s/it] {'loss': 0.0015, 'learning_rate': 2.95e-06, 'epoch': 12.33} 94%|█████████▍| 9421/10000 
[36:58:25<2:13:34, 13.84s/it] 94%|█████████▍| 9422/10000 [36:58:39<2:13:32, 13.86s/it] {'loss': 0.0019, 'learning_rate': 2.945e-06, 'epoch': 12.33} 94%|█████████▍| 9422/10000 [36:58:39<2:13:32, 13.86s/it] 94%|█████████▍| 9423/10000 [36:58:53<2:12:55, 13.82s/it] {'loss': 0.0019, 'learning_rate': 2.9400000000000002e-06, 'epoch': 12.33} 94%|█████████▍| 9423/10000 [36:58:53<2:12:55, 13.82s/it] 94%|█████████▍| 9424/10000 [36:59:07<2:12:53, 13.84s/it] {'loss': 0.0022, 'learning_rate': 2.9350000000000003e-06, 'epoch': 12.34} 94%|█████████▍| 9424/10000 [36:59:07<2:12:53, 13.84s/it] 94%|█████████▍| 9425/10000 [36:59:21<2:12:41, 13.85s/it] {'loss': 0.0021, 'learning_rate': 2.93e-06, 'epoch': 12.34} 94%|█████████▍| 9425/10000 [36:59:21<2:12:41, 13.85s/it] 94%|█████████▍| 9426/10000 [36:59:35<2:12:27, 13.85s/it] {'loss': 0.0012, 'learning_rate': 2.9250000000000004e-06, 'epoch': 12.34} 94%|█████████▍| 9426/10000 [36:59:35<2:12:27, 13.85s/it] 94%|█████████▍| 9427/10000 [36:59:48<2:12:15, 13.85s/it] {'loss': 0.0026, 'learning_rate': 2.92e-06, 'epoch': 12.34} 94%|█████████▍| 9427/10000 [36:59:48<2:12:15, 13.85s/it] 94%|█████████▍| 9428/10000 [37:00:02<2:11:45, 13.82s/it] {'loss': 0.0019, 'learning_rate': 2.915e-06, 'epoch': 12.34} 94%|█████████▍| 9428/10000 [37:00:02<2:11:45, 13.82s/it] 94%|█████████▍| 9429/10000 [37:00:16<2:11:32, 13.82s/it] {'loss': 0.0023, 'learning_rate': 2.91e-06, 'epoch': 12.34} 94%|█████████▍| 9429/10000 [37:00:16<2:11:32, 13.82s/it] 94%|█████████▍| 9430/10000 [37:00:30<2:11:06, 13.80s/it] {'loss': 0.0027, 'learning_rate': 2.905e-06, 'epoch': 12.34} 94%|█████████▍| 9430/10000 [37:00:30<2:11:06, 13.80s/it] 94%|█████████▍| 9431/10000 [37:00:44<2:11:04, 13.82s/it] {'loss': 0.0017, 'learning_rate': 2.9e-06, 'epoch': 12.34} 94%|█████████▍| 9431/10000 [37:00:44<2:11:04, 13.82s/it] 94%|█████████▍| 9432/10000 [37:00:58<2:11:06, 13.85s/it] {'loss': 0.002, 'learning_rate': 2.8950000000000002e-06, 'epoch': 12.35} 94%|█████████▍| 9432/10000 [37:00:58<2:11:06, 
13.85s/it] 94%|█████████▍| 9433/10000 [37:01:11<2:10:32, 13.81s/it] {'loss': 0.0023, 'learning_rate': 2.89e-06, 'epoch': 12.35} 94%|█████████▍| 9433/10000 [37:01:11<2:10:32, 13.81s/it] 94%|█████████▍| 9434/10000 [37:01:25<2:10:29, 13.83s/it] {'loss': 0.002, 'learning_rate': 2.8850000000000003e-06, 'epoch': 12.35} 94%|█████████▍| 9434/10000 [37:01:25<2:10:29, 13.83s/it] 94%|█████████▍| 9435/10000 [37:01:39<2:10:15, 13.83s/it] {'loss': 0.0029, 'learning_rate': 2.88e-06, 'epoch': 12.35} 94%|█████████▍| 9435/10000 [37:01:39<2:10:15, 13.83s/it] 94%|█████████▍| 9436/10000 [37:01:53<2:09:48, 13.81s/it] {'loss': 0.0017, 'learning_rate': 2.8750000000000004e-06, 'epoch': 12.35} 94%|█████████▍| 9436/10000 [37:01:53<2:09:48, 13.81s/it] 94%|█████████▍| 9437/10000 [37:02:06<2:09:22, 13.79s/it] {'loss': 0.0021, 'learning_rate': 2.87e-06, 'epoch': 12.35} 94%|█████████▍| 9437/10000 [37:02:06<2:09:22, 13.79s/it] 94%|█████████▍| 9438/10000 [37:02:20<2:09:08, 13.79s/it] {'loss': 0.0026, 'learning_rate': 2.865e-06, 'epoch': 12.35} 94%|█████████▍| 9438/10000 [37:02:20<2:09:08, 13.79s/it] 94%|█████████▍| 9439/10000 [37:02:34<2:09:02, 13.80s/it] {'loss': 0.0023, 'learning_rate': 2.86e-06, 'epoch': 12.35} 94%|█████████▍| 9439/10000 [37:02:34<2:09:02, 13.80s/it] 94%|█████████▍| 9440/10000 [37:02:48<2:08:51, 13.81s/it] {'loss': 0.0018, 'learning_rate': 2.855e-06, 'epoch': 12.36} 94%|█████████▍| 9440/10000 [37:02:48<2:08:51, 13.81s/it] 94%|█████████▍| 9441/10000 [37:03:02<2:08:48, 13.83s/it] {'loss': 0.0019, 'learning_rate': 2.8500000000000002e-06, 'epoch': 12.36} 94%|█████████▍| 9441/10000 [37:03:02<2:08:48, 13.83s/it] 94%|█████████▍| 9442/10000 [37:03:16<2:08:25, 13.81s/it] {'loss': 0.0016, 'learning_rate': 2.8450000000000003e-06, 'epoch': 12.36} 94%|█████████▍| 9442/10000 [37:03:16<2:08:25, 13.81s/it] 94%|█████████▍| 9443/10000 [37:03:29<2:08:02, 13.79s/it] {'loss': 0.0017, 'learning_rate': 2.8400000000000003e-06, 'epoch': 12.36} 94%|█████████▍| 9443/10000 [37:03:29<2:08:02, 13.79s/it] 
94%|█████████▍| 9444/10000 [37:03:43<2:07:47, 13.79s/it] {'loss': 0.0028, 'learning_rate': 2.835e-06, 'epoch': 12.36} 94%|█████████▍| 9444/10000 [37:03:43<2:07:47, 13.79s/it] 94%|█████████▍| 9445/10000 [37:03:57<2:07:42, 13.81s/it] {'loss': 0.0023, 'learning_rate': 2.83e-06, 'epoch': 12.36} 94%|█████████▍| 9445/10000 [37:03:57<2:07:42, 13.81s/it] 94%|█████████▍| 9446/10000 [37:04:11<2:07:40, 13.83s/it] {'loss': 0.0028, 'learning_rate': 2.825e-06, 'epoch': 12.36} 94%|█████████▍| 9446/10000 [37:04:11<2:07:40, 13.83s/it] 94%|█████████▍| 9447/10000 [37:04:25<2:07:25, 13.83s/it] {'loss': 0.0024, 'learning_rate': 2.82e-06, 'epoch': 12.37} 94%|█████████▍| 9447/10000 [37:04:25<2:07:25, 13.83s/it] 94%|█████████▍| 9448/10000 [37:04:38<2:07:07, 13.82s/it] {'loss': 0.0024, 'learning_rate': 2.815e-06, 'epoch': 12.37} 94%|█████████▍| 9448/10000 [37:04:38<2:07:07, 13.82s/it] 94%|█████████▍| 9449/10000 [37:04:52<2:07:02, 13.83s/it] {'loss': 0.0022, 'learning_rate': 2.81e-06, 'epoch': 12.37} 94%|█████████▍| 9449/10000 [37:04:52<2:07:02, 13.83s/it] 94%|█████████▍| 9450/10000 [37:05:06<2:06:42, 13.82s/it] {'loss': 0.0024, 'learning_rate': 2.805e-06, 'epoch': 12.37} 94%|█████████▍| 9450/10000 [37:05:06<2:06:42, 13.82s/it] 95%|█████████▍| 9451/10000 [37:05:20<2:06:26, 13.82s/it] {'loss': 0.0029, 'learning_rate': 2.8000000000000003e-06, 'epoch': 12.37} 95%|█████████▍| 9451/10000 [37:05:20<2:06:26, 13.82s/it] 95%|█████████▍| 9452/10000 [37:05:34<2:06:11, 13.82s/it] {'loss': 0.0022, 'learning_rate': 2.795e-06, 'epoch': 12.37} 95%|█████████▍| 9452/10000 [37:05:34<2:06:11, 13.82s/it] 95%|█████████▍| 9453/10000 [37:05:48<2:06:14, 13.85s/it] {'loss': 0.0027, 'learning_rate': 2.7900000000000004e-06, 'epoch': 12.37} 95%|█████████▍| 9453/10000 [37:05:48<2:06:14, 13.85s/it] 95%|█████████▍| 9454/10000 [37:06:01<2:05:50, 13.83s/it] {'loss': 0.0028, 'learning_rate': 2.785e-06, 'epoch': 12.37} 95%|█████████▍| 9454/10000 [37:06:01<2:05:50, 13.83s/it] 95%|█████████▍| 9455/10000 [37:06:15<2:05:39, 
13.83s/it] {'loss': 0.0019, 'learning_rate': 2.78e-06, 'epoch': 12.38} 95%|█████████▍| 9455/10000 [37:06:15<2:05:39, 13.83s/it] 95%|█████████▍| 9456/10000 [37:06:29<2:05:03, 13.79s/it] {'loss': 0.0024, 'learning_rate': 2.775e-06, 'epoch': 12.38} 95%|█████████▍| 9456/10000 [37:06:29<2:05:03, 13.79s/it] 95%|█████████▍| 9457/10000 [37:06:43<2:04:43, 13.78s/it] {'loss': 0.0022, 'learning_rate': 2.77e-06, 'epoch': 12.38} 95%|█████████▍| 9457/10000 [37:06:43<2:04:43, 13.78s/it] 95%|█████████▍| 9458/10000 [37:06:57<2:04:42, 13.80s/it] {'loss': 0.0014, 'learning_rate': 2.765e-06, 'epoch': 12.38} 95%|█████████▍| 9458/10000 [37:06:57<2:04:42, 13.80s/it] 95%|█████████▍| 9459/10000 [37:07:10<2:04:32, 13.81s/it] {'loss': 0.0014, 'learning_rate': 2.7600000000000003e-06, 'epoch': 12.38} 95%|█████████▍| 9459/10000 [37:07:10<2:04:32, 13.81s/it] 95%|█████████▍| 9460/10000 [37:07:24<2:04:04, 13.79s/it] {'loss': 0.0019, 'learning_rate': 2.7550000000000003e-06, 'epoch': 12.38} 95%|█████████▍| 9460/10000 [37:07:24<2:04:04, 13.79s/it] 95%|█████████▍| 9461/10000 [37:07:38<2:03:59, 13.80s/it] {'loss': 0.0022, 'learning_rate': 2.7500000000000004e-06, 'epoch': 12.38} 95%|█████████▍| 9461/10000 [37:07:38<2:03:59, 13.80s/it] 95%|█████████▍| 9462/10000 [37:07:52<2:03:46, 13.80s/it] {'loss': 0.0019, 'learning_rate': 2.745e-06, 'epoch': 12.38} 95%|█████████▍| 9462/10000 [37:07:52<2:03:46, 13.80s/it] 95%|█████████▍| 9463/10000 [37:08:06<2:03:31, 13.80s/it] {'loss': 0.0024, 'learning_rate': 2.74e-06, 'epoch': 12.39} 95%|█████████▍| 9463/10000 [37:08:06<2:03:31, 13.80s/it] 95%|█████████▍| 9464/10000 [37:08:19<2:02:58, 13.77s/it] {'loss': 0.002, 'learning_rate': 2.735e-06, 'epoch': 12.39} 95%|█████████▍| 9464/10000 [37:08:19<2:02:58, 13.77s/it] 95%|█████████▍| 9465/10000 [37:08:33<2:02:40, 13.76s/it] {'loss': 0.0026, 'learning_rate': 2.73e-06, 'epoch': 12.39} 95%|█████████▍| 9465/10000 [37:08:33<2:02:40, 13.76s/it] 95%|█████████▍| 9466/10000 [37:08:47<2:02:12, 13.73s/it] {'loss': 0.0028, 
'learning_rate': 2.725e-06, 'epoch': 12.39} 95%|█████████▍| 9466/10000 [37:08:47<2:02:12, 13.73s/it] 95%|█████████▍| 9467/10000 [37:09:00<2:01:51, 13.72s/it] {'loss': 0.002, 'learning_rate': 2.72e-06, 'epoch': 12.39} 95%|█████████▍| 9467/10000 [37:09:00<2:01:51, 13.72s/it] 95%|█████████▍| 9468/10000 [37:09:14<2:02:04, 13.77s/it] {'loss': 0.0018, 'learning_rate': 2.7150000000000003e-06, 'epoch': 12.39} 95%|█████████▍| 9468/10000 [37:09:14<2:02:04, 13.77s/it] 95%|█████████▍| 9469/10000 [37:09:28<2:02:11, 13.81s/it] {'loss': 0.0041, 'learning_rate': 2.71e-06, 'epoch': 12.39} 95%|█████████▍| 9469/10000 [37:09:28<2:02:11, 13.81s/it] 95%|█████████▍| 9470/10000 [37:09:42<2:02:02, 13.82s/it] {'loss': 0.0022, 'learning_rate': 2.7050000000000004e-06, 'epoch': 12.4} 95%|█████████▍| 9470/10000 [37:09:42<2:02:02, 13.82s/it] 95%|█████████▍| 9471/10000 [37:09:56<2:01:56, 13.83s/it] {'loss': 0.0009, 'learning_rate': 2.7e-06, 'epoch': 12.4} 95%|█████████▍| 9471/10000 [37:09:56<2:01:56, 13.83s/it] 95%|█████████▍| 9472/10000 [37:10:10<2:01:28, 13.80s/it] {'loss': 0.002, 'learning_rate': 2.6950000000000005e-06, 'epoch': 12.4} 95%|█████████▍| 9472/10000 [37:10:10<2:01:28, 13.80s/it] 95%|█████████▍| 9473/10000 [37:10:23<2:01:18, 13.81s/it] {'loss': 0.0022, 'learning_rate': 2.69e-06, 'epoch': 12.4} 95%|█████████▍| 9473/10000 [37:10:23<2:01:18, 13.81s/it] 95%|█████████▍| 9474/10000 [37:10:37<2:01:06, 13.81s/it] {'loss': 0.0023, 'learning_rate': 2.685e-06, 'epoch': 12.4} 95%|█████████▍| 9474/10000 [37:10:37<2:01:06, 13.81s/it] 95%|█████████▍| 9475/10000 [37:10:51<2:01:20, 13.87s/it] {'loss': 0.0025, 'learning_rate': 2.68e-06, 'epoch': 12.4} 95%|█████████▍| 9475/10000 [37:10:51<2:01:20, 13.87s/it] 95%|█████████▍| 9476/10000 [37:11:05<2:00:46, 13.83s/it] {'loss': 0.0018, 'learning_rate': 2.6750000000000002e-06, 'epoch': 12.4} 95%|█████████▍| 9476/10000 [37:11:05<2:00:46, 13.83s/it] 95%|█████████▍| 9477/10000 [37:11:19<2:00:36, 13.84s/it] {'loss': 0.0016, 'learning_rate': 
2.6700000000000003e-06, 'epoch': 12.4} 95%|█████████▍| 9477/10000 [37:11:19<2:00:36, 13.84s/it] 95%|█████████▍| 9478/10000 [37:11:33<2:00:21, 13.83s/it] {'loss': 0.0025, 'learning_rate': 2.6650000000000003e-06, 'epoch': 12.41} 95%|█████████▍| 9478/10000 [37:11:33<2:00:21, 13.83s/it] 95%|█████████▍| 9479/10000 [37:11:46<2:00:11, 13.84s/it] {'loss': 0.0016, 'learning_rate': 2.66e-06, 'epoch': 12.41} 95%|█████████▍| 9479/10000 [37:11:47<2:00:11, 13.84s/it] 95%|█████████▍| 9480/10000 [37:12:00<1:59:45, 13.82s/it] {'loss': 0.0018, 'learning_rate': 2.655e-06, 'epoch': 12.41} 95%|█████████▍| 9480/10000 [37:12:00<1:59:45, 13.82s/it] 95%|█████████▍| 9481/10000 [37:12:14<1:59:41, 13.84s/it] {'loss': 0.0031, 'learning_rate': 2.65e-06, 'epoch': 12.41} 95%|█████████▍| 9481/10000 [37:12:14<1:59:41, 13.84s/it] 95%|█████████▍| 9482/10000 [37:12:28<1:59:12, 13.81s/it] {'loss': 0.0037, 'learning_rate': 2.645e-06, 'epoch': 12.41} 95%|█████████▍| 9482/10000 [37:12:28<1:59:12, 13.81s/it] 95%|█████████▍| 9483/10000 [37:12:42<1:58:47, 13.79s/it] {'loss': 0.0046, 'learning_rate': 2.64e-06, 'epoch': 12.41} 95%|█████████▍| 9483/10000 [37:12:42<1:58:47, 13.79s/it] 95%|█████████▍| 9484/10000 [37:12:55<1:58:37, 13.79s/it] {'loss': 0.0018, 'learning_rate': 2.6349999999999998e-06, 'epoch': 12.41} 95%|█████████▍| 9484/10000 [37:12:55<1:58:37, 13.79s/it] 95%|█████████▍| 9485/10000 [37:13:09<1:58:07, 13.76s/it] {'loss': 0.0018, 'learning_rate': 2.6300000000000002e-06, 'epoch': 12.41} 95%|█████████▍| 9485/10000 [37:13:09<1:58:07, 13.76s/it] 95%|█████████▍| 9486/10000 [37:13:23<1:58:11, 13.80s/it] {'loss': 0.002, 'learning_rate': 2.625e-06, 'epoch': 12.42} 95%|█████████▍| 9486/10000 [37:13:23<1:58:11, 13.80s/it] 95%|█████████▍| 9487/10000 [37:13:37<1:58:09, 13.82s/it] {'loss': 0.0024, 'learning_rate': 2.6200000000000003e-06, 'epoch': 12.42} 95%|█████████▍| 9487/10000 [37:13:37<1:58:09, 13.82s/it] 95%|█████████▍| 9488/10000 [37:13:51<1:58:22, 13.87s/it] {'loss': 0.0012, 'learning_rate': 2.615e-06, 
'epoch': 12.42} 95%|█████████▍| 9488/10000 [37:13:51<1:58:22, 13.87s/it] 95%|█████████▍| 9489/10000 [37:14:05<1:57:57, 13.85s/it] {'loss': 0.0025, 'learning_rate': 2.6100000000000004e-06, 'epoch': 12.42} 95%|█████████▍| 9489/10000 [37:14:05<1:57:57, 13.85s/it] 95%|█████████▍| 9490/10000 [37:14:18<1:57:23, 13.81s/it] {'loss': 0.0024, 'learning_rate': 2.605e-06, 'epoch': 12.42} 95%|█████████▍| 9490/10000 [37:14:18<1:57:23, 13.81s/it] 95%|█████████▍| 9491/10000 [37:14:32<1:57:09, 13.81s/it] {'loss': 0.0018, 'learning_rate': 2.6e-06, 'epoch': 12.42} 95%|█████████▍| 9491/10000 [37:14:32<1:57:09, 13.81s/it] 95%|█████████▍| 9492/10000 [37:14:46<1:56:48, 13.80s/it] {'loss': 0.002, 'learning_rate': 2.595e-06, 'epoch': 12.42} 95%|█████████▍| 9492/10000 [37:14:46<1:56:48, 13.80s/it] 95%|█████████▍| 9493/10000 [37:15:00<1:56:35, 13.80s/it] {'loss': 0.0031, 'learning_rate': 2.59e-06, 'epoch': 12.43} 95%|█████████▍| 9493/10000 [37:15:00<1:56:35, 13.80s/it] 95%|█████████▍| 9494/10000 [37:15:14<1:56:16, 13.79s/it] {'loss': 0.0012, 'learning_rate': 2.5850000000000002e-06, 'epoch': 12.43} 95%|█████████▍| 9494/10000 [37:15:14<1:56:16, 13.79s/it] 95%|█████████▍| 9495/10000 [37:15:27<1:55:57, 13.78s/it] {'loss': 0.003, 'learning_rate': 2.5800000000000003e-06, 'epoch': 12.43} 95%|█████████▍| 9495/10000 [37:15:27<1:55:57, 13.78s/it] 95%|█████████▍| 9496/10000 [37:15:41<1:55:25, 13.74s/it] {'loss': 0.0014, 'learning_rate': 2.575e-06, 'epoch': 12.43} 95%|█████████▍| 9496/10000 [37:15:41<1:55:25, 13.74s/it] 95%|█████████▍| 9497/10000 [37:15:55<1:55:22, 13.76s/it] {'loss': 0.0018, 'learning_rate': 2.5700000000000004e-06, 'epoch': 12.43} 95%|█████████▍| 9497/10000 [37:15:55<1:55:22, 13.76s/it] 95%|█████████▍| 9498/10000 [37:16:09<1:55:30, 13.81s/it] {'loss': 0.0019, 'learning_rate': 2.565e-06, 'epoch': 12.43} 95%|█████████▍| 9498/10000 [37:16:09<1:55:30, 13.81s/it] 95%|█████████▍| 9499/10000 [37:16:22<1:55:22, 13.82s/it] {'loss': 0.0018, 'learning_rate': 2.56e-06, 'epoch': 12.43} 
95%|█████████▍| 9499/10000 [37:16:23<1:55:22, 13.82s/it] 95%|█████████▌| 9500/10000 [37:16:36<1:54:59, 13.80s/it] {'loss': 0.0012, 'learning_rate': 2.555e-06, 'epoch': 12.43} 95%|█████████▌| 9500/10000 [37:16:36<1:54:59, 13.80s/it] 95%|█████████▌| 9501/10000 [37:16:50<1:55:09, 13.85s/it] {'loss': 0.0015, 'learning_rate': 2.55e-06, 'epoch': 12.44} 95%|█████████▌| 9501/10000 [37:16:50<1:55:09, 13.85s/it] 95%|█████████▌| 9502/10000 [37:17:04<1:54:43, 13.82s/it] {'loss': 0.003, 'learning_rate': 2.545e-06, 'epoch': 12.44} 95%|█████████▌| 9502/10000 [37:17:04<1:54:43, 13.82s/it] 95%|█████████▌| 9503/10000 [37:17:18<1:54:12, 13.79s/it] {'loss': 0.0015, 'learning_rate': 2.54e-06, 'epoch': 12.44} 95%|█████████▌| 9503/10000 [37:17:18<1:54:12, 13.79s/it] 95%|█████████▌| 9504/10000 [37:17:31<1:53:52, 13.78s/it] {'loss': 0.0029, 'learning_rate': 2.5350000000000003e-06, 'epoch': 12.44} 95%|█████████▌| 9504/10000 [37:17:31<1:53:52, 13.78s/it] 95%|█████████▌| 9505/10000 [37:17:45<1:53:42, 13.78s/it] {'loss': 0.0029, 'learning_rate': 2.53e-06, 'epoch': 12.44} 95%|█████████▌| 9505/10000 [37:17:45<1:53:42, 13.78s/it] 95%|█████████▌| 9506/10000 [37:17:59<1:53:41, 13.81s/it] {'loss': 0.0023, 'learning_rate': 2.5250000000000004e-06, 'epoch': 12.44} 95%|█████████▌| 9506/10000 [37:17:59<1:53:41, 13.81s/it] 95%|█████████▌| 9507/10000 [37:18:13<1:53:23, 13.80s/it] {'loss': 0.0024, 'learning_rate': 2.52e-06, 'epoch': 12.44} 95%|█████████▌| 9507/10000 [37:18:13<1:53:23, 13.80s/it] 95%|█████████▌| 9508/10000 [37:18:27<1:53:15, 13.81s/it] {'loss': 0.0009, 'learning_rate': 2.515e-06, 'epoch': 12.45} 95%|█████████▌| 9508/10000 [37:18:27<1:53:15, 13.81s/it] 95%|█████████▌| 9509/10000 [37:18:40<1:52:58, 13.81s/it] {'loss': 0.0021, 'learning_rate': 2.51e-06, 'epoch': 12.45} 95%|█████████▌| 9509/10000 [37:18:41<1:52:58, 13.81s/it] 95%|█████████▌| 9510/10000 [37:18:54<1:53:03, 13.84s/it] {'loss': 0.0017, 'learning_rate': 2.505e-06, 'epoch': 12.45} 95%|█████████▌| 9510/10000 [37:18:54<1:53:03, 
13.84s/it] 95%|█████████▌| 9511/10000 [37:19:08<1:52:45, 13.84s/it] {'loss': 0.0024, 'learning_rate': 2.5e-06, 'epoch': 12.45} 95%|█████████▌| 9511/10000 [37:19:08<1:52:45, 13.84s/it] 95%|█████████▌| 9512/10000 [37:19:22<1:52:45, 13.86s/it] {'loss': 0.0031, 'learning_rate': 2.4950000000000003e-06, 'epoch': 12.45} 95%|█████████▌| 9512/10000 [37:19:22<1:52:45, 13.86s/it] 95%|█████████▌| 9513/10000 [37:19:36<1:52:10, 13.82s/it] {'loss': 0.0037, 'learning_rate': 2.49e-06, 'epoch': 12.45} 95%|█████████▌| 9513/10000 [37:19:36<1:52:10, 13.82s/it] 95%|█████████▌| 9514/10000 [37:19:50<1:51:41, 13.79s/it] {'loss': 0.0014, 'learning_rate': 2.4850000000000003e-06, 'epoch': 12.45} 95%|█████████▌| 9514/10000 [37:19:50<1:51:41, 13.79s/it] 95%|█████████▌| 9515/10000 [37:20:03<1:51:41, 13.82s/it] {'loss': 0.0015, 'learning_rate': 2.48e-06, 'epoch': 12.45} 95%|█████████▌| 9515/10000 [37:20:04<1:51:41, 13.82s/it] 95%|█████████▌| 9516/10000 [37:20:17<1:51:36, 13.84s/it] {'loss': 0.0027, 'learning_rate': 2.4750000000000004e-06, 'epoch': 12.46} 95%|█████████▌| 9516/10000 [37:20:17<1:51:36, 13.84s/it] 95%|█████████▌| 9517/10000 [37:20:31<1:51:19, 13.83s/it] {'loss': 0.0029, 'learning_rate': 2.47e-06, 'epoch': 12.46} 95%|█████████▌| 9517/10000 [37:20:31<1:51:19, 13.83s/it] 95%|█████████▌| 9518/10000 [37:20:45<1:51:20, 13.86s/it] {'loss': 0.001, 'learning_rate': 2.465e-06, 'epoch': 12.46} 95%|█████████▌| 9518/10000 [37:20:45<1:51:20, 13.86s/it] 95%|█████████▌| 9519/10000 [37:20:59<1:51:18, 13.88s/it] {'loss': 0.0028, 'learning_rate': 2.46e-06, 'epoch': 12.46} 95%|█████████▌| 9519/10000 [37:20:59<1:51:18, 13.88s/it] 95%|█████████▌| 9520/10000 [37:21:13<1:50:59, 13.87s/it] {'loss': 0.0027, 'learning_rate': 2.4550000000000002e-06, 'epoch': 12.46} 95%|█████████▌| 9520/10000 [37:21:13<1:50:59, 13.87s/it] 95%|█████████▌| 9521/10000 [37:21:27<1:50:36, 13.86s/it] {'loss': 0.0018, 'learning_rate': 2.4500000000000003e-06, 'epoch': 12.46} 95%|█████████▌| 9521/10000 [37:21:27<1:50:36, 13.86s/it] 
95%|█████████▌| 9522/10000 [37:21:41<1:50:28, 13.87s/it] {'loss': 0.0015, 'learning_rate': 2.445e-06, 'epoch': 12.46} 95%|█████████▌| 9522/10000 [37:21:41<1:50:28, 13.87s/it] 95%|█████████▌| 9523/10000 [37:21:54<1:50:13, 13.86s/it] {'loss': 0.0034, 'learning_rate': 2.4400000000000004e-06, 'epoch': 12.46} 95%|█████████▌| 9523/10000 [37:21:54<1:50:13, 13.86s/it] 95%|█████████▌| 9524/10000 [37:22:08<1:49:48, 13.84s/it] {'loss': 0.0019, 'learning_rate': 2.435e-06, 'epoch': 12.47} 95%|█████████▌| 9524/10000 [37:22:08<1:49:48, 13.84s/it] 95%|█████████▌| 9525/10000 [37:22:22<1:49:32, 13.84s/it] {'loss': 0.003, 'learning_rate': 2.43e-06, 'epoch': 12.47} 95%|█████████▌| 9525/10000 [37:22:22<1:49:32, 13.84s/it] 95%|█████████▌| 9526/10000 [37:22:36<1:49:01, 13.80s/it] {'loss': 0.0023, 'learning_rate': 2.425e-06, 'epoch': 12.47} 95%|█████████▌| 9526/10000 [37:22:36<1:49:01, 13.80s/it] 95%|█████████▌| 9527/10000 [37:22:50<1:48:38, 13.78s/it] {'loss': 0.0017, 'learning_rate': 2.42e-06, 'epoch': 12.47} 95%|█████████▌| 9527/10000 [37:22:50<1:48:38, 13.78s/it] 95%|█████████▌| 9528/10000 [37:23:03<1:48:30, 13.79s/it] {'loss': 0.0036, 'learning_rate': 2.415e-06, 'epoch': 12.47} 95%|█████████▌| 9528/10000 [37:23:03<1:48:30, 13.79s/it] 95%|█████████▌| 9529/10000 [37:23:17<1:48:24, 13.81s/it] {'loss': 0.0029, 'learning_rate': 2.4100000000000002e-06, 'epoch': 12.47} 95%|█████████▌| 9529/10000 [37:23:17<1:48:24, 13.81s/it] 95%|█████████▌| 9530/10000 [37:23:31<1:48:03, 13.79s/it] {'loss': 0.0022, 'learning_rate': 2.405e-06, 'epoch': 12.47} 95%|█████████▌| 9530/10000 [37:23:31<1:48:03, 13.79s/it] 95%|█████████▌| 9531/10000 [37:23:45<1:47:42, 13.78s/it] {'loss': 0.0016, 'learning_rate': 2.4000000000000003e-06, 'epoch': 12.48} 95%|█████████▌| 9531/10000 [37:23:45<1:47:42, 13.78s/it] 95%|█████████▌| 9532/10000 [37:23:59<1:47:34, 13.79s/it] {'loss': 0.0023, 'learning_rate': 2.395e-06, 'epoch': 12.48} 95%|█████████▌| 9532/10000 [37:23:59<1:47:34, 13.79s/it] 95%|█████████▌| 9533/10000 
[37:24:12<1:47:18, 13.79s/it] {'loss': 0.0019, 'learning_rate': 2.3900000000000004e-06, 'epoch': 12.48} 95%|█████████▌| 9533/10000 [37:24:12<1:47:18, 13.79s/it] 95%|█████████▌| 9534/10000 [37:24:26<1:46:57, 13.77s/it] {'loss': 0.0018, 'learning_rate': 2.385e-06, 'epoch': 12.48} 95%|█████████▌| 9534/10000 [37:24:26<1:46:57, 13.77s/it] 95%|█████████▌| 9535/10000 [37:24:40<1:46:43, 13.77s/it] {'loss': 0.001, 'learning_rate': 2.38e-06, 'epoch': 12.48} 95%|█████████▌| 9535/10000 [37:24:40<1:46:43, 13.77s/it] 95%|█████████▌| 9536/10000 [37:24:54<1:46:28, 13.77s/it] {'loss': 0.0024, 'learning_rate': 2.375e-06, 'epoch': 12.48} 95%|█████████▌| 9536/10000 [37:24:54<1:46:28, 13.77s/it] 95%|█████████▌| 9537/10000 [37:25:07<1:46:29, 13.80s/it] {'loss': 0.0019, 'learning_rate': 2.37e-06, 'epoch': 12.48} 95%|█████████▌| 9537/10000 [37:25:07<1:46:29, 13.80s/it] 95%|█████████▌| 9538/10000 [37:25:21<1:46:25, 13.82s/it] {'loss': 0.0017, 'learning_rate': 2.3650000000000002e-06, 'epoch': 12.48} 95%|█████████▌| 9538/10000 [37:25:21<1:46:25, 13.82s/it] 95%|█████████▌| 9539/10000 [37:25:35<1:46:11, 13.82s/it] {'loss': 0.0023, 'learning_rate': 2.36e-06, 'epoch': 12.49} 95%|█████████▌| 9539/10000 [37:25:35<1:46:11, 13.82s/it] 95%|█████████▌| 9540/10000 [37:25:49<1:45:54, 13.81s/it] {'loss': 0.0026, 'learning_rate': 2.3550000000000003e-06, 'epoch': 12.49} 95%|█████████▌| 9540/10000 [37:25:49<1:45:54, 13.81s/it] 95%|█████████▌| 9541/10000 [37:26:03<1:45:30, 13.79s/it] {'loss': 0.0025, 'learning_rate': 2.35e-06, 'epoch': 12.49} 95%|█████████▌| 9541/10000 [37:26:03<1:45:30, 13.79s/it] 95%|█████████▌| 9542/10000 [37:26:16<1:45:16, 13.79s/it] {'loss': 0.0029, 'learning_rate': 2.345e-06, 'epoch': 12.49} 95%|█████████▌| 9542/10000 [37:26:17<1:45:16, 13.79s/it] 95%|█████████▌| 9543/10000 [37:26:30<1:45:00, 13.79s/it] {'loss': 0.0036, 'learning_rate': 2.34e-06, 'epoch': 12.49} 95%|█████████▌| 9543/10000 [37:26:30<1:45:00, 13.79s/it] 95%|█████████▌| 9544/10000 [37:26:44<1:45:02, 13.82s/it] {'loss': 
0.0013, 'learning_rate': 2.335e-06, 'epoch': 12.49} 95%|█████████▌| 9544/10000 [37:26:44<1:45:02, 13.82s/it] 95%|█████████▌| 9545/10000 [37:26:58<1:44:49, 13.82s/it] {'loss': 0.0021, 'learning_rate': 2.33e-06, 'epoch': 12.49} 95%|█████████▌| 9545/10000 [37:26:58<1:44:49, 13.82s/it] 95%|█████████▌| 9546/10000 [37:27:12<1:44:31, 13.81s/it] {'loss': 0.0028, 'learning_rate': 2.325e-06, 'epoch': 12.49} 95%|█████████▌| 9546/10000 [37:27:12<1:44:31, 13.81s/it] 95%|█████████▌| 9547/10000 [37:27:26<1:44:33, 13.85s/it] {'loss': 0.0019, 'learning_rate': 2.32e-06, 'epoch': 12.5} 95%|█████████▌| 9547/10000 [37:27:26<1:44:33, 13.85s/it] 95%|█████████▌| 9548/10000 [37:27:39<1:44:12, 13.83s/it] {'loss': 0.0013, 'learning_rate': 2.3150000000000003e-06, 'epoch': 12.5} 95%|█████████▌| 9548/10000 [37:27:40<1:44:12, 13.83s/it] 95%|█████████▌| 9549/10000 [37:27:53<1:43:58, 13.83s/it] {'loss': 0.0024, 'learning_rate': 2.31e-06, 'epoch': 12.5} 95%|█████████▌| 9549/10000 [37:27:53<1:43:58, 13.83s/it] 96%|█████████▌| 9550/10000 [37:28:07<1:43:21, 13.78s/it] {'loss': 0.0013, 'learning_rate': 2.3050000000000004e-06, 'epoch': 12.5} 96%|█████████▌| 9550/10000 [37:28:07<1:43:21, 13.78s/it] 96%|█████████▌| 9551/10000 [37:28:21<1:43:00, 13.77s/it] {'loss': 0.0017, 'learning_rate': 2.3e-06, 'epoch': 12.5} 96%|█████████▌| 9551/10000 [37:28:21<1:43:00, 13.77s/it] 96%|█████████▌| 9552/10000 [37:28:35<1:43:03, 13.80s/it] {'loss': 0.0025, 'learning_rate': 2.2950000000000005e-06, 'epoch': 12.5} 96%|█████████▌| 9552/10000 [37:28:35<1:43:03, 13.80s/it] 96%|█████████▌| 9553/10000 [37:28:48<1:42:34, 13.77s/it] {'loss': 0.0016, 'learning_rate': 2.29e-06, 'epoch': 12.5} 96%|█████████▌| 9553/10000 [37:28:48<1:42:34, 13.77s/it] 96%|█████████▌| 9554/10000 [37:29:02<1:42:12, 13.75s/it] {'loss': 0.0028, 'learning_rate': 2.285e-06, 'epoch': 12.51} 96%|█████████▌| 9554/10000 [37:29:02<1:42:12, 13.75s/it] 96%|█████████▌| 9555/10000 [37:29:16<1:42:00, 13.75s/it] {'loss': 0.0025, 'learning_rate': 2.28e-06, 'epoch': 
12.51} 96%|█████████▌| 9555/10000 [37:29:16<1:42:00, 13.75s/it] 96%|█████████▌| 9556/10000 [37:29:30<1:41:46, 13.75s/it] {'loss': 0.0024, 'learning_rate': 2.2750000000000002e-06, 'epoch': 12.51} 96%|█████████▌| 9556/10000 [37:29:30<1:41:46, 13.75s/it] 96%|█████████▌| 9557/10000 [37:29:43<1:41:56, 13.81s/it] {'loss': 0.0013, 'learning_rate': 2.2700000000000003e-06, 'epoch': 12.51} 96%|█████████▌| 9557/10000 [37:29:43<1:41:56, 13.81s/it] 96%|█████████▌| 9558/10000 [37:29:57<1:41:46, 13.82s/it] {'loss': 0.0023, 'learning_rate': 2.265e-06, 'epoch': 12.51} 96%|█████████▌| 9558/10000 [37:29:57<1:41:46, 13.82s/it] 96%|█████████▌| 9559/10000 [37:30:11<1:41:22, 13.79s/it] {'loss': 0.0021, 'learning_rate': 2.26e-06, 'epoch': 12.51} 96%|█████████▌| 9559/10000 [37:30:11<1:41:22, 13.79s/it] 96%|█████████▌| 9560/10000 [37:30:25<1:41:06, 13.79s/it] {'loss': 0.0023, 'learning_rate': 2.255e-06, 'epoch': 12.51} 96%|█████████▌| 9560/10000 [37:30:25<1:41:06, 13.79s/it] 96%|█████████▌| 9561/10000 [37:30:38<1:40:40, 13.76s/it] {'loss': 0.0024, 'learning_rate': 2.25e-06, 'epoch': 12.51} 96%|█████████▌| 9561/10000 [37:30:39<1:40:40, 13.76s/it] 96%|█████████▌| 9562/10000 [37:30:52<1:40:50, 13.81s/it] {'loss': 0.0023, 'learning_rate': 2.245e-06, 'epoch': 12.52} 96%|█████████▌| 9562/10000 [37:30:52<1:40:50, 13.81s/it] 96%|█████████▌| 9563/10000 [37:31:06<1:40:39, 13.82s/it] {'loss': 0.0021, 'learning_rate': 2.24e-06, 'epoch': 12.52} 96%|█████████▌| 9563/10000 [37:31:06<1:40:39, 13.82s/it] 96%|█████████▌| 9564/10000 [37:31:20<1:40:06, 13.78s/it] {'loss': 0.004, 'learning_rate': 2.2349999999999998e-06, 'epoch': 12.52} 96%|█████████▌| 9564/10000 [37:31:20<1:40:06, 13.78s/it] 96%|█████████▌| 9565/10000 [37:31:34<1:39:43, 13.75s/it] {'loss': 0.0021, 'learning_rate': 2.2300000000000002e-06, 'epoch': 12.52} 96%|█████████▌| 9565/10000 [37:31:34<1:39:43, 13.75s/it] 96%|█████████▌| 9566/10000 [37:31:47<1:39:34, 13.77s/it] {'loss': 0.0028, 'learning_rate': 2.225e-06, 'epoch': 12.52} 96%|█████████▌| 
9566/10000 [37:31:47<1:39:34, 13.77s/it] 96%|█████████▌| 9567/10000 [37:32:01<1:39:28, 13.78s/it] {'loss': 0.002, 'learning_rate': 2.2200000000000003e-06, 'epoch': 12.52} 96%|█████████▌| 9567/10000 [37:32:01<1:39:28, 13.78s/it] 96%|█████████▌| 9568/10000 [37:32:15<1:39:21, 13.80s/it] {'loss': 0.0018, 'learning_rate': 2.215e-06, 'epoch': 12.52} 96%|█████████▌| 9568/10000 [37:32:15<1:39:21, 13.80s/it] 96%|█████████▌| 9569/10000 [37:32:29<1:39:19, 13.83s/it] {'loss': 0.0027, 'learning_rate': 2.2100000000000004e-06, 'epoch': 12.52} 96%|█████████▌| 9569/10000 [37:32:29<1:39:19, 13.83s/it] 96%|█████████▌| 9570/10000 [37:32:43<1:39:04, 13.82s/it] {'loss': 0.001, 'learning_rate': 2.205e-06, 'epoch': 12.53} 96%|█████████▌| 9570/10000 [37:32:43<1:39:04, 13.82s/it] 96%|█████████▌| 9571/10000 [37:32:57<1:38:47, 13.82s/it] {'loss': 0.0014, 'learning_rate': 2.2e-06, 'epoch': 12.53} 96%|█████████▌| 9571/10000 [37:32:57<1:38:47, 13.82s/it] 96%|█████████▌| 9572/10000 [37:33:10<1:38:37, 13.82s/it] {'loss': 0.0018, 'learning_rate': 2.195e-06, 'epoch': 12.53} 96%|█████████▌| 9572/10000 [37:33:10<1:38:37, 13.82s/it] 96%|█████████▌| 9573/10000 [37:33:24<1:38:13, 13.80s/it] {'loss': 0.0028, 'learning_rate': 2.19e-06, 'epoch': 12.53} 96%|█████████▌| 9573/10000 [37:33:24<1:38:13, 13.80s/it] 96%|█████████▌| 9574/10000 [37:33:38<1:37:59, 13.80s/it] {'loss': 0.0012, 'learning_rate': 2.1850000000000003e-06, 'epoch': 12.53} 96%|█████████▌| 9574/10000 [37:33:38<1:37:59, 13.80s/it] 96%|█████████▌| 9575/10000 [37:33:52<1:37:44, 13.80s/it] {'loss': 0.0027, 'learning_rate': 2.1800000000000003e-06, 'epoch': 12.53} 96%|█████████▌| 9575/10000 [37:33:52<1:37:44, 13.80s/it] 96%|█████████▌| 9576/10000 [37:34:06<1:37:40, 13.82s/it] {'loss': 0.0026, 'learning_rate': 2.175e-06, 'epoch': 12.53} 96%|█████████▌| 9576/10000 [37:34:06<1:37:40, 13.82s/it] 96%|█████████▌| 9577/10000 [37:34:19<1:37:28, 13.83s/it] {'loss': 0.0025, 'learning_rate': 2.17e-06, 'epoch': 12.54} 96%|█████████▌| 9577/10000 
[37:34:20<1:37:28, 13.83s/it] 96%|█████████▌| 9578/10000 [37:34:33<1:37:13, 13.82s/it] {'loss': 0.0018, 'learning_rate': 2.165e-06, 'epoch': 12.54} 96%|█████████▌| 9578/10000 [37:34:33<1:37:13, 13.82s/it] 96%|█████████▌| 9579/10000 [37:34:47<1:37:07, 13.84s/it] {'loss': 0.0039, 'learning_rate': 2.16e-06, 'epoch': 12.54} 96%|█████████▌| 9579/10000 [37:34:47<1:37:07, 13.84s/it] 96%|█████████▌| 9580/10000 [37:35:01<1:36:46, 13.83s/it] {'loss': 0.0016, 'learning_rate': 2.155e-06, 'epoch': 12.54} 96%|█████████▌| 9580/10000 [37:35:01<1:36:46, 13.83s/it] 96%|█████████▌| 9581/10000 [37:35:15<1:36:35, 13.83s/it] {'loss': 0.0022, 'learning_rate': 2.1499999999999997e-06, 'epoch': 12.54} 96%|█████████▌| 9581/10000 [37:35:15<1:36:35, 13.83s/it] 96%|█████████▌| 9582/10000 [37:35:29<1:36:14, 13.82s/it] {'loss': 0.0024, 'learning_rate': 2.1450000000000002e-06, 'epoch': 12.54} 96%|█████████▌| 9582/10000 [37:35:29<1:36:14, 13.82s/it] 96%|█████████▌| 9583/10000 [37:35:42<1:35:55, 13.80s/it] {'loss': 0.0019, 'learning_rate': 2.14e-06, 'epoch': 12.54} 96%|█████████▌| 9583/10000 [37:35:42<1:35:55, 13.80s/it] 96%|█████████▌| 9584/10000 [37:35:56<1:35:49, 13.82s/it] {'loss': 0.002, 'learning_rate': 2.1350000000000003e-06, 'epoch': 12.54} 96%|█████████▌| 9584/10000 [37:35:56<1:35:49, 13.82s/it] 96%|█████████▌| 9585/10000 [37:36:10<1:35:45, 13.84s/it] {'loss': 0.0022, 'learning_rate': 2.13e-06, 'epoch': 12.55} 96%|█████████▌| 9585/10000 [37:36:10<1:35:45, 13.84s/it] 96%|█████████▌| 9586/10000 [37:36:24<1:35:37, 13.86s/it] {'loss': 0.0008, 'learning_rate': 2.1250000000000004e-06, 'epoch': 12.55} 96%|█████████▌| 9586/10000 [37:36:24<1:35:37, 13.86s/it] 96%|█████████▌| 9587/10000 [37:36:38<1:35:16, 13.84s/it] {'loss': 0.0024, 'learning_rate': 2.12e-06, 'epoch': 12.55} 96%|█████████▌| 9587/10000 [37:36:38<1:35:16, 13.84s/it] 96%|█████████▌| 9588/10000 [37:36:52<1:35:13, 13.87s/it] {'loss': 0.0017, 'learning_rate': 2.115e-06, 'epoch': 12.55} 96%|█████████▌| 9588/10000 [37:36:52<1:35:13, 
13.87s/it] 96%|█████████▌| 9589/10000 [37:37:06<1:34:58, 13.87s/it] {'loss': 0.0014, 'learning_rate': 2.11e-06, 'epoch': 12.55} 96%|█████████▌| 9589/10000 [37:37:06<1:34:58, 13.87s/it] 96%|█████████▌| 9590/10000 [37:37:19<1:34:27, 13.82s/it] {'loss': 0.0018, 'learning_rate': 2.105e-06, 'epoch': 12.55} 96%|█████████▌| 9590/10000 [37:37:19<1:34:27, 13.82s/it] 96%|█████████▌| 9591/10000 [37:37:33<1:34:15, 13.83s/it] {'loss': 0.0027, 'learning_rate': 2.1000000000000002e-06, 'epoch': 12.55} 96%|█████████▌| 9591/10000 [37:37:33<1:34:15, 13.83s/it] 96%|█████████▌| 9592/10000 [37:37:47<1:33:46, 13.79s/it] {'loss': 0.0016, 'learning_rate': 2.0950000000000003e-06, 'epoch': 12.55} 96%|█████████▌| 9592/10000 [37:37:47<1:33:46, 13.79s/it] 96%|█████████▌| 9593/10000 [37:38:01<1:33:45, 13.82s/it] {'loss': 0.0034, 'learning_rate': 2.09e-06, 'epoch': 12.56} 96%|█████████▌| 9593/10000 [37:38:01<1:33:45, 13.82s/it] 96%|█████████▌| 9594/10000 [37:38:15<1:33:26, 13.81s/it] {'loss': 0.0033, 'learning_rate': 2.085e-06, 'epoch': 12.56} 96%|█████████▌| 9594/10000 [37:38:15<1:33:26, 13.81s/it] 96%|█████████▌| 9595/10000 [37:38:28<1:33:04, 13.79s/it] {'loss': 0.0025, 'learning_rate': 2.08e-06, 'epoch': 12.56} 96%|█████████▌| 9595/10000 [37:38:28<1:33:04, 13.79s/it] 96%|█████████▌| 9596/10000 [37:38:42<1:33:04, 13.82s/it] {'loss': 0.0014, 'learning_rate': 2.075e-06, 'epoch': 12.56} 96%|█████████▌| 9596/10000 [37:38:42<1:33:04, 13.82s/it] 96%|█████████▌| 9597/10000 [37:38:56<1:33:18, 13.89s/it] {'loss': 0.0034, 'learning_rate': 2.07e-06, 'epoch': 12.56} 96%|█████████▌| 9597/10000 [37:38:56<1:33:18, 13.89s/it] 96%|█████████▌| 9598/10000 [37:39:10<1:33:01, 13.88s/it] {'loss': 0.002, 'learning_rate': 2.065e-06, 'epoch': 12.56} 96%|█████████▌| 9598/10000 [37:39:10<1:33:01, 13.88s/it] 96%|█████████▌| 9599/10000 [37:39:24<1:32:30, 13.84s/it] {'loss': 0.0027, 'learning_rate': 2.06e-06, 'epoch': 12.56} 96%|█████████▌| 9599/10000 [37:39:24<1:32:30, 13.84s/it] 96%|█████████▌| 9600/10000 
[37:39:38<1:32:08, 13.82s/it] {'loss': 0.0015, 'learning_rate': 2.055e-06, 'epoch': 12.57} 96%|█████████▌| 9600/10000 [37:39:38<1:32:08, 13.82s/it] 96%|█████████▌| 9601/10000 [37:39:51<1:31:38, 13.78s/it] {'loss': 0.0016, 'learning_rate': 2.0500000000000003e-06, 'epoch': 12.57} 96%|█████████▌| 9601/10000 [37:39:51<1:31:38, 13.78s/it] 96%|█████████▌| 9602/10000 [37:40:05<1:31:38, 13.81s/it] {'loss': 0.0025, 'learning_rate': 2.045e-06, 'epoch': 12.57} 96%|█████████▌| 9602/10000 [37:40:05<1:31:38, 13.81s/it] 96%|█████████▌| 9603/10000 [37:40:19<1:31:28, 13.83s/it] {'loss': 0.0019, 'learning_rate': 2.0400000000000004e-06, 'epoch': 12.57} 96%|█████████▌| 9603/10000 [37:40:19<1:31:28, 13.83s/it] 96%|█████████▌| 9604/10000 [37:40:33<1:31:09, 13.81s/it] {'loss': 0.0015, 'learning_rate': 2.035e-06, 'epoch': 12.57} 96%|█████████▌| 9604/10000 [37:40:33<1:31:09, 13.81s/it] 96%|█████████▌| 9605/10000 [37:40:47<1:31:00, 13.82s/it] {'loss': 0.0026, 'learning_rate': 2.03e-06, 'epoch': 12.57} 96%|█████████▌| 9605/10000 [37:40:47<1:31:00, 13.82s/it] 96%|█████████▌| 9606/10000 [37:41:01<1:30:48, 13.83s/it] {'loss': 0.0017, 'learning_rate': 2.025e-06, 'epoch': 12.57} 96%|█████████▌| 9606/10000 [37:41:01<1:30:48, 13.83s/it] 96%|█████████▌| 9607/10000 [37:41:14<1:30:33, 13.83s/it] {'loss': 0.0037, 'learning_rate': 2.02e-06, 'epoch': 12.57} 96%|█████████▌| 9607/10000 [37:41:14<1:30:33, 13.83s/it] 96%|█████████▌| 9608/10000 [37:41:28<1:30:07, 13.79s/it] {'loss': 0.0017, 'learning_rate': 2.015e-06, 'epoch': 12.58} 96%|█████████▌| 9608/10000 [37:41:28<1:30:07, 13.79s/it] 96%|█████████▌| 9609/10000 [37:41:42<1:29:55, 13.80s/it] {'loss': 0.001, 'learning_rate': 2.0100000000000002e-06, 'epoch': 12.58} 96%|█████████▌| 9609/10000 [37:41:42<1:29:55, 13.80s/it] 96%|█████████▌| 9610/10000 [37:41:56<1:29:43, 13.80s/it] {'loss': 0.0021, 'learning_rate': 2.005e-06, 'epoch': 12.58} 96%|█████████▌| 9610/10000 [37:41:56<1:29:43, 13.80s/it] 96%|█████████▌| 9611/10000 [37:42:09<1:29:26, 13.80s/it] {'loss': 
0.0023, 'learning_rate': 2.0000000000000003e-06, 'epoch': 12.58} 96%|█████████▌| 9611/10000 [37:42:10<1:29:26, 13.80s/it] 96%|█████████▌| 9612/10000 [37:42:23<1:29:06, 13.78s/it] {'loss': 0.0012, 'learning_rate': 1.995e-06, 'epoch': 12.58} 96%|█████████▌| 9612/10000 [37:42:23<1:29:06, 13.78s/it] 96%|█████████▌| 9613/10000 [37:42:37<1:28:55, 13.79s/it] {'loss': 0.0025, 'learning_rate': 1.99e-06, 'epoch': 12.58} 96%|█████████▌| 9613/10000 [37:42:37<1:28:55, 13.79s/it] 96%|█████████▌| 9614/10000 [37:42:51<1:28:44, 13.80s/it] {'loss': 0.002, 'learning_rate': 1.985e-06, 'epoch': 12.58} 96%|█████████▌| 9614/10000 [37:42:51<1:28:44, 13.80s/it] 96%|█████████▌| 9615/10000 [37:43:05<1:28:44, 13.83s/it] {'loss': 0.0014, 'learning_rate': 1.98e-06, 'epoch': 12.59} 96%|█████████▌| 9615/10000 [37:43:05<1:28:44, 13.83s/it] 96%|█████████▌| 9616/10000 [37:43:19<1:28:31, 13.83s/it] {'loss': 0.0015, 'learning_rate': 1.975e-06, 'epoch': 12.59} 96%|█████████▌| 9616/10000 [37:43:19<1:28:31, 13.83s/it] 96%|█████████▌| 9617/10000 [37:43:32<1:28:09, 13.81s/it] {'loss': 0.0034, 'learning_rate': 1.9699999999999998e-06, 'epoch': 12.59} 96%|█████████▌| 9617/10000 [37:43:32<1:28:09, 13.81s/it] 96%|█████████▌| 9618/10000 [37:43:46<1:28:02, 13.83s/it] {'loss': 0.0013, 'learning_rate': 1.9650000000000002e-06, 'epoch': 12.59} 96%|█████████▌| 9618/10000 [37:43:46<1:28:02, 13.83s/it] 96%|█████████▌| 9619/10000 [37:44:00<1:27:37, 13.80s/it] {'loss': 0.002, 'learning_rate': 1.96e-06, 'epoch': 12.59} 96%|█████████▌| 9619/10000 [37:44:00<1:27:37, 13.80s/it] 96%|█████████▌| 9620/10000 [37:44:14<1:27:39, 13.84s/it] {'loss': 0.0026, 'learning_rate': 1.9550000000000003e-06, 'epoch': 12.59} 96%|█████████▌| 9620/10000 [37:44:14<1:27:39, 13.84s/it] 96%|█████████▌| 9621/10000 [37:44:28<1:27:15, 13.81s/it] {'loss': 0.0044, 'learning_rate': 1.95e-06, 'epoch': 12.59} 96%|█████████▌| 9621/10000 [37:44:28<1:27:15, 13.81s/it] 96%|█████████▌| 9622/10000 [37:44:42<1:27:08, 13.83s/it] {'loss': 0.0036, 'learning_rate': 
1.945e-06, 'epoch': 12.59} 96%|█████████▌| 9622/10000 [37:44:42<1:27:08, 13.83s/it] 96%|█████████▌| 9623/10000 [37:44:55<1:27:01, 13.85s/it] {'loss': 0.0029, 'learning_rate': 1.94e-06, 'epoch': 12.6} 96%|█████████▌| 9623/10000 [37:44:55<1:27:01, 13.85s/it] 96%|█████████▌| 9624/10000 [37:45:09<1:26:40, 13.83s/it] {'loss': 0.0022, 'learning_rate': 1.935e-06, 'epoch': 12.6} 96%|█████████▌| 9624/10000 [37:45:09<1:26:40, 13.83s/it] 96%|█████████▋| 9625/10000 [37:45:23<1:26:26, 13.83s/it] {'loss': 0.0026, 'learning_rate': 1.93e-06, 'epoch': 12.6} 96%|█████████▋| 9625/10000 [37:45:23<1:26:26, 13.83s/it] 96%|█████████▋| 9626/10000 [37:45:37<1:26:12, 13.83s/it] {'loss': 0.0017, 'learning_rate': 1.925e-06, 'epoch': 12.6} 96%|█████████▋| 9626/10000 [37:45:37<1:26:12, 13.83s/it] 96%|█████████▋| 9627/10000 [37:45:51<1:26:01, 13.84s/it] {'loss': 0.0025, 'learning_rate': 1.92e-06, 'epoch': 12.6} 96%|█████████▋| 9627/10000 [37:45:51<1:26:01, 13.84s/it] 96%|█████████▋| 9628/10000 [37:46:05<1:25:49, 13.84s/it] {'loss': 0.0018, 'learning_rate': 1.9150000000000003e-06, 'epoch': 12.6} 96%|█████████▋| 9628/10000 [37:46:05<1:25:49, 13.84s/it] 96%|█████████▋| 9629/10000 [37:46:19<1:25:48, 13.88s/it] {'loss': 0.0034, 'learning_rate': 1.91e-06, 'epoch': 12.6} 96%|█████████▋| 9629/10000 [37:46:19<1:25:48, 13.88s/it] 96%|█████████▋| 9630/10000 [37:46:32<1:25:32, 13.87s/it] {'loss': 0.0035, 'learning_rate': 1.9050000000000002e-06, 'epoch': 12.6} 96%|█████████▋| 9630/10000 [37:46:32<1:25:32, 13.87s/it] 96%|█████████▋| 9631/10000 [37:46:46<1:25:19, 13.87s/it] {'loss': 0.0017, 'learning_rate': 1.9e-06, 'epoch': 12.61} 96%|█████████▋| 9631/10000 [37:46:46<1:25:19, 13.87s/it] 96%|█████████▋| 9632/10000 [37:47:00<1:24:53, 13.84s/it] {'loss': 0.0022, 'learning_rate': 1.8950000000000003e-06, 'epoch': 12.61} 96%|█████████▋| 9632/10000 [37:47:00<1:24:53, 13.84s/it] 96%|█████████▋| 9633/10000 [37:47:14<1:24:29, 13.81s/it] {'loss': 0.0025, 'learning_rate': 1.8900000000000001e-06, 'epoch': 12.61} 
96%|█████████▋| 9633/10000 [37:47:14<1:24:29, 13.81s/it] 96%|█████████▋| 9634/10000 [37:47:28<1:24:16, 13.82s/it] {'loss': 0.0019, 'learning_rate': 1.885e-06, 'epoch': 12.61} 96%|█████████▋| 9634/10000 [37:47:28<1:24:16, 13.82s/it] 96%|█████████▋| 9635/10000 [37:47:41<1:24:02, 13.81s/it] {'loss': 0.0017, 'learning_rate': 1.8800000000000002e-06, 'epoch': 12.61} 96%|█████████▋| 9635/10000 [37:47:41<1:24:02, 13.81s/it] 96%|█████████▋| 9636/10000 [37:47:55<1:23:52, 13.82s/it] {'loss': 0.002, 'learning_rate': 1.875e-06, 'epoch': 12.61} 96%|█████████▋| 9636/10000 [37:47:55<1:23:52, 13.82s/it] 96%|█████████▋| 9637/10000 [37:48:09<1:23:33, 13.81s/it] {'loss': 0.0024, 'learning_rate': 1.8700000000000003e-06, 'epoch': 12.61} 96%|█████████▋| 9637/10000 [37:48:09<1:23:33, 13.81s/it] 96%|█████████▋| 9638/10000 [37:48:23<1:23:23, 13.82s/it] {'loss': 0.0026, 'learning_rate': 1.8650000000000001e-06, 'epoch': 12.62} 96%|█████████▋| 9638/10000 [37:48:23<1:23:23, 13.82s/it] 96%|█████████▋| 9639/10000 [37:48:37<1:22:58, 13.79s/it] {'loss': 0.0019, 'learning_rate': 1.86e-06, 'epoch': 12.62} 96%|█████████▋| 9639/10000 [37:48:37<1:22:58, 13.79s/it] 96%|█████████▋| 9640/10000 [37:48:50<1:22:51, 13.81s/it] {'loss': 0.0021, 'learning_rate': 1.8550000000000002e-06, 'epoch': 12.62} 96%|█████████▋| 9640/10000 [37:48:50<1:22:51, 13.81s/it] 96%|█████████▋| 9641/10000 [37:49:04<1:22:37, 13.81s/it] {'loss': 0.0016, 'learning_rate': 1.85e-06, 'epoch': 12.62} 96%|█████████▋| 9641/10000 [37:49:04<1:22:37, 13.81s/it] 96%|█████████▋| 9642/10000 [37:49:18<1:22:22, 13.80s/it] {'loss': 0.0021, 'learning_rate': 1.8450000000000001e-06, 'epoch': 12.62} 96%|█████████▋| 9642/10000 [37:49:18<1:22:22, 13.80s/it] 96%|█████████▋| 9643/10000 [37:49:32<1:22:04, 13.79s/it] {'loss': 0.0023, 'learning_rate': 1.84e-06, 'epoch': 12.62} 96%|█████████▋| 9643/10000 [37:49:32<1:22:04, 13.79s/it] 96%|█████████▋| 9644/10000 [37:49:46<1:21:53, 13.80s/it] {'loss': 0.003, 'learning_rate': 1.8350000000000002e-06, 'epoch': 12.62} 
96%|█████████▋| 9644/10000 [37:49:46<1:21:53, 13.80s/it] 96%|█████████▋| 9645/10000 [37:49:59<1:21:42, 13.81s/it] {'loss': 0.0018, 'learning_rate': 1.83e-06, 'epoch': 12.62} 96%|█████████▋| 9645/10000 [37:50:00<1:21:42, 13.81s/it] 96%|█████████▋| 9646/10000 [37:50:13<1:21:47, 13.86s/it] {'loss': 0.0019, 'learning_rate': 1.8249999999999999e-06, 'epoch': 12.63} 96%|█████████▋| 9646/10000 [37:50:14<1:21:47, 13.86s/it] 96%|█████████▋| 9647/10000 [37:50:27<1:21:30, 13.85s/it] {'loss': 0.0025, 'learning_rate': 1.8200000000000002e-06, 'epoch': 12.63} 96%|█████████▋| 9647/10000 [37:50:27<1:21:30, 13.85s/it] 96%|█████████▋| 9648/10000 [37:50:41<1:21:17, 13.86s/it] {'loss': 0.002, 'learning_rate': 1.815e-06, 'epoch': 12.63} 96%|█████████▋| 9648/10000 [37:50:41<1:21:17, 13.86s/it] 96%|█████████▋| 9649/10000 [37:50:55<1:20:48, 13.81s/it] {'loss': 0.0024, 'learning_rate': 1.8100000000000002e-06, 'epoch': 12.63} 96%|█████████▋| 9649/10000 [37:50:55<1:20:48, 13.81s/it] 96%|█████████▋| 9650/10000 [37:51:09<1:20:36, 13.82s/it] {'loss': 0.0021, 'learning_rate': 1.805e-06, 'epoch': 12.63} 96%|█████████▋| 9650/10000 [37:51:09<1:20:36, 13.82s/it] 97%|█████████▋| 9651/10000 [37:51:23<1:20:25, 13.83s/it] {'loss': 0.0019, 'learning_rate': 1.8e-06, 'epoch': 12.63} 97%|█████████▋| 9651/10000 [37:51:23<1:20:25, 13.83s/it] 97%|█████████▋| 9652/10000 [37:51:36<1:20:12, 13.83s/it] {'loss': 0.0021, 'learning_rate': 1.7950000000000002e-06, 'epoch': 12.63} 97%|█████████▋| 9652/10000 [37:51:36<1:20:12, 13.83s/it] 97%|█████████▋| 9653/10000 [37:51:50<1:19:59, 13.83s/it] {'loss': 0.0016, 'learning_rate': 1.79e-06, 'epoch': 12.63} 97%|█████████▋| 9653/10000 [37:51:50<1:19:59, 13.83s/it] 97%|█████████▋| 9654/10000 [37:52:04<1:19:48, 13.84s/it] {'loss': 0.0029, 'learning_rate': 1.7850000000000003e-06, 'epoch': 12.64} 97%|█████████▋| 9654/10000 [37:52:04<1:19:48, 13.84s/it] 97%|█████████▋| 9655/10000 [37:52:18<1:19:19, 13.80s/it] {'loss': 0.0048, 'learning_rate': 1.7800000000000001e-06, 'epoch': 12.64} 
97%|█████████▋| 9655/10000 [37:52:18<1:19:19, 13.80s/it] 97%|█████████▋| 9656/10000 [37:52:32<1:19:05, 13.80s/it] {'loss': 0.002, 'learning_rate': 1.775e-06, 'epoch': 12.64} 97%|█████████▋| 9656/10000 [37:52:32<1:19:05, 13.80s/it] 97%|█████████▋| 9657/10000 [37:52:45<1:18:43, 13.77s/it] {'loss': 0.0025, 'learning_rate': 1.7700000000000002e-06, 'epoch': 12.64} 97%|█████████▋| 9657/10000 [37:52:45<1:18:43, 13.77s/it] 97%|█████████▋| 9658/10000 [37:52:59<1:18:36, 13.79s/it] {'loss': 0.0025, 'learning_rate': 1.765e-06, 'epoch': 12.64} 97%|█████████▋| 9658/10000 [37:52:59<1:18:36, 13.79s/it] 97%|█████████▋| 9659/10000 [37:53:13<1:18:19, 13.78s/it] {'loss': 0.0025, 'learning_rate': 1.76e-06, 'epoch': 12.64} 97%|█████████▋| 9659/10000 [37:53:13<1:18:19, 13.78s/it] 97%|█████████▋| 9660/10000 [37:53:27<1:18:15, 13.81s/it] {'loss': 0.0028, 'learning_rate': 1.7550000000000001e-06, 'epoch': 12.64} 97%|█████████▋| 9660/10000 [37:53:27<1:18:15, 13.81s/it] 97%|█████████▋| 9661/10000 [37:53:41<1:17:56, 13.79s/it] {'loss': 0.0021, 'learning_rate': 1.7500000000000002e-06, 'epoch': 12.65} 97%|█████████▋| 9661/10000 [37:53:41<1:17:56, 13.79s/it] 97%|█████████▋| 9662/10000 [37:53:54<1:17:47, 13.81s/it] {'loss': 0.0018, 'learning_rate': 1.745e-06, 'epoch': 12.65} 97%|█████████▋| 9662/10000 [37:53:54<1:17:47, 13.81s/it] 97%|█████████▋| 9663/10000 [37:54:08<1:17:35, 13.82s/it] {'loss': 0.0022, 'learning_rate': 1.7399999999999999e-06, 'epoch': 12.65} 97%|█████████▋| 9663/10000 [37:54:08<1:17:35, 13.82s/it] 97%|█████████▋| 9664/10000 [37:54:22<1:17:16, 13.80s/it] {'loss': 0.0026, 'learning_rate': 1.7350000000000001e-06, 'epoch': 12.65} 97%|█████████▋| 9664/10000 [37:54:22<1:17:16, 13.80s/it] 97%|█████████▋| 9665/10000 [37:54:36<1:17:05, 13.81s/it] {'loss': 0.0012, 'learning_rate': 1.73e-06, 'epoch': 12.65} 97%|█████████▋| 9665/10000 [37:54:36<1:17:05, 13.81s/it] 97%|█████████▋| 9666/10000 [37:54:50<1:16:51, 13.81s/it] {'loss': 0.0014, 'learning_rate': 1.7250000000000002e-06, 'epoch': 12.65} 
97%|█████████▋| 9666/10000 [37:54:50<1:16:51, 13.81s/it] 97%|█████████▋| 9667/10000 [37:55:03<1:16:30, 13.78s/it] {'loss': 0.0027, 'learning_rate': 1.72e-06, 'epoch': 12.65} 97%|█████████▋| 9667/10000 [37:55:03<1:16:30, 13.78s/it] 97%|█████████▋| 9668/10000 [37:55:17<1:16:35, 13.84s/it] {'loss': 0.0013, 'learning_rate': 1.7149999999999999e-06, 'epoch': 12.65} 97%|█████████▋| 9668/10000 [37:55:17<1:16:35, 13.84s/it] 97%|█████████▋| 9669/10000 [37:55:31<1:16:10, 13.81s/it] {'loss': 0.0031, 'learning_rate': 1.7100000000000001e-06, 'epoch': 12.66} 97%|█████████▋| 9669/10000 [37:55:31<1:16:10, 13.81s/it] 97%|█████████▋| 9670/10000 [37:55:45<1:16:01, 13.82s/it] {'loss': 0.0014, 'learning_rate': 1.705e-06, 'epoch': 12.66} 97%|█████████▋| 9670/10000 [37:55:45<1:16:01, 13.82s/it] 97%|█████████▋| 9671/10000 [37:55:59<1:15:37, 13.79s/it] {'loss': 0.0027, 'learning_rate': 1.7000000000000002e-06, 'epoch': 12.66} 97%|█████████▋| 9671/10000 [37:55:59<1:15:37, 13.79s/it] 97%|█████████▋| 9672/10000 [37:56:12<1:15:21, 13.78s/it] {'loss': 0.003, 'learning_rate': 1.695e-06, 'epoch': 12.66} 97%|█████████▋| 9672/10000 [37:56:12<1:15:21, 13.78s/it] 97%|█████████▋| 9673/10000 [37:56:26<1:15:13, 13.80s/it] {'loss': 0.0018, 'learning_rate': 1.69e-06, 'epoch': 12.66} 97%|█████████▋| 9673/10000 [37:56:26<1:15:13, 13.80s/it] 97%|█████████▋| 9674/10000 [37:56:40<1:15:05, 13.82s/it] {'loss': 0.0018, 'learning_rate': 1.6850000000000002e-06, 'epoch': 12.66} 97%|█████████▋| 9674/10000 [37:56:40<1:15:05, 13.82s/it] 97%|█████████▋| 9675/10000 [37:56:54<1:14:47, 13.81s/it] {'loss': 0.0018, 'learning_rate': 1.68e-06, 'epoch': 12.66} 97%|█████████▋| 9675/10000 [37:56:54<1:14:47, 13.81s/it] 97%|█████████▋| 9676/10000 [37:57:08<1:14:30, 13.80s/it] {'loss': 0.0013, 'learning_rate': 1.6750000000000003e-06, 'epoch': 12.66} 97%|█████████▋| 9676/10000 [37:57:08<1:14:30, 13.80s/it] 97%|█████████▋| 9677/10000 [37:57:22<1:14:23, 13.82s/it] {'loss': 0.001, 'learning_rate': 1.67e-06, 'epoch': 12.67} 97%|█████████▋| 
9677/10000 [37:57:22<1:14:23, 13.82s/it] 97%|█████████▋| 9678/10000 [37:57:35<1:14:23, 13.86s/it] {'loss': 0.0028, 'learning_rate': 1.6650000000000002e-06, 'epoch': 12.67} 97%|█████████▋| 9678/10000 [37:57:35<1:14:23, 13.86s/it] 97%|█████████▋| 9679/10000 [37:57:49<1:14:05, 13.85s/it] {'loss': 0.0035, 'learning_rate': 1.6600000000000002e-06, 'epoch': 12.67} 97%|█████████▋| 9679/10000 [37:57:49<1:14:05, 13.85s/it]
[2024-11-05 10:16:10,461] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
97%|█████████▋| 9680/10000 [37:58:02<1:12:23, 13.57s/it] {'loss': 0.0031, 'learning_rate': 1.6600000000000002e-06, 'epoch': 12.67} 97%|█████████▋| 9680/10000 [37:58:02<1:12:23, 13.57s/it]
[2024-11-05 10:16:23,346] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
97%|█████████▋| 9681/10000 [37:58:15<1:11:03, 13.37s/it] {'loss': 0.0021, 'learning_rate': 1.6600000000000002e-06, 'epoch': 12.67} 97%|█████████▋| 9681/10000 [37:58:15<1:11:03, 13.37s/it] 97%|█████████▋| 9682/10000 [37:58:29<1:11:34, 13.50s/it] {'loss': 0.0021, 'learning_rate': 1.655e-06, 'epoch': 12.67} 97%|█████████▋| 9682/10000 [37:58:29<1:11:34, 13.50s/it] 97%|█████████▋| 9683/10000 [37:58:43<1:11:50, 13.60s/it] {'loss': 0.002, 'learning_rate': 1.65e-06, 'epoch': 12.67} 97%|█████████▋| 9683/10000 [37:58:43<1:11:50, 13.60s/it] 97%|█████████▋| 9684/10000 [37:58:57<1:12:04, 13.68s/it] {'loss': 0.0016, 'learning_rate': 1.645e-06, 'epoch': 12.68} 97%|█████████▋| 9684/10000 [37:58:57<1:12:04, 13.68s/it] 97%|█████████▋| 9685/10000 [37:59:10<1:11:52, 13.69s/it] {'loss': 0.0018, 'learning_rate': 1.6400000000000002e-06, 'epoch': 12.68} 97%|█████████▋| 9685/10000 [37:59:10<1:11:52, 13.69s/it] 97%|█████████▋| 9686/10000 [37:59:24<1:11:45, 13.71s/it] {'loss': 0.0021, 'learning_rate': 1.635e-06, 'epoch': 12.68} 97%|█████████▋| 9686/10000 
[37:59:24<1:11:45, 13.71s/it] 97%|█████████▋| 9687/10000 [37:59:38<1:11:46, 13.76s/it] {'loss': 0.0033, 'learning_rate': 1.6299999999999999e-06, 'epoch': 12.68} 97%|█████████▋| 9687/10000 [37:59:38<1:11:46, 13.76s/it] 97%|█████████▋| 9688/10000 [37:59:52<1:11:40, 13.78s/it] {'loss': 0.0015, 'learning_rate': 1.6250000000000001e-06, 'epoch': 12.68} 97%|█████████▋| 9688/10000 [37:59:52<1:11:40, 13.78s/it] 97%|█████████▋| 9689/10000 [38:00:06<1:11:29, 13.79s/it] {'loss': 0.002, 'learning_rate': 1.62e-06, 'epoch': 12.68} 97%|█████████▋| 9689/10000 [38:00:06<1:11:29, 13.79s/it] 97%|█████████▋| 9690/10000 [38:00:19<1:11:12, 13.78s/it] {'loss': 0.0022, 'learning_rate': 1.6150000000000002e-06, 'epoch': 12.68} 97%|█████████▋| 9690/10000 [38:00:19<1:11:12, 13.78s/it] 97%|█████████▋| 9691/10000 [38:00:33<1:10:59, 13.79s/it] {'loss': 0.0014, 'learning_rate': 1.61e-06, 'epoch': 12.68} 97%|█████████▋| 9691/10000 [38:00:33<1:10:59, 13.79s/it] 97%|█████████▋| 9692/10000 [38:00:47<1:10:56, 13.82s/it] {'loss': 0.0016, 'learning_rate': 1.6049999999999999e-06, 'epoch': 12.69} 97%|█████████▋| 9692/10000 [38:00:47<1:10:56, 13.82s/it] 97%|█████████▋| 9693/10000 [38:01:01<1:10:45, 13.83s/it] {'loss': 0.0021, 'learning_rate': 1.6000000000000001e-06, 'epoch': 12.69} 97%|█████████▋| 9693/10000 [38:01:01<1:10:45, 13.83s/it] 97%|█████████▋| 9694/10000 [38:01:15<1:10:22, 13.80s/it] {'loss': 0.002, 'learning_rate': 1.595e-06, 'epoch': 12.69} 97%|█████████▋| 9694/10000 [38:01:15<1:10:22, 13.80s/it] 97%|█████████▋| 9695/10000 [38:01:28<1:09:59, 13.77s/it] {'loss': 0.0015, 'learning_rate': 1.5900000000000002e-06, 'epoch': 12.69} 97%|█████████▋| 9695/10000 [38:01:28<1:09:59, 13.77s/it] 97%|█████████▋| 9696/10000 [38:01:42<1:09:51, 13.79s/it] {'loss': 0.0045, 'learning_rate': 1.585e-06, 'epoch': 12.69} 97%|█████████▋| 9696/10000 [38:01:42<1:09:51, 13.79s/it] 97%|█████████▋| 9697/10000 [38:01:56<1:09:46, 13.82s/it] {'loss': 0.0016, 'learning_rate': 1.5800000000000003e-06, 'epoch': 12.69} 
97%|█████████▋| 9697/10000 [38:01:56<1:09:46, 13.82s/it] 97%|█████████▋| 9698/10000 [38:02:10<1:09:26, 13.79s/it] {'loss': 0.0019, 'learning_rate': 1.5750000000000002e-06, 'epoch': 12.69} 97%|█████████▋| 9698/10000 [38:02:10<1:09:26, 13.79s/it] 97%|█████████▋| 9699/10000 [38:02:24<1:09:11, 13.79s/it] {'loss': 0.0022, 'learning_rate': 1.57e-06, 'epoch': 12.7} 97%|█████████▋| 9699/10000 [38:02:24<1:09:11, 13.79s/it] 97%|█████████▋| 9700/10000 [38:02:37<1:08:59, 13.80s/it] {'loss': 0.002, 'learning_rate': 1.565e-06, 'epoch': 12.7} 97%|█████████▋| 9700/10000 [38:02:37<1:08:59, 13.80s/it] 97%|█████████▋| 9701/10000 [38:02:51<1:08:49, 13.81s/it] {'loss': 0.0021, 'learning_rate': 1.56e-06, 'epoch': 12.7} 97%|█████████▋| 9701/10000 [38:02:51<1:08:49, 13.81s/it] 97%|█████████▋| 9702/10000 [38:03:05<1:08:36, 13.81s/it] {'loss': 0.0021, 'learning_rate': 1.555e-06, 'epoch': 12.7} 97%|█████████▋| 9702/10000 [38:03:05<1:08:36, 13.81s/it] 97%|█████████▋| 9703/10000 [38:03:19<1:08:28, 13.83s/it] {'loss': 0.0027, 'learning_rate': 1.55e-06, 'epoch': 12.7} 97%|█████████▋| 9703/10000 [38:03:19<1:08:28, 13.83s/it] 97%|█████████▋| 9704/10000 [38:03:33<1:08:22, 13.86s/it] {'loss': 0.0014, 'learning_rate': 1.545e-06, 'epoch': 12.7} 97%|█████████▋| 9704/10000 [38:03:33<1:08:22, 13.86s/it] 97%|█████████▋| 9705/10000 [38:03:47<1:08:08, 13.86s/it] {'loss': 0.0015, 'learning_rate': 1.54e-06, 'epoch': 12.7} 97%|█████████▋| 9705/10000 [38:03:47<1:08:08, 13.86s/it] 97%|█████████▋| 9706/10000 [38:04:01<1:07:55, 13.86s/it] {'loss': 0.0011, 'learning_rate': 1.5350000000000001e-06, 'epoch': 12.7} 97%|█████████▋| 9706/10000 [38:04:01<1:07:55, 13.86s/it] 97%|█████████▋| 9707/10000 [38:04:14<1:07:42, 13.86s/it] {'loss': 0.0016, 'learning_rate': 1.53e-06, 'epoch': 12.71} 97%|█████████▋| 9707/10000 [38:04:14<1:07:42, 13.86s/it] 97%|█████████▋| 9708/10000 [38:04:28<1:07:40, 13.90s/it] {'loss': 0.0022, 'learning_rate': 1.525e-06, 'epoch': 12.71} 97%|█████████▋| 9708/10000 [38:04:29<1:07:40, 13.90s/it] 
97%|█████████▋| 9709/10000 [38:04:42<1:07:28, 13.91s/it] {'loss': 0.0026, 'learning_rate': 1.52e-06, 'epoch': 12.71} 97%|█████████▋| 9709/10000 [38:04:42<1:07:28, 13.91s/it] 97%|█████████▋| 9710/10000 [38:04:56<1:07:08, 13.89s/it] {'loss': 0.0019, 'learning_rate': 1.5150000000000001e-06, 'epoch': 12.71} 97%|█████████▋| 9710/10000 [38:04:56<1:07:08, 13.89s/it] 97%|█████████▋| 9711/10000 [38:05:10<1:06:48, 13.87s/it] {'loss': 0.0028, 'learning_rate': 1.5100000000000002e-06, 'epoch': 12.71} 97%|█████████▋| 9711/10000 [38:05:10<1:06:48, 13.87s/it] 97%|█████████▋| 9712/10000 [38:05:24<1:06:38, 13.88s/it] {'loss': 0.0014, 'learning_rate': 1.505e-06, 'epoch': 12.71} 97%|█████████▋| 9712/10000 [38:05:24<1:06:38, 13.88s/it] 97%|█████████▋| 9713/10000 [38:05:38<1:06:26, 13.89s/it] {'loss': 0.0015, 'learning_rate': 1.5e-06, 'epoch': 12.71} 97%|█████████▋| 9713/10000 [38:05:38<1:06:26, 13.89s/it] 97%|█████████▋| 9714/10000 [38:05:52<1:06:13, 13.89s/it] {'loss': 0.0024, 'learning_rate': 1.495e-06, 'epoch': 12.71} 97%|█████████▋| 9714/10000 [38:05:52<1:06:13, 13.89s/it] 97%|█████████▋| 9715/10000 [38:06:06<1:06:06, 13.92s/it] {'loss': 0.0024, 'learning_rate': 1.4900000000000001e-06, 'epoch': 12.72} 97%|█████████▋| 9715/10000 [38:06:06<1:06:06, 13.92s/it] 97%|█████████▋| 9716/10000 [38:06:20<1:05:56, 13.93s/it] {'loss': 0.0029, 'learning_rate': 1.4850000000000002e-06, 'epoch': 12.72} 97%|█████████▋| 9716/10000 [38:06:20<1:05:56, 13.93s/it] 97%|█████████▋| 9717/10000 [38:06:34<1:05:40, 13.92s/it] {'loss': 0.0021, 'learning_rate': 1.4800000000000002e-06, 'epoch': 12.72} 97%|█████████▋| 9717/10000 [38:06:34<1:05:40, 13.92s/it] 97%|█████████▋| 9718/10000 [38:06:48<1:05:24, 13.92s/it] {'loss': 0.0024, 'learning_rate': 1.475e-06, 'epoch': 12.72} 97%|█████████▋| 9718/10000 [38:06:48<1:05:24, 13.92s/it] 97%|█████████▋| 9719/10000 [38:07:01<1:04:59, 13.88s/it] {'loss': 0.0019, 'learning_rate': 1.4700000000000001e-06, 'epoch': 12.72} 97%|█████████▋| 9719/10000 [38:07:01<1:04:59, 13.88s/it] 
97%|█████████▋| 9720/10000 [38:07:15<1:04:48, 13.89s/it] {'loss': 0.0017, 'learning_rate': 1.465e-06, 'epoch': 12.72} 97%|█████████▋| 9720/10000 [38:07:15<1:04:48, 13.89s/it] 97%|█████████▋| 9721/10000 [38:07:29<1:04:36, 13.89s/it] {'loss': 0.002, 'learning_rate': 1.46e-06, 'epoch': 12.72} 97%|█████████▋| 9721/10000 [38:07:29<1:04:36, 13.89s/it] 97%|█████████▋| 9722/10000 [38:07:43<1:04:25, 13.91s/it] {'loss': 0.0017, 'learning_rate': 1.455e-06, 'epoch': 12.73} 97%|█████████▋| 9722/10000 [38:07:43<1:04:25, 13.91s/it] 97%|█████████▋| 9723/10000 [38:07:57<1:04:07, 13.89s/it] {'loss': 0.0018, 'learning_rate': 1.45e-06, 'epoch': 12.73} 97%|█████████▋| 9723/10000 [38:07:57<1:04:07, 13.89s/it] 97%|█████████▋| 9724/10000 [38:08:11<1:04:02, 13.92s/it] {'loss': 0.0023, 'learning_rate': 1.445e-06, 'epoch': 12.73} 97%|█████████▋| 9724/10000 [38:08:11<1:04:02, 13.92s/it] 97%|█████████▋| 9725/10000 [38:08:25<1:03:44, 13.91s/it] {'loss': 0.0017, 'learning_rate': 1.44e-06, 'epoch': 12.73} 97%|█████████▋| 9725/10000 [38:08:25<1:03:44, 13.91s/it] 97%|█████████▋| 9726/10000 [38:08:39<1:03:38, 13.94s/it] {'loss': 0.0016, 'learning_rate': 1.435e-06, 'epoch': 12.73} 97%|█████████▋| 9726/10000 [38:08:39<1:03:38, 13.94s/it] 97%|█████████▋| 9727/10000 [38:08:53<1:03:17, 13.91s/it] {'loss': 0.0027, 'learning_rate': 1.43e-06, 'epoch': 12.73} 97%|█████████▋| 9727/10000 [38:08:53<1:03:17, 13.91s/it] 97%|█████████▋| 9728/10000 [38:09:06<1:02:54, 13.88s/it] {'loss': 0.0019, 'learning_rate': 1.4250000000000001e-06, 'epoch': 12.73} 97%|█████████▋| 9728/10000 [38:09:06<1:02:54, 13.88s/it] 97%|█████████▋| 9729/10000 [38:09:20<1:02:43, 13.89s/it] {'loss': 0.0021, 'learning_rate': 1.4200000000000002e-06, 'epoch': 12.73} 97%|█████████▋| 9729/10000 [38:09:20<1:02:43, 13.89s/it] 97%|█████████▋| 9730/10000 [38:09:34<1:02:26, 13.88s/it] {'loss': 0.0018, 'learning_rate': 1.415e-06, 'epoch': 12.74} 97%|█████████▋| 9730/10000 [38:09:34<1:02:26, 13.88s/it] 97%|█████████▋| 9731/10000 [38:09:48<1:02:10, 
13.87s/it] {'loss': 0.0031, 'learning_rate': 1.41e-06, 'epoch': 12.74} 97%|█████████▋| 9731/10000 [38:09:48<1:02:10, 13.87s/it] 97%|█████████▋| 9732/10000 [38:10:02<1:02:03, 13.90s/it] {'loss': 0.0022, 'learning_rate': 1.405e-06, 'epoch': 12.74} 97%|█████████▋| 9732/10000 [38:10:02<1:02:03, 13.90s/it] 97%|█████████▋| 9733/10000 [38:10:16<1:01:49, 13.89s/it] {'loss': 0.0019, 'learning_rate': 1.4000000000000001e-06, 'epoch': 12.74} 97%|█████████▋| 9733/10000 [38:10:16<1:01:49, 13.89s/it] 97%|█████████▋| 9734/10000 [38:10:30<1:01:41, 13.92s/it] {'loss': 0.0019, 'learning_rate': 1.3950000000000002e-06, 'epoch': 12.74} 97%|█████████▋| 9734/10000 [38:10:30<1:01:41, 13.92s/it] 97%|█████████▋| 9735/10000 [38:10:44<1:01:20, 13.89s/it] {'loss': 0.0016, 'learning_rate': 1.39e-06, 'epoch': 12.74} 97%|█████████▋| 9735/10000 [38:10:44<1:01:20, 13.89s/it] 97%|█████████▋| 9736/10000 [38:10:58<1:01:17, 13.93s/it] {'loss': 0.0023, 'learning_rate': 1.385e-06, 'epoch': 12.74} 97%|█████████▋| 9736/10000 [38:10:58<1:01:17, 13.93s/it] 97%|█████████▋| 9737/10000 [38:11:11<1:00:47, 13.87s/it] {'loss': 0.0022, 'learning_rate': 1.3800000000000001e-06, 'epoch': 12.74} 97%|█████████▋| 9737/10000 [38:11:11<1:00:47, 13.87s/it] 97%|█████████▋| 9738/10000 [38:11:25<1:00:28, 13.85s/it] {'loss': 0.0016, 'learning_rate': 1.3750000000000002e-06, 'epoch': 12.75} 97%|█████████▋| 9738/10000 [38:11:25<1:00:28, 13.85s/it] 97%|█████████▋| 9739/10000 [38:11:39<1:00:16, 13.85s/it] {'loss': 0.0019, 'learning_rate': 1.37e-06, 'epoch': 12.75} 97%|█████████▋| 9739/10000 [38:11:39<1:00:16, 13.85s/it] 97%|█████████▋| 9740/10000 [38:11:53<1:00:09, 13.88s/it] {'loss': 0.003, 'learning_rate': 1.365e-06, 'epoch': 12.75} 97%|█████████▋| 9740/10000 [38:11:53<1:00:09, 13.88s/it] 97%|█████████▋| 9741/10000 [38:12:07<59:51, 13.87s/it] {'loss': 0.0027, 'learning_rate': 1.36e-06, 'epoch': 12.75} 97%|█████████▋| 9741/10000 [38:12:07<59:51, 13.87s/it] 97%|█████████▋| 9742/10000 [38:12:21<59:43, 13.89s/it] {'loss': 0.0021, 
'learning_rate': 1.355e-06, 'epoch': 12.75} 97%|█████████▋| 9742/10000 [38:12:21<59:43, 13.89s/it] 97%|█████████▋| 9743/10000 [38:12:35<59:30, 13.89s/it] {'loss': 0.0025, 'learning_rate': 1.35e-06, 'epoch': 12.75} 97%|█████████▋| 9743/10000 [38:12:35<59:30, 13.89s/it] 97%|█████████▋| 9744/10000 [38:12:49<59:24, 13.92s/it] {'loss': 0.0013, 'learning_rate': 1.345e-06, 'epoch': 12.75} 97%|█████████▋| 9744/10000 [38:12:49<59:24, 13.92s/it] 97%|█████████▋| 9745/10000 [38:13:02<58:58, 13.88s/it] {'loss': 0.0024, 'learning_rate': 1.34e-06, 'epoch': 12.76} 97%|█████████▋| 9745/10000 [38:13:03<58:58, 13.88s/it] 97%|█████████▋| 9746/10000 [38:13:16<58:40, 13.86s/it] {'loss': 0.003, 'learning_rate': 1.3350000000000001e-06, 'epoch': 12.76} 97%|█████████▋| 9746/10000 [38:13:16<58:40, 13.86s/it] 97%|█████████▋| 9747/10000 [38:13:30<58:29, 13.87s/it] {'loss': 0.0018, 'learning_rate': 1.33e-06, 'epoch': 12.76} 97%|█████████▋| 9747/10000 [38:13:30<58:29, 13.87s/it] 97%|█████████▋| 9748/10000 [38:13:44<58:19, 13.89s/it] {'loss': 0.0026, 'learning_rate': 1.325e-06, 'epoch': 12.76} 97%|█████████▋| 9748/10000 [38:13:44<58:19, 13.89s/it] 97%|█████████▋| 9749/10000 [38:13:58<58:05, 13.89s/it] {'loss': 0.0026, 'learning_rate': 1.32e-06, 'epoch': 12.76} 97%|█████████▋| 9749/10000 [38:13:58<58:05, 13.89s/it] 98%|█████████▊| 9750/10000 [38:14:12<57:41, 13.85s/it] {'loss': 0.0022, 'learning_rate': 1.3150000000000001e-06, 'epoch': 12.76} 98%|█████████▊| 9750/10000 [38:14:12<57:41, 13.85s/it] 98%|█████████▊| 9751/10000 [38:14:26<57:29, 13.85s/it] {'loss': 0.002, 'learning_rate': 1.3100000000000002e-06, 'epoch': 12.76} 98%|█████████▊| 9751/10000 [38:14:26<57:29, 13.85s/it] 98%|█████████▊| 9752/10000 [38:14:40<57:21, 13.88s/it] {'loss': 0.0022, 'learning_rate': 1.3050000000000002e-06, 'epoch': 12.76} 98%|█████████▊| 9752/10000 [38:14:40<57:21, 13.88s/it] 98%|█████████▊| 9753/10000 [38:14:53<57:03, 13.86s/it] {'loss': 0.0018, 'learning_rate': 1.3e-06, 'epoch': 12.77} 98%|█████████▊| 9753/10000 
[38:14:53<57:03, 13.86s/it] 98%|█████████▊| 9754/10000 [38:15:07<56:54, 13.88s/it] {'loss': 0.0018, 'learning_rate': 1.295e-06, 'epoch': 12.77} 98%|█████████▊| 9754/10000 [38:15:07<56:54, 13.88s/it] 98%|█████████▊| 9755/10000 [38:15:21<56:40, 13.88s/it] {'loss': 0.002, 'learning_rate': 1.2900000000000001e-06, 'epoch': 12.77} 98%|█████████▊| 9755/10000 [38:15:21<56:40, 13.88s/it] 98%|█████████▊| 9756/10000 [38:15:35<56:38, 13.93s/it] {'loss': 0.002, 'learning_rate': 1.2850000000000002e-06, 'epoch': 12.77} 98%|█████████▊| 9756/10000 [38:15:35<56:38, 13.93s/it] 98%|█████████▊| 9757/10000 [38:15:49<56:16, 13.90s/it] {'loss': 0.0025, 'learning_rate': 1.28e-06, 'epoch': 12.77} 98%|█████████▊| 9757/10000 [38:15:49<56:16, 13.90s/it] 98%|█████████▊| 9758/10000 [38:16:03<56:02, 13.89s/it] {'loss': 0.0019, 'learning_rate': 1.275e-06, 'epoch': 12.77} 98%|█████████▊| 9758/10000 [38:16:03<56:02, 13.89s/it] 98%|█████████▊| 9759/10000 [38:16:17<55:46, 13.89s/it] {'loss': 0.0016, 'learning_rate': 1.27e-06, 'epoch': 12.77} 98%|█████████▊| 9759/10000 [38:16:17<55:46, 13.89s/it] 98%|█████████▊| 9760/10000 [38:16:31<55:27, 13.86s/it] {'loss': 0.0015, 'learning_rate': 1.265e-06, 'epoch': 12.77} 98%|█████████▊| 9760/10000 [38:16:31<55:27, 13.86s/it] 98%|█████████▊| 9761/10000 [38:16:44<55:06, 13.83s/it] {'loss': 0.002, 'learning_rate': 1.26e-06, 'epoch': 12.78} 98%|█████████▊| 9761/10000 [38:16:44<55:06, 13.83s/it] 98%|█████████▊| 9762/10000 [38:16:58<54:48, 13.82s/it] {'loss': 0.0019, 'learning_rate': 1.255e-06, 'epoch': 12.78} 98%|█████████▊| 9762/10000 [38:16:58<54:48, 13.82s/it] 98%|█████████▊| 9763/10000 [38:17:12<54:38, 13.83s/it] {'loss': 0.0014, 'learning_rate': 1.25e-06, 'epoch': 12.78} 98%|█████████▊| 9763/10000 [38:17:12<54:38, 13.83s/it] 98%|█████████▊| 9764/10000 [38:17:26<54:27, 13.84s/it] {'loss': 0.0019, 'learning_rate': 1.245e-06, 'epoch': 12.78} 98%|█████████▊| 9764/10000 [38:17:26<54:27, 13.84s/it] 98%|█████████▊| 9765/10000 [38:17:40<54:13, 13.85s/it] {'loss': 0.0018, 
'learning_rate': 1.24e-06, 'epoch': 12.78} 98%|█████████▊| 9765/10000 [38:17:40<54:13, 13.85s/it] 98%|█████████▊| 9766/10000 [38:17:54<54:00, 13.85s/it] {'loss': 0.0015, 'learning_rate': 1.235e-06, 'epoch': 12.78} 98%|█████████▊| 9766/10000 [38:17:54<54:00, 13.85s/it] 98%|█████████▊| 9767/10000 [38:18:07<53:44, 13.84s/it] {'loss': 0.002, 'learning_rate': 1.23e-06, 'epoch': 12.78} 98%|█████████▊| 9767/10000 [38:18:07<53:44, 13.84s/it] 98%|█████████▊| 9768/10000 [38:18:21<53:25, 13.82s/it] {'loss': 0.002, 'learning_rate': 1.2250000000000001e-06, 'epoch': 12.79} 98%|█████████▊| 9768/10000 [38:18:21<53:25, 13.82s/it] 98%|█████████▊| 9769/10000 [38:18:35<53:14, 13.83s/it] {'loss': 0.0016, 'learning_rate': 1.2200000000000002e-06, 'epoch': 12.79} 98%|█████████▊| 9769/10000 [38:18:35<53:14, 13.83s/it] 98%|█████████▊| 9770/10000 [38:18:49<53:13, 13.88s/it] {'loss': 0.0021, 'learning_rate': 1.215e-06, 'epoch': 12.79} 98%|█████████▊| 9770/10000 [38:18:49<53:13, 13.88s/it] 98%|█████████▊| 9771/10000 [38:19:03<52:57, 13.87s/it] {'loss': 0.0015, 'learning_rate': 1.21e-06, 'epoch': 12.79} 98%|█████████▊| 9771/10000 [38:19:03<52:57, 13.87s/it] 98%|█████████▊| 9772/10000 [38:19:17<52:36, 13.84s/it] {'loss': 0.0029, 'learning_rate': 1.2050000000000001e-06, 'epoch': 12.79} 98%|█████████▊| 9772/10000 [38:19:17<52:36, 13.84s/it] 98%|█████████▊| 9773/10000 [38:19:31<52:24, 13.85s/it] {'loss': 0.0009, 'learning_rate': 1.2000000000000002e-06, 'epoch': 12.79} 98%|█████████▊| 9773/10000 [38:19:31<52:24, 13.85s/it] 98%|█████████▊| 9774/10000 [38:19:44<52:10, 13.85s/it] {'loss': 0.003, 'learning_rate': 1.1950000000000002e-06, 'epoch': 12.79} 98%|█████████▊| 9774/10000 [38:19:44<52:10, 13.85s/it] 98%|█████████▊| 9775/10000 [38:19:58<51:55, 13.85s/it] {'loss': 0.0016, 'learning_rate': 1.19e-06, 'epoch': 12.79} 98%|█████████▊| 9775/10000 [38:19:58<51:55, 13.85s/it] 98%|█████████▊| 9776/10000 [38:20:12<51:45, 13.86s/it] {'loss': 0.0019, 'learning_rate': 1.185e-06, 'epoch': 12.8} 98%|█████████▊| 
9776/10000 [38:20:12<51:45, 13.86s/it] 98%|█████████▊| 9777/10000 [38:20:26<51:32, 13.87s/it] {'loss': 0.0022, 'learning_rate': 1.18e-06, 'epoch': 12.8} 98%|█████████▊| 9777/10000 [38:20:26<51:32, 13.87s/it] 98%|█████████▊| 9778/10000 [38:20:40<51:22, 13.88s/it] {'loss': 0.0012, 'learning_rate': 1.175e-06, 'epoch': 12.8} 98%|█████████▊| 9778/10000 [38:20:40<51:22, 13.88s/it] 98%|█████████▊| 9779/10000 [38:20:54<51:02, 13.86s/it] {'loss': 0.0042, 'learning_rate': 1.17e-06, 'epoch': 12.8} 98%|█████████▊| 9779/10000 [38:20:54<51:02, 13.86s/it] 98%|█████████▊| 9780/10000 [38:21:08<50:48, 13.86s/it] {'loss': 0.0021, 'learning_rate': 1.165e-06, 'epoch': 12.8} 98%|█████████▊| 9780/10000 [38:21:08<50:48, 13.86s/it] 98%|█████████▊| 9781/10000 [38:21:21<50:33, 13.85s/it] {'loss': 0.0019, 'learning_rate': 1.16e-06, 'epoch': 12.8} 98%|█████████▊| 9781/10000 [38:21:21<50:33, 13.85s/it] 98%|█████████▊| 9782/10000 [38:21:35<50:20, 13.85s/it] {'loss': 0.0027, 'learning_rate': 1.155e-06, 'epoch': 12.8} 98%|█████████▊| 9782/10000 [38:21:35<50:20, 13.85s/it] 98%|█████████▊| 9783/10000 [38:21:49<50:11, 13.88s/it] {'loss': 0.0023, 'learning_rate': 1.15e-06, 'epoch': 12.8} 98%|█████████▊| 9783/10000 [38:21:49<50:11, 13.88s/it] 98%|█████████▊| 9784/10000 [38:22:03<49:55, 13.87s/it] {'loss': 0.0016, 'learning_rate': 1.145e-06, 'epoch': 12.81} 98%|█████████▊| 9784/10000 [38:22:03<49:55, 13.87s/it] 98%|█████████▊| 9785/10000 [38:22:17<49:38, 13.86s/it] {'loss': 0.002, 'learning_rate': 1.14e-06, 'epoch': 12.81} 98%|█████████▊| 9785/10000 [38:22:17<49:38, 13.86s/it] 98%|█████████▊| 9786/10000 [38:22:31<49:20, 13.83s/it] {'loss': 0.0035, 'learning_rate': 1.1350000000000001e-06, 'epoch': 12.81} 98%|█████████▊| 9786/10000 [38:22:31<49:20, 13.83s/it] 98%|█████████▊| 9787/10000 [38:22:44<49:01, 13.81s/it] {'loss': 0.0023, 'learning_rate': 1.13e-06, 'epoch': 12.81} 98%|█████████▊| 9787/10000 [38:22:44<49:01, 13.81s/it] 98%|█████████▊| 9788/10000 [38:22:58<48:57, 13.86s/it] {'loss': 0.0032, 
'learning_rate': 1.125e-06, 'epoch': 12.81} 98%|█████████▊| 9788/10000 [38:22:58<48:57, 13.86s/it] 98%|█████████▊| 9789/10000 [38:23:12<48:46, 13.87s/it] {'loss': 0.0017, 'learning_rate': 1.12e-06, 'epoch': 12.81} 98%|█████████▊| 9789/10000 [38:23:12<48:46, 13.87s/it] 98%|█████████▊| 9790/10000 [38:23:26<48:30, 13.86s/it] {'loss': 0.0028, 'learning_rate': 1.1150000000000001e-06, 'epoch': 12.81} 98%|█████████▊| 9790/10000 [38:23:26<48:30, 13.86s/it] 98%|█████████▊| 9791/10000 [38:23:40<48:15, 13.85s/it] {'loss': 0.0023, 'learning_rate': 1.1100000000000002e-06, 'epoch': 12.82} 98%|█████████▊| 9791/10000 [38:23:40<48:15, 13.85s/it] 98%|█████████▊| 9792/10000 [38:23:54<48:00, 13.85s/it] {'loss': 0.0025, 'learning_rate': 1.1050000000000002e-06, 'epoch': 12.82} 98%|█████████▊| 9792/10000 [38:23:54<48:00, 13.85s/it] 98%|█████████▊| 9793/10000 [38:24:08<47:48, 13.86s/it] {'loss': 0.0018, 'learning_rate': 1.1e-06, 'epoch': 12.82} 98%|█████████▊| 9793/10000 [38:24:08<47:48, 13.86s/it] 98%|█████████▊| 9794/10000 [38:24:21<47:31, 13.84s/it] {'loss': 0.0017, 'learning_rate': 1.095e-06, 'epoch': 12.82} 98%|█████████▊| 9794/10000 [38:24:22<47:31, 13.84s/it] 98%|█████████▊| 9795/10000 [38:24:35<47:18, 13.85s/it] {'loss': 0.004, 'learning_rate': 1.0900000000000002e-06, 'epoch': 12.82} 98%|█████████▊| 9795/10000 [38:24:35<47:18, 13.85s/it] 98%|█████████▊| 9796/10000 [38:24:49<47:07, 13.86s/it] {'loss': 0.002, 'learning_rate': 1.085e-06, 'epoch': 12.82} 98%|█████████▊| 9796/10000 [38:24:49<47:07, 13.86s/it] 98%|█████████▊| 9797/10000 [38:25:03<46:50, 13.85s/it] {'loss': 0.0013, 'learning_rate': 1.08e-06, 'epoch': 12.82} 98%|█████████▊| 9797/10000 [38:25:03<46:50, 13.85s/it] 98%|█████████▊| 9798/10000 [38:25:17<46:38, 13.85s/it] {'loss': 0.0018, 'learning_rate': 1.0749999999999999e-06, 'epoch': 12.82} 98%|█████████▊| 9798/10000 [38:25:17<46:38, 13.85s/it] 98%|█████████▊| 9799/10000 [38:25:31<46:22, 13.84s/it] {'loss': 0.002, 'learning_rate': 1.07e-06, 'epoch': 12.83} 98%|█████████▊| 
9799/10000 [38:25:31<46:22, 13.84s/it] 98%|█████████▊| 9800/10000 [38:25:45<46:10, 13.85s/it] {'loss': 0.0023, 'learning_rate': 1.065e-06, 'epoch': 12.83} 98%|█████████▊| 9800/10000 [38:25:45<46:10, 13.85s/it] 98%|█████████▊| 9801/10000 [38:25:59<45:58, 13.86s/it] {'loss': 0.0014, 'learning_rate': 1.06e-06, 'epoch': 12.83} 98%|█████████▊| 9801/10000 [38:25:59<45:58, 13.86s/it] 98%|█████████▊| 9802/10000 [38:26:12<45:42, 13.85s/it] {'loss': 0.0025, 'learning_rate': 1.055e-06, 'epoch': 12.83} 98%|█████████▊| 9802/10000 [38:26:12<45:42, 13.85s/it] 98%|█████████▊| 9803/10000 [38:26:26<45:28, 13.85s/it] {'loss': 0.0019, 'learning_rate': 1.0500000000000001e-06, 'epoch': 12.83} 98%|█████████▊| 9803/10000 [38:26:26<45:28, 13.85s/it] 98%|█████████▊| 9804/10000 [38:26:40<45:16, 13.86s/it] {'loss': 0.0019, 'learning_rate': 1.045e-06, 'epoch': 12.83} 98%|█████████▊| 9804/10000 [38:26:40<45:16, 13.86s/it] 98%|█████████▊| 9805/10000 [38:26:54<45:03, 13.87s/it] {'loss': 0.0011, 'learning_rate': 1.04e-06, 'epoch': 12.83} 98%|█████████▊| 9805/10000 [38:26:54<45:03, 13.87s/it] 98%|█████████▊| 9806/10000 [38:27:08<44:50, 13.87s/it] {'loss': 0.0014, 'learning_rate': 1.035e-06, 'epoch': 12.84} 98%|█████████▊| 9806/10000 [38:27:08<44:50, 13.87s/it] 98%|█████████▊| 9807/10000 [38:27:22<44:28, 13.83s/it] {'loss': 0.0026, 'learning_rate': 1.03e-06, 'epoch': 12.84} 98%|█████████▊| 9807/10000 [38:27:22<44:28, 13.83s/it] 98%|█████████▊| 9808/10000 [38:27:36<44:23, 13.87s/it] {'loss': 0.0017, 'learning_rate': 1.0250000000000001e-06, 'epoch': 12.84} 98%|█████████▊| 9808/10000 [38:27:36<44:23, 13.87s/it] 98%|█████████▊| 9809/10000 [38:27:50<44:17, 13.91s/it] {'loss': 0.0012, 'learning_rate': 1.0200000000000002e-06, 'epoch': 12.84} 98%|█████████▊| 9809/10000 [38:27:50<44:17, 13.91s/it] 98%|█████████▊| 9810/10000 [38:28:03<43:54, 13.86s/it] {'loss': 0.0023, 'learning_rate': 1.015e-06, 'epoch': 12.84} 98%|█████████▊| 9810/10000 [38:28:03<43:54, 13.86s/it] 98%|█████████▊| 9811/10000 [38:28:17<43:38, 
13.85s/it] {'loss': 0.0027, 'learning_rate': 1.01e-06, 'epoch': 12.84} 98%|█████████▊| 9811/10000 [38:28:17<43:38, 13.85s/it] 98%|█████████▊| 9812/10000 [38:28:31<43:26, 13.86s/it] {'loss': 0.0027, 'learning_rate': 1.0050000000000001e-06, 'epoch': 12.84} 98%|█████████▊| 9812/10000 [38:28:31<43:26, 13.86s/it] 98%|█████████▊| 9813/10000 [38:28:45<43:10, 13.86s/it] {'loss': 0.0011, 'learning_rate': 1.0000000000000002e-06, 'epoch': 12.84} 98%|█████████▊| 9813/10000 [38:28:45<43:10, 13.86s/it] 98%|█████████▊| 9814/10000 [38:28:59<43:06, 13.90s/it] {'loss': 0.0022, 'learning_rate': 9.95e-07, 'epoch': 12.85} 98%|█████████▊| 9814/10000 [38:28:59<43:06, 13.90s/it] 98%|█████████▊| 9815/10000 [38:29:13<42:45, 13.86s/it] {'loss': 0.0018, 'learning_rate': 9.9e-07, 'epoch': 12.85} 98%|█████████▊| 9815/10000 [38:29:13<42:45, 13.86s/it] 98%|█████████▊| 9816/10000 [38:29:26<42:28, 13.85s/it] {'loss': 0.0018, 'learning_rate': 9.849999999999999e-07, 'epoch': 12.85} 98%|█████████▊| 9816/10000 [38:29:26<42:28, 13.85s/it] 98%|█████████▊| 9817/10000 [38:29:40<42:15, 13.85s/it] {'loss': 0.0028, 'learning_rate': 9.8e-07, 'epoch': 12.85} 98%|█████████▊| 9817/10000 [38:29:40<42:15, 13.85s/it] 98%|█████████▊| 9818/10000 [38:29:54<42:03, 13.86s/it] {'loss': 0.0024, 'learning_rate': 9.75e-07, 'epoch': 12.85} 98%|█████████▊| 9818/10000 [38:29:54<42:03, 13.86s/it] 98%|█████████▊| 9819/10000 [38:30:08<41:45, 13.84s/it] {'loss': 0.0016, 'learning_rate': 9.7e-07, 'epoch': 12.85} 98%|█████████▊| 9819/10000 [38:30:08<41:45, 13.84s/it] 98%|█████████▊| 9820/10000 [38:30:22<41:33, 13.85s/it] {'loss': 0.0017, 'learning_rate': 9.65e-07, 'epoch': 12.85} 98%|█████████▊| 9820/10000 [38:30:22<41:33, 13.85s/it] 98%|█████████▊| 9821/10000 [38:30:36<41:17, 13.84s/it] {'loss': 0.0021, 'learning_rate': 9.6e-07, 'epoch': 12.85} 98%|█████████▊| 9821/10000 [38:30:36<41:17, 13.84s/it] 98%|█████████▊| 9822/10000 [38:30:49<41:02, 13.83s/it] {'loss': 0.0027, 'learning_rate': 9.55e-07, 'epoch': 12.86} 98%|█████████▊| 
9822/10000 [38:30:50<41:02, 13.83s/it] 98%|█████████▊| 9823/10000 [38:31:03<40:53, 13.86s/it] {'loss': 0.0012, 'learning_rate': 9.5e-07, 'epoch': 12.86} 98%|█████████▊| 9823/10000 [38:31:03<40:53, 13.86s/it] 98%|█████████▊| 9824/10000 [38:31:17<40:41, 13.87s/it] {'loss': 0.0034, 'learning_rate': 9.450000000000001e-07, 'epoch': 12.86} 98%|█████████▊| 9824/10000 [38:31:17<40:41, 13.87s/it] 98%|█████████▊| 9825/10000 [38:31:31<40:26, 13.87s/it] {'loss': 0.0013, 'learning_rate': 9.400000000000001e-07, 'epoch': 12.86} 98%|█████████▊| 9825/10000 [38:31:31<40:26, 13.87s/it] 98%|█████████▊| 9826/10000 [38:31:45<40:13, 13.87s/it] {'loss': 0.0017, 'learning_rate': 9.350000000000002e-07, 'epoch': 12.86} 98%|█████████▊| 9826/10000 [38:31:45<40:13, 13.87s/it] 98%|█████████▊| 9827/10000 [38:31:59<39:51, 13.83s/it] {'loss': 0.0008, 'learning_rate': 9.3e-07, 'epoch': 12.86} 98%|█████████▊| 9827/10000 [38:31:59<39:51, 13.83s/it] 98%|█████████▊| 9828/10000 [38:32:12<39:32, 13.80s/it] {'loss': 0.0021, 'learning_rate': 9.25e-07, 'epoch': 12.86} 98%|█████████▊| 9828/10000 [38:32:13<39:32, 13.80s/it] 98%|█████████▊| 9829/10000 [38:32:26<39:22, 13.81s/it] {'loss': 0.0021, 'learning_rate': 9.2e-07, 'epoch': 12.87} 98%|█████████▊| 9829/10000 [38:32:26<39:22, 13.81s/it] 98%|█████████▊| 9830/10000 [38:32:40<39:13, 13.84s/it] {'loss': 0.0021, 'learning_rate': 9.15e-07, 'epoch': 12.87} 98%|█████████▊| 9830/10000 [38:32:40<39:13, 13.84s/it] 98%|█████████▊| 9831/10000 [38:32:54<39:08, 13.90s/it] {'loss': 0.0024, 'learning_rate': 9.100000000000001e-07, 'epoch': 12.87} 98%|█████████▊| 9831/10000 [38:32:54<39:08, 13.90s/it] 98%|█████████▊| 9832/10000 [38:33:08<38:53, 13.89s/it] {'loss': 0.0015, 'learning_rate': 9.050000000000001e-07, 'epoch': 12.87} 98%|█████████▊| 9832/10000 [38:33:08<38:53, 13.89s/it] 98%|█████████▊| 9833/10000 [38:33:22<38:39, 13.89s/it] {'loss': 0.0041, 'learning_rate': 9e-07, 'epoch': 12.87} 98%|█████████▊| 9833/10000 [38:33:22<38:39, 13.89s/it] 98%|█████████▊| 9834/10000 
[38:33:36<38:23, 13.88s/it] {'loss': 0.0018, 'learning_rate': 8.95e-07, 'epoch': 12.87} 98%|█████████▊| 9834/10000 [38:33:36<38:23, 13.88s/it] 98%|█████████▊| 9835/10000 [38:33:50<38:11, 13.89s/it] {'loss': 0.0011, 'learning_rate': 8.900000000000001e-07, 'epoch': 12.87} 98%|█████████▊| 9835/10000 [38:33:50<38:11, 13.89s/it] 98%|█████████▊| 9836/10000 [38:34:04<37:57, 13.88s/it] {'loss': 0.0007, 'learning_rate': 8.850000000000001e-07, 'epoch': 12.87} 98%|█████████▊| 9836/10000 [38:34:04<37:57, 13.88s/it] 98%|█████████▊| 9837/10000 [38:34:17<37:40, 13.87s/it] {'loss': 0.0024, 'learning_rate': 8.8e-07, 'epoch': 12.88} 98%|█████████▊| 9837/10000 [38:34:18<37:40, 13.87s/it] 98%|█████████▊| 9838/10000 [38:34:31<37:22, 13.84s/it] {'loss': 0.0017, 'learning_rate': 8.750000000000001e-07, 'epoch': 12.88} 98%|█████████▊| 9838/10000 [38:34:31<37:22, 13.84s/it] 98%|█████████▊| 9839/10000 [38:34:45<37:09, 13.85s/it] {'loss': 0.0025, 'learning_rate': 8.699999999999999e-07, 'epoch': 12.88} 98%|█████████▊| 9839/10000 [38:34:45<37:09, 13.85s/it] 98%|█████████▊| 9840/10000 [38:34:59<36:56, 13.85s/it] {'loss': 0.0025, 'learning_rate': 8.65e-07, 'epoch': 12.88} 98%|█████████▊| 9840/10000 [38:34:59<36:56, 13.85s/it] 98%|█████████▊| 9841/10000 [38:35:13<36:45, 13.87s/it] {'loss': 0.0027, 'learning_rate': 8.6e-07, 'epoch': 12.88} 98%|█████████▊| 9841/10000 [38:35:13<36:45, 13.87s/it] 98%|█████████▊| 9842/10000 [38:35:27<36:27, 13.85s/it] {'loss': 0.0025, 'learning_rate': 8.550000000000001e-07, 'epoch': 12.88} 98%|█████████▊| 9842/10000 [38:35:27<36:27, 13.85s/it] 98%|█████████▊| 9843/10000 [38:35:41<36:12, 13.84s/it] {'loss': 0.0019, 'learning_rate': 8.500000000000001e-07, 'epoch': 12.88} 98%|█████████▊| 9843/10000 [38:35:41<36:12, 13.84s/it] 98%|█████████▊| 9844/10000 [38:35:54<36:00, 13.85s/it] {'loss': 0.0021, 'learning_rate': 8.45e-07, 'epoch': 12.88} 98%|█████████▊| 9844/10000 [38:35:54<36:00, 13.85s/it] 98%|█████████▊| 9845/10000 [38:36:08<35:45, 13.84s/it] {'loss': 0.0027, 
'learning_rate': 8.4e-07, 'epoch': 12.89} 98%|█████████▊| 9845/10000 [38:36:08<35:45, 13.84s/it] 98%|█████████▊| 9846/10000 [38:36:22<35:30, 13.84s/it] {'loss': 0.0029, 'learning_rate': 8.35e-07, 'epoch': 12.89} 98%|█████████▊| 9846/10000 [38:36:22<35:30, 13.84s/it] 98%|█████████▊| 9847/10000 [38:36:36<35:21, 13.86s/it] {'loss': 0.0022, 'learning_rate': 8.300000000000001e-07, 'epoch': 12.89} 98%|█████████▊| 9847/10000 [38:36:36<35:21, 13.86s/it] 98%|█████████▊| 9848/10000 [38:36:50<35:13, 13.91s/it] {'loss': 0.0018, 'learning_rate': 8.25e-07, 'epoch': 12.89} 98%|█████████▊| 9848/10000 [38:36:50<35:13, 13.91s/it] 98%|█████████▊| 9849/10000 [38:37:04<34:52, 13.86s/it] {'loss': 0.0053, 'learning_rate': 8.200000000000001e-07, 'epoch': 12.89} 98%|█████████▊| 9849/10000 [38:37:04<34:52, 13.86s/it] 98%|█████████▊| 9850/10000 [38:37:18<34:36, 13.84s/it] {'loss': 0.0032, 'learning_rate': 8.149999999999999e-07, 'epoch': 12.89} 98%|█████████▊| 9850/10000 [38:37:18<34:36, 13.84s/it] 99%|█████████▊| 9851/10000 [38:37:31<34:24, 13.86s/it] {'loss': 0.0025, 'learning_rate': 8.1e-07, 'epoch': 12.89} 99%|█████████▊| 9851/10000 [38:37:31<34:24, 13.86s/it] 99%|█████████▊| 9852/10000 [38:37:45<34:14, 13.88s/it] {'loss': 0.0013, 'learning_rate': 8.05e-07, 'epoch': 12.9} 99%|█████████▊| 9852/10000 [38:37:45<34:14, 13.88s/it] 99%|█████████▊| 9853/10000 [38:37:59<34:00, 13.88s/it] {'loss': 0.0021, 'learning_rate': 8.000000000000001e-07, 'epoch': 12.9} 99%|█████████▊| 9853/10000 [38:37:59<34:00, 13.88s/it] 99%|█████████▊| 9854/10000 [38:38:13<33:50, 13.91s/it] {'loss': 0.0021, 'learning_rate': 7.950000000000001e-07, 'epoch': 12.9} 99%|█████████▊| 9854/10000 [38:38:13<33:50, 13.91s/it] 99%|█████████▊| 9855/10000 [38:38:27<33:31, 13.87s/it] {'loss': 0.0025, 'learning_rate': 7.900000000000002e-07, 'epoch': 12.9} 99%|█████████▊| 9855/10000 [38:38:27<33:31, 13.87s/it] 99%|█████████▊| 9856/10000 [38:38:41<33:17, 13.87s/it] {'loss': 0.002, 'learning_rate': 7.85e-07, 'epoch': 12.9} 99%|█████████▊| 
9856/10000 [38:38:41<33:17, 13.87s/it] 99%|█████████▊| 9857/10000 [38:38:55<33:03, 13.87s/it] {'loss': 0.0023, 'learning_rate': 7.8e-07, 'epoch': 12.9} 99%|█████████▊| 9857/10000 [38:38:55<33:03, 13.87s/it] 99%|█████████▊| 9858/10000 [38:39:09<32:46, 13.85s/it] {'loss': 0.0014, 'learning_rate': 7.75e-07, 'epoch': 12.9} 99%|█████████▊| 9858/10000 [38:39:09<32:46, 13.85s/it] 99%|█████████▊| 9859/10000 [38:39:22<32:33, 13.86s/it] {'loss': 0.0017, 'learning_rate': 7.7e-07, 'epoch': 12.9} 99%|█████████▊| 9859/10000 [38:39:22<32:33, 13.86s/it] 99%|█████████▊| 9860/10000 [38:39:36<32:24, 13.89s/it] {'loss': 0.0021, 'learning_rate': 7.65e-07, 'epoch': 12.91} 99%|█████████▊| 9860/10000 [38:39:36<32:24, 13.89s/it] 99%|█████████▊| 9861/10000 [38:39:50<32:16, 13.93s/it] {'loss': 0.002, 'learning_rate': 7.6e-07, 'epoch': 12.91} 99%|█████████▊| 9861/10000 [38:39:50<32:16, 13.93s/it] 99%|█████████▊| 9862/10000 [38:40:04<32:01, 13.92s/it] {'loss': 0.0017, 'learning_rate': 7.550000000000001e-07, 'epoch': 12.91} 99%|█████████▊| 9862/10000 [38:40:04<32:01, 13.92s/it] 99%|█████████▊| 9863/10000 [38:40:18<31:45, 13.91s/it] {'loss': 0.0014, 'learning_rate': 7.5e-07, 'epoch': 12.91} 99%|█████████▊| 9863/10000 [38:40:18<31:45, 13.91s/it] 99%|█████████▊| 9864/10000 [38:40:32<31:26, 13.87s/it] {'loss': 0.002, 'learning_rate': 7.450000000000001e-07, 'epoch': 12.91} 99%|█████████▊| 9864/10000 [38:40:32<31:26, 13.87s/it] 99%|█████████▊| 9865/10000 [38:40:46<31:16, 13.90s/it] {'loss': 0.0036, 'learning_rate': 7.400000000000001e-07, 'epoch': 12.91} 99%|█████████▊| 9865/10000 [38:40:46<31:16, 13.90s/it] 99%|█████████▊| 9866/10000 [38:41:00<31:03, 13.91s/it] {'loss': 0.0014, 'learning_rate': 7.350000000000001e-07, 'epoch': 12.91} 99%|█████████▊| 9866/10000 [38:41:00<31:03, 13.91s/it] 99%|█████████▊| 9867/10000 [38:41:14<30:50, 13.91s/it] {'loss': 0.0024, 'learning_rate': 7.3e-07, 'epoch': 12.91} 99%|█████████▊| 9867/10000 [38:41:14<30:50, 13.91s/it] 99%|█████████▊| 9868/10000 [38:41:28<30:33, 
13.89s/it] {'loss': 0.0019, 'learning_rate': 7.25e-07, 'epoch': 12.92} 99%|█████████▊| 9868/10000 [38:41:28<30:33, 13.89s/it] 99%|█████████▊| 9869/10000 [38:41:42<30:20, 13.90s/it] {'loss': 0.002, 'learning_rate': 7.2e-07, 'epoch': 12.92} 99%|█████████▊| 9869/10000 [38:41:42<30:20, 13.90s/it] 99%|█████████▊| 9870/10000 [38:41:55<30:04, 13.88s/it] {'loss': 0.0025, 'learning_rate': 7.15e-07, 'epoch': 12.92} 99%|█████████▊| 9870/10000 [38:41:55<30:04, 13.88s/it] 99%|█████████▊| 9871/10000 [38:42:09<29:44, 13.84s/it] {'loss': 0.0019, 'learning_rate': 7.100000000000001e-07, 'epoch': 12.92} 99%|█████████▊| 9871/10000 [38:42:09<29:44, 13.84s/it] 99%|█████████▊| 9872/10000 [38:42:23<29:31, 13.84s/it] {'loss': 0.0021, 'learning_rate': 7.05e-07, 'epoch': 12.92} 99%|█████████▊| 9872/10000 [38:42:23<29:31, 13.84s/it] 99%|█████████▊| 9873/10000 [38:42:37<29:25, 13.90s/it] {'loss': 0.0019, 'learning_rate': 7.000000000000001e-07, 'epoch': 12.92} 99%|█████████▊| 9873/10000 [38:42:37<29:25, 13.90s/it] 99%|█████████▊| 9874/10000 [38:42:51<29:08, 13.88s/it] {'loss': 0.0017, 'learning_rate': 6.95e-07, 'epoch': 12.92} 99%|█████████▊| 9874/10000 [38:42:51<29:08, 13.88s/it] 99%|█████████▉| 9875/10000 [38:43:05<28:52, 13.86s/it] {'loss': 0.0012, 'learning_rate': 6.900000000000001e-07, 'epoch': 12.93} 99%|█████████▉| 9875/10000 [38:43:05<28:52, 13.86s/it] 99%|█████████▉| 9876/10000 [38:43:18<28:35, 13.83s/it] {'loss': 0.0014, 'learning_rate': 6.85e-07, 'epoch': 12.93} 99%|█████████▉| 9876/10000 [38:43:18<28:35, 13.83s/it] 99%|█████████▉| 9877/10000 [38:43:32<28:29, 13.90s/it] {'loss': 0.0035, 'learning_rate': 6.8e-07, 'epoch': 12.93} 99%|█████████▉| 9877/10000 [38:43:33<28:29, 13.90s/it] 99%|█████████▉| 9878/10000 [38:43:46<28:17, 13.91s/it] {'loss': 0.0026, 'learning_rate': 6.75e-07, 'epoch': 12.93} 99%|█████████▉| 9878/10000 [38:43:46<28:17, 13.91s/it] 99%|█████████▉| 9879/10000 [38:44:00<28:01, 13.90s/it] {'loss': 0.0026, 'learning_rate': 6.7e-07, 'epoch': 12.93} 99%|█████████▉| 
9879/10000 [38:44:00<28:01, 13.90s/it] 99%|█████████▉| 9880/10000 [38:44:14<27:46, 13.89s/it] {'loss': 0.002, 'learning_rate': 6.65e-07, 'epoch': 12.93} 99%|█████████▉| 9880/10000 [38:44:14<27:46, 13.89s/it] 99%|█████████▉| 9881/10000 [38:44:28<27:34, 13.90s/it] {'loss': 0.0021, 'learning_rate': 6.6e-07, 'epoch': 12.93} 99%|█████████▉| 9881/10000 [38:44:28<27:34, 13.90s/it] 99%|█████████▉| 9882/10000 [38:44:42<27:21, 13.91s/it] {'loss': 0.0029, 'learning_rate': 6.550000000000001e-07, 'epoch': 12.93} 99%|█████████▉| 9882/10000 [38:44:42<27:21, 13.91s/it] 99%|█████████▉| 9883/10000 [38:44:56<27:06, 13.91s/it] {'loss': 0.0014, 'learning_rate': 6.5e-07, 'epoch': 12.94} 99%|█████████▉| 9883/10000 [38:44:56<27:06, 13.91s/it] 99%|█████████▉| 9884/10000 [38:45:10<26:54, 13.92s/it] {'loss': 0.0023, 'learning_rate': 6.450000000000001e-07, 'epoch': 12.94} 99%|█████████▉| 9884/10000 [38:45:10<26:54, 13.92s/it] 99%|█████████▉| 9885/10000 [38:45:24<26:41, 13.93s/it] {'loss': 0.0016, 'learning_rate': 6.4e-07, 'epoch': 12.94} 99%|█████████▉| 9885/10000 [38:45:24<26:41, 13.93s/it] 99%|█████████▉| 9886/10000 [38:45:38<26:24, 13.90s/it] {'loss': 0.0013, 'learning_rate': 6.35e-07, 'epoch': 12.94} 99%|█████████▉| 9886/10000 [38:45:38<26:24, 13.90s/it] 99%|█████████▉| 9887/10000 [38:45:52<26:10, 13.90s/it] {'loss': 0.002, 'learning_rate': 6.3e-07, 'epoch': 12.94} 99%|█████████▉| 9887/10000 [38:45:52<26:10, 13.90s/it] 99%|█████████▉| 9888/10000 [38:46:05<25:54, 13.88s/it] {'loss': 0.0027, 'learning_rate': 6.25e-07, 'epoch': 12.94} 99%|█████████▉| 9888/10000 [38:46:05<25:54, 13.88s/it] 99%|█████████▉| 9889/10000 [38:46:19<25:40, 13.88s/it] {'loss': 0.0019, 'learning_rate': 6.2e-07, 'epoch': 12.94} 99%|█████████▉| 9889/10000 [38:46:19<25:40, 13.88s/it] 99%|█████████▉| 9890/10000 [38:46:33<25:26, 13.87s/it] {'loss': 0.0027, 'learning_rate': 6.15e-07, 'epoch': 12.95} 99%|█████████▉| 9890/10000 [38:46:33<25:26, 13.87s/it] 99%|█████████▉| 9891/10000 [38:46:47<25:12, 13.88s/it] {'loss': 0.0021, 
'learning_rate': 6.100000000000001e-07, 'epoch': 12.95} 99%|█████████▉| 9891/10000 [38:46:47<25:12, 13.88s/it] 99%|█████████▉| 9892/10000 [38:47:01<25:02, 13.91s/it] {'loss': 0.0017, 'learning_rate': 6.05e-07, 'epoch': 12.95} 99%|█████████▉| 9892/10000 [38:47:01<25:02, 13.91s/it] 99%|█████████▉| 9893/10000 [38:47:15<24:45, 13.88s/it] {'loss': 0.0018, 'learning_rate': 6.000000000000001e-07, 'epoch': 12.95} 99%|█████████▉| 9893/10000 [38:47:15<24:45, 13.88s/it] 99%|█████████▉| 9894/10000 [38:47:29<24:34, 13.91s/it] {'loss': 0.0023, 'learning_rate': 5.95e-07, 'epoch': 12.95} 99%|█████████▉| 9894/10000 [38:47:29<24:34, 13.91s/it] 99%|█████████▉| 9895/10000 [38:47:43<24:18, 13.89s/it] {'loss': 0.002, 'learning_rate': 5.9e-07, 'epoch': 12.95} 99%|█████████▉| 9895/10000 [38:47:43<24:18, 13.89s/it] 99%|█████████▉| 9896/10000 [38:47:57<24:08, 13.92s/it] {'loss': 0.0017, 'learning_rate': 5.85e-07, 'epoch': 12.95} 99%|█████████▉| 9896/10000 [38:47:57<24:08, 13.92s/it] 99%|█████████▉| 9897/10000 [38:48:11<23:53, 13.91s/it] {'loss': 0.0023, 'learning_rate': 5.8e-07, 'epoch': 12.95} 99%|█████████▉| 9897/10000 [38:48:11<23:53, 13.91s/it] 99%|█████████▉| 9898/10000 [38:48:24<23:39, 13.91s/it] {'loss': 0.0026, 'learning_rate': 5.75e-07, 'epoch': 12.96} 99%|█████████▉| 9898/10000 [38:48:24<23:39, 13.91s/it] 99%|█████████▉| 9899/10000 [38:48:38<23:20, 13.87s/it] {'loss': 0.0024, 'learning_rate': 5.7e-07, 'epoch': 12.96} 99%|█████████▉| 9899/10000 [38:48:38<23:20, 13.87s/it] 99%|█████████▉| 9900/10000 [38:48:52<23:04, 13.84s/it] {'loss': 0.0021, 'learning_rate': 5.65e-07, 'epoch': 12.96} 99%|█████████▉| 9900/10000 [38:48:52<23:04, 13.84s/it] 99%|█████████▉| 9901/10000 [38:49:06<22:51, 13.85s/it] {'loss': 0.0019, 'learning_rate': 5.6e-07, 'epoch': 12.96} 99%|█████████▉| 9901/10000 [38:49:06<22:51, 13.85s/it] 99%|█████████▉| 9902/10000 [38:49:20<22:33, 13.81s/it] {'loss': 0.003, 'learning_rate': 5.550000000000001e-07, 'epoch': 12.96} 99%|█████████▉| 9902/10000 [38:49:20<22:33, 
13.81s/it] 99%|█████████▉| 9903/10000 [38:49:33<22:22, 13.84s/it] {'loss': 0.0023, 'learning_rate': 5.5e-07, 'epoch': 12.96} 99%|█████████▉| 9903/10000 [38:49:34<22:22, 13.84s/it] 99%|█████████▉| 9904/10000 [38:49:47<22:09, 13.85s/it] {'loss': 0.0027, 'learning_rate': 5.450000000000001e-07, 'epoch': 12.96} 99%|█████████▉| 9904/10000 [38:49:47<22:09, 13.85s/it] 99%|█████████▉| 9905/10000 [38:50:01<21:57, 13.87s/it] {'loss': 0.0031, 'learning_rate': 5.4e-07, 'epoch': 12.96} 99%|█████████▉| 9905/10000 [38:50:01<21:57, 13.87s/it] 99%|█████████▉| 9906/10000 [38:50:15<21:41, 13.85s/it] {'loss': 0.0026, 'learning_rate': 5.35e-07, 'epoch': 12.97} 99%|█████████▉| 9906/10000 [38:50:15<21:41, 13.85s/it] 99%|█████████▉| 9907/10000 [38:50:29<21:29, 13.86s/it] {'loss': 0.0012, 'learning_rate': 5.3e-07, 'epoch': 12.97} 99%|█████████▉| 9907/10000 [38:50:29<21:29, 13.86s/it] 99%|█████████▉| 9908/10000 [38:50:43<21:15, 13.86s/it] {'loss': 0.0023, 'learning_rate': 5.250000000000001e-07, 'epoch': 12.97} 99%|█████████▉| 9908/10000 [38:50:43<21:15, 13.86s/it] 99%|█████████▉| 9909/10000 [38:50:57<21:02, 13.88s/it] {'loss': 0.0018, 'learning_rate': 5.2e-07, 'epoch': 12.97} 99%|█████████▉| 9909/10000 [38:50:57<21:02, 13.88s/it] 99%|█████████▉| 9910/10000 [38:51:11<20:46, 13.85s/it] {'loss': 0.0021, 'learning_rate': 5.15e-07, 'epoch': 12.97} 99%|█████████▉| 9910/10000 [38:51:11<20:46, 13.85s/it] 99%|█████████▉| 9911/10000 [38:51:24<20:35, 13.88s/it] {'loss': 0.0013, 'learning_rate': 5.100000000000001e-07, 'epoch': 12.97} 99%|█████████▉| 9911/10000 [38:51:25<20:35, 13.88s/it] 99%|█████████▉| 9912/10000 [38:51:38<20:19, 13.85s/it] {'loss': 0.002, 'learning_rate': 5.05e-07, 'epoch': 12.97} 99%|█████████▉| 9912/10000 [38:51:38<20:19, 13.85s/it] 99%|█████████▉| 9913/10000 [38:51:52<20:05, 13.86s/it] {'loss': 0.0024, 'learning_rate': 5.000000000000001e-07, 'epoch': 12.98} 99%|█████████▉| 9913/10000 [38:51:52<20:05, 13.86s/it] 99%|█████████▉| 9914/10000 [38:52:06<19:49, 13.84s/it] {'loss': 0.0034, 
'learning_rate': 4.95e-07, 'epoch': 12.98} 99%|█████████▉| 9914/10000 [38:52:06<19:49, 13.84s/it] 99%|█████████▉| 9915/10000 [38:52:20<19:36, 13.84s/it] {'loss': 0.0022, 'learning_rate': 4.9e-07, 'epoch': 12.98} 99%|█████████▉| 9915/10000 [38:52:20<19:36, 13.84s/it] 99%|█████████▉| 9916/10000 [38:52:34<19:23, 13.86s/it] {'loss': 0.0036, 'learning_rate': 4.85e-07, 'epoch': 12.98} 99%|█████████▉| 9916/10000 [38:52:34<19:23, 13.86s/it] 99%|█████████▉| 9917/10000 [38:52:48<19:11, 13.87s/it] {'loss': 0.0016, 'learning_rate': 4.8e-07, 'epoch': 12.98} 99%|█████████▉| 9917/10000 [38:52:48<19:11, 13.87s/it] 99%|█████████▉| 9918/10000 [38:53:01<18:57, 13.87s/it] {'loss': 0.002, 'learning_rate': 4.75e-07, 'epoch': 12.98} 99%|█████████▉| 9918/10000 [38:53:01<18:57, 13.87s/it] 99%|█████████▉| 9919/10000 [38:53:15<18:42, 13.86s/it] {'loss': 0.002, 'learning_rate': 4.7000000000000005e-07, 'epoch': 12.98} 99%|█████████▉| 9919/10000 [38:53:15<18:42, 13.86s/it] 99%|█████████▉| 9920/10000 [38:53:29<18:27, 13.84s/it] {'loss': 0.0025, 'learning_rate': 4.65e-07, 'epoch': 12.98} 99%|█████████▉| 9920/10000 [38:53:29<18:27, 13.84s/it] 99%|█████████▉| 9921/10000 [38:53:43<18:14, 13.85s/it] {'loss': 0.0024, 'learning_rate': 4.6e-07, 'epoch': 12.99} 99%|█████████▉| 9921/10000 [38:53:43<18:14, 13.85s/it] 99%|█████████▉| 9922/10000 [38:53:57<17:59, 13.84s/it] {'loss': 0.0029, 'learning_rate': 4.5500000000000004e-07, 'epoch': 12.99} 99%|█████████▉| 9922/10000 [38:53:57<17:59, 13.84s/it] 99%|█████████▉| 9923/10000 [38:54:11<17:46, 13.85s/it] {'loss': 0.0017, 'learning_rate': 4.5e-07, 'epoch': 12.99} 99%|█████████▉| 9923/10000 [38:54:11<17:46, 13.85s/it] 99%|█████████▉| 9924/10000 [38:54:24<17:30, 13.82s/it] {'loss': 0.0037, 'learning_rate': 4.4500000000000003e-07, 'epoch': 12.99} 99%|█████████▉| 9924/10000 [38:54:24<17:30, 13.82s/it] 99%|█████████▉| 9925/10000 [38:54:38<17:17, 13.83s/it] {'loss': 0.0019, 'learning_rate': 4.4e-07, 'epoch': 12.99} 99%|█████████▉| 9925/10000 [38:54:38<17:17, 
13.83s/it] 99%|█████████▉| 9926/10000 [38:54:52<17:06, 13.87s/it] {'loss': 0.0016, 'learning_rate': 4.3499999999999996e-07, 'epoch': 12.99} 99%|█████████▉| 9926/10000 [38:54:52<17:06, 13.87s/it] 99%|█████████▉| 9927/10000 [38:55:06<16:49, 13.83s/it] {'loss': 0.0024, 'learning_rate': 4.3e-07, 'epoch': 12.99} 99%|█████████▉| 9927/10000 [38:55:06<16:49, 13.83s/it] 99%|█████████▉| 9928/10000 [38:55:20<16:34, 13.81s/it] {'loss': 0.0013, 'learning_rate': 4.2500000000000006e-07, 'epoch': 12.99} 99%|█████████▉| 9928/10000 [38:55:20<16:34, 13.81s/it] 99%|█████████▉| 9929/10000 [38:55:34<16:20, 13.82s/it] {'loss': 0.0019, 'learning_rate': 4.2e-07, 'epoch': 13.0} 99%|█████████▉| 9929/10000 [38:55:34<16:20, 13.82s/it] 99%|█████████▉| 9930/10000 [38:55:47<16:08, 13.84s/it] {'loss': 0.0044, 'learning_rate': 4.1500000000000005e-07, 'epoch': 13.0} 99%|█████████▉| 9930/10000 [38:55:47<16:08, 13.84s/it] 99%|█████████▉| 9931/10000 [38:56:01<15:55, 13.85s/it] {'loss': 0.0021, 'learning_rate': 4.1000000000000004e-07, 'epoch': 13.0} 99%|█████████▉| 9931/10000 [38:56:01<15:55, 13.85s/it] 99%|█████████▉| 9932/10000 [38:56:14<15:14, 13.44s/it] {'loss': 0.0016, 'learning_rate': 4.05e-07, 'epoch': 13.0} 99%|█████████▉| 9932/10000 [38:56:14<15:14, 13.44s/it] 99%|█████████▉| 9933/10000 [38:56:28<15:09, 13.57s/it] {'loss': 0.0014, 'learning_rate': 4.0000000000000003e-07, 'epoch': 13.0} 99%|█████████▉| 9933/10000 [38:56:28<15:09, 13.57s/it] 99%|█████████▉| 9934/10000 [38:56:42<15:01, 13.66s/it] {'loss': 0.0014, 'learning_rate': 3.950000000000001e-07, 'epoch': 13.0} 99%|█████████▉| 9934/10000 [38:56:42<15:01, 13.66s/it] 99%|█████████▉| 9935/10000 [38:56:55<14:50, 13.70s/it] {'loss': 0.0022, 'learning_rate': 3.9e-07, 'epoch': 13.0} 99%|█████████▉| 9935/10000 [38:56:55<14:50, 13.70s/it] 99%|█████████▉| 9936/10000 [38:57:09<14:39, 13.75s/it] {'loss': 0.0017, 'learning_rate': 3.85e-07, 'epoch': 13.01} 99%|█████████▉| 9936/10000 [38:57:09<14:39, 13.75s/it] 99%|█████████▉| 9937/10000 [38:57:23<14:30, 
13.82s/it] {'loss': 0.0011, 'learning_rate': 3.8e-07, 'epoch': 13.01} 99%|█████████▉| 9937/10000 [38:57:23<14:30, 13.82s/it] 99%|█████████▉| 9938/10000 [38:57:37<14:16, 13.82s/it] {'loss': 0.0016, 'learning_rate': 3.75e-07, 'epoch': 13.01} 99%|█████████▉| 9938/10000 [38:57:37<14:16, 13.82s/it] 99%|█████████▉| 9939/10000 [38:57:51<14:04, 13.84s/it] {'loss': 0.0016, 'learning_rate': 3.7000000000000006e-07, 'epoch': 13.01} 99%|█████████▉| 9939/10000 [38:57:51<14:04, 13.84s/it] 99%|█████████▉| 9940/10000 [38:58:05<13:51, 13.85s/it] {'loss': 0.0016, 'learning_rate': 3.65e-07, 'epoch': 13.01} 99%|█████████▉| 9940/10000 [38:58:05<13:51, 13.85s/it] 99%|█████████▉| 9941/10000 [38:58:19<13:36, 13.84s/it] {'loss': 0.0021, 'learning_rate': 3.6e-07, 'epoch': 13.01} 99%|█████████▉| 9941/10000 [38:58:19<13:36, 13.84s/it] 99%|█████████▉| 9942/10000 [38:58:33<13:24, 13.87s/it] {'loss': 0.0034, 'learning_rate': 3.5500000000000004e-07, 'epoch': 13.01} 99%|█████████▉| 9942/10000 [38:58:33<13:24, 13.87s/it] 99%|█████████▉| 9943/10000 [38:58:46<13:10, 13.87s/it] {'loss': 0.0014, 'learning_rate': 3.5000000000000004e-07, 'epoch': 13.01} 99%|█████████▉| 9943/10000 [38:58:46<13:10, 13.87s/it] 99%|█████████▉| 9944/10000 [38:59:00<12:56, 13.87s/it] {'loss': 0.0021, 'learning_rate': 3.4500000000000003e-07, 'epoch': 13.02} 99%|█████████▉| 9944/10000 [38:59:00<12:56, 13.87s/it] 99%|█████████▉| 9945/10000 [38:59:14<12:41, 13.85s/it] {'loss': 0.0022, 'learning_rate': 3.4e-07, 'epoch': 13.02} 99%|█████████▉| 9945/10000 [38:59:14<12:41, 13.85s/it] 99%|█████████▉| 9946/10000 [38:59:28<12:27, 13.85s/it] {'loss': 0.0012, 'learning_rate': 3.35e-07, 'epoch': 13.02} 99%|█████████▉| 9946/10000 [38:59:28<12:27, 13.85s/it] 99%|█████████▉| 9947/10000 [38:59:42<12:12, 13.83s/it] {'loss': 0.002, 'learning_rate': 3.3e-07, 'epoch': 13.02} 99%|█████████▉| 9947/10000 [38:59:42<12:12, 13.83s/it] 99%|█████████▉| 9948/10000 [38:59:56<11:59, 13.84s/it] {'loss': 0.0025, 'learning_rate': 3.25e-07, 'epoch': 13.02} 
99%|█████████▉| 9948/10000 [38:59:56<11:59, 13.84s/it] 99%|█████████▉| 9949/10000 [39:00:09<11:46, 13.85s/it] {'loss': 0.0016, 'learning_rate': 3.2e-07, 'epoch': 13.02} 99%|█████████▉| 9949/10000 [39:00:09<11:46, 13.85s/it] 100%|█████████▉| 9950/10000 [39:00:23<11:33, 13.86s/it] {'loss': 0.0015, 'learning_rate': 3.15e-07, 'epoch': 13.02} 100%|█████████▉| 9950/10000 [39:00:23<11:33, 13.86s/it] 100%|█████████▉| 9951/10000 [39:00:37<11:17, 13.83s/it] {'loss': 0.0014, 'learning_rate': 3.1e-07, 'epoch': 13.02} 100%|█████████▉| 9951/10000 [39:00:37<11:17, 13.83s/it] 100%|█████████▉| 9952/10000 [39:00:51<11:04, 13.84s/it] {'loss': 0.0018, 'learning_rate': 3.0500000000000004e-07, 'epoch': 13.03} 100%|█████████▉| 9952/10000 [39:00:51<11:04, 13.84s/it] 100%|█████████▉| 9953/10000 [39:01:05<10:50, 13.84s/it] {'loss': 0.0022, 'learning_rate': 3.0000000000000004e-07, 'epoch': 13.03} 100%|█████████▉| 9953/10000 [39:01:05<10:50, 13.84s/it] 100%|█████████▉| 9954/10000 [39:01:19<10:37, 13.85s/it] {'loss': 0.0024, 'learning_rate': 2.95e-07, 'epoch': 13.03} 100%|█████████▉| 9954/10000 [39:01:19<10:37, 13.85s/it] 100%|█████████▉| 9955/10000 [39:01:33<10:24, 13.87s/it] {'loss': 0.0017, 'learning_rate': 2.9e-07, 'epoch': 13.03} 100%|█████████▉| 9955/10000 [39:01:33<10:24, 13.87s/it] 100%|█████████▉| 9956/10000 [39:01:46<10:10, 13.88s/it] {'loss': 0.003, 'learning_rate': 2.85e-07, 'epoch': 13.03} 100%|█████████▉| 9956/10000 [39:01:47<10:10, 13.88s/it] 100%|█████████▉| 9957/10000 [39:02:00<09:56, 13.88s/it] {'loss': 0.0026, 'learning_rate': 2.8e-07, 'epoch': 13.03} 100%|█████████▉| 9957/10000 [39:02:00<09:56, 13.88s/it] 100%|█████████▉| 9958/10000 [39:02:14<09:43, 13.90s/it] {'loss': 0.0019, 'learning_rate': 2.75e-07, 'epoch': 13.03} 100%|█████████▉| 9958/10000 [39:02:14<09:43, 13.90s/it] 100%|█████████▉| 9959/10000 [39:02:28<09:29, 13.88s/it] {'loss': 0.0021, 'learning_rate': 2.7e-07, 'epoch': 13.04} 100%|█████████▉| 9959/10000 [39:02:28<09:29, 13.88s/it] 100%|█████████▉| 9960/10000 
[39:02:42<09:14, 13.86s/it] {'loss': 0.0039, 'learning_rate': 2.65e-07, 'epoch': 13.04} 100%|█████████▉| 9960/10000 [39:02:42<09:14, 13.86s/it] 100%|█████████▉| 9961/10000 [39:02:56<08:59, 13.84s/it] {'loss': 0.0017, 'learning_rate': 2.6e-07, 'epoch': 13.04} 100%|█████████▉| 9961/10000 [39:02:56<08:59, 13.84s/it] 100%|█████████▉| 9962/10000 [39:03:10<08:46, 13.85s/it] {'loss': 0.0019, 'learning_rate': 2.5500000000000005e-07, 'epoch': 13.04} 100%|█████████▉| 9962/10000 [39:03:10<08:46, 13.85s/it] 100%|█████████▉| 9963/10000 [39:03:23<08:32, 13.86s/it] {'loss': 0.0011, 'learning_rate': 2.5000000000000004e-07, 'epoch': 13.04} 100%|█████████▉| 9963/10000 [39:03:24<08:32, 13.86s/it] 100%|█████████▉| 9964/10000 [39:03:37<08:18, 13.86s/it] {'loss': 0.0023, 'learning_rate': 2.45e-07, 'epoch': 13.04} 100%|█████████▉| 9964/10000 [39:03:37<08:18, 13.86s/it] 100%|█████████▉| 9965/10000 [39:03:51<08:04, 13.85s/it] {'loss': 0.0027, 'learning_rate': 2.4e-07, 'epoch': 13.04} 100%|█████████▉| 9965/10000 [39:03:51<08:04, 13.85s/it] 100%|█████████▉| 9966/10000 [39:04:05<07:51, 13.88s/it] {'loss': 0.0014, 'learning_rate': 2.3500000000000003e-07, 'epoch': 13.04} 100%|█████████▉| 9966/10000 [39:04:05<07:51, 13.88s/it] 100%|█████████▉| 9967/10000 [39:04:19<07:37, 13.85s/it] {'loss': 0.0024, 'learning_rate': 2.3e-07, 'epoch': 13.05} 100%|█████████▉| 9967/10000 [39:04:19<07:37, 13.85s/it] 100%|█████████▉| 9968/10000 [39:04:33<07:23, 13.86s/it] {'loss': 0.0011, 'learning_rate': 2.25e-07, 'epoch': 13.05} 100%|█████████▉| 9968/10000 [39:04:33<07:23, 13.86s/it] 100%|█████████▉| 9969/10000 [39:04:47<07:11, 13.90s/it] {'loss': 0.0018, 'learning_rate': 2.2e-07, 'epoch': 13.05} 100%|█████████▉| 9969/10000 [39:04:47<07:11, 13.90s/it] 100%|█████████▉| 9970/10000 [39:05:01<06:58, 13.95s/it] {'loss': 0.0019, 'learning_rate': 2.15e-07, 'epoch': 13.05} 100%|█████████▉| 9970/10000 [39:05:01<06:58, 13.95s/it] 100%|█████████▉| 9971/10000 [39:05:15<06:44, 13.93s/it] {'loss': 0.0029, 'learning_rate': 
2.1e-07, 'epoch': 13.05} 100%|█████████▉| 9971/10000 [39:05:15<06:44, 13.93s/it] 100%|█████████▉| 9972/10000 [39:05:29<06:29, 13.92s/it] {'loss': 0.0014, 'learning_rate': 2.0500000000000002e-07, 'epoch': 13.05} 100%|█████████▉| 9972/10000 [39:05:29<06:29, 13.92s/it] 100%|█████████▉| 9973/10000 [39:05:42<06:15, 13.89s/it] {'loss': 0.0014, 'learning_rate': 2.0000000000000002e-07, 'epoch': 13.05} 100%|█████████▉| 9973/10000 [39:05:43<06:15, 13.89s/it] 100%|█████████▉| 9974/10000 [39:05:56<06:01, 13.90s/it] {'loss': 0.0025, 'learning_rate': 1.95e-07, 'epoch': 13.05} 100%|█████████▉| 9974/10000 [39:05:56<06:01, 13.90s/it] 100%|█████████▉| 9975/10000 [39:06:10<05:47, 13.89s/it] {'loss': 0.0013, 'learning_rate': 1.9e-07, 'epoch': 13.06} 100%|█████████▉| 9975/10000 [39:06:10<05:47, 13.89s/it] 100%|█████████▉| 9976/10000 [39:06:24<05:32, 13.87s/it] {'loss': 0.0022, 'learning_rate': 1.8500000000000003e-07, 'epoch': 13.06} 100%|█████████▉| 9976/10000 [39:06:24<05:32, 13.87s/it] 100%|█████████▉| 9977/10000 [39:06:38<05:19, 13.87s/it] {'loss': 0.0017, 'learning_rate': 1.8e-07, 'epoch': 13.06} 100%|█████████▉| 9977/10000 [39:06:38<05:19, 13.87s/it] 100%|█████████▉| 9978/10000 [39:06:52<05:05, 13.87s/it] {'loss': 0.0026, 'learning_rate': 1.7500000000000002e-07, 'epoch': 13.06} 100%|█████████▉| 9978/10000 [39:06:52<05:05, 13.87s/it] 100%|█████████▉| 9979/10000 [39:07:06<04:50, 13.84s/it] {'loss': 0.0016, 'learning_rate': 1.7e-07, 'epoch': 13.06} 100%|█████████▉| 9979/10000 [39:07:06<04:50, 13.84s/it] 100%|█████████▉| 9980/10000 [39:07:19<04:37, 13.86s/it] {'loss': 0.0019, 'learning_rate': 1.65e-07, 'epoch': 13.06} 100%|█████████▉| 9980/10000 [39:07:20<04:37, 13.86s/it] 100%|█████████▉| 9981/10000 [39:07:33<04:22, 13.83s/it] {'loss': 0.0021, 'learning_rate': 1.6e-07, 'epoch': 13.06} 100%|█████████▉| 9981/10000 [39:07:33<04:22, 13.83s/it] 100%|█████████▉| 9982/10000 [39:07:47<04:09, 13.85s/it] {'loss': 0.003, 'learning_rate': 1.55e-07, 'epoch': 13.07} 100%|█████████▉| 9982/10000 
[39:07:47<04:09, 13.85s/it] 100%|█████████▉| 9983/10000 [39:08:01<03:56, 13.89s/it] {'loss': 0.0017, 'learning_rate': 1.5000000000000002e-07, 'epoch': 13.07} 100%|█████████▉| 9983/10000 [39:08:01<03:56, 13.89s/it] 100%|█████████▉| 9984/10000 [39:08:15<03:41, 13.87s/it] {'loss': 0.0018, 'learning_rate': 1.45e-07, 'epoch': 13.07} 100%|█████████▉| 9984/10000 [39:08:15<03:41, 13.87s/it] 100%|█████████▉| 9985/10000 [39:08:29<03:27, 13.84s/it] {'loss': 0.0018, 'learning_rate': 1.4e-07, 'epoch': 13.07} 100%|█████████▉| 9985/10000 [39:08:29<03:27, 13.84s/it] 100%|█████████▉| 9986/10000 [39:08:43<03:13, 13.85s/it] {'loss': 0.0012, 'learning_rate': 1.35e-07, 'epoch': 13.07} 100%|█████████▉| 9986/10000 [39:08:43<03:13, 13.85s/it] 100%|█████████▉| 9987/10000 [39:08:56<02:59, 13.82s/it] {'loss': 0.0029, 'learning_rate': 1.3e-07, 'epoch': 13.07} 100%|█████████▉| 9987/10000 [39:08:56<02:59, 13.82s/it] 100%|█████████▉| 9988/10000 [39:09:10<02:46, 13.87s/it] {'loss': 0.0019, 'learning_rate': 1.2500000000000002e-07, 'epoch': 13.07} 100%|█████████▉| 9988/10000 [39:09:10<02:46, 13.87s/it] 100%|█████████▉| 9989/10000 [39:09:24<02:32, 13.86s/it] {'loss': 0.0021, 'learning_rate': 1.2e-07, 'epoch': 13.07} 100%|█████████▉| 9989/10000 [39:09:24<02:32, 13.86s/it] 100%|█████████▉| 9990/10000 [39:09:38<02:18, 13.87s/it] {'loss': 0.002, 'learning_rate': 1.15e-07, 'epoch': 13.08} 100%|█████████▉| 9990/10000 [39:09:38<02:18, 13.87s/it] 100%|█████████▉| 9991/10000 [39:09:52<02:04, 13.85s/it] {'loss': 0.0013, 'learning_rate': 1.1e-07, 'epoch': 13.08} 100%|█████████▉| 9991/10000 [39:09:52<02:04, 13.85s/it] 100%|█████████▉| 9992/10000 [39:10:06<01:50, 13.85s/it] {'loss': 0.0006, 'learning_rate': 1.05e-07, 'epoch': 13.08} 100%|█████████▉| 9992/10000 [39:10:06<01:50, 13.85s/it] 100%|█████████▉| 9993/10000 [39:10:20<01:37, 13.87s/it] {'loss': 0.0023, 'learning_rate': 1.0000000000000001e-07, 'epoch': 13.08} 100%|█████████▉| 9993/10000 [39:10:20<01:37, 13.87s/it] 100%|█████████▉| 9994/10000 
[39:10:33<01:23, 13.85s/it] {'loss': 0.0017, 'learning_rate': 9.5e-08, 'epoch': 13.08} 100%|█████████▉| 9994/10000 [39:10:33<01:23, 13.85s/it] 100%|█████████▉| 9995/10000 [39:10:47<01:09, 13.82s/it] {'loss': 0.0024, 'learning_rate': 9e-08, 'epoch': 13.08} 100%|█████████▉| 9995/10000 [39:10:47<01:09, 13.82s/it] 100%|█████████▉| 9996/10000 [39:11:01<00:55, 13.85s/it] {'loss': 0.0024, 'learning_rate': 8.5e-08, 'epoch': 13.08} 100%|█████████▉| 9996/10000 [39:11:01<00:55, 13.85s/it] 100%|█████████▉| 9997/10000 [39:11:15<00:41, 13.85s/it] {'loss': 0.0011, 'learning_rate': 8e-08, 'epoch': 13.09} 100%|█████████▉| 9997/10000 [39:11:15<00:41, 13.85s/it] 100%|█████████▉| 9998/10000 [39:11:29<00:27, 13.84s/it] {'loss': 0.0015, 'learning_rate': 7.500000000000001e-08, 'epoch': 13.09} 100%|█████████▉| 9998/10000 [39:11:29<00:27, 13.84s/it] 100%|█████████▉| 9999/10000 [39:11:43<00:13, 13.86s/it] {'loss': 0.0018, 'learning_rate': 7e-08, 'epoch': 13.09} 100%|█████████▉| 9999/10000 [39:11:43<00:13, 13.86s/it] 100%|██████████| 10000/10000 [39:11:56<00:00, 13.86s/it] {'loss': 0.0033, 'learning_rate': 6.5e-08, 'epoch': 13.09} 100%|██████████| 10000/10000 [39:11:57<00:00, 13.86s/it]Saving the whole model [INFO|configuration_utils.py:458] 2024-11-05 11:30:04,834 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-10000/config.json [INFO|configuration_utils.py:364] 2024-11-05 11:30:04,836 >> Configuration saved in output/echo28-20241103-201128-1e-4/checkpoint-10000/generation_config.json [INFO|modeling_utils.py:1853] 2024-11-05 11:30:54,955 >> Model weights saved in output/echo28-20241103-201128-1e-4/checkpoint-10000/pytorch_model.bin [INFO|tokenization_utils_base.py:2194] 2024-11-05 11:30:54,957 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/checkpoint-10000/tokenizer_config.json [INFO|tokenization_utils_base.py:2201] 2024-11-05 11:30:54,959 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/checkpoint-10000/special_tokens_map.json 
[2024-11-05 11:30:54,969] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is about to be saved!
[2024-11-05 11:30:54,993] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt
[2024-11-05 11:30:54,993] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt...
[2024-11-05 11:31:44,593] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/mp_rank_00_model_states.pt.
[2024-11-05 11:31:44,695] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-11-05 11:33:27,074] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-11-05 11:33:27,150] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved output/echo28-20241103-201128-1e-4/checkpoint-10000/global_step10000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-11-05 11:33:27,150] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step10000 is ready now!
[INFO|trainer.py:2053] 2024-11-05 11:33:30,449 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 141322.6962, 'train_samples_per_second': 4.529, 'train_steps_per_second': 0.071, 'train_loss': 0.17524216154664754, 'epoch': 13.09}
100%|██████████| 10000/10000 [39:15:22<00:00, 13.86s/it]
100%|██████████| 10000/10000 [39:15:22<00:00, 14.13s/it]
Saving the whole model
[INFO|configuration_utils.py:458] 2024-11-05 11:33:30,456 >> Configuration saved in output/echo28-20241103-201128-1e-4/config.json
[INFO|configuration_utils.py:364] 2024-11-05 11:33:30,458 >> Configuration saved in output/echo28-20241103-201128-1e-4/generation_config.json
[INFO|modeling_utils.py:1853] 2024-11-05 11:34:08,024 >> Model weights saved in output/echo28-20241103-201128-1e-4/pytorch_model.bin
[INFO|tokenization_utils_base.py:2194] 2024-11-05 11:34:08,027 >> tokenizer config file saved in output/echo28-20241103-201128-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2201] 2024-11-05 11:34:08,028 >> Special tokens file saved in output/echo28-20241103-201128-1e-4/special_tokens_map.json