FINGU-AI
/

Fingu-instruct-3

@@ -251,140 +251,6 @@ You can finetune this model on your own dataset.
 *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
 -->
-## Training Details
-### Training Hyperparameters
-#### Non-Default Hyperparameters
-- `eval_strategy`: steps
-- `per_device_eval_batch_size`: 4
-- `gradient_accumulation_steps`: 4
-- `learning_rate`: 2e-05
-- `max_steps`: 1500
-- `lr_scheduler_type`: cosine
-- `warmup_ratio`: 0.1
-- `warmup_steps`: 5
-- `bf16`: True
-- `tf32`: True
-- `optim`: adamw_torch_fused
-- `gradient_checkpointing`: True
-- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
-- `batch_sampler`: no_duplicates
-#### All Hyperparameters
-<details><summary>Click to expand</summary>
-- `overwrite_output_dir`: False
-- `do_predict`: False
-- `eval_strategy`: steps
-- `prediction_loss_only`: True
-- `per_device_train_batch_size`: 8
-- `per_device_eval_batch_size`: 4
-- `per_gpu_train_batch_size`: None
-- `per_gpu_eval_batch_size`: None
-- `gradient_accumulation_steps`: 4
-- `eval_accumulation_steps`: None
-- `learning_rate`: 2e-05
-- `weight_decay`: 0.0
-- `adam_beta1`: 0.9
-- `adam_beta2`: 0.999
-- `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 3.0
-- `max_steps`: 1500
-- `lr_scheduler_type`: cosine
-- `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.1
-- `warmup_steps`: 5
-- `log_level`: passive
-- `log_level_replica`: warning
-- `log_on_each_node`: True
-- `logging_nan_inf_filter`: True
-- `save_safetensors`: True
-- `save_on_each_node`: False
-- `save_only_model`: False
-- `restore_callback_states_from_checkpoint`: False
-- `no_cuda`: False
-- `use_cpu`: False
-- `use_mps_device`: False
-- `seed`: 42
-- `data_seed`: None
-- `jit_mode_eval`: False
-- `use_ipex`: False
-- `bf16`: True
-- `fp16`: False
-- `fp16_opt_level`: O1
-- `half_precision_backend`: auto
-- `bf16_full_eval`: False
-- `fp16_full_eval`: False
-- `tf32`: True
-- `local_rank`: 0
-- `ddp_backend`: None
-- `tpu_num_cores`: None
-- `tpu_metrics_debug`: False
-- `debug`: []
-- `dataloader_drop_last`: True
-- `dataloader_num_workers`: 0
-- `dataloader_prefetch_factor`: None
-- `past_index`: -1
-- `disable_tqdm`: False
-- `remove_unused_columns`: True
-- `label_names`: None
-- `load_best_model_at_end`: False
-- `ignore_data_skip`: False
-- `fsdp`: []
-- `fsdp_min_num_params`: 0
-- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
-- `fsdp_transformer_layer_cls_to_wrap`: None
-- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
-- `deepspeed`: None
-- `label_smoothing_factor`: 0.0
-- `optim`: adamw_torch_fused
-- `optim_args`: None
-- `adafactor`: False
-- `group_by_length`: False
-- `length_column_name`: length
-- `ddp_find_unused_parameters`: None
-- `ddp_bucket_cap_mb`: None
-- `ddp_broadcast_buffers`: False
-- `dataloader_pin_memory`: True
-- `dataloader_persistent_workers`: False
-- `skip_memory_metrics`: True
-- `use_legacy_prediction_loop`: False
-- `push_to_hub`: False
-- `resume_from_checkpoint`: None
-- `hub_model_id`: None
-- `hub_strategy`: every_save
-- `hub_private_repo`: False
-- `hub_always_push`: False
-- `gradient_checkpointing`: True
-- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
-- `include_inputs_for_metrics`: False
-- `eval_do_concat_batches`: True
-- `fp16_backend`: auto
-- `push_to_hub_model_id`: None
-- `push_to_hub_organization`: None
-- `mp_parameters`:
-- `auto_find_batch_size`: False
-- `full_determinism`: False
-- `torchdynamo`: None
-- `ray_scope`: last
-- `ddp_timeout`: 1800
-- `torch_compile`: False
-- `torch_compile_backend`: None
-- `torch_compile_mode`: None
-- `dispatch_batches`: None
-- `split_batches`: None
-- `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`: False
-- `neftune_noise_alpha`: None
-- `optim_target_modules`: None
-- `batch_eval_metrics`: False
-- `batch_sampler`: no_duplicates
-- `multi_dataset_batch_sampler`: proportional
-</details>
 ### Training Logs
 | Epoch  | Step | Training Loss | retrival loss |
 |:------:|:----:|:-------------:|:-------------:|
@@ -393,31 +259,8 @@ You can finetune this model on your own dataset.
 | 1.9399 | 1500 | 0.0029        | 0.0039        |
-### Framework Versions
-- Python: 3.10.12
-- Sentence Transformers: 3.0.1
-- Transformers: 4.41.2
-- PyTorch: 2.2.0+cu121
-- Accelerate: 0.32.1
-- Datasets: 2.20.0
-- Tokenizers: 0.19.1
-## Citation
-### BibTeX
-#### Sentence Transformers
-```bibtex
-@inproceedings{reimers-2019-sentence-bert,
-    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-    author = "Reimers, Nils and Gurevych, Iryna",
-    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-    month = "11",
-    year = "2019",
-    publisher = "Association for Computational Linguistics",
-    url = "https://arxiv.org/abs/1908.10084",
-}
-```
 <!--
 ## Glossary

 *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
 -->
 ### Training Logs
 | Epoch  | Step | Training Loss | retrival loss |
 |:------:|:----:|:-------------:|:-------------:|
 | 1.9399 | 1500 | 0.0029        | 0.0039        |
 <!--
 ## Glossary