# NeuralHermes 2.5 - Mistral 7B

NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see the results below).

It is directly inspired by the RLHF process described by the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
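
As a rough sketch of what that reformatting looks like (the column names and helper below are assumptions based on the original preference pairs, not the exact code from the notebook), each pair is wrapped in ChatML `<|im_start|>`/`<|im_end|>` markers before training:

```python
# Illustrative only: assumes the raw pairs expose system/question/chosen/rejected
# columns; the real preprocessing lives in the linked Colab/GitHub notebook.
def chatml_format(example):
    prompt = (
        f"<|im_start|>system\n{example['system']}<|im_end|>\n"
        f"<|im_start|>user\n{example['question']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,                                  # shared context
        "chosen": example["chosen"] + "<|im_end|>\n",      # preferred answer
        "rejected": example["rejected"] + "<|im_end|>\n",  # dispreferred answer
    }

# Applied with datasets' map(), e.g.:
# formatted = raw_pairs.map(chatml_format, remove_columns=raw_pairs.column_names)
```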
The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.

🤗 GGUF: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).
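
For local inference with the GGUF weights, a minimal llama-cpp-python sketch could look like this (the file name and quantization level are placeholders for whichever .gguf file you download from the repository above):

```python
from llama_cpp import Llama

# Placeholder path: use the .gguf file you actually downloaded.
llm = Llama(model_path="./neuralhermes-2.5-mistral-7b.Q4_K_M.gguf", n_ctx=2048)

# The model expects the ChatML prompt format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is a large language model?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```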
## Results

Results are improved on every benchmark, including **AGIEval** (from 43.07% to 43.62%).

### AGIEval
![](https://i.imgur.com/sM7TW3h.png)

### GPT4All
![](https://i.imgur.com/df1qeF3.png)

### TruthfulQA
## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
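
These values map directly onto a peft `LoraConfig`; a minimal sketch (not necessarily the exact training script) would be:

```python
from peft import LoraConfig

# LoRA adapter configuration with the hyperparameters listed above
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj',
                    'q_proj', 'o_proj', 'down_proj'],
)
```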

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100
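
These correspond to a transformers `TrainingArguments` object along the following lines (`output_dir` is not listed above and is only a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    output_dir="./results",  # placeholder, not one of the listed hyperparameters
)
```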

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
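
Putting it together, here is a sketch of the DPO training call, assuming a trl release from around the time this model was trained (newer versions move several of these arguments into `DPOConfig`) and reusing the `peft_config` and `training_args` from the sketches above:

```python
from trl import DPOTrainer

# model, tokenizer and the ChatML-formatted preference dataset are assumed to be
# loaded elsewhere; see the linked Colab/GitHub notebook for the full script.
dpo_trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT config, trl falls back to the frozen base model as reference
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```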