ggbetz committed on
Commit
c757d9e
·
verified ·
1 Parent(s): 638e023

Update README.md

  1. README.md +74 -22
README.md CHANGED
@@ -1,36 +1,92 @@
1
  ---
2
- base_model: unsloth/phi-4
3
  library_name: transformers
4
- model_name: Phi-4-Argunaut-1-SFT-dev0
5
  tags:
6
- - generated_from_trainer
 
 
 
7
  - trl
8
  - sft
9
- licence: license
10
  ---
11
 
12
- # Model Card for Phi-4-Argunaut-1-SFT-dev0
 
13
 
14
  This model is a fine-tuned version of [unsloth/phi-4](https://huggingface.co/unsloth/phi-4).
15
  It has been trained using [TRL](https://github.com/huggingface/trl).
16
 
 
 
 
 
 
17
  ## Quick start
18
 
19
  ```python
20
  from transformers import pipeline
21
 
22
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
23
- generator = pipeline("text-generation", model="DebateLabKIT/Phi-4-Argunaut-1-SFT-dev0", device="cuda")
24
  output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
25
  print(output["generated_text"])
26
  ```
27
 
28
  ## Training procedure
29
 
30
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ggbetz/argunauts-training/runs/4b99kqwz)
31
 
 
 
 
32
 
33
- This model was trained with SFT.
34
 
35
  ### Framework versions
36
 
@@ -40,19 +96,15 @@ This model was trained with SFT.
40
  - Datasets: 3.1.0
41
  - Tokenizers: 0.20.3
42
 
43
- ## Citations
44
 
45
 
46
 
47
- Cite TRL as:
48
-
49
- ```bibtex
50
- @misc{vonwerra2022trl,
51
- title = {{TRL: Transformer Reinforcement Learning}},
52
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
53
- year = 2020,
54
- journal = {GitHub repository},
55
- publisher = {GitHub},
56
- howpublished = {\url{https://github.com/huggingface/trl}}
57
- }
58
- ```
 
1
  ---
2
+ model_name: Phi-4-Argunaut-1-SFT
3
+ license: mit
4
+ datasets:
5
+ - DebateLabKIT/deepa2-conversations
6
+ - DebateLabKIT/deep-argmap-conversations
7
+ - allenai/tulu-3-sft-mixture
8
+ base_model:
9
+ - unsloth/phi-4
10
+ pipeline_tag: text-generation
11
  library_name: transformers
 
12
  tags:
13
+ - logic
14
+ - argumentation
15
+ - critical-thinking
16
+ - argument-mapping
17
  - trl
18
  - sft
 
19
  ---
20
 
21
+
22
+ # Model Card for Phi-4-Argunaut-1-SFT
23
 
24
  This model is a fine-tuned version of [unsloth/phi-4](https://huggingface.co/unsloth/phi-4).
25
  It has been trained using [TRL](https://github.com/huggingface/trl).
26
 
27
+ 📘 [HF Blog Article](https://huggingface.co/blog/ggbetz/argunauts-phase-1)
28
+
29
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ggbetz/Argunauts-1/runs/4b99kqwz/overview)
30
+
31
+
32
  ## Quick start
33
 
34
  ```python
35
  from transformers import pipeline
36
 
37
+ question = "Are you familiar with Argdown syntax? What's its purpose?"
38
+ generator = pipeline("text-generation", model="DebateLabKIT/Phi-4-Argunaut-1-SFT", device="cuda")
39
  output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
40
  print(output["generated_text"])
41
  ```
42
 
43
+ ## Evaluation
44
+
45
+
46
+ ### Chat Experience
47
+
48
+ _coming soon_
49
+
50
+ ### Metrics
51
+
52
+ _coming soon_
53
+
54
+
55
+ ## SFT dataset mixture
56
+
57
+ |Dataset|Weight (examples)|Weight (tokens)|
58
+ |:------|:----:|:----:|
59
+ |DebateLabKIT/deepa2-conversations|25%|49%|
60
+ |DebateLabKIT/deep-argmap-conversations|25%|18%|
61
+ |allenai/tulu-3-sft-mixture|50%|33%|
62
+
63
+
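Review note: the two weight columns in the mixture table differ because the datasets have different average example lengths. Dividing a dataset's token share by its example share gives its average example length relative to the mixture-wide average. The sketch below uses only the percentages from the table; the names `mixture` and `relative_length` are illustrative and do not appear in the README.

```python
# Shares copied from the "SFT dataset mixture" table, as fractions of the whole mix.
mixture = {
    "DebateLabKIT/deepa2-conversations": {"examples": 0.25, "tokens": 0.49},
    "DebateLabKIT/deep-argmap-conversations": {"examples": 0.25, "tokens": 0.18},
    "allenai/tulu-3-sft-mixture": {"examples": 0.50, "tokens": 0.33},
}


def relative_length(shares):
    """Average example length relative to the mixture-wide average (1.0 = average)."""
    return shares["tokens"] / shares["examples"]


relative = {name: round(relative_length(s), 2) for name, s in mixture.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio}x the average example length")
```

Read this way, the deepa2 conversations are roughly twice as long as the average example, which would explain why a quarter of the examples accounts for about half of the tokens.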
64
  ## Training procedure
65
 
66
+ Trained with SFT on **1M examples** for 1 epoch, with
67
 
68
+ * context length 8196
69
+ * packing (trl implementation)
70
+ * *spectrum* (top 50 percent)
71
 
72
+ ```yaml
73
+ # Training parameters
74
+ num_train_epochs: 1
75
+ per_device_train_batch_size: 2
76
+ gradient_accumulation_steps: 8
77
+ gradient_checkpointing: true
78
+ gradient_checkpointing_kwargs:
79
+   use_reentrant: false
80
+ learning_rate: 2.0e-6
81
+ lr_scheduler_type: cosine
82
+ warmup_ratio: 0.1
83
+ ```
84
+
85
+ Hardware: 4 x H100 GPUs.
86
+
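Review note: the README does not state the effective batch size, but it can be estimated from the training parameters and the hardware line. The sketch below assumes plain data parallelism with one process per GPU across all 4 H100s, which the README does not confirm, and treats every packed sequence as completely full, so the token figure is an upper bound rather than a measured value.

```python
# Values copied from the training parameters in the README diff.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
context_length = 8196

# Assumption: data-parallel training with one process on each of the 4 GPUs.
num_gpus = 4

# Packed sequences consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Upper bound on tokens per optimizer step, if every packed sequence is full.
max_tokens_per_step = effective_batch_size * context_length

print(effective_batch_size)  # 64
print(max_tokens_per_step)   # 524544
```

Under these assumptions each optimizer step sees 64 packed sequences, or at most about 0.5M tokens.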
87
+ _This work was performed on the HoreKa supercomputer funded by the
88
+ Ministry of Science, Research and the Arts Baden-Württemberg and by
89
+ the Federal Ministry of Education and Research._
90
 
91
  ### Framework versions
92
 
 
96
  - Datasets: 3.1.0
97
  - Tokenizers: 0.20.3
98
 
99
+ ## Credits
100
+
101
+ This work wouldn't be possible without all the **great contributions from the open LLM community**. Thank you! Special kudos go to
102
+
103
+ - @philschmid for his latest [fine-tuning boilerplate](https://www.philschmid.de/fine-tune-llms-in-2025)
104
+ - @lvwerra, @lewtun et al. for building and maintaining [trl](https://github.com/huggingface/trl)
105
+ - @cognitivecomputations for sharing [spectrum](https://github.com/cognitivecomputations/spectrum/tree/main)
106
+ - @allenai for releasing [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)
107
+ - @microsoft-research for building and @unsloth for recasting [phi-4](https://huggingface.co/microsoft/phi-4)
108
 
109
 
110