DebateLabKIT/Phi-4-Argunaut-1-SFT

@@ -1,36 +1,92 @@
 ---
-base_model: unsloth/phi-4
 library_name: transformers
-model_name: Phi-4-Argunaut-1-SFT-dev0
 tags:
-- generated_from_trainer
 - trl
 - sft
-licence: license
 ---
-# Model Card for Phi-4-Argunaut-1-SFT-dev0
 This model is a fine-tuned version of [unsloth/phi-4](https://huggingface.co/unsloth/phi-4).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
 ```python
 from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="DebateLabKIT/Phi-4-Argunaut-1-SFT-dev0", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ggbetz/argunauts-training/runs/4b99kqwz)
-This model was trained with SFT.
 ### Framework versions
@@ -40,19 +96,15 @@ This model was trained with SFT.
 - Datasets: 3.1.0
 - Tokenizers: 0.20.3
-## Citations
-Cite TRL as:
-```bibtex
-@misc{vonwerra2022trl,
-	title        = {{TRL: Transformer Reinforcement Learning}},
-	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-	year         = 2020,
-	journal      = {GitHub repository},
-	publisher    = {GitHub},
-	howpublished = {\url{https://github.com/huggingface/trl}}
-}
-```

 ---
+model_name: Phi-4-Argunaut-1-SFT
+license: mit
+datasets:
+- DebateLabKIT/deepa2-conversations
+- DebateLabKIT/deep-argmap-conversations
+- allenai/tulu-3-sft-mixture
+base_model:
+- unsloth/phi-4
+pipeline_tag: text-generation
 library_name: transformers
 tags:
+- logic
+- argumentation
+- critical-thinking
+- argument-mapping
 - trl
 - sft
 ---
+# Model Card for Phi-4-Argunaut-1-SFT
 This model is a fine-tuned version of [unsloth/phi-4](https://huggingface.co/unsloth/phi-4).
 It has been trained using [TRL](https://github.com/huggingface/trl).
+📘 [HF Blog Article](https://huggingface.co/blog/ggbetz/argunauts-phase-1)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ggbetz/Argunauts-1/runs/4b99kqwz/overview)
 ## Quick start
 ```python
 from transformers import pipeline
+question = "Are you familiar with Argdown syntax? What's its purpose?"
+generator = pipeline("text-generation", model="DebateLabKIT/Phi-4-Argunaut-1-SFT", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
+## Evaluation
+### Chat Experience
+_coming soon_
+### Metrics
+_coming soon_
+## SFT dataset mixture
+|Dataset|Share of examples|Share of tokens|
+|:------|:----:|:----:|
+|DebateLabKIT/deepa2-conversations|25%|49%|
+|DebateLabKIT/deep-argmap-conversations|25%|18%|
+|allenai/tulu-3-sft-mixture|50%|33%|
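Read together, the two share columns also indicate example length: a dataset that contributes x% of the examples but y% of the tokens has examples roughly y/x times as long as the mixture average. A minimal sketch of that arithmetic, assuming the rounded table values and the 1M-example total stated under Training procedure (the helper names are illustrative only, not part of any released code):

```python
# Shares copied from the SFT dataset mixture table (in percent).
MIXTURE = {
    "DebateLabKIT/deepa2-conversations": {"examples": 25, "tokens": 49},
    "DebateLabKIT/deep-argmap-conversations": {"examples": 25, "tokens": 18},
    "allenai/tulu-3-sft-mixture": {"examples": 50, "tokens": 33},
}

TOTAL_EXAMPLES = 1_000_000  # "1M examples", from the Training procedure section


def example_counts(mixture, total):
    """Approximate number of training examples drawn from each dataset."""
    return {name: total * w["examples"] // 100 for name, w in mixture.items()}


def relative_length(mixture):
    """Average example length relative to the mixture-wide average (= 1.0).

    If a dataset contributes x% of the examples but y% of the tokens,
    its examples are on average y / x times as long as a typical example.
    """
    return {name: w["tokens"] / w["examples"] for name, w in mixture.items()}


if __name__ == "__main__":
    counts = example_counts(MIXTURE, TOTAL_EXAMPLES)
    lengths = relative_length(MIXTURE)
    for name in MIXTURE:
        print(f"{name}: ~{counts[name]:,} examples, {lengths[name]:.2f}x average length")
```

With these numbers, deepa2-conversations examples come out roughly three times as long as tulu-3 examples on average (1.96 vs. 0.66). Because the table percentages are rounded, treat these ratios as approximate.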
 ## Training procedure
+Trained with SFT on **1M examples** for 1 epoch, with:
+* context length 8196
+* packing (trl implementation)
+* *spectrum* (only the top 50 percent of layers, ranked by signal-to-noise ratio, are trained)
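Packing joins short examples into full-length training sequences so that little compute is spent on padding. The sketch below shows the basic concatenate-and-chunk idea on toy token ids. It is a simplified illustration under our own assumptions, not TRL's code; TRL's packing implementation may differ in detail and has changed across releases.

```python
from typing import Iterable, List


def pack(tokenized: Iterable[List[int]], eos_id: int, context_length: int) -> List[List[int]]:
    """Naive packing: join tokenized examples (each terminated by `eos_id`)
    into one stream and cut it into blocks of exactly `context_length` tokens.
    A trailing remainder shorter than `context_length` is dropped."""
    stream: List[int] = []
    for ids in tokenized:
        stream.extend(ids)
        stream.append(eos_id)
    n_blocks = len(stream) // context_length
    return [stream[i * context_length:(i + 1) * context_length] for i in range(n_blocks)]


# Toy data: made-up token ids, a hypothetical EOS id of 0 and a context
# length of 4 (the card's actual context length is 8196).
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
blocks = pack(examples, eos_id=0, context_length=4)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

One trade-off of this naive variant: an example can be cut across two blocks, and unless extra masking is applied, attention can cross example boundaries within a block.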
+```yaml
+# Training parameters
+num_train_epochs: 1
+per_device_train_batch_size: 2
+gradient_accumulation_steps: 8
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+learning_rate: 2.0e-6
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+```
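Two quantities follow from these parameters and the 4 GPUs listed under Hardware: the effective batch size (per-device batch × accumulation steps × GPUs) and the shape of the learning-rate curve. The sketch below assumes the cosine schedule has the same form as `get_cosine_schedule_with_warmup` in `transformers` with its default `num_cycles=0.5` (linear warmup, then half-cosine decay to zero), and that warmup steps are rounded up from `warmup_ratio`. `TOTAL_STEPS` is a hypothetical value, since the card does not report the number of optimizer steps.

```python
import math

# Values copied from the training parameters above.
PER_DEVICE_BATCH = 2    # per_device_train_batch_size
GRAD_ACCUM_STEPS = 8    # gradient_accumulation_steps
NUM_GPUS = 4            # from "Hardware: 4 x H100 GPUs"
PEAK_LR = 2.0e-6        # learning_rate
WARMUP_RATIO = 0.1      # warmup_ratio

# Sequences processed per optimizer step across all GPUs (= 64).
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup to PEAK_LR, then cosine decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical; the card does not report the real step count

print("effective batch size:", EFFECTIVE_BATCH)
for step in (0, 50, 100, 550, TOTAL_STEPS):
    print(f"step {step:4d}: lr = {lr_at(step, TOTAL_STEPS):.2e}")
```

If every packed sequence is filled to the 8196-token context length, each optimizer step sees about 64 × 8196 = 524,544 tokens.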
+Hardware: 4 x H100 GPUs.
+_This work was performed on the HoreKa supercomputer funded by the
+Ministry of Science, Research and the Arts Baden-Württemberg and by
+the Federal Ministry of Education and Research._
 ### Framework versions
 - Datasets: 3.1.0
 - Tokenizers: 0.20.3
+## Credits
+This work wouldn't have been possible without all the **great contributions from the open LLM community**. Thank you! Special kudos go to:
+- @philschmid for his latest [fine-tuning boilerplate](https://www.philschmid.de/fine-tune-llms-in-2025)
+- @lvwerra, @lewtun et al. for building and maintaining [trl](https://github.com/huggingface/trl)
+- @cognitivecomputations for sharing [spectrum](https://github.com/cognitivecomputations/spectrum/tree/main)
+- @allenai for releasing [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)
+- @microsoft-research for building and @unsloth for recasting [phi-4](https://huggingface.co/microsoft/phi-4)