---
language:
- en
- ru
- code
base_model:
- Qwen/Qwen3-1.7B
inference: true
widget:
- example_title: FluentlyQwen3
  messages:
  - role: system
    content: You are Fluently, a capable, neutrally-aligned assistant. Prefer concise,
      correct answers.
  - role: user
    content: Explain the difference between BFS and DFS to a new CS student.
pipeline_tag: text-generation
library_name: transformers
---

![banner](assets/banner.png)

# FluentlyQwen3 1.7B

Introducing a new LLM from Project Fluently. The goal of this model is to improve on the base model by training it on diverse datasets; it was produced with SFT and GRPO training followed by step-by-step merging.

## Model details

- **Developed by:** [@fluently](https://hf.co/fluently)
- **Model type:** Causal Language Model (Qwen3ForCausalLM, LM Transformer)
- **Number of Parameters:** 1.7B
- **Number of Parameters (Non-Embedding):** 1.4B
- **Number of Layers:** 28
- **Number of Attention Heads (GQA):** 16 for Q and 8 for KV
- **Context Length:** 32,768
- **License:** MIT
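
The architecture numbers above can be cross-checked against the published config; a minimal sketch using the standard `transformers` config attributes (requires access to the Hub):

```python
# Minimal sketch: cross-checking the spec list against the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("fluently/FluentlyQwen3-1.7B")
print(config.num_hidden_layers)    # layers (28 per the list above)
print(config.num_attention_heads)  # query heads (16)
print(config.num_key_value_heads)  # KV heads under GQA (8)
```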

### Recipe

![recipe](assets/recipe.png)

*The recipe is approximate; there are some inaccuracies.*

### Strengths

#### General improvements

| Task                | **Result**   |
|---------------------|--------------|
| Basic Communication | **Improved** |
| Translation         | **Improved** |
| Mathematics         | **Improved** |
| Physics             | **Improved** |
| Biology             | **Improved** |
| Medicine            | **Improved** |
| Coding              | **Improved** |
| Agent Functions     | **Improved** |

### Quickstart

The snippet below shows how to load the tokenizer and model with `apply_chat_template` and generate content.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fluently/FluentlyQwen3-1.7B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content out of the raw output
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

For local use, applications such as Ollama, LM Studio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
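
For instance, with llama.cpp through its `llama-cpp-python` bindings, a GGUF build of the model could be run roughly as follows (a sketch; the GGUF file path is hypothetical, and no official GGUF is referenced here):

```python
# Minimal sketch: running a GGUF conversion with llama-cpp-python.
# "FluentlyQwen3-1.7B.gguf" is a hypothetical local file, not an official artifact.
from llama_cpp import Llama

llm = Llama(model_path="FluentlyQwen3-1.7B.gguf", n_ctx=32768)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
)
print(out["choices"][0]["message"]["content"])
```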

## Switching Between Thinking and Non-Thinking Mode

> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
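
For example, against a vLLM OpenAI-compatible server, the switch can be passed per request through `chat_template_kwargs`. A minimal sketch, assuming a local server is already serving this model (the URL and API key are placeholders):

```python
# Minimal sketch: toggling thinking per request against a vLLM
# OpenAI-compatible server (base_url and api_key are local placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="fluently/FluentlyQwen3-1.7B",
    messages=[{"role": "user", "content": "Explain BFS vs DFS briefly."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # hard switch off
)
print(response.choices[0].message.content)
```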

### `enable_thinking=True`

By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting `enable_thinking=True` or leaving it as the default value in `tokenizer.apply_chat_template`, the model will engage its thinking mode.

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```

In this mode, the model will generate think content wrapped in a `<think>...</think>` block, followed by the final response.

> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20` and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
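
With `transformers`, these settings map onto the standard sampling arguments of `generate`. A minimal sketch, reusing `model` and `model_inputs` from the Quickstart above:

```python
# Recommended thinking-mode sampling expressed as generate() kwargs;
# reuses `model` and `model_inputs` from the Quickstart above.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,   # sampling, not greedy decoding, as advised above
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
```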

### `enable_thinking=False`

We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```

In this mode, the model will not generate any think content and will not include a `<think>...</think>` block.

> [!NOTE]
> For non-thinking mode, we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.

## Special thanks

🤗 We are grateful for the open-source resources, technologies, and assistance from:
- [Unsloth AI](https://unsloth.ai)
- [Axolotl AI](https://axolotl.ai)
- [Argilla](https://argilla.io)
- [Alibaba Cloud: Qwen](https://qwenlm.ai)
- [NVIDIA on HuggingFace](https://huggingface.co/nvidia)
- [NousResearch](https://nousresearch.com)