---
license: apache-2.0
datasets:
- prithivMLmods/Math-Solve
- AI-MO/NuminaMath-CoT
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Math
- text-generation-inference
- Deep-think
---

# **Deepthink-Reasoning-14B**

The *Deepthink-Reasoning-14B* model is a fine-tuned version of the *Qwen2.5-14B-Instruct* base model, designed for text-generation tasks that require deep reasoning, logical structuring, and problem-solving. It leverages the optimized architecture of its base model to produce accurate, contextually relevant outputs for complex queries, making it well suited to education, programming, and creative writing.

With robust natural-language-processing capabilities, *Deepthink-Reasoning-14B* excels at generating step-by-step solutions, creative content, and logical analyses. It handles both structured and unstructured data, producing precise text aligned with user inputs.

- It possesses significantly **more knowledge** and exhibits greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Offers substantial improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **producing structured outputs**, especially in JSON format (a JSON example appears after the Intended Use list). It is also **more resilient to diverse system prompts**, which helps with role-play and condition-setting for chatbots.
- Provides **long-context support** for up to 128K tokens and can generate up to 8K tokens (see the long-context note at the end of this card).
- Features **multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

# **Quickstart with Transformers**

The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-Reasoning-14B"

# Load the model and tokenizer (dtype and device placement are chosen automatically).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Build the chat-formatted prompt and tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response and strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
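
For long, step-by-step answers it can be more convenient to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal streaming sketch using `TextStreamer` from `transformers`; it assumes the `model` and `tokenizer` from the Quickstart above are already loaded, and the system/user messages are only illustrative.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant that reasons step by step."},
    {"role": "user", "content": "Prove that the sum of two odd integers is even."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Raise max_new_tokens for long derivations; the card states generation of up to 8K tokens.
_ = model.generate(**model_inputs, max_new_tokens=2048, streamer=streamer)
```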

### **Intended Use:**
1. **Education:** Ideal for producing step-by-step solutions to complex problems, explanations, and educational content in multiple languages.
2. **Programming:** Excels at coding tasks, debugging, and generating structured outputs such as JSON, boosting developer productivity.
3. **Creative Writing:** Suitable for generating stories, essays, and other creative content with a logical and coherent structure.
4. **Long-Context Processing:** Capable of handling and generating long texts, making it useful for summarizing lengthy documents or creating detailed reports.
5. **Multilingual Applications:** Supports 29+ languages, enabling translation, multilingual education, and cross-cultural communication.
6. **Data Structuring:** Performs well with structured data, such as tables and JSON outputs, making it effective for business analytics and automated report generation (a minimal example follows this list).
7. **Chatbots and Role-Play:** Enhances chatbot interactions through its ability to follow diverse instructions, adapt to different prompts, and maintain long conversational contexts.
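
Use cases 2 and 6 lean on the model's structured (JSON) output. The snippet below is a minimal, illustrative sketch of that workflow, reusing the `model` and `tokenizer` from the Quickstart; the prompt and the expected schema are arbitrary examples, and production code should validate the parsed result against a real schema.

```python
import json

# Illustrative prompt asking for a structured JSON answer (the schema is arbitrary).
messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply with valid JSON only."},
    {"role": "user", "content": 'Solve 12 * 7 + 5 and answer as {"steps": [...], "result": <number>}.'}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
raw = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Parsing can still fail if the model adds extra text; handle that case explicitly.
try:
    answer = json.loads(raw)
except json.JSONDecodeError:
    answer = {"raw": raw}
print(answer)
```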

### **Limitations:**
1. **Resource Requirements:** Its large size demands significant computational resources, making it less accessible in low-resource environments.
2. **Hallucination Risk:** The model may generate incorrect or fabricated information, particularly for unknown or ambiguous inputs.
3. **Limited Domain-Specific Expertise:** Although it has broad knowledge, it may underperform in highly specialized fields not covered by its training data.
4. **Long-Context Limitations:** Although it supports up to 128K tokens, performance may degrade with extremely long or complex contexts (see the note below).
5. **Bias in Outputs:** The model may reflect biases present in its training data, affecting objectivity and cultural sensitivity, especially in multilingual outputs.
6. **Dependence on Prompt Quality:** Results depend heavily on clear, well-structured inputs; poorly framed prompts can lead to irrelevant or suboptimal responses.
7. **Errors in Multilingual Output:** Despite robust multilingual support, subtle errors in grammar, syntax, or cultural nuance can appear, especially in low-resource languages.
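
### **Long-Context Note:**
The 128K-token figure is inherited from the Qwen2.5-14B-Instruct base model, which reaches it by applying YaRN rope scaling on top of its native 32K window. The sketch below shows one way to apply the YaRN settings published in the upstream Qwen2.5 documentation when loading with `transformers`; these values are an assumption for this fine-tune, so verify them against this checkpoint's `config.json`, and note that static YaRN can slightly reduce quality on short inputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-Reasoning-14B"

# YaRN settings taken from the upstream Qwen2.5 documentation (assumed, not verified
# for this fine-tune): a factor of 4.0 over the 32K native window gives roughly 128K tokens.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768
    }
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```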