Improve language tag #1
opened by lbourdois

README.md CHANGED
@@ -1,88 +1,100 @@

---
license: apache-2.0
datasets:
- prithivMLmods/Math-Solve
- AI-MO/NuminaMath-CoT
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Math
- text-generation-inference
- Deep-think
---

# **Deepthink-Reasoning-14B**

The *Deepthink-Reasoning-14B* model is a fine-tuned version of the *Qwen2.5-14B-Instruct* base model, designed for text-generation tasks that require deep reasoning, logical structuring, and problem-solving. It leverages an optimized architecture to produce accurate, contextually relevant outputs for complex queries, making it well suited to applications in education, programming, and creative writing.

With its robust natural language processing capabilities, *Deepthink-Reasoning-14B* excels at generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates an advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.

- It possesses significantly **more knowledge** and exhibits greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Offers substantial improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **producing structured outputs**, especially in JSON format. It is **more resilient to diverse system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- Provides **long-context support** for up to 128K tokens and can generate up to 8K tokens.
- Features **multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.


# **Quickstart with Transformers**

The following snippet shows how to load the tokenizer and model, and how to generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-Reasoning-14B"

# Load the model weights (in their native dtype) and the tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt from a system message and a user question.
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a completion, strip the prompt tokens, and decode the reply.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
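
The feature list above highlights structured outputs, especially in JSON format. As a minimal illustrative sketch (not an official recipe from this model card), the `model` and `tokenizer` loaded in the quickstart can be reused with a prompt that spells out the expected schema; the schema and field names below are invented for the example.

```python
import json

# Hypothetical schema invented for this example; adjust it to your own task.
json_prompt = (
    "A bookstore sells 12 books on Monday and three times as many on Tuesday. "
    "How many books were sold in total? Reply ONLY with JSON of the form "
    '{"reasoning": "<short explanation>", "answer": <number>}.'
)
messages = [
    {"role": "system", "content": "You are a careful math assistant that answers in strict JSON."},
    {"role": "user", "content": json_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
raw = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# The model is expected to return valid JSON for prompts like this, but parsing can
# still fail, so guard json.loads rather than assuming well-formed output.
try:
    parsed = json.loads(raw)
    print(parsed["answer"])  # expected: 48
except (json.JSONDecodeError, KeyError):
    print("Model did not return the expected JSON:", raw)
```

If the model prefixes the JSON with free-form reasoning, extracting the last `{...}` block from `raw` before parsing is a common workaround.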

### **Intended Use:**
1. **Education:** Ideal for creating step-by-step solutions to complex problems, explanations, and educational content in multiple languages.
2. **Programming:** Excels at coding tasks, debugging, and generating structured outputs such as JSON, enhancing productivity for developers.
3. **Creative Writing:** Suitable for generating stories, essays, and other forms of creative content with a logical and coherent structure.
4. **Long-Context Processing:** Capable of handling and generating long texts, making it useful for summarizing lengthy documents or creating detailed reports.
5. **Multilingual Applications:** Supports 29+ languages, enabling usage in global contexts for translation, multilingual education, and cross-cultural communication.
6. **Data Structuring:** Performs well with structured data, such as tables and JSON outputs, making it effective for business analytics and automated report generation.
7. **Chatbots and Role-Play:** Enhances chatbot interactions with its ability to follow diverse instructions, adapt to different prompts, and maintain long conversational contexts; a minimal multi-turn sketch follows this list.
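
To illustrate the chatbot use case, a simple loop can keep the running `messages` history and re-apply the chat template on every turn. This is a sketch built on the `model` and `tokenizer` from the quickstart above, not an official serving recipe; the system prompt and the `chat` helper are invented for the example.

```python
# Assumes `model` and `tokenizer` from the quickstart are already loaded.
messages = [
    {"role": "system", "content": "You are a patient math tutor who explains every step."},
]

def chat(user_message: str, max_new_tokens: int = 512) -> str:
    """Append a user turn, generate a reply, and keep it in the shared history."""
    messages.append({"role": "user", "content": user_message})
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
    reply = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is the derivative of x^2 * sin(x)?"))
print(chat("Now evaluate that derivative at x = pi."))  # the second turn reuses the accumulated history
```

Because each turn resends the full history, very long conversations eventually approach the context limit; trimming or summarizing old turns is the usual mitigation.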

### **Limitations:**
1. **Resource Requirements:** Its large size and capabilities demand significant computational resources, making it less accessible in low-resource environments.
2. **Hallucination Risk:** The model may generate incorrect or fabricated information, particularly when dealing with unknown or ambiguous inputs.
3. **Limited Domain-Specific Expertise:** While it has broad knowledge, it might underperform in highly specialized fields not covered in its training data.
4. **Long-Context Limitations:** Although it supports up to 128K tokens, performance may degrade or become inefficient with extremely lengthy or complex contexts.
5. **Bias in Outputs:** The model might reflect biases present in its training data, affecting its objectivity in certain contexts or its cultural sensitivity in multilingual outputs.
6. **Dependence on Prompt Quality:** Results depend heavily on well-structured, clear inputs; poorly framed prompts can lead to irrelevant or suboptimal responses.
7. **Errors in Multilingual Output:** Despite robust multilingual support, subtle errors in grammar, syntax, or cultural nuance may appear, especially in low-resource languages.