Improve language tag #1
opened by lbourdois

README.md CHANGED
@@ -1,88 +1,100 @@

---
license: apache-2.0
datasets:
- prithivMLmods/Math-Solve
- AI-MO/NuminaMath-CoT
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Math
- text-generation-inference
- Deep-think
---

# **Deepthink-Reasoning-14B**

The *Deepthink-Reasoning-14B* model is a fine-tuned version of the *Qwen2.5-14B-Instruct* base model, designed for text-generation tasks that require deep reasoning, logical structuring, and problem-solving. It leverages an optimized architecture to produce accurate, contextually relevant outputs for complex queries, making it well suited to applications in education, programming, and creative writing.

With its robust natural language processing capabilities, *Deepthink-Reasoning-14B* excels at generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates an advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.

- It possesses significantly **more knowledge** and exhibits greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Offers substantial improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **producing structured outputs**, especially in JSON format. It is **more resilient to diverse system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- Provides **long-context support** for up to 128K tokens and can generate up to 8K tokens.
- Features **multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.


# **Quickstart with Transformers**

The following snippet shows how to load the tokenizer and model, and how to generate content using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-Reasoning-14B"

# Load the model weights (in their native dtype) and the tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt from a system message and a user question.
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a completion, strip the prompt tokens, and decode the reply.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
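
The feature list above highlights structured outputs, especially in JSON format. As a minimal illustrative sketch (not an official recipe from this model card), the `model` and `tokenizer` loaded in the quickstart can be reused with a prompt that spells out the expected schema; the schema and field names below are invented for the example.

```python
import json

# Hypothetical schema invented for this example; adjust it to your own task.
json_prompt = (
    "A bookstore sells 12 books on Monday and three times as many on Tuesday. "
    "How many books were sold in total? Reply ONLY with JSON of the form "
    '{"reasoning": "<short explanation>", "answer": <number>}.'
)
messages = [
    {"role": "system", "content": "You are a careful math assistant that answers in strict JSON."},
    {"role": "user", "content": json_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256)
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
raw = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# The model is expected to return valid JSON for prompts like this, but parsing can
# still fail, so guard json.loads rather than assuming well-formed output.
try:
    parsed = json.loads(raw)
    print(parsed["answer"])  # expected: 48
except (json.JSONDecodeError, KeyError):
    print("Model did not return the expected JSON:", raw)
```

If the model prefixes the JSON with free-form reasoning, extracting the last `{...}` block from `raw` before parsing is a common workaround.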

### **Intended Use:**
1. **Education:** Ideal for creating step-by-step solutions to complex problems, explanations, and educational content in multiple languages.
2. **Programming:** Excels at coding tasks, debugging, and generating structured outputs such as JSON, enhancing productivity for developers.
3. **Creative Writing:** Suitable for generating stories, essays, and other forms of creative content with a logical and coherent structure.
4. **Long-Context Processing:** Capable of handling and generating long texts, making it useful for summarizing lengthy documents or creating detailed reports.
5. **Multilingual Applications:** Supports 29+ languages, enabling usage in global contexts for translation, multilingual education, and cross-cultural communication.
6. **Data Structuring:** Performs well with structured data, such as tables and JSON outputs, making it effective for business analytics and automated report generation.
7. **Chatbots and Role-Play:** Enhances chatbot interactions with its ability to follow diverse instructions, adapt to different prompts, and maintain long conversational contexts; a minimal multi-turn sketch follows this list.
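
To illustrate the chatbot use case, a simple loop can keep the running `messages` history and re-apply the chat template on every turn. This is a sketch built on the `model` and `tokenizer` from the quickstart above, not an official serving recipe; the system prompt and the `chat` helper are invented for the example.

```python
# Assumes `model` and `tokenizer` from the quickstart are already loaded.
messages = [
    {"role": "system", "content": "You are a patient math tutor who explains every step."},
]

def chat(user_message: str, max_new_tokens: int = 512) -> str:
    """Append a user turn, generate a reply, and keep it in the shared history."""
    messages.append({"role": "user", "content": user_message})
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
    reply = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is the derivative of x^2 * sin(x)?"))
print(chat("Now evaluate that derivative at x = pi."))  # the second turn reuses the accumulated history
```

Because each turn resends the full history, very long conversations eventually approach the context limit; trimming or summarizing old turns is the usual mitigation.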

### **Limitations:**
1. **Resource Requirements:** Its large size and capabilities demand significant computational resources, making it less accessible in low-resource environments.
2. **Hallucination Risk:** The model may generate incorrect or fabricated information, particularly when dealing with unknown or ambiguous inputs.
3. **Limited Domain-Specific Expertise:** While it has broad knowledge, it might underperform in highly specialized fields not covered in its training data.
4. **Long-Context Limitations:** Although it supports up to 128K tokens, performance may degrade or become inefficient with extremely lengthy or complex contexts.
5. **Bias in Outputs:** The model might reflect biases present in its training data, affecting its objectivity in certain contexts or its cultural sensitivity in multilingual outputs.
6. **Dependence on Prompt Quality:** Results depend heavily on well-structured, clear inputs; poorly framed prompts can lead to irrelevant or suboptimal responses.
7. **Errors in Multilingual Output:** Despite robust multilingual support, subtle errors in grammar, syntax, or cultural nuance may appear, especially in low-resource languages.