+---
+language: en
+license: apache-2.0
+tags:
+- text-generation
+- math-reasoning
+- transferability
+- RL-GRPO
+- research-paper
+- qwen
+base_model: qwen3-14b
+datasets:
+- math
+- reasoning
+pipeline_tag: text-generation
+arxiv: 2507.00432
+---
+# UniReason-Qwen3-14B-RL
+This model is associated with the research paper:
+**"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"**
+📄 **Paper**: [2507.00432](https://arxiv.org/abs/2507.00432)
+## Abstract
+Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?
+## Model Description
+This model is an **RL-GRPO**-tuned version of qwen3-14b focused on **math-reasoning** capabilities.
+The model was developed as part of research investigating the transferability of mathematical reasoning skills to general language tasks.
+### Key Research Questions Addressed:
+- Does math reasoning training improve general LLM capabilities?
+- How do different training methods (RL vs SFT) affect transferability?
+- What is the trade-off between specialized math performance and general capabilities?
+## Model Details
+- **Base Model**: qwen3-14b
+- **Training Method**: RL-GRPO
+- **Primary Focus**: math-reasoning
+- **Training Data**: Math-specific datasets
+- **Architecture**: Transformer-based language model
+- **Parameters**: 14B
+## Training Details
+### Training Method: RL-GRPO
+Trained with reinforcement learning using GRPO (Group Relative Policy Optimization) on math-specific data. See the paper for the full training setup.
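As a rough illustration of the group-relative idea behind GRPO, and not the paper's training code: several responses are sampled per prompt, each receives a scalar reward, and each response's advantage is its reward standardized within that group. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards below are all assumptions made for this sketch.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; real GRPO implementations may
    differ in the choice of standard deviation and in the epsilon value.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for 4 sampled answers to one math prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, while a group whose rewards are all equal yields zero advantage for every response.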
+### Datasets Used
+- Mathematical reasoning datasets
+- See paper for complete dataset list
+## Performance
+### Math Reasoning Benchmarks
+- **MATH**: See paper
+- **AIME**: See paper
+### General Capabilities
+- **General QA**: See paper
+- **Code Generation**: See paper
+- **Instruction Following**: See paper
+*For detailed performance metrics, please refer to the paper.*
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load model and tokenizer
+model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+# Example: Math reasoning
+math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
+# Move the inputs to the model's device, since device_map="auto" may place the model on a GPU
+inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
+# do_sample=True is needed for temperature to take effect;
+# max_new_tokens limits only the generated tokens, not the prompt
+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+# Example: General reasoning
+general_prompt = "Explain the concept of supply and demand in economics."
+inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+## Limitations and Biases
+- **Specialization Trade-offs**: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
+- **Training Method Dependencies**: Performance characteristics vary significantly between RL and SFT training approaches
+- **Domain Transfer**: The extent of capability transfer from math to other domains is limited
+- **Computational Requirements**: Model requires significant computational resources for inference
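To put the computational requirement in rough numbers, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes the nominal 14B parameter count stated in this card and ignores the KV cache, activations, and framework overhead, so actual usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Nominal parameter count from the model card (assumption: exactly 14e9).
NUM_PARAMS = 14e9

for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision, as in the usage example above, the weights alone come to roughly 26 GiB, which is why `device_map="auto"` or a lower-precision variant may be needed on a single consumer GPU.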
+## Research Findings
+Key findings from the associated paper:
+1. **RL vs SFT**: RL-tuned models show better transfer to general domains compared to SFT-tuned models
+2. **Capability Trade-offs**: Most math-specialized models fail to transfer gains to other domains
+3. **Forgetting**: SFT-tuned models often forget general capabilities during math-focused training
+## Ethical Considerations
+- This model is intended for research purposes
+- Users should be aware of potential biases in mathematical and general reasoning
+- The model should not be used for making critical decisions without human oversight
+- Consider the environmental impact of large model inference
+## Citation
+If you use this model in your research, please cite both the model and the associated paper:
+```bibtex
+@article{math_reasoning_transfer_2024,
+  title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
+  author={[Authors]},
+  journal={arXiv preprint arXiv:2507.00432},
+  year={2025},
+  url={https://arxiv.org/abs/2507.00432}
+}
+@misc{UniReason_Qwen3_14B_RL,
+  author = {See paper},
+  title = {UniReason-Qwen3-14B-RL},
+  year = {2025},
+  url = {https://huggingface.co/ReasoningTransferability/UniReason-Qwen3-14B-RL},
+  note = {Model associated with arXiv:2507.00432}
+}
+```
+## Contact
+For questions about this model or the associated research, please:
+- Open an issue in this repository
+- Contact the paper authors
+- Reference the original paper: https://arxiv.org/abs/2507.00432
+## Acknowledgments
+This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses the qwen3-14b architecture as its foundation.
+---
+*Model uploaded on 2025-07-03*