---
language: en
license: apache-2.0
tags:
- text-generation
- math-reasoning
- transferability
- RL-GRPO
- research-paper
- qwen
base_model: Qwen/Qwen3-14B
datasets:
- math
- reasoning
pipeline_tag: text-generation
arxiv: 2507.00432
---
# UniReason-Qwen3-14B-RL
This model is associated with the research paper:
**"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"**
📄 **Paper**: [2507.00432](https://arxiv.org/abs/2507.00432)
## Abstract
Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?
## Model Description
This model is an **RL-GRPO**-tuned version of Qwen3-14B focused on **math reasoning** capabilities.
The model was developed as part of research investigating the transferability of mathematical reasoning skills to general language tasks.
### Key Research Questions Addressed:
- Does math reasoning training improve general LLM capabilities?
- How do different training methods (RL vs SFT) affect transferability?
- What is the trade-off between specialized math performance and general capabilities?
## Model Details
- **Base Model**: Qwen3-14B
- **Training Method**: RL-GRPO
- **Primary Focus**: math-reasoning
- **Training Data**: Math-specific datasets
- **Architecture**: Transformer-based language model
- **Parameters**: 14B
## Training Details
### Training Method: RL-GRPO
GRPO (Group Relative Policy Optimization) is a reinforcement-learning method that samples a group of responses per prompt, scores each against a reward, and updates the policy using advantages normalized within the group, removing the need for a separate value network. See the paper for the exact training configuration.
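The core of the group-relative update can be sketched in a few lines. This is an illustrative sketch of the advantage computation only, not the repository's actual training code; the function name `grpo_advantages` is ours.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO.

    Each prompt gets a group of sampled responses; a response's advantage
    is its reward normalized by the group's mean and standard deviation,
    so no learned critic is required.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one math problem, reward 1.0 if correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct answers in a mixed group receive positive advantages and incorrect ones negative, which is what pushes the policy toward verifiably correct math solutions.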
### Datasets Used
- Mathematical reasoning datasets
- See paper for complete dataset list
## Performance
### Math Reasoning Benchmarks
- **MATH**: See paper
- **AIME**: See paper
### General Capabilities
- **General QA**: See paper
- **Code Generation**: See paper
- **Instruction Following**: See paper
*For detailed performance metrics, please refer to the paper.*
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example: math reasoning
math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,   # budget for generated tokens, not total length
    do_sample=True,       # required for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Example: general reasoning
general_prompt = "Explain the concept of supply and demand in economics."
inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Limitations and Biases
- **Specialization Trade-offs**: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
- **Training Method Dependencies**: Performance characteristics vary significantly between RL and SFT training approaches
- **Domain Transfer**: The extent of capability transfer from math to other domains is limited
- **Computational Requirements**: Model requires significant computational resources for inference
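As a back-of-envelope guide to the "significant computational resources" point, the weight memory alone for 14B parameters at common precisions can be estimated as follows (weights only; KV cache and activations add more):

```python
# Rough inference memory estimate for a 14B-parameter model (weights only).
params = 14e9
bytes_per_param = {"float32": 4, "float16": 2, "int8": 1, "int4": 0.5}
for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.0f} GiB")
```

At float16 this is roughly 26 GiB of weights, so a single 24 GB consumer GPU is not enough without quantization or multi-GPU sharding (`device_map="auto"` handles the latter).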
## Research Findings
Key findings from the associated paper:
1. **RL vs SFT**: RL-tuned models show better transfer to general domains compared to SFT-tuned models
2. **Capability Trade-offs**: Most math-specialized models fail to transfer gains to other domains
3. **Forgetting**: SFT-tuned models often forget general capabilities during math-focused training
## Ethical Considerations
- This model is intended for research purposes
- Users should be aware of potential biases in mathematical and general reasoning
- The model should not be used for making critical decisions without human oversight
- Consider the environmental impact of large model inference
## Citation
If you use this model in your research, please cite both the model and the associated paper:
```bibtex
@article{math_reasoning_transfer_2025,
title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
author={[Authors]},
journal={arXiv preprint arXiv:2507.00432},
year={2025},
url={https://arxiv.org/abs/2507.00432}
}
@misc{UniReason_Qwen3_14B_RL,
author = {See paper},
title = {UniReason-Qwen3-14B-RL},
year = {2025},
url = {https://huggingface.co/ReasoningTransferability/UniReason-Qwen3-14B-RL},
note = {Model associated with arXiv:2507.00432}
}
```
## Contact
For questions about this model or the associated research, please:
- Open an issue in this repository
- Contact the paper authors
- Reference the original paper: https://arxiv.org/abs/2507.00432
## Acknowledgments
This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses Qwen3-14B as its foundation model.
---
*Model uploaded on 2025-07-03*