---
language: en
license: apache-2.0
tags:
- text-generation
- math-reasoning
- transferability
- RL-GRPO
- research-paper
- qwen
base_model: Qwen/Qwen3-14B
datasets:
- math
- reasoning
pipeline_tag: text-generation
arxiv: 2507.00432
---

# UniReason-Qwen3-14B-RL

This model is associated with the research paper:
**"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"**

📄 **Paper**: [2507.00432](https://arxiv.org/abs/2507.00432)

## Abstract

Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?

## Model Description

This model is an **RL-GRPO**-tuned version of Qwen3-14B focused on **math reasoning**.
It was developed as part of research investigating whether mathematical reasoning skills transfer to general language tasks.

### Key Research Questions Addressed:
- Does math reasoning training improve general LLM capabilities?
- How do different training methods (RL vs SFT) affect transferability?
- What is the trade-off between specialized math performance and general capabilities?

## Model Details

- **Base Model**: Qwen3-14B
- **Training Method**: RL-GRPO
- **Primary Focus**: Math reasoning
- **Training Data**: Math-specific datasets
- **Architecture**: Transformer-based language model
- **Parameters**: 14B

## Training Details

### Training Method: RL-GRPO
The model was tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning method; see the paper for the full training configuration.
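GRPO, as described in the literature that introduced it, replaces a learned value baseline with a group-relative one: several responses are sampled per prompt, and each response's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the function name and plain-list interface are illustrative, not taken from this repository's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    Responses scoring above the group average get a positive advantage,
    those below get a negative one; eps guards against zero variance.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math prompt, scored 1.0 if correct, 0.0 if not.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward responses that beat their own sampling group.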

### Datasets Used
- Mathematical reasoning datasets
- See paper for complete dataset list

## Performance

### Math Reasoning Benchmarks
- **MATH**: See paper
- **AIME**: See paper

### General Capabilities
- **General QA**: See paper
- **Code Generation**: See paper
- **Instruction Following**: See paper

*For detailed performance metrics, please refer to the paper.*

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Math reasoning
math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Example: General reasoning
general_prompt = "Explain the concept of supply and demand in economics."
inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
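Qwen-family instruction-tuned checkpoints are usually prompted through the tokenizer's chat template (`tokenizer.apply_chat_template(messages, add_generation_prompt=True)`) rather than raw strings. Assuming this checkpoint ships the standard ChatML-style template used across the Qwen family (an assumption, not verified against this repository), the format it produces can be sketched by hand:

```python
def build_chatml_prompt(user_message: str) -> str:
    """Build a ChatML-style prompt as used by Qwen-family chat templates.

    This manual version only illustrates the wire format; in practice,
    prefer tokenizer.apply_chat_template, which applies the checkpoint's
    own template, including any system message it defines.
    """
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What is the derivative of x^3 + 2x^2 - 5x + 1?")
```

Passing such a formatted prompt (or the template's own output) to `tokenizer` and `model.generate` as above typically yields better instruction-following than a bare string.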

## Limitations and Biases

- **Specialization Trade-offs**: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
- **Training Method Dependencies**: Performance characteristics vary significantly between RL and SFT training approaches
- **Domain Transfer**: The extent of capability transfer from math to other domains is limited
- **Computational Requirements**: Model requires significant computational resources for inference

## Research Findings

Key findings from the associated paper:
1. **RL vs SFT**: RL-tuned models show better transfer to general domains compared to SFT-tuned models
2. **Capability Trade-offs**: Most math-specialized models fail to transfer gains to other domains
3. **Forgetting**: SFT-tuned models often forget general capabilities during math-focused training

## Ethical Considerations

- This model is intended for research purposes
- Users should be aware of potential biases in mathematical and general reasoning
- The model should not be used for making critical decisions without human oversight
- Consider the environmental impact of large model inference

## Citation

If you use this model in your research, please cite both the model and the associated paper:

```bibtex
@article{math_reasoning_transfer_2024,
  title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
  author={[Authors]},
  journal={arXiv preprint arXiv:2507.00432},
  year={2025},
  url={https://arxiv.org/abs/2507.00432}
}

@misc{UniReason_Qwen3_14B_RL,
  author = {See paper},
  title = {UniReason-Qwen3-14B-RL},
  year = {2025},
  url = {https://huggingface.co/ReasoningTransferability/UniReason-Qwen3-14B-RL},
  note = {Model associated with arXiv:2507.00432}
}
```

## Contact

For questions about this model or the associated research, please:
- Open an issue in this repository
- Contact the paper authors
- Reference the original paper: https://arxiv.org/abs/2507.00432

## Acknowledgments

This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses Qwen3-14B as its base model.

---

*Model uploaded on 2025-07-03*