Update README.md
README.md CHANGED

@@ -4,7 +4,7 @@ license: apache-2.0
 # MoH: Multi-Head Attention as Mixture-of-Head Attention
 
 **Paper or resources for more information:**
-[[Paper]()] [[Code](https://github.com/SkyworkAI/MoH)]
+[[Paper](https://huggingface.co/papers/2410.11842)] [[Code](https://github.com/SkyworkAI/MoH)]
 
 ## ⚡ Overview
 We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages:
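For context on the overview sentence above, here is a minimal sketch of the idea it describes: a learned router scores the attention heads per token and only the top-k heads contribute, in the spirit of MoE expert selection. The `MoHAttention` class, the `top_k` parameter, and the plain linear router are assumptions made for this illustration only; they are not the authors' released implementation (see the linked code for that).

```python
# Minimal sketch of mixture-of-head attention with a simple top-k router
# over heads. Illustrative only; not the official MoH implementation.
import torch
import torch.nn as nn


class MoHAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, top_k: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.top_k = top_k
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Router scores each head per token, as an MoE gate scores experts.
        self.router = nn.Linear(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each: (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        out = attn.softmax(dim=-1) @ v                    # (B, H, N, d)

        # Keep the top-k heads per token, weight them by the normalized gate,
        # and zero out the remaining ("inactive") heads.
        scores = self.router(x)                           # (B, N, H)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, topk_val.softmax(dim=-1))
        out = out * gate.permute(0, 2, 1).unsqueeze(-1)   # gate: (B, H, N, 1)

        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)
    print(MoHAttention(64, num_heads=8, top_k=4)(x).shape)  # torch.Size([2, 16, 64])
```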