fix mistake
README.md CHANGED
```diff
@@ -122,7 +122,7 @@ The model comes in two versions:
 
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.
 
-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*
 
 ## 📊 Evaluation
 
```
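For readers unfamiliar with the attention variant the README names, here is a minimal sketch of Grouped-Query Attention, where several query heads share each key/value head. The head counts, dimensions, and the `grouped_query_attention` helper are illustrative assumptions, not this model's actual configuration.

```python
# Minimal GQA sketch (illustrative only; sizes are assumptions, not the model's config).
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """x: (batch, seq, dim). Each KV head serves n_q_heads // n_kv_heads query heads."""
    b, t, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)   # (b, hq, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)  # (b, hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each KV head so it is shared across its group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    # Causal attention, as in a Transformer decoder.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

# Toy usage with assumed sizes: 8 query heads sharing 2 KV heads,
# so the K/V projections are 4x smaller than the Q projection.
d, hq, hkv = 64, 8, 2
x = torch.randn(1, 10, d)
wq = torch.randn(d, d)
wk = torch.randn(d, d * hkv // hq)
wv = torch.randn(d, d * hkv // hq)
y = grouped_query_attention(x, wq, wk, wv, hq, hkv)
print(y.shape)  # torch.Size([1, 10, 64])
```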