fix mistake
README.md CHANGED
```diff
@@ -122,7 +122,7 @@ The model comes in two versions:
 
 The model architecture is a modern Transformer decoder featuring Grouped-Query Attention (GQA), RoPE, and RMSNorm, making it efficient and performant for its size.
 
-*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is
+*Note on parameter count: While the model name is `130M` for simplicity, the actual parameter count is 127.17 million.*
 
 ## 📊 Evaluation
 
```
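For readers unfamiliar with the attention variant the README names, here is a minimal sketch of Grouped-Query Attention, where several query heads share each key/value head. The head counts, dimensions, and the `grouped_query_attention` helper are illustrative assumptions, not this model's actual configuration.

```python
# Minimal GQA sketch (illustrative only; sizes are assumptions, not the model's config).
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """x: (batch, seq, dim). Each KV head serves n_q_heads // n_kv_heads query heads."""
    b, t, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)   # (b, hq, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)  # (b, hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each KV head so it is shared across its group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    # Causal attention, as in a Transformer decoder.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

# Toy usage with assumed sizes: 8 query heads sharing 2 KV heads,
# so the K/V projections are 4x smaller than the Q projection.
d, hq, hkv = 64, 8, 2
x = torch.randn(1, 10, d)
wq = torch.randn(d, d)
wk = torch.randn(d, d * hkv // hq)
wv = torch.randn(d, d * hkv // hq)
y = grouped_query_attention(x, wq, wk, wv, hq, hkv)
print(y.shape)  # torch.Size([1, 10, 64])
```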