---
license: mit
base_model: nanochat
tags:
- nanochat
- llm
- dgx-spark
- grace-blackwell
- from-scratch
language:
- en
pipeline_tag: text-generation
---
# nanochat-1.8B-rl
Final-stage model of the training pipeline, fine-tuned with reinforcement learning (GRPO). Compared with the SFT checkpoint, it shows improved performance on math problems and fewer hallucinations.
## Model Details
- **Model Type:** GPT-style transformer trained from scratch
- **Parameters:** ~1.9 billion
- **Training Phase:** RL (GRPO)
- **Architecture:** 20 layers, 1280 embedding dimension
- **Hardware:** NVIDIA DGX Spark (Grace Blackwell GB10)
- **Framework:** [NanoChat](https://github.com/karpathy/nanochat)
- **Training Precision:** BFloat16
## Training Details
- **GPU:** NVIDIA Grace Blackwell GB10
- **Memory:** 128GB unified memory
- **CUDA:** 13.0
- **Optimization:** Muon optimizer for matrix parameters, AdamW for others
- **Checkpoint Step:** 000466
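The Muon/AdamW split above can be sketched as follows. This is an illustrative sketch of the grouping rule (2-D hidden-layer matrices go to Muon; embeddings, norms, and other 1-D tensors go to AdamW), not NanoChat's actual state dict: the parameter names and shapes below are made up for the example.

```python
# Illustrative parameter names and shapes (NOT the real NanoChat model).
params = {
    "wte.weight": (65536, 1280),          # token embedding -> AdamW by convention
    "h.0.attn.qkv.weight": (3840, 1280),  # 2-D hidden matrix -> Muon
    "h.0.mlp.fc.weight": (5120, 1280),    # 2-D hidden matrix -> Muon
    "h.0.ln.weight": (1280,),             # 1-D norm gain -> AdamW
}

def split_for_optimizers(named_shapes):
    """Partition parameters by rank: 2-D matrices for Muon, the rest for AdamW."""
    muon, adamw = [], []
    for name, shape in named_shapes.items():
        # Muon's orthogonalized update only makes sense for matrices;
        # embeddings are usually excluded even though they are 2-D.
        if len(shape) == 2 and "wte" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

muon_params, adamw_params = split_for_optimizers(params)
print("Muon: ", muon_params)
print("AdamW:", adamw_params)
```

In practice each group would then be handed to its own optimizer instance and both would be stepped every iteration.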
## Usage
### Prerequisites
```bash
# Clone the NanoChat repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Install dependencies (requires CUDA)
uv venv
uv sync --extra gpu
# Activate the virtual environment
source .venv/bin/activate
```
### Alternative: DGX Spark Setup Script
```bash
# Prepare environment and clone NanoChat
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/prepare.sh
chmod +x prepare.sh
./prepare.sh --setup-only
```
### Quick Test
Download and test this model from HuggingFace:
```bash
# Clone the test script
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/hf_test.py
# Activate the NanoChat virtual environment
source nanochat/.venv/bin/activate
# Install dependencies
pip install huggingface_hub
# Run with this model
python hf_test.py --model jasonacox/nanochat-1.8B-rl
```
### Example Code
```python
import sys
import os
import glob
from contextlib import nullcontext

import torch
from huggingface_hub import snapshot_download

# Download model from HuggingFace
print("Downloading model...")
model_path = snapshot_download(
    repo_id="jasonacox/nanochat-1.8B-rl",
    cache_dir=os.path.expanduser("~/.cache/nanochat/hf_downloads"),
)

# Set up NanoChat (clone if needed)
nanochat_path = "nanochat"
if not os.path.exists(nanochat_path):
    os.system("git clone https://github.com/karpathy/nanochat.git")
    os.system("cd nanochat && uv sync --extra gpu")
sys.path.insert(0, nanochat_path)

from nanochat.checkpoint_manager import build_model
from nanochat.common import compute_init, autodetect_device_type
from nanochat.engine import Engine

# Initialize device and autocast context
device_type = autodetect_device_type()
_, _, _, _, device = compute_init(device_type)
ptdtype = torch.bfloat16
autocast_ctx = (
    torch.amp.autocast(device_type=device_type, dtype=ptdtype)
    if device_type == "cuda"
    else nullcontext()
)

# Load the checkpoint (pick the latest step if several are present)
checkpoint_files = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))
step = int(os.path.basename(checkpoint_files[-1]).split("_")[-1].split(".")[0])
model, tokenizer, _ = build_model(model_path, step, device, phase="eval")
engine = Engine(model, tokenizer)

# Generate a completion token by token
prompt = "Hello, how are you?"
tokens = tokenizer.encode(prompt)
print(f"Prompt: {prompt}\nResponse: ", end="", flush=True)
with autocast_ctx:
    for token_column, _ in engine.generate(
        tokens, num_samples=1, max_tokens=100, temperature=0.8, top_k=50
    ):
        print(tokenizer.decode([token_column[0]]), end="", flush=True)
print()
```
## Training Pipeline
This model was trained using the DGX Spark optimized training pipeline:
1. **Pretraining:** Base language model on FineWeb-EDU dataset
2. **Midtraining:** Fine-tuned on conversational data (SmolTalk)
3. **SFT:** Supervised fine-tuning on curated conversations
4. **RL:** Reinforcement learning with GRPO
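The core idea behind the GRPO stage can be sketched numerically. GRPO samples a group of completions per prompt, scores each one, and uses group-relative advantages (reward minus the group mean, normalized by the group standard deviation) instead of a learned value function. The rewards below are made up for illustration (e.g. 1 if a sampled math answer was correct, 0 otherwise):

```python
# Illustrative rewards for 4 sampled answers to one math problem.
rewards = [1.0, 0.0, 1.0, 0.0]

# Group statistics: mean and (population) standard deviation.
mean = sum(rewards) / len(rewards)
var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
std = var ** 0.5 or 1.0  # guard against a zero-variance group

# Group-relative advantages: correct samples get a positive advantage,
# wrong ones a negative one, with no value network required.
advantages = [(r - mean) / std for r in rewards]
print(advantages)  # → [1.0, -1.0, 1.0, -1.0]
```

These per-sample advantages then weight the policy-gradient update for the corresponding completions.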
## Limitations
- This is a micro-model (~1.9B parameters), far smaller than commercial LLMs
- May make factual errors or hallucinate
- Limited knowledge cutoff from training data
- Best suited for educational purposes and experimentation
## Citation
```bibtex
@misc{nanochat-1.8B,
  author       = {jasonacox},
  title        = {nanochat-1.8B-rl},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jasonacox/nanochat-1.8B-rl}}
}
```
## Acknowledgments
- Andrej Karpathy for [NanoChat](https://github.com/karpathy/nanochat)
- NVIDIA DGX Spark platform
- FineWeb-EDU and SmolTalk datasets
## License
MIT License. Free to use, including for research and educational purposes.