---
license: mit
base_model: nanochat
tags:
  - nanochat
  - llm
  - dgx-spark
  - grace-blackwell
  - from-scratch
language:
  - en
pipeline_tag: text-generation
---

# nanochat-1.8B-rl

Final-stage model, fine-tuned with reinforcement learning (GRPO). Improved performance on math problems and reduced hallucinations.

## Model Details

- **Model Type:** GPT-style transformer trained from scratch
- **Parameters:** ~1.9 billion
- **Training Phase:** RL (reinforcement learning)
- **Architecture:** 20 layers, 1280 embedding dimension
- **Hardware:** NVIDIA DGX Spark (Grace Blackwell GB10)
- **Framework:** [NanoChat](https://github.com/karpathy/nanochat)
- **Training Precision:** BFloat16
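BFloat16 keeps float32's full 8-bit exponent (so the same dynamic range) but only 8 bits of mantissa precision, which is why it trains stably at half the memory of float32. A minimal, dependency-free sketch of the format (truncation is shown for clarity; hardware converters typically round to nearest even):

```python
import struct

def to_bfloat16(x: float) -> float:
    # bfloat16 is the top 16 bits of an IEEE-754 float32: same sign bit and
    # 8-bit exponent, but only 7 explicit mantissa bits of precision.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(3.141592653589793))  # pi survives to only ~2-3 decimal digits
```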

## Training Details

- **GPU:** NVIDIA Grace Blackwell GB10
- **Memory:** 128GB unified memory
- **CUDA:** 13.0
- **Optimization:** Muon optimizer for matrix parameters, AdamW for others
- **Checkpoint Step:** 000466
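The optimizer split above (Muon for matrix parameters, AdamW for the rest) can be sketched as follows. The parameter names and the exact routing rule here are illustrative assumptions, not NanoChat's actual code:

```python
# Hypothetical parameter shapes for a small GPT; names are made up.
params = {
    "wte.weight": (65536, 1280),          # token embedding
    "h.0.attn.qkv.weight": (3840, 1280),  # attention projection
    "h.0.mlp.fc.weight": (5120, 1280),    # MLP up-projection
    "h.0.norm.gain": (1280,),             # 1-D gain vector
    "lm_head.weight": (65536, 1280),      # output head
}

def partition(params):
    # Muon targets 2-D "matrix" parameters inside the transformer blocks;
    # embeddings, the output head, and 1-D vectors stay on AdamW.
    muon, adamw = [], []
    for name, shape in params.items():
        if len(shape) == 2 and "wte" not in name and "lm_head" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw
```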

## Usage

### Prerequisites

```bash
# Clone the NanoChat repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# Install dependencies (requires CUDA)
uv venv
uv sync --extra gpu

# Activate the virtual environment
source .venv/bin/activate
```

### Optional: DGX Spark Setup

```bash
# Prepare environment and clone NanoChat
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/prepare.sh
chmod +x prepare.sh
./prepare.sh --setup-only
```

### Quick Test

Download and test this model from HuggingFace:

```bash
# Clone the test script
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/hf_test.py

# Set python environment
source nanochat/.venv/bin/activate

# Install dependencies
pip install huggingface_hub

# Run with this model
python hf_test.py --model jasonacox/nanochat-1.8B-rl
```

### Example Code

```python
import sys
import os
import glob
from huggingface_hub import snapshot_download
import torch
from contextlib import nullcontext

# Download model from HuggingFace
print("Downloading model...")
model_path = snapshot_download(
    repo_id="jasonacox/nanochat-1.8B-rl",
    cache_dir=os.path.expanduser("~/.cache/nanochat/hf_downloads")
)

# Setup NanoChat (clone if needed)
nanochat_path = "nanochat"
if not os.path.exists(nanochat_path):
    os.system("git clone https://github.com/karpathy/nanochat.git")
    os.system("cd nanochat && uv sync --extra gpu")

sys.path.insert(0, nanochat_path)

from nanochat.checkpoint_manager import build_model
from nanochat.common import compute_init, autodetect_device_type
from nanochat.engine import Engine

# Initialize
device_type = autodetect_device_type()
_, _, _, _, device = compute_init(device_type)
ptdtype = torch.bfloat16
autocast_ctx = torch.amp.autocast(device_type=device_type, dtype=ptdtype) if device_type == "cuda" else nullcontext()

# Load model
checkpoint_files = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))
step = int(os.path.basename(checkpoint_files[-1]).split("_")[-1].split(".")[0])  # latest checkpoint
model, tokenizer, _ = build_model(model_path, step, device, phase="eval")
engine = Engine(model, tokenizer)

# Generate
prompt = "Hello, how are you?"
tokens = tokenizer.encode(prompt)
print(f"Prompt: {prompt}\nResponse: ", end="", flush=True)

with autocast_ctx:
    for token_column, _ in engine.generate(tokens, num_samples=1, max_tokens=100, temperature=0.8, top_k=50):
        print(tokenizer.decode([token_column[0]]), end="", flush=True)
print()
```
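The `temperature` and `top_k` arguments passed to `engine.generate` above control sampling. A minimal, dependency-free sketch of what they do (not NanoChat's actual implementation):

```python
import math
import random

def top_k_filter(logits, k):
    # Keep the k largest logits; everything else becomes -inf (probability 0)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores over 5 tokens
filtered = top_k_filter(logits, k=2)   # only the two best tokens survive
probs = softmax(filtered, temperature=0.8)
next_token = random.choices(range(len(probs)), weights=probs)[0]
```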

## Training Pipeline

This model was trained using the DGX Spark optimized training pipeline:

1. **Pretraining:** Base language model on FineWeb-EDU dataset
2. **Midtraining:** Fine-tuned on conversational data (SmolTalk)
3. **SFT:** Supervised fine-tuning on curated conversations
4. **RL:** Reinforcement learning with GRPO
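GRPO (Group Relative Policy Optimization) replaces a learned value baseline with a group baseline: several completions are sampled per prompt, and each one's advantage is its reward relative to its siblings. A minimal sketch of that advantage step (reward values are made up; some GRPO variants omit the std normalization):

```python
def grpo_advantages(rewards, eps=1e-8):
    # Advantage of each completion = (reward - group mean) / group std.
    # Completions that beat their siblings get a positive advantage.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (var ** 0.5 + eps) for r in rewards]

# 4 sampled answers to one math problem: 1.0 = correct, 0.0 = wrong
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(rewards)
```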

## Limitations

- This is a micro-model (~1.9B parameters), much smaller than commercial LLMs
- May make factual errors or hallucinate
- Knowledge is limited by its training data cutoff
- Best suited for educational purposes and experimentation

## Citation

```bibtex
@misc{nanochat-1.8B,
  author = {jasonacox},
  title = {nanochat-1.8B-rl},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jasonacox/nanochat-1.8B-rl}}
}
```

## Acknowledgments

- Andrej Karpathy for [NanoChat](https://github.com/karpathy/nanochat)
- NVIDIA DGX Spark platform
- FineWeb-EDU and SmolTalk datasets

## License

MIT License. Free to use for research and educational purposes.