---
license: mit
base_model: nanochat
tags:
- nanochat
- llm
- dgx-spark
- grace-blackwell
- from-scratch
language:
- en
pipeline_tag: text-generation
---
# nanochat-1.8B-rl
Final-stage model of the training pipeline, fine-tuned with reinforcement learning (GRPO). Compared with the SFT checkpoint, it shows improved performance on math problems and fewer hallucinations.
## Model Details
- **Model Type:** GPT-style transformer trained from scratch
- **Parameters:** ~1.9 billion
- **Training Phase:** RL (GRPO)
- **Architecture:** 20 layers, 1280 embedding dimension
- **Hardware:** NVIDIA DGX Spark (Grace Blackwell GB10)
- **Framework:** [NanoChat](https://github.com/karpathy/nanochat)
- **Training Precision:** BFloat16
## Training Details
- **GPU:** NVIDIA Grace Blackwell GB10
- **Memory:** 128GB unified memory
- **CUDA:** 13.0
- **Optimization:** Muon optimizer for matrix parameters, AdamW for others
- **Checkpoint Step:** 000466
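The Muon/AdamW split above can be sketched as follows. This is an illustrative sketch of the grouping rule (2-D hidden-layer matrices go to Muon; embeddings, norms, and other 1-D tensors go to AdamW), not NanoChat's actual state dict: the parameter names and shapes below are made up for the example.

```python
# Illustrative parameter names and shapes (NOT the real NanoChat model).
params = {
    "wte.weight": (65536, 1280),          # token embedding -> AdamW by convention
    "h.0.attn.qkv.weight": (3840, 1280),  # 2-D hidden matrix -> Muon
    "h.0.mlp.fc.weight": (5120, 1280),    # 2-D hidden matrix -> Muon
    "h.0.ln.weight": (1280,),             # 1-D norm gain -> AdamW
}

def split_for_optimizers(named_shapes):
    """Partition parameters by rank: 2-D matrices for Muon, the rest for AdamW."""
    muon, adamw = [], []
    for name, shape in named_shapes.items():
        # Muon's orthogonalized update only makes sense for matrices;
        # embeddings are usually excluded even though they are 2-D.
        if len(shape) == 2 and "wte" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

muon_params, adamw_params = split_for_optimizers(params)
print("Muon: ", muon_params)
print("AdamW:", adamw_params)
```

In practice each group would then be handed to its own optimizer instance and both would be stepped every iteration.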
## Usage
### Prerequisites
```bash
# Clone the NanoChat repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Install dependencies (requires CUDA)
uv venv
uv sync --extra gpu
# Activate the virtual environment
source .venv/bin/activate
```
### Alternative: DGX Spark Setup Script
```bash
# Prepare environment and clone NanoChat
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/prepare.sh
chmod +x prepare.sh
./prepare.sh --setup-only
```
### Quick Test
Download and test this model from HuggingFace:
```bash
# Clone the test script
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/hf_test.py
# Activate the NanoChat virtual environment
source nanochat/.venv/bin/activate
# Install dependencies
pip install huggingface_hub
# Run with this model
python hf_test.py --model jasonacox/nanochat-1.8B-rl
```
### Example Code
```python
import sys
import os
import glob
from contextlib import nullcontext

import torch
from huggingface_hub import snapshot_download

# Download model from HuggingFace
print("Downloading model...")
model_path = snapshot_download(
    repo_id="jasonacox/nanochat-1.8B-rl",
    cache_dir=os.path.expanduser("~/.cache/nanochat/hf_downloads"),
)

# Set up NanoChat (clone if needed)
nanochat_path = "nanochat"
if not os.path.exists(nanochat_path):
    os.system("git clone https://github.com/karpathy/nanochat.git")
    os.system("cd nanochat && uv sync --extra gpu")
sys.path.insert(0, nanochat_path)

from nanochat.checkpoint_manager import build_model
from nanochat.common import compute_init, autodetect_device_type
from nanochat.engine import Engine

# Initialize device and autocast context
device_type = autodetect_device_type()
_, _, _, _, device = compute_init(device_type)
ptdtype = torch.bfloat16
autocast_ctx = (
    torch.amp.autocast(device_type=device_type, dtype=ptdtype)
    if device_type == "cuda"
    else nullcontext()
)

# Load the checkpoint (pick the latest step if several are present)
checkpoint_files = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))
step = int(os.path.basename(checkpoint_files[-1]).split("_")[-1].split(".")[0])
model, tokenizer, _ = build_model(model_path, step, device, phase="eval")
engine = Engine(model, tokenizer)

# Generate a completion token by token
prompt = "Hello, how are you?"
tokens = tokenizer.encode(prompt)
print(f"Prompt: {prompt}\nResponse: ", end="", flush=True)
with autocast_ctx:
    for token_column, _ in engine.generate(
        tokens, num_samples=1, max_tokens=100, temperature=0.8, top_k=50
    ):
        print(tokenizer.decode([token_column[0]]), end="", flush=True)
print()
```
## Training Pipeline
This model was trained using the DGX Spark optimized training pipeline:
1. **Pretraining:** Base language model on FineWeb-EDU dataset
2. **Midtraining:** Fine-tuned on conversational data (SmolTalk)
3. **SFT:** Supervised fine-tuning on curated conversations
4. **RL:** Reinforcement learning with GRPO
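The core idea behind the GRPO stage can be sketched numerically. GRPO samples a group of completions per prompt, scores each one, and uses group-relative advantages (reward minus the group mean, normalized by the group standard deviation) instead of a learned value function. The rewards below are made up for illustration (e.g. 1 if a sampled math answer was correct, 0 otherwise):

```python
# Illustrative rewards for 4 sampled answers to one math problem.
rewards = [1.0, 0.0, 1.0, 0.0]

# Group statistics: mean and (population) standard deviation.
mean = sum(rewards) / len(rewards)
var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
std = var ** 0.5 or 1.0  # guard against a zero-variance group

# Group-relative advantages: correct samples get a positive advantage,
# wrong ones a negative one, with no value network required.
advantages = [(r - mean) / std for r in rewards]
print(advantages)  # → [1.0, -1.0, 1.0, -1.0]
```

These per-sample advantages then weight the policy-gradient update for the corresponding completions.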
## Limitations
- This is a micro-model (~1.9B parameters), far smaller than commercial LLMs
- May make factual errors or hallucinate
- Limited knowledge cutoff from training data
- Best suited for educational purposes and experimentation
## Citation
```bibtex
@misc{nanochat-1.8B,
  author       = {jasonacox},
  title        = {nanochat-1.8B-rl},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jasonacox/nanochat-1.8B-rl}}
}
```
## Acknowledgments
- Andrej Karpathy for [NanoChat](https://github.com/karpathy/nanochat)
- NVIDIA DGX Spark platform
- FineWeb-EDU and SmolTalk datasets
## License
MIT License. Free to use, including for research and educational purposes.