---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 29.61
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 42.59
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 32.50
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.32
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 9.88
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat SFT

This is the checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project for building an LLM, [nanochat](https://github.com/karpathy/nanochat).

## Usage

Install transformers from this specific branch:

```sh
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```

Then run this inference snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# return_dict=True yields a mapping (input_ids, attention_mask), which the
# **inputs unpacking and the inputs.input_ids access below both rely on
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```

## vLLM Integration

You can also serve the model with vLLM, using the same transformers branch installed above:

```sh
vllm serve nanochat-students/nanochat-d20 --enforce-eager
```

Then query the server:

```sh
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nanochat-students/nanochat-d20", "prompt": "What is the capital of France?", "max_tokens": 7, "temperature": 0}'
```
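
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the `vllm serve` command above is listening on `localhost:8000` and that the response follows the OpenAI-style completions schema (`choices[0].text`). The helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumed values, copied from the vllm serve and curl commands in this README
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "nanochat-students/nanochat-d20"


def build_completion_request(prompt, max_tokens=7, temperature=0.0):
    """Build the POST request for the completions endpoint (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

Once the server is running, `print(complete("What is the capital of France?"))` should print the model's continuation. Splitting request construction from sending keeps the payload easy to inspect without a live server.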

## Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

- run:
- source: mid
- dtype: bfloat16
- device_batch_size: 4
- num_epochs: 1
- max_iterations: -1
- target_examples_per_step: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0200
- eval_every: 100
- eval_steps: 100
- eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664
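
The iteration count in this report follows from the row count and the per-step example target. The sketch below reproduces the arithmetic; it assumes the final partial batch is dropped (floor division), and the `world_size` of 8 GPUs is an assumption, since the report does not state how many devices were used.

```python
# Values copied from the training report above
training_rows = 20_843
target_examples_per_step = 32
device_batch_size = 4
num_epochs = 1

# ASSUMPTION: 8 GPUs; the report does not state the world size
world_size = 8

# Each optimizer step consumes target_examples_per_step rows; assuming the
# leftover rows at the end of the epoch are dropped, this is a floor division
num_iterations = (training_rows // target_examples_per_step) * num_epochs

# Micro-batches accumulated per optimizer step under the assumed world size
examples_per_micro_step = device_batch_size * world_size
grad_accum_steps = target_examples_per_step // examples_per_micro_step

print(num_iterations)    # 651, matching "Number of iterations" in the report
print(grad_accum_steps)  # 1 under the assumed world size of 8
```

With fewer devices the same target of 32 examples per step would be reached through gradient accumulation, for example 8 micro-steps on a single GPU.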

## Chat SFT Evaluation

timestamp: 2025-10-14 20:29:59

- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988
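
The ChatCORE value can be related to the per-task accuracies. The sketch below assumes ChatCORE is the mean of baseline-centered accuracies, with a random-guess baseline of 0.25 for the four-way multiple-choice tasks and 0.0 for the generative tasks. These baselines are assumptions, not values stated in this report, although they do reproduce the reported number.

```python
# Accuracies copied from the evaluation report above
results = {
    "ARC-Easy": 0.4259,
    "ARC-Challenge": 0.2961,
    "MMLU": 0.3250,
    "GSM8K": 0.0432,
    "HumanEval": 0.0549,
}

# ASSUMPTION: random-guess baselines (4-way multiple choice -> 0.25,
# open-ended generation -> 0.0); not stated in the report
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}


def centered(acc, baseline):
    """Rescale accuracy so that chance level maps to 0 and a perfect score to 1."""
    return (acc - baseline) / (1.0 - baseline)


chatcore = sum(centered(results[t], baselines[t]) for t in results) / len(results)
print(f"{chatcore:.4f}")  # 0.0988, matching the reported ChatCORE metric
```

Centering explains why the aggregate is much lower than the raw ARC and MMLU accuracies: a score of 0.2961 on a four-way multiple-choice task is only slightly above chance.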

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio