---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 29.61
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 42.59
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 32.50
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.32
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 9.88
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat SFT

This is the supervised fine-tuning (SFT) checkpoint from [nanochat](https://github.com/karpathy/nanochat), [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project for building an LLM.

## Usage

Install transformers from this specific branch:

```sh
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```

Then, you can run this inference snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=False, dtype=torch.bfloat16
).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# return_dict=True returns input_ids and attention_mask, so they can be
# unpacked into generate() and indexed below
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
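Before tokenizing, `apply_chat_template` flattens the conversation into the model's turn format. The sketch below renders that string by hand so you can see what `add_generation_prompt=True` contributes. It assumes the transformers port's chat template uses the same special tokens as nanochat's own tokenizer (`<|bos|>`, `<|user_start|>`, `<|user_end|>`, `<|assistant_start|>`, `<|assistant_end|>`); inspect `tokenizer.chat_template` for the authoritative version. `render_prompt` is a hypothetical helper, and it only handles the user and assistant roles.

```python
# Assumption: these mirror nanochat's native special tokens; the real
# template shipped with the transformers port may differ.
BOS = "<|bos|>"
TAGS = {
    "user": ("<|user_start|>", "<|user_end|>"),
    "assistant": ("<|assistant_start|>", "<|assistant_end|>"),
}


def render_prompt(conversation: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: flatten a chat into one prompt string."""
    parts = [BOS]
    for message in conversation:
        start, end = TAGS[message["role"]]
        parts.append(f"{start}{message['content']}{end}")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append(TAGS["assistant"][0])
    return "".join(parts)


print(render_prompt([{"role": "user", "content": "What is the capital of France?"}]))
# <|bos|><|user_start|>What is the capital of France?<|user_end|><|assistant_start|>
```

Because the prompt ends on an open assistant turn, everything after the first `inputs.input_ids.shape[1]` tokens of the output is the model's reply, which is why the inference snippet slices there before decoding.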

## vLLM Integration

You can also serve the model with vLLM, using the transformers branch installed above:

```sh
vllm serve nanochat-students/nanochat-d20 --enforce-eager
```

Then query the server's OpenAI-compatible completions endpoint:

```sh
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nanochat-students/nanochat-d20", "prompt": "What is the capital of France?", "max_tokens": 7, "temperature": 0}'
```
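The `curl` call above sends a raw prompt to `/v1/completions`, so no chat template is applied. vLLM's OpenAI-compatible server also exposes `/v1/chat/completions`, which formats a `messages` list with the tokenizer's chat template. Below is a minimal standard-library client for that endpoint. It assumes the server is running at the default `http://localhost:8000` and that the tokenizer ships a chat template; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed defaults: a local server started with the `vllm serve` command above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "nanochat-students/nanochat-d20"


def build_chat_request(question: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(question)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply text. Keeping request construction separate from sending lets you inspect the payload without a live server.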

## Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

- run:
- source: mid
- dtype: bfloat16
- device_batch_size: 4
- num_epochs: 1
- max_iterations: -1
- target_examples_per_step: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0200
- eval_every: 100
- eval_steps: 100
- eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664
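The iteration count follows from the other values in this run configuration. The sketch below reproduces it under a few assumptions: each optimizer step consumes `target_examples_per_step` rows (floor division, so the last partial batch is dropped), each step is split into gradient-accumulation micro-batches of `device_batch_size` rows per GPU, and the learning rates start at `init_lr_frac` times their base value and decay linearly to zero, which is how nanochat's SFT script appears to schedule them. The card does not report the GPU count, so `world_size = 8` is a hypothetical value, and `lr_at` is a hypothetical helper.

```python
# Values copied from the Chat SFT training metrics above.
training_rows = 20_843
num_epochs = 1
device_batch_size = 4
target_examples_per_step = 32
init_lr_frac = 0.02
base_lrs = {"unembedding": 0.004, "embedding": 0.2, "matrix": 0.02}

# Hypothetical: the number of GPUs is not stated on this card.
world_size = 8

# Assumption: steps per epoch = floor(rows / examples per step).
num_iterations = (training_rows // target_examples_per_step) * num_epochs

# Micro-batches to accumulate so all GPUs together reach the target per step.
examples_per_micro_step = device_batch_size * world_size
assert target_examples_per_step % examples_per_micro_step == 0
grad_accum_steps = target_examples_per_step // examples_per_micro_step


def lr_at(step: int, base_lr: float) -> float:
    """Assumed schedule: start at init_lr_frac * base_lr, decay linearly to 0."""
    return base_lr * init_lr_frac * (1.0 - step / num_iterations)


print(num_iterations, grad_accum_steps)  # 651 1
print({name: lr_at(0, lr) for name, lr in base_lrs.items()})
```

The first line matches the reported 651 iterations: 32 × 651 = 20,832, leaving 11 rows unused. With fewer GPUs, `grad_accum_steps` grows (for example, 8 on a single GPU) while the number of optimizer steps stays the same.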

## Chat Evaluation (SFT)

timestamp: 2025-10-14 20:29:59

- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988
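The ChatCORE value can be reproduced from the five task scores above, assuming it is the mean of baseline-centered accuracies: each score is rescaled so random guessing maps to 0 and a perfect score to 1, using an assumed chance baseline of 0.25 for the multiple-choice tasks (ARC, MMLU, treated as 4-way) and 0 for the generative ones (GSM8K, HumanEval). Under that assumption the arithmetic lands on the reported 0.0988. The model-index front matter stores the same scores multiplied by 100.

```python
# Per-task accuracies copied from the SFT evaluation above.
accuracies = {
    "ARC-Easy": 0.4259,
    "ARC-Challenge": 0.2961,
    "MMLU": 0.3250,
    "GSM8K": 0.0432,
    "HumanEval": 0.0549,
}

# Assumed chance baselines: 4-way multiple choice vs. open-ended generation.
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}


def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so chance level maps to 0 and a perfect score to 1."""
    return (acc - baseline) / (1.0 - baseline)


chatcore = sum(centered(accuracies[t], baselines[t]) for t in accuracies) / len(accuracies)
print(f"ChatCORE ~ {chatcore:.4f}")  # ChatCORE ~ 0.0988

# The model-index front matter reports the same scores as percentages.
for task, acc in accuracies.items():
    print(f"{task}: {acc * 100:.2f}")
```

Centering explains why ChatCORE is much lower than the raw MMLU score: 0.3250 on MMLU is only 0.1 above chance after rescaling.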

Training logs are available in the trackio Space: https://huggingface.co/spaces/nanochat-students/trackio