AzalKhan's picture
Upload README.md with huggingface_hub
cf5b231 verified
metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
datasets:
  - open-r1/DAPO-Math-17k-Processed
library_name: transformers
tags:
  - grpo
  - reinforcement-learning
  - reasoning
  - qwen

Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_294_FlashRL_G4-L1024

This repository contains a checkpoint trained with GRPO on open-r1/DAPO-Math-17k-Processed starting from Qwen/Qwen2.5-1.5B-Instruct.
This snapshot corresponds to training step 294.

Contents include:

  • Model weights (.safetensors)
  • Config files (config.json, generation_config.json)
  • Tokenizer files (tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, special_tokens_map.json, added_tokens.json)
  • Optional chat template (chat_template.jinja)

Training artifacts (optimizer/scheduler states and RNG) have been intentionally excluded.