Alexandre-Numind committed on
Commit 8dd9b58 · verified · 1 parent: 2a4020c

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -63,7 +63,6 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to
  1. **SFT**: One-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
  2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
 
- **Model before GRPO loses 80% time vs post-GRPO model (see win-rate matrix)**
 
  ## Example:
 