This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO".
## Training Process

1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data, as illustrated in [data_prepare](data_prepare/readme.md).
2. Replace the (Flash) Attention module with the Ulysses (Flash) Attention module via a monkey patch to enable sequence parallelism.
3. Use our custom LongPO trainer, `LongPOMTLMUlyssesTrainer`.
4. Run the training script (using Mistral-7B-Instruct-v0.2 as an example):
```
export training_length=131072
export gradient_accumulation_steps=8
export batch_size=1

accelerate launch \
    --config_file playground/accelerate_single_node_zero3.yaml \
    train/train_longpo.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --ref_model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path /path/to/data \
    --bf16 True \
    --run_name mistral_longpo \
    --report_to wandb \
    --output_dir path/to/save \
    --num_train_epochs 1 \
    --per_device_train_batch_size $batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --save_strategy "steps" \
    --save_steps 500 \
    --evaluation_strategy "no" \
    --learning_rate 5e-7 \
    --weight_decay 0. \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "cosine" \
    --optim "rmsprop" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length $training_length \
    --gradient_checkpointing True \
    --do_train True \
    --do_eval False \
    --do_predict False \
    --seed 42 \
    --use_sequence_parallel True \
    --dpo_beta 0.01 \
    --dpo_lambda 0.01 \
    --rope_theta 10000000
```
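The monkey patch in step 2 can be sketched as follows. This is a minimal illustration of the patching technique only; the class and function names (`VanillaAttention`, `ulysses_attention_forward`) are placeholders, not the repo's actual modules, and a real Ulysses patch would shard the sequence dimension across GPUs with all-to-all communication.

```python
# Minimal sketch of attention monkey-patching (illustrative names only):
# reassign the attention class's forward method so every instance created
# afterwards routes through the sequence-parallel variant.

class VanillaAttention:
    def forward(self, hidden_states):
        return f"full-attn({hidden_states})"

def ulysses_attention_forward(self, hidden_states):
    # A real Ulysses implementation would split the sequence dimension
    # across ranks and exchange head/sequence shards via all-to-all here.
    return f"ulysses-attn({hidden_states})"

def apply_monkey_patch():
    # Swap the method on the class itself, so all instances pick it up.
    VanillaAttention.forward = ulysses_attention_forward

apply_monkey_patch()
attn = VanillaAttention()
print(attn.forward("x"))  # prints: ulysses-attn(x)
```

Patching the class (rather than one instance) is what lets this work before the model is even instantiated, which is why it must run before `from_pretrained` builds the attention layers.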
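The `--dpo_beta` flag suggests a DPO-style preference objective; as a rough orientation, here is a sketch of the standard DPO loss for one preference pair. This is the generic DPO formula, not the exact LongPO objective (see the paper for that), and the role of `--dpo_lambda` as a weight on an auxiliary term is our assumption.

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.01):
    """Standard DPO loss for one (chosen w, rejected l) pair.

    pi_logp_*  : summed response log-probs under the policy being trained
    ref_logp_* : summed response log-probs under the frozen reference model
    beta       : temperature, corresponding to --dpo_beta above
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# The loss falls when the policy prefers the chosen response more strongly
# than the reference does, and rises in the opposite case.
print(dpo_loss(-10.0, -50.0, -20.0, -40.0) < dpo_loss(-50.0, -10.0, -40.0, -20.0))  # True
```

With no policy/reference divergence the margin is zero and the loss is `log 2`, which is a handy sanity check when wiring up a trainer.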
## Evaluation