This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO".
## Training Process

1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data, as illustrated in [data_prepare](data_prepare/readme.md).
2. Replace the (Flash) Attention module with the Ulysses (Flash) Attention module via a monkey patch to enable sequence parallelism.
3. Use our custom LongPO trainer, `LongPOMTLMUlyssesTrainer`.
4. Run the training script (using Mistral-7B-Instruct-v0.2 as an example):
```
export training_length=131072
export gradient_accumulation_steps=8
export batch_size=1

accelerate launch \
    --config_file playground/accelerate_single_node_zero3.yaml \
    train/train_longpo.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --ref_model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path /path/to/data \
    --bf16 True \
    --run_name mistral_longpo \
    --report_to wandb \
    --output_dir path/to/save \
    --num_train_epochs 1 \
    --per_device_train_batch_size $batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --save_strategy "steps" \
    --save_steps 500 \
    --evaluation_strategy "no" \
    --learning_rate 5e-7 \
    --weight_decay 0. \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "cosine" \
    --optim "rmsprop" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length $training_length \
    --gradient_checkpointing True \
    --do_train True \
    --do_eval False \
    --do_predict False \
    --seed 42 \
    --use_sequence_parallel True \
    --dpo_beta 0.01 \
    --dpo_lambda 0.01 \
    --rope_theta 10000000
```
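The monkey patch in step 2 can be sketched as follows. This is a minimal illustration of the patching technique only; the class and function names (`VanillaAttention`, `ulysses_attention_forward`) are placeholders, not the repo's actual modules, and a real Ulysses patch would shard the sequence dimension across GPUs with all-to-all communication.

```python
# Minimal sketch of attention monkey-patching (illustrative names only):
# reassign the attention class's forward method so every instance created
# afterwards routes through the sequence-parallel variant.

class VanillaAttention:
    def forward(self, hidden_states):
        return f"full-attn({hidden_states})"

def ulysses_attention_forward(self, hidden_states):
    # A real Ulysses implementation would split the sequence dimension
    # across ranks and exchange head/sequence shards via all-to-all here.
    return f"ulysses-attn({hidden_states})"

def apply_monkey_patch():
    # Swap the method on the class itself, so all instances pick it up.
    VanillaAttention.forward = ulysses_attention_forward

apply_monkey_patch()
attn = VanillaAttention()
print(attn.forward("x"))  # prints: ulysses-attn(x)
```

Patching the class (rather than one instance) is what lets this work before the model is even instantiated, which is why it must run before `from_pretrained` builds the attention layers.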
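The `--dpo_beta` flag suggests a DPO-style preference objective; as a rough orientation, here is a sketch of the standard DPO loss for one preference pair. This is the generic DPO formula, not the exact LongPO objective (see the paper for that), and the role of `--dpo_lambda` as a weight on an auxiliary term is our assumption.

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.01):
    """Standard DPO loss for one (chosen w, rejected l) pair.

    pi_logp_*  : summed response log-probs under the policy being trained
    ref_logp_* : summed response log-probs under the frozen reference model
    beta       : temperature, corresponding to --dpo_beta above
    """
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# The loss falls when the policy prefers the chosen response more strongly
# than the reference does, and rises in the opposite case.
print(dpo_loss(-10.0, -50.0, -20.0, -40.0) < dpo_loss(-50.0, -10.0, -40.0, -20.0))  # True
```

With no policy/reference divergence the margin is zero and the loss is `log 2`, which is a handy sanity check when wiring up a trainer.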
## Evaluation