Text Generation · Transformers · Safetensors · qwen2 · conversational · text-generation-inference
Guanzheng committed · Commit 5f190cd · verified · 1 Parent(s): c95866d

Update README.md

Files changed (1): README.md +0 -49
README.md CHANGED
@@ -47,55 +47,6 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 
 
 
- ## Training Process:
-
- 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data, as illustrated in [data_prepare](data_prepare/readme.md).
-
- 2. Replace the (Flash) Attention module with the Ulysses (Flash) Attention module via a monkey patch to apply sequence parallelism.
-
- 3. Use our custom LongPO trainer: `LongPOMTLMUlyssesTrainer`.
-
- 4. Train script (using Mistral-7B-Instruct-v0.2 as an example):
-
- ```
- export training_length=131072
- export gradient_accumulation_steps=8
- export batch_size=1
-
- accelerate launch \
-     --config_file playground/accelerate_single_node_zero3.yaml \
-     train/train_longpo.py \
-     --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
-     --ref_model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
-     --data_path /path/to/data \
-     --bf16 True \
-     --run_name mistral_longpo \
-     --report_to wandb \
-     --output_dir /path/to/save \
-     --num_train_epochs 1 \
-     --per_device_train_batch_size $batch_size \
-     --gradient_accumulation_steps $gradient_accumulation_steps \
-     --save_strategy "steps" \
-     --save_steps 500 \
-     --evaluation_strategy "no" \
-     --learning_rate 5e-7 \
-     --weight_decay 0. \
-     --warmup_ratio 0.1 \
-     --lr_scheduler_type "cosine" \
-     --optim "rmsprop" \
-     --logging_steps 1 \
-     --tf32 True \
-     --model_max_length $training_length \
-     --gradient_checkpointing True \
-     --do_train True \
-     --do_eval False \
-     --do_predict False \
-     --seed 42 \
-     --use_sequence_parallel True \
-     --dpo_beta 0.01 \
-     --dpo_lambda 0.01 \
-     --rope_theta 10000000
- ```
 
 ## Evaluation
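For reference, step 2 of the removed section swaps the model's attention for a sequence-parallel Ulysses variant via a monkey patch. A minimal, self-contained sketch of that pattern is below; the classes here are toy stand-ins (the real patch targets the (Flash) Attention module in `transformers` and swaps in the actual Ulysses implementation, which is not reproduced here):

```python
class ToyAttention:
    """Toy stand-in for a framework attention module."""
    def forward(self, x):
        return f"dense-attention({x})"

def ulysses_forward(self, x):
    # Stand-in for Ulysses (Flash) Attention: the real forward would shard
    # the sequence across GPUs with all-to-all communication before attention.
    return f"ulysses-attention({x})"

def apply_monkey_patch():
    # Patch the class, not an instance, so every layer created afterwards
    # picks up the sequence-parallel forward.
    ToyAttention.forward = ulysses_forward

apply_monkey_patch()
print(ToyAttention().forward("tokens"))  # ulysses-attention(tokens)
```

Patching the class before the model is instantiated is what lets a single assignment affect all attention layers at once.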
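The removed train script passes `--dpo_beta` to what is presumably a DPO-style preference objective over the self-generated short-to-long pairs. As a reference point, the standard DPO loss on one preference pair can be sketched as follows (hedged: LongPO's exact objective, and the role of `--dpo_lambda`, are defined in the paper and are not reconstructed here):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    # Log-ratio of the policy vs. the frozen reference model per response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): pushes the chosen response's ratio
    # above the rejected response's ratio.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With a zero preference margin the loss is -log(0.5) ~= 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

A small `beta` such as the script's 0.01 keeps the implied KL penalty against the reference model strong, which matters here because the reference is the same short-context instruct checkpoint being extended.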