See axolotl config

axolotl version: 0.13.0.dev0

adapter: lora
base_model: Qwen/Qwen2.5-72B-Instruct
load_in_4bit: true
bnb_4bit_compute_dtype: bfloat16
bnb_4bit_use_double_quant: true
bnb_4bit_quant_type: nf4

datasets:
  - path: ./patched_dataset/data.jsonl
    type: alpaca

val_set_size: 0.05
output_dir: ./outputs/qwen80b_qlora_run

micro_batch_size: 1
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 2e-4

lora_alpha: 16
lora_r: 8
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

sequence_len: 1024
train_on_inputs: false
optimizer: paged_adamw_8bit

bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

warmup_ratio: 0.03
weight_decay: 0.01
logging_steps: 10
saves_per_epoch: 1
evals_per_epoch: 1
save_total_limit: 2

device_map: "auto"
low_cpu_mem_usage: true
torch_dtype: bfloat16
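
The LoRA settings in the config above (lora_r: 8, lora_alpha: 16, seven target projections) determine both the update scaling and the adapter size. The sketch below works through that arithmetic. The layer shapes (hidden size 8192, key/value projection width 1024, MLP width 29568, 80 layers) are assumptions about the base model that do not appear in this card and should be checked against the base model's config.json; the helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int = 8       # lora_r from the config
    alpha: int = 16  # lora_alpha from the config

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # A has shape (r, in_features) and B has shape (out_features, r).
    return r * (in_features + out_features)


# ASSUMED shapes for the base model; verify against its config.json.
HIDDEN = 8192
KV_DIM = 1024
INTERMEDIATE = 29568
NUM_LAYERS = 80

# (in_features, out_features) for each entry in lora_target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def adapter_param_count(settings: LoraSettings,
                        shapes: dict = TARGET_SHAPES,
                        num_layers: int = NUM_LAYERS) -> int:
    # Sum the adapter parameters of one decoder layer, then repeat per layer.
    per_layer = sum(lora_params(i, o, settings.r) for i, o in shapes.values())
    return per_layer * num_layers


if __name__ == "__main__":
    cfg = LoraSettings()
    print("LoRA scaling:", cfg.scaling)
    print("Trainable adapter parameters:", adapter_param_count(cfg))
```

Under these assumed shapes the adapter is on the order of a hundred million parameters, a small fraction of the frozen 4-bit base model.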

outputs/qwen80b_qlora_run

This model is a LoRA adapter fine-tuned from Qwen/Qwen2.5-72B-Instruct (loaded in 4-bit) on the ./patched_dataset/data.jsonl dataset. At the final step (epoch 3, step 90) it achieves the following results on the evaluation set:

Loss: 1.8941
Memory/max Active (GiB): 43.77
Memory/max Allocated (GiB): 43.77
Memory/device Reserved (GiB): 45.94

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: paged_adamw_8bit (OptimizerNames.PAGED_ADAMW_8BIT) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 2
training_steps: 90
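
The derived values in this list follow from the config by simple arithmetic, sketched below. Two points are assumptions rather than facts stated in the card: a data-parallel world size of 1 (consistent with total_train_batch_size: 8), and truncation when converting warmup_ratio to warmup steps (consistent with the reported 2, though the trainer's exact rounding rule is not shown here). The 30 steps per epoch are read off the training results table.

```python
def effective_batch_size(micro_batch_size: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    # Samples contributing to one optimizer step.
    # num_devices=1 is an ASSUMPTION matching total_train_batch_size: 8.
    return micro_batch_size * grad_accum_steps * num_devices


def total_steps(steps_per_epoch: int, num_epochs: int) -> int:
    # Optimizer steps over the whole run.
    return steps_per_epoch * num_epochs


def warmup_steps(warmup_ratio: float, training_steps: int) -> int:
    # ASSUMPTION: the ratio is converted by truncation, which
    # reproduces the lr_scheduler_warmup_steps reported in the card.
    return int(warmup_ratio * training_steps)


if __name__ == "__main__":
    steps = total_steps(steps_per_epoch=30, num_epochs=3)
    print("total_train_batch_size:", effective_batch_size(1, 8))
    print("training_steps:", steps)
    print("lr_scheduler_warmup_steps:", warmup_steps(0.03, steps))
```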

Training results

Training Loss	Epoch	Step	Validation Loss	Active (GiB)	Allocated (GiB)	Reserved (GiB)
No log	0	0	2.5724	43.65	43.65	52.31
2.0549	1.0	30	1.8877	43.77	43.77	45.94
1.6302	2.0	60	1.8321	43.77	43.77	45.94
1.3038	3.0	90	1.8941	43.77	43.77	45.94
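
In the table above, validation loss is lowest at step 60 (1.8321) and rises again at step 90 (1.8941) while training loss keeps falling, a pattern often read as the onset of overfitting. The sketch below picks the evaluation row with the lowest validation loss and converts a loss to perplexity. The rows are copied from the table; the function names are illustrative, and whether the step-60 checkpoint is still available is not stated in this card.

```python
import math

# (step, epoch, training_loss, validation_loss), copied from the table.
# training_loss is None where the card reports "No log".
EVAL_LOG = [
    (0, 0.0, None, 2.5724),
    (30, 1.0, 2.0549, 1.8877),
    (60, 2.0, 1.6302, 1.8321),
    (90, 3.0, 1.3038, 1.8941),
]


def best_row(log):
    # Row with the lowest validation loss.
    return min(log, key=lambda row: row[3])


def perplexity(loss: float) -> float:
    # Perplexity is the exponential of the mean cross-entropy loss.
    return math.exp(loss)


def generalization_gap(row) -> float:
    # Validation loss minus training loss; a widening gap suggests overfitting.
    _, _, train_loss, val_loss = row
    return val_loss - train_loss


if __name__ == "__main__":
    step, epoch, _, val_loss = best_row(EVAL_LOG)
    print(f"best validation loss {val_loss} at step {step} (epoch {epoch})")
    print(f"perplexity at that step: {perplexity(val_loss):.2f}")
    for row in EVAL_LOG[1:]:
        print(f"step {row[0]}: gap {generalization_gap(row):.4f}")
```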

Framework versions

PEFT 0.17.1
Transformers 4.57.0
Pytorch 2.7.1+cu126
Datasets 4.0.0
Tokenizers 0.22.1
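
When reproducing the run, it can help to compare the local environment with the versions listed above. A minimal sketch using only the standard library follows; the distribution names (peft, transformers, torch, datasets, tokenizers) are the usual package names and are an assumption here, as are the helper names.

```python
from importlib import metadata

# Versions as listed in this card.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.57.0",
    "torch": "2.7.1+cu126",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def installed_versions(names):
    # Look up each distribution; None means it is not installed.
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    # Map each differing package to (expected_version, installed_version).
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    diff = mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in diff.items():
        print(f"{name}: card lists {want}, found {have}")
```

An exact string match is deliberately strict; a different CUDA build suffix on torch will be reported as a mismatch even if the release number is the same.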

Model tree for htarikk/qwen80b-style-lora

Base model: Qwen/Qwen2.5-72B
Finetuned: Qwen/Qwen2.5-72B-Instruct
Adapter: this model
