# TinyLlama_v1.1 MARS PEFT Benchmark

This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against various PEFT (Parameter-Efficient Fine-Tuning) methods across different ranks.

## Overview

We evaluated multiple PEFT methods including:

- LoRA (Low-Rank Adaptation)
- LoRA-XS (Extra Small LoRA)
- LoHA (Low-Rank Hadamard Adaptation)
- VB LoRA (Vector Bank LoRA)
- QLoRA (Quantized LoRA with NF4)
- MARS OPT0 & OPT1 (our method with different optimization levels)
- QMARS (Quantized MARS with NF4)
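
The adapter checkpoints can be attached to the TinyLlama v1.1 base model with the Hugging Face `peft` library. A minimal loading sketch, assuming the adapters are stored in per-method/per-rank subfolders of this repository (the `subfolder` name below is a hypothetical placeholder; check the repository's file listing for the actual layout):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "TinyLlama/TinyLlama_v1.1"
ADAPTER_REPO = "martinkorelic/TinyLlama_v1.1-mars-peft-benchmark"

# Load the frozen base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Attach one adapter checkpoint; "lora_r8" is a placeholder subfolder name.
model = PeftModel.from_pretrained(base, ADAPTER_REPO, subfolder="lora_r8")

# For LoRA-style adapters the weights can be merged for plain inference.
model = model.merge_and_unload()
```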

Each method was tested at multiple ranks (r=2, 8, 16, 32, 64, 256 where applicable) on six common language understanding benchmarks:

- ARC-E (AI2 Reasoning Challenge - Easy)
- ARC-C (AI2 Reasoning Challenge - Challenge)
- Winogrande (Commonsense reasoning)
- BoolQ (Boolean question answering)
- LogiQA (Logical reasoning)
- HellaSwag (Commonsense inference)
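
These benchmark names map onto standard tasks in EleutherAI's lm-evaluation-harness. Assuming that harness was used (the card does not state the evaluation tool), a run over the six tasks for one adapter could look roughly like the sketch below; the `peft=` argument value is illustrative, not the exact command used here:

```python
import lm_eval

# Task names as registered in lm-evaluation-harness (v0.4+).
TASKS = ["arc_easy", "arc_challenge", "winogrande", "boolq", "logiqa", "hellaswag"]

# Evaluate the base model with one adapter attached via the `peft` model arg;
# the adapter path shown is illustrative.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=TinyLlama/TinyLlama_v1.1,"
        "peft=martinkorelic/TinyLlama_v1.1-mars-peft-benchmark"
    ),
    tasks=TASKS,
)
print(results["results"])  # per-task metrics such as accuracy
```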

Both non-quantized and quantized (nf4, fp4, int8) variants were evaluated to assess performance-efficiency trade-offs across different parameter budgets.
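
For the quantized rows, the base model would be loaded through `bitsandbytes` before attaching the adapter. A sketch of the nf4 setting used by QLoRA/QMARS, using the standard `transformers` quantization config (the compute dtype is an assumption, not stated in the card):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization (the QLoRA / QMARS setting). For the fp4 rows use
# bnb_4bit_quant_type="fp4"; for int8 use BitsAndBytesConfig(load_in_8bit=True).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption; not stated in the card
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama_v1.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```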

## Results Summary

The tables below give detailed performance comparisons: first the overall averages, where each method's per-benchmark score is averaged across all ranks it was run at, followed by per-rank breakdowns over the benchmark datasets.
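
How the aggregates are computed (inferred from the numbers): each per-rank Average is the mean over the six benchmarks, and each entry in the Overall Averages table is the mean of a method's per-benchmark scores over the ranks at which it was run. A short sketch reproducing two of the LoRA entries:

```python
# LoRA per-rank scores taken from the tables below (two benchmarks for brevity).
scores_by_rank = {
    2:  {"ARC-E": 0.286, "BoolQ": 0.760},
    8:  {"ARC-E": 0.385, "BoolQ": 0.800},
    32: {"ARC-E": 0.505, "BoolQ": 0.818},
}

# Overall per-benchmark score = mean across ranks (matches the Overall Averages table).
overall = {
    bench: sum(rank_scores[bench] for rank_scores in scores_by_rank.values())
    / len(scores_by_rank)
    for bench in ["ARC-E", "BoolQ"]
}
print({b: round(v, 3) for b, v in overall.items()})  # {'ARC-E': 0.392, 'BoolQ': 0.793}
```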


### Overall Averages

| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | 0.624 |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | 0.576 |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | 0.573 |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | 0.568 |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | 0.567 |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | 0.565 |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | 0.564 |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | 0.559 |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | 0.549 |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | 0.542 |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | 0.432 |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | 0.422 |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | 0.393 |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | 0.373 |

### Rank r=2

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | 0.368 |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | 0.494 |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | 0.502 |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | 0.479 |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | 0.490 |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | 0.367 |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | 0.463 |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | 0.447 |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | 0.445 |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | 0.540 |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | 0.463 |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | 0.532 |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | 0.575 |

### Rank r=8

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | 0.379 |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | 0.544 |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | 0.572 |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | 0.439 |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | 0.538 |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | 0.371 |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | 0.621 |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | 0.622 |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | 0.627 |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | 0.593 |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | 0.603 |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | 0.606 |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | 0.644 |

### Rank r=16

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | 0.396 |

### Rank r=32

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | 0.432 |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | 0.588 |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | 0.620 |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | 0.349 |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | 0.620 |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | 0.382 |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | 0.610 |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | 0.650 |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | 0.628 |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | 0.596 |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | 0.611 |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | 0.567 |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | 0.654 |

### Rank r=64

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | 0.518 |

### Rank r=256

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | 0.381 |