# TinyLlama_v1.1 MARS PEFT Benchmark
This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against various PEFT (Parameter-Efficient Fine-Tuning) methods across different ranks.
## Overview
We evaluated multiple PEFT methods including:
- LoRA (Low-Rank Adaptation)
- LoRA-XS (Low-Rank Adaptation with an eXtremely Small number of parameters)
- LoHA (Low-Rank Hadamard Adaptation)
- VB LoRA (Vector Bank LoRA)
- QLoRA (Quantized LoRA with NF4)
- MARS OPT0 & OPT1 (our method with different optimization levels)
- QMARS (Quantized MARS with NF4)
Each method was tested at multiple ranks (r=2, 8, and 32 for most methods; r=16, 64, and 256 for LoRA-XS) on six common language understanding benchmarks:
- ARC-E (AI2 Reasoning Challenge - Easy)
- ARC-C (AI2 Reasoning Challenge - Challenge)
- Winogrande (Commonsense reasoning)
- BoolQ (Boolean question answering)
- LogiQA (Logical reasoning)
- HellaSwag (Commonsense inference)
Both full-precision and quantized (nf4, fp4, int8) variants were evaluated to assess the performance-efficiency trade-off at each parameter budget.
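The quantized variants trade reconstruction fidelity for memory. The sketch below uses plain symmetric absmax rounding — an illustration only, not the actual NF4/FP4 schemes implemented by bitsandbytes — to show why 4-bit storage is lossier than 8-bit:

```python
import random

def absmax_quantize(values, bits):
    # Symmetric absmax quantization: scale to the signed integer range,
    # round to the nearest level, then dequantize back to float.
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]

err8 = sum(abs(w - q) for w, q in zip(weights, absmax_quantize(weights, 8))) / len(weights)
err4 = sum(abs(w - q) for w, q in zip(weights, absmax_quantize(weights, 4))) / len(weights)

# Fewer bits means coarser levels and a larger reconstruction error.
print(f"mean abs error  8-bit: {err8:.4f}  4-bit: {err4:.4f}")
```

In practice nf4 narrows this gap by placing its 16 levels according to a normal prior over the weights, which is why the nf4 rows above remain competitive with the full-precision ones.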
## Results Summary
The tables below report, for each method, per-benchmark scores averaged over all ranks (overall averages) followed by detailed per-rank results.
### Overall Averages
| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | 0.624 |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | 0.576 |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | 0.573 |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | 0.568 |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | 0.567 |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | 0.565 |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | 0.564 |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | 0.559 |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | 0.549 |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | 0.542 |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | 0.432 |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | 0.422 |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | 0.393 |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | 0.373 |
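As a sanity check, the Overall average column appears to be the unweighted mean of the six benchmark scores. Reproducing it for the top row:

```python
# Scores for the QMARS (nf4) row of the Overall Averages table.
qmars_nf4 = {
    "ARC-E": 0.669, "ARC-C": 0.621, "Winogrande": 0.530,
    "BoolQ": 0.787, "LogiQA": 0.341, "HellaSwag": 0.798,
}

# Unweighted mean over the six benchmarks.
overall = sum(qmars_nf4.values()) / len(qmars_nf4)
print(round(overall, 3))  # 0.624, matching the table
```

Small discrepancies on other rows (e.g. 0.576 vs. a recomputed 0.577 for MARS OPT1) presumably come from averaging unrounded per-rank scores.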
### Rank r=2
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | 0.368 |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | 0.494 |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | 0.502 |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | 0.479 |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | 0.490 |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | 0.367 |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | 0.463 |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | 0.447 |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | 0.445 |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | 0.540 |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | 0.463 |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | 0.532 |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | 0.575 |
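The Trainable Params column for LoRA is consistent with adapters on every linear projection of TinyLlama_v1.1. A rough sketch under assumed TinyLlama dimensions (hidden 2048, intermediate 5632, 22 layers, grouped-query KV width 256 — taken from the public TinyLlama config, not stated in this card):

```python
# Assumed TinyLlama_v1.1 dimensions (see lead-in; these are not from this card).
HIDDEN, INTER, LAYERS, KV = 2048, 5632, 22, 256

def lora_params(r):
    # Each adapted matrix of shape (d_in, d_out) adds r * (d_in + d_out)
    # parameters: an A of shape (r, d_in) plus a B of shape (d_out, r).
    shapes = [
        (HIDDEN, HIDDEN),  # q_proj
        (HIDDEN, KV),      # k_proj
        (HIDDEN, KV),      # v_proj
        (HIDDEN, HIDDEN),  # o_proj
        (HIDDEN, INTER),   # gate_proj
        (HIDDEN, INTER),   # up_proj
        (INTER, HIDDEN),   # down_proj
    ]
    return LAYERS * sum(r * (d_in + d_out) for d_in, d_out in shapes)

print(lora_params(2))  # 1576960 (~1.6M), matching the reported LoRA r=2 count
```

The same formula scales linearly in r, which matches the 6.3M reported at r=8 and 25.2M at r=32.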
### Rank r=8
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | 0.379 |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | 0.544 |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | 0.572 |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | 0.439 |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | 0.538 |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | 0.371 |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | 0.621 |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | 0.622 |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | 0.627 |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | 0.593 |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | 0.603 |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | 0.606 |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | 0.644 |
### Rank r=16
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | 0.396 |
### Rank r=32
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | 0.432 |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | 0.588 |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | 0.620 |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | 0.349 |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | 0.620 |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | 0.382 |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | 0.610 |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | 0.650 |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | 0.628 |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | 0.596 |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | 0.611 |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | 0.567 |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | 0.654 |
### Rank r=64
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | 0.518 |
### Rank r=256
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | 0.381 |
## Base Model
The adapter checkpoints in martinkorelic/TinyLlama_v1.1-mars-peft-benchmark were trained on top of TinyLlama/TinyLlama_v1.1.