update bias section
README.md
CHANGED
@@ -620,26 +620,21 @@ This instruction-tuned variant has been trained with a mixture of 276k English,
 
 We examine the presence of undesired societal and cognitive biases present in this model using different benchmarks. For societal biases,
 we test performance using the BBQ dataset (Parrish et al., 2022) in the original English and the Regard dataset (Sheng et al., 2019).
-We report that while performance is high (accuracies
-the model performs very poorly in ambiguous settings, which
-implying that outputs can be influenced by the prompts.
-
-We highlight that these results can be expected from a pretrained model that has not yet been instruction-tuned or aligned.
-These tests are performed in order to show the biases the model may contain.
-We urge developers to take them into account and perform safety testing and tuning tailored to their specific applications of the model.
+We report that while performance is high (accuracies around 0.8 depending on the social category) in disambiguated settings,
+the model performs very poorly in ambiguous settings, which indicates the presence of societal biases that need to be further addressed in post-training phases.
+
+Our cognitive bias analysis focuses on positional effects in 0-shot settings and majority class bias in few-shot settings.
+For positional effects, we leverage the ARC Multiple Choice Question dataset (Clark et al., 2018). We observe significant
+but relatively weak primacy effects, whereby the model shows a preference for answers towards the beginning of the list of provided answers.
+We measure majority class effects in few-shot settings using SST-2 (Socher et al., 2013). We again detect significant effects,
+with a small effect size. This suggests that the model is relatively robust against the examined cognitive biases.
+
+We highlight that our analyses of these biases are by no means exhaustive and are limited by the relative scarcity of adequate resources
+in all languages present in the training data. We aim to gradually extend and expand our analyses in future work.
+
+These results can be expected from a model that has undergone only preliminary instruction tuning.
+These tests are performed in order to show the biases the model may contain. We urge developers to take
+them into account and perform safety testing and tuning tailored to their specific applications of the model.
 
 ---
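The three probes described in the updated section are only sketched at a high level, so the snippets below illustrate one way each could be implemented. They are rough sketches, not the evaluation code behind the reported numbers. First, the BBQ-style societal-bias check: score every answer option with the model and compare accuracy between ambiguous and disambiguated contexts. The `BBQExample` layout and `score_option` are assumptions; `score_option` is a dummy stand-in for the model's log-likelihood of an option given the prompt.

```python
# Minimal sketch, not the authors' evaluation code. `score_option` is a
# hypothetical stand-in for the model's log-likelihood scorer.
from dataclasses import dataclass
from typing import List

@dataclass
class BBQExample:
    context: str        # may or may not disambiguate who the answer refers to
    question: str
    options: List[str]  # answer choices, including an "unknown" option
    label: int          # index of the correct option
    ambiguous: bool     # True if the context leaves the answer underdetermined

def score_option(context: str, question: str, option: str) -> float:
    # Placeholder: a real run would return the model's log-probability of
    # `option` as a continuation of the prompt. Dummy value so this executes.
    return -float(len(option))

def accuracy(examples: List[BBQExample]) -> float:
    correct = 0
    for ex in examples:
        scores = [score_option(ex.context, ex.question, o) for o in ex.options]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == ex.label)
    return correct / len(examples) if examples else float("nan")

def report(examples: List[BBQExample]) -> None:
    # The section above reports high accuracy (~0.8) on disambiguated contexts
    # but poor accuracy on ambiguous ones, where "unknown" is the fair answer.
    print(f"disambiguated acc: {accuracy([e for e in examples if not e.ambiguous]):.3f}")
    print(f"ambiguous acc:     {accuracy([e for e in examples if e.ambiguous]):.3f}")
```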
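The primacy-effect probe on ARC-style multiple-choice questions can be approximated by shuffling the answer options and counting where the model's top pick lands. With content decoupled from position, a position-agnostic model picks each slot at roughly chance rate; a primacy effect shows up as excess mass on early positions. `pick_position` is again a hypothetical placeholder for an argmax over per-option model scores.

```python
# Sketch of a positional-effect probe; assumed protocol, not the authors' code.
import random
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def pick_position(question: str, options: List[str]) -> int:
    # Placeholder for the model's choice among the presented options
    # (in a real run, the argmax of per-option log-likelihoods).
    return random.randrange(len(options))

def positional_rates(questions: Iterable[Tuple[str, List[str]]],
                     n_shuffles: int = 4, seed: int = 0) -> Dict[int, float]:
    rng = random.Random(seed)
    counts: Counter = Counter()
    total = 0
    for question, options in questions:
        for _ in range(n_shuffles):
            perm = options[:]
            rng.shuffle(perm)  # decouple answer content from answer position
            counts[pick_position(question, perm)] += 1
            total += 1
    # Position-agnostic behaviour: every rate is about 1/len(options).
    # Primacy effect: the rate at position 0 sits above chance.
    return {pos: c / max(total, 1) for pos, c in sorted(counts.items())}
```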
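Finally, majority-class bias in few-shot prompts can be probed by holding the test input fixed while skewing the label ratio of the in-context examples; if the predicted label distribution tracks the skew rather than the input, the model is leaning on the majority in-context label. The SST-2-style prompt template and the `classify` placeholder are illustrative assumptions.

```python
# Sketch of a majority-class probe for few-shot prompts; template and model
# call are assumptions, not the authors' setup.
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (sentence, "positive" | "negative")

def build_prompt(shots: List[Example], test_sentence: str) -> str:
    # Illustrative SST-2-style template.
    blocks = [f"Review: {s}\nSentiment: {y}" for s, y in shots]
    blocks.append(f"Review: {test_sentence}\nSentiment:")
    return "\n\n".join(blocks)

def skewed_shots(pool: List[Example], k: int, positive_fraction: float,
                 rng: random.Random) -> List[Example]:
    # Draw k in-context examples with a controlled positive/negative ratio.
    n_pos = round(k * positive_fraction)
    pos = [ex for ex in pool if ex[1] == "positive"]
    neg = [ex for ex in pool if ex[1] == "negative"]
    shots = rng.sample(pos, n_pos) + rng.sample(neg, k - n_pos)
    rng.shuffle(shots)  # avoid confounding label ratio with label position
    return shots

def classify(prompt: str) -> str:
    # Placeholder for the model call (e.g. comparing the log-likelihoods of
    # " positive" vs. " negative" as continuations). Dummy value here.
    return "positive"

def positive_rate(pool: List[Example], test_sentence: str, k: int = 8,
                  positive_fraction: float = 0.5, trials: int = 50,
                  seed: int = 0) -> float:
    # Majority-class bias: this rate tracking `positive_fraction` rather than
    # the fixed test sentence indicates sensitivity to in-context labels.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shots = skewed_shots(pool, k, positive_fraction, rng)
        hits += classify(build_prompt(shots, test_sentence)) == "positive"
    return hits / trials
```

Comparing `positive_rate(...)` at, say, `positive_fraction=0.25` versus `0.75` for the same test sentence gives a simple effect-size estimate for the bias the section describes as significant but small.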