---
license: mit
language:
- it
base_model:
- EuroBERT/EuroBERT-210m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
- question answer
datasets:
- KRLabsOrg/ragtruth-it-translated
---

# LettuceDetect: Italian Hallucination Detection Model

<p align="center">
  <img src="https://github.com/KRLabsOrg/LettuceDetect/blob/feature/cn_llm_eval/assets/lettuce_detective_multi.png?raw=true" alt="LettuceDetect Logo" width="400"/>
</p>

**Model Name:** KRLabsOrg/lettucedect-210m-eurobert-it-v1

**Organization:** KRLabsOrg

**GitHub:** https://github.com/KRLabsOrg/LettuceDetect

## Overview

LettuceDetect is a transformer-based model that detects hallucinations in context-answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on **EuroBERT-210M**, chosen for its extended context support (up to **8192 tokens**) and strong multilingual capabilities. The long context window matters because deciding whether an answer is supported often requires processing long, detailed source documents.

**This is our Italian base model, built on the EuroBERT-210M architecture.**

## Model Details

- **Architecture:** EuroBERT-210M with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RAGTruth-IT (translated from the original RAGTruth dataset)
- **Language:** Italian

## How It Works

The model is trained to identify tokens in the Italian answer text that are not supported by the given context. During inference, it returns token-level predictions, which are then aggregated into spans, so users can see exactly which parts of the answer are considered hallucinated.
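
The token-to-span aggregation described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the function name `tokens_to_spans`, the use of label `1` for unsupported tokens, and taking the maximum token score as the span confidence are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def tokens_to_spans(
    answer: str,
    offsets: List[Tuple[int, int]],
    labels: List[int],
    scores: List[float],
) -> List[Dict]:
    """Merge runs of consecutive tokens labelled 1 into character spans.

    Hypothetical helper: label 1 means "unsupported" and the span
    confidence is the maximum token score (both are assumptions).
    """
    runs: List[Dict] = []
    current = None  # run being built, or None when outside a flagged run
    for (start, end), label, score in zip(offsets, labels, scores):
        if label == 1:
            if current is None:
                current = {"start": start, "end": end, "scores": [score]}
            else:
                # Extend the open run to cover this token as well.
                current["end"] = end
                current["scores"].append(score)
        elif current is not None:
            # A supported token closes the open run.
            runs.append(current)
            current = None
    if current is not None:
        runs.append(current)
    return [
        {
            "start": run["start"],
            "end": run["end"],
            "confidence": max(run["scores"]),
            "text": answer[run["start"]:run["end"]],
        }
        for run in runs
    ]


# Illustrative input: character offsets of each token in the answer.
example_answer = "Parigi ha 69 milioni di abitanti."
example_offsets = [(0, 6), (7, 9), (10, 12), (13, 20), (21, 23), (24, 32), (32, 33)]
example_labels = [0, 0, 1, 1, 0, 0, 0]
example_scores = [0.02, 0.05, 0.91, 0.84, 0.10, 0.03, 0.01]

example_spans = tokens_to_spans(example_answer, example_offsets, example_labels, example_scores)
print(example_spans)
```

Here the two adjacent flagged tokens ("69" and "milioni") are merged into a single span covering characters 10 to 20 of the answer.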

## Usage

### Installation

Install the `lettucedetect` package:

```bash
pip install lettucedetect
```

### Using the model

```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-210m-eurobert-it-v1",
    lang="it",
    trust_remote_code=True
)

contexts = ["La Francia è un paese in Europa. La capitale della Francia è Parigi. La popolazione della Francia è di 67 milioni."]
question = "Qual è la capitale della Francia? Qual è la popolazione della Francia?"
answer = "La capitale della Francia è Parigi. La popolazione della Francia è di 69 milioni."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Previsioni:", predictions)

# Previsioni: [{'start': 37, 'end': 83, 'confidence': 0.9231457829475403, 'text': ' La popolazione della Francia è di 69 milioni.'}]
```
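
Once span predictions are available, a small post-processing step can make them visible in the answer text. The sketch below is not part of `lettucedetect`: `highlight_spans` and its marker arguments are hypothetical, and it assumes that `start` and `end` are character offsets into the answer string with an exclusive end. The span in the example is built by hand rather than taken from the model.

```python
from typing import Dict, List


def highlight_spans(answer: str, spans: List[Dict], open_tag: str = "[[", close_tag: str = "]]") -> str:
    """Wrap each flagged character range of `answer` in marker strings.

    Hypothetical helper; assumes `start`/`end` are character offsets
    into `answer` and that `end` is exclusive.
    """
    pieces = []
    cursor = 0  # position in `answer` up to which text has been emitted
    for span in sorted(spans, key=lambda s: s["start"]):
        # Clamp to the text and skip anything already covered by an earlier span.
        start = max(span["start"], cursor)
        end = min(span["end"], len(answer))
        if start >= end:
            continue
        pieces.append(answer[cursor:start])
        pieces.append(open_tag + answer[start:end] + close_tag)
        cursor = end
    pieces.append(answer[cursor:])
    return "".join(pieces)


# Hand-built span for illustration (not real model output).
demo_answer = "La capitale è Parigi. La popolazione è di 69 milioni."
flagged = "La popolazione è di 69 milioni."
demo_start = demo_answer.find(flagged)
demo_spans = [
    {"start": demo_start, "end": demo_start + len(flagged), "confidence": 0.9, "text": flagged}
]

highlighted = highlight_spans(demo_answer, demo_spans)
print(highlighted)
```

Clamping the offsets to the length of the answer keeps the helper from failing if a span extends past the end of the string.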

## Performance

**Results on Translated RAGTruth-IT**

We evaluate our Italian models on an Italian translation of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. The EuroBERT-210M Italian model achieves an F1 score of 65.93%, outperforming prompt-based methods such as GPT-4.1-mini (61.06%) by 4.87 percentage points.

For detailed performance metrics, see the table below:

| Language | Model           | Precision (%) | Recall (%) | F1 (%) | GPT-4.1-mini F1 (%) | Δ F1 (pp) |
|----------|-----------------|---------------|------------|--------|---------------------|-----------|
| Italian  | EuroBERT-210M   | 60.57         | 72.32      | 65.93  | 61.06               | +4.87     |
| Italian  | EuroBERT-610M   | 76.67         | 72.85      | 74.71  | 61.06               | +13.65    |

While the 610M variant achieves a higher F1 score, the 210M model offers a good balance between accuracy and computational efficiency, processing examples approximately 3× faster. Its recall (72.32%) is close to that of the 610M variant (72.85%); most of the F1 gap comes from lower precision (60.57% vs. 76.67%).
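
The F1 and Δ F1 figures reported in this section follow from precision and recall by the standard harmonic-mean formula. The short check below recomputes them from the reported precision, recall, and GPT-4.1-mini baseline; the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) as reported for the two Italian models.
reported = {
    "EuroBERT-210M": (60.57, 72.32),
    "EuroBERT-610M": (76.67, 72.85),
}
baseline_f1 = 61.06  # GPT-4.1-mini F1 (%) as reported

for name, (precision, recall) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: F1 = {f1:.2f}, delta vs. baseline = {f1 - baseline_f1:+.2f} pp")
```

Rounded to two decimals, this gives 65.93 and 74.71, matching the reported F1 values and the +4.87 and +13.65 point differences.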

## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125},
}
```