# EuroBERT-Sentiment-Analysis-french
## Usage

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Ant-Dlc/EuroBERT-SenimentAnalysis-French",
    truncation=True,
    trust_remote_code=True,
)
clf("[text]")
```
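The pipeline returns a list of dicts with `label` and `score` keys. A minimal post-processing sketch that turns those outputs into signed polarity scores — the predictions below are illustrative stand-ins, not real model output:

```python
# Map the model's French labels to a numeric polarity in [-1, 1].
LABEL_TO_POLARITY = {"NEGATIF": -1, "NEUTRE": 0, "POSITIF": 1}

def polarity(predictions):
    """Convert pipeline outputs like [{"label": ..., "score": ...}] to signed scores."""
    return [LABEL_TO_POLARITY[p["label"]] * p["score"] for p in predictions]

# Illustrative predictions in the pipeline's output format (not real model output):
fake_preds = [
    {"label": "POSITIF", "score": 0.91},
    {"label": "NEGATIF", "score": 0.88},
]
print(polarity(fake_preds))  # [0.91, -0.88]
```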
## Model description

EuroBERT-Sentiment-Analysis-french is a sentiment analysis model for French verbatims answering the question "Qualification équipement" from an IT satisfaction survey.
It classifies a given text into one of three categories:

- NEGATIF (French for "negative")
- NEUTRE (French for "neutral")
- POSITIF (French for "positive")

The model is based on EuroBERT (210M parameters) and was fine-tuned using Hugging Face Transformers and PyTorch.
## Training data

The model was trained on approximately 50,000 verbatims from an IT satisfaction survey. Labels were generated using an LLM (Mistral Small 24B; see Ant-Dlc/LLMPrompt-SentimentAnalysis).

⚠️ Notes on the dataset:

- The dataset is not perfectly balanced across classes.
- Duplicates were kept when they appeared in the survey (e.g., common responses such as "basique" or "RAS").
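Since the classes are not perfectly balanced, one common mitigation during fine-tuning is to weight the loss per class. A sketch of "balanced" class weights, assuming made-up label counts — the card does not publish the actual class distribution, and this weighting scheme is a standard technique, not necessarily the one used here:

```python
from collections import Counter

# Hypothetical label counts -- the real distribution is not published.
labels = ["POSITIF"] * 30000 + ["NEGATIF"] * 12000 + ["NEUTRE"] * 8000

counts = Counter(labels)
n_samples, n_classes = len(labels), len(counts)

# "Balanced" weighting: n_samples / (n_classes * count_c); rarer classes
# get larger weights, so the loss does not favor the majority class.
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}
print(weights)
```

The resulting dict can be turned into a weight tensor and passed to e.g. `torch.nn.CrossEntropyLoss(weight=...)`.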
## Evaluation

For evaluation, 500 verbatims were manually labeled and compared with the model's predictions.

- To avoid data leakage, the number of occurrences of any test sample that also appeared in the training set was reduced in the training set.
- Example: if the training set contained 10 occurrences of the word "basique" and the test set contained 3 occurrences of the same word, then 7 occurrences were kept in the training set.
- We made this choice because repeated occurrences of the same short answer reflect the nature of the problem we aim to solve.
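The leakage adjustment described above amounts to removing, for each test verbatim, as many duplicate occurrences from the training set as appear in the test set. A minimal sketch (function name and sample data are illustrative):

```python
from collections import Counter

def deduplicate_overlap(train, test):
    """For each occurrence of a sample in the test set, drop one occurrence
    of that sample from the training set; keep the rest."""
    to_remove = Counter(test)
    kept = []
    for sample in train:
        if to_remove[sample] > 0:
            to_remove[sample] -= 1  # this occurrence is "used up" by the test set
        else:
            kept.append(sample)
    return kept

train = ["basique"] * 10 + ["RAS"] * 2
test = ["basique"] * 3 + ["autre"]
print(Counter(deduplicate_overlap(train, test)))
# Counter({'basique': 7, 'RAS': 2})
```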
The confusion matrix below summarizes the results (to be inserted as an image or table):
## Observations

- About half of the errors are subjective (e.g., the word "moyen" classified as NEUTRE instead of NEGATIF).
- The model sometimes overweights local negatives in mixed statements where the overall verdict is positive but is followed by specific drawbacks (e.g., "globalement c’est bien, mais …"). These should be POSITIF but are often predicted NEGATIF.
- A few opposite-polarity misclassifications occur (POSITIF instead of NEGATIF), but they represent only a small fraction of errors.

Overall, performance is good and consistent with human judgment, though some ambiguity in the labels remains.
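A confusion matrix like the one summarized above can be computed in a few lines of plain Python. The gold/predicted pairs below are illustrative, not the real 500-sample evaluation data:

```python
LABELS = ["NEGATIF", "NEUTRE", "POSITIF"]

def confusion_matrix(y_true, y_pred):
    """matrix[gold][pred] = number of gold-labeled examples predicted as pred."""
    matrix = {g: {p: 0 for p in LABELS} for g in LABELS}
    for gold, pred in zip(y_true, y_pred):
        matrix[gold][pred] += 1
    return matrix

# Illustrative pairs, not the actual test set:
y_true = ["POSITIF", "POSITIF", "NEGATIF", "NEUTRE", "NEGATIF"]
y_pred = ["POSITIF", "NEGATIF", "NEGATIF", "NEUTRE", "NEGATIF"]

m = confusion_matrix(y_true, y_pred)
accuracy = sum(m[c][c] for c in LABELS) / len(y_true)
print(m["POSITIF"], accuracy)  # one POSITIF mistaken for NEGATIF; accuracy 0.8
```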
## Base model

EuroBERT/EuroBERT-210m