Initial upload of Multilingual-Restaurant-Reviews-Sentiment
README.md (CHANGED)
Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**.

It was trained on a massive, perfectly balanced dataset of **400,000+ real, human-written reviews** and achieves state-of-the-art performance.

## ✨ Model Features
This model was trained as a **regression** task, so it predicts a single number rather than a class label. Since the output is a single float, you'll want to round it to get a final "star" rating.
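As a quick illustration of that mapping (the raw value below is hypothetical, not an actual model output):

```python
raw_score = 3.7                          # hypothetical raw regression output on the 0-4 label scale
star_rating = int(round(raw_score)) + 1  # 3.7 -> 4 -> 5 stars (labels 0-4 map to 1-5 stars)
```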
### ⚠️ A Critical Note on Input Format

**This is very important for getting the best performance!**

This model was not just trained on review text; it was trained using a specific format that includes **both the review title and the review text**, separated by the `[SEP]` token.

The title often contains a powerful summary of the sentiment (e.g., "Best Pasta Ever!" or "Total Rip-off!"). Using this format ensures the model gets the same type of input it was trained on.

**Correct Format:**

`input_text = review_title + " [SEP] " + review_text`

If you only have the review text, the model will still work well, but performance will be slightly lower.
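If your data keeps titles and bodies in separate fields, a small helper keeps the formatting consistent. This is just an illustrative sketch (the function name and the fallback for a missing title are not from the original card):

```python
def build_model_input(review_title: str, review_text: str) -> str:
    """Join a review title and body into the "title [SEP] text" format the model expects."""
    if review_title:
        return f"{review_title} [SEP] {review_text}"
    # No title available: plain review text still works, just with slightly lower accuracy.
    return review_text
```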
### Pipeline Usage Example

Here is how you should format your inputs before passing them to the pipeline:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import numpy as np  # Make sure to import numpy

model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Let's create a pipeline
sentiment_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example reviews using the recommended "title [SEP] text" format
reviews = [
    "Absolutely incredible [SEP] This was the best pasta I've ever had in my life.",  # 5-star
    "Servicio terrible [SEP] El servicio fue terrible y la comida tardó una hora en llegar.",  # 1-star (Spanish: "Terrible service / The service was terrible and the food took an hour to arrive.")
    "It was fine [SEP] It was... fine. Nothing special, but not bad either.",  # 3-star
]

# Get the raw predictions
raw_preds = sentiment_pipe(reviews)
print(raw_preds)

# (Remember our labels are 0-4, so we add 1)
for text, pred in zip(reviews, raw_preds):
    # 'score' is the raw regression value (our model predicts 0-4)
    raw_score = pred['score']

    # Round and clamp to be safe (0-4)
    star_label_rounded = np.clip(round(raw_score), 0, 4)

    # Add 1 to get the 1-5 star rating
    final_star_rating = int(star_label_rounded + 1)

    print(f"Review: {text[:40]}...")
    print(f"  Final Rating: {final_star_rating} stars\n")
```
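If you prefer to skip the pipeline wrapper, you can also call the model directly and read the regression value from the logits. Treat this as a minimal sketch rather than part of the original card: it assumes the checkpoint exposes a single regression output (`num_labels = 1`), so the lone logit is the raw 0-4 score.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = ["Absolutely incredible [SEP] This was the best pasta I've ever had in my life."]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    # Assumed shape: (batch_size, 1) for a single-output regression head
    logits = model(**inputs).logits

for text, raw in zip(texts, logits.squeeze(-1).tolist()):
    stars = int(min(max(round(raw), 0), 4)) + 1  # same round / clamp / +1 mapping as above
    print(f"{text[:40]}... -> {stars} stars (raw score {raw:.2f})")
```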
---