Festooned committed on
Commit
5596981
·
verified ·
1 Parent(s): afe8853

Initial upload of Multilingual-Restaurant-Reviews-Sentiment

Files changed (1)
  1. README.md +30 -13
README.md CHANGED
@@ -21,7 +21,7 @@ pipeline_tag: text-classification
21
 
22
  Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**.
23
 
24
- It was trained on a massive, perfectly balanced dataset of **400,000 real, human-written reviews** and achieves state-of-the-art performance.
25
 
26
  ## ✨ Model Features
27
 
@@ -92,8 +92,26 @@ This model was trained as a **regression** task. It predicts a single number (li
92
 
93
  Since this is a regression model, the output is a single float number. You'll want to round it to get a final "star" rating.
94
 
95
  ```python
96
  from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
 
97
 
98
  model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
99
  tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -106,11 +124,11 @@ model = AutoModelForSequenceClassification.from_pretrained(model_name)
106
  # Let's create a pipeline
107
  sentiment_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
108
 
109
- # Example reviews
110
  reviews = [
111
- "This was the best pasta I've ever had in my life. Absolutely incredible.", # 5-star
112
- "El servicio fue terrible y la comida tardó una hora en llegar.", # 1-star
113
- "It was... fine. Nothing special, but not bad either." # 3-star
114
  ]
115
 
116
  # Get the raw predictions
@@ -125,18 +143,17 @@ print(raw_preds)
125
  # (Remember our labels are 0-4, so we add 1)
126
  # ---
127
  for text, pred in zip(reviews, raw_preds):
128
- # 'score' is the raw regression value from 0-4
129
  raw_score = pred['score']
130
 
131
- # Round to the nearest star
132
- star_rating_rounded = round(raw_score) + 1
133
 
134
- # Or just use the raw score!
135
- star_rating_precise = raw_score + 1
136
 
137
- print(f"Review: {text[:30]}...")
138
- print(f" Precise Rating: {star_rating_precise:.2f} stars")
139
- print(f" Rounded Rating: {star_rating_rounded} stars\n")
140
  ```
141
 
142
  ---
 
21
 
22
  Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**.
23
 
24
+ It was trained on a massive, perfectly balanced dataset of **400,000+ real, human-written reviews** and achieves state-of-the-art performance.
25
 
26
  ## ✨ Model Features
27
 
 
92
 
93
  Since this is a regression model, the output is a single float number. You'll want to round it to get a final "star" rating.
94
 
95
+ ### ⚠️ A Critical Note on Input Format
96
+
97
+ **This is very important for getting the best performance!**
98
+
99
+ This model was not just trained on review text; it was trained using a specific format that includes **both the review title and the review text**, separated by the `[SEP]` token.
100
+
101
+ The title often contains a powerful summary of the sentiment (e.g., "Best Pasta Ever!" or "Total Rip-off!"). Using this format ensures the model gets the same type of input it was trained on.
102
+
103
+ **Correct Format:**
104
+ `input_text = review_title + " [SEP] " + review_text`
105
+
106
+ If you only have the review text, the model will still work well, but performance will be slightly lower.
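
The formatting rule above can be sketched as a small helper. This is an illustration only: `format_review` is a hypothetical name, not part of this repo or of `transformers`. The literal `" [SEP] "` separator is copied from the format shown above; whether the tokenizer treats it as a special token depends on the base model, which this section does not state. Falling back to the bare text when there is no title is an assumption based on the note that text-only input still works.

```python
from typing import Optional


def format_review(title: Optional[str], text: str, sep: str = " [SEP] ") -> str:
    """Join a review title and body in the "title [SEP] text" layout.

    Hypothetical helper: falls back to the bare text when no title is given.
    """
    clean_title = (title or "").strip()
    clean_text = text.strip()
    if not clean_title:
        # No title available: pass the review text on its own.
        return clean_text
    return f"{clean_title}{sep}{clean_text}"


print(format_review("Best Pasta Ever!", "Loved every bite."))
print(format_review(None, "  Only the review text here.  "))
```

The returned strings can be passed to the pipeline in place of hand-built inputs.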
107
+
108
+ ### Pipeline Usage Example
109
+
110
+ Here is how you should format your inputs before passing them to the pipeline:
111
+
112
  ```python
113
  from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
114
+ import numpy as np # Make sure to import numpy
115
 
116
  model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
117
  tokenizer = AutoTokenizer.from_pretrained(model_name)
 
124
  # Let's create a pipeline
125
  sentiment_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
126
 
127
+ # Example reviews using the recommended format
128
  reviews = [
129
+ "Absolutely incredible [SEP] This was the best pasta I've ever had in my life.", # 5-star
130
+ "Servicio terrible [SEP] El servicio fue terrible y la comida tardó una hora en llegar.", # 1-star
131
+ "It was fine [SEP] It was... fine. Nothing special, but not bad either." # 3-star
132
  ]
133
 
134
  # Get the raw predictions
 
143
  # (Remember our labels are 0-4, so we add 1)
144
  # ---
145
  for text, pred in zip(reviews, raw_preds):
146
+ # 'score' is the raw regression value (our model predicts 0-4)
147
  raw_score = pred['score']
148
 
149
+ # Round and clamp to be safe (0-4)
150
+ star_label_rounded = np.clip(round(raw_score), 0, 4)
151
 
152
+ # Add 1 to get the 1-5 star rating
153
+ final_star_rating = int(star_label_rounded + 1)
154
 
155
+ print(f"Review: {text[:40]}...")
156
+ print(f" Final Rating: {final_star_rating} stars\n")
 
157
  ```
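
The round-clamp-shift step in the loop above can also be written without NumPy. This is a sketch, not the author's code: `score_to_stars` is a hypothetical helper, and the 0-4 label range is taken from the comments in the example. It rounds halves up, whereas Python's built-in `round()` rounds halves to the nearest even integer (so `round(2.5)` gives `2`).

```python
import math


def score_to_stars(raw_score: float, min_label: int = 0, max_label: int = 4) -> int:
    """Map a raw regression output (roughly 0-4) to a 1-5 star rating."""
    # Round half up, so 2.5 becomes label 3 rather than 2.
    label = math.floor(raw_score + 0.5)
    # Clamp, because a regression head can overshoot the training range.
    label = max(min_label, min(max_label, label))
    # Shift the 0-4 label to a 1-5 star rating.
    return label + 1


for score in (-0.3, 1.2, 2.5, 3.7, 5.9):
    print(f"{score:+.1f} -> {score_to_stars(score)} stars")
```

One thing worth checking, stated here as an assumption rather than a fact about this model: if the pipeline returns scores squashed into the 0-1 range, it may be applying a sigmoid to the single output. In that case, calling the pipeline with `function_to_apply="none"` should return the raw regression value.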
158
 
159
  ---