---
library_name: transformers
datasets:
- stanfordnlp/imdb
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

# Model Card for bert-imdb-sentiment

This is a fine-tuned `bert-base-uncased` model for **binary sentiment classification** on the IMDb movie reviews dataset.

The model predicts whether a given movie review is **positive** or **negative**.

## Model Details

### Model Description

This model is a `BertForSequenceClassification` model fine-tuned with Hugging Face Transformers on the IMDb training split (25,000 labeled movie reviews, of which 20,000 were used for training and 5,000 for validation).

Training used the `Trainer` API with the following configuration:

- Tokenization with `BertTokenizer` (`bert-base-uncased`) and a maximum sequence length of 256 tokens.
- Fine-tuning for 3 epochs with learning rate `2e-5` and fp16 mixed precision.
- **91.54% accuracy** and **91.54% weighted F1** on the test split.

- **Developed by:** *koushik reddy*
- **Model type:** Transformer-based sequence classifier (`BertForSequenceClassification`)
- **Language(s) (NLP):** English
- **Fine-tuned from model:** `bert-base-uncased` ([Hugging Face link](https://huggingface.co/bert-base-uncased))

### Model Sources

- **Repository:** [https://huggingface.co/koushik-25/bert-imdb-sentiment](https://huggingface.co/koushik-25/bert-imdb-sentiment)
- **Paper:** *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*, Devlin et al., 2018 ([https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805))
- **Demo:** You can test the model directly with the Inference Widget on the model page.

## Intended Uses & Limitations

- ✅ Intended for sentiment classification of English movie reviews.
- ⚠️ May not generalize well to other domains (e.g., tweets, product reviews) without additional fine-tuning.
- ⚠️ May reflect biases present in the IMDb dataset and the original BERT pre-training corpus.

### Direct Use

Load the model and tokenizer from the Hub and classify a single review:

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load the fine-tuned model and its tokenizer from the Hub
model_id = "koushik-25/bert-imdb-sentiment"
model = BertForSequenceClassification.from_pretrained(model_id)
tokenizer = BertTokenizer.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

# Tokenize as in training: truncate reviews longer than 256 tokens
inputs = tokenizer("The movie was fantastic!", truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# IMDb labels: 0 = negative, 1 = positive
pred = torch.argmax(logits, dim=-1).item()
print(["NEGATIVE", "POSITIVE"][pred])
```

## Training Details

### Training Data

- **Dataset:** IMDb movie reviews (`datasets.load_dataset('imdb')`).
- **Size:** 25,000 training and 25,000 test reviews.
- **Preprocessing:** Inputs were truncated to `max_length=256` tokens. This length was chosen from a histogram of review lengths.
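
The tokenizer can be called with `truncation=True` and `padding="max_length"`. With those settings, `max_length=256` affects each review in three ways:

- WordPiece IDs beyond the first 254 are dropped (default right-side truncation).
- `[CLS]` and `[SEP]` are added.
- The rest of the sequence is filled with `[PAD]`.

The function below is a simplified, hypothetical illustration of that behavior for a single sequence. It is not the tokenizer's own code, and it omits `token_type_ids`. The IDs 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` entries of the `bert-base-uncased` vocabulary.

```python
# Special-token IDs in the bert-base-uncased vocabulary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(wordpiece_ids, max_length=256):
    """Simplified single-sequence version of truncation=True, padding="max_length"."""
    body = list(wordpiece_ids)[: max_length - 2]  # keep room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)         # 1 = real token
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad                 # 0 = padding, ignored by attention
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate(range(1000, 1010))  # 10 word pieces
long_ids, long_mask = pad_or_truncate(range(1000, 1600))    # 600 word pieces
print(len(short_ids), sum(short_mask))  # 256 12
print(len(long_ids), sum(long_mask))    # 256 256
```

A review longer than 254 word pieces therefore loses its ending. That is the coverage-versus-efficiency trade-off that the choice of 256 is meant to balance.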

### Training Procedure

#### Preprocessing

- Text was lowercased automatically: `bert-base-uncased` is an uncased model, and its tokenizer lowercases input.
- Each example was padded to `max_length=256` tokens and truncated if longer.
- The data was split into train, validation, and test sets as follows:
  - `train`: examples 0–19,999 of the IMDb training split (20,000 reviews)
  - `val`: examples 20,000–24,999 of the IMDb training split (5,000 reviews)
  - `test`: the official IMDb test split
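
The index ranges above are plain slices of the training split. The sketch below reproduces them on a list.

The `shuffle` step is an assumption, because the card does not say whether the data was shuffled first. It matters because the `imdb` train split on the Hub is stored ordered by label: 12,500 negative reviews followed by 12,500 positive ones. Slicing without shuffling would therefore leave only positive reviews in `val`.

`split_train_val` is a hypothetical helper. Python's `random` stands in for whatever shuffling the real pipeline may have used.

```python
import random


def split_train_val(examples, n_train=20_000, seed=224, shuffle=True):
    """Hypothetical helper: first n_train items go to train, the rest to validation."""
    indices = list(range(len(examples)))
    if shuffle:
        # Assumption: shuffle with the card's seed before slicing, so both
        # labels appear in each part even when the source is ordered by label.
        random.Random(seed).shuffle(indices)
    train = [examples[i] for i in indices[:n_train]]
    val = [examples[i] for i in indices[n_train:]]
    return train, val


# Toy stand-in for a label-ordered split: 12,500 negatives, then 12,500 positives
labels = [0] * 12_500 + [1] * 12_500
_, val_unshuffled = split_train_val(labels, shuffle=False)
_, val_shuffled = split_train_val(labels)
print(sorted(set(val_unshuffled)), sorted(set(val_shuffled)))
```

The unshuffled validation slice contains only label 1. With the `datasets` library, a shuffled equivalent would be:

- training: `ds["train"].shuffle(seed=224).select(range(20_000))`
- validation: `.select(range(20_000, 25_000))` on the same shuffled split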

#### Training Hyperparameters

- **Base Model:** `bert-base-uncased`
- **Num Labels:** 2 (binary classification)
- **Batch size:** 4 per device with 16 gradient-accumulation steps, for an effective batch size of 4 × 16 = 64 per optimizer step
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW (the Transformers default)
- **Mixed Precision:** fp16 (`fp16=True` in `TrainingArguments`), for faster training and lower memory use
- **Scheduler:** Linear learning-rate decay, the `Trainer` default. Warmup applies only when `warmup_steps` or `warmup_ratio` is set; both default to 0.
- **Seed:** 224
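
For reproduction, the hyperparameters above map onto `transformers.TrainingArguments` fields roughly as shown below. This is a sketch, not the author's training script:

- `output_dir` is a placeholder.
- No warmup value is set, because the card does not state one.
- The dict form keeps the example runnable without `transformers` installed. Pass it as `TrainingArguments(**training_kwargs)` when the library is available.

```python
# Sketch: the card's hyperparameters as TrainingArguments keyword arguments.
# output_dir is a placeholder; warmup is left at its default (0).
training_kwargs = {
    "output_dir": "bert-imdb-sentiment",  # placeholder, not from the card
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "fp16": True,                  # mixed precision; needs a GPU
    "lr_scheduler_type": "linear",
    "seed": 224,
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs))  # 64

# With transformers installed:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```

The figure of 64 assumes a single device. With N GPUs, each optimizer step would see 64 × N examples.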

#### Speeds, Sizes, Times

- **Training Time:** Varies by GPU; typically about 15–20 minutes on a T4 GPU.
- **Checkpoint Size:** ~420MB for `pytorch_model.bin` (fp32 weights of BERT-base plus the classification head).
- **Total Parameters:** ~110 million.
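
The ~110 million parameter and ~420MB figures can be cross-checked from the architecture. The function below counts the trainable parameters of `BertForSequenceClassification` from the standard `bert-base-uncased` configuration:

- 30,522-token vocabulary
- hidden size 768
- 12 layers
- feed-forward size 3,072
- 512 positions
- 2 token types

The function name and its breakdown are illustrative. At 4 bytes per fp32 weight, the total is about 418 MiB. That matches the ~420MB figure when it is read as binary megabytes, ignoring small serialization overhead.

```python
def bert_classifier_params(vocab_size=30_522, hidden=768, layers=12,
                           intermediate=3_072, max_positions=512,
                           type_vocab=2, num_labels=2):
    """Trainable parameters of BertForSequenceClassification for a BERT config."""
    # Word, position and token-type embeddings, plus the embedding LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Per layer: Q, K, V and attention-output projections (weights + biases)
    attention = 4 * (hidden * hidden + hidden)
    # Per layer: feed-forward up- and down-projections
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Per layer: two LayerNorms (gain + bias each)
    layer_norms = 2 * (2 * hidden)
    encoder = layers * (attention + ffn + layer_norms)
    pooler = hidden * hidden + hidden              # dense layer on [CLS]
    classifier = hidden * num_labels + num_labels  # classification head
    return embeddings + encoder + pooler + classifier


n_params = bert_classifier_params()
size_mib = n_params * 4 / 2**20  # 4 bytes per fp32 weight
print(f"{n_params:,} parameters, ~{size_mib:.0f} MiB in fp32")
# 109,483,778 parameters, ~418 MiB in fp32
```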

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **Dataset:** IMDb test split (25,000 reviews), held out from training.
- **Preprocessing:** Same as training: lowercased and tokenized with `max_length=256`.

#### Factors

- The model was evaluated only on the full IMDb test set; no subgroup or domain disaggregation was performed.
- The model is expected to perform similarly on other English movie reviews but may not be robust to domain shift.

#### Metrics

- **Accuracy:** Fraction of reviews classified correctly.
- **F1 Score:** Per-class F1 (the harmonic mean of precision and recall), averaged over the two classes with each class weighted by its support.
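
The card does not include its metric code. The sketch below computes both metrics as defined above, with no dependencies:

- `weighted_f1` averages per-class F1, weighting each class by its number of true examples. This is the same definition as scikit-learn's `f1_score(..., average="weighted")`.
- `compute_metrics` follows the `Trainer` convention of receiving `(logits, label_ids)` and returning a dict.

All function names are illustrative.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the gold labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def weighted_f1(labels, preds):
    """Per-class F1 averaged with weights equal to each class's support."""
    labels, preds = list(labels), list(preds)
    score = 0.0
    for c in sorted(set(labels)):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        score += f1 * labels.count(c) / len(labels)
    return score


def compute_metrics(eval_pred):
    """Trainer-style metrics hook: eval_pred unpacks to (logits, label_ids)."""
    logits, label_ids = eval_pred
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]  # argmax
    return {"accuracy": accuracy(label_ids, preds), "f1": weighted_f1(label_ids, preds)}


logits = [[2.0, 1.0], [0.1, 0.9], [0.0, 3.0], [-1.0, 1.0], [0.5, 0.2]]
gold = [0, 0, 1, 1, 1]
print(compute_metrics((logits, gold)))
```

Passing `compute_metrics=compute_metrics` to `Trainer` would report both values at each evaluation. The IMDb test split is balanced (12,500 reviews per class), so weighted and macro-averaged F1 coincide on it.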

## Evaluation Results

| Metric              | Score  |
|---------------------|--------|
| Accuracy            | 91.54% |
| F1 Score (weighted) | 91.54% |

Evaluated on the full IMDb test split (25,000 reviews).

## Summary

This is a fine-tuned BERT model (`bert-base-uncased`) for binary sentiment analysis on the IMDb movie reviews dataset. It classifies a movie review as **positive** or **negative**, reaching **91.54%** accuracy and a weighted F1 score of **91.54%** on the test set. The model was trained with the Hugging Face `transformers` library. Inputs were truncated to 256 tokens, a trade-off between coverage of long reviews and computational efficiency.

The model is intended for English movie reviews. It may transfer to similar long-form English review text, but this has not been evaluated.