---
library_name: transformers
datasets:
- stanfordnlp/imdb
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---
# Model Card for bert-imdb-sentiment
This is a fine-tuned `bert-base-uncased` model for **binary sentiment classification** on the IMDb movie reviews dataset.
The model predicts whether a given movie review is **positive** or **negative**.
## Model Details
### Model Description
This is a `BertForSequenceClassification` model fine-tuned with Hugging Face Transformers on the IMDb movie-review dataset (25,000 training reviews).
The training was done using the `Trainer` API with the following configuration:
- Tokenization with `BertTokenizer` (`bert-base-uncased`), max sequence length of 256.
- Fine-tuned for 3 epochs with learning rate `2e-5` and mixed-precision (fp16).
- Achieved **~91.54% accuracy** and **F1 score of ~91.54%** on the test split.
- **Developed by:** *Koushik Reddy*
- **Model type:** Transformer-based sequence classifier (`BertForSequenceClassification`)
- **Language(s) (NLP):** English
- **Fine-tuned from model:** `bert-base-uncased` ([Hugging Face link](https://huggingface.co/google-bert/bert-base-uncased))
### Model Sources
- **Repository:** [https://huggingface.co/koushik-25/bert-imdb-sentiment](https://huggingface.co/koushik-25/bert-imdb-sentiment)
- **Paper:** Original BERT paper, *Devlin et al., 2018* ([https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805))
- **Demo:** You can test it directly using the Inference Widget on the model page, or locally with the `pipeline` sketch below.
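
For a quick local check, the `text-classification` pipeline can also be used (a minimal sketch; the exact label strings returned depend on the `id2label` mapping stored in the uploaded config):

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hub in one call
classifier = pipeline("text-classification", model="koushik-25/bert-imdb-sentiment")

# Returns a list of dicts, e.g. [{"label": "...", "score": 0.99}]
print(classifier("One of the best films I have seen in years."))
```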
## Intended Uses & Limitations
- ✅ Intended for sentiment classification of English movie reviews.
- ⚠️ May not generalize well to other domains (e.g., tweets, product reviews) without additional fine-tuning.
- ⚠️ May reflect biases present in the IMDb dataset and the original BERT pre-training corpus.
### Direct Use
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch
# Load the fine-tuned model and tokenizer from the Hub
model = BertForSequenceClassification.from_pretrained("koushik-25/bert-imdb-sentiment")
tokenizer = BertTokenizer.from_pretrained("koushik-25/bert-imdb-sentiment")
model.eval()

# Inference on a single review
inputs = tokenizer("The movie was fantastic!", return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print(["NEGATIVE", "POSITIVE"][pred])  # IMDb labels: 0 = negative, 1 = positive
```
## Training Details
### Training Data
- **Dataset:** IMDb movie reviews (`datasets.load_dataset('imdb')`); see the loading sketch below.
- **Size:** 25,000 training, 25,000 test samples.
- **Preprocessing:** Tokenization with `max_length=256` chosen based on review length histogram.
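
A minimal loading sketch with the `datasets` library (split names follow the standard IMDb configuration on the Hub):

```python
from datasets import load_dataset

# 25,000 labeled training reviews and 25,000 labeled test reviews
# (plus an unlabeled "unsupervised" split that is not used here)
imdb = load_dataset("imdb")
print(imdb)
print(imdb["train"][0]["text"][:200])  # raw review text
print(imdb["train"][0]["label"])       # 0 = negative, 1 = positive
```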
### Training Procedure
#### Preprocessing
- Text was lowercased automatically by the tokenizer, since `bert-base-uncased` is an uncased model.
- Each example was tokenized with padding to `max_length=256` and truncated if longer.
- The dataset was split into train, validation, and test as follows (see the sketch after this list):
- `train`: 0–20,000 samples from the training set
- `val`: 20,000–25,000 samples from the training set
- `test`: the official IMDb test split
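
A sketch of this preprocessing, reusing the `imdb` object from the loading sketch above (the shuffle before slicing is an assumption; the card only states the index ranges):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad/truncate each review to the 256-token budget used for training
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=256)

tokenized = imdb.map(tokenize, batched=True)

# Shuffle before slicing (assumption: the raw IMDb train split is ordered by label,
# so shuffling keeps the 0-20,000 / 20,000-25,000 index split balanced)
shuffled = tokenized["train"].shuffle(seed=224)
train_ds = shuffled.select(range(20_000))
val_ds = shuffled.select(range(20_000, 25_000))
test_ds = tokenized["test"]
```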
#### Training Hyperparameters
- **Base Model:** `bert-base-uncased`
- **Num Labels:** 2 (binary classification)
- **Batch size:** 4 per device (with gradient accumulation of 16 steps, so effective batch size = 64)
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW (default in Transformers)
- **Mixed Precision:** fp16 training enabled for faster training and lower memory use (`fp16=True` in `TrainingArguments`; see the configuration sketch after this list)
- **Scheduler:** Linear learning rate scheduler with warmup (default)
- **Seed:** 224
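
A sketch of how these settings map onto `TrainingArguments`; `output_dir` and the dataset variables are illustrative, and AdamW plus the linear scheduler are the `Trainer` defaults:

```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-imdb-sentiment",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,  # effective batch size 4 * 16 = 64
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                       # mixed-precision training
    seed=224,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # tokenized splits from the preprocessing sketch
    eval_dataset=val_ds,
)
trainer.train()
```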
#### Speeds, Sizes, Times
- **Training Time:** Varies by GPU; roughly 15–20 minutes on a T4.
- **Checkpoint Size:** ~420MB for `pytorch_model.bin` (BERT base size plus classification head).
- **Total Parameters:** ~110 million (see the quick check below).
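
A quick way to verify the parameter count (a sketch; on-disk checkpoint size also depends on the serialization format):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("koushik-25/bert-imdb-sentiment")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 110M for BERT base + classification head
```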
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- **Dataset:** IMDb test split (25,000 reviews) held out from training.
- **Preprocessing:** Same as training — lowercased, tokenized with `max_length=256`.
#### Factors
- This model was evaluated on the overall IMDb test set only. No specific subgroup or domain disaggregation was done.
- The model is expected to perform well on similar English movie-review sentiment tasks but may not be robust to domain shift.
#### Metrics
- **Accuracy:** Measures the fraction of correctly classified reviews.
- **F1 Score:** Weighted-average F1 across classes, balancing precision and recall (see the `compute_metrics` sketch below).
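
A sketch of a `compute_metrics` function that produces these two numbers with the `evaluate` library (passed to `Trainer` via its `compute_metrics` argument; the weighted average mirrors the F1 described above):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "f1": f1.compute(predictions=preds, references=labels, average="weighted")["f1"],
    }
```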
## Evaluation Results
| Metric | Score |
|-----------|---------|
| Accuracy | 91.54% |
| F1 Score | 91.54% |
Evaluated on the IMDb test set.
## Summary
This is a fine-tuned BERT model (`bert-base-uncased`) for binary sentiment analysis on the IMDb movie reviews dataset.
It classifies a given movie review as **positive** or **negative** with an accuracy of **91.54%** and a weighted F1 score of **91.54%** on the test set.
The model was trained using the Hugging Face `transformers` library, with tokenization based on a maximum sequence length of 256 tokens to balance coverage and efficiency.
The model is intended for English movie reviews but may generalize reasonably to similar sentiment analysis tasks on longer-form English text.