---
library_name: transformers
datasets:
- stanfordnlp/imdb
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

# Model Card for bert-imdb-sentiment

This is a fine-tuned `bert-base-uncased` model for **binary sentiment classification** on the IMDb movie reviews dataset. The model predicts whether a given movie review is **positive** or **negative**.

## Model Details

### Model Description

This model is a `BertForSequenceClassification` model fine-tuned with Hugging Face Transformers on the IMDb dataset (25,000 movie reviews). Training used the `Trainer` API with the following configuration:

- Tokenization with `BertTokenizer` (`bert-base-uncased`), maximum sequence length of 256.
- Fine-tuned for 3 epochs with a learning rate of `2e-5` and mixed-precision (fp16) training.
- Achieved **~91.54% accuracy** and an **F1 score of ~91.54%** on the test split.

- **Developed by:** Koushik Reddy
- **Model type:** Transformer-based sequence classifier (`BertForSequenceClassification`)
- **Language(s) (NLP):** English
- **Finetuned from model:** `bert-base-uncased` ([Hugging Face link](https://huggingface.co/bert-base-uncased))

### Model Sources

- **Repository:** [https://huggingface.co/koushik-25/bert-imdb-sentiment](https://huggingface.co/koushik-25/bert-imdb-sentiment)
- **Paper:** Original BERT paper: Devlin et al., 2018 ([https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805))
- **Demo:** You can test the model directly with the Inference Widget on the model page.

## Intended Uses & Limitations

- ✅ Intended for sentiment classification of English movie reviews.
- ⚠️ May not generalize well to other domains (e.g., tweets, product reviews) without additional fine-tuning.
- ⚠️ May reflect biases present in the IMDb dataset and the original BERT pre-training corpus.

### Direct Use

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load the model and tokenizer from the Hub
model = BertForSequenceClassification.from_pretrained("koushik-25/bert-imdb-sentiment")
tokenizer = BertTokenizer.from_pretrained("koushik-25/bert-imdb-sentiment")

# Inference
inputs = tokenizer("The movie was fantastic!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=1).item()
print(["NEGATIVE", "POSITIVE"][pred])
```

## Training Details

### Training Data

- **Dataset:** IMDb movie reviews (`datasets.load_dataset("imdb")`).
- **Size:** 25,000 training and 25,000 test samples.
- **Preprocessing:** Tokenization with `max_length=256`, chosen based on the review length histogram.

### Training Procedure

#### Preprocessing

- Text was lowercased automatically because `bert-base-uncased` is a lowercase model.
- Each example was tokenized with padding to `max_length=256` and truncated if longer.
- The dataset was split into train, validation, and test sets:
  - `train`: samples 0–20,000 of the official training split
  - `val`: samples 20,000–25,000 of the official training split
  - `test`: the official IMDb test split

#### Training Hyperparameters

- **Base model:** `bert-base-uncased`
- **Num labels:** 2 (binary classification)
- **Batch size:** 4 per device, with gradient accumulation over 16 steps (effective batch size = 64)
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW (default in Transformers)
- **Mixed precision:** fp16 training enabled for faster training and reduced memory usage (`fp16=True` in `TrainingArguments`)
- **Scheduler:** Linear learning rate scheduler with warmup (default)
- **Seed:** 224
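The preprocessing and hyperparameters above correspond roughly to the following `Trainer` setup. This is a minimal sketch rather than the exact training script: variable names, the `compute_metrics` helper, and the use of `sklearn` for accuracy/F1 are illustrative.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

# Load IMDb and carve a validation set out of the official training split
raw = load_dataset("imdb")
train_ds = raw["train"].select(range(0, 20_000))
val_ds = raw["train"].select(range(20_000, 25_000))
test_ds = raw["test"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad or truncate every review to 256 tokens
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)
val_ds = val_ds.map(tokenize, batched=True)
test_ds = test_ds.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="weighted")}

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-imdb-sentiment",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,  # effective batch size = 4 * 16 = 64
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                       # mixed-precision training
    seed=224,
    # learning-rate schedule left at the Trainer default (linear)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate(test_ds)  # accuracy and weighted F1 on the held-out test split
```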
#### Speeds, Sizes, Times

- **Training time:** Varies by GPU; typically around 15–20 minutes on a T4 GPU.
- **Checkpoint size:** ~420 MB for `pytorch_model.bin` (BERT base plus the classification head).
- **Total parameters:** ~110 million.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **Dataset:** IMDb test split (25,000 reviews), held out from training.
- **Preprocessing:** Same as training: lowercased, tokenized with `max_length=256`.

#### Factors

- The model was evaluated on the overall IMDb test set only; no subgroup or domain disaggregation was performed.
- The model is expected to generalize well to similar English movie-review sentiment, but may not be robust to domain shifts.

#### Metrics

- **Accuracy:** Fraction of correctly classified reviews.
- **F1 score:** Weighted-average F1 across classes, balancing precision and recall.

### Evaluation Results

| Metric   | Score  |
|----------|--------|
| Accuracy | 91.54% |
| F1 Score | 91.54% |

Evaluated on the IMDb test set.

## Summary

This is a fine-tuned BERT model (`bert-base-uncased`) for binary sentiment analysis on the IMDb movie reviews dataset. It classifies a given movie review as **positive** or **negative**, reaching **91.54% accuracy** and a **weighted F1 score of 91.54%** on the test set. The model was trained with the Hugging Face `transformers` library, using a maximum sequence length of 256 tokens to balance coverage and efficiency. It is intended for English movie reviews, but may generalize reasonably to similar sentiment analysis tasks on longer-form English text.
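For quick experimentation, the model can also be loaded through the `pipeline` API. This is a short sketch: the repository id is taken from the Model Sources section above, and the exact label strings returned depend on the `id2label` mapping stored with the checkpoint.

```python
from transformers import pipeline

# Text-classification pipeline backed by the fine-tuned checkpoint
classifier = pipeline("text-classification", model="koushik-25/bert-imdb-sentiment")

result = classifier("The movie was fantastic!")
print(result)  # [{'label': ..., 'score': ...}]; label names depend on the saved id2label mapping
```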