---
library_name: transformers
datasets:
- stanfordnlp/imdb
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---
# Model Card for bert-imdb-sentiment
This is a fine-tuned `bert-base-uncased` model for **binary sentiment classification** on the IMDb movie reviews dataset.
The model predicts whether a given movie review is **positive** or **negative**.
## Model Details
### Model Description
This is a `BertForSequenceClassification` model fine-tuned with Hugging Face Transformers on the IMDb movie-review dataset (25,000 training reviews).
The training was done using the `Trainer` API with the following configuration:
- Tokenization with `BertTokenizer` (`bert-base-uncased`), max sequence length of 256.
- Fine-tuned for 3 epochs with learning rate `2e-5` and mixed-precision (fp16).
- Achieved **~91.54% accuracy** and **F1 score of ~91.54%** on the test split.
- **Developed by:** *Koushik Reddy*
- **Model type:** Transformer-based sequence classifier (`BertForSequenceClassification`)
- **Language(s) (NLP):** English
- **Fine-tuned from model:** `bert-base-uncased` ([Hugging Face link](https://huggingface.co/google-bert/bert-base-uncased))
### Model Sources
- **Repository:** [https://huggingface.co/koushik-25/bert-imdb-sentiment](https://huggingface.co/koushik-25/bert-imdb-sentiment)
- **Paper:** Original BERT paper, *Devlin et al., 2018* ([https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805))
- **Demo:** You can test it directly using the Inference Widget on the model page, or locally with the `pipeline` sketch below.
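
For a quick local check, the `text-classification` pipeline can also be used (a minimal sketch; the exact label strings returned depend on the `id2label` mapping stored in the uploaded config):

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hub in one call
classifier = pipeline("text-classification", model="koushik-25/bert-imdb-sentiment")

# Returns a list of dicts, e.g. [{"label": "...", "score": 0.99}]
print(classifier("One of the best films I have seen in years."))
```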
## Intended Uses & Limitations
- ✅ Intended for sentiment classification of English movie reviews.
- ⚠️ May not generalize well to other domains (e.g., tweets, product reviews) without additional fine-tuning.
- ⚠️ May reflect biases present in the IMDb dataset and the original BERT pre-training corpus.
### Direct Use
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch
# Load the fine-tuned model and tokenizer from the Hub
model = BertForSequenceClassification.from_pretrained("koushik-25/bert-imdb-sentiment")
tokenizer = BertTokenizer.from_pretrained("koushik-25/bert-imdb-sentiment")
model.eval()

# Inference on a single review
inputs = tokenizer("The movie was fantastic!", return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print(["NEGATIVE", "POSITIVE"][pred])  # IMDb labels: 0 = negative, 1 = positive
```
## Training Details
### Training Data
- **Dataset:** IMDb movie reviews (`datasets.load_dataset('imdb')`); see the loading sketch below.
- **Size:** 25,000 training, 25,000 test samples.
- **Preprocessing:** Tokenization with `max_length=256` chosen based on review length histogram.
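
A minimal loading sketch with the `datasets` library (split names follow the standard IMDb configuration on the Hub):

```python
from datasets import load_dataset

# 25,000 labeled training reviews and 25,000 labeled test reviews
# (plus an unlabeled "unsupervised" split that is not used here)
imdb = load_dataset("imdb")
print(imdb)
print(imdb["train"][0]["text"][:200])  # raw review text
print(imdb["train"][0]["label"])       # 0 = negative, 1 = positive
```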
### Training Procedure
#### Preprocessing
- Text was lowercased automatically by the tokenizer, since `bert-base-uncased` is an uncased model.
- Each example was tokenized with padding to `max_length=256` and truncated if longer.
- The dataset was split into train, validation, and test as follows (see the sketch after this list):
- `train`: 0–20,000 samples from the training set
- `val`: 20,000–25,000 samples from the training set
- `test`: the official IMDb test split
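
A sketch of this preprocessing, reusing the `imdb` object from the loading sketch above (the shuffle before slicing is an assumption; the card only states the index ranges):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad/truncate each review to the 256-token budget used for training
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=256)

tokenized = imdb.map(tokenize, batched=True)

# Shuffle before slicing (assumption: the raw IMDb train split is ordered by label,
# so shuffling keeps the 0-20,000 / 20,000-25,000 index split balanced)
shuffled = tokenized["train"].shuffle(seed=224)
train_ds = shuffled.select(range(20_000))
val_ds = shuffled.select(range(20_000, 25_000))
test_ds = tokenized["test"]
```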
#### Training Hyperparameters
- **Base Model:** `bert-base-uncased`
- **Num Labels:** 2 (binary classification)
- **Batch size:** 4 per device (with gradient accumulation of 16 steps, so effective batch size = 64)
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW (default in Transformers)
- **Mixed Precision:** fp16 training enabled for faster training and lower memory use (`fp16=True` in `TrainingArguments`; see the configuration sketch after this list)
- **Scheduler:** Linear learning rate scheduler with warmup (default)
- **Seed:** 224
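
A sketch of how these settings map onto `TrainingArguments`; `output_dir` and the dataset variables are illustrative, and AdamW plus the linear scheduler are the `Trainer` defaults:

```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-imdb-sentiment",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,  # effective batch size 4 * 16 = 64
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                       # mixed-precision training
    seed=224,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # tokenized splits from the preprocessing sketch
    eval_dataset=val_ds,
)
trainer.train()
```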
#### Speeds, Sizes, Times
- **Training Time:** Varies by GPU; roughly 15–20 minutes on a T4.
- **Checkpoint Size:** ~420MB for `pytorch_model.bin` (BERT base size plus classification head).
- **Total Parameters:** ~110 million (see the quick check below).
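
A quick way to verify the parameter count (a sketch; on-disk checkpoint size also depends on the serialization format):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("koushik-25/bert-imdb-sentiment")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 110M for BERT base + classification head
```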
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- **Dataset:** IMDb test split (25,000 reviews) held out from training.
- **Preprocessing:** Same as training — lowercased, tokenized with `max_length=256`.
#### Factors
- This model was evaluated on the overall IMDb test set only. No specific subgroup or domain disaggregation was done.
- The model is expected to perform well on similar English movie-review sentiment tasks but may not be robust to domain shift.
#### Metrics
- **Accuracy:** Measures the fraction of correctly classified reviews.
- **F1 Score:** Weighted-average F1 across classes, balancing precision and recall (see the `compute_metrics` sketch below).
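
A sketch of a `compute_metrics` function that produces these two numbers with the `evaluate` library (passed to `Trainer` via its `compute_metrics` argument; the weighted average mirrors the F1 described above):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "f1": f1.compute(predictions=preds, references=labels, average="weighted")["f1"],
    }
```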
## Evaluation Results
| Metric | Score |
|-----------|---------|
| Accuracy | 91.54% |
| F1 Score | 91.54% |
Evaluated on the IMDb test set.
## Summary
This is a fine-tuned BERT model (`bert-base-uncased`) for binary sentiment analysis on the IMDb movie reviews dataset.
It classifies a given movie review as **positive** or **negative** with an accuracy of **91.54%** and a weighted F1 score of **91.54%** on the test set.
The model was trained using the Hugging Face `transformers` library, with tokenization based on a maximum sequence length of 256 tokens to balance coverage and efficiency.
The model is intended for English movie reviews but may generalize reasonably to similar sentiment analysis tasks on longer-form English text.