# Mini Sentiment Transformer
This is a tiny transformer model for sentiment analysis, created as a learning project to understand the transformer architecture. It is much smaller than BERT or DistilBERT, with only about 4.19M parameters (4,188,802).
## Model Details
- Developed by: leorigasaki54
- Type: Text Classification (Sentiment Analysis)
- Language: English
- Training Data: SST-2 (Stanford Sentiment Treebank)
- Size: 4,188,802 parameters (4.19M)
- Architecture (see the sketch below):
  - 2 transformer encoder layers
  - 2 attention heads per layer
  - 128 embedding dimensions
  - 256 feed-forward dimensions
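For reference, here is a minimal PyTorch sketch of a comparable encoder built from the hyperparameters above. The class name `MiniSentimentTransformer`, the learned position embeddings, the [CLS]-style pooling, and the DistilBERT vocabulary size (30,522) are illustrative assumptions, so the parameter count lands near, but not exactly at, 4,188,802.

```python
import torch
import torch.nn as nn

class MiniSentimentTransformer(nn.Module):
    """Illustrative sketch of the architecture described above (names assumed)."""

    def __init__(self, vocab_size=30522, max_len=64, d_model=128,
                 n_heads=2, d_ff=256, n_layers=2, n_classes=2):
        super().__init__()
        # Token + learned position embeddings (30,522 = DistilBERT vocab size)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, input_ids, attention_mask=None):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)
        # src_key_padding_mask is True where tokens should be ignored
        pad_mask = (attention_mask == 0) if attention_mask is not None else None
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Classify from the first token's representation ([CLS]-style pooling)
        return self.classifier(x[:, 0])

model = MiniSentimentTransformer()
# ~4.18M with these assumptions; the released model reports 4,188,802
print(sum(p.numel() for p in model.parameters()))
```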
## Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model (the model reuses the DistilBERT tokenizer)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("leorigasaki54/mini-sentiment-transformer")

# Prepare input
text = "I really enjoyed this movie!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
probabilities = F.softmax(outputs.logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1).item()

sentiment = "Positive" if prediction == 1 else "Negative"
confidence = probabilities[0][prediction].item()
print(f"Sentiment: {sentiment} (confidence: {confidence:.4f})")
```
## Limitations
- This is a minimal implementation meant for educational purposes
- Accuracy is generally lower than that of larger models such as BERT or DistilBERT
- The model was trained only on SST-2 movie reviews and may not generalize well to other domains
- Limited to English language text only
## Training
The model was trained on the SST-2 dataset for 5 epochs using the Adam optimizer with a learning rate of 5e-5.
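Below is a minimal sketch of such a training loop, reusing the hypothetical `MiniSentimentTransformer` class from the architecture sketch above and the GLUE SST-2 split from the `datasets` library. The epoch count, optimizer, and learning rate come from this card; the batch size and collate function are illustrative assumptions, not details of the original setup.

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = load_dataset("glue", "sst2")

def collate(batch):
    # Tokenize raw sentences and attach integer labels
    enc = tokenizer([ex["sentence"] for ex in batch], return_tensors="pt",
                    padding=True, truncation=True, max_length=64)
    enc["labels"] = torch.tensor([ex["label"] for ex in batch])
    return enc

loader = DataLoader(dataset["train"], batch_size=32, shuffle=True, collate_fn=collate)
model = MiniSentimentTransformer()  # hypothetical class from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # values from this card

for epoch in range(5):  # 5 epochs, per this card
    for batch in loader:
        labels = batch.pop("labels")
        logits = model(batch["input_ids"], batch["attention_mask"])
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```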