# Mini Sentiment Transformer
This is a tiny transformer model for sentiment analysis, created as a learning project to understand the transformer architecture. It is much smaller than BERT or DistilBERT, with only about 4.19M parameters (4,188,802).
## Model Details
- Developed by: leorigasaki54
- Type: Text Classification (Sentiment Analysis)
- Language: English
- Training Data: SST-2 (Stanford Sentiment Treebank)
- Size: 4,188,802 parameters (4.19M)
- Architecture (see the sketch below):
  - 2 transformer encoder layers
  - 2 attention heads per layer
  - 128 embedding dimensions
  - 256 feed-forward dimensions
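For reference, here is a minimal PyTorch sketch of a comparable encoder built from the hyperparameters above. The class name `MiniSentimentTransformer`, the learned position embeddings, the [CLS]-style pooling, and the DistilBERT vocabulary size (30,522) are illustrative assumptions, so the parameter count lands near, but not exactly at, 4,188,802.

```python
import torch
import torch.nn as nn

class MiniSentimentTransformer(nn.Module):
    """Illustrative sketch of the architecture described above (names assumed)."""

    def __init__(self, vocab_size=30522, max_len=64, d_model=128,
                 n_heads=2, d_ff=256, n_layers=2, n_classes=2):
        super().__init__()
        # Token + learned position embeddings (30,522 = DistilBERT vocab size)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, input_ids, attention_mask=None):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)
        # src_key_padding_mask is True where tokens should be ignored
        pad_mask = (attention_mask == 0) if attention_mask is not None else None
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Classify from the first token's representation ([CLS]-style pooling)
        return self.classifier(x[:, 0])

model = MiniSentimentTransformer()
# ~4.18M with these assumptions; the released model reports 4,188,802
print(sum(p.numel() for p in model.parameters()))
```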
## Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model (the model reuses the DistilBERT tokenizer)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("leorigasaki54/mini-sentiment-transformer")

# Prepare input
text = "I really enjoyed this movie!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
probabilities = F.softmax(outputs.logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1).item()

sentiment = "Positive" if prediction == 1 else "Negative"
confidence = probabilities[0][prediction].item()
print(f"Sentiment: {sentiment} (confidence: {confidence:.4f})")
```
## Limitations
- This is a minimal implementation meant for educational purposes
- Accuracy is generally lower than that of larger models such as BERT or DistilBERT
- The model was trained only on SST-2 movie reviews and may not generalize well to other domains
- Limited to English language text only
## Training
The model was trained on the SST-2 dataset for 5 epochs using the Adam optimizer with a learning rate of 5e-5.
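Below is a minimal sketch of such a training loop, reusing the hypothetical `MiniSentimentTransformer` class from the architecture sketch above and the GLUE SST-2 split from the `datasets` library. The epoch count, optimizer, and learning rate come from this card; the batch size and collate function are illustrative assumptions, not details of the original setup.

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = load_dataset("glue", "sst2")

def collate(batch):
    # Tokenize raw sentences and attach integer labels
    enc = tokenizer([ex["sentence"] for ex in batch], return_tensors="pt",
                    padding=True, truncation=True, max_length=64)
    enc["labels"] = torch.tensor([ex["label"] for ex in batch])
    return enc

loader = DataLoader(dataset["train"], batch_size=32, shuffle=True, collate_fn=collate)
model = MiniSentimentTransformer()  # hypothetical class from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # values from this card

for epoch in range(5):  # 5 epochs, per this card
    for batch in loader:
        labels = batch.pop("labels")
        logits = model(batch["input_ids"], batch["attention_mask"])
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```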