---
language: en
license: mit
library_name: transformers
tags:
  - sentiment-analysis
  - text-classification
  - transformers
  - mini-transformer
datasets:
  - glue/sst2
model-index:
  - name: mini-sentiment-transformer
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          name: SST-2
          type: glue
          args: sst2
        metrics:
          - type: accuracy
            value: 0.8154
            name: Validation Accuracy
---

# Mini Sentiment Transformer

This is a tiny transformer model for sentiment analysis, created as a learning project to understand transformer architecture. It is far smaller than BERT or DistilBERT, with only 4,188,802 parameters (about 4.19M, versus roughly 66M for DistilBERT and 110M for BERT-base).

## Model Details

- **Developed by:** leorigasaki54
- **Type:** Text Classification (Sentiment Analysis)
- **Language:** English
- **Training Data:** SST-2 (Stanford Sentiment Treebank)
- **Size:** 4,188,802 parameters (~4.19M)
- **Architecture** (see the sketch after this list):
  - 2 transformer encoder layers
  - 2 attention heads per layer
  - 128 embedding dimensions
  - 256 feed-forward dimensions
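
This card doesn't reproduce the repository's source, but the listed hyperparameters map onto PyTorch's built-in encoder roughly as below. Everything beyond those numbers — the class name, the learned positional embeddings, the mean pooling, and the 30,522-token vocabulary of DistilBERT's tokenizer — is an assumption for illustration, not the actual implementation:

```python
import torch
import torch.nn as nn

class MiniSentimentTransformer(nn.Module):
    """Illustrative sketch of the described architecture (hypothetical, not the actual source)."""

    def __init__(self, vocab_size=30522, d_model=128, n_heads=2,
                 d_ff=256, n_layers=2, max_len=64, num_classes=2):
        super().__init__()
        # Token embeddings plus learned positional embeddings (assumed)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids, attention_mask=None):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)
        # TransformerEncoder expects True where a position should be ignored
        pad_mask = (attention_mask == 0) if attention_mask is not None else None
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        if attention_mask is not None:
            # Mean-pool over real (non-padding) tokens only
            mask = attention_mask.unsqueeze(-1).float()
            pooled = (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        else:
            pooled = x.mean(dim=1)
        return self.classifier(pooled)  # raw logits for the 2 classes
```

With these defaults the parameter count lands in the same ~4.19M ballpark, dominated by the 30,522 × 128 token-embedding table; the exact figure depends on details (pooling head, positional scheme) that this sketch only guesses at.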

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load tokenizer and model (the model reuses DistilBERT's tokenizer)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("leorigasaki54/mini-sentiment-transformer")

# Prepare input
text = "I really enjoyed this movie!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = F.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()

sentiment = "Positive" if prediction == 1 else "Negative"
confidence = probabilities[0][prediction].item()

print(f"Sentiment: {sentiment} (confidence: {confidence:.4f})")
```

## Limitations

- This is a minimal implementation meant for educational purposes
- Performance is lower than that of larger models such as BERT or DistilBERT
- The model was trained only on movie-review sentences (SST-2) and may not generalize well to other domains
- English-only

## Training

The model was trained on the SST-2 dataset for 5 epochs with the Adam optimizer at a learning rate of 5e-5; a reconstruction of this recipe is sketched below.
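
The training script itself isn't included in the repo, so this loop is only a reconstruction of the stated recipe using the `datasets` library. The optimizer, learning rate, and epoch count come from the card; the batch size of 32, the 64-token max length, and the plain cross-entropy objective are assumptions:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# Loaded from the released checkpoint purely for illustration; the original
# run would have started from randomly initialized weights.
model = AutoModelForSequenceClassification.from_pretrained(
    "leorigasaki54/mini-sentiment-transformer")
sst2 = load_dataset("glue", "sst2")  # train / validation / test splits

def collate(examples):
    # Tokenize a batch of SST-2 sentences and attach the 0/1 labels
    enc = tokenizer([ex["sentence"] for ex in examples], return_tensors="pt",
                    padding=True, truncation=True, max_length=64)
    enc["labels"] = torch.tensor([ex["label"] for ex in examples])
    return enc

loader = DataLoader(sst2["train"], batch_size=32, shuffle=True, collate_fn=collate)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # per the card

model.train()
for epoch in range(5):
    for batch in loader:
        labels = batch.pop("labels")
        loss = F.cross_entropy(model(**batch).logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Quick check against the reported 0.8154 validation accuracy
model.eval()
correct = 0
val_loader = DataLoader(sst2["validation"], batch_size=64, collate_fn=collate)
with torch.no_grad():
    for batch in val_loader:
        labels = batch.pop("labels")
        correct += (model(**batch).logits.argmax(dim=-1) == labels).sum().item()
print(f"Validation accuracy: {correct / len(sst2['validation']):.4f}")
```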