---
language: en
license: mit
library_name: transformers
tags:
- sentiment-analysis
- text-classification
- transformers
- mini-transformer
datasets:
- glue/sst2
model-index:
- name: mini-sentiment-transformer
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: SST-2
      type: glue
      args: sst2
    metrics:
    - type: accuracy
      value: 0.8154
      name: Validation Accuracy
---

# Mini Sentiment Transformer

This is a tiny transformer model for sentiment analysis, created as a learning project to understand the transformer architecture. It is much smaller than BERT or DistilBERT, with only about 4.19 million parameters.

## Model Details

- Developed by: leorigasaki54
- Type: Text Classification (Sentiment Analysis)
- Language: English
- Training Data: SST-2 (Stanford Sentiment Treebank)
- Size: 4,188,802 parameters (4.19M)
- Architecture:
  - 2 transformer encoder layers
  - 2 attention heads per layer
  - 128 embedding dimensions
  - 256 feed-forward dimensions

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model (the model reuses the DistilBERT tokenizer)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("leorigasaki54/mini-sentiment-transformer")

# Prepare input
text = "I really enjoyed this movie!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = F.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()

sentiment = "Positive" if prediction == 1 else "Negative"
confidence = probabilities[0][prediction].item()
print(f"Sentiment: {sentiment} (confidence: {confidence:.4f})")
```

## Limitations

- This is a minimal implementation meant for educational purposes
- Performance may be lower than that of larger models such as BERT or DistilBERT
- The model has been trained only on movie reviews and may not generalize well to other domains
- Limited to English-language text

## Training

The model was trained on the SST-2 dataset for 5 epochs using the Adam optimizer with a learning rate of 5e-5.
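
The snippet below is a minimal sketch of what such a fine-tuning run could look like, assuming the SST-2 split from the `datasets` library, the DistilBERT tokenizer, and a plain PyTorch loop with the hyperparameters stated above (Adam, 5e-5, 5 epochs). It is not the exact script used to produce this checkpoint; for illustration it loads the published checkpoint as the starting model, whereas the original run presumably started from a freshly initialized 2-layer model.

```python
# Illustrative training sketch, not the original training script.
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length", truncation=True, max_length=64)

dataset = dataset.map(tokenize, batched=True)
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
train_loader = DataLoader(dataset["train"], batch_size=32, shuffle=True)  # batch size is an assumption

# Stand-in for the small 2-layer model described above
model = AutoModelForSequenceClassification.from_pretrained(
    "leorigasaki54/mini-sentiment-transformer", num_labels=2
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # learning rate from the card
model.train()
for epoch in range(5):  # 5 epochs, as stated above
    for batch in train_loader:
        optimizer.zero_grad()
        outputs = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
            labels=batch["label"].to(device),
        )
        outputs.loss.backward()
        optimizer.step()
```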