bert-cyberbullying-bahasa-classifier

A fine-tuned multilingual BERT classifier for detecting cyberbullying in Bahasa Indonesia. The model performs binary classification:

  • 0 → non-bullying
  • 1 → bullying

✅ Model Details

| Property | Value |
|---|---|
| Model Type | BERT (base multilingual) |
| Task | Cyberbullying Detection (Text Classification) |
| Language | Bahasa Indonesia |
| Labels | 0 = non-bullying, 1 = bullying |
| Framework | Hugging Face Transformers |
| Files | model.safetensors, config.json, tokenizer files |

📚 Dataset

The model was trained on a combined dataset consisting of:

  • Indonesian cyberbullying dataset
  • Additional toxic / abusive comment datasets
  • Social media–style and chat–style text

Preprocessing steps:

  • text normalization
  • emoji removal
  • punctuation cleanup
  • lowercasing
  • label encoding (0 / 1)
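The preprocessing steps above can be sketched roughly as follows. The card does not give the exact rules used, so the emoji ranges, the NFKC normalization, and the names `preprocess` and `LABEL_TO_ID` are assumptions made for illustration, not the author's actual pipeline.

```python
import re
import string
import unicodedata

# Assumption: illustrative emoji ranges; the original cleanup rules are not documented.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)

# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})

# Hypothetical label encoding matching the 0 / 1 scheme in this card.
LABEL_TO_ID = {"non-bullying": 0, "bullying": 1}


def preprocess(text: str) -> str:
    """Normalize, strip emoji and punctuation, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # text normalization (assumed Unicode NFKC)
    text = EMOJI_PATTERN.sub(" ", text)         # emoji removal
    text = text.translate(PUNCT_TABLE)          # punctuation cleanup
    text = text.lower()                         # lowercasing
    return re.sub(r"\s+", " ", text).strip()    # collapse leftover whitespace


if __name__ == "__main__":
    print(preprocess("Bodoh BANGET sih!!! 😡"))
```

If the training data was cleaned this way, applying the same function to text at inference time keeps inputs consistent with what the model saw during fine-tuning.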

The dataset was balanced to reduce bias.
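The card does not say how the balancing was done. One common approach is to downsample the larger class until both labels have the same number of examples; the sketch below shows that option only, and `balance_by_downsampling` is a hypothetical helper, not the author's code.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, seed=42):
    """Return a shuffled list with the same number of examples per label.

    `examples` is a list of (text, label) pairs. Larger classes are randomly
    downsampled to the size of the smallest class (an assumed strategy).
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    if not by_label:
        return []

    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced
```

Downsampling discards data, so oversampling the minority class or using class weights in the loss are alternatives worth considering when the dataset is small.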


🧠 Training Information

  • Base model: bert-base-multilingual-cased
  • Epochs: 3–5
  • Batch size: 16
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Loss: Cross Entropy
  • Train/Validation split: 80 / 20
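As an illustration of the 80 / 20 split listed above, here is a minimal sketch that splits per label so both sets keep the same class ratio. Whether the original split was stratified is not stated in this card, and `stratified_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_split(examples, val_fraction=0.2, seed=42):
    """Split (text, label) pairs into train and validation lists.

    Each label is split separately so that the validation set keeps
    roughly the same class proportions as the full dataset.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])

    rng.shuffle(train)
    rng.shuffle(val)
    return train, val
```

With a balanced dataset, this yields a validation set that is also balanced, which makes accuracy and macro-averaged metrics easier to compare.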

Training ran on a GPU with 6 GB of VRAM, with settings chosen to fit in limited memory.


✅ How to Use

Python Example

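from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "zeltera/bert-cyberbullying-bahasa-classifier"

# Load the tokenizer and the fine-tuned classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make inference mode explicit (dropout disabled)

text = "anjing lu jelek banget"
# Truncate long inputs to the tokenizer's maximum sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# No gradients are needed for inference
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the class with the highest score
label = torch.argmax(logits, dim=1).item()

print("Prediction:", label)  # 0 = non-bullying, 1 = bullying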
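The example above returns only the class index. If a confidence score is also useful, the two logits can be converted to probabilities with a softmax. The sketch below uses plain Python so it runs without downloading the model; `ID2LABEL`, `softmax`, `interpret`, and the 0.5 threshold are illustrative assumptions, not part of the published model.

```python
import math

# Label names follow the mapping stated in this card.
ID2LABEL = {0: "non-bullying", 1: "bullying"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, threshold=0.5):
    """Turn a [non-bullying, bullying] logit pair into a label and probability.

    `threshold` is an assumed default; raising it trades recall for precision.
    """
    p_bullying = softmax(logits)[1]
    label_id = 1 if p_bullying >= threshold else 0
    return {
        "label_id": label_id,
        "label": ID2LABEL[label_id],
        "p_bullying": p_bullying,
    }


if __name__ == "__main__":
    # Made-up logits for illustration, not real model output.
    print(interpret([-1.2, 2.3]))
```

With the model loaded as in the example above, the call would be `interpret(logits[0].tolist())`.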

Example Predictions

| Text | Output |
|---|---|
| "mampus lu biarin aja" | 1 (bullying) |
| "kamu lagi dimana?" | 0 (non-bullying) |
| "bodoh banget sih" | 1 (bullying) |
| "nice job bro" | 0 (non-bullying) |


📈 Evaluation

| Metric | Score |
|---|---|
| Accuracy | ~0.90 |
| F1 (macro) | ~0.88 |
| Precision | ~0.89 |
| Recall | ~0.87 |
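For readers who want to compute comparable numbers on their own validation data, here is a minimal sketch of accuracy and macro-averaged precision, recall, and F1 for the two labels. The card specifies macro averaging only for F1; applying it to precision and recall as well is an assumption, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for labels 0 and 1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in (0, 1):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision_macro": sum(precisions) / 2,
        "recall_macro": sum(recalls) / 2,
        "f1_macro": sum(f1s) / 2,
    }
```

Macro averaging weights both classes equally, so it stays informative even if the evaluation set is not perfectly balanced.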

πŸ—‚οΈ Repository Contents

config.json
model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
vocab.txt
README.md

🔧 Intended Use

  • AI chatbots (moderation / filtering)
  • Social media comment analysis
  • Cyberbullying detection systems
  • Student safety applications
  • Research on toxicity detection
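As a rough illustration of the moderation use case, the sketch below separates incoming messages into allowed and flagged lists using any function that returns 0 or 1. The `moderate` helper is hypothetical, and `keyword_stub` is only a placeholder so the example runs without the model; in practice the predictor would wrap the classifier shown in the usage example.

```python
from typing import Callable, List, Tuple


def moderate(
    messages: List[str], predict: Callable[[str], int]
) -> Tuple[List[str], List[str]]:
    """Split messages into (allowed, flagged) using a 0/1 predictor."""
    allowed, flagged = [], []
    for message in messages:
        if predict(message) == 1:  # 1 = bullying, per this card's label scheme
            flagged.append(message)
        else:
            allowed.append(message)
    return allowed, flagged


def keyword_stub(text: str) -> int:
    """Placeholder predictor for demonstration only; NOT the real model."""
    return 1 if "bodoh" in text.lower() else 0


if __name__ == "__main__":
    ok, held = moderate(["bodoh banget sih", "kamu lagi dimana?"], keyword_stub)
    print("allowed:", ok)
    print("flagged:", held)
```

Passing the predictor in as a function keeps the filtering logic independent of how the model is loaded or served.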

⚠️ Limitations

  • Limited sarcasm detection
  • May misclassify unseen slang
  • Works best on Indonesian text
  • Not suitable for legal or high-risk decisions

📜 License

MIT License


👤 Author

Model trained and published by @zeltera. Built using Hugging Face Transformers and PyTorch. Contact: Instagram @gnwnadiwjy
