---
license: apache-2.0
base_model: uitnlp/visobert
tags:
- vietnamese
- spam-detection
- text-classification
- e-commerce
datasets:
- ViSpamReviews
metrics:
- accuracy
- macro-f1
- macro-precision
- macro-recall
model-index:
- name: visobert-spam-binary
  results:
  - task:
      type: text-classification
      name: Spam Review Detection
    dataset:
      name: ViSpamReviews
      type: ViSpamReviews
    metrics:
    - type: accuracy
      value: 0.9144
    - type: macro-f1
      value: 0.8916
---

# visobert-spam-binary: Spam Review Detection for Vietnamese Text

This model is a fine-tuned version of [uitnlp/visobert](https://huggingface.co/uitnlp/visobert) on the **ViSpamReviews** dataset for spam review detection in Vietnamese e-commerce reviews.

## Model Details

* **Base Model**: `uitnlp/visobert`
* **Description**: ViSoBERT - Vietnamese Social BERT
* **Dataset**: ViSpamReviews (Vietnamese Spam Review Dataset)
* **Fine-tuning Framework**: Hugging Face Transformers
* **Task**: Spam Review Detection (binary)
* **Number of Classes**: 2

### Hyperparameters

* Max sequence length: `256`
* Learning rate: `5e-5`
* Batch size: `32`
* Epochs: `100`
* Early stopping patience: `5`

An illustrative fine-tuning sketch using these settings is included at the end of this card.

## Dataset

The model was trained on the **ViSpamReviews** dataset, which contains 19,860 Vietnamese e-commerce review samples, split as follows:

* **Train set**: 14,299 samples (72%)
* **Validation set**: 1,590 samples (8%)
* **Test set**: 3,971 samples (20%)

### Labels

* **Non-spam** (0): Genuine product reviews
* **Spam** (1): Fake or promotional reviews

## Results

The model was evaluated on the test set with the following metrics (an evaluation sketch for recomputing them is included at the end of this card):

* **Accuracy**: `0.9144`
* **Macro-F1**: `0.8916`

## Usage

You can use this model for spam review detection in Vietnamese text. Below is an example; a batch-inference variant is sketched at the end of this card.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "visolex/visobert-spam-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example review text ("This product is very good, the shop delivered quickly!")
text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predicted_class = outputs.logits.argmax(dim=-1).item()
    probabilities = torch.softmax(outputs.logits, dim=-1)

# Map to label
label_map = {0: "Non-spam", 1: "Spam"}
predicted_label = label_map[predicted_class]
confidence = probabilities[0][predicted_class].item()

print(f"Text: {text}")
print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")
```

## Citation

If you use this model, please cite:

```bibtex
@misc{visobert-spam-binary_spam_detection,
  title        = {visobert-spam-binary: Spam Review Detection for Vietnamese Text},
  author       = {{ViSoLex Team}},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/visolex/visobert-spam-binary}}
}
```

## License

This model is released under the Apache-2.0 license.

## Acknowledgments

* Base model: [uitnlp/visobert](https://huggingface.co/uitnlp/visobert)
* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
* ViSoLex Toolkit for Vietnamese NLP
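
## Fine-Tuning Sketch

The hyperparameters listed under Model Details map directly onto a standard Hugging Face `Trainer` setup. The sketch below is illustrative only: the CSV file names, the `text`/`label` column names, the output directory, and the early-stopping metric (`eval_loss`) are assumptions, not details taken from the released training code.

```python
# Illustrative fine-tuning sketch -- file names, column names, and the
# early-stopping metric are assumptions, not the released training script.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

base_model = "uitnlp/visobert"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Assumed local CSV splits of ViSpamReviews with "text" and "label" columns.
dataset = load_dataset(
    "csv",
    data_files={"train": "train.csv", "validation": "dev.csv"},
)

def tokenize(batch):
    # Max sequence length 256, as listed in the hyperparameters.
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="visobert-spam-binary",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=100,
    eval_strategy="epoch",             # spelled evaluation_strategy on older releases
    save_strategy="epoch",
    load_best_model_at_end=True,       # required for early stopping
    metric_for_best_model="eval_loss", # assumed monitoring metric
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)

trainer.train()
```

Note that some argument names are version-dependent in Transformers (e.g. `eval_strategy` vs. `evaluation_strategy`, `tokenizer` vs. `processing_class`); adjust to your installed version.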
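
## Evaluation Sketch

The reported accuracy and macro-averaged scores can be recomputed with scikit-learn once predictions are collected for the test split. The `test_texts` and `test_labels` lists below are placeholders, not the actual ViSpamReviews test data.

```python
# Evaluation sketch: recompute accuracy and macro-averaged metrics from predictions.
# `test_texts` / `test_labels` are placeholders for the ViSpamReviews test split.
import torch
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "visolex/visobert-spam-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

test_texts = ["Sản phẩm này rất tốt, shop giao hàng nhanh!"]  # replace with the real test texts
test_labels = [0]                                             # replace with the gold labels

predictions = []
with torch.no_grad():
    for i in range(0, len(test_texts), 32):
        batch = tokenizer(
            test_texts[i : i + 32],
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=256,
        )
        logits = model(**batch).logits
        predictions.extend(logits.argmax(dim=-1).tolist())

print("Accuracy :", accuracy_score(test_labels, predictions))
print("Macro-F1 :", f1_score(test_labels, predictions, average="macro"))
print("Macro-P  :", precision_score(test_labels, predictions, average="macro", zero_division=0))
print("Macro-R  :", recall_score(test_labels, predictions, average="macro", zero_division=0))
```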
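
## Batch Inference Sketch

For scoring many reviews at once, the `text-classification` pipeline wraps the same tokenizer and model. This is a sketch: the second review string is invented for illustration, and the returned label names depend on the model config's `id2label` mapping (they may appear as `LABEL_0`/`LABEL_1`, corresponding to Non-spam/Spam as documented above).

```python
# Batch scoring sketch using the text-classification pipeline.
from transformers import pipeline

classifier = pipeline("text-classification", model="visolex/visobert-spam-binary")

reviews = [
    "Sản phẩm này rất tốt, shop giao hàng nhanh!",  # "Great product, fast delivery!"
    "Mua ngay kẻo lỡ, giảm giá 50% hôm nay!",       # invented promotional-style example
]

# Truncation settings mirror the single-example usage above; labels may come back
# as LABEL_0 / LABEL_1 depending on the config's id2label mapping.
results = classifier(reviews, batch_size=32, truncation=True, max_length=256)
for review, result in zip(reviews, results):
    print(f"{review} -> {result['label']} ({result['score']:.2%})")
```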