nahiar committed on
Commit 8ac6dd4 · verified · 1 Parent(s): 82d663b

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +91 -3
  2. config.json +1 -3
README.md CHANGED
@@ -1,3 +1,91 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - id
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - indonesian
+ - indonesia
+ - topic-classification
+ - bert
+ datasets:
+ - custom
+ inference: true
+ model-index:
+ - name: BERT Indonesian Topic Classification (16 labels)
+   results:
+   - task:
+       type: text-classification
+       name: Topic Classification
+     dataset:
+       name: Custom Dataset (ID)
+       type: custom
+       split: validation
+     metrics:
+     - type: accuracy
+       value: {{ACCURACY}}
+     - type: f1
+       name: f1_macro
+       value: {{F1_MACRO}}
+     - type: f1
+       name: f1_micro
+       value: {{F1_MICRO}}
+ ---
+
+ # BERT Indonesian Topic Classification (16 labels)
+
+ **Base model**: `cahya/bert-base-indonesian-1.5G`
+ **Task**: Topic classification (single-label)
+ **Labels (16)**: {{LABELS_INLINE}}
+
+ ![Confusion Matrix](./confusion_matrix.png)
+
+ ## Intended use
+
+ - Topic classification for general-domain Indonesian-language text.
+
+ ## Limitations
+
+ - Performance depends on the label distribution of your dataset.
+ - Accuracy may drop on OOD text (text outside the training-data domain).
+
+ ## Training details
+
+ - Framework: 🤗 Transformers (PyTorch)
+ - Max length: {{MAX_LEN}}
+ - Batch size: {{BATCH_SIZE}}
+ - Epochs: {{EPOCHS}}
+ - Learning rate: {{LR}}
+ - Weight decay: {{WEIGHT_DECAY}}
+ - Warmup ratio: {{WARMUP_RATIO}}
+ - Scheduler: linear
+ - Mixed precision: {{AMP_FLAG}}
+
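+ As a rough illustration, the sketch below shows how these settings could map onto `TrainingArguments`; the numeric values stand in for the `{{...}}` placeholders and the dataset objects are omitted, so treat it as an assumed outline rather than the exact training script behind this checkpoint.
+
+ ```python
+ # Illustrative sketch only; numeric values are stand-ins for the placeholders above.
+ from transformers import (
+     AutoTokenizer,
+     AutoModelForSequenceClassification,
+     TrainingArguments,
+     Trainer,
+ )
+
+ base_model = "cahya/bert-base-indonesian-1.5G"
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=16)
+
+ # Texts would be tokenized with truncation=True and max_length set to {{MAX_LEN}}.
+ args = TrainingArguments(
+     output_dir="out",
+     per_device_train_batch_size=32,  # {{BATCH_SIZE}}
+     num_train_epochs=3,              # {{EPOCHS}}
+     learning_rate=2e-5,              # {{LR}}
+     weight_decay=0.01,               # {{WEIGHT_DECAY}}
+     warmup_ratio=0.1,                # {{WARMUP_RATIO}}
+     lr_scheduler_type="linear",      # linear scheduler
+     fp16=True,                       # {{AMP_FLAG}} (mixed precision)
+ )
+
+ # trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
+ # trainer.train()
+ ```
+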
+ ## Evaluation
+
+ - Split: 80/20 stratified
+ - Accuracy (val): **{{ACCURACY}}**
+ - F1 Macro (val): **{{F1_MACRO}}**
+ - F1 Micro (val): **{{F1_MICRO}}**
+
+ A per-label report is available in the `eval_results.json` artifact.
+
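+ For reference, here is a minimal sketch of how these metrics can be computed with `scikit-learn`; `y_true` and `y_pred` are hypothetical placeholder lists, and this is not necessarily the exact script that produced `eval_results.json`.
+
+ ```python
+ from sklearn.metrics import accuracy_score, f1_score
+
+ # Placeholder label ids for the validation split (hypothetical values).
+ y_true = [0, 3, 7, 3, 12]
+ y_pred = [0, 3, 5, 3, 12]
+
+ accuracy = accuracy_score(y_true, y_pred)
+ f1_macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over the 16 labels
+ f1_micro = f1_score(y_true, y_pred, average="micro")  # computed from global TP/FP/FN counts
+ print(accuracy, f1_macro, f1_micro)
+ ```
+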
+ ## How to use
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ repo_id = "{{REPO_ID}}"  # replace with your own repo name, e.g. nahiar/indonlp-topic-bert
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForSequenceClassification.from_pretrained(repo_id).eval()
+
+ text = "Harga beras naik akibat distribusi yang terganggu."
+ inputs = tokenizer(text, return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits
+ pred_id = logits.argmax(-1).item()
+ label = model.config.id2label[pred_id]
+ print(label)
+ ```
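+
+ Alternatively, the checkpoint can be called through the `pipeline` API; this is a brief sketch assuming the same `{{REPO_ID}}`.
+
+ ```python
+ from transformers import pipeline
+
+ clf = pipeline("text-classification", model="{{REPO_ID}}")
+ print(clf("Harga beras naik akibat distribusi yang terganggu."))
+ # -> [{'label': <predicted label>, 'score': <confidence>}]
+ ```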
config.json CHANGED
@@ -1,7 +1,5 @@
  {
- "architectures": [
- "BertForSequenceClassification"
- ],
+ "architectures": ["BertForSequenceClassification"],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,