nahiar committed on
Commit 8ac6dd4 · verified · 1 Parent(s): 82d663b

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +91 -3
  2. config.json +1 -3
README.md CHANGED
@@ -1,3 +1,91 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - id
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - indonesian
+ - indonesia
+ - topic-classification
+ - bert
+ datasets:
+ - custom
+ inference: true
+ model-index:
+ - name: BERT Indonesian Topic Classification (16 labels)
+   results:
+   - task:
+       type: text-classification
+       name: Topic Classification
+     dataset:
+       name: Custom Dataset (ID)
+       type: custom
+       split: validation
+     metrics:
+     - type: accuracy
+       value: {{ACCURACY}}
+     - type: f1
+       name: f1_macro
+       value: {{F1_MACRO}}
+     - type: f1
+       name: f1_micro
+       value: {{F1_MICRO}}
+ ---
+
+ # BERT Indonesian Topic Classification (16 labels)
+
+ **Base model**: `cahya/bert-base-indonesian-1.5G`
+ **Task**: Topic classification (single-label)
+ **Labels (16)**: {{LABELS_INLINE}}
+
+ ![Confusion Matrix](./confusion_matrix.png)
+
+ ## Intended use
+
+ - Topic classification for general-domain Indonesian-language text.
+
+ ## Limitations
+
+ - Performance depends on the label distribution of your dataset.
+ - Accuracy may drop on OOD text (text outside the training-data domain).
+
+ ## Training details
+
+ - Framework: 🤗 Transformers (PyTorch)
+ - Max length: {{MAX_LEN}}
+ - Batch size: {{BATCH_SIZE}}
+ - Epochs: {{EPOCHS}}
+ - Learning rate: {{LR}}
+ - Weight decay: {{WEIGHT_DECAY}}
+ - Warmup ratio: {{WARMUP_RATIO}}
+ - Scheduler: linear
+ - Mixed precision: {{AMP_FLAG}}
+
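+ As a rough illustration, the sketch below shows how these settings could map onto `TrainingArguments`; the numeric values stand in for the `{{...}}` placeholders and the dataset objects are omitted, so treat it as an assumed outline rather than the exact training script behind this checkpoint.
+
+ ```python
+ # Illustrative sketch only; numeric values are stand-ins for the placeholders above.
+ from transformers import (
+     AutoTokenizer,
+     AutoModelForSequenceClassification,
+     TrainingArguments,
+     Trainer,
+ )
+
+ base_model = "cahya/bert-base-indonesian-1.5G"
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=16)
+
+ # Texts would be tokenized with truncation=True and max_length set to {{MAX_LEN}}.
+ args = TrainingArguments(
+     output_dir="out",
+     per_device_train_batch_size=32,  # {{BATCH_SIZE}}
+     num_train_epochs=3,              # {{EPOCHS}}
+     learning_rate=2e-5,              # {{LR}}
+     weight_decay=0.01,               # {{WEIGHT_DECAY}}
+     warmup_ratio=0.1,                # {{WARMUP_RATIO}}
+     lr_scheduler_type="linear",      # linear scheduler
+     fp16=True,                       # {{AMP_FLAG}} (mixed precision)
+ )
+
+ # trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
+ # trainer.train()
+ ```
+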
+ ## Evaluation
+
+ - Split: 80/20 stratified
+ - Accuracy (val): **{{ACCURACY}}**
+ - F1 Macro (val): **{{F1_MACRO}}**
+ - F1 Micro (val): **{{F1_MICRO}}**
+
+ A per-label report is available in the `eval_results.json` artifact.
+
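+ For reference, here is a minimal sketch of how these metrics can be computed with `scikit-learn`; `y_true` and `y_pred` are hypothetical placeholder lists, and this is not necessarily the exact script that produced `eval_results.json`.
+
+ ```python
+ from sklearn.metrics import accuracy_score, f1_score
+
+ # Placeholder label ids for the validation split (hypothetical values).
+ y_true = [0, 3, 7, 3, 12]
+ y_pred = [0, 3, 5, 3, 12]
+
+ accuracy = accuracy_score(y_true, y_pred)
+ f1_macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over the 16 labels
+ f1_micro = f1_score(y_true, y_pred, average="micro")  # computed from global TP/FP/FN counts
+ print(accuracy, f1_macro, f1_micro)
+ ```
+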
+ ## How to use
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ repo_id = "{{REPO_ID}}"  # replace with your own repo name, e.g. nahiar/indonlp-topic-bert
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForSequenceClassification.from_pretrained(repo_id).eval()
+
+ text = "Harga beras naik akibat distribusi yang terganggu."
+ inputs = tokenizer(text, return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits
+ pred_id = logits.argmax(-1).item()
+ label = model.config.id2label[pred_id]
+ print(label)
+ ```
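+
+ Alternatively, the checkpoint can be called through the `pipeline` API; this is a brief sketch assuming the same `{{REPO_ID}}`.
+
+ ```python
+ from transformers import pipeline
+
+ clf = pipeline("text-classification", model="{{REPO_ID}}")
+ print(clf("Harga beras naik akibat distribusi yang terganggu."))
+ # -> [{'label': <predicted label>, 'score': <confidence>}]
+ ```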
config.json CHANGED
@@ -1,7 +1,5 @@
  {
- "architectures": [
- "BertForSequenceClassification"
- ],
+ "architectures": ["BertForSequenceClassification"],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,