---
language:
- zh
tags:
- text-classification
- chinese
- traditional-chinese
- bert
- pytorch
license: apache-2.0
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: bert-traditional-chinese-classifier
  results:
  - task:
      type: text-classification
      name: Traditional Chinese Classification
    metrics:
    - type: accuracy
      value: 0.8771
      name: Accuracy
    - type: f1
      value: 0.8771
      name: F1 Score
---

# BERT Traditional Chinese Classifier v7

This is a BERT classification model that distinguishes Mainland-style Traditional Chinese from Taiwan-style Traditional Chinese.

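## Model Description

- **Base model**: ckiplab/bert-base-chinese
- **Task**: Traditional Chinese text classification (Mainland Traditional vs. Taiwan Traditional)
- **Accuracy**: 87.71%
- **Training data**: 156,824 samples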

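## Features

- ✅ Handles long inputs (maximum length of 384 tokens)
- ✅ Focal Loss to handle class imbalance
- ✅ Multi-Sample Dropout for better generalization
- ✅ Layered learning rates (different rates for the base model and the classification head)
- ✅ Gradual unfreezing during training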
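
The card does not publish its model class, so the exact head and unfreezing schedule are unknown. As a minimal sketch, assuming a linear classifier on a pooled BERT vector, Multi-Sample Dropout applies several independent dropout masks to the same vector, passes each result through one shared linear layer, and averages the logits; gradual unfreezing is shown as making only the top `n` encoder layers trainable. `MultiSampleDropoutHead`, `unfreeze_top_layers`, and the defaults `num_samples=5` and `p=0.3` are hypothetical, not the author's settings.

```python
import torch
import torch.nn as nn


class MultiSampleDropoutHead(nn.Module):
    """Hypothetical classification head: averages logits over several dropout masks."""

    def __init__(self, hidden_size: int, num_labels: int = 2,
                 num_samples: int = 5, p: float = 0.3):
        super().__init__()
        # One dropout layer per sample; the linear classifier is shared by all samples.
        self.dropouts = nn.ModuleList([nn.Dropout(p) for _ in range(num_samples)])
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size), e.g. the [CLS] hidden state.
        logits = [self.classifier(drop(pooled)) for drop in self.dropouts]
        # Average the per-mask logits -> (batch, num_labels).
        return torch.stack(logits, dim=0).mean(dim=0)


def unfreeze_top_layers(encoder_layers: nn.ModuleList, n: int) -> None:
    """Hypothetical gradual-unfreezing step: only the top n layers receive gradients."""
    total = len(encoder_layers)
    for i, layer in enumerate(encoder_layers):
        trainable = i >= total - n
        for param in layer.parameters():
            param.requires_grad = trainable
```

In `eval()` mode every dropout layer is the identity, so the averaged logits equal a single pass through the linear layer. Some implementations average the per-sample losses instead of the logits. With Hugging Face's `BertModel`, the layer list is `bert.encoder.layer`, so a schedule could call `unfreeze_top_layers(bert.encoder.layer, k)` with `k` growing each epoch.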

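## Usage

Only the tokenizer loads through `AutoTokenizer`. The weights in `pytorch_model.bin` are loaded with `torch.load`, which requires the custom model class used in training to be defined or importable first.

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("renhehuang/bert-traditional-chinese-classifier")

# Load the model from a local copy of pytorch_model.bin.
# The custom model class must be defined or importable before this call.
# weights_only=False is needed to unpickle a full model object on PyTorch >= 2.6.
model = torch.load("pytorch_model.bin", map_location="cpu", weights_only=False)
model.eval()  # disable dropout for inference

# Predict
text = "您的繁體中文文本"  # replace with your Traditional Chinese text
inputs = tokenizer(text, return_tensors="pt", max_length=384, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()  # assumes the output exposes .logits

# 0: Mainland Traditional, 1: Taiwan Traditional
label = "Mainland Traditional" if prediction == 0 else "Taiwan Traditional"
print(f"Prediction: {label}")
```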
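
To report a confidence score alongside the label, the logits can be passed through a softmax. The helper below is a hypothetical sketch that assumes only what this card states: logits of shape `(batch, 2)`, with index 0 = Mainland Traditional and index 1 = Taiwan Traditional. `ID2LABEL` and `decode_logits` are illustrative names.

```python
import torch
import torch.nn.functional as F

# Index-to-label mapping stated on this card: 0 = Mainland, 1 = Taiwan.
ID2LABEL = {0: "Mainland Traditional", 1: "Taiwan Traditional"}


def decode_logits(logits: torch.Tensor) -> list[tuple[str, float]]:
    """Map (batch, 2) logits to (label, probability) pairs via softmax."""
    probs = F.softmax(logits, dim=-1)          # (batch, 2); each row sums to 1
    confidences, indices = probs.max(dim=-1)   # most likely class per row
    return [(ID2LABEL[i], c) for i, c in zip(indices.tolist(), confidences.tolist())]
```

For example, `decode_logits(torch.tensor([[2.0, 0.0]]))` returns a single pair labeled "Mainland Traditional" with a probability of about 0.88.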

## Training Configuration

- **Batch Size**: 16
- **Learning Rate**: 2e-05 (base), 4e-05 (head)
- **Epochs**: 4
- **Max Length**: 384
- **Loss Function**: Focal Loss (gamma=2.0)
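
The loss and learning rates above can be sketched as follows, assuming a multi-class focal loss without a per-class alpha weight and an AdamW optimizer (the card does not name the optimizer). Focal loss scales cross-entropy by $(1 - p_t)^\gamma$, where $p_t$ is the predicted probability of the true class, so confidently correct examples contribute less. `focal_loss`, `build_optimizer`, `encoder`, and `head` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Mean focal loss FL = -(1 - p_t)^gamma * log(p_t) over the batch."""
    log_probs = F.log_softmax(logits, dim=-1)
    # log p_t: log-probability assigned to each example's true class.
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # (1 - p_t)^gamma down-weights examples the model already classifies well.
    return (-((1.0 - pt) ** gamma) * log_pt).mean()


def build_optimizer(encoder: nn.Module, head: nn.Module,
                    base_lr: float = 2e-5, head_lr: float = 4e-5) -> torch.optim.AdamW:
    """One AdamW optimizer with a lower rate for the encoder than for the head."""
    return torch.optim.AdamW([
        {"params": encoder.parameters(), "lr": base_lr},
        {"params": head.parameters(), "lr": head_lr},
    ])
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, which is a quick sanity check. Frozen parameters can stay in the optimizer; parameters without gradients are skipped at each step.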

## Performance

### Overall
- Accuracy: 87.71%

### By Text Length
See the evaluation report for details.

## Citation

If you use this model, please cite:

```bibtex
@misc{bert-traditional-chinese-classifier,
  author = {renhehuang},
  title = {BERT Traditional Chinese Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/renhehuang/bert-traditional-chinese-classifier}}
}
```

## License

Apache 2.0

## Contact

If you have any questions, please open an issue on the Hugging Face model page or on GitHub.