---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---

# Priority Classification Model (Nepali + English Hybrid)

## Model Overview

This model automatically classifies citizen complaints or service requests into **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
It supports **both Nepali and English** inputs and uses a **hybrid ML + rule-based approach** to ensure robustness, especially on small datasets.

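
The hybrid idea described above can be illustrated with a minimal sketch. It is not the model's actual decision logic: the keyword list, the `0.6` confidence threshold, and the function names `has_urgent_keyword` and `hybrid_priority` are all assumptions made for illustration.

```python
# Hypothetical urgency keywords (English and Nepali); the model's real
# keyword lists are not shown in this card.
URGENT_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल", "आपतकालीन")


def has_urgent_keyword(text: str) -> bool:
    """Return True if any illustrative urgency keyword occurs in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in URGENT_KEYWORDS)


def hybrid_priority(text: str, ml_label: str, ml_confidence: float,
                    threshold: float = 0.6) -> str:
    """Combine an ML prediction with a keyword rule (assumed policy).

    A confident ML label is kept as-is. When the classifier is unsure,
    an urgency keyword escalates the result to HIGH; otherwise the ML
    label is still returned.
    """
    if ml_confidence >= threshold:
        return ml_label
    if has_urgent_keyword(text):
        return "HIGH"
    return ml_label


if __name__ == "__main__":
    # Confident ML prediction: the rule layer is not consulted.
    print(hybrid_priority("Street light is broken", "LOW", 0.91))
    # Low-confidence prediction with a Nepali urgency term: escalated.
    print(hybrid_priority("तत्काल समाधान चाहिन्छ", "MEDIUM", 0.41))
```

Under this assumed policy the rules only act as a fallback, so a confident classifier is never overridden. A stricter variant could escalate on keywords regardless of confidence, at the cost of more false HIGH alerts.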

---

## Model Architecture

| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |

---

## Training Summary

| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |

### Label Distribution (After Normalization)

| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |

### Label Distribution (After Augmentation)

| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |

---

## Classification Report

| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |

**Performance is acceptable (≥70% accuracy)** given the dataset size of 266 raw samples.
The model is most reliable on text that is clearly marked as “urgent/emergency”. On the test set, however, HIGH has the lowest recall (0.66), so roughly a third of HIGH samples are missed, while MEDIUM scores the highest F1 (0.76).
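
As a quick consistency check, the macro averages in the card's metadata can be recomputed from the per-class rows of the classification report. This is a minimal sketch: the helper names are illustrative, and the accuracy figure is only an estimate because it is rebuilt from recalls rounded to two decimals.

```python
# Per-class figures copied from the classification report in this card.
PER_CLASS = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric over all classes."""
    return sum(row[metric] for row in PER_CLASS.values()) / len(PER_CLASS)


def estimated_accuracy() -> float:
    """Approximate accuracy as the support-weighted mean of recall.

    recall * support estimates the correctly classified samples per class.
    """
    correct = sum(row["recall"] * row["support"] for row in PER_CLASS.values())
    total = sum(row["support"] for row in PER_CLASS.values())
    return correct / total


if __name__ == "__main__":
    for metric in ("precision", "recall", "f1"):
        print(f"macro {metric}: {macro_average(metric):.2f}")
    print(f"estimated accuracy: {estimated_accuracy():.3f}")
```

The recomputed macro precision, recall, and F1 round to the reported 0.73, 0.73, and 0.72, and the estimated accuracy lands within rounding distance of the reported 0.725.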
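
---

## Inference (Usage)

### Using the model directly (ML only or Hybrid)

```python
from huggingface_hub import hf_hub_download
import joblib

# Embedder and predict_priority come from the project's priority_det module,
# which must be on the import path.
from priority_det import Embedder, predict_priority

# Download the model (replace the placeholder repo_id with the real repository)
model_path = hf_hub_download(
    repo_id="your-username/priority-classifier",
    filename="classifier.joblib",
)

# Load the classifier bundle: the fitted classifier and its label mapping
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

# Predict. The Nepali sample reads:
# "Water supply is cut off. An immediate solution is needed."
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
# use_hybrid=True selects the hybrid (ML + rules) path
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)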