---
title: Vocal Articulation Assessment v2
emoji: 🎤
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: mit
---

# 🎤 Indonesian Vocal Assessment System v2.0

An articulation assessment system for spoken Indonesian, built on **Whisper Large V3** (Indonesian optimized) and advanced audio signal processing.

## 🌟 Features

### Multi-Level Articulation Assessment (5 Levels)

1. **Level 1 - Single Vowels**: A, I, U, E, O
2. **Level 2 - Basic Consonants**: BA, PA, DA, TA, KA, etc.
3. **Level 3 - Syllables**: BA BE BI BO BU, etc.
4. **Level 4 - Difficult Words**: PSIKOLOGI, STRATEGI, etc.
5. **Level 5 - Complex Sentences**: Indonesian tongue twisters

### 6 Comprehensive Metrics

1. **Clarity Score (60% for Level 1)**: Pronunciation clarity, measured via Whisper Large V3
2. **Energy Score**: Volume and vocal energy quality
3. **Speech Rate (weighted most heavily at Levels 4-5)**: How close the speaking rate is to optimal
4. **Pitch Consistency (weighted most heavily at Levels 4-5)**: Pitch stability
5. **SNR Score**: Signal-to-noise ratio (recording quality)
6. **Articulation Score (15% for Level 1)**: Spectral clarity of articulation
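
As a rough illustration of the SNR metric, the sketch below estimates a signal-to-noise ratio from raw samples using only the standard library. The function name `estimate_snr_db`, the frame length, and the "quietest frames are the noise floor" heuristic are assumptions made for illustration; this README does not document how the app actually computes its SNR score.

```python
import math


def estimate_snr_db(samples, frame_len=400, noise_fraction=0.1):
    """Estimate SNR in dB from a list of float samples.

    Assumption: the quietest `noise_fraction` of frames approximate the
    noise floor, and the remaining frames carry the speech signal.
    """
    # Mean power of each non-overlapping, full-length frame, quietest first.
    powers = sorted(
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    )
    if len(powers) < 2:
        raise ValueError("need at least two full frames of audio")

    n_noise = max(1, int(len(powers) * noise_fraction))
    noise_power = sum(powers[:n_noise]) / n_noise
    signal_power = sum(powers[n_noise:]) / (len(powers) - n_noise)

    if noise_power == 0:
        return float("inf")  # digitally silent noise floor
    return 10 * math.log10(signal_power / noise_power)
```

A recording with a quiet lead-in at amplitude 0.01 followed by speech at amplitude 1.0 comes out at roughly 40 dB under this heuristic.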

### JSON API (Gradio-based)

A JSON API with a structured response is available for integration:

- **Tab 1**: UI Assessment (visual interface)
- **Tab 2**: JSON API (RESTful response)
- **Python Client**: `gradio_client` compatible
- **Response Format**: Structured JSON with scores, feedback, suggestions

## 🎯 Optimized Scoring Weights

| Level | Clarity | Articulation | Speech Rate | Pitch | Energy | SNR |
|-------|---------|--------------|-------------|-------|--------|-----|
| 1     | 60%     | 15%          | 0%          | 0%    | 15%    | 10% |
| 2     | 55%     | 20%          | 0%          | 0%    | 15%    | 10% |
| 3     | 50%     | 15%          | 10%         | 5%    | 10%    | 10% |
| 4     | 40%     | 10%          | 20%         | 15%   | 10%    | 5%  |
| 5     | 35%     | 10%          | 25%         | 15%   | 10%    | 5%  |
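
The table above can be read as a weighted average of the six metric scores. The following is a minimal sketch of that calculation, assuming each metric is scored on a 0-100 scale; the dictionary keys and the `overall_score` function are hypothetical names chosen for this example, not identifiers taken from `scoring_system.py`.

```python
# Weights in percent, copied from the table above (each row sums to 100).
WEIGHTS_PCT = {
    1: {"clarity": 60, "articulation": 15, "speech_rate": 0, "pitch": 0, "energy": 15, "snr": 10},
    2: {"clarity": 55, "articulation": 20, "speech_rate": 0, "pitch": 0, "energy": 15, "snr": 10},
    3: {"clarity": 50, "articulation": 15, "speech_rate": 10, "pitch": 5, "energy": 10, "snr": 10},
    4: {"clarity": 40, "articulation": 10, "speech_rate": 20, "pitch": 15, "energy": 10, "snr": 5},
    5: {"clarity": 35, "articulation": 10, "speech_rate": 25, "pitch": 15, "energy": 10, "snr": 5},
}


def overall_score(level, scores):
    """Combine per-metric scores (0-100) into one weighted score (0-100).

    Metrics with a 0% weight at the given level may be omitted from `scores`.
    """
    weights = WEIGHTS_PCT[level]
    total = sum(pct * scores[metric] for metric, pct in weights.items() if pct > 0)
    return total / 100
```

For example, `overall_score(1, {"clarity": 100, "articulation": 80, "energy": 90, "snr": 70})` returns 92.5, because speech rate and pitch carry no weight at Level 1.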

## 📡 API Usage

### Gradio Python Client

```python
from gradio_client import Client, handle_file

# Connect by Space ID (owner/name) rather than the Space page URL
client = Client("Cyberlace/latihan-artikulasi")

result = client.predict(
    audio_file=handle_file("audio.wav"),  # wrap the local path so the file is uploaded
    target_text="A",
    level=1,
    api_name="/score_audio_api",
)

print(result["data"]["overall"]["score"])  # e.g. 95.5
print(result["data"]["transcription"]["detected"])  # e.g. "A"
```

### JSON Response Structure

```json
{
  "success": true,
  "data": {
    "overall": {"score": 95.5, "grade": "A", "level": 1},
    "transcription": {"target": "A", "detected": "A", "similarity": 100.0, "wer": 0.0},
    "scores": {...},
    "feedback": {"message": "...", "suggestions": [...]},
    "audio_features": {...}
  }
}
```
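
When consuming this response, it helps to check `success` before reading nested fields. The helper below is a small sketch of that pattern; `summarize_result` is a hypothetical name, and since the shape of a failed response is not documented here, treating any falsy `success` as a failure is an assumption.

```python
def summarize_result(result):
    """Return (score, grade, detected) from a response shaped like the example above."""
    # Assumption: a missing or false "success" flag means the assessment failed.
    if not result.get("success"):
        raise ValueError("assessment failed or response is malformed")

    data = result["data"]
    overall = data["overall"]
    return overall["score"], overall["grade"], data["transcription"]["detected"]
```

The same helper can be applied to the dictionary returned by the Gradio client call, provided the endpoint returns the structure shown above.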

## 🚀 How to Use

### On HuggingFace Spaces

1. Upload or record your audio
2. Select the target vowel (A, I, U, E, O)
3. (Optional) Set the expected duration
4. Click "Nilai Pengucapan" (Score Pronunciation)
5. Review the assessment result, including the grade and feedback

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run Gradio App
python app.py

# Or run FastAPI server
python api.py
```

## 📊 Grading System

| Grade | Score Range | Description                                            |
| ----- | ----------- | ------------------------------------------------------ |
| A     | 90-100      | Excellent: pronunciation is very clear and accurate    |
| B     | 80-89       | Good: pronunciation is fairly clear, with minor errors |
| C     | 70-79       | Fair: some errors                                      |
| D     | 60-69       | Poor: many errors                                      |
| E     | <60         | Needs more practice                                    |
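
The grade bands can be expressed as a simple threshold lookup. This is a minimal sketch, not the project's actual grading code: `grade_for` is a hypothetical name, and because the table lists whole-number ranges, it assumes each lower bound is inclusive so that a fractional score such as 89.5 falls into the lower band (B).

```python
# (inclusive lower bound, grade), highest band first.
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def grade_for(score):
    """Map a 0-100 score to a letter grade using the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return "E"  # anything below 60
```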

## 🔧 Technology

- **Model**: HuBERT/Wav2Vec2 fine-tuned for Indonesian vowel classification
- **Backend**: FastAPI
- **Frontend**: Gradio
- **Audio Processing**: librosa, torchaudio
- **Deployment**: HuggingFace Spaces with ZeroGPU

## 📁 Project Structure

```
.
├── app.py                 # Gradio interface (HF Spaces)
├── api.py                 # FastAPI server
├── scoring_system.py      # Core scoring logic
├── latihan_dasar.py       # Advanced articulation system
├── model_vokal/           # Model checkpoint
│   ├── config.json
│   ├── model.safetensors
│   └── preprocessor_config.json
├── requirements.txt       # Dependencies
└── README.md              # Documentation
```

## 🎯 Roadmap

### Level 1: Vowel Recognition ✅

- A, I, U, E, O (Current)

### Levels 2-5: Expansion (Coming Soon)

- Level 2: Basic Consonants (BA, PA, DA, TA, etc.)
- Level 3: Syllable Combinations (BA-BE-BI-BO-BU, etc.)
- Level 4: Difficult Words (PSIKOLOGI, STRATEGI, etc.)
- Level 5: Complex Sentences

## 📝 API Documentation

### FastAPI Endpoints

```
# Health check
GET /health

# Get supported labels
GET /labels

# Score single audio
POST /score
- audio: file (required)
- target_label: string (optional)
- expected_duration: float (optional)

# Batch scoring
POST /batch_score
- audios: files (required)
- target_labels: string (optional, comma-separated)
```

### Example cURL

```bash
curl -X POST "http://localhost:8000/score" \
  -F "audio=@test.wav" \
  -F "target_label=a" \
  -F "expected_duration=0.8"
```
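
The `/batch_score` endpoint can be called from Python in a similar way. The sketch below uses the third-party `requests` library and assumes the server runs at `http://localhost:8000` as in the cURL example; the helper names `join_labels` and `batch_score`, the `audio/wav` content type, and the 60-second timeout are illustrative assumptions, and the response is assumed to be JSON.

```python
from contextlib import ExitStack
from pathlib import Path


def join_labels(labels):
    """Build the comma-separated `target_labels` form value."""
    return ",".join(label.strip() for label in labels)


def batch_score(paths, labels=None, base_url="http://localhost:8000"):
    """POST several audio files to /batch_score and return the parsed JSON."""
    import requests  # third-party; imported here so join_labels works without it

    with ExitStack() as stack:
        # Repeating the "audios" field name sends multiple files in one form.
        files = [
            ("audios", (Path(p).name, stack.enter_context(open(p, "rb")), "audio/wav"))
            for p in paths
        ]
        data = {"target_labels": join_labels(labels)} if labels else {}
        response = requests.post(
            f"{base_url}/batch_score", files=files, data=data, timeout=60
        )

    response.raise_for_status()
    return response.json()
```

A call might look like `batch_score(["a.wav", "i.wav"], labels=["a", "i"])`, with the labels given in the same order as the files.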

## 🤝 Contributing

Contributions are welcome, especially for:

- Expanding the vowel dataset
- Implementing Levels 2-5
- Model optimization
- UI/UX improvements

## 📄 License

MIT License

## 👥 Author

Built for Latihan Dasar Artikulasi Vokal Indonesia (Basic Indonesian Vocal Articulation Practice)

## 🙏 Acknowledgments

- Model base: HuBERT/Wav2Vec2
- Audio processing: librosa
- Framework: FastAPI & Gradio
- Deployment: HuggingFace Spaces