---
title: Vocal Articulation Assessment v2
emoji: 🎤
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: mit
---

# 🎤 Indonesian Vocal Assessment System v2.0

An articulation assessment system for spoken Indonesian, built on **Whisper Large V3** (optimized for Indonesian) and advanced audio signal processing.

## 🌟 Features

### Multi-Level Articulation Assessment (5 Levels)

1. **Level 1 - Single Vowels**: A, I, U, E, O
2. **Level 2 - Basic Consonants**: BA, PA, DA, TA, KA, etc.
3. **Level 3 - Syllables**: BA BE BI BO BU, etc.
4. **Level 4 - Difficult Words**: PSIKOLOGI, STRATEGI, etc.
5. **Level 5 - Complex Sentences**: Indonesian tongue twisters

### 6 Comprehensive Metrics

1. **Clarity Score (60% for Level 1)**: pronunciation clarity, measured with Whisper Large V3
2. **Energy Score**: volume and vocal energy quality
3. **Speech Rate (Levels 3-5)**: how close the speaking rate is to optimal
4. **Pitch Consistency (Levels 3-5)**: pitch stability
5. **SNR Score**: signal-to-noise ratio (recording quality)
6. **Articulation Score (15% for Level 1)**: spectral articulation clarity

### JSON API (Gradio-based)

A JSON API with structured responses is available for integration:

- **Tab 1**: UI Assessment (visual interface)
- **Tab 2**: JSON API (RESTful response)
- **Python Client**: compatible with `gradio_client`
- **Response Format**: structured JSON with scores, feedback, and suggestions

## 🎯 Optimized Scoring Weights

| Level | Clarity | Articulation | Speech Rate | Pitch | Energy | SNR |
|-------|---------|--------------|-------------|-------|--------|-----|
| 1     | 60%     | 15%          | 0%          | 0%    | 15%    | 10% |
| 2     | 55%     | 20%          | 0%          | 0%    | 15%    | 10% |
| 3     | 50%     | 15%          | 10%         | 5%    | 10%    | 10% |
| 4     | 40%     | 10%          | 20%         | 15%   | 10%    | 5%  |
| 5     | 35%     | 10%          | 25%         | 15%   | 10%    | 5%  |

Each level's weights sum to 100%.

## 📡 API Usage

### Gradio Python Client

```python
from gradio_client import Client, handle_file

# Space ID taken from https://huggingface.co/spaces/Cyberlace/latihan-artikulasi
client = Client("Cyberlace/latihan-artikulasi")
result = client.predict(
    audio_file=handle_file("audio.wav"),  # local files must be wrapped with handle_file()
    target_text="A",
    level=1,
    api_name="/score_audio_api",
)
print(result["data"]["overall"]["score"])  # e.g. 95.5
print(result["data"]["transcription"]["detected"])  # e.g. "A"
```

### JSON Response Structure

```json
{
  "success": true,
  "data": {
    "overall": {"score": 95.5, "grade": "A", "level": 1},
    "transcription": {"target": "A", "detected": "A", "similarity": 100.0, "wer": 0.0},
    "scores": {...},
    "feedback": {"message": "...", "suggestions": [...]},
    "audio_features": {...}
  }
}
```

## 🚀 How to Use

### On HuggingFace Spaces

1. Upload or record your audio
2. Choose the target vowel (A, I, U, E, O)
3. (Optional) Set the expected duration
4. Click "Nilai Pengucapan" (Assess Pronunciation)
5. View the assessment results, including the grade and feedback

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python app.py

# Or run the FastAPI server
python api.py
```

## 📊 Grading System

| Grade | Score Range | Description                                          |
| ----- | ----------- | ---------------------------------------------------- |
| A     | 90-100      | Excellent - very clear and accurate pronunciation    |
| B     | 80-89       | Good - fairly clear pronunciation with minor errors  |
| C     | 70-79       | Fair - some errors                                   |
| D     | 60-69       | Poor - many errors                                   |
| E     | <60         | Needs more practice                                  |

## 🔧 Technology

- **Model**: HuBERT/Wav2Vec2 fine-tuned for Indonesian vowel classification
- **Backend**: FastAPI
- **Frontend**: Gradio
- **Audio Processing**: librosa, torchaudio
- **Deployment**: HuggingFace Spaces with ZeroGPU

## 📁 Project Structure

```
.
├── app.py                # Gradio interface (HF Spaces)
├── api.py                # FastAPI server
├── scoring_system.py     # Core scoring logic
├── latihan_dasar.py      # Advanced articulation system
├── model_vokal/          # Model checkpoint
│   ├── config.json
│   ├── model.safetensors
│   └── preprocessor_config.json
├── requirements.txt      # Dependencies
└── README.md             # Documentation
```

## 🎯 Roadmap

### Level 1: Vowel Recognition ✅

- A, I, U, E, O (current)

### Levels 2-5: Expansion (Coming Soon)

- Level 2: Basic consonants (BA, PA, DA, TA, etc.)
- Level 3: Syllable combinations (BA-BE-BI-BO-BU, etc.)
- Level 4: Difficult words (PSIKOLOGI, STRATEGI, etc.)
- Level 5: Complex sentences

## 📝 API Documentation

### FastAPI Endpoints

```
# Health check
GET /health

# Get supported labels
GET /labels

# Score a single audio file
POST /score
  - audio: file (required)
  - target_label: string (optional)
  - expected_duration: float (optional)

# Batch scoring
POST /batch_score
  - audios: files (required)
  - target_labels: string (optional, comma-separated)
```

### Example cURL

```bash
curl -X POST "http://localhost:8000/score" \
  -F "audio=@test.wav" \
  -F "target_label=a" \
  -F "expected_duration=0.8"
```
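The same `POST /score` call can also be made from Python. The sketch below is a minimal, standard-library-only client (`urllib.request` plus a hand-built multipart body) that mirrors the cURL example. The form fields `audio`, `target_label`, and `expected_duration` and the `localhost:8000` address come from this README; the helper names `encode_multipart`, `build_score_request`, and `score_audio` are hypothetical. It assumes the server accepts the fields exactly as the cURL example sends them, and it returns the parsed JSON body without assuming any particular keys.

```python
import io
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the FastAPI server started with `python api.py` listens on
# localhost:8000, as in the cURL example.
API_BASE = "http://localhost:8000"


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
        )
        buf.write(f"Content-Type: {ctype}\r\n\r\n".encode())
        buf.write(data)
        buf.write(b"\r\n")
    # Closing boundary marks the end of the form.
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_score_request(audio_path, target_label=None, expected_duration=None, base_url=API_BASE):
    """Build (but do not send) the POST /score request; optional fields are omitted when None."""
    fields = {}
    if target_label is not None:
        fields["target_label"] = target_label
    if expected_duration is not None:
        fields["expected_duration"] = str(expected_duration)
    audio = Path(audio_path)
    body, content_type = encode_multipart(fields, {"audio": (audio.name, audio.read_bytes())})
    return urllib.request.Request(
        f"{base_url}/score",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


def score_audio(audio_path, **kwargs):
    """Send the request and return the parsed JSON body (its keys depend on api.py)."""
    request = build_score_request(audio_path, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a running server):
# print(score_audio("test.wav", target_label="a", expected_duration=0.8))
```

Building the request separately from sending it keeps the multipart encoding checkable offline. If an extra dependency is acceptable, `requests.post(url, files=..., data=...)` would do the same in a few lines.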
## 🤝 Contributing

Contributions are welcome, especially for:

- Adding vowel datasets
- Implementing Levels 2-5
- Model optimization
- UI/UX improvements

## 📄 License

MIT License

## 👥 Author

Built for Latihan Dasar Artikulasi Vokal Indonesia (Basic Indonesian Vocal Articulation Practice).

## 🙏 Acknowledgments

- Model base: HuBERT/Wav2Vec2
- Audio processing: librosa
- Frameworks: FastAPI & Gradio
- Deployment: HuggingFace Spaces