Spaces: Cyberlace/api-swara-audio-analysis


fariedalfarizi committed 9 days ago

Commit 1e0d7f9 (1 parent: 82e47b6)

Set all analysis default TRUE and cleanup unused files


Files changed (5)

.gitignore +9 -0
app/api/routes.py +4 -4
backup_old_files/REDIS_CONFIG_NOTES.md +0 -312
tempo.py +0 -154
upload_model_to_hf.py +0 -175

.gitignore CHANGED

@@ -63,3 +63,12 @@ Thumbs.db
 # Jupyter
 .ipynb_checkpoints/
 *.ipynb
+# Local testing files only
+local_only/
+test_*.py
+*.bat
+ffmpeg.exe
+tempo.py
+upload_model_to_hf.py
+backup_old_files/

app/api/routes.py CHANGED

@@ -27,8 +27,8 @@ async def analyze_audio(
     analyze_tempo: bool = Form(True),
     analyze_articulation: bool = Form(True),
     analyze_structure: bool = Form(True),
-    analyze_keywords: bool = Form(False),
-    analyze_profanity: bool = Form(False)
+    analyze_keywords: bool = Form(True),  # Default TRUE
+    analyze_profanity: bool = Form(True)  # Default TRUE
 ):
     """
     Submit an audio file for analysis
@@ -42,8 +42,8 @@ async def analyze_audio(
     - analyze_tempo: Tempo analysis (default: true)
     - analyze_articulation: Articulation analysis (default: true)
     - analyze_structure: Structure analysis (default: true)
-    - analyze_keywords: Keyword analysis (default: false)
-    - analyze_profanity: Profanity detection (default: false)
+    - analyze_keywords: Keyword analysis (default: true); skipped automatically if no topic_id/custom_keywords is given
+    - analyze_profanity: Profanity detection (default: true)
     Returns a task_id that can be used to check the status
     """

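The new docstring says keyword analysis now defaults to on but is skipped when neither `topic_id` nor `custom_keywords` is supplied. The diff does not show where that check lives, so the following is only a minimal sketch of how such a rule could be expressed. The helper name `resolve_analyses`, its return shape, and the `str` type for `topic_id` are assumptions for illustration, not code from this repository.

```python
from typing import Dict, List, Optional


def resolve_analyses(
    analyze_keywords: bool = True,
    analyze_profanity: bool = True,
    topic_id: Optional[str] = None,
    custom_keywords: Optional[List[str]] = None,
) -> Dict[str, bool]:
    """Hypothetical helper: decide which optional analyses actually run."""
    # Keyword analysis needs something to match against. Without a topic
    # or a non-empty keyword list it is skipped, even though the flag
    # itself defaults to True.
    has_keyword_source = topic_id is not None or bool(custom_keywords)
    return {
        "keywords": analyze_keywords and has_keyword_source,
        "profanity": analyze_profanity,
    }


if __name__ == "__main__":
    # All defaults: profanity runs, keywords are skipped (no source given).
    print(resolve_analyses())
    # With a topic, keyword analysis runs as well.
    print(resolve_analyses(topic_id="topic-1"))
```

With this shape, clients that send no keyword source keep working after the default flips to true, because the flag alone is not enough to trigger the analysis.
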
backup_old_files/REDIS_CONFIG_NOTES.md DELETED

@@ -1,312 +0,0 @@
-# 🔴 Redis Configuration - Technical Notes
-## ✅ Configuration Summary
-The Redis configuration for the Swara API is **CORRECT** and ready for deployment to Hugging Face Spaces.
----
-## 📋 Redis Settings
-### 1. **Configuration File** (`app/config.py`)
-```python
-REDIS_HOST: str = os.getenv("REDIS_HOST", "localhost")
-REDIS_PORT: int = int(os.getenv("REDIS_PORT", "6379"))
-REDIS_DB: int = int(os.getenv("REDIS_DB", "0"))
-REDIS_PASSWORD: str = os.getenv("REDIS_PASSWORD", "")
-```
-✅ **Correct**: Defaults to `localhost:6379` for single-container deployment
----
-### 2. **Redis Client** (`app/core/redis_client.py`)
-**FIXED Issues:**
-- ❌ **Before**: `decode_responses=True` → Caused RQ errors
-- ✅ **After**: Removed `decode_responses` → RQ compatible
-**Current Configuration:**
-```python
-def get_redis_connection():
-    redis_kwargs = {
-        'host': settings.REDIS_HOST,
-        'port': settings.REDIS_PORT,
-        'db': settings.REDIS_DB,
-    }
-    if settings.REDIS_PASSWORD:
-        redis_kwargs['password'] = settings.REDIS_PASSWORD
-    return redis.Redis(**redis_kwargs)  # No decode_responses!
-```
-✅ **Benefits:**
-- Compatible with RQ (Redis Queue)
-- Proper bytes handling
-- Password support (optional)
-- Clean connection management
-**New Functions:**
-```python
-def check_redis_connection():
-    """Health check function"""
-    try:
-        conn = get_redis_connection()
-        conn.ping()
-        return True, None
-    except Exception as e:
-        return False, str(e)
-```
-✅ **Use case**: Health checks & startup validation
----
-### 3. **Startup Script** (`start.sh`)
-**Improvements Made:**
-**Before:**
-```bash
-redis-server --daemonize yes
-until redis-cli ping; do
-  echo "Waiting for Redis..."
-  sleep 1
-done
-```
-**After:**
-```bash
-# Set environment variables
-export REDIS_HOST=localhost
-export REDIS_PORT=6379
-export REDIS_DB=0
-# Start Redis with specific binding
-redis-server --daemonize yes --bind 127.0.0.1 --port 6379
-# Wait with timeout
-REDIS_TIMEOUT=30
-ELAPSED=0  # seconds waited so far
-until redis-cli -h localhost -p 6379 ping 2>/dev/null | grep -q PONG; do
-  if [ "$ELAPSED" -ge "$REDIS_TIMEOUT" ]; then
-    echo "ERROR: Redis failed to start"
-    exit 1
-  fi
-  sleep 2
-  ELAPSED=$((ELAPSED + 2))
-done
-```
-✅ **Improvements:**
-- Environment variables explicitly set
-- Timeout protection (30s max)
-- Specific binding to localhost
-- Better error handling
-- Clearer logging
----
-### 4. **Worker** (`app/worker.py`)
-**Added Retry Logic:**
-```python
-def run_worker():
-    # Wait for Redis with retries
-    max_retries = 30
-    for attempt in range(1, max_retries + 1):
-        is_connected, error_msg = check_redis_connection()
-        if is_connected:
-            break
-        time.sleep(2)
-    # Then start worker
-    worker = Worker([queue], connection=redis_conn)
-    worker.work()
-```
-✅ **Benefits:**
-- Graceful startup
-- Handles Redis not ready yet
-- Clear error messages
-- Auto-retry mechanism
----
-### 5. **Health Check** (`app/api/routes.py`)
-**Improved Endpoint:**
-```python
-@router.get("/health")
-async def health_check():
-    is_connected, error_msg = check_redis_connection()
-    return {
-        "status": "healthy" if is_connected else "degraded",
-        "redis": "healthy" if is_connected else f"unhealthy: {error_msg}",
-        "version": settings.VERSION
-    }
-```
-✅ **Benefits:**
-- Real-time Redis status
-- Degraded state detection
-- Useful for monitoring
----
-## 🏗️ Architecture for HF Spaces
-```
-┌─────────────────────────────────────────┐
-│   Hugging Face Space (Single Container) │
-│                                          │
-│  ┌──────────────────────────────────┐  │
-│  │  Redis Server (localhost:6379)    │  │
-│  │  - In-memory data store           │  │
-│  │  - Task queue                     │  │
-│  │  - Result storage (24h TTL)       │  │
-│  └─────────┬────────────────────────┘  │
-│            │                             │
-│  ┌─────────▼───────────┐                │
-│  │  RQ Worker          │                │
-│  │  - Process tasks    │                │
-│  │  - Run AI models    │                │
-│  └─────────┬───────────┘                │
-│            │                             │
-│  ┌─────────▼───────────┐                │
-│  │  FastAPI App        │                │
-│  │  - REST API         │                │
-│  │  - Port 7860        │                │
-│  └─────────────────────┘                │
-│                                          │
-└──────────────────────────────────────────┘
-         ▲
-         │ HTTP Requests
-         │
-    ┌────┴─────┐
-    │  Client  │
-    └──────────┘
-```
----
-## 🔍 Configuration Validation
-### Check 1: Environment Variables
-```bash
-# In HF Spaces, these are auto-set by start.sh:
-REDIS_HOST=localhost
-REDIS_PORT=6379
-REDIS_DB=0
-```
-✅ **Status**: Configured in `start.sh`
-### Check 2: Redis Connection
-```python
-# Test connection
-from app.core.redis_client import check_redis_connection
-is_connected, error = check_redis_connection()
-print(f"Connected: {is_connected}")
-```
-✅ **Status**: Function available
-### Check 3: Queue Setup
-```python
-# Test queue
-from app.core.redis_client import get_queue
-queue = get_queue()
-print(f"Queue: {queue.name}")
-```
-✅ **Status**: Queue name: `audio_analysis`
----
-## 🚨 Common Issues & Solutions
-### Issue 1: "Connection refused"
-**Cause**: Redis not started yet
-**Solution**: ✅ Fixed with retry logic in worker
-### Issue 2: "decode_responses error"
-**Cause**: RQ doesn't support `decode_responses=True`
-**Solution**: ✅ Fixed by removing from connection
-### Issue 3: Worker timeout
-**Cause**: Long-running tasks
-**Solution**: ✅ Set `JOB_TIMEOUT=3600` (1 hour)
-### Issue 4: Results disappear
-**Cause**: Default TTL too short
-**Solution**: ✅ Set `RESULT_TTL=86400` (24 hours)
----
-## 📊 Redis Performance Settings
-### Current Settings:
-```python
-QUEUE_NAME: str = "audio_analysis"
-JOB_TIMEOUT: int = 3600        # 1 hour
-RESULT_TTL: int = 86400        # 24 hours
-```
-### Recommended for Production:
-```python
-# For high traffic:
-RESULT_TTL: int = 3600         # 1 hour (save memory)
-# For long audio:
-JOB_TIMEOUT: int = 7200        # 2 hours
-```
----
-## ✅ Final Checklist
-- [x] Redis connection without `decode_responses`
-- [x] Environment variables in `start.sh`
-- [x] Retry logic in worker
-- [x] Health check endpoint
-- [x] Timeout protection
-- [x] Error handling
-- [x] Graceful startup sequence
-- [x] Proper binding to localhost
-- [x] TTL configuration
----
-## 🎯 Status: READY FOR DEPLOYMENT
-All Redis configuration is **CORRECT** and **OPTIMAL** for:
-- ✅ Hugging Face Spaces (single container)
-- ✅ Local development
-- ✅ Production deployment
-- ✅ High availability
-- ✅ Error recovery
-**No further Redis configuration needed!** 🚀
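
The deleted notes describe a worker that retries the Redis health check up to 30 times with a 2-second delay, but the excerpt does not show what happens when every attempt fails. Below is a minimal sketch of a bounded wait that raises once the budget is exhausted. The name `wait_for_service` and the injectable `sleep` parameter are assumptions added for illustration; `check` stands in for a function shaped like `check_redis_connection()` from the notes, which returns `(is_connected, error_msg)`.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_service(
    check: Callable[[], Tuple[bool, Optional[str]]],
    max_retries: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check` until it succeeds and return the successful attempt number.

    Raises RuntimeError if all `max_retries` attempts fail.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        ok, last_error = check()
        if ok:
            return attempt
        # No point sleeping after the final failed attempt.
        if attempt < max_retries:
            sleep(delay_seconds)
    raise RuntimeError(
        f"service not ready after {max_retries} attempts: {last_error}"
    )
```

A worker could call this once before constructing its RQ `Worker`, so that a Redis instance that never comes up produces a clear startup error instead of a later connection failure.
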

tempo.py DELETED

@@ -1,154 +0,0 @@
-"""
-tempo.py
-Tempo and speech-pause analysis using Silero VAD
-"""
-import torch
-import pandas as pd
-from typing import Dict, List
-import warnings
-warnings.filterwarnings('ignore')
-class TempoAnalyzer:
-    """Tempo and speech-pause analysis"""
-    def __init__(self):
-        """Initialize Silero VAD model"""
-        print("🔄 Loading Silero VAD model...")
-        torch.set_num_threads(1)
-        self.model, utils = torch.hub.load(
-            repo_or_dir='snakers4/silero-vad',
-            model='silero_vad',
-            force_reload=False
-        )
-        (self.get_speech_timestamps,
-         self.save_audio,
-         self.read_audio,
-         self.VADIterator,
-         self.collect_chunks) = utils
-        print("✅ Silero VAD model loaded!\n")
-    def analyze_tempo(self, audio_path: str, sampling_rate: int = 16000) -> Dict:
-        """
-        Analyze tempo and pauses in an audio file
-        Args:
-            audio_path: Path to the audio file
-            sampling_rate: Audio sample rate (default: 16000)
-        Returns:
-            Dict containing the full analysis results
-        """
-        print(f"🎧 Analyzing tempo: {audio_path}")
-        # Load audio at the requested sample rate so timestamps stay consistent
-        wav = self.read_audio(audio_path, sampling_rate=sampling_rate)
-        # Detect speech segments
-        speech_timestamps = self.get_speech_timestamps(
-            wav, self.model, sampling_rate=sampling_rate
-        )
-        # Build the list of per-segment analysis records
-        data = []
-        total_pause = 0
-        total_score = 0
-        num_pauses = 0
-        for i, seg in enumerate(speech_timestamps):
-            start_time = seg['start'] / sampling_rate
-            end_time = seg['end'] / sampling_rate
-            duration = end_time - start_time
-            if i == 0:
-                pause_before = start_time  # initial pause before the first speech segment
-            else:
-                pause_before = start_time - (speech_timestamps[i - 1]['end'] / sampling_rate)
-            # Compute the pause score (0 or 1)
-            # Pause <= 3 seconds → 1, pause > 3 seconds → 0
-            skor = 1 if pause_before <= 3.0 else 0
-            total_pause += pause_before
-            total_score += skor
-            num_pauses += 1
-            data.append({
-                'Segmen': i + 1,
-                'Mulai (detik)': round(start_time, 2),
-                'Selesai (detik)': round(end_time, 2),
-                'Durasi Bicara (detik)': round(duration, 2),
-                'Jeda Sebelum (detik)': round(pause_before, 2),
-                'Skor Jeda': skor
-            })
-        # Compute the average pause and the average score
-        rata_jeda = total_pause / num_pauses if num_pauses > 0 else 0
-        rata_skor = total_score / num_pauses if num_pauses > 0 else 0
-        # Determine the category
-        if rata_skor >= 0.9:
-            kategori = "Sangat Baik"
-            poin = 5
-        elif rata_skor >= 0.7:
-            kategori = "Baik"
-            poin = 4
-        elif rata_skor >= 0.5:
-            kategori = "Cukup"
-            poin = 3
-        elif rata_skor >= 0.3:
-            kategori = "Buruk"
-            poin = 2
-        else:
-            kategori = "Perlu Ditingkatkan"
-            poin = 1
-        print("✅ Tempo analysis complete!\n")
-        return {
-            'segments': data,
-            'total_segments': len(speech_timestamps),
-            'rata_rata_jeda': round(rata_jeda, 2),
-            'rata_rata_skor': round(rata_skor, 2),
-            'kategori': kategori,
-            'poin': poin,
-            'summary': {
-                'score': poin,
-                'category': kategori,
-                'avg_pause': round(rata_jeda, 2),
-                'avg_score': round(rata_skor, 2),
-                'total_segments': len(speech_timestamps)
-            }
-        }
-    def print_report(self, result: Dict):
-        """Print detailed report"""
-        df = pd.DataFrame(result['segments'])
-        print("\n" + "="*70)
-        print("📊 ANALISIS TEMPO DAN JEDA BICARA")
-        print("="*70)
-        print(df.to_string(index=False))
-        print("\n" + "="*70)
-        print(f"Total Segmen Bicara      : {result['total_segments']}")
-        print(f"Rata-rata Jeda (detik)   : {result['rata_rata_jeda']}")
-        print(f"Rata-rata Skor Jeda      : {result['rata_rata_skor']}/1")
-        print(f"Kategori                 : {result['kategori']}")
-        print(f"Poin                     : {result['poin']}/5")
-        print("="*70 + "\n")
-# ========== DEMO ==========
-def demo():
-    """Demo function"""
-    analyzer = TempoAnalyzer()
-    audio_path = "./bad.wav"
-    result = analyzer.analyze_tempo(audio_path)
-    analyzer.print_report(result)
-if __name__ == "__main__":
-    demo()
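
The scoring rule in the deleted `tempo.py` can be separated from the Silero VAD model: each speech segment scores 1 if the pause before it is at most 3 seconds, and the average score maps to a 1 to 5 point scale. The sketch below restates that rule as a pure function so it can be checked without loading a model. The name `score_pauses` and the tuple return shape are assumptions for illustration; the thresholds and category strings are copied from the deleted file, and segments are expected as sample offsets in the `{'start': ..., 'end': ...}` form the original code reads.

```python
from typing import Dict, List, Tuple

# (minimum average score, points, category), as in the deleted tempo.py
THRESHOLDS = [
    (0.9, 5, "Sangat Baik"),
    (0.7, 4, "Baik"),
    (0.5, 3, "Cukup"),
    (0.3, 2, "Buruk"),
]


def score_pauses(
    segments: List[Dict[str, int]],
    sampling_rate: int = 16000,
    max_pause_seconds: float = 3.0,
) -> Tuple[float, int, str]:
    """Return (average pause score, points, category) for speech segments."""
    scores = []
    previous_end = 0.0  # the first pause is measured from the start of the audio
    for seg in segments:
        start = seg["start"] / sampling_rate
        pause_before = start - previous_end
        scores.append(1 if pause_before <= max_pause_seconds else 0)
        previous_end = seg["end"] / sampling_rate

    # No segments means an average of 0, which falls into the lowest category.
    average = sum(scores) / len(scores) if scores else 0.0
    for minimum, points, category in THRESHOLDS:
        if average >= minimum:
            return average, points, category
    return average, 1, "Perlu Ditingkatkan"
```

Keeping the rule model-free like this makes the 3-second threshold and the category cut-offs easy to unit test or tune.
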

upload_model_to_hf.py DELETED

@@ -1,175 +0,0 @@
-"""
-Script to upload best_model to the Hugging Face Hub
-Run once to upload the model
-"""
-from huggingface_hub import HfApi, create_repo, login
-import os
-# Configuration
-MODEL_PATH = "./best_model"  # Path to the local model
-REPO_NAME = "Cyberlace/swara-structure-model"  # Repository name on the HF Hub
-def upload_model():
-    """Upload the model to the Hugging Face Hub"""
-    print("=" * 70)
-    print("📦 Uploading Structure Model to Hugging Face Hub")
-    print("=" * 70)
-    # Step 1: Check if already logged in
-    print("\n🔐 Step 1: Checking Hugging Face authentication")
-    from huggingface_hub import HfFolder
-    token = HfFolder.get_token()
-    if token is None:
-        print("❌ Not logged in!")
-        print("\n💡 Please login first:")
-        print("   Run: huggingface-cli login")
-        return
-    print("✅ Already logged in!")
-    # Step 2: Create the repository (if it does not exist yet)
-    print(f"\n📁 Step 2: Creating repository: {REPO_NAME}")
-    try:
-        create_repo(
-            repo_id=REPO_NAME,
-            repo_type="model",
-            exist_ok=True  # Skip if it already exists
-        )
-        print("✅ Repository ready!")
-    except Exception as e:
-        print(f"⚠️  Repository might already exist: {e}")
-    # Step 3: Upload all files in best_model
-    print(f"\n📤 Step 3: Uploading model files from {MODEL_PATH}")
-    api = HfApi()
-    # List all files in best_model
-    files_to_upload = []
-    for root, dirs, files in os.walk(MODEL_PATH):
-        for file in files:
-            file_path = os.path.join(root, file)
-            # Path relative to MODEL_PATH, used as the path in the repo
-            path_in_repo = os.path.relpath(file_path, MODEL_PATH)
-            files_to_upload.append((file_path, path_in_repo))
-    print(f"   Found {len(files_to_upload)} files to upload:")
-    for file_path, path_in_repo in files_to_upload:
-        file_size = os.path.getsize(file_path) / (1024 * 1024)  # MB
-        print(f"   - {path_in_repo} ({file_size:.2f} MB)")
-    # Upload files
-    print("\n⏳ Uploading files...")
-    try:
-        for file_path, path_in_repo in files_to_upload:
-            print(f"   Uploading {path_in_repo}...", end=" ")
-            api.upload_file(
-                path_or_fileobj=file_path,
-                path_in_repo=path_in_repo,
-                repo_id=REPO_NAME,
-                repo_type="model"
-            )
-            print("✅")
-        print("\n🎉 Upload complete!")
-        print(f"📍 Model URL: https://huggingface.co/{REPO_NAME}")
-    except Exception as e:
-        print(f"\n❌ Upload failed: {e}")
-        return
-    # Step 4: Create README
-    print("\n📝 Step 4: Creating README.md")
-    readme_content = f"""---
-language:
-- id
-license: apache-2.0
-tags:
-- text-classification
-- indonesian
-- speech-structure
-- bert
-datasets:
-- custom
----
-# Swara Structure Analysis Model
-BERT model untuk analisis struktur berbicara (opening, content, closing) dalam Bahasa Indonesia.
-## Model Description
-Model ini dilatih untuk mengklasifikasikan kalimat dalam pidato/presentasi menjadi 3 kategori:
-- **Opening**: Pembukaan (salam, perkenalan, pengantar)
-- **Content**: Isi utama (poin-poin, argumen, penjelasan)
-- **Closing**: Penutup (kesimpulan, ucapan terima kasih)
-## Usage
-```python
-from transformers import BertTokenizer, BertForSequenceClassification
-import torch
-# Load model
-model_name = "{REPO_NAME}"
-tokenizer = BertTokenizer.from_pretrained(model_name)
-model = BertForSequenceClassification.from_pretrained(model_name)
-# Predict
-text = "Selamat pagi hadirin sekalian"
-inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
-with torch.no_grad():
-    outputs = model(**inputs)
-    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
-    predicted_class = torch.argmax(probs, dim=1).item()
-labels = {{0: "opening", 1: "content", 2: "closing"}}
-print(f"Predicted: {{labels[predicted_class]}}")
-```
-## Training Data
-Model dilatih dengan dataset pidato dan presentasi dalam Bahasa Indonesia.
-## Intended Use
-Model ini digunakan dalam sistem analisis public speaking untuk:
-- Evaluasi struktur presentasi
-- Feedback otomatis untuk pembicara
-- Training public speaking
-"""
-    try:
-        api.upload_file(
-            path_or_fileobj=readme_content.encode('utf-8'),
-            path_in_repo="README.md",
-            repo_id=REPO_NAME,
-            repo_type="model"
-        )
-        print("✅ README created!")
-    except Exception as e:
-        print(f"⚠️  README creation failed: {e}")
-    print("\n" + "=" * 70)
-    print("✅ ALL DONE!")
-    print("=" * 70)
-    print(f"\n📍 Model Repository: https://huggingface.co/{REPO_NAME}")
-    print("\n💡 Next steps:")
-    print("   1. Update app/services/structure.py to use this model")
-    print("   2. Remove best_model/ from your Space repository")
-    print("   3. Deploy and test")
-if __name__ == "__main__":
-    # Check if best_model exists
-    if not os.path.exists(MODEL_PATH):
-        print(f"❌ Error: Model path not found: {MODEL_PATH}")
-        print("   Please make sure best_model/ directory exists")
-        raise SystemExit(1)
-    upload_model()
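
The deleted upload script walks `best_model/` and derives each file's `path_in_repo` with `os.path.relpath`, which yields backslash-separated paths on Windows. The sketch below shows one way to collect the same (local path, repo path) pairs with forward slashes on every platform. The function name `collect_upload_pairs` is an assumption for illustration, and the block deliberately stops before any network call.

```python
from pathlib import Path
from typing import List, Tuple


def collect_upload_pairs(model_dir: str) -> List[Tuple[str, str]]:
    """Return (local_path, path_in_repo) pairs for every file under model_dir."""
    root = Path(model_dir)
    pairs: List[Tuple[str, str]] = []
    # sorted() gives a stable order; rglob("*") also yields directories,
    # so only regular files are kept.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # as_posix() forces forward slashes regardless of the host OS.
            pairs.append((str(path), path.relative_to(root).as_posix()))
    return pairs
```

Each pair could then be passed to `api.upload_file(...)` as in the original loop. `huggingface_hub` also provides `HfApi.upload_folder`, which uploads a whole directory in a single call and may be the simpler option for this use case.
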