# 🛠️ Azure Speech Transcription - Developer Guide

## 📋 Table of Contents

- [System Architecture](#-system-architecture)
- [Development Environment](#-development-environment)
- [Deployment Guide](#-deployment-guide)
- [API Documentation](#-api-documentation)
- [Database Schema](#-database-schema)
- [Security Implementation](#-security-implementation)
- [Monitoring & Maintenance](#-monitoring--maintenance)
- [Contributing Guidelines](#-contributing-guidelines)
- [Advanced Configuration](#-advanced-configuration)
- [Troubleshooting](#-troubleshooting)

---

## 🏗️ System Architecture

### Overview

The Azure Speech Transcription service is built with a modern, secure architecture focused on user privacy, PDPA compliance, and scalability.

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend UI   │    │   Backend API   │    │ Azure Services  │
│    (Gradio)     │◄──►│    (Python)     │◄──►│  Speech & Blob  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  User Session   │    │ SQLite Database │    │  User Storage   │
│   Management    │    │   (Metadata)    │    │   (Isolated)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Core Components

#### 1. Frontend Layer (`gradio_app.py`)
- **Technology**: Gradio with custom CSS
- **Purpose**: User interface and session management
- **Features**: Authentication, file upload, real-time status, history management

#### 2. Backend Layer (`app_core.py`)
- **Technology**: Python with threading and async processing
- **Purpose**: Business logic, authentication, and Azure integration
- **Features**: User management, transcription processing, PDPA compliance

#### 3. Data Layer
- **Database**: SQLite with Azure Blob backup
- **Storage**: Azure Blob Storage with user separation
- **Security**: User-isolated folders and encrypted connections

#### 4. External Services
- **Azure Speech Services**: Transcription processing
- **Azure Blob Storage**: File and database storage
- **FFmpeg**: Audio/video conversion

### Data Flow

```
1. User uploads file  → 2. Authentication check  → 3. File validation
         ↓                        ↓                        ↓
9. Update UI status   ← 4. Save to user folder   ← 5. Background processing
         ↑                        ↑                        ↑
8. Download results   ← 7. Store transcript      ← 6. Process with Azure
```

---

## 💻 Development Environment

### Prerequisites

- **Python**: 3.8 or higher
- **Azure Account**: With Speech Services and Blob Storage
- **FFmpeg**: For audio/video processing
- **Git**: For version control

### Environment Setup

#### 1. Clone Repository

```bash
git clone <repository-url>
cd azure-speech-transcription
```

#### 2. Virtual Environment

```bash
# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate
```

#### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

#### 4. Environment Configuration

```bash
# Copy environment template
cp .env.example .env

# Edit with your Azure credentials
nano .env
```

#### 5. Install FFmpeg

**Windows (Chocolatey):**
```bash
choco install ffmpeg
```

**macOS (Homebrew):**
```bash
brew install ffmpeg
```

**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install ffmpeg
```
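As a reference for step 4, a minimal `.env` might look like the following. All values are placeholders; the variable names are the ones listed under [Configuration Parameters](#-api-documentation) later in this guide:

```bash
# .env — placeholder values only; replace with your own credentials
AZURE_SPEECH_KEY=your-speech-key
AZURE_SPEECH_KEY_ENDPOINT=https://your-region.api.cognitive.microsoft.com
AZURE_REGION=your-region
AZURE_BLOB_CONNECTION="DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net"
AZURE_CONTAINER=your-container
AZURE_BLOB_SAS_TOKEN=your-sas-token

# Optional
ALLOWED_LANGS={"en-US": "English (United States)"}
```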
#### 6. Verify Installation

```bash
python -c "
import subprocess
import gradio as gr
from azure.storage.blob import BlobServiceClient
print('Gradio:', gr.__version__)
print('FFmpeg:', subprocess.run(['ffmpeg', '-version'], capture_output=True).returncode == 0)
print('Azure Blob:', 'OK')
"
```

### Development Server

```bash
# Start development server
python gradio_app.py

# Server will be available at:
# http://localhost:7860
```

### Development Tools

#### Recommended IDE Setup
- **VS Code**: With Python, Azure, and Git extensions
- **PyCharm**: Professional edition with Azure toolkit
- **Vim/Emacs**: With appropriate Python plugins

#### Useful Extensions

```json
{
  "recommendations": [
    "ms-python.python",
    "ms-vscode.azure-cli",
    "ms-azuretools.azure-cli-tools",
    "ms-python.black-formatter",
    "ms-python.flake8"
  ]
}
```

#### Code Quality Tools

```bash
# Install development tools
pip install black flake8 pytest mypy

# Format code
black .

# Lint code
flake8 .

# Type checking
mypy app_core.py gradio_app.py
```

---

## 🚀 Deployment Guide

### Production Deployment Options

#### Option 1: Traditional Server Deployment

**1. Server Preparation**

```bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install Python and dependencies
sudo apt install python3 python3-pip python3-venv nginx ffmpeg -y

# Create application user
sudo useradd -m -s /bin/bash transcription
sudo su - transcription
```

**2. Application Setup**

```bash
# Clone repository
git clone <repository-url> /home/transcription/app
cd /home/transcription/app

# Setup virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with production values
```

**3. Systemd Service**

```ini
# /etc/systemd/system/transcription.service
[Unit]
Description=Azure Speech Transcription Service
After=network.target

[Service]
Type=simple
User=transcription
Group=transcription
WorkingDirectory=/home/transcription/app
Environment=PATH=/home/transcription/app/venv/bin
ExecStart=/home/transcription/app/venv/bin/python gradio_app.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**4. Nginx Configuration**

```nginx
# /etc/nginx/sites-available/transcription
server {
    listen 80;
    server_name your-domain.com;
    client_max_body_size 500M;

    location / {
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
}
```

**5. SSL Certificate**

```bash
# Install Certbot
sudo apt install certbot python3-certbot-nginx -y

# Get SSL certificate
sudo certbot --nginx -d your-domain.com

# Verify auto-renewal
sudo certbot renew --dry-run
```

**6. Start Services**

```bash
# Enable and start application
sudo systemctl enable transcription
sudo systemctl start transcription

# Enable and restart nginx
sudo systemctl enable nginx
sudo systemctl restart nginx

# Check status
sudo systemctl status transcription
sudo systemctl status nginx
```
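Once both services report active, a quick smoke test from the server confirms the app is reachable both directly and behind nginx (assuming the default port and the domain placeholder from the configs above):

```bash
# Direct to the app, bypassing nginx
curl -I http://127.0.0.1:7860

# Through nginx
curl -I http://your-domain.com
```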
#### Option 2: Docker Deployment

**1. Dockerfile**

```dockerfile
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create necessary directories
RUN mkdir -p uploads database temp

# Expose port
EXPOSE 7860

# Run application
CMD ["python", "gradio_app.py"]
```

**2. Docker Compose**

```yaml
# docker-compose.yml
version: '3.8'

services:
  transcription:
    build: .
    ports:
      - "7860:7860"
    environment:
      - AZURE_SPEECH_KEY=${AZURE_SPEECH_KEY}
      - AZURE_SPEECH_KEY_ENDPOINT=${AZURE_SPEECH_KEY_ENDPOINT}
      - AZURE_REGION=${AZURE_REGION}
      - AZURE_BLOB_CONNECTION=${AZURE_BLOB_CONNECTION}
      - AZURE_CONTAINER=${AZURE_CONTAINER}
      - AZURE_BLOB_SAS_TOKEN=${AZURE_BLOB_SAS_TOKEN}
      - ALLOWED_LANGS=${ALLOWED_LANGS}
    volumes:
      - ./uploads:/app/uploads
      - ./database:/app/database
      - ./temp:/app/temp
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/ssl/certs
    depends_on:
      - transcription
    restart: unless-stopped
```

**3. Deploy with Docker**

```bash
# Build and start
docker-compose up -d

# View logs
docker-compose logs -f transcription

# Update application
git pull
docker-compose build transcription
docker-compose up -d transcription
```

#### Option 3: Cloud Deployment (Azure Container Instances)

**1. Create Container Registry**

```bash
# Create ACR
az acr create --resource-group myResourceGroup \
  --name myregistry --sku Basic

# Login to ACR
az acr login --name myregistry

# Build and push image
docker build -t myregistry.azurecr.io/transcription:latest .
docker push myregistry.azurecr.io/transcription:latest
```

**2. Deploy Container Instance**

```bash
# Create container instance
az container create \
  --resource-group myResourceGroup \
  --name transcription-app \
  --image myregistry.azurecr.io/transcription:latest \
  --cpu 2 --memory 4 \
  --port 7860 \
  --environment-variables \
    AZURE_SPEECH_KEY=$AZURE_SPEECH_KEY \
    AZURE_SPEECH_KEY_ENDPOINT=$AZURE_SPEECH_KEY_ENDPOINT \
    AZURE_REGION=$AZURE_REGION \
    AZURE_BLOB_CONNECTION="$AZURE_BLOB_CONNECTION" \
    AZURE_CONTAINER=$AZURE_CONTAINER \
    AZURE_BLOB_SAS_TOKEN="$AZURE_BLOB_SAS_TOKEN"
```
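To verify the container instance started cleanly, the standard `az container` subcommands can be used (resource and container names match the deployment example above):

```bash
# Check provisioning/running state
az container show \
  --resource-group myResourceGroup \
  --name transcription-app \
  --query instanceView.state

# Stream application logs
az container logs \
  --resource-group myResourceGroup \
  --name transcription-app
```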
---

## 📡 API Documentation

### Core Classes and Methods

#### TranscriptionManager Class

**Purpose**: Main service class handling all transcription operations

```python
class TranscriptionManager:
    def __init__(self)

    # User Authentication
    def register_user(self, email: str, username: str, password: str,
                      gdpr_consent: bool, data_retention_agreed: bool,
                      marketing_consent: bool) -> Tuple[bool, str, Optional[str]]
    def login_user(self, login: str, password: str) -> Tuple[bool, str, Optional[User]]

    # Transcription Operations
    def submit_transcription(self, file_bytes: bytes, original_filename: str,
                             user_id: str, language: str, settings: Dict) -> str
    def get_job_status(self, job_id: str) -> Optional[TranscriptionJob]

    # Data Management
    def get_user_history(self, user_id: str, limit: int) -> List[TranscriptionJob]
    def get_user_stats(self, user_id: str) -> Dict
    def export_user_data(self, user_id: str) -> Dict
    def delete_user_account(self, user_id: str) -> bool
```

#### DatabaseManager Class

**Purpose**: Handle database operations and Azure blob synchronization

```python
class DatabaseManager:
    def __init__(self, db_path: str = None)

    # User Operations
    def create_user(...) -> Tuple[bool, str, Optional[str]]
    def authenticate_user(self, login: str, password: str) -> Tuple[bool, str, Optional[User]]
    def get_user_by_id(self, user_id: str) -> Optional[User]

    # Job Operations
    def save_job(self, job: TranscriptionJob)
    def get_job(self, job_id: str) -> Optional[TranscriptionJob]
    def get_user_jobs(self, user_id: str, limit: int) -> List[TranscriptionJob]
    def get_pending_jobs(self) -> List[TranscriptionJob]
```

#### AuthManager Class

**Purpose**: Authentication utilities and validation

```python
class AuthManager:
    @staticmethod
    def hash_password(password: str) -> str
    @staticmethod
    def verify_password(password: str, password_hash: str) -> bool
    @staticmethod
    def validate_email(email: str) -> bool
    @staticmethod
    def validate_username(username: str) -> bool
    @staticmethod
    def validate_password(password: str) -> Tuple[bool, str]
```

### Data Models

#### User Model

```python
@dataclass
class User:
    user_id: str
    email: str
    username: str
    password_hash: str
    created_at: str
    last_login: Optional[str] = None
    is_active: bool = True
    gdpr_consent: bool = False
    data_retention_agreed: bool = False
    marketing_consent: bool = False
```

#### TranscriptionJob Model

```python
@dataclass
class TranscriptionJob:
    job_id: str
    user_id: str
    original_filename: str
    audio_url: str
    language: str
    status: str  # pending, processing, completed, failed
    created_at: str
    completed_at: Optional[str] = None
    transcript_text: Optional[str] = None
    transcript_url: Optional[str] = None
    error_message: Optional[str] = None
    azure_trans_id: Optional[str] = None
    settings: Optional[Dict] = None
```

### Configuration Parameters

#### Environment Variables

```python
# Required
AZURE_SPEECH_KEY: str
AZURE_SPEECH_KEY_ENDPOINT: str
AZURE_REGION: str
AZURE_BLOB_CONNECTION: str
AZURE_CONTAINER: str
AZURE_BLOB_SAS_TOKEN: str

# Optional
ALLOWED_LANGS: str  # JSON string
API_VERSION: str = "v3.2"
PASSWORD_SALT: str = "default_salt"
MAX_FILE_SIZE_MB: int = 500
```

#### Transcription Settings

```python
settings = {
    'diarization_enabled': bool,    # Speaker identification
    'audio_format': str,            # wav, mp3, etc.
    'speakers': int,                # Max speakers (1-10)
    'profanity': str,               # masked, removed, raw
    'punctuation': str,             # automatic, dictated, none
    'timestamps': bool,             # Include timestamps
    'lexical': bool,                # Include lexical forms
    'language_id_enabled': bool,    # Auto language detection
    'candidate_locales': List[str]  # Language candidates
}
```
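Putting these pieces together, a typical end-to-end call sequence might look like the sketch below. Error handling is elided; the filename and credentials are illustrative, and the `app_core` import matches the one used in the test suite later in this guide:

```python
import time

from app_core import TranscriptionManager

manager = TranscriptionManager()

# Authenticate an existing user (register_user would be the first-time path)
ok, message, user = manager.login_user("alice@example.com", "S3curePass1")
assert ok, message

# Submit a job with a minimal settings dict
with open("meeting.wav", "rb") as f:
    job_id = manager.submit_transcription(
        file_bytes=f.read(),
        original_filename="meeting.wav",
        user_id=user.user_id,
        language="en-US",
        settings={"audio_format": "wav", "diarization_enabled": True, "speakers": 2},
    )

# Poll until the background worker finishes
while (job := manager.get_job_status(job_id)).status in ("pending", "processing"):
    time.sleep(5)

print(job.status, job.transcript_text or job.error_message)
```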
---

## 🗄️ Database Schema

### SQLite Database Structure

#### Users Table

```sql
CREATE TABLE users (
    user_id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TEXT NOT NULL,
    last_login TEXT,
    is_active BOOLEAN DEFAULT 1,
    gdpr_consent BOOLEAN DEFAULT 0,
    data_retention_agreed BOOLEAN DEFAULT 0,
    marketing_consent BOOLEAN DEFAULT 0
);

-- Indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
```

#### Transcriptions Table

```sql
CREATE TABLE transcriptions (
    job_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    original_filename TEXT NOT NULL,
    audio_url TEXT,
    language TEXT NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    completed_at TEXT,
    transcript_text TEXT,
    transcript_url TEXT,
    error_message TEXT,
    azure_trans_id TEXT,
    settings TEXT,
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);

-- Indexes
CREATE INDEX idx_transcriptions_user_id ON transcriptions(user_id);
CREATE INDEX idx_transcriptions_status ON transcriptions(status);
CREATE INDEX idx_transcriptions_created_at ON transcriptions(created_at DESC);
CREATE INDEX idx_transcriptions_user_created ON transcriptions(user_id, created_at DESC);
```

### Azure Blob Storage Structure

```
Container: {AZURE_CONTAINER}/
├── shared/
│   └── database/
│       └── transcriptions.db          # Shared database backup
├── users/
│   ├── {user-id-1}/
│   │   ├── audio/                     # Processed audio files
│   │   │   ├── {job-id-1}.wav
│   │   │   └── {job-id-2}.wav
│   │   ├── transcripts/               # Transcript files
│   │   │   ├── {job-id-1}.txt
│   │   │   └── {job-id-2}.txt
│   │   └── originals/                 # Original uploaded files
│   │       ├── {job-id-1}_{filename}.mp4
│   │       └── {job-id-2}_{filename}.wav
│   └── {user-id-2}/
│       ├── audio/
│       ├── transcripts/
│       └── originals/
```

### Database Operations

#### User Management Queries

```sql
-- Create user
INSERT INTO users (user_id, email, username, password_hash, created_at,
                   gdpr_consent, data_retention_agreed, marketing_consent)
VALUES (?, ?, ?, ?, ?, ?, ?, ?);

-- Authenticate user
SELECT * FROM users WHERE (email = ? OR username = ?) AND is_active = 1;

-- Update last login
UPDATE users SET last_login = ? WHERE user_id = ?;

-- Get user stats
SELECT status, COUNT(*) FROM transcriptions WHERE user_id = ? GROUP BY status;
```

#### Job Management Queries

```sql
-- Create job
INSERT INTO transcriptions (job_id, user_id, original_filename, language,
                            status, created_at, settings)
VALUES (?, ?, ?, ?, 'pending', ?, ?);

-- Update job status
UPDATE transcriptions
SET status = ?, completed_at = ?, transcript_text = ?, transcript_url = ?
WHERE job_id = ?;

-- Get user jobs
SELECT * FROM transcriptions WHERE user_id = ?
ORDER BY created_at DESC LIMIT ?;

-- Get pending jobs for background processor
SELECT * FROM transcriptions WHERE status IN ('pending', 'processing');
```
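As an illustration of how the job queries above map onto the `TranscriptionJob` dataclass, `get_user_jobs` might be implemented roughly as follows. This is a sketch, not the actual `DatabaseManager` method; it assumes `TranscriptionJob` is importable from `app_core` and relies on the column names matching the dataclass fields exactly, as the schema above shows:

```python
import json
import sqlite3

from app_core import TranscriptionJob  # assumed location of the dataclass

def get_user_jobs(db_path: str, user_id: str, limit: int = 50) -> list:
    """Fetch a user's jobs, newest first, deserializing the settings JSON."""
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            "SELECT * FROM transcriptions WHERE user_id = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (user_id, limit),
        ).fetchall()

    jobs = []
    for row in rows:
        data = dict(row)
        # settings is stored as a JSON string in the TEXT column
        data["settings"] = json.loads(data["settings"]) if data["settings"] else None
        jobs.append(TranscriptionJob(**data))
    return jobs
```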
---

## 🔒 Security Implementation

### Authentication Security

#### Password Security

```python
import hashlib
import os
import re

# Password hashing with salt
def hash_password(password: str) -> str:
    salt = os.environ.get("PASSWORD_SALT", "default_salt")
    return hashlib.sha256((password + salt).encode()).hexdigest()

# Password validation
def validate_password(password: str) -> Tuple[bool, str]:
    if len(password) < 8:
        return False, "Password must be at least 8 characters"
    if not re.search(r'[A-Z]', password):
        return False, "Password must contain an uppercase letter"
    if not re.search(r'[a-z]', password):
        return False, "Password must contain a lowercase letter"
    if not re.search(r'\d', password):
        return False, "Password must contain a number"
    return True, "Valid"
```

#### Session Management

```python
from datetime import datetime, timedelta

# User session state
session_state = {
    'user_id': str,
    'username': str,
    'logged_in_at': datetime,
    'last_activity': datetime
}

# Session validation
def validate_session(session_state: dict) -> bool:
    if not session_state or 'user_id' not in session_state:
        return False

    # Check session timeout (if implemented)
    last_activity = session_state.get('last_activity')
    if last_activity:
        timeout = timedelta(hours=24)  # 24-hour sessions
        if datetime.now() - last_activity > timeout:
            return False
    return True
```

### Data Security

#### Access Control

```python
# User data access verification
def verify_user_access(job_id: str, user_id: str) -> bool:
    job = get_job(job_id)
    return job and job.user_id == user_id

# File path security
def get_user_blob_path(user_id: str, blob_type: str, filename: str) -> str:
    # Ensure users can only access their own folder
    safe_filename = os.path.basename(filename)  # Prevent path traversal
    return f"users/{user_id}/{blob_type}/{safe_filename}"
```

#### Data Encryption

```python
# Azure Blob Storage encryption (configured at the Azure level)
# - Encryption at rest: enabled by default
# - Encryption in transit: HTTPS enforced
# - Customer-managed keys: optional enhancement

# Database encryption (for sensitive fields)
from cryptography.fernet import Fernet

def encrypt_sensitive_data(data: str, key: bytes) -> str:
    f = Fernet(key)
    return f.encrypt(data.encode()).decode()

def decrypt_sensitive_data(encrypted_data: str, key: bytes) -> str:
    f = Fernet(key)
    return f.decrypt(encrypted_data.encode()).decode()
```

### Azure Security

#### Blob Storage Security

```python
# SAS token configuration for least privilege
sas_permissions = BlobSasPermissions(
    read=True,
    write=True,
    delete=True,
    list=True
)

# IP restrictions (optional)
sas_ip_range = "192.168.1.0/24"  # Restrict to a specific IP range

# Time-limited tokens
sas_expiry = datetime.utcnow() + timedelta(hours=1)
```

#### Speech Service Security

```python
# Secure API calls
headers = {
    "Ocp-Apim-Subscription-Key": AZURE_SPEECH_KEY,
    "Content-Type": "application/json"
}

# Request timeout (for retry logic, see the sketch below)
response = requests.post(
    url,
    headers=headers,
    json=body,
    timeout=30,
    verify=True  # Verify SSL certificates
)
```
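The snippet above shows only the timeout; a minimal sketch of bounded retries with exponential backoff, wrapping the same `requests.post` call, might look like this (`post_with_retries` is an illustrative helper, not part of the codebase):

```python
import time

import requests

def post_with_retries(url: str, headers: dict, body: dict,
                      retries: int = 3, backoff: float = 2.0) -> requests.Response:
    """POST with a bounded retry loop and exponential backoff (sketch)."""
    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers, json=body,
                                     timeout=30, verify=True)
            # Only retry on transient server-side errors (5xx)
            if response.status_code < 500:
                return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
        time.sleep(backoff ** attempt)
    return response
```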
### Input Validation

#### File Upload Security

```python
def validate_uploaded_file(file_path: str, max_size: int = 500 * 1024 * 1024) -> Tuple[bool, str]:
    try:
        # Check file exists
        if not os.path.exists(file_path):
            return False, "File not found"

        # Check file size
        file_size = os.path.getsize(file_path)
        if file_size > max_size:
            return False, f"File too large: {file_size / 1024 / 1024:.1f}MB"

        # Check file type by content (not just extension)
        import magic  # python-magic; requires libmagic on the system
        mime_type = magic.from_file(file_path, mime=True)
        allowed_types = ['audio/', 'video/']
        if not any(mime_type.startswith(t) for t in allowed_types):
            return False, f"Invalid file type: {mime_type}"

        return True, "Valid"
    except Exception as e:
        return False, f"Validation error: {str(e)}"
```

#### SQL Injection Prevention

```python
# Use parameterized queries (already implemented)
cursor.execute(
    "SELECT * FROM users WHERE email = ? AND password_hash = ?",
    (email, password_hash)
)

# Input sanitization
def sanitize_input(user_input: str) -> str:
    # Escape dangerous characters
    import html
    sanitized = html.escape(user_input)
    # Limit length
    return sanitized[:1000]
```

---

## 📊 Monitoring & Maintenance

### Application Monitoring

#### Health Checks

```python
def health_check() -> Dict[str, Any]:
    """System health check endpoint"""
    try:
        # Database check
        db_status = check_database_connection()

        # Azure services check
        blob_status = check_blob_storage()
        speech_status = check_speech_service()

        # FFmpeg check
        ffmpeg_status = check_ffmpeg_installation()

        # Disk space check
        disk_status = check_disk_space()

        return {
            'status': 'healthy' if all([db_status, blob_status, speech_status, ffmpeg_status]) else 'unhealthy',
            'timestamp': datetime.now().isoformat(),
            'services': {
                'database': db_status,
                'blob_storage': blob_status,
                'speech_service': speech_status,
                'ffmpeg': ffmpeg_status,
                'disk_space': disk_status
            }
        }
    except Exception as e:
        return {
            'status': 'error',
            'timestamp': datetime.now().isoformat(),
            'error': str(e)
        }

def check_database_connection() -> bool:
    try:
        with transcription_manager.db.get_connection() as conn:
            conn.execute("SELECT 1").fetchone()
        return True
    except Exception:
        return False

def check_blob_storage() -> bool:
    try:
        client = BlobServiceClient.from_connection_string(AZURE_BLOB_CONNECTION)
        # Force a round-trip to the service
        next(iter(client.list_containers()), None)
        return True
    except Exception:
        return False
```
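`check_speech_service`, `check_ffmpeg_installation`, and `check_disk_space` are referenced above but not shown. Hedged sketches might look like the following, reusing the module-level `AZURE_SPEECH_KEY_ENDPOINT` and `AZURE_SPEECH_KEY` globals and the v3.2 endpoint that appears in the Troubleshooting section:

```python
import shutil
import subprocess

import requests

def check_speech_service() -> bool:
    """Probe the Speech transcriptions endpoint with the subscription key."""
    try:
        url = f"{AZURE_SPEECH_KEY_ENDPOINT}/speechtotext/v3.2/transcriptions"
        headers = {"Ocp-Apim-Subscription-Key": AZURE_SPEECH_KEY}
        return requests.get(url, headers=headers, timeout=10).status_code == 200
    except Exception:
        return False

def check_ffmpeg_installation() -> bool:
    """Verify ffmpeg is on PATH and runs."""
    try:
        return subprocess.run(["ffmpeg", "-version"],
                              capture_output=True).returncode == 0
    except FileNotFoundError:
        return False

def check_disk_space(min_free_gb: int = 5) -> bool:
    """Require at least min_free_gb of free disk space."""
    _, _, free = shutil.disk_usage("/")
    return free // (1024 ** 3) >= min_free_gb
```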
#### Logging Configuration

```python
import logging
from logging.handlers import RotatingFileHandler

def setup_logging():
    """Configure application logging"""
    # Create formatter
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    console_handler.setLevel(logging.INFO)

    # File handler with rotation
    file_handler = RotatingFileHandler(
        'logs/transcription.log',
        maxBytes=10*1024*1024,  # 10MB
        backupCount=5
    )
    file_handler.setFormatter(formatter)
    file_handler.setLevel(logging.DEBUG)

    # Configure root logger
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(console_handler)
    logger.addHandler(file_handler)

    # Separate logger for sensitive operations
    auth_logger = logging.getLogger('auth')
    auth_handler = RotatingFileHandler(
        'logs/auth.log',
        maxBytes=5*1024*1024,  # 5MB
        backupCount=10
    )
    auth_handler.setFormatter(formatter)
    auth_logger.addHandler(auth_handler)
    auth_logger.setLevel(logging.INFO)
```

#### Performance Monitoring

```python
import time
from functools import wraps

def monitor_performance(func):
    """Decorator to monitor function performance"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        try:
            result = func(*args, **kwargs)
            duration = time.time() - start_time
            logging.info(f"{func.__name__} completed in {duration:.2f}s")
            return result
        except Exception as e:
            duration = time.time() - start_time
            logging.error(f"{func.__name__} failed after {duration:.2f}s: {str(e)}")
            raise
    return wrapper

# Usage
@monitor_performance
def submit_transcription(self, file_bytes, filename, user_id, language, settings):
    # Implementation here
    pass
```

### Database Maintenance

#### Backup Strategy

```python
from datetime import datetime, timedelta, timezone

def backup_database():
    """Backup database to Azure Blob Storage"""
    try:
        # Create timestamped backup
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup_name = f"shared/backups/transcriptions_backup_{timestamp}.db"

        # Upload current database
        blob_client = blob_service.get_blob_client(
            container=AZURE_CONTAINER,
            blob=backup_name
        )
        with open(db_path, "rb") as data:
            blob_client.upload_blob(data)

        logging.info(f"Database backup created: {backup_name}")

        # Clean old backups (keep last 30 days)
        cleanup_old_backups()
    except Exception as e:
        logging.error(f"Database backup failed: {str(e)}")

def cleanup_old_backups():
    """Remove backups older than 30 days"""
    try:
        # blob.last_modified is timezone-aware, so compare against UTC
        cutoff_date = datetime.now(timezone.utc) - timedelta(days=30)
        container_client = blob_service.get_container_client(AZURE_CONTAINER)

        for blob in container_client.list_blobs(name_starts_with="shared/backups/"):
            if blob.last_modified < cutoff_date:
                container_client.delete_blob(blob.name)
                logging.info(f"Deleted old backup: {blob.name}")
    except Exception as e:
        logging.error(f"Backup cleanup failed: {str(e)}")
```

#### Database Optimization

```python
def optimize_database():
    """Optimize database performance"""
    try:
        with transcription_manager.db.get_connection() as conn:
            # Analyze tables
            conn.execute("ANALYZE")
            # Vacuum database (compact)
            conn.execute("VACUUM")
            # Update statistics
            conn.execute("PRAGMA optimize")
        logging.info("Database optimization completed")
    except Exception as e:
        logging.error(f"Database optimization failed: {str(e)}")

# Schedule maintenance tasks
import schedule
schedule.every().week.do(optimize_database)
schedule.every().day.at("02:00").do(backup_database)
```

### Resource Management

#### Cleanup Tasks

```python
def cleanup_temporary_files():
    """Clean up temporary files older than 24 hours"""
    try:
        cutoff_time = time.time() - (24 * 60 * 60)  # 24 hours ago
        temp_dirs = ['uploads', 'temp']

        for temp_dir in temp_dirs:
            if os.path.exists(temp_dir):
                for filename in os.listdir(temp_dir):
                    filepath = os.path.join(temp_dir, filename)
                    if os.path.isfile(filepath) and os.path.getmtime(filepath) < cutoff_time:
                        os.remove(filepath)
                        logging.info(f"Cleaned up temporary file: {filepath}")
    except Exception as e:
        logging.error(f"Temporary file cleanup failed: {str(e)}")

def monitor_disk_space():
    """Monitor and alert on disk space"""
    try:
        import shutil
        total, used, free = shutil.disk_usage("/")

        # Convert to GB
        free_gb = free // (1024**3)
        total_gb = total // (1024**3)
        usage_percent = (used / total) * 100

        if usage_percent > 85:
            logging.warning(f"High disk usage: {usage_percent:.1f}% ({free_gb}GB free)")
        if free_gb < 5:
            logging.critical(f"Low disk space: {free_gb}GB remaining")
    except Exception as e:
        logging.error(f"Disk space monitoring failed: {str(e)}")
```

### Monitoring Alerts

#### Email Alerts (Optional)

```python
import smtplib
from email.mime.text import MIMEText

def send_alert(subject: str, message: str):
    """Send email alert for critical issues"""
    try:
        smtp_server = os.environ.get("SMTP_SERVER")
        smtp_port = int(os.environ.get("SMTP_PORT", "587"))
        smtp_user = os.environ.get("SMTP_USER")
        smtp_pass = os.environ.get("SMTP_PASS")
        alert_email = os.environ.get("ALERT_EMAIL")

        if not all([smtp_server, smtp_user, smtp_pass, alert_email]):
            return  # Email not configured

        msg = MIMEText(message)
        msg['Subject'] = f"[Transcription Service] {subject}"
        msg['From'] = smtp_user
        msg['To'] = alert_email

        with smtplib.SMTP(smtp_server, smtp_port) as server:
            server.starttls()
            server.login(smtp_user, smtp_pass)
            server.send_message(msg)
    except Exception as e:
        logging.error(f"Failed to send alert: {str(e)}")
```
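As written, `monitor_disk_space` only logs. One way to wire it into `send_alert` is the following sketch (the wrapper function name and the hourly cadence are illustrative choices):

```python
def monitor_disk_space_with_alerts():
    """Escalate the disk-space check by email when space runs low (sketch)."""
    import shutil
    _, _, free = shutil.disk_usage("/")
    free_gb = free // (1024 ** 3)
    if free_gb < 5:
        send_alert(
            "Low disk space",
            f"Only {free_gb}GB remaining on the transcription server."
        )

# Check hourly alongside the other scheduled tasks
schedule.every().hour.do(monitor_disk_space_with_alerts)
```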
---

## 🤝 Contributing Guidelines

### Development Workflow

#### 1. Setup Development Environment

```bash
# Fork repository
git clone https://github.com/your-username/azure-speech-transcription.git
cd azure-speech-transcription

# Create feature branch
git checkout -b feature/your-feature-name

# Setup environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pip install -r requirements-dev.txt  # Development dependencies
```

#### 2. Code Quality Standards

**Python Style Guide**
- Follow PEP 8 style guidelines
- Use type hints for function parameters and return values
- Maximum line length: 88 characters (Black formatter)
- Use meaningful variable and function names

**Code Formatting**

```bash
# Install development tools
pip install black flake8 mypy pytest

# Format code
black .

# Check style
flake8 .

# Type checking
mypy app_core.py gradio_app.py

# Run tests
pytest tests/
```

**Documentation Standards**
- All functions must have docstrings
- Include type hints
- Document complex logic with inline comments
- Update README.md for new features

```python
def submit_transcription(
    self,
    file_bytes: bytes,
    original_filename: str,
    user_id: str,
    language: str,
    settings: Dict[str, Any]
) -> str:
    """
    Submit a new transcription job for processing.

    Args:
        file_bytes: Raw bytes of the audio/video file
        original_filename: Original name of the uploaded file
        user_id: ID of the authenticated user
        language: Language code for transcription (e.g., 'en-US')
        settings: Transcription configuration options

    Returns:
        str: Unique job ID for tracking transcription progress

    Raises:
        ValueError: If user_id is invalid or the file is too large
        ConnectionError: If Azure services are unavailable
    """
```

#### 3. Testing Requirements

**Unit Tests**

```python
import pytest
from unittest.mock import Mock, patch
from app_core import TranscriptionManager, AuthManager

class TestAuthManager:
    def test_password_hashing(self):
        password = "TestPassword123"
        hashed = AuthManager.hash_password(password)
        assert hashed != password
        assert AuthManager.verify_password(password, hashed)
        assert not AuthManager.verify_password("wrong", hashed)

    def test_email_validation(self):
        assert AuthManager.validate_email("test@example.com")
        assert not AuthManager.validate_email("invalid-email")
        assert not AuthManager.validate_email("")

class TestTranscriptionManager:
    @patch('app_core.BlobServiceClient')
    def test_submit_transcription(self, mock_blob):
        manager = TranscriptionManager()
        job_id = manager.submit_transcription(
            b"fake audio data",
            "test.wav",
            "user123",
            "en-US",
            {"audio_format": "wav"}
        )
        assert isinstance(job_id, str)
        assert len(job_id) == 36  # UUID length
```

**Integration Tests**

```python
class TestIntegration:
    def test_full_transcription_workflow(self):
        # Test complete workflow from upload to download
        pass

    def test_user_registration_and_login(self):
        # Test complete auth workflow
        pass
```

#### 4. Commit Guidelines

**Commit Message Format**

```
type(scope): brief description

Detailed explanation of changes if needed
- List specific changes
- Include any breaking changes
- Reference issue numbers

Closes #123
```

**Commit Types**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, etc.)
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks
**Example Commits**

```bash
git commit -m "feat(auth): add password strength validation

- Implement password complexity requirements
- Add client-side validation feedback
- Update registration form UI

Closes #45"

git commit -m "fix(transcription): handle Azure service timeouts

- Add retry logic for failed API calls
- Improve error messages for users
- Log detailed error information

Fixes #67"
```

#### 5. Pull Request Process

**PR Checklist**
- [ ] Code follows style guidelines
- [ ] All tests pass
- [ ] Documentation updated
- [ ] Security considerations reviewed
- [ ] Performance impact assessed
- [ ] Breaking changes documented

**PR Template**

```markdown
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing completed

## Security
- [ ] No sensitive data exposed
- [ ] Input validation implemented
- [ ] Access controls maintained

## Performance
- [ ] No performance degradation
- [ ] Database queries optimized
- [ ] Resource usage considered
```

### Feature Development

#### Adding New Languages

```python
# 1. Update environment configuration
ALLOWED_LANGS = {
    "en-US": "English (United States)",
    "es-ES": "Spanish (Spain)",
    "new-LANG": "New Language Name"
}

# 2. Test language support
def test_new_language():
    # Verify Azure Speech Services supports the language,
    # test transcription accuracy, and update the documentation
    pass
```

#### Adding New Audio Formats

```python
# 1. Update supported formats list
AUDIO_FORMATS = [
    "wav", "mp3", "ogg", "opus", "flac",
    "new_format"  # Add new format
]

# 2. Update FFmpeg conversion logic
def _convert_to_audio(self, input_path, output_path, audio_format="wav"):
    if audio_format == "new_format":
        # Add format-specific conversion parameters
        cmd = ["ffmpeg", "-i", input_path, "-codec", "new_codec", output_path]
```

#### Adding New Features

```python
# 1. Database schema updates
def upgrade_database_schema():
    with self.get_connection() as conn:
        conn.execute("""
            ALTER TABLE transcriptions
            ADD COLUMN new_feature_data TEXT
        """)

# 2. API endpoint updates
def new_feature_endpoint(user_id: str, feature_data: Dict) -> Dict:
    # Implement new feature logic
    pass

# 3. UI updates
def add_new_feature_ui():
    new_feature_input = gr.Textbox(label="New Feature")
    new_feature_button = gr.Button("Use New Feature")
```
---

## ⚙️ Advanced Configuration

### Performance Optimization

#### Concurrent Processing

```python
# Adjust worker thread pool size based on server capacity
class TranscriptionManager:
    def __init__(self, max_workers: int = None):
        if max_workers is None:
            # Auto-detect based on CPU cores
            import multiprocessing
            max_workers = min(multiprocessing.cpu_count(), 10)
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

# Configure based on server specs:
# Small server:  max_workers=2-4
# Medium server: max_workers=5-8
# Large server:  max_workers=10+
```

#### Database Optimization

```python
# SQLite performance tuning
def configure_database_performance(db_path: str):
    with sqlite3.connect(db_path) as conn:
        # Enable WAL mode for better concurrency
        conn.execute("PRAGMA journal_mode=WAL")
        # Increase cache size (in KB)
        conn.execute("PRAGMA cache_size=10000")
        # Optimize synchronization
        conn.execute("PRAGMA synchronous=NORMAL")
        # Enable foreign keys
        conn.execute("PRAGMA foreign_keys=ON")
```

#### Memory Management

```python
# Large file handling
def process_large_file(file_path: str):
    """Process large files in chunks to manage memory"""
    chunk_size = 64 * 1024 * 1024  # 64MB chunks
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            # Process chunk
            yield chunk

# Garbage collection for long-running processes
import gc

def cleanup_memory():
    """Force garbage collection"""
    gc.collect()

# Schedule periodic cleanup
schedule.every(30).minutes.do(cleanup_memory)
```

### Security Hardening

#### Rate Limiting

```python
from collections import defaultdict
from time import time

class RateLimiter:
    def __init__(self, max_requests: int = 100, window: int = 3600):
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time()
        user_requests = self.requests[user_id]

        # Clean old requests
        user_requests[:] = [req_time for req_time in user_requests
                            if now - req_time < self.window]

        # Check limit
        if len(user_requests) >= self.max_requests:
            return False

        user_requests.append(now)
        return True

# Usage in endpoints
rate_limiter = RateLimiter(max_requests=50, window=3600)  # 50 per hour

def submit_transcription(self, user_id: str, ...):
    if not rate_limiter.is_allowed(user_id):
        raise Exception("Rate limit exceeded")
```

#### Input Sanitization

```python
import bleach
import re

def sanitize_filename(filename: str) -> str:
    """Sanitize uploaded filename"""
    # Remove path traversal attempts
    filename = os.path.basename(filename)
    # Remove dangerous characters
    filename = re.sub(r'[<>:"/\\|?*]', '_', filename)
    # Limit length
    if len(filename) > 255:
        name, ext = os.path.splitext(filename)
        filename = name[:250] + ext
    return filename

def sanitize_user_input(text: str) -> str:
    """Sanitize user text input"""
    # Remove HTML tags
    text = bleach.clean(text, tags=[], strip=True)
    # Limit length
    text = text[:1000]
    return text.strip()
```

#### Audit Logging

```python
import json

class AuditLogger:
    def __init__(self):
        self.logger = logging.getLogger('audit')

    def log_user_action(self, user_id: str, action: str, details: Dict = None):
        """Log user actions for security auditing"""
        audit_entry = {
            'timestamp': datetime.now().isoformat(),
            'user_id': user_id,
            'action': action,
            'details': details or {},
            'ip_address': self._get_client_ip(),
            'user_agent': self._get_user_agent()
        }
        self.logger.info(json.dumps(audit_entry))

    def _get_client_ip(self) -> str:
        # Implementation depends on deployment setup
        return "unknown"

    def _get_user_agent(self) -> str:
        # Implementation depends on deployment setup
        return "unknown"

# Usage
audit = AuditLogger()
audit.log_user_action(user_id, "login", {"success": True})
audit.log_user_action(user_id, "transcription_submit", {"filename": filename})
```
### Custom Extensions

#### Plugin Architecture

```python
class TranscriptionPlugin:
    """Base class for transcription plugins"""

    def pre_process(self, file_bytes: bytes, settings: Dict) -> bytes:
        """Pre-process audio before transcription"""
        return file_bytes

    def post_process(self, transcript: str, settings: Dict) -> str:
        """Post-process transcript text"""
        return transcript

    def get_name(self) -> str:
        """Return plugin name"""
        raise NotImplementedError

class NoiseReductionPlugin(TranscriptionPlugin):
    def get_name(self) -> str:
        return "noise_reduction"

    def pre_process(self, file_bytes: bytes, settings: Dict) -> bytes:
        # Implement noise reduction using an audio processing library.
        # This is a placeholder - an actual implementation would use
        # libraries like librosa, scipy, or pydub.
        return file_bytes

class LanguageDetectionPlugin(TranscriptionPlugin):
    def get_name(self) -> str:
        return "language_detection"

    def pre_process(self, file_bytes: bytes, settings: Dict) -> bytes:
        # Detect language and update settings
        # (_detect_language is a placeholder for an actual detector)
        detected_language = self._detect_language(file_bytes)
        settings['detected_language'] = detected_language
        return file_bytes

# Plugin manager
class PluginManager:
    def __init__(self):
        self.plugins: List[TranscriptionPlugin] = []

    def register_plugin(self, plugin: TranscriptionPlugin):
        self.plugins.append(plugin)

    def apply_pre_processing(self, file_bytes: bytes, settings: Dict) -> bytes:
        for plugin in self.plugins:
            file_bytes = plugin.pre_process(file_bytes, settings)
        return file_bytes

    def apply_post_processing(self, transcript: str, settings: Dict) -> str:
        for plugin in self.plugins:
            transcript = plugin.post_process(transcript, settings)
        return transcript
```
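Wiring these pieces together, registration and use of the plugin pipeline might look like the following sketch (`transcribe_with_plugins` and `transcribe_fn` are illustrative names; `transcribe_fn` stands in for the actual Azure Speech call):

```python
# Register plugins once at startup
plugin_manager = PluginManager()
plugin_manager.register_plugin(NoiseReductionPlugin())
plugin_manager.register_plugin(LanguageDetectionPlugin())

def transcribe_with_plugins(file_bytes: bytes, transcribe_fn, settings: Dict) -> str:
    """Run the registered plugin pipeline around a transcription call (sketch)."""
    # Pre-processing plugins may mutate both the audio and the settings
    file_bytes = plugin_manager.apply_pre_processing(file_bytes, settings)
    transcript = transcribe_fn(file_bytes, settings)
    # Post-processing plugins transform the finished transcript text
    return plugin_manager.apply_post_processing(transcript, settings)
```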
---

## 🔧 Troubleshooting

### Common Development Issues

#### Environment Setup Problems

**Issue**: Azure connection fails

```bash
# Check environment variables
python -c "
import os
print('AZURE_SPEECH_KEY:', bool(os.getenv('AZURE_SPEECH_KEY')))
print('AZURE_BLOB_CONNECTION:', bool(os.getenv('AZURE_BLOB_CONNECTION')))
"

# Test Azure connection
python -c "
import os
from azure.storage.blob import BlobServiceClient
client = BlobServiceClient.from_connection_string(os.environ['AZURE_BLOB_CONNECTION'])
print('Containers:', list(client.list_containers()))
"
```

**Issue**: FFmpeg not found

```bash
# Check FFmpeg installation
ffmpeg -version

# Install FFmpeg (Ubuntu/Debian)
sudo apt update && sudo apt install ffmpeg

# Install FFmpeg (Windows with Chocolatey)
choco install ffmpeg

# Install FFmpeg (macOS with Homebrew)
brew install ffmpeg
```

**Issue**: Database initialization fails

```python
# Check database permissions
import os
db_dir = "database"
if not os.path.exists(db_dir):
    os.makedirs(db_dir)
    print(f"Created directory: {db_dir}")

# Test database creation
import sqlite3
conn = sqlite3.connect("database/test.db")
conn.execute("CREATE TABLE test (id INTEGER)")
conn.close()
print("Database test successful")
```

#### Runtime Issues

**Issue**: Memory errors with large files

```python
# Monitor memory usage
import psutil

def check_memory():
    memory = psutil.virtual_memory()
    print(f"Memory usage: {memory.percent}%")
    print(f"Available: {memory.available / 1024**3:.1f}GB")

# Implement file chunking for large uploads
def process_large_file_in_chunks(file_path: str, chunk_size: int = 64*1024*1024):
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

**Issue**: Transcription jobs stuck

```python
# Check pending jobs
def diagnose_stuck_jobs():
    pending_jobs = transcription_manager.db.get_pending_jobs()
    print(f"Pending jobs: {len(pending_jobs)}")

    for job in pending_jobs:
        duration = datetime.now() - datetime.fromisoformat(job.created_at)
        print(f"Job {job.job_id}: {job.status} for {duration}")
        if duration.total_seconds() > 3600:  # 1 hour
            print(f"⚠️ Job {job.job_id} may be stuck")

# Reset stuck jobs
def reset_stuck_jobs():
    with transcription_manager.db.get_connection() as conn:
        conn.execute("""
            UPDATE transcriptions
            SET status = 'pending', azure_trans_id = NULL
            WHERE status = 'processing'
            AND created_at < datetime('now', '-1 hour')
        """)
```

**Issue**: Azure API errors

```python
# Test Azure Speech Service
def test_azure_speech():
    try:
        url = f"{AZURE_SPEECH_KEY_ENDPOINT}/speechtotext/v3.2/transcriptions"
        headers = {"Ocp-Apim-Subscription-Key": AZURE_SPEECH_KEY}
        response = requests.get(url, headers=headers)
        print(f"Status: {response.status_code}")
        print(f"Response: {response.text[:200]}")
    except Exception as e:
        print(f"Azure Speech test failed: {e}")

# Check Azure service status
def check_azure_status():
    # Check the Azure status page
    status_url = "https://status.azure.com/en-us/status"
    print(f"Check Azure status: {status_url}")
```

### Debugging Tools

#### Debug Mode Configuration

```python
# Enable debug mode
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"

if DEBUG:
    logging.basicConfig(level=logging.DEBUG)
    # Enable Gradio debug mode
    demo.launch(debug=True, show_error=True)
```

#### Performance Profiling

```python
import cProfile
import pstats

def profile_function(func):
    """Profile function performance"""
    profiler = cProfile.Profile()

    def wrapper(*args, **kwargs):
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()

        # Print stats
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(10)  # Top 10 functions
        return result
    return wrapper

# Usage
@profile_function
def submit_transcription(self, ...):
    # Function implementation
    pass
```

#### Log Analysis

```python
import re

def analyze_logs(log_file: str = "logs/transcription.log"):
    """Analyze application logs for issues"""
    errors = []
    warnings = []
    performance_issues = []

    with open(log_file, 'r') as f:
        for line in f:
            if 'ERROR' in line:
                errors.append(line.strip())
            elif 'WARNING' in line:
                warnings.append(line.strip())
            elif 'completed in' in line:
                # Extract timing information
                match = re.search(r'completed in (\d+\.\d+)s', line)
                if match and float(match.group(1)) > 30:  # > 30 seconds
                    performance_issues.append(line.strip())

    print(f"Errors: {len(errors)}")
    print(f"Warnings: {len(warnings)}")
    print(f"Performance issues: {len(performance_issues)}")

    return {
        'errors': errors[-10:],                         # Last 10 errors
        'warnings': warnings[-10:],                     # Last 10 warnings
        'performance_issues': performance_issues[-10:]
    }
```

### Production Troubleshooting

#### Service Health Check

```bash
#!/bin/bash
# health_check.sh

echo "=== System Health Check ==="

# Check service status
systemctl is-active transcription
systemctl is-active nginx

# Check disk space
df -h

# Check memory usage
free -h

# Check CPU usage
top -b -n1 | grep "Cpu(s)"

# Check logs for errors
tail -n 50 /home/transcription/app/logs/transcription.log | grep ERROR

# Check Azure connectivity
curl -s -o /dev/null -w "%{http_code}" https://azure.microsoft.com/

echo "=== Health Check Complete ==="
```
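Since the app runs under systemd in the Option 1 deployment, its recent output can also be pulled straight from the journal (unit name matches the service file above):

```bash
# Last 100 log lines from the systemd unit
journalctl -u transcription -n 100 --no-pager

# Follow logs live
journalctl -u transcription -f
```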
#### Database Recovery

```python
def recover_database():
    """Recover database from Azure backup"""
    try:
        # List available backups
        container_client = blob_service.get_container_client(AZURE_CONTAINER)
        backups = []
        for blob in container_client.list_blobs(name_starts_with="shared/backups/"):
            backups.append({
                'name': blob.name,
                'modified': blob.last_modified
            })

        # Sort by date (newest first)
        backups.sort(key=lambda x: x['modified'], reverse=True)

        if not backups:
            print("No backups found")
            return

        # Download latest backup
        latest_backup = backups[0]['name']
        print(f"Restoring from: {latest_backup}")

        blob_client = blob_service.get_blob_client(
            container=AZURE_CONTAINER,
            blob=latest_backup
        )

        # Download backup
        with open("database/transcriptions_restored.db", "wb") as f:
            f.write(blob_client.download_blob().readall())

        print("Database restored successfully")
        print("Restart the application to use the restored database")
    except Exception as e:
        print(f"Database recovery failed: {str(e)}")
```

---

## 📚 Additional Resources

### Documentation Links
- [Azure Speech Services Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/)
- [Azure Blob Storage Documentation](https://docs.microsoft.com/en-us/azure/storage/blobs/)
- [Gradio Documentation](https://gradio.app/docs/)
- [SQLite Documentation](https://www.sqlite.org/docs.html)
- [FFmpeg Documentation](https://ffmpeg.org/documentation.html)

### Useful Tools
- **Azure Storage Explorer**: GUI for managing blob storage
- **DB Browser for SQLite**: Visual database management
- **Postman**: API testing and development
- **Azure CLI**: Command-line Azure management
- **Visual Studio Code**: Recommended IDE with Azure extensions

### Community Resources
- [Azure Speech Services Community](https://docs.microsoft.com/en-us/answers/topics/azure-speech-services.html)
- [Gradio Community](https://github.com/gradio-app/gradio/discussions)
- [Python Audio Processing Libraries](https://github.com/topics/audio-processing)

---

**This developer guide covers setting up, developing, deploying, and maintaining the Azure Speech Transcription service. For additional help, refer to the linked documentation and community resources.** 🚀