# VAE Audio Evaluation

This directory contains the script and resources for evaluating the performance of models in audio reconstruction tasks. The primary script, `eval_compare_matrix.py`, computes a suite of objective metrics to compare the quality of audio generated by the model against the original ground truth audio.

## Features

- **Comprehensive Metrics**: Calculates a wide range of industry-standard and research-grade metrics:
  - **Time-Domain**: Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).
  - **Frequency-Domain**: Multi-Resolution STFT Loss and Multi-Resolution Mel-Spectrogram Loss.
  - **Phase**: Multi-Resolution Phase Coherence (both per-channel and inter-channel for stereo).
  - **Loudness**: Integrated Loudness (LUFS-I), Loudness Range (LRA), and True Peak, analyzed using `ffmpeg`.
- **Batch Processing**: Automatically discovers and processes multiple model output directories.
- **File Matching**: Intelligently pairs reconstructed audio files (e.g., `*_vae_rec.wav`) with their corresponding ground truth files (e.g., `*.wav`).
- **Robust & Resilient**: Handles missing files, audio processing errors, and varying sample rates gracefully.
- **Organized Output**: Saves aggregated results in both machine-readable (`.json`) and human-readable (`.txt`) formats for each model.
- **Command-Line Interface**: Easy-to-use CLI for specifying the input directory and other options.

## Prerequisites

### 1. Python Environment

Ensure you have a Python environment (3.8 or newer recommended) with the required packages installed. You can install them using pip:

```bash
pip install torch torchaudio auraloss numpy
```

### 2. FFmpeg

The script relies on `ffmpeg` for loudness analysis. You must have `ffmpeg` installed and accessible in your system's PATH.

**On Ubuntu/Debian:**

```bash
sudo apt update && sudo apt install ffmpeg
```

**On macOS (using Homebrew):**

```bash
brew install ffmpeg
```

**On Windows:**

Download the executable from the [official FFmpeg website](https://ffmpeg.org/download.html) and add its `bin` directory to your system's PATH environment variable.

**In a Conda environment:**

```bash
conda install -c conda-forge 'ffmpeg<7'
```

You can verify the installation by running:

```bash
ffmpeg -version
```

## Directory Structure

The script expects a specific directory structure for the evaluation data. The root input directory should contain subdirectories, where each subdirectory represents a different model or experiment to be evaluated.

Inside each model's subdirectory, place the pairs of ground truth and reconstructed audio files. The script identifies pairs based on a naming convention (a sketch of how such pairs can be matched and scored programmatically follows the example structure below):

- **Ground Truth**: `your_audio_file.wav`
- **Reconstructed**: `your_audio_file_vae_rec.wav`

Here is an example structure:

```
/path/to/your/evaluation_data/
├── model_A/
│   ├── song1.wav            # Ground Truth 1
│   ├── song1_vae_rec.wav    # Reconstructed 1
│   ├── song2.wav            # Ground Truth 2
│   ├── song2_vae_rec.wav    # Reconstructed 2
│   └── ...
├── model_B/
│   ├── trackA.wav
│   ├── trackA_vae_rec.wav
│   └── ...
└── ...
```
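For reference, the snippet below is a minimal sketch of how files following this naming convention can be paired and scored with SI-SDR using `torchaudio`. It is illustrative only and does not reflect the internals of `eval_compare_matrix.py`; the `find_pairs` and `si_sdr` helpers and the example path are hypothetical.

```python
from pathlib import Path

import torch
import torchaudio


def find_pairs(model_dir: Path):
    """Yield (ground_truth, reconstruction) path pairs per the naming convention above."""
    for rec in sorted(model_dir.glob("*_vae_rec.wav")):
        gt = rec.with_name(rec.name.replace("_vae_rec.wav", ".wav"))
        if gt.exists():
            yield gt, rec


def si_sdr(estimate: torch.Tensor, reference: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-Invariant SDR in dB, computed over the last (time) dimension."""
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    reference = reference - reference.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to obtain the scaled target signal.
    scale = (estimate * reference).sum(-1, keepdim=True) / (reference.pow(2).sum(-1, keepdim=True) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * torch.log10(target.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps))


if __name__ == "__main__":
    model_dir = Path("/path/to/your/evaluation_data/model_A")  # hypothetical example path
    for gt_path, rec_path in find_pairs(model_dir):
        gt, gt_sr = torchaudio.load(str(gt_path))
        rec, rec_sr = torchaudio.load(str(rec_path))
        if gt_sr != rec_sr:
            continue  # the real script handles sample-rate mismatches; skipped here for brevity
        n = min(gt.shape[-1], rec.shape[-1])  # trim both signals to the shorter length
        print(f"{gt_path.name}: SI-SDR = {si_sdr(rec[..., :n], gt[..., :n]).mean().item():.2f} dB")
```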
## Usage

Run the evaluation script from the command line, pointing it to the root directory containing your model outputs:

```bash
python eval_compare_matrix.py --input_dir /path/to/your/evaluation_data/
```

### Command-Line Arguments

- `--input_dir` (required): The path to the root directory containing the model folders (e.g., `/path/to/your/evaluation_data/`).
- `--force` (optional): If specified, the script re-runs the evaluation for all models, even if a results file (`evaluation_results.json`) already exists. By default, it skips models that have already been evaluated.
- `--echo` (optional): If specified, the script prints the detailed evaluation metrics for each individual audio pair during processing. By default, only the progress bar and final summary are shown.

### Example

```bash
python eval/eval_compare_matrix.py --input_dir ./results/
```

## Output

After running, the script will generate two files inside each model's directory:

1. **`evaluation_results.json`**: A JSON file containing the aggregated average of all computed metrics. This is ideal for programmatic analysis (see the sketch after this section).

   ```json
   {
       "model_name": "model_A",
       "file_count": 50,
       "avg_sisdr": 15.78,
       "avg_mel_distance": 0.45,
       "avg_stft_distance": 0.89,
       "avg_per_channel_coherence": 0.95,
       "avg_interchannel_coherence": 0.92,
       "avg_gen_lufs-i": -14.2,
       "avg_gt_lufs-i": -14.0,
       ...
   }
   ```

2. **`evaluation_summary.txt`**: A human-readable text file summarizing the results.

   ```
   model_name: model_A
   file_count: 50
   avg_sisdr: 15.78...
   avg_mel_distance: 0.45...
   ...
   ```

   This allows for quick inspection of a model's performance without needing to parse the JSON.
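Because each model directory receives its own `evaluation_results.json`, results from many models can be collected and compared with a few lines of Python. The snippet below is a minimal sketch assuming the key names shown in the example above; the root path is a placeholder for whatever you pass via `--input_dir`.

```python
import json
from pathlib import Path

root = Path("/path/to/your/evaluation_data")  # same root passed via --input_dir

rows = []
for results_file in sorted(root.glob("*/evaluation_results.json")):
    with results_file.open() as f:
        results = json.load(f)
    rows.append((
        results.get("model_name", results_file.parent.name),
        results.get("file_count"),
        results.get("avg_sisdr"),
        results.get("avg_mel_distance"),
    ))

# Rank models by average SI-SDR (higher is better).
rows.sort(key=lambda r: r[2] if r[2] is not None else float("-inf"), reverse=True)
for name, count, sisdr, mel in rows:
    print(f"{name:20s}  files={count}  avg_sisdr={sisdr}  avg_mel_distance={mel}")
```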