
VAE Audio Evaluation

This directory contains the script and resources for evaluating the performance of models in audio reconstruction tasks. The primary script, eval_compare_matrix.py, computes a suite of objective metrics to compare the quality of audio generated by the model against the original ground truth audio.

Features

  • Comprehensive Metrics: Calculates a wide range of industry-standard and research-grade metrics (a brief computation sketch follows this feature list):
    • Time-Domain: Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).
    • Frequency-Domain: Multi-Resolution STFT Loss and Multi-Resolution Mel-Spectrogram Loss.
    • Phase: Multi-Resolution Phase Coherence (both per-channel and inter-channel for stereo).
    • Loudness: Integrated Loudness (LUFS-I), Loudness Range (LRA), and True Peak, analyzed using ffmpeg.
  • Batch Processing: Automatically discovers and processes multiple model output directories.
  • File Matching: Intelligently pairs reconstructed audio files (e.g., *_vae_rec.wav) with their corresponding ground truth files (e.g., *.wav).
  • Robust & Resilient: Handles missing files, audio processing errors, and varying sample rates gracefully.
  • Organized Output: Saves aggregated results in both machine-readable (.json) and human-readable (.txt) formats for each model.
  • Command-Line Interface: Easy-to-use CLI for specifying the input directory and other options.
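
The metric implementations live in eval_compare_matrix.py. As a rough, self-contained sketch (not the script's actual code), the multi-resolution STFT distance and SI-SDR for a pair of waveforms could be computed like this, assuming auraloss is installed and the tensors are shaped (batch, channels, samples):

    import torch
    import auraloss

    # Hypothetical example tensors: stereo, 48 kHz, 1 second of audio.
    gt = torch.randn(1, 2, 48000)    # ground truth
    rec = torch.randn(1, 2, 48000)   # reconstruction

    # Multi-resolution STFT distance (lower is better).
    mrstft = auraloss.freq.MultiResolutionSTFTLoss()
    stft_distance = mrstft(rec, gt).item()

    # SI-SDR in dB (higher is better), computed per channel.
    def si_sdr(est, ref, eps=1e-8):
        est = est - est.mean(dim=-1, keepdim=True)
        ref = ref - ref.mean(dim=-1, keepdim=True)
        scale = (est * ref).sum(dim=-1, keepdim=True) / ((ref * ref).sum(dim=-1, keepdim=True) + eps)
        target = scale * ref
        noise = est - target
        return 10 * torch.log10((target * target).sum(dim=-1) / ((noise * noise).sum(dim=-1) + eps))

    sisdr_db = si_sdr(rec, gt).mean().item()  # average over channels
    print(f"MR-STFT: {stft_distance:.3f}  SI-SDR: {sisdr_db:.2f} dB")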

Prerequisites

1. Python Environment

Ensure you have a Python environment (3.8 or newer recommended) with the required packages installed. You can install them using pip:

pip install torch torchaudio auraloss numpy

2. FFmpeg

The script relies on ffmpeg for loudness analysis (a measurement sketch follows the installation notes below). You must have ffmpeg installed and accessible in your system's PATH.

On Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

On macOS (using Homebrew):

brew install ffmpeg

On Windows: Download the executable from the official FFmpeg website and add its bin directory to your system's PATH environment variable.

Alternatively, in a Conda environment:

conda install -c conda-forge 'ffmpeg<7'

You can verify the installation by running:

ffmpeg -version
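
For reference, the loudness figures (LUFS-I, LRA, and True Peak) can be measured with ffmpeg's loudnorm filter in analysis mode, which prints a JSON summary to stderr. The snippet below is a rough sketch of that approach and is not necessarily how eval_compare_matrix.py invokes ffmpeg; measure_loudness is a hypothetical helper:

    import json
    import subprocess

    def measure_loudness(path):
        """Measure integrated loudness, loudness range, and true peak of an audio file."""
        cmd = [
            "ffmpeg", "-hide_banner", "-nostats", "-i", path,
            "-af", "loudnorm=print_format=json", "-f", "null", "-",
        ]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # The loudnorm summary is the last {...} block on stderr.
        start, end = proc.stderr.rindex("{"), proc.stderr.rindex("}") + 1
        stats = json.loads(proc.stderr[start:end])
        return {
            "lufs_i": float(stats["input_i"]),
            "lra": float(stats["input_lra"]),
            "true_peak": float(stats["input_tp"]),
        }

    # Example (hypothetical path):
    # print(measure_loudness("song1.wav"))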

Directory Structure

The script expects a specific directory structure for the evaluation data. The root input directory should contain subdirectories, where each subdirectory represents a different model or experiment to be evaluated.

Inside each model's subdirectory, place the pairs of ground truth and reconstructed audio files. The script identifies pairs based on a naming convention (a pairing sketch follows the example tree below):

  • Ground Truth: your_audio_file.wav
  • Reconstructed: your_audio_file_vae_rec.wav

Here is an example structure:

/path/to/your/evaluation_data/
├── model_A/
│   ├── song1.wav           # Ground Truth 1
│   ├── song1_vae_rec.wav   # Reconstructed 1
│   ├── song2.wav           # Ground Truth 2
│   ├── song2_vae_rec.wav   # Reconstructed 2
│   └── ...
├── model_B/
│   ├── trackA.wav
│   ├── trackA_vae_rec.wav
│   └── ...
└── ...
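
To illustrate the pairing rule (not necessarily the script's exact logic), a hypothetical find_pairs helper could match reconstructions to their ground truth like this:

    from pathlib import Path

    def find_pairs(model_dir):
        """Yield (ground_truth, reconstruction) pairs following the *_vae_rec.wav convention."""
        model_dir = Path(model_dir)
        for rec in sorted(model_dir.glob("*_vae_rec.wav")):
            gt = model_dir / (rec.name[: -len("_vae_rec.wav")] + ".wav")
            if gt.exists():
                yield gt, rec

    # Example (hypothetical path):
    # for gt, rec in find_pairs("/path/to/your/evaluation_data/model_A"):
    #     print(gt.name, "<->", rec.name)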

Usage

Run the evaluation script from the command line, pointing it to the root directory containing your model outputs.

python eval_compare_matrix.py --input_dir /path/to/your/evaluation_data/

Command-Line Arguments

  • --input_dir (required): The path to the root directory containing the model folders (e.g., /path/to/your/evaluation_data/).
  • --force (optional): If specified, the script will re-run the evaluation for all models, even if results files (evaluation_results.json) already exist. By default, it skips models that have already been evaluated.
  • --echo (optional): If specified, the script will print the detailed evaluation metrics for each individual audio pair during processing. By default, only the progress bar and final summary are shown.
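
For orientation, a minimal argparse definition matching the options above might look like the sketch below; the actual definitions live in eval_compare_matrix.py and may differ:

    import argparse

    parser = argparse.ArgumentParser(
        description="Compare reconstructed audio against ground truth.")
    parser.add_argument("--input_dir", required=True,
                        help="Root directory containing one subdirectory per model.")
    parser.add_argument("--force", action="store_true",
                        help="Re-run evaluation even if evaluation_results.json exists.")
    parser.add_argument("--echo", action="store_true",
                        help="Print per-file metrics while processing.")
    args = parser.parse_args()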

Example

python eval/eval_compare_matrix.py --input_dir ./results/

Output

After running, the script will generate two files inside each model's directory:

  1. evaluation_results.json: A JSON file containing the aggregated average of all computed metrics. This is ideal for programmatic analysis (see the aggregation sketch at the end of this section).

    {
        "model_name": "model_A",
        "file_count": 50,
        "avg_sisdr": 15.78,
        "avg_mel_distance": 0.45,
        "avg_stft_distance": 0.89,
        "avg_per_channel_coherence": 0.95,
        "avg_interchannel_coherence": 0.92,
        "avg_gen_lufs-i": -14.2,
        "avg_gt_lufs-i": -14.0,
        ...
    }
    
  2. evaluation_summary.txt: A human-readable text file summarizing the results.

    model_name: model_A
    file_count: 50
    avg_sisdr: 15.78...
    avg_mel_distance: 0.45...
    ...
    

This allows for quick inspection of a model's performance without needing to parse the JSON.
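
Because every model folder ends up with its own evaluation_results.json, the per-model averages are also easy to collect across models. The following is a small sketch (assuming the directory layout shown earlier and a hypothetical root path) that ranks models by average SI-SDR:

    import json
    from pathlib import Path

    root = Path("/path/to/your/evaluation_data")  # hypothetical root directory
    results = []
    for result_file in sorted(root.glob("*/evaluation_results.json")):
        with open(result_file) as f:
            results.append(json.load(f))

    # Rank models by average SI-SDR (higher is better).
    for entry in sorted(results, key=lambda r: r["avg_sisdr"], reverse=True):
        print(f'{entry["model_name"]:<20} avg SI-SDR: {entry["avg_sisdr"]:.2f} dB')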