Echolancer-v0.1-zs

This is a TTS model trained on roughly 5-7k hours of private labeled data, finetuned from the base model and conditioned on SpeechBrain ECAPA speaker embeddings. The model has 177M parameters. On a single AMD Instinct MI300X with the ROCm PyTorch Training v25.7 container, fine-tuning for 52k steps (almost one epoch) took a little under 4 hours. It is capable of zero-shot voice cloning from a reference clip.

The training objective was standard next-token prediction on concatenated text-audio tokens.
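As a rough illustration of that objective, the sketch below builds one concatenated text-audio sequence, shifts it by one position to obtain next-token targets, and averages the cross-entropy over positions. It is not taken from the Echolancer code: the special token ids (`bos_id`, `sep_id`, `eos_id`), the helper names, and the choice to score every position (text as well as audio) are all assumptions made for the example.

```python
import math

def build_sequence(text_ids, audio_ids, bos_id, sep_id, eos_id):
    """Concatenate text and audio tokens into a single stream.

    The special tokens and their placement are hypothetical; the real
    model may lay the sequence out differently.
    """
    return [bos_id] + list(text_ids) + [sep_id] + list(audio_ids) + [eos_id]

def shift_for_next_token(seq):
    """Return (inputs, targets) where targets[i] is the token after inputs[i]."""
    return seq[:-1], seq[1:]

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def next_token_loss(all_logits, targets):
    """Mean cross-entropy over all positions of one sequence.

    Assumption: every position contributes to the loss. A real setup
    might mask the text prefix and score only the audio tokens.
    """
    losses = [cross_entropy(l, t) for l, t in zip(all_logits, targets)]
    return sum(losses) / len(losses)
```

In this layout the model reads the text tokens first and then predicts audio tokens one at a time, so synthesis at inference reduces to feeding the text prefix and sampling until the end token.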

Code

For more information, including a Colab notebook, see the repository.


Base model: ZDisket/echolancer-v0.1-base