Echolancer-v0.1-zs
This is a TTS model trained on roughly 5-7k hours of private labeled data, fine-tuned from the base model and conditioned on SpeechBrain ECAPA speaker embeddings. It has 177M parameters. On a single AMD Instinct MI300X with the ROCm PyTorch Training v25.7 container, fine-tuning for 52k steps (almost one epoch) took a little under 4 hours. It is capable of zero-shot voice cloning from a reference clip.
The training objective was standard next-token prediction on concatenated text-audio tokens.
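As a rough illustration of what next-token prediction on a concatenated text-audio stream looks like, here is a minimal pure-Python sketch. It is not the repository's training code. The helper names (`build_sequence`, `next_token_pairs`, `cross_entropy`), the BOS/EOS tokens, and the choice to shift audio token IDs past the text vocabulary are all assumptions made for the example, and the speaker-embedding conditioning is left out entirely.

```python
import math

def build_sequence(text_ids, audio_ids, text_vocab_size, bos_id, eos_id):
    """Concatenate text and audio tokens into a single stream.

    Assumed layout (hypothetical): audio IDs are shifted past the text
    vocabulary so both modalities share one ID space without collisions.
    """
    shifted_audio = [a + text_vocab_size for a in audio_ids]
    return [bos_id] + list(text_ids) + shifted_audio + [eos_id]

def next_token_pairs(seq):
    """Inputs are every token but the last; targets are the same stream shifted by one."""
    return seq[:-1], seq[1:]

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of each target under a softmax over its logit row."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
    return total / len(targets)

# Toy example: 5 text tokens, 3 audio tokens, then BOS=8 and EOS=9 (10 IDs total).
seq = build_sequence([3, 1, 4], [0, 2, 2, 1], text_vocab_size=5, bos_id=8, eos_id=9)
inputs, targets = next_token_pairs(seq)

# Stand-in for model output: uniform logits at every position.
uniform_logits = [[0.0] * 10 for _ in inputs]
loss = cross_entropy(uniform_logits, targets)
```

With uniform logits the loss equals `math.log(10)`, the expected value for an untrained model over a 10-token vocabulary. Whether the actual training run computes the loss over the text positions as well, or masks them, is not stated here; this sketch scores every position.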
Code
For more information, including a Colab notebook, see the repository.
Base model: ZDisket/echolancer-v0.1-base