# Myanmar Text-to-Speech Demo
This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by hpbyte. It's an end-to-end speech synthesis system specifically designed for the Burmese language.
## ⚠️ IMPORTANT: Model Files Required ⚠️

To use this demo, you must upload the following model files to the `trained_model` directory:

- `checkpoint_latest.pth.tar` - the trained model checkpoint
- `hparams.yml` - the hyperparameters configuration file

These files can be obtained from the original repository or by training the model yourself.
## Configuration Instructions

### Setting up the model files

1. Create the directory structure:

   ```bash
   mkdir -p trained_model
   ```

2. Download and place the model files:
   - Download `checkpoint_latest.pth.tar` and `hparams.yml` from the release page
   - Place both files in the `trained_model` directory

3. Verify the directory structure:

   ```
   myanmar-tts-demo/
   ├── app.py
   ├── requirements.txt
   ├── trained_model/
   │   ├── checkpoint_latest.pth.tar
   │   └── hparams.yml
   └── README.md
   ```

4. Environment setup:
   - Make sure the PyTorch version matches the one used for training (check `requirements.txt`)
   - The system requires NVIDIA GPU support for optimal performance
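The file layout above can be sanity-checked with a short script before launching the Space. This is a minimal sketch; the `check_model_files` helper is hypothetical, not part of the project:

```python
from pathlib import Path

# Files this Space expects inside trained_model/
REQUIRED = ["checkpoint_latest.pth.tar", "hparams.yml"]

def check_model_files(model_dir="trained_model"):
    """Return the list of required model files that are missing."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

missing = check_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```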
### Testing your configuration

After setting up the model files, run a quick test to verify everything is working correctly:

```python
from synthesis import synthesize_text

# Test with a simple phrase
test_text = "မင်္ဂလာပါ"
audio_output = synthesize_text(test_text)

# If no errors occur, your configuration is correct
```
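The return type of `synthesize_text` is not documented here. Assuming it yields a sequence of float samples in [-1, 1], the output could be written to disk with only the standard library. A hedged sketch — the 22050 Hz sample rate is an assumption; check `hparams.yml` for the real value:

```python
import struct
import wave

def save_wav(samples, sample_rate, path):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(sample_rate)
        frames = b"".join(
            # clamp to [-1, 1], then scale to the int16 range
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Example (sample rate is an assumption):
# save_wav(audio_output, 22050, "output.wav")
```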
## About the Project
This is an implementation of Tacotron 2 for Myanmar language text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) Burmese TTS, this project is specifically focused on high-quality Burmese speech synthesis using an end-to-end approach.
Key Features:
- End-to-end Burmese text-to-speech synthesis
- Built on the Tacotron 2 architecture
- Custom text processing for the Myanmar language
- Clean and natural-sounding voice output
## How to Use This Demo

### First-time setup

1. Upload the model files as mentioned above to the `trained_model` directory
2. Wait for the Space to rebuild

### Using the demo

1. Enter Burmese text in the input box
2. Click "Submit" to generate speech
3. Listen to the generated audio output
## Examples

Try these example phrases:

- မင်္ဂလာပါ (Hello)
- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါသည် (Welcome to the Myanmar speech system)
- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါသည် (This system can convert Myanmar text to speech)
## Troubleshooting

If you encounter issues:

- **Model files not found**: Make sure you've uploaded the required model files to the `trained_model` directory.
- **App errors**: Try restarting the Space after uploading the model files.
- **Installation issues**: Check the Space logs for specific error messages.
- **GPU not detected**: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
- **Module import errors**: Check that all dependencies are properly installed via the `requirements.txt` file.
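Several of these checks can be automated. A minimal diagnostic sketch — the `diagnose` helper is hypothetical, and the package list should be adjusted to match `requirements.txt`:

```python
import importlib.util
from pathlib import Path

def diagnose(model_dir="trained_model", packages=("torch", "numpy")):
    """Return a list of common setup problems: missing model files
    and missing Python packages."""
    problems = []
    for name in ("checkpoint_latest.pth.tar", "hparams.yml"):
        if not (Path(model_dir) / name).is_file():
            problems.append(f"missing file: {model_dir}/{name}")
    for pkg in packages:
        # find_spec reports whether the package is importable
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"missing package: {pkg}")
    return problems
```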
## Technical Details
This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.
System Architecture:
- Text Frontend: Processes Myanmar Unicode text
- Acoustic Model: Tacotron 2 based sequence-to-sequence model with attention
- Vocoder: WaveRNN or Griffin-Lim algorithm for waveform generation
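To illustrate the vocoder stage, here is a compact, NumPy-only sketch of the Griffin-Lim algorithm named above: it estimates a waveform from a magnitude spectrogram by alternating STFT round-trips to refine the phase. The parameters (`n_fft=1024`, `hop=256`, `n_iter=32`) are illustrative, not this project's actual settings:

```python
import numpy as np

def griffin_lim(magnitude, n_fft=1024, hop=256, n_iter=32):
    """Recover a waveform from a magnitude spectrogram of shape
    (frames, n_fft // 2 + 1) by iterative phase estimation."""
    window = np.hanning(n_fft)

    def stft(x):
        frames = [x[i:i + n_fft] * window
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.array([np.fft.rfft(f) for f in frames])

    def istft(spec):
        # overlap-add synthesis with window-squared normalization
        x = np.zeros((len(spec) - 1) * hop + n_fft)
        norm = np.zeros_like(x)
        for i, frame in enumerate(spec):
            s = i * hop
            x[s:s + n_fft] += np.fft.irfft(frame, n=n_fft) * window
            norm[s:s + n_fft] += window ** 2
        return x / np.maximum(norm, 1e-8)

    # start from random phase, then refine it each iteration
    phase = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase)
        rebuilt = stft(audio)
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase)
```

In the real pipeline the input would first be mapped from the mel scale back to a linear spectrogram; this sketch skips that step for brevity.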
## References
- Original Repository: https://github.com/hpbyte/myanmar-tts
- Paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
## Credits

If you use this model in your research or application, please cite the original repository:

```bibtex
@misc{myanmar-tts,
  author       = {Htet Pyie Sone},
  title        = {Myanmar Text-to-Speech},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}
```