# Myanmar Text-to-Speech Demo

This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by [hpbyte](https://github.com/hpbyte/myanmar-tts). It's an end-to-end speech synthesis system specifically designed for the Burmese language.

## ⚠️ IMPORTANT: Model Files Required ⚠️

To use this demo, you **must** upload the following model files to the `trained_model` directory:

1. **checkpoint_latest.pth.tar** - The trained model checkpoint
2. **hparams.yml** - The hyperparameters configuration file

These files can be obtained from the [original repository](https://github.com/hpbyte/myanmar-tts) or by training the model yourself.

## Configuration Instructions

### Setting up the model files:

1. **Create the directory structure**:
   ```bash
   mkdir -p trained_model
   ```

2. **Download and place model files**:
   - Download `checkpoint_latest.pth.tar` and `hparams.yml` from the [release page](https://github.com/hpbyte/myanmar-tts/releases)
   - Place both files in the `trained_model` directory

3. **Verify directory structure**:
   ```
   myanmar-tts-demo/
   ├── app.py
   ├── requirements.txt
   ├── trained_model/
   │   ├── checkpoint_latest.pth.tar
   │   └── hparams.yml
   └── README.md
   ```

4. **Environment setup**:
   - Make sure the PyTorch version matches the one used for training (check `requirements.txt`)
   - The system requires NVIDIA GPU support for optimal performance

### Testing your configuration:

After setting up the model files, run a quick test to verify everything is working correctly:

```python
from synthesis import synthesize_text

# Test with a simple phrase
test_text = "မင်္ဂလာပါ"
audio_output = synthesize_text(test_text)

# If no errors occur, your configuration is correct
```

## About the Project

This is an implementation of Tacotron 2 for Myanmar language text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) Burmese TTS, this project is specifically focused on high-quality Burmese speech synthesis using an end-to-end approach.

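Returning briefly to the setup described under Configuration Instructions: the sketch below is one way to confirm that both required files are in place before launching the demo. The helper name `missing_model_files` is hypothetical and not part of the original repository; only the directory and file names come from this README.

```python
from pathlib import Path

# File names taken from this README; the helper itself is a hypothetical
# convenience, not part of the original repository.
REQUIRED_FILES = ("checkpoint_latest.pth.tar", "hparams.yml")


def missing_model_files(model_dir="trained_model"):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required model files found.")
```

An empty list means both files were found; any names returned are the ones still to be uploaded.
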
### Key Features:

- End-to-end Burmese text-to-speech synthesis
- Built on the Tacotron 2 architecture
- Custom text processing for the Myanmar language
- Clean and natural-sounding voice output

## How to Use This Demo

1. **First-time setup**:
   - Upload the model files as mentioned above to the `trained_model` directory
   - Wait for the Space to rebuild

2. **Using the demo**:
   - Enter Burmese text in the input box
   - Click "Submit" to generate speech
   - Listen to the generated audio output

## Examples

Try these example phrases:

- မင်္ဂလာပါ (Hello)
- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)

## Troubleshooting

If you encounter issues:

1. **Model files not found**: Make sure you've uploaded the required model files to the `trained_model` directory.
2. **App errors**: Try restarting the Space after uploading the model files.
3. **Installation issues**: Check the Space logs for specific error messages.
4. **GPU not detected**: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
5. **Module import errors**: Check that all dependencies are properly installed via the `requirements.txt` file.

## Technical Details

This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.

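To make the term "mel spectrogram" concrete: the mel scale spaces frequency bands roughly the way human hearing does, finely at low frequencies and coarsely at high ones. The sketch below uses the common HTK-style formula `m = 2595 * log10(1 + f / 700)`. The band count and frequency range are illustrative assumptions, not values read from this project's `hparams.yml`, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels=80, f_min=0.0, f_max=8000.0):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale.

    n_mels, f_min and f_max are illustrative defaults, not this project's settings.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    centers = mel_band_centers()
    # Spacing between neighbouring bands grows with frequency.
    print(f"first gap: {centers[1] - centers[0]:.1f} Hz")
    print(f"last gap:  {centers[-1] - centers[-2]:.1f} Hz")
```

Running the script prints the gap between the two lowest and the two highest band centers. The second is much larger, which is the compression of high frequencies that the mel scale provides.
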
### System Architecture:

- **Text Frontend**: Processes Myanmar Unicode text
- **Acoustic Model**: Tacotron 2 based sequence-to-sequence model with attention
- **Vocoder**: WaveRNN or Griffin-Lim algorithm for waveform generation

## References

- Original Repository: [https://github.com/hpbyte/myanmar-tts](https://github.com/hpbyte/myanmar-tts)
- Paper: [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

## Credits

If you use this model in your research or application, please cite the original repository:

```
@misc{myanmar-tts,
  author = {Htet Pyie Sone},
  title = {Myanmar Text-to-Speech},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}
```