# Myanmar Text-to-Speech Demo

This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by [hpbyte](https://github.com/hpbyte/myanmar-tts). It's an end-to-end speech synthesis system specifically designed for the Burmese language.
## ⚠️ IMPORTANT: Model Files Required ⚠️

To use this demo, you **must** upload the following model files to the `trained_model` directory:

1. **checkpoint_latest.pth.tar** - The trained model checkpoint
2. **hparams.yml** - The hyperparameters configuration file

These files can be obtained from the [original repository](https://github.com/hpbyte/myanmar-tts) or by training the model yourself.
## Configuration Instructions

### Setting up the model files:

1. **Create the directory structure**:

   ```bash
   mkdir -p trained_model
   ```

2. **Download and place model files**:
   - Download `checkpoint_latest.pth.tar` and `hparams.yml` from the [release page](https://github.com/hpbyte/myanmar-tts/releases)
   - Place both files in the `trained_model` directory
3. **Verify directory structure**:

   ```
   myanmar-tts-demo/
   ├── app.py
   ├── requirements.txt
   ├── trained_model/
   │   ├── checkpoint_latest.pth.tar
   │   └── hparams.yml
   └── README.md
   ```
4. **Environment setup**:
   - Make sure the PyTorch version matches the one used for training (check `requirements.txt`)
   - The system requires NVIDIA GPU support for optimal performance
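
The directory check in step 3 can also be scripted. The following is a minimal sketch, not part of the original repository: the helper name `missing_model_files` is hypothetical, and it assumes the script is run from the Space root so that `trained_model` resolves as a relative path.

```python
from pathlib import Path

# File names taken from the setup steps above.
REQUIRED_FILES = ("checkpoint_latest.pth.tar", "hparams.yml")


def missing_model_files(model_dir="trained_model"):
    """Return the required file names that are absent or empty in model_dir.

    Hypothetical helper for illustration; it is not provided by myanmar-tts.
    """
    root = Path(model_dir)
    missing = []
    for name in REQUIRED_FILES:
        path = root / name
        # Treat zero-byte files as missing: a failed upload can leave an empty file.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required model files are present.")
```

Running it before starting the app gives a clearer message than a stack trace from the checkpoint loader.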
### Testing your configuration:

After setting up the model files, run a quick test to verify that everything is working correctly:

```python
from synthesis import synthesize_text

# Test with a simple phrase ("Hello" in Burmese)
test_text = "မင်္ဂလာပါ"
audio_output = synthesize_text(test_text)
# If no errors occur, your configuration is correct
```
## About the Project

This is an implementation of Tacotron 2 for Myanmar language text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) Burmese TTS, this project is specifically focused on high-quality Burmese speech synthesis using an end-to-end approach.

### Key Features:

- End-to-end Burmese text-to-speech synthesis
- Built on the Tacotron 2 architecture
- Custom text processing for the Myanmar language
- Clean and natural-sounding voice output
## How to Use This Demo

1. **First-time setup**:
   - Upload the model files listed above to the `trained_model` directory
   - Wait for the Space to rebuild
2. **Using the demo**:
   - Enter Burmese text in the input box
   - Click "Submit" to generate speech
   - Listen to the generated audio output
## Examples

Try these example phrases:

- မင်္ဂလာပါ (Hello)
- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)
## Troubleshooting

If you encounter issues:

1. **Model files not found**: Make sure you've uploaded the required model files to the `trained_model` directory.
2. **App errors**: Try restarting the Space after uploading the model files.
3. **Installation issues**: Check the Space logs for specific error messages.
4. **GPU not detected**: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
5. **Module import errors**: Check that all dependencies are properly installed via the `requirements.txt` file.
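
Items 4 and 5 can be checked from a Python shell inside the Space. The sketch below is illustrative only: the function names are hypothetical, and the default module list (`torch`, `numpy`, `yaml`) is an assumption, so replace it with the packages actually listed in `requirements.txt`. The only PyTorch call used is `torch.cuda.is_available()`.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def diagnose(modules=("torch", "numpy", "yaml")):
    """Report which modules are importable and whether CUDA is visible.

    The default module list is an assumption, not taken from requirements.txt.
    """
    report = {name: module_available(name) for name in modules}
    report["cuda"] = False
    if report.get("torch"):
        try:
            import torch  # imported lazily so the script still runs without PyTorch
            report["cuda"] = torch.cuda.is_available()
        except ImportError:
            # PyTorch is present on disk but fails to import (broken install).
            report["torch"] = False
    return report


if __name__ == "__main__":
    for key, ok in diagnose().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

A `MISSING` entry for `cuda` while `torch` is `ok` suggests the Space hardware has no GPU attached, not a dependency problem.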
## Technical Details

This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.

### System Architecture:

- **Text Frontend**: Processes Myanmar Unicode text
- **Acoustic Model**: Tacotron 2 based sequence-to-sequence model with attention
- **Vocoder**: WaveRNN or Griffin-Lim algorithm for waveform generation
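
To illustrate what "processes Myanmar Unicode text" can mean before text reaches the acoustic model, here is a minimal input-validation sketch. It is an assumption about a reasonable pre-check, not the text frontend from the original repository: `prepare_text` is a hypothetical name, and only the main Myanmar Unicode block (U+1000 to U+109F) is accepted.

```python
import unicodedata

# Main Myanmar block in Unicode; the extended blocks are ignored in this sketch.
MYANMAR_START, MYANMAR_END = 0x1000, 0x109F


def is_myanmar_char(ch):
    """Return True if `ch` lies in the main Myanmar Unicode block."""
    return MYANMAR_START <= ord(ch) <= MYANMAR_END


def prepare_text(text):
    """Normalize to NFC, collapse whitespace, and reject non-Myanmar characters.

    Hypothetical helper for illustration; the real frontend may behave differently.
    """
    text = unicodedata.normalize("NFC", text)
    text = " ".join(text.split())  # trim and collapse runs of whitespace
    bad = sorted({ch for ch in text if not ch.isspace() and not is_myanmar_char(ch)})
    if bad:
        raise ValueError(f"Unsupported characters: {bad}")
    return text
```

Rejecting Latin letters and digits up front avoids passing symbols to the model that it may never have seen during training. Whether the original frontend transliterates or drops such characters is not stated here.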
## References

- Original Repository: [https://github.com/hpbyte/myanmar-tts](https://github.com/hpbyte/myanmar-tts)
- Paper: [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
## Credits

If you use this model in your research or application, please cite the original repository:

```
@misc{myanmar-tts,
  author = {Htet Pyie Sone},
  title = {Myanmar Text-to-Speech},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}
```