
Myanmar Text-to-Speech Demo

This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by hpbyte. It's an end-to-end speech synthesis system specifically designed for the Burmese language.

⚠️ IMPORTANT: Model Files Required ⚠️

To use this demo, you must upload the following model files to the trained_model directory:

  1. checkpoint_latest.pth.tar - The trained model checkpoint
  2. hparams.yml - The hyperparameters configuration file

These files can be obtained from the original repository or by training the model yourself.

Configuration Instructions

Setting up the model files:

  1. Create the directory structure:

    mkdir -p trained_model
    
  2. Download and place model files:

    • Download checkpoint_latest.pth.tar and hparams.yml from the release page
    • Place both files in the trained_model directory
  3. Verify directory structure:

    myanmar-tts-demo/
    ├── app.py
    ├── requirements.txt
    ├── trained_model/
    │   ├── checkpoint_latest.pth.tar
    │   └── hparams.yml
    └── README.md
    
  4. Environment setup:

    • Make sure the PyTorch version matches the one used for training (check requirements.txt)
    • For best performance, run the system on an NVIDIA GPU
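
The directory layout from step 3 can also be checked from Python before launching the app. The following is a minimal sketch, assuming the two file names listed above and a trained_model directory relative to the working directory. The helper name missing_model_files is hypothetical and not part of the original repository.

```python
from pathlib import Path

# File names taken from the list of required model files above.
REQUIRED_FILES = ("checkpoint_latest.pth.tar", "hparams.yml")


def missing_model_files(model_dir="trained_model"):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required model files are present.")
```

An empty list means both files are in place. This only checks that the files exist, not that the checkpoint is loadable.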

Testing your configuration:

After setting up the model files, run a quick test to verify everything is working correctly:

from synthesis import synthesize_text

# Test with a simple phrase
test_text = "မင်္ဂလာပါ"
audio_output = synthesize_text(test_text)

# If no errors occur, your configuration is correct

About the Project

This is an implementation of Tacotron 2 for Myanmar (Burmese) text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) project, which covers Burmese as one of many languages, this project focuses exclusively on high-quality Burmese speech synthesis using an end-to-end approach.

Key Features:

  • End-to-end Burmese text-to-speech synthesis
  • Built on the Tacotron 2 architecture
  • Custom text processing for the Myanmar language
  • Clean and natural-sounding voice output

How to Use This Demo

  1. First-time setup:

    • Upload the model files as mentioned above to the trained_model directory
    • Wait for the Space to rebuild
  2. Using the demo:

    • Enter Burmese text in the input box
    • Click "Submit" to generate speech
    • Listen to the generated audio output

Examples

Try these example phrases:

  • မင်္ဂလာပါ (Hello)
  • မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
  • ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)

Troubleshooting

If you encounter issues:

  1. Model files not found: Make sure you've uploaded the required model files to the trained_model directory.
  2. App errors: Try restarting the Space after uploading the model files.
  3. Installation issues: Check the Space logs for specific error messages.
  4. GPU not detected: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
  5. Module import errors: Check that all dependencies are properly installed via the requirements.txt file.
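
For items 4 and 5, a quick diagnostic can narrow down whether PyTorch is importable and whether it sees a GPU. This is a minimal sketch. The function name environment_report is hypothetical, and it only uses torch.__version__ and torch.cuda.is_available(), so it says nothing about the other dependencies in requirements.txt.

```python
def environment_report():
    """Report the installed PyTorch version and whether CUDA is usable."""
    report = {"torch_version": None, "cuda_available": False}
    try:
        import torch  # imported lazily so a missing install is reported, not raised
    except ImportError:
        return report
    report["torch_version"] = str(torch.__version__)
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A torch_version of None points to an installation problem (item 5). A reported version with cuda_available set to False points to missing GPU access (item 4). Compare the reported version with the one pinned in requirements.txt.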

Technical Details

This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.

System Architecture:

  • Text Frontend: Processes Myanmar Unicode text
  • Acoustic Model: Tacotron 2 based sequence-to-sequence model with attention
  • Vocoder: WaveRNN or Griffin-Lim algorithm for waveform generation
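
To illustrate the kind of check a text frontend might perform, the sketch below flags characters that fall outside the main Myanmar Unicode block (U+1000 to U+109F). It is an assumption-laden illustration, not the project's actual frontend. The function names are hypothetical, and the Myanmar Extended-A and Extended-B blocks are deliberately ignored.

```python
# Main Myanmar block in Unicode; the extended blocks are not covered here.
MYANMAR_START, MYANMAR_END = 0x1000, 0x109F


def is_myanmar_char(ch):
    """True if ch lies in the main Myanmar Unicode block."""
    return MYANMAR_START <= ord(ch) <= MYANMAR_END


def unsupported_chars(text):
    """Return non-whitespace characters outside the Myanmar block,
    in order of first appearance and without duplicates."""
    seen = []
    for ch in text:
        if ch.isspace() or is_myanmar_char(ch):
            continue
        if ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    print(unsupported_chars("မင်္ဂလာပါ"))      # pure Burmese input
    print(unsupported_chars("မင်္ဂလာပါ TTS"))  # mixed input
```

A check like this could run before synthesis to warn about Latin letters, digits, or punctuation that the model may not have seen during training.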

References

Credits

If you use this model in your research or application, please cite the original repository:

@misc{myanmar-tts,
  author = {Htet Pyie Sone},
  title = {Myanmar Text-to-Speech},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}