# Myanmar Text-to-Speech Demo
This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by [hpbyte](https://github.com/hpbyte/myanmar-tts). It's an end-to-end speech synthesis system specifically designed for the Burmese language.
## ⚠️ IMPORTANT: Model Files Required ⚠️
To use this demo, you **must** upload the following model files to the `trained_model` directory:
1. **checkpoint_latest.pth.tar** - The trained model checkpoint
2. **hparams.yml** - The hyperparameters configuration file
These files can be obtained from the [original repository](https://github.com/hpbyte/myanmar-tts) or by training the model yourself.
## Configuration Instructions
### Setting up the model files:
1. **Create the directory structure**:
```bash
mkdir -p trained_model
```
2. **Download and place model files**:
- Download `checkpoint_latest.pth.tar` and `hparams.yml` from the [release page](https://github.com/hpbyte/myanmar-tts/releases)
- Place both files in the `trained_model` directory
3. **Verify directory structure** (see the check script after this list):
```
myanmar-tts-demo/
├── app.py
├── requirements.txt
├── trained_model/
│   ├── checkpoint_latest.pth.tar
│   └── hparams.yml
└── README.md
```
4. **Environment setup**:
- Make sure the PyTorch version matches the one used for training (check `requirements.txt`)
- An NVIDIA GPU is required for acceptable synthesis speed
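After completing the steps above, a short check script can confirm the layout before the Space starts. This is only a sketch: it assumes PyYAML is installed (a common dependency for reading `hparams.yml`, but check `requirements.txt`).

```python
from pathlib import Path

import yaml  # PyYAML; assumed to be listed in requirements.txt

MODEL_DIR = Path("trained_model")
REQUIRED = ["checkpoint_latest.pth.tar", "hparams.yml"]

# Check that both model files are present
missing = [name for name in REQUIRED if not (MODEL_DIR / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing model files in {MODEL_DIR}/: {missing}")

# Confirm the hyperparameter file parses as valid YAML
with open(MODEL_DIR / "hparams.yml") as f:
    hparams = yaml.safe_load(f)

print("Model files found; hparams keys:", list(hparams.keys()))
```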
### Testing your configuration:
After setting up the model files, run a quick test to verify everything is working correctly:
```python
from synthesis import synthesize_text
# Test with a simple phrase
test_text = "မင်္ဂလာပါ"  # "Hello"
audio_output = synthesize_text(test_text)
# If no errors occur, your configuration is correct
```
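If you also want to listen to the result locally, you can write the output to a WAV file. The sketch below assumes `synthesize_text` returns a `(sample_rate, waveform)` pair and uses the `soundfile` package; the actual return value may differ depending on the repository version, so adjust accordingly.

```python
import soundfile as sf

from synthesis import synthesize_text

# Assumption: synthesize_text returns (sample_rate, waveform) with the
# waveform as a NumPy array; adapt this to the repository's actual API.
sample_rate, waveform = synthesize_text("မင်္ဂလာပါ")
sf.write("test_output.wav", waveform, sample_rate)
```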
## About the Project
This is an implementation of Tacotron 2 for Myanmar language text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) Burmese TTS, this project is specifically focused on high-quality Burmese speech synthesis using an end-to-end approach.
### Key Features:
- End-to-end Burmese text-to-speech synthesis
- Built on the Tacotron 2 architecture
- Custom text processing for the Myanmar language (see the input-check sketch after this list)
- Clean and natural-sounding voice output
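The exact text frontend lives in the upstream repository, but it operates on Myanmar Unicode code points (U+1000-U+109F). A minimal, hypothetical input check along those lines might look like this:

```python
import re

# The Myanmar Unicode block spans U+1000-U+109F (letters, vowel signs, digits)
MYANMAR_RE = re.compile(r"[\u1000-\u109F]")

def looks_like_myanmar(text: str) -> bool:
    """Return True if the string contains at least one Myanmar code point."""
    return bool(MYANMAR_RE.search(text))

print(looks_like_myanmar("မင်္ဂလာပါ"))  # True
print(looks_like_myanmar("hello"))       # False
```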
## How to Use This Demo
1. **First-time setup**:
- Upload the model files as mentioned above to the `trained_model` directory
- Wait for the Space to rebuild
2. **Using the demo**:
- Enter Burmese text in the input box
- Click "Submit" to generate speech
- Listen to the generated audio output
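The interface described above follows the standard Gradio pattern for a Space. A minimal sketch of what `app.py` could contain (illustrative only; the actual `app.py` in this Space may differ):

```python
import gradio as gr

from synthesis import synthesize_text  # same entry point as the test snippet above


def tts(text: str):
    # Assumption: synthesize_text returns audio in a form Gradio's Audio
    # component accepts, e.g. a file path or a (sample_rate, waveform) tuple.
    return synthesize_text(text)


demo = gr.Interface(
    fn=tts,
    inputs=gr.Textbox(label="Burmese text", placeholder="မင်္ဂလာပါ"),
    outputs=gr.Audio(label="Generated speech"),
    title="Myanmar Text-to-Speech",
)

if __name__ == "__main__":
    demo.launch()
```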
## Examples
Try these example phrases:
- မင်္ဂလာပါ (Hello)
- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)
## Troubleshooting
If you encounter issues:
1. **Model files not found**: Make sure you've uploaded the required model files to the `trained_model` directory.
2. **App errors**: Try restarting the Space after uploading the model files.
3. **Installation issues**: Check the Space logs for specific error messages.
4. **GPU not detected**: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
5. **Module import errors**: Check that all dependencies are properly installed via the `requirements.txt` file.
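For points 4 and 5 in particular, a short diagnostic script narrows things down quickly. This sketch assumes only PyTorch plus whichever packages `requirements.txt` pins (the module list below is illustrative):

```python
import importlib

import torch

# 1. GPU check: the demo expects CUDA to be available
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# 2. Import check: extend this list with whatever requirements.txt pins
for module in ["numpy", "yaml", "librosa"]:
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError as exc:
        print(f"{module}: MISSING ({exc})")
```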
## Technical Details
This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.
### System Architecture:
- **Text Frontend**: Processes Myanmar Unicode text
- **Acoustic Model**: Tacotron 2 based sequence-to-sequence model with attention
- **Vocoder**: WaveRNN or Griffin-Lim algorithm for waveform generation
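As an illustration of the vocoding stage, the Griffin-Lim path can be approximated with `librosa`'s mel-spectrogram inversion. This is a generic sketch with made-up parameters, not the settings used by the upstream repository:

```python
import numpy as np
import librosa

# Illustrative audio parameters; the real values come from hparams.yml
sr = 22050
n_fft = 1024
hop_length = 256

# Stand-in mel spectrogram; in the real pipeline this is predicted by Tacotron 2
mel = np.abs(np.random.randn(80, 200)).astype(np.float32)

# Invert the mel spectrogram to a waveform using Griffin-Lim
waveform = librosa.feature.inverse.mel_to_audio(
    mel,
    sr=sr,
    n_fft=n_fft,
    hop_length=hop_length,
    n_iter=60,  # number of Griffin-Lim iterations
)
print(waveform.shape)
```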
## References
- Original Repository: [https://github.com/hpbyte/myanmar-tts](https://github.com/hpbyte/myanmar-tts)
- Paper: [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
## Credits
If you use this model in your research or application, please cite the original repository:
```
@misc{myanmar-tts,
  author       = {Htet Pyie Sone},
  title        = {Myanmar Text-to-Speech},
  year         = {2021},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}
``` |