# Myanmar Text-to-Speech Demo

This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by [hpbyte](https://github.com/hpbyte/myanmar-tts). It's an end-to-end speech synthesis system specifically designed for the Burmese language.

## โš ๏ธ IMPORTANT: Model Files Required โš ๏ธ

To use this demo, you **must** upload the following model files to the `trained_model` directory:

1. **checkpoint_latest.pth.tar** - The trained model checkpoint
2. **hparams.yml** - The hyperparameters configuration file

These files can be obtained from the [original repository](https://github.com/hpbyte/myanmar-tts) or by training the model yourself.

## Configuration Instructions

### Setting up the model files:

1. **Create the directory structure**:
   ```bash
   mkdir -p trained_model
   ```

2. **Download and place model files**:
   - Download `checkpoint_latest.pth.tar` and `hparams.yml` from the [release page](https://github.com/hpbyte/myanmar-tts/releases)
   - Place both files in the `trained_model` directory

3. **Verify directory structure**:
   ```
   myanmar-tts-demo/
   ├── app.py
   ├── requirements.txt
   ├── trained_model/
   │   ├── checkpoint_latest.pth.tar
   │   └── hparams.yml
   └── README.md
   ```

4. **Environment setup**:
   - Make sure the PyTorch version matches the one used for training (check `requirements.txt`)
   - An NVIDIA GPU is required for acceptable synthesis speed
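Before launching the app, you can sanity-check the setup steps above with a small pre-flight script. This is a minimal sketch: the `trained_model` directory and the two file names come from this README, while the function and variable names are illustrative.

```python
import os

# The two files this README says must be present in trained_model/.
REQUIRED_FILES = ["checkpoint_latest.pth.tar", "hparams.yml"]

def missing_model_files(model_dir="trained_model"):
    """Return the required model files that are absent from model_dir."""
    return [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]

if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing from trained_model/:", ", ".join(missing))
    else:
        print("All model files present.")
```

If anything is reported missing, re-check step 2 before restarting the Space.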

### Testing your configuration:

After setting up the model files, run a quick test to verify everything is working correctly:

```python
from synthesis import synthesize_text

# Test with a simple phrase
test_text = "မင်္ဂလာပါ"
audio_output = synthesize_text(test_text)

# If no errors occur, your configuration is correct
```
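The exact return type of `synthesize_text` depends on the repository's code, but if it hands you a sequence of float samples, the standard library alone can write them to a WAV file. The sketch below is self-contained and demonstrated on a generated sine tone standing in for real synthesized audio; the 22,050 Hz rate is a common Tacotron 2 default, not something this README specifies.

```python
import math
import struct
import wave

def save_wav(samples, path, sample_rate=22050):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        ))

# Stand-in for real synthesized audio: 0.5 s of a 440 Hz tone
tone = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(11025)]
save_wav(tone, "output.wav")
```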

## About the Project

This is an implementation of Tacotron 2 for Myanmar language text-to-speech synthesis. Unlike the Burmese voice in Meta's Massively Multilingual Speech (MMS) project, this system is built end-to-end specifically for Burmese, with high-quality synthesis of that single language as its focus.

### Key Features:
- End-to-end Burmese text-to-speech synthesis
- Built on the Tacotron 2 architecture
- Custom text processing for the Myanmar language
- Clean and natural-sounding voice output

## How to Use This Demo

1. **First-time setup**: 
   - Upload the model files as mentioned above to the `trained_model` directory
   - Wait for the Space to rebuild

2. **Using the demo**:
   - Enter Burmese text in the input box
   - Click "Submit" to generate speech
   - Listen to the generated audio output

## Examples

Try these example phrases:
- မင်္ဂလာပါ (Hello)
- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)

## Troubleshooting

If you encounter issues:

1. **Model files not found**: Make sure you've uploaded the required model files to the `trained_model` directory.
2. **App errors**: Try restarting the Space after uploading the model files.
3. **Installation issues**: Check the Space logs for specific error messages.
4. **GPU not detected**: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
5. **Module import errors**: Check that all dependencies are properly installed via the `requirements.txt` file.
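For the GPU issue in particular, a quick check from a Python shell tells you whether PyTorch can see a CUDA device. The helper below is illustrative and degrades gracefully when PyTorch itself is not installed:

```python
def gpu_status():
    """Report whether PyTorch is installed and whether it sees a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch installed, but no CUDA device detected"

print(gpu_status())
```

If this prints anything other than "CUDA available: …", fix the environment before expecting the demo to run at full speed.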

## Technical Details

This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.

### System Architecture:
- **Text Frontend**: Processes Myanmar Unicode text
- **Acoustic Model**: Tacotron 2 based sequence-to-sequence model with attention
- **Vocoder**: WaveRNN or Griffin-Lim algorithm for waveform generation
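To illustrate what a Myanmar text frontend has to handle, the sketch below is a simplified syllable segmenter in the spirit of the well-known `sylbreak` regex approach: it breaks before each base letter unless that letter carries an asat (U+103A) or sits under a stacking virama (U+1039). This is an illustrative approximation, not the project's actual frontend.

```python
import re

# Break before a Myanmar base letter (consonants and independent vowels),
# unless it is stacked under a virama (U+1039) or is closed by an asat
# (U+103A), in which case it belongs to the preceding syllable.
_BOUNDARY = re.compile(
    r"(?<!\u1039)([\u1000-\u102A\u103F\u104E])(?![\u1039\u103A])"
)

def segment(text):
    """Split Myanmar Unicode text into approximate syllables."""
    marked = _BOUNDARY.sub(r"|\1", text)
    return [s.strip() for s in marked.split("|") if s.strip()]

print(segment("မင်္ဂလာပါ"))  # → ['မင်္ဂ', 'လာ', 'ပါ']
```

Note how the kinzi cluster in မင်္ဂ stays in one piece because the virama blocks the boundary before ဂ.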

## References

- Original Repository: [https://github.com/hpbyte/myanmar-tts](https://github.com/hpbyte/myanmar-tts)
- Paper: [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

## Credits

If you use this model in your research or application, please cite the original repository:

```
@misc{myanmar-tts,
  author = {Htet Pyie Sone},
  title = {Myanmar Text-to-Speech},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
}
```