	# Myanmar Text-to-Speech Demo

	This Hugging Face Space demonstrates the Myanmar Text-to-Speech (TTS) system developed by [hpbyte](https://github.com/hpbyte/myanmar-tts). It's an end-to-end speech synthesis system specifically designed for the Burmese language.

	## ⚠️ IMPORTANT: Model Files Required ⚠️

	To use this demo, you must upload the following model files to the `trained_model` directory:

	1. `checkpoint_latest.pth.tar` - The trained model checkpoint
	2. `hparams.yml` - The hyperparameters configuration file

	These files can be obtained from the [original repository](https://github.com/hpbyte/myanmar-tts) or by training the model yourself.

	## Configuration Instructions

	### Setting up the model files:

	1. Create the directory structure:
	```bash
	mkdir -p trained_model
	```

	2. Download and place model files:
	- Download `checkpoint_latest.pth.tar` and `hparams.yml` from the [release page](https://github.com/hpbyte/myanmar-tts/releases)
	- Place both files in the `trained_model` directory

	3. Verify directory structure:
	```
	myanmar-tts-demo/
	├── app.py
	├── requirements.txt
	├── trained_model/
	│ ├── checkpoint_latest.pth.tar
	│ └── hparams.yml
	└── README.md
	```

	4. Environment setup:
	- Make sure the installed PyTorch version matches the one used for training (check `requirements.txt`)
	- An NVIDIA GPU is recommended for best performance
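
The setup steps above can be checked with a short script before launching the app. This is a minimal sketch, not part of the original repository: the helper names `missing_model_files` and `describe_torch` are hypothetical, and the script assumes it is run from the Space root so that `trained_model` resolves correctly.

```python
from pathlib import Path

# File names are taken from the setup steps; the helper names are illustrative.
MODEL_DIR = Path("trained_model")
REQUIRED_FILES = ("checkpoint_latest.pth.tar", "hparams.yml")


def missing_model_files(model_dir=MODEL_DIR):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


def describe_torch():
    """Report the installed PyTorch version and whether CUDA is usable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so neither a version nor CUDA is available.
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("Missing model files:", missing_model_files() or "none")
    print("PyTorch:", describe_torch())
```

An empty list of missing files and a PyTorch version that matches `requirements.txt` suggest the setup is complete. The script does not load the checkpoint, so it cannot confirm that the file itself is valid.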

	### Testing your configuration:

	After setting up the model files, run a quick test to verify everything is working correctly:

	```python
	from synthesis import synthesize_text

	# Test with a simple phrase
	test_text = "မင်္ဂလာပါ"
	audio_output = synthesize_text(test_text)

	# If no errors occur, your configuration is correct
	```

	## About the Project

	This is an implementation of Tacotron 2 for Myanmar (Burmese) text-to-speech synthesis. Unlike Meta's Massively Multilingual Speech (MMS) models, which cover Burmese as one of many languages, this project focuses solely on high-quality Burmese speech synthesis using an end-to-end approach.

	### Key Features:
	- End-to-end Burmese text-to-speech synthesis
	- Built on the Tacotron 2 architecture
	- Custom text processing for the Myanmar language
	- Clean and natural-sounding voice output

	## How to Use This Demo

	1. First-time setup:
	- Upload the model files described above to the `trained_model` directory
	- Wait for the Space to rebuild

	2. Using the demo:
	- Enter Burmese text in the input box
	- Click "Submit" to generate speech
	- Listen to the generated audio output

	## Examples

	Try these example phrases:
	- မင်္ဂလာပါ (Hello)
	- မြန်မာစကားပြောစနစ်ကို ကြိုဆိုပါတယ် (Welcome to the Myanmar speech system)
	- ဒီစနစ်ဟာ မြန်မာစာကို အသံအဖြစ် ပြောင်းပေးနိုင်ပါတယ် (This system can convert Myanmar text to speech)

	## Troubleshooting

	If you encounter issues:

	1. Model files not found: Make sure you've uploaded the required model files to the `trained_model` directory.
	2. App errors: Try restarting the Space after uploading the model files.
	3. Installation issues: Check the Space logs for specific error messages.
	4. GPU not detected: This model requires GPU acceleration. Ensure your Hugging Face Space has GPU access enabled.
	5. Module import errors: Check that all dependencies are properly installed via the `requirements.txt` file.
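
For module import errors, it can help to list which packages named in `requirements.txt` are not installed. The following is a rough sketch using only the standard library; the function names are hypothetical, and the parser handles only simple specifiers such as `name==1.0` or `name>=1.0`, not URLs or editable installs.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        print("Not installed:", missing_packages(requirement_names(handle)))
```

This only checks that a distribution with that name is present. It does not compare installed versions against the pinned ones, so a version mismatch still has to be confirmed in the Space logs.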

	## Technical Details

	This model uses the Tacotron 2 architecture to generate mel spectrograms from text, which are then converted to waveforms using a vocoder. The model implementation is based on the work by hpbyte.

	### System Architecture:
	- Text Frontend: Processes Myanmar Unicode text
	- Acoustic Model: Tacotron 2 based sequence-to-sequence model with attention
	- Vocoder: WaveRNN or Griffin-Lim algorithm for waveform generation
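
To illustrate what a text frontend does, the sketch below maps Myanmar Unicode text to a sequence of integer IDs that an acoustic model could consume. It is a hypothetical example and not the frontend used by the original repository: the function name `text_to_ids`, the ID layout, and the choice to drop characters outside the basic Myanmar block (U+1000 to U+109F) are all assumptions made for illustration.

```python
import unicodedata

# Basic Myanmar Unicode block; the extended blocks are ignored in this sketch.
MYANMAR_START, MYANMAR_END = 0x1000, 0x109F

# Hypothetical ID layout: 0 is reserved for padding, 1 for whitespace,
# and Myanmar code points start at 2.
PAD_ID = 0
SPACE_ID = 1
OFFSET = 2


def text_to_ids(text):
    """Convert text to a list of integer IDs, dropping unsupported characters."""
    normalized = unicodedata.normalize("NFC", text)
    ids = []
    for ch in normalized:
        code_point = ord(ch)
        if MYANMAR_START <= code_point <= MYANMAR_END:
            ids.append(code_point - MYANMAR_START + OFFSET)
        elif ch.isspace():
            ids.append(SPACE_ID)
        # Anything else (Latin letters, punctuation, digits) is skipped.
    return ids


if __name__ == "__main__":
    print(text_to_ids("မင်္ဂလာပါ"))
```

A production frontend would typically also handle numerals, punctuation, and syllable or phoneme segmentation. This sketch covers only normalization and symbol lookup.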

	## References

	- Original Repository: [https://github.com/hpbyte/myanmar-tts](https://github.com/hpbyte/myanmar-tts)
	- Paper: [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

	## Credits

	If you use this model in your research or application, please cite the original repository:

	```
	@misc{myanmar-tts,
	  author = {Htet Pyie Sone},
	  title = {Myanmar Text-to-Speech},
	  year = {2021},
	  publisher = {GitHub},
	  journal = {GitHub repository},
	  howpublished = {\url{https://github.com/hpbyte/myanmar-tts}}
	}
	```