---
language:
- cy
- en
license: cc0-1.0
library_name: piper-tts
tags:
- text-to-speech
- tts
- welsh
- cymraeg
- audio
- onnx
- piper
- accessibility
- assistive-technology
- screen-reader
datasets:
- techiaith/bu-tts-cy-en
model-index:
- name: cy_GB-bu_tts
  results: []
---
# cy_GB-bu_tts - Welsh Neural Text-to-Speech
This is a Welsh (Cymraeg) neural text-to-speech model trained using [Piper](https://github.com/rhasspy/piper), a fast, local neural TTS system optimized for Raspberry Pi and other low-end devices.
- **Developed by:** Uned Technolegau Iaith (Language Technologies Unit), Bangor University
- **Model type:** Neural TTS (VITS-based architecture)
- **Language:** Welsh (cy_GB)
- **License:** CC0-1.0
- **Format:** ONNX
## Model Details
- **Architecture:** Based on Piper's VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Speakers:** Multi-speaker model with 3 speaker variants
- **Quality:** Medium quality (suitable for screen readers and assistive technology)
- **Model Size:** Approximately 77 MB
- **Inference Speed:** Optimized for real-time synthesis on CPU
- **Sample Rate:** 22050 Hz
- **Training Framework:** [Piper training pipeline](https://github.com/rhasspy/piper)
## Training Data
This model was trained on the [bu-tts-cy-en dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) (Bangor University Text to Speech Welsh-English dataset).
**Dataset characteristics:**
- **Size:** 10,000-100,000 samples
- **Languages:** Welsh and English (bilingual dataset)
- **License:** CC0 1.0 (Public Domain)
- **Content:** Audio recordings with corresponding text transcriptions
- **Source:** Language Technologies Unit, Bangor University
**Training data limitations:**
- Dataset consists of freely available recordings (public domain audiobooks and research-quality recordings)
- Coverage is not comprehensive across all Welsh vocabulary and contexts
- Some pronunciation patterns may be influenced by the limited speaker diversity in the training data
- Quality improvements would be possible with larger, more diverse, professionally-recorded datasets
## Intended Use
**Primary use cases:**
- Screen readers and assistive technology (particularly [NVDA integration](https://github.com/techiaith/nvda-addon))
- Accessibility tools for Welsh speakers with visual impairments
- Welsh language learning applications
- Local, offline Welsh TTS applications
- Research in Welsh speech synthesis
**Supported platforms:**
- Compatible with Piper TTS runtime
- Works with [Sonata TTS engine](https://github.com/mush42/sonata)
- ONNX Runtime on x86/x64 architectures
- Raspberry Pi and other resource-constrained devices
## Usage
### With Piper
```bash
# Download model files
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx.json
# Run synthesis
echo "Bore da, sut wyt ti?" | piper \
  --model cy_GB-bu_tts.onnx \
  --output_file output.wav
```
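Because this is a multi-speaker model, it can help to choose a speaker variant explicitly. The sketch below wraps the command above in Python and adds Piper's `--speaker` option. The helper names `build_piper_command` and `synthesize` are hypothetical, and the speaker ids (assumed here to be 0, 1 and 2) should be checked against `speaker_id_map` in `cy_GB-bu_tts.onnx.json`.

```python
import subprocess


def build_piper_command(model_path, output_path, speaker=None):
    """Build the argument list for the piper CLI (hypothetical helper)."""
    cmd = ["piper", "--model", model_path, "--output_file", output_path]
    if speaker is not None:
        # Assumption: speaker ids are small integers listed in the voice config.
        cmd += ["--speaker", str(speaker)]
    return cmd


def synthesize(text, model_path, output_path, speaker=None):
    """Pipe text to piper on stdin, as the shell example does with echo."""
    subprocess.run(
        build_piper_command(model_path, output_path, speaker),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if piper exits with an error
    )


if __name__ == "__main__":
    # Requires the piper executable on PATH and the model files in the
    # current directory.
    synthesize("Bore da, sut wyt ti?", "cy_GB-bu_tts.onnx", "output.wav", speaker=1)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact arguments before running Piper.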
### With NVDA Screen Reader
Install the [techiaith Welsh Neural Voices addon for NVDA](https://github.com/techiaith/nvda-addon):
1. Download the addon from the [releases page](https://github.com/techiaith/nvda-addon/releases/latest)
2. Install and restart NVDA
3. The voices download automatically on first run (approximately 77 MB)
4. Select "Uned Technolegau Iaith - Welsh Neural Voices" in NVDA's speech settings
### With Python (ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np
import json
import wave
# Load model
session = ort.InferenceSession("cy_GB-bu_tts.onnx")
# Load config
with open("cy_GB-bu_tts.onnx.json") as f:
    config = json.load(f)
# For complete implementation, refer to:
# https://github.com/rhasspy/piper/blob/master/src/python_run/piper/voice.py
```
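The snippet above stops once the model and config are loaded. As a companion, here is a minimal sketch of the final step, writing audio to a WAV file at the model's 22050 Hz sample rate using only the standard library. It assumes the synthesized audio is already available as mono float samples in the range [-1.0, 1.0]. The `write_wav` helper is hypothetical and is not part of Piper or ONNX Runtime, and the sine tone merely stands in for real model output.

```python
import io
import math
import sys
import wave
from array import array


def write_wav(samples, sample_rate, dest):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    dest may be a file path or a writable, seekable binary file object.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = array("h", (int(s * 32767) for s in clipped))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


# Placeholder audio: one second of a 440 Hz tone instead of model output.
sample_rate = 22050
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
buffer = io.BytesIO()
write_wav(tone, sample_rate, buffer)
```

In a real pipeline the sample rate would be read from the voice config rather than hard-coded.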
### With Sonata Engine
```python
from sonata import tts_engine
engine = tts_engine.TTSEngine()
engine.load_voice("cy_GB-bu_tts.onnx")
# Synthesize speech
audio = engine.synthesize("Bore da!")
engine.save_audio(audio, "output.wav")
```
## Sample Audio
Listen to voice samples at: [Piper Welsh samples](https://rhasspy.github.io/piper-samples/)
## Limitations
- **Pronunciation:** May exhibit incorrect or unusual pronunciation for some words, particularly:
  - Technical terms and neologisms
  - Place names not represented in training data
  - Words with ambiguous pronunciation rules
- **Audio Quality:** Medium quality - suitable for assistive technology but not studio-grade
- **Domain Coverage:** Best performance on general conversational text; may struggle with specialized domains
- **Expressivity:** Limited emotional range (neutral/informative tone)
- **Platform:** Optimized for CPU inference on x86/x64; ARM64 Windows not supported
- **Language Mixing:** Although the model was trained on bilingual data, it gives the best results on Welsh-only text
## Performance
- **Real-time Factor:** < 1.0 on modern CPUs (faster than real-time synthesis)
- **Latency:** Low latency suitable for interactive applications
- **Memory Usage:** ~100 MB RAM during inference
- **Supported Platforms:** Windows 10/11 (x86/x64), Linux (x86/x64), Raspberry Pi
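
The real-time factor is the time spent synthesizing divided by the duration of the audio produced, so a value below 1.0 means synthesis is faster than playback. A minimal sketch of the calculation follows. The function name is hypothetical and the default sample rate is the 22050 Hz listed under Model Details.

```python
def real_time_factor(synthesis_seconds, num_samples, sample_rate=22050):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Illustrative numbers only, not a measurement of this model:
# 0.5 s of compute for 44100 samples (2 s of audio at 22050 Hz).
rtf = real_time_factor(0.5, 44100)
print(f"RTF = {rtf:.2f}")
```

To measure it in practice, wrap the synthesis call in `time.perf_counter()` readings and pass the elapsed time together with the number of samples returned.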
## Model Files
This repository contains:
- `cy_GB-bu_tts.onnx` - The neural TTS model in ONNX format
- `cy_GB-bu_tts.onnx.json` - Model configuration file (phoneme mapping, sample rate, etc.)
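
To see what the configuration file provides, the sketch below extracts a few commonly used fields. It assumes the usual Piper voice config layout (`audio.sample_rate`, `num_speakers`, `speaker_id_map`, `phoneme_id_map`). The `summarize_voice_config` helper is hypothetical, and the example dictionary is made up for illustration rather than copied from this model's config.

```python
import json


def summarize_voice_config(config):
    """Pull a few commonly used fields out of a Piper-style voice config."""
    audio = config.get("audio", {})
    speaker_map = config.get("speaker_id_map", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "num_speakers": config.get("num_speakers", 1),
        # Speaker names ordered by their numeric id.
        "speakers": [name for name, _ in
                     sorted(speaker_map.items(), key=lambda kv: kv[1])],
        "num_phonemes": len(config.get("phoneme_id_map", {})),
    }


# Made-up fragment; the speaker names and phoneme ids are placeholders.
example_config = json.loads("""
{
  "audio": {"sample_rate": 22050},
  "num_speakers": 3,
  "speaker_id_map": {"speaker_b": 1, "speaker_a": 0, "speaker_c": 2},
  "phoneme_id_map": {"_": [0], "^": [1], "$": [2]}
}
""")
print(summarize_voice_config(example_config))
```

For the real file, replace the inline JSON with `json.load(open("cy_GB-bu_tts.onnx.json", encoding="utf-8"))`.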
## Citation
If you use this model, please cite:
```bibtex
@misc{cy_GB_bu_tts_2025,
author = {{Language Technologies Unit, Bangor University}},
title = {cy\_GB-bu\_tts: Welsh Neural Text-to-Speech Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/techiaith/cy_GB-bu_tts}}
}
@dataset{bu_tts_cy_en_2025,
author = {{Language Technologies Unit, Bangor University}},
title = {Bangor University Text to Speech Welsh-English Dataset},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/techiaith/bu-tts-cy-en}}
}
@misc{piper_tts,
author = {{Rhasspy Community}},
title = {Piper: A fast, local neural text to speech system},
year = {2023},
publisher = {GitHub},
howpublished = {\url{https://github.com/rhasspy/piper}},
note = {Now maintained at \url{https://github.com/OHF-Voice/piper1-gpl}}
}
```
## Acknowledgments
This work builds upon contributions from the wider open-source TTS community:
- **Piper TTS** and the **Rhasspy community** for developing the training framework and TTS architecture that make high-quality, local neural TTS accessible
- **Musharraf Omer** for creating [Sonata TTS engine](https://github.com/mush42/sonata) and the [Sonata-NVDA addon](https://github.com/mush42/sonata-nvda), which enables seamless integration with screen readers
- Contributors to the Welsh language TTS training data
- The broader open-source speech synthesis community for advancing accessible voice technology
## License
This model is released under **CC0-1.0 (Public Domain)**. You are free to use, modify, and distribute this model for any purpose without restriction.
The training code (Piper) is licensed under the MIT License.
## Contact & Support
- **Organization:** Uned Technolegau Iaith / Language Technologies Unit, Bangor University
- **Issues:** Report issues at [GitHub Issues](https://github.com/techiaith/nvda-addon/issues)
- **Project Page:** [NVDA Welsh Neural Voices](https://github.com/techiaith/nvda-addon)
## Version History
- **2025.11.0 (Beta):** Initial public release with 3 speaker variants, medium quality
## Related Resources
- [NVDA Welsh Neural Voices Addon](https://github.com/techiaith/nvda-addon) - Screen reader integration
- [Piper TTS](https://github.com/rhasspy/piper) - Training and inference framework
- [Sonata Engine](https://github.com/mush42/sonata) - Cross-platform TTS engine
- [Training Dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) - Welsh-English TTS corpus
---
*This model was developed to support Welsh language accessibility and to preserve and promote the Welsh language through modern speech technology.*