---
language:
- cy
- en
license: cc0-1.0
library_name: piper-tts
tags:
- text-to-speech
- tts
- welsh
- cymraeg
- audio
- onnx
- piper
- accessibility
- assistive-technology
- screen-reader
datasets:
- techiaith/bu-tts-cy-en
model-index:
- name: cy_GB-bu_tts
  results: []
---

# cy_GB-bu_tts - Welsh Neural Text-to-Speech

This is a Welsh (Cymraeg) neural text-to-speech model trained using [Piper](https://github.com/rhasspy/piper), a fast, local neural TTS system optimized for Raspberry Pi and other low-end devices.

- **Developed by:** Uned Technolegau Iaith (Language Technologies Unit), Bangor University
- **Model type:** Neural TTS (VITS-based architecture)
- **Language:** Welsh (cy_GB)
- **License:** CC0-1.0
- **Format:** ONNX

## Model Details

- **Architecture:** Based on Piper's VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Speakers:** Multi-speaker model with 3 speaker variants
- **Quality:** Medium quality (suitable for screen readers and assistive technology)
- **Model Size:** Approximately 77 MB
- **Inference Speed:** Optimized for real-time synthesis on CPU
- **Sample Rate:** 22050 Hz
- **Training Framework:** [Piper training pipeline](https://github.com/rhasspy/piper)
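
The speaker count, sample rate, and default synthesis settings listed above live in the `.onnx.json` config file that ships with the model. Below is a minimal sketch for inspecting them; the key names (`audio.sample_rate`, `num_speakers`, `inference`) follow Piper's usual exported config layout and should be treated as assumptions rather than guarantees about this particular file:

```python
import json

# Read the voice configuration that ships alongside the ONNX model.
# Key names follow Piper's exported config format; adjust if this file differs.
with open("cy_GB-bu_tts.onnx.json", encoding="utf-8") as f:
    config = json.load(f)

sample_rate = config.get("audio", {}).get("sample_rate")  # expected: 22050
num_speakers = config.get("num_speakers")                 # expected: 3
inference = config.get("inference", {})                   # default noise/length scales

print(f"sample rate: {sample_rate} Hz, speakers: {num_speakers}")
print(f"default inference settings: {inference}")
```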

## Training Data

This model was trained on the [bu-tts-cy-en dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) (Bangor University Text to Speech Welsh-English dataset).

**Dataset characteristics:**
- **Size:** 10,000-100,000 samples
- **Languages:** Welsh and English (bilingual dataset)
- **License:** CC0 1.0 (Public Domain)
- **Content:** Audio recordings with corresponding text transcriptions
- **Source:** Language Technologies Unit, Bangor University

**Training data limitations:**
- Dataset consists of freely available recordings (public domain audiobooks and research-quality recordings)
- Coverage is not comprehensive across all Welsh vocabulary and contexts
- Some pronunciation patterns may be influenced by the limited speaker diversity in the training data
- Quality improvements would be possible with larger, more diverse, professionally-recorded datasets

## Intended Use

**Primary use cases:**
- Screen readers and assistive technology (particularly [NVDA integration](https://github.com/techiaith/nvda-addon))
- Accessibility tools for Welsh speakers with visual impairments
- Welsh language learning applications
- Local, offline Welsh TTS applications
- Research in Welsh speech synthesis

**Supported platforms:**
- Compatible with Piper TTS runtime
- Works with [Sonata TTS engine](https://github.com/mush42/sonata)
- ONNX Runtime on x86/x64 architectures
- Raspberry Pi and other resource-constrained devices

## Usage

### With Piper

```bash
# Download model files
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx.json

# Run synthesis
echo "Bore da, sut wyt ti?" | piper \
  --model cy_GB-bu_tts.onnx \
  --output_file output.wav
```

### With NVDA Screen Reader

Install the [techiaith Welsh Neural Voices addon for NVDA](https://github.com/techiaith/nvda-addon):

1. Download the addon from the [releases page](https://github.com/techiaith/nvda-addon/releases/latest)
2. Install and restart NVDA
3. Voices will download automatically on first run (77 MB)
4. Select "Uned Technolegau Iaith - Welsh Neural Voices" in NVDA's speech settings

### With Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
import json
import wave

# Load model
session = ort.InferenceSession("cy_GB-bu_tts.onnx")

# Load config
with open("cy_GB-bu_tts.onnx.json") as f:
    config = json.load(f)

# For complete implementation, refer to:
# https://github.com/rhasspy/piper/blob/master/src/python_run/piper/voice.py
```
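
To go one step further than loading the model, the sketch below runs a single forward pass and writes a WAV file. The input tensor names (`input`, `input_lengths`, `scales`, `sid`), the placeholder phoneme IDs, and the 16-bit PCM conversion are assumptions based on how Piper's ONNX exports are typically driven; real synthesis requires proper phonemization (text → espeak-ng phonemes → IDs via the config's `phoneme_id_map`), which the `voice.py` reference linked above handles.

```python
import json
import wave

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("cy_GB-bu_tts.onnx")
with open("cy_GB-bu_tts.onnx.json", encoding="utf-8") as f:
    config = json.load(f)

# Assume `phoneme_ids` has already been produced from Welsh text (e.g. with
# piper-phonemize and the config's phoneme_id_map); a short dummy sequence is
# used here only so the script runs end to end.
phoneme_ids = [1, 0, 14, 0, 27, 0, 2]  # placeholder IDs, not real Welsh

inference = config.get("inference", {})
inputs = {
    "input": np.array([phoneme_ids], dtype=np.int64),
    "input_lengths": np.array([len(phoneme_ids)], dtype=np.int64),
    # noise_scale, length_scale, noise_w - defaults taken from the config
    "scales": np.array(
        [
            inference.get("noise_scale", 0.667),
            inference.get("length_scale", 1.0),
            inference.get("noise_w", 0.8),
        ],
        dtype=np.float32,
    ),
    # Multi-speaker model: choose one of the 3 speaker variants (0, 1 or 2).
    "sid": np.array([0], dtype=np.int64),
}

audio = session.run(None, inputs)[0].squeeze()  # float32 waveform in [-1, 1]
pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

with wave.open("output.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit PCM
    wav.setframerate(config.get("audio", {}).get("sample_rate", 22050))
    wav.writeframes(pcm.tobytes())
```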

### With Sonata Engine

```python
from sonata import tts_engine

engine = tts_engine.TTSEngine()
engine.load_voice("cy_GB-bu_tts.onnx")

# Synthesize speech
audio = engine.synthesize("Bore da!")
engine.save_audio(audio, "output.wav")
```

## Sample Audio

Listen to voice samples at: [Piper Welsh samples](https://rhasspy.github.io/piper-samples/)

## Limitations

- **Pronunciation:** May exhibit incorrect or unusual pronunciation for some words, particularly:
  - Technical terms and neologisms
  - Place names not represented in training data
  - Words with ambiguous pronunciation rules
- **Audio Quality:** Medium quality - suitable for assistive technology but not studio-grade
- **Domain Coverage:** Best performance on general conversational text; may struggle with specialized domains
- **Expressivity:** Limited emotional range (neutral/informative tone)
- **Platform:** Optimized for CPU inference on x86/x64; ARM64 Windows not supported
- **Language Mixing:** While trained on bilingual data, best results when using pure Welsh text

## Performance

- **Real-time Factor:** < 1.0 on modern CPUs (faster than real-time synthesis)
- **Latency:** Low latency suitable for interactive applications
- **Memory Usage:** ~100 MB RAM during inference
- **Supported Platforms:** Windows 10/11 (x86/x64), Linux (x86/x64), Raspberry Pi

## Model Files

This repository contains:
- `cy_GB-bu_tts.onnx` - The neural TTS model in ONNX format
- `cy_GB-bu_tts.onnx.json` - Model configuration file (phoneme mapping, sample rate, etc.)
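
Both files can also be fetched programmatically. A minimal sketch using `huggingface_hub` (one option among several; the repository and file names are taken from this page, and downloads are cached locally):

```python
from huggingface_hub import hf_hub_download

# Download the model and its config from this repository.
model_path = hf_hub_download("techiaith/cy_GB-bu_tts", "cy_GB-bu_tts.onnx")
config_path = hf_hub_download("techiaith/cy_GB-bu_tts", "cy_GB-bu_tts.onnx.json")

print(model_path, config_path, sep="\n")
```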

## Citation

If you use this model, please cite:

```bibtex
@misc{cy_GB_bu_tts_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {cy\_GB-bu\_tts: Welsh Neural Text-to-Speech Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/techiaith/cy_GB-bu_tts}}
}

@dataset{bu_tts_cy_en_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {Bangor University Text to Speech Welsh-English Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/techiaith/bu-tts-cy-en}}
}

@misc{piper_tts,
  author = {{Rhasspy Community}},
  title = {Piper: A fast, local neural text to speech system},
  year = {2023},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rhasspy/piper}},
  note = {Now maintained at \url{https://github.com/OHF-Voice/piper1-gpl}}
}
```

## Acknowledgments

This work builds upon contributions from the wider open-source TTS community:

- **Piper TTS** and the **Rhasspy community** for developing the training framework and TTS architecture that makes high-quality, local neural TTS accessible
- **Musharraf Omer** for creating [Sonata TTS engine](https://github.com/mush42/sonata) and the [Sonata-NVDA addon](https://github.com/mush42/sonata-nvda), which enables seamless integration with screen readers
- Contributors to the Welsh language TTS training data
- The broader open-source speech synthesis community for advancing accessible voice technology

## License

This model is released under **CC0-1.0 (Public Domain)**. You are free to use, modify, and distribute this model for any purpose without restriction.

The training code (Piper) is licensed under MIT License.

## Contact & Support

- **Organization:** Uned Technolegau Iaith / Language Technologies Unit, Bangor University
- **Issues:** Report issues at [GitHub Issues](https://github.com/techiaith/nvda-addon/issues)
- **Project Page:** [NVDA Welsh Neural Voices](https://github.com/techiaith/nvda-addon)

## Version History

- **2025.11.0 (Beta):** Initial public release with 3 speaker variants, medium quality

## Related Resources

- [NVDA Welsh Neural Voices Addon](https://github.com/techiaith/nvda-addon) - Screen reader integration
- [Piper TTS](https://github.com/rhasspy/piper) - Training and inference framework
- [Sonata Engine](https://github.com/mush42/sonata) - Cross-platform TTS engine
- [Training Dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) - Welsh-English TTS corpus

---

*This model was developed to support Welsh language accessibility and to preserve and promote the Welsh language through modern speech technology.*