---
language:
- cy
- en
license: cc0-1.0
library_name: piper-tts
tags:
- text-to-speech
- tts
- welsh
- cymraeg
- audio
- onnx
- piper
- accessibility
- assistive-technology
- screen-reader
datasets:
- techiaith/bu-tts-cy-en
model-index:
- name: cy_GB-bu_tts
  results: []
---

# cy_GB-bu_tts - Welsh Neural Text-to-Speech

This is a Welsh (Cymraeg) neural text-to-speech model trained using [Piper](https://github.com/rhasspy/piper), a fast, local neural TTS system optimized for Raspberry Pi and other low-end devices.

- **Developed by:** Uned Technolegau Iaith (Language Technologies Unit), Bangor University
- **Model type:** Neural TTS (VITS-based architecture)
- **Language:** Welsh (cy_GB)
- **License:** CC0-1.0
- **Format:** ONNX

## Model Details

- **Architecture:** Based on Piper's VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- **Speakers:** Multi-speaker model with 3 speaker variants
- **Quality:** Medium quality (suitable for screen readers and assistive technology)
- **Model Size:** Approximately 77 MB
- **Inference Speed:** Optimized for real-time synthesis on CPU
- **Sample Rate:** 22050 Hz
- **Training Framework:** [Piper training pipeline](https://github.com/rhasspy/piper)
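
The speaker count, sample rate, and default synthesis settings listed above live in the `.onnx.json` config file that ships with the model. Below is a minimal sketch for inspecting them; the key names (`audio.sample_rate`, `num_speakers`, `inference`) follow Piper's usual exported config layout and should be treated as assumptions rather than guarantees about this particular file:

```python
import json

# Read the voice configuration that ships alongside the ONNX model.
# Key names follow Piper's exported config format; adjust if this file differs.
with open("cy_GB-bu_tts.onnx.json", encoding="utf-8") as f:
    config = json.load(f)

sample_rate = config.get("audio", {}).get("sample_rate")  # expected: 22050
num_speakers = config.get("num_speakers")                 # expected: 3
inference = config.get("inference", {})                   # default noise/length scales

print(f"sample rate: {sample_rate} Hz, speakers: {num_speakers}")
print(f"default inference settings: {inference}")
```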

## Training Data

This model was trained on the [bu-tts-cy-en dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) (Bangor University Text to Speech Welsh-English dataset).

**Dataset characteristics:**
- **Size:** 10,000-100,000 samples
- **Languages:** Welsh and English (bilingual dataset)
- **License:** CC0 1.0 (Public Domain)
- **Content:** Audio recordings with corresponding text transcriptions
- **Source:** Language Technologies Unit, Bangor University

**Training data limitations:**
- Dataset consists of freely available recordings (public domain audiobooks and research-quality recordings)
- Coverage is not comprehensive across all Welsh vocabulary and contexts
- Some pronunciation patterns may be influenced by the limited speaker diversity in the training data
- Quality improvements would be possible with larger, more diverse, professionally-recorded datasets

## Intended Use

**Primary use cases:**
- Screen readers and assistive technology (particularly [NVDA integration](https://github.com/techiaith/nvda-addon))
- Accessibility tools for Welsh speakers with visual impairments
- Welsh language learning applications
- Local, offline Welsh TTS applications
- Research in Welsh speech synthesis

**Supported platforms:**
- Compatible with Piper TTS runtime
- Works with [Sonata TTS engine](https://github.com/mush42/sonata)
- ONNX Runtime on x86/x64 architectures
- Raspberry Pi and other resource-constrained devices

## Usage

### With Piper

```bash
# Download model files
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx
wget https://huggingface.co/techiaith/cy_GB-bu_tts/resolve/main/cy_GB-bu_tts.onnx.json

# Run synthesis
echo "Bore da, sut wyt ti?" | piper \
  --model cy_GB-bu_tts.onnx \
  --output_file output.wav
```

### With NVDA Screen Reader

Install the [techiaith Welsh Neural Voices addon for NVDA](https://github.com/techiaith/nvda-addon):

1. Download the addon from the [releases page](https://github.com/techiaith/nvda-addon/releases/latest)
2. Install and restart NVDA
3. Voices will download automatically on first run (77 MB)
4. Select "Uned Technolegau Iaith - Welsh Neural Voices" in NVDA's speech settings

### With Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
import json
import wave

# Load model
session = ort.InferenceSession("cy_GB-bu_tts.onnx")

# Load config
with open("cy_GB-bu_tts.onnx.json") as f:
    config = json.load(f)

# For complete implementation, refer to:
# https://github.com/rhasspy/piper/blob/master/src/python_run/piper/voice.py
```
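
To go one step further than loading the model, the sketch below runs a single forward pass and writes a WAV file. The input tensor names (`input`, `input_lengths`, `scales`, `sid`), the placeholder phoneme IDs, and the 16-bit PCM conversion are assumptions based on how Piper's ONNX exports are typically driven; real synthesis requires proper phonemization (text → espeak-ng phonemes → IDs via the config's `phoneme_id_map`), which the `voice.py` reference linked above handles.

```python
import json
import wave

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("cy_GB-bu_tts.onnx")
with open("cy_GB-bu_tts.onnx.json", encoding="utf-8") as f:
    config = json.load(f)

# Assume `phoneme_ids` has already been produced from Welsh text (e.g. with
# piper-phonemize and the config's phoneme_id_map); a short dummy sequence is
# used here only so the script runs end to end.
phoneme_ids = [1, 0, 14, 0, 27, 0, 2]  # placeholder IDs, not real Welsh

inference = config.get("inference", {})
inputs = {
    "input": np.array([phoneme_ids], dtype=np.int64),
    "input_lengths": np.array([len(phoneme_ids)], dtype=np.int64),
    # noise_scale, length_scale, noise_w - defaults taken from the config
    "scales": np.array(
        [
            inference.get("noise_scale", 0.667),
            inference.get("length_scale", 1.0),
            inference.get("noise_w", 0.8),
        ],
        dtype=np.float32,
    ),
    # Multi-speaker model: choose one of the 3 speaker variants (0, 1 or 2).
    "sid": np.array([0], dtype=np.int64),
}

audio = session.run(None, inputs)[0].squeeze()  # float32 waveform in [-1, 1]
pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

with wave.open("output.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit PCM
    wav.setframerate(config.get("audio", {}).get("sample_rate", 22050))
    wav.writeframes(pcm.tobytes())
```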

### With Sonata Engine

```python
from sonata import tts_engine

engine = tts_engine.TTSEngine()
engine.load_voice("cy_GB-bu_tts.onnx")

# Synthesize speech
audio = engine.synthesize("Bore da!")
engine.save_audio(audio, "output.wav")
```

## Sample Audio

Listen to voice samples at: [Piper Welsh samples](https://rhasspy.github.io/piper-samples/)

## Limitations

- **Pronunciation:** May exhibit incorrect or unusual pronunciation for some words, particularly:
  - Technical terms and neologisms
  - Place names not represented in training data
  - Words with ambiguous pronunciation rules
- **Audio Quality:** Medium quality - suitable for assistive technology but not studio-grade
- **Domain Coverage:** Best performance on general conversational text; may struggle with specialized domains
- **Expressivity:** Limited emotional range (neutral/informative tone)
- **Platform:** Optimized for CPU inference on x86/x64; ARM64 Windows not supported
- **Language Mixing:** While trained on bilingual data, best results when using pure Welsh text

## Performance

- **Real-time Factor:** < 1.0 on modern CPUs (faster than real-time synthesis)
- **Latency:** Low latency suitable for interactive applications
- **Memory Usage:** ~100 MB RAM during inference
- **Supported Platforms:** Windows 10/11 (x86/x64), Linux (x86/x64), Raspberry Pi

## Model Files

This repository contains:
- `cy_GB-bu_tts.onnx` - The neural TTS model in ONNX format
- `cy_GB-bu_tts.onnx.json` - Model configuration file (phoneme mapping, sample rate, etc.)
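
Both files can also be fetched programmatically. A minimal sketch using `huggingface_hub` (one option among several; the repository and file names are taken from this page, and downloads are cached locally):

```python
from huggingface_hub import hf_hub_download

# Download the model and its config from this repository.
model_path = hf_hub_download("techiaith/cy_GB-bu_tts", "cy_GB-bu_tts.onnx")
config_path = hf_hub_download("techiaith/cy_GB-bu_tts", "cy_GB-bu_tts.onnx.json")

print(model_path, config_path, sep="\n")
```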

## Citation

If you use this model, please cite:

```bibtex
@misc{cy_GB_bu_tts_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {cy\_GB-bu\_tts: Welsh Neural Text-to-Speech Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/techiaith/cy_GB-bu_tts}}
}

@dataset{bu_tts_cy_en_2025,
  author = {{Language Technologies Unit, Bangor University}},
  title = {Bangor University Text to Speech Welsh-English Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/techiaith/bu-tts-cy-en}}
}

@misc{piper_tts,
  author = {{Rhasspy Community}},
  title = {Piper: A fast, local neural text to speech system},
  year = {2023},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rhasspy/piper}},
  note = {Now maintained at \url{https://github.com/OHF-Voice/piper1-gpl}}
}
```

## Acknowledgments

This work builds upon contributions from the wider open-source TTS community:

- **Piper TTS** and the **Rhasspy community** for developing the training framework and TTS architecture that makes high-quality, local neural TTS accessible
- **Musharraf Omer** for creating [Sonata TTS engine](https://github.com/mush42/sonata) and the [Sonata-NVDA addon](https://github.com/mush42/sonata-nvda), which enables seamless integration with screen readers
- Contributors to the Welsh language TTS training data
- The broader open-source speech synthesis community for advancing accessible voice technology

## License

This model is released under **CC0-1.0 (Public Domain)**. You are free to use, modify, and distribute this model for any purpose without restriction.

The training code (Piper) is licensed under MIT License.

## Contact & Support

- **Organization:** Uned Technolegau Iaith / Language Technologies Unit, Bangor University
- **Issues:** Report issues at [GitHub Issues](https://github.com/techiaith/nvda-addon/issues)
- **Project Page:** [NVDA Welsh Neural Voices](https://github.com/techiaith/nvda-addon)

## Version History

- **2025.11.0 (Beta):** Initial public release with 3 speaker variants, medium quality

## Related Resources

- [NVDA Welsh Neural Voices Addon](https://github.com/techiaith/nvda-addon) - Screen reader integration
- [Piper TTS](https://github.com/rhasspy/piper) - Training and inference framework
- [Sonata Engine](https://github.com/mush42/sonata) - Cross-platform TTS engine
- [Training Dataset](https://huggingface.co/datasets/techiaith/bu-tts-cy-en) - Welsh-English TTS corpus

---

*This model was developed to support Welsh language accessibility and to preserve and promote the Welsh language through modern speech technology.*