---
license: cc-by-4.0
language:
  - en
tags:
  - audio
  - automatic-speech-recognition
  - speech
  - onnx
  - parakeet
  - nvidia
  - transducer
pipeline_tag: automatic-speech-recognition
---

# Parakeet-TDT 1.1B ONNX

Pre-exported ONNX models of NVIDIA's Parakeet-TDT 1.1B, ready to run with ONNX Runtime. No Docker or NeMo export step is required.

## Model Description

- **Base Model:** [nvidia/parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b)
- **Architecture:** Token-and-Duration Transducer (TDT) with a FastConformer encoder
- **Parameters:** 1.1 billion
- **Features:** 80 mel filterbank features (16 kHz)
- **Vocabulary:** 1024 SentencePiece tokens + blank token
- **Format:** ONNX Runtime compatible (CPU/GPU); see the interface check below
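
Since input names and layouts vary between ONNX exports, a quick way to confirm the interface (for example, the 80-dimensional feature input) is to inspect the graph directly. A minimal sketch:

```python
import onnxruntime as ort

# Print each graph input's name, shape, and element type.
sess = ort.InferenceSession("models/encoder.int8.onnx")
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)
# One input should carry 80 in its feature dimension, matching the
# description above; symbolic dimensions print as strings.
```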

## What's Included

### Quantized Models (Recommended, 3.8 GB total)

- `encoder.int8.onnx` (1.1 GB): INT8-quantized encoder with external weights
- `encoder.int8.weights` (2.0 GB): INT8 encoder weight file
- `decoder.int8.onnx` (7.0 MB): INT8-quantized decoder
- `joiner.int8.onnx` (1.7 MB): INT8-quantized joiner
- `tokens.txt` (11 KB): vocabulary file

### Full Precision Models (Optional, 8.1 GB additional)

- `encoder.onnx` (41 MB) + `encoder.weights` (4.0 GB): FP32 encoder
- `decoder.onnx` (28 MB): FP32 decoder
- `joiner.onnx` (6.6 MB): FP32 joiner
- 639 per-layer weight files (various sizes)

## Quick Start

### Download Models

```bash
# Download all models
huggingface-cli download YOUR_USERNAME/parakeet-tdt-1.1b-onnx --local-dir ./models

# Or download only the INT8 quantized files (recommended)
huggingface-cli download YOUR_USERNAME/parakeet-tdt-1.1b-onnx \
  --include "encoder.int8.*" "decoder.int8.onnx" "joiner.int8.onnx" "tokens.txt" \
  --local-dir ./models
```
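
The same selective download also works from Python via `huggingface_hub` (a minimal sketch; `YOUR_USERNAME` is the repo-owner placeholder from above):

```python
from huggingface_hub import snapshot_download

# Fetch only the INT8 models plus the vocabulary file.
snapshot_download(
    repo_id="YOUR_USERNAME/parakeet-tdt-1.1b-onnx",
    local_dir="./models",
    allow_patterns=[
        "encoder.int8.*",
        "decoder.int8.onnx",
        "joiner.int8.onnx",
        "tokens.txt",
    ],
)
```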

### Use with ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np

# Load models
encoder_session = ort.InferenceSession("models/encoder.int8.onnx")
decoder_session = ort.InferenceSession("models/decoder.int8.onnx")
joiner_session = ort.InferenceSession("models/joiner.int8.onnx")

# Load vocabulary (the first field on each line is the token)
with open("models/tokens.txt") as f:
    vocab = [line.split()[0] for line in f]

# Inference (simplified example)
# ... (add mel feature extraction; see the sketch below)
encoder_out = encoder_session.run(None, {"audio_signal": mel_features})[0]
# ... (add decoding loop with decoder and joiner; see the greedy-decode sketch below)
```
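
For the missing feature-extraction step, one option is `librosa`. This sketch assumes a typical NeMo-style front end (80 mel bins, 25 ms window, 10 ms hop at 16 kHz, log compression) and a `(batch, features, time)` input layout; the export's exact preprocessing and normalization may differ, so treat it as a starting point rather than the verified pipeline:

```python
import librosa
import numpy as np

def compute_mel_features(wav_path: str) -> np.ndarray:
    # Load audio at the model's expected 16 kHz sample rate.
    audio, sr = librosa.load(wav_path, sr=16000)
    # 80 mel bins, 25 ms window (400 samples), 10 ms hop (160 samples).
    # These are assumed NeMo-style settings, not read from the export.
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=512, win_length=400, hop_length=160, n_mels=80
    )
    log_mel = np.log(mel + 1e-9).astype(np.float32)
    # Add a batch dimension: (1, n_mels, time).
    return log_mel[np.newaxis, :, :]

mel_features = compute_mel_features("test.wav")
```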

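The decoding loop depends on the tensor names baked into this export; `decoder_session.get_inputs()` and `joiner_session.get_outputs()` will list the real ones. As an illustration only, here is the shape of a greedy transducer decode with hypothetical names (`targets`, `encoder_out`, `decoder_out`), an assumed blank id of 1024 (the description lists 1024 tokens plus blank), and the TDT duration head ignored, i.e. a plain RNNT-style loop:

```python
def greedy_decode(encoder_out, blank_id=1024):
    # Assumes encoder_out has shape (1, T, D); verify against the export.
    tokens = []
    prev = np.array([[blank_id]], dtype=np.int64)  # seed with blank
    dec_out = decoder_session.run(None, {"targets": prev})[0]
    for t in range(encoder_out.shape[1]):
        enc_t = encoder_out[:, t : t + 1, :]
        logits = joiner_session.run(
            None, {"encoder_out": enc_t, "decoder_out": dec_out}
        )[0].squeeze()
        # Restrict the argmax to token logits in case the TDT export
        # concatenates duration logits after the vocabulary.
        tok = int(logits[: len(vocab)].argmax())
        if tok != blank_id:
            tokens.append(tok)
            prev = np.array([[tok]], dtype=np.int64)
            dec_out = decoder_session.run(None, {"targets": prev})[0]
    # SentencePiece marks word boundaries with "▁".
    return "".join(vocab[t] for t in tokens).replace("▁", " ").strip()

print(greedy_decode(encoder_out))
```
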
### Use with Rust

See the Swictation STT implementation for a complete Rust example using these models.

## Verification

These models have been verified to produce identical outputs to the original NeMo export (a sketch for reproducing the checks follows the list):

- ✅ Correct transcriptions on test audio
- ✅ MD5 checksums match the original export
- ✅ Metadata verified (`feat_dim: 80`, `vocab_size: 1024`)
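
Reproducing these checks requires only the standard library and ONNX Runtime. The metadata keys below (`feat_dim`, `vocab_size`) are assumed to live in the encoder's custom metadata map, as the check above suggests:

```python
import hashlib
import onnxruntime as ort

def md5sum(path: str, chunk: int = 1 << 20) -> str:
    # Stream the file so multi-GB weights don't need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(md5sum("models/encoder.int8.onnx"))

# Read the embedded model metadata.
sess = ort.InferenceSession("models/encoder.int8.onnx")
meta = sess.get_modelmeta().custom_metadata_map
print(meta.get("feat_dim"), meta.get("vocab_size"))
```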

## Export Details

**Export script:** available in the Swictation repository

**Export method:**

```bash
docker run --rm -v $(pwd):/workspace -w /workspace \
  nvcr.io/nvidia/nemo:25.07 \
  bash -c "pip install onnxruntime && python3 export_parakeet_tdt_1.1b.py"
```

## Performance

- **WER:** 1.39% on LibriSpeech test-clean (reported for the base model)
- **Speed:** 64% faster than the RNNT baseline
- **Inference:** real-time on modern CPUs, faster on GPU (see the quick benchmark below)
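
To sanity-check the real-time claim on your own hardware, a rough real-time-factor (RTF) measurement over dummy input looks like this; the `(1, 80, T)` layout and 10 ms frame rate are the same assumptions as in the feature sketch above:

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("models/encoder.int8.onnx")

# 10 s of audio at a 10 ms hop = 1000 frames of 80 mel features (assumed layout).
dummy = np.random.randn(1, 80, 1000).astype(np.float32)

start = time.perf_counter()
sess.run(None, {"audio_signal": dummy})
elapsed = time.perf_counter() - start

# RTF below 1.0 means the encoder runs faster than real time.
print(f"encoder RTF: {elapsed / 10.0:.3f}")
```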

## License

Same as the original NVIDIA Parakeet-TDT 1.1B model (CC-BY-4.0).

## Citation

```bibtex
@misc{parakeet-tdt-1.1b-onnx,
  author = {Your Name},
  title = {Parakeet-TDT 1.1B ONNX Export},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/parakeet-tdt-1.1b-onnx}},
  note = {ONNX export of NVIDIA's Parakeet-TDT 1.1B model}
}
```

Original model:

```bibtex
@misc{nvidia-parakeet-tdt,
  author = {NVIDIA},
  title = {Parakeet-TDT 1.1B},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/nvidia/parakeet-tdt-1.1b}}
}
```