Parakeet-TDT 1.1B ONNX

Pre-exported ONNX models of NVIDIA's Parakeet-TDT 1.1B for immediate use without requiring Docker/NeMo export.

Model Description

  • Base Model: nvidia/parakeet-tdt-1.1b
  • Architecture: Token-and-Duration Transducer (TDT) with FastConformer encoder
  • Parameters: 1.1 billion
  • Features: 80 mel filterbank features (16kHz)
  • Vocabulary: 1024 SentencePiece tokens + blank token
  • Format: ONNX Runtime compatible (CPU/GPU)

What's Included

Quantized Models (Recommended - 3.8GB total)

  • encoder.int8.onnx (1.1GB) - INT8 quantized encoder with external weights
  • encoder.int8.weights (2.0GB) - INT8 encoder weight file
  • decoder.int8.onnx (7.0MB) - INT8 quantized decoder
  • joiner.int8.onnx (1.7MB) - INT8 quantized joiner
  • tokens.txt (11KB) - Vocabulary file

Full Precision Models (Optional - 8.1GB additional)

  • encoder.onnx (41MB) + encoder.weights (4.0GB) - FP32 encoder
  • decoder.onnx (28MB) - FP32 decoder
  • joiner.onnx (6.6MB) - FP32 joiner
  • 639 layer weight files (various sizes)

Quick Start

Download Models

# Download all models (12GB)
huggingface-cli download jenerallee78/parakeet-tdt-1.1b-onnx --local-dir ./models

# Or download just INT8 quantized (3.8GB - recommended)
huggingface-cli download jenerallee78/parakeet-tdt-1.1b-onnx \
  --include "encoder.int8.*" "decoder.int8.onnx" "joiner.int8.onnx" "tokens.txt" \
  --local-dir ./models
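
If you prefer to stay in Python, the same filtered download can be done with the huggingface_hub library; the allow_patterns list below mirrors the --include filter of the CLI example above (a minimal sketch, with ./models as an example target directory):

from huggingface_hub import snapshot_download

# Fetch only the INT8 files plus the vocabulary, mirroring the CLI example above
snapshot_download(
    repo_id="jenerallee78/parakeet-tdt-1.1b-onnx",
    allow_patterns=["encoder.int8.*", "decoder.int8.onnx", "joiner.int8.onnx", "tokens.txt"],
    local_dir="./models",
)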

Use with ONNX Runtime (Python)

import onnxruntime as ort
import numpy as np

# Load models
encoder_session = ort.InferenceSession("models/encoder.int8.onnx")
decoder_session = ort.InferenceSession("models/decoder.int8.onnx")  
joiner_session = ort.InferenceSession("models/joiner.int8.onnx")

# Load vocabulary
with open("models/tokens.txt") as f:
    vocab = [line.split()[0] for line in f]

# Inference (simplified example)
# ... (add mel feature extraction)
encoder_out = encoder_session.run(None, {"audio_signal": mel_features})[0]
# ... (add decoding loop with decoder and joiner)
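
The snippet above omits feature extraction and the transducer decoding loop. The sketch below fills both gaps under explicit assumptions: 80-bin log-mel features computed with librosa (any frontend works, but the settings must match NeMo's preprocessor for good accuracy), assumed tensor names (audio_signal, length, targets, target_length, encoder_out, decoder_out) that should be confirmed with session.get_inputs()/get_outputs(), and a simplified greedy loop that does not thread the decoder's LSTM state and ignores the TDT duration head. It is a starting point, not the reference decoder; see the Rust implementation linked below for the full loop.

import librosa
import numpy as np

# --- Feature extraction (assumes librosa; 80-bin log-mel at 16 kHz, 25 ms window / 10 ms hop) ---
audio, _ = librosa.load("test.wav", sr=16000)
mel = librosa.feature.melspectrogram(
    y=audio, sr=16000, n_fft=512, win_length=400, hop_length=160, n_mels=80
)
log_mel = np.log(mel + 1e-10)
# "per_feature" normalization from the metadata: zero mean / unit variance per mel bin
log_mel = (log_mel - log_mel.mean(axis=1, keepdims=True)) / (log_mel.std(axis=1, keepdims=True) + 1e-5)

mel_features = log_mel[np.newaxis].astype(np.float32)          # assumed (batch, 80, frames) layout
feat_lens = np.array([mel_features.shape[2]], dtype=np.int64)  # assumed int64 length input

# --- Encoder (input/output names are assumptions; verify with encoder_session.get_inputs()) ---
encoder_out, encoder_out_lens = encoder_session.run(
    None, {"audio_signal": mel_features, "length": feat_lens}
)

# --- Greedy transducer decoding (simplified sketch) ---
# Simplifications: the decoder's LSTM state is not carried between steps and the TDT
# duration head is ignored, so at most one token is emitted per encoder frame.
blank_id = len(vocab) - 1   # assumption: blank is the last entry of tokens.txt
hyp = []
for t in range(int(encoder_out_lens[0])):
    enc_frame = encoder_out[:, :, t:t + 1]                     # assumed (batch, dim, time) layout
    last_tok = np.array([[hyp[-1] if hyp else blank_id]], dtype=np.int64)
    dec_out = decoder_session.run(
        None, {"targets": last_tok, "target_length": np.array([1], dtype=np.int64)}
    )[0]
    logits = joiner_session.run(
        None, {"encoder_out": enc_frame, "decoder_out": dec_out}
    )[0].reshape(-1)
    tok = int(np.argmax(logits[: len(vocab)]))                 # duration logits (if present) ignored
    if tok != blank_id:
        hyp.append(tok)

print("".join(vocab[i] for i in hyp).replace("▁", " ").strip())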

Use with Rust

See the Swictation STT implementation for a complete Rust example using these models.

Verification

These models have been verified to produce identical outputs to the original NeMo export:

  • βœ… Correct transcriptions on test audio
  • βœ… MD5 checksums match the original export (see the checksum snippet below)
  • βœ… Metadata verified (feat_dim: 80, vocab_size: 1024)
  • βœ… Tested with both short (6s) and long (84s) audio
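
To re-run the checksum comparison yourself, a minimal sketch using Python's hashlib (the resulting hashes are compared by hand against those of the original NeMo export):

import hashlib
from pathlib import Path

# Print the MD5 of every downloaded file so it can be compared against the original export.
for path in sorted(Path("models").iterdir()):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    print(f"{md5.hexdigest()}  {path.name}")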

Test Results

  • Short audio: 100% accuracy - "hello world testing one two three"
  • Long audio: 95% accuracy - Complex technical article (375 tokens from 8410 frames)

Export Details

Export Script: available in the Swictation repository

Export Method:

docker run --rm -v $(pwd):/workspace -w /workspace \
  nvcr.io/nvidia/nemo:25.07 \
  bash -c "pip install onnxruntime && python3 export_parakeet_tdt_1.1b.py"

Performance

  • WER: 1.39% (LibriSpeech test-clean, from base model)
  • Speed: 64% faster than RNNT baseline
  • Inference: Real-time on modern CPUs, faster on GPU
  • Blank Rate: 28.6% (excellent for continuous speech)

Technical Details

Model Architecture

  • Encoder: 42-layer FastConformer (1024-dim)
  • Decoder: 2-layer LSTM (640-dim hidden)
  • Joiner: Feed-forward network
  • Subsampling: 8x (8410 mel frames β†’ 1051 encoder frames)

Metadata

{
  "vocab_size": 1024,
  "normalize_type": "per_feature",
  "pred_rnn_layers": 2,
  "pred_hidden": 640,
  "subsampling_factor": 8,
  "model_type": "EncDecRNNTBPEModel",
  "feat_dim": 80
}
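
Assuming this metadata is stored in the ONNX files' custom metadata map, it can be read back at runtime, which is a quick way to sanity-check a download (a minimal sketch):

import onnxruntime as ort

sess = ort.InferenceSession("models/encoder.int8.onnx")
meta = sess.get_modelmeta().custom_metadata_map
print(meta.get("vocab_size"), meta.get("feat_dim"), meta.get("subsampling_factor"))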

License

Same as the original NVIDIA Parakeet-TDT 1.1B model (CC-BY-4.0).

Citation

@misc{parakeet-tdt-1.1b-onnx,
  author = {Robert Lee},
  title = {Parakeet-TDT 1.1B ONNX Export},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jenerallee78/parakeet-tdt-1.1b-onnx}},
  note = {Verified ONNX export of NVIDIA's Parakeet-TDT 1.1B model}
}

Original model:

@misc{nvidia-parakeet-tdt,
  author = {NVIDIA},
  title = {Parakeet-TDT 1.1B},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/nvidia/parakeet-tdt-1.1b}}
}

Credits

  • Base Model: NVIDIA Research
  • ONNX Export: Swictation Project
  • Verification: Extensive testing against a Rust ONNX Runtime implementation