Breeze-ASR-25 CoreML

This model is a CoreML conversion of MediaTek-Research_Breeze-ASR-25, an automatic speech recognition (ASR) model. The conversion makes it compatible with WhisperKit, enabling efficient ASR inference on Apple Silicon devices.

Model Description

Breeze-ASR-25 is an automatic speech recognition model developed by MediaTek Research. This repository packages it as compiled CoreML models that WhisperKit loads to run transcription entirely on the device.

Model Components

This repository contains three CoreML models:

AudioEncoder.mlmodelc - Encodes mel spectrogram features into audio embeddings
MelSpectrogram.mlmodelc - Converts the raw audio waveform into a mel spectrogram
TextDecoder.mlmodelc - Decodes the audio embeddings into transcription text
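
Before pointing WhisperKit at a local copy of this repository, it can help to confirm that all three bundles are present. The following is a minimal sketch, not part of WhisperKit: the function name `missing_components` and the directory path in the usage note are hypothetical. It relies only on the fact that a compiled `.mlmodelc` bundle is a directory on disk.

```python
from pathlib import Path

# The three compiled CoreML bundles listed above.
EXPECTED_COMPONENTS = (
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
)


def missing_components(model_dir):
    """Return the expected bundle names that are not directories under model_dir.

    Hypothetical helper for illustration; an empty list means the download
    contains every component.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

For example, `missing_components("path/to/Breeze-ASR-25_coreml")` returns an empty list when the download is complete and the names of any absent bundles otherwise.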

Usage

With WhisperKit

import whisperkit

# Load the model
model = whisperkit.load_model("your-username/Breeze-ASR-25_coreml")

# Transcribe audio
result = model.transcribe("path/to/audio.wav")
print(result.text)

Requirements

Apple Silicon device (for example, an M1/M2/M3 Mac)
iOS 16.0+ or macOS 13.0+
WhisperKit framework
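
The macOS side of these requirements can be checked from a script. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, it covers the Mac case only (not iOS), and it does not check whether WhisperKit is installed. It uses the standard-library `platform` module, where an Apple Silicon Mac running a native Python build is expected to report `Darwin` and `arm64`.

```python
import platform

MIN_MACOS = (13, 0)  # minimum macOS version from the requirements list


def parse_version(text):
    """Turn a version string such as '14.4.1' into a tuple of ints.

    The result is padded to at least two parts so that '13' compares
    equal to (13, 0).
    """
    parts = tuple(int(p) for p in text.split(".") if p.isdigit())
    return parts + (0,) * (2 - len(parts))


def meets_macos_requirements(system, machine, macos_version):
    """True when the values describe an Apple Silicon Mac on macOS 13.0+."""
    if system != "Darwin" or machine != "arm64":
        return False
    return parse_version(macos_version) >= MIN_MACOS


def current_host_ok():
    """Apply the check to the machine running this script."""
    return meets_macos_requirements(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
```

Keeping the comparison in `meets_macos_requirements` separate from the `platform` calls makes the rule easy to exercise with fixed values. A Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon hardware, so a negative result is a hint rather than proof.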

Performance

Optimized for Apple Silicon devices
On-device inference (no internet required)
Low latency and low memory usage
High-accuracy speech recognition

License

This model is licensed under the Apache 2.0 License.

Citation

If you use this model, please cite the original Breeze-ASR-25 paper:

@article{breeze-asr-25,
  title={Breeze-ASR-25: Efficient Speech Recognition for Mobile Devices},
  author={MediaTek Research},
  journal={arXiv preprint},
  year={2024}
}
