Whisper for developers

This model is a fine-tuned version of Whisper-large-v2 adapted for software developers. It transcribes terms such as 'ChatGPT' or 'webhook' correctly, which previous Whisper models could not do.
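A minimal usage sketch with the Hugging Face transformers pipeline is shown below. Note that the repository ID `cyc9805/whisper-for-developers` is an assumption based on the repository listed under Model Sources and may differ from the actual model ID.

```python
# Minimal sketch: transcribe an audio file with the fine-tuned model.
# NOTE: the model ID "cyc9805/whisper-for-developers" is an assumption
# based on the repository listed under Model Sources; adjust as needed.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="cyc9805/whisper-for-developers",  # assumed repository ID
    chunk_length_s=30,                       # Whisper operates on 30 s windows
)

result = asr("developer_talk.wav")  # any local audio file
print(result["text"])
```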

Model Details

This model outperforms previous Whisper models in the transcription accuracy of software-related words. To assess this, I developed a new metric called DSWES (Domain-Specific Word Embedding Similarity). Further information about this metric will be provided in an upcoming paper.
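The exact formulation of DSWES is deferred to the upcoming paper. Purely as an illustration of what an embedding-similarity metric for domain-specific words could look like, a sketch is given below; none of the glossary terms, encoder choice, or scoring rule below are the official DSWES definition.

```python
# Illustrative sketch only: one possible embedding-similarity score for
# domain-specific words. This is NOT the official DSWES definition,
# which will be described in the upcoming paper.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")    # any text encoder
DOMAIN_TERMS = {"chatgpt", "webhook", "kubernetes"}  # example glossary

def dswes_like_score(reference: str, hypothesis: str) -> float:
    """For each domain term in the reference, take the best cosine
    similarity against any hypothesis token, then average (0-100)."""
    ref_terms = [w for w in reference.lower().split() if w in DOMAIN_TERMS]
    hyp_tokens = hypothesis.lower().split()
    if not ref_terms:
        return 100.0  # no domain terms to get wrong
    if not hyp_tokens:
        return 0.0
    ref_emb = encoder.encode(ref_terms)
    hyp_emb = encoder.encode(hyp_tokens)
    # cosine similarity matrix between domain terms and hypothesis tokens
    ref_emb = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    hyp_emb = hyp_emb / np.linalg.norm(hyp_emb, axis=1, keepdims=True)
    sims = ref_emb @ hyp_emb.T
    return float(sims.max(axis=1).mean() * 100)

print(dswes_like_score("deploy the webhook on Kubernetes",
                       "deploy the web hook on cooper netties"))
```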

Please refer to the OpenAI Whisper model card for more details about the backbone model.

Model Sources

  • Repository: cyc9805
  • Paper: Coming soon

Evaluation

Testing Data, Factors & Metrics

Testing Data

The testing data consists of 1 hour of audio manually recorded at AIWeek 2023 and 2 hours of audio from a developer conference video uploaded to YouTube. Note that the testing data cannot be shared publicly due to privacy concerns.

Metrics

Two of the most popular metrics for assessing automatic speech recognition models, word error rate (WER) and character error rate (CER), were used.
Additionally, DSWES was used to specifically measure the transcription accuracy of software-related words. Note that a higher DSWES indicates better performance.
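For reference, WER and CER can be computed with the jiwer package. The snippet below is a generic example, not the exact evaluation script used for this model card.

```python
# Generic example of computing WER and CER with the jiwer package;
# not the exact evaluation script used for this model card.
import jiwer

reference = "configure the webhook in the ChatGPT plugin settings"
hypothesis = "configure the web hook in the chat gpt plugin settings"

print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))
```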

For the assessment, WhisperX was used as the backbone of the fine-tuned model due to its fast inference speed and reduced size. Since WhisperX is itself built on Whisper, I can safely assume that Whisper's performance would be very similar to that of WhisperX.
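A minimal sketch of running transcription with WhisperX is shown below. The audio file name is an example, and loading the fine-tuned checkpoint into WhisperX may require converting it to a faster-whisper/CTranslate2 format first.

```python
# Minimal sketch of transcribing evaluation audio with WhisperX.
# NOTE: the file name is an example; using the fine-tuned checkpoint
# may require converting it to a faster-whisper/CTranslate2 format.
import whisperx

device = "cuda"
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("conference_talk.wav")  # example audio file
result = model.transcribe(audio, batch_size=16)

for segment in result["segments"]:
    print(segment["text"])
```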

Results

| Model                   | WER  | CER  | DSWES |
|-------------------------|------|------|-------|
| WhisperX-large-v2       | 6.89 | 3.66 | 87    |
| WhisperX-for-developers | 6.56 | 2.84 | 91    |