Update README.md
README.md CHANGED
@@ -123,11 +123,13 @@ The evaluation code is available in the [ASR Benchmark repository](https://githu
 WER was computed **without punctuation or uppercase letters** and datasets were cleaned.
 The [SUMM-RE dataset](https://huggingface.co/datasets/linagora/SUMM-RE) is the only one used **exclusively for evaluation**, meaning neither model saw it during training.

-Evaluations can be very long (especially for whisper) so we used a subset of the test split for most datasets:
-- 15% of CommonVoice
-- 33% of MultiLingual LibriSpeech
-- 33% of SUMM-RE
-- 33% of VoxPopuli
+Evaluations can be very long (especially for Whisper), so we selected only segments with a duration over 1 second and used a subset of the test split for most datasets:
+- 15% of CommonVoice: 2424 rows (3.9h)
+- 33% of Multilingual LibriSpeech: 800 rows (3.3h)
+- 33% of SUMM-RE: 1004 rows (2h). We selected only segments longer than 4 seconds to ensure quality.
+- 33% of VoxPopuli: 678 rows (1.6h)
+- Multilingual TEDx: 972 rows (1.5h)
+- 50% of our internal YouTube corpus: 956 rows (1h)
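
For context, the WER reported above is scored on lowercased, punctuation-free text. Below is a minimal sketch of that kind of normalized WER computation; it is an illustration only, not the code from the ASR Benchmark repository, and the function names and normalization details are assumptions:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase and drop punctuation before scoring, as described above."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / #reference words."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # classic dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref), 1)

print(wer("Bonjour, tout le monde !", "bonjour tout le monde"))  # 0.0
```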
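
The duration filter and test-split subsampling added in this commit could be reproduced along these lines with the Hugging Face `datasets` library; this is a sketch under assumptions, and the dataset id, seed, and fraction below are placeholders rather than the exact values used:

```python
from datasets import load_dataset

# placeholder dataset id/config -- substitute the corpus actually being subsampled
ds = load_dataset("facebook/voxpopuli", "fr", split="test")

def long_enough(example, min_seconds=1.0):
    """Keep only segments with a duration over the threshold (1 s here, 4 s for SUMM-RE)."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"] > min_seconds

ds = ds.filter(long_enough)  # decodes each clip to measure its length; slow but simple

# keep a fixed fraction of the filtered test split (e.g. 33% for VoxPopuli above)
fraction = 0.33
subset = ds.shuffle(seed=42).select(range(int(fraction * len(ds))))
print(len(subset), "rows")
```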