whisper large v3 turbo - ct2 int8 version

Files changed (6)

README.md +78 -3
config.json +224 -0
model.bin +3 -0
preprocessor_config.json +14 -0
tokenizer.json +0 -0
vocabulary.json +0 -0

README.md CHANGED

@@ -1,3 +1,78 @@
----
-license: mit
----

+# Whisper Large v3 Turbo - CTranslate2
+This is a CTranslate2-optimized version of OpenAI's Whisper Large v3 Turbo model for automatic speech recognition (ASR).
+## Model Description
+This model is a converted version of the original Whisper Large v3 Turbo model, optimized for inference using CTranslate2. CTranslate2 is a C++ and Python library for efficient inference with Transformer models, providing:
+- **Faster inference**: Optimized implementations of attention mechanisms and feed-forward networks
+- **Lower memory usage**: Quantization support and memory-efficient attention
+- **Better throughput**: Batching and parallel processing optimizations
+- **Cross-platform compatibility**: Support for CPU and GPU inference
+## Conversion
+This model has been converted using the following command:
+```bash
+ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir whisper-large-v3-turbo-ct2-int8 --quantization int8 --copy_files tokenizer.json preprocessor_config.json
+```
+The conversion includes **int8 quantization**, which provides several benefits:
+- **Reduced disk space**: Each weight takes 1 byte instead of 2 (float16) or 4 (float32), so `model.bin` is about 814 MB
+- **Lower memory consumption**: Requires less RAM during inference
+- **Maintained accuracy**: Typically only a small quality loss relative to the float model; worth validating on your own audio
+- **Faster loading**: Reduced time to load the model from disk
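To make the int8 trade-off concrete, here is a simplified sketch of symmetric per-row int8 quantization: each row of a weight matrix gets its own float scale, and values are rounded to integers in [-127, 127]. It illustrates the general scheme only; whether it matches CTranslate2's internal implementation exactly is an assumption, and `quantize_rows` / `dequantize_rows` are hypothetical helpers.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows(weights: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row to int8 range with its own symmetric scale (hypothetical helper)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(x) for x in row)
        # Map the largest magnitude in the row to 127; guard against all-zero rows.
        scale = 127.0 / max_abs if max_abs > 0 else 1.0
        # Clamp so rounding can never leave the int8 range used here.
        q_rows.append([max(-127, min(127, round(x * scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights by dividing by each row's scale."""
    return [[q / s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
    q, s = quantize_rows(w)
    # Each stored value needs 1 byte, versus 4 bytes for float32.
    print(q)
    print(dequantize_rows(q, s))
```

Because each row keeps its own scale, a row of small weights (the second row above) still uses the full integer range instead of collapsing to zero; the rounding error per value is at most half a quantization step of its row.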
+## Original Model
+This model is based on OpenAI's Whisper Large v3 Turbo, which is a state-of-the-art automatic speech recognition model that:
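+- Supports 100 languages (`config.json` lists 100 language token ids)
+- Provides high-quality transcription; translation is not well supported (OpenAI notes the turbo model returns the original language even when translation is requested)
+- Decodes much faster than Whisper Large v3 (its decoder has 4 layers instead of 32), with only a small loss in accuracy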
+- Handles various audio conditions and accents
+## Usage
+To use this model, you'll need to install CTranslate2 and the appropriate Whisper integration (faster-whisper):
+```bash
+pip install ctranslate2 faster-whisper
+```
+```python
+from faster_whisper import WhisperModel
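+# Directory produced by the conversion command above
+model_path = "path/to/whisper-large-v3-turbo-ct2-int8"
+# compute_type="int8" matches the int8 quantization used at conversion time
+model = WhisperModel(model_path, device="cpu", compute_type="int8")
+segments, info = model.transcribe("audio.wav", beam_size=5)
+print("Detected language '%s' (probability %.2f)" % (info.language, info.language_probability))
+# `segments` is a generator: transcription actually runs while iterating
+for segment in segments:
+    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))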
+```
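The loop above prints segment timestamps in seconds. A common next step is writing subtitles, so below is a minimal sketch of an SRT formatter. It assumes only that each segment exposes `start`, `end` (seconds) and `text`, as used in the loop above; `to_srt` and `format_timestamp` are hypothetical helpers, not part of faster-whisper.

```python
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    """Anything with start/end in seconds and text, like faster-whisper segments."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[SegmentLike]) -> str:
    """Build an SRT document; iterating `segments` drives a lazy transcription."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = format_timestamp(seg.start), format_timestamp(seg.end)
        # Whisper segment text usually starts with a space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

With the model from the snippet above, `segments, _ = model.transcribe("audio.wav")` followed by writing `to_srt(segments)` to `audio.srt` (UTF-8) would produce a subtitle file.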
+## Performance
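+This CTranslate2 version is typically faster and leaner than the original PyTorch implementation:
+- Up to 4x faster inference, as reported by the faster-whisper project against openai/whisper at the same accuracy; actual gains depend on hardware and settings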
+- Reduced memory consumption
+- Support for quantization
+- Optimized for both CPU and GPU inference
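Speedups like these depend on hardware, thread count, and decoding settings, so it is worth measuring on your own machine. A minimal sketch, assuming you supply the transcription callable yourself; `time_best` and `real_time_factor` are hypothetical helpers. Since faster-whisper returns segments as a generator, the callable must consume it, otherwise almost nothing is timed.

```python
import time
from typing import Callable


def time_best(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For example, `time_best(lambda: list(model.transcribe("audio.wav", beam_size=5)[0]))` times a full transcription; taking the best of several runs reduces noise from caching and model warm-up.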
+## Supported Languages
+Same as the original Whisper Large v3 Turbo. The list below is a selection, not the complete set:
+Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh.
+## Model Card
+- **Developed by**: OpenAI (original), converted to CT2 format
+- **Model type**: Automatic Speech Recognition
+- **Language(s)**: Multilingual (100 languages)
+- **License**: MIT
+- **Model size**: 809M parameters (Large v3 with the decoder pruned from 32 to 4 layers; the full Large v3 has 1550M)

config.json ADDED

	@@ -0,0 +1,224 @@

+{
+  "alignment_heads": [
+    [
+      2,
+      4
+    ],
+    [
+      2,
+      11
+    ],
+    [
+      3,
+      3
+    ],
+    [
+      3,
+      6
+    ],
+    [
+      3,
+      11
+    ],
+    [
+      3,
+      14
+    ]
+  ],
+  "lang_ids": [
+    50259,
+    50260,
+    50261,
+    50262,
+    50263,
+    50264,
+    50265,
+    50266,
+    50267,
+    50268,
+    50269,
+    50270,
+    50271,
+    50272,
+    50273,
+    50274,
+    50275,
+    50276,
+    50277,
+    50278,
+    50279,
+    50280,
+    50281,
+    50282,
+    50283,
+    50284,
+    50285,
+    50286,
+    50287,
+    50288,
+    50289,
+    50290,
+    50291,
+    50292,
+    50293,
+    50294,
+    50295,
+    50296,
+    50297,
+    50298,
+    50299,
+    50300,
+    50301,
+    50302,
+    50303,
+    50304,
+    50305,
+    50306,
+    50307,
+    50308,
+    50309,
+    50310,
+    50311,
+    50312,
+    50313,
+    50314,
+    50315,
+    50316,
+    50317,
+    50318,
+    50319,
+    50320,
+    50321,
+    50322,
+    50323,
+    50324,
+    50325,
+    50326,
+    50327,
+    50328,
+    50329,
+    50330,
+    50331,
+    50332,
+    50333,
+    50334,
+    50335,
+    50336,
+    50337,
+    50338,
+    50339,
+    50340,
+    50341,
+    50342,
+    50343,
+    50344,
+    50345,
+    50346,
+    50347,
+    50348,
+    50349,
+    50350,
+    50351,
+    50352,
+    50353,
+    50354,
+    50355,
+    50356,
+    50357,
+    50358
+  ],
+  "suppress_ids": [
+    1,
+    2,
+    7,
+    8,
+    9,
+    10,
+    14,
+    25,
+    26,
+    27,
+    28,
+    29,
+    31,
+    58,
+    59,
+    60,
+    61,
+    62,
+    63,
+    90,
+    91,
+    92,
+    93,
+    359,
+    503,
+    522,
+    542,
+    873,
+    893,
+    902,
+    918,
+    922,
+    931,
+    1350,
+    1853,
+    1982,
+    2460,
+    2627,
+    3246,
+    3253,
+    3268,
+    3536,
+    3846,
+    3961,
+    4183,
+    4667,
+    6585,
+    6647,
+    7273,
+    9061,
+    9383,
+    10428,
+    10929,
+    11938,
+    12033,
+    12331,
+    12562,
+    13793,
+    14157,
+    14635,
+    15265,
+    15618,
+    16553,
+    16604,
+    18362,
+    18956,
+    20075,
+    21675,
+    22520,
+    26130,
+    26161,
+    26435,
+    28279,
+    29464,
+    31650,
+    32302,
+    32470,
+    36865,
+    42863,
+    47425,
+    49870,
+    50254,
+    50258,
+    50359,
+    50360,
+    50361,
+    50362,
+    50363
+  ],
+  "suppress_ids_begin": [
+    220,
+    50257
+  ]
+}
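The `config.json` above is small enough to inspect directly. The sketch below derives a few facts from it: the number of language tokens, and the decoder layers referenced by `alignment_heads` (pairs read as `[layer, head]`, used for word-level timestamps). It also shows how `suppress_ids` and `suppress_ids_begin` can be applied as a logit mask, the latter only at the first decoding step (220 is the bare-space token and 50257 end-of-text). The masking function illustrates the general idea and is an assumption, not CTranslate2's code; both helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def summarize_config(cfg: Dict) -> Dict:
    """Derive a few facts from a CTranslate2 Whisper config.json dict."""
    return {
        "num_languages": len(cfg["lang_ids"]),
        "first_lang_id": min(cfg["lang_ids"]),
        "last_lang_id": max(cfg["lang_ids"]),
        # Distinct decoder layers whose cross-attention heads are used for alignment.
        "alignment_layers": sorted({layer for layer, _head in cfg["alignment_heads"]}),
    }


def suppress_logits(logits: List[float], step: int,
                    suppress_ids: Sequence[int],
                    suppress_ids_begin: Sequence[int]) -> List[float]:
    """Return a copy of `logits` with suppressed token ids set to -inf.

    `suppress_ids` apply at every step; `suppress_ids_begin` only at step 0.
    """
    masked = list(logits)
    banned = set(suppress_ids)
    if step == 0:
        banned |= set(suppress_ids_begin)
    for token_id in banned:
        # Skip ids beyond the vocabulary so the mask also works on toy inputs.
        if token_id < len(masked):
            masked[token_id] = -math.inf
    return masked
```

Loading the converted file with `json.load` and passing it to `summarize_config` should report 100 language tokens (ids 50259 to 50358) and alignment layers `[2, 3]`, consistent with a 4-layer decoder.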

model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7482f978f9493b93b3d0bf68a148a1a62ca02547e8ee981358ff8ef2de8a9418
+size 814054531
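`model.bin` is committed as a Git LFS pointer: the three lines above record the spec version, the SHA-256 of the real file, and its size in bytes (814,054,531, about 814 MB). A sketch that parses such a pointer and checks a downloaded `model.bin` against it; `parse_lfs_pointer` and `verify_download` are hypothetical helpers built only on the standard library.

```python
import hashlib
import os
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_download(path: str, pointer: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Check a file's size and SHA-256 against a parsed pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, before hashing a file of several hundred MB.
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Hashing in chunks keeps memory use constant, which matters for a file of this size.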

preprocessor_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "chunk_length": 30,
+  "feature_extractor_type": "WhisperFeatureExtractor",
+  "feature_size": 128,
+  "hop_length": 160,
+  "n_fft": 400,
+  "n_samples": 480000,
+  "nb_max_frames": 3000,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "processor_class": "WhisperProcessor",
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
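The values in `preprocessor_config.json` are tied together: a 30 s chunk at 16 kHz is 480,000 samples, a hop of 160 samples gives 3,000 frames (one per 10 ms), each FFT window spans 400 samples (25 ms), and `feature_size: 128` is the number of mel bins (Large v3 uses 128, earlier Whisper models used 80). A sketch that checks these relationships and derives the log-mel input shape; `derived_shapes` is a hypothetical helper.

```python
from typing import Dict


def derived_shapes(cfg: Dict) -> Dict[str, object]:
    """Derive sizes implied by a Whisper preprocessor config."""
    sr = cfg["sampling_rate"]
    n_samples = cfg["chunk_length"] * sr          # samples per 30 s chunk
    n_frames = n_samples // cfg["hop_length"]     # one mel frame per hop
    return {
        "n_samples": n_samples,
        "n_frames": n_frames,
        "hop_ms": 1000 * cfg["hop_length"] / sr,  # time between frames
        "window_ms": 1000 * cfg["n_fft"] / sr,    # FFT window length
        "mel_shape": (cfg["feature_size"], n_frames),
    }
```

Shorter audio is padded with `padding_value` (0.0) on the right up to 480,000 samples, so the encoder always sees a 128 x 3000 log-mel input.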

tokenizer.json ADDED

The diff for this file is too large to render.

vocabulary.json ADDED

The diff for this file is too large to render.