---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- ru
- ctc
- audio
- speech
---
# GigaAM-v2-CTC 🤗 Hugging Face transformers
This is an unofficial `transformers` wrapper for the original GigaAM-v2-CTC model released by SberDevices.

- Original repository: https://github.com/salute-developers/GigaAM
## Model info
This is GigaAM-v2-CTC with a `transformers` library interface.

The file `gigaam_transformers.py` contains the model, feature extractor, and tokenizer classes with the usual `transformers` methods. The model can be initialized via the `transformers` auto classes (see the example below).
## Installation
Tested with the following library versions:

- `torch==2.7.1`
- `torchaudio==2.7.1`
- `transformers==4.49.0`
- `accelerate==1.5.2`
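For reference, the pinned versions above can be installed with pip (package names as published on PyPI):

```shell
pip install torch==2.7.1 torchaudio==2.7.1 transformers==4.49.0 accelerate==1.5.2
```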
## Usage
Usage is the same as for other `transformers` ASR models.
```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio
wav, sr = torchaudio.load("audio.wav")
# resample if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-ctc", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-ctc", trust_remote_code=True)
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# predict
with torch.no_grad():
    logits = model(**input_features).logits

# greedy decoding: most likely token id per frame
greedy_ids = logits.argmax(dim=-1)
# decode token ids to text
transcription = processor.batch_decode(greedy_ids)[0]
```
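`batch_decode` handles the CTC post-processing internally. For intuition, greedy CTC decoding collapses consecutive repeated ids and then removes blanks; below is a minimal self-contained sketch with a toy vocabulary (the blank id and vocabulary here are hypothetical, not the model's actual ones):

```python
# Toy illustration of greedy CTC decoding.
# Assumes blank id = 0; the real model's blank id and vocabulary may differ.

def ctc_greedy_decode(ids, blank_id=0):
    """Collapse consecutive repeated ids, then drop blank tokens."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Hypothetical per-frame argmax ids for the word "cat",
# with toy vocab {0: blank, 1: "c", 2: "a", 3: "t"}.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 0, 3]
decoded = ctc_greedy_decode(frame_ids)
vocab = {1: "c", 2: "a", 3: "t"}
text = "".join(vocab[i] for i in decoded)
print(text)  # -> cat
```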