torch
soundfile
transformers
datasets>=3.5.0,<4.0.0
numpy==1.26.4
sentencepiece>=0.2.0