Good Audio Generation space, model, dataset collection
-
Audio-to-Audio • Updated • 159k • 80
-
KittenML/kitten-tts-nano-0.1
Updated • 19.9k • 488
-
FunAudioLLM/ThinkSound
Video-to-Video • Updated • 47
-
ThinkSound
302 • Generate audio for a video using captions and descriptions
-
Higgs Audio Demo
389 • Higgs Audio Demo
-
bosonai/higgs-audio-v2-generation-3B-base
Text-to-Speech • 6B • Updated • 303k • 639
-
Song Generation
493 • Generate a custom song from lyrics and optional prompts
-
Vui
183 • NotebookLM conversational speech model
-
Hibiki Samples
47 • Translate speech in real-time with high fidelity
-
kyutai/moshiko-pytorch-bf16
Updated • 806k • 188
-
kyutai/mimi
Feature Extraction • 96.2M • Updated • 395k • 266
-
maya-research/Veena
Text-to-Speech • 4B • Updated • 4.4k • 214
-
MiniMax Speech Tech Report
100 • Generate high-quality speech from text with voice cloning
-
google/magenta-realtime
Updated • 233 • 521
-
PlayDiffusion
118 • Generate modified audio from text and voice
-
Qwen2.5 Omni 7B Demo
358 • Generate text and speech from text, audio, images, and videos
-
Open ASR Leaderboard
1.13k • Display and request speech recognition model benchmarks
-
Open NotebookLM
143 • Generate a podcast to discuss the topic of your choice!
-
Voila Demo
44 • Chat with a voice-clone AI
-
Voice Clone
2.5k • Clone a voice to speak any text
-
moonshotai/Kimi-Audio-7B-Instruct
Text-to-Speech • 10B • Updated • 734 • 366
-
moonshotai/Kimi-Audio-7B
Text-to-Speech • 10B • Updated • 2.99k • 70
-
Dia 1.6B
1.71k • Generate realistic dialogue from a script, using Dia!
-
nari-labs/Dia-1.6B
Text-to-Speech • Updated • 197k • 2.8k
-
ByteDance/MegaTTS3
Text-to-Speech • Updated • 174 • 412
-
Di♪♪Rhythm
654 • Blazingly Fast and Embarrassingly Simple Song Generation
-
Gemini Audio Video
35 • Gemini understands audio and video!
-
nvidia/diar_sortformer_4spk-v1
Audio Classification • 0.1B • Updated • 5.94k • 112
-
ACE Step
597 • A Step Towards Music Generation Foundation Model
-
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio • Updated • 630
-
stepfun-ai/Step-Audio-2-mini
Any-to-Any • 8B • Updated • 1.61k • 237
-
neuphonic/neutts-air
Text-to-Speech • 0.7B • Updated • 30.9k • 762
-
NeuTTS-Air
261 • Generate speech from text using a reference audio sample
-
KaniTTS
102 • Generate speech from text using selected models
-
microsoft/UserLM-8b
Text Generation • 8B • Updated • 1.84k • 350
-
pipecat-ai/smart-turn-v3
Voice Activity Detection • Updated • 49
-
meituan-longcat/LongCat-Audio-Codec
Updated • 36