[Bug?] The generated wav reads description transcript as well...

#25

by simzhou - opened 9 days ago

9 days ago

Above is the generated wav...
My Description:

casual random speaker, casual talk, male 40s

My target text:

i am back

simzhou

9 days ago

This issue reproduces randomly, roughly 40% of the time...

simzhou

9 days ago

Oh, I solved it myself!

It seems a DOT is always necessary after the description text!!!

Changing my description from:

casual random speaker, casual talk, male 40s

to:

casual random speaker, casual talk, male 40s.

WOULD PERFECTLY SOLVE THE ISSUE!
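A minimal sketch of that workaround in Python, in case it helps others: normalize the description string so it always ends with a dot before passing it to the model. The helper name `ensure_trailing_period` is hypothetical and not part of any Maya1 API; it only prepares the string. Note that a later reply in this thread reports the dot makes the problem much rarer but does not remove it completely.

```python
def ensure_trailing_period(description: str) -> str:
    """Return the description stripped of surrounding whitespace,
    with a final period appended if it does not already end with one.

    Hypothetical helper for illustration; not part of the model's API.
    """
    cleaned = description.strip()
    if not cleaned:
        # Nothing to fix in an empty description.
        return cleaned
    if cleaned.endswith("."):
        return cleaned
    return cleaned + "."


description = ensure_trailing_period("casual random speaker, casual talk, male 40s")
print(description)  # casual random speaker, casual talk, male 40s.
```

The function is idempotent, so it is safe to call on descriptions that already end with a dot.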

simzhou changed discussion title from [Bug] The generated wav reads description transcript as well... to [Bug?] The generated wav reads description transcript as well... 9 days ago

simzhou

9 days ago

•

edited 9 days ago

Well...
However, even after adding a dot, the model STILL READS THE DESCRIPTION text, just with a VERY LOW PROBABILITY...

BlindTech

8 days ago

•

edited 8 days ago

I have the same issue as well. Thank you for the hint with the dot!

However, it would be nice to hear from the research team, because I can reproduce the same issue in the official Space at https://huggingface.co/spaces/maya-research/maya1. In my case it seems to happen only when the target text is short; right now I am testing with the short sentence "And you?". On the HF Space it does not happen every time, but with the GGUF model it happens every time. This is a hard limitation for me, as I cannot afford this issue in production. I really like the model, so it would be nice if this could be resolved.

Thank you in advance!

DheemanthReddy

Maya Research org 6 days ago

@bharathkumarK will help you on this

iplayfast

2 days ago

I too am having the same issue.

DheemanthReddy

Maya Research org about 10 hours ago

https://x.com/Dheemanthredy/status/1991566362813296965

bharathkumarK

Maya Research org about 9 hours ago

To explain better: the model works best when your description is as verbose as possible, similar to the template in the description. That gives more stable and consistent results.
