[Bug?] The generated wav reads description transcript as well...

#25
by simzhou - opened

Above is the generated wav...
My Description:

casual random speaker, casual talk, male 40s

My target text:

i am back

This issue reproduces intermittently, roughly 40% of the time...

Oh, I solved it myself!

It seems a DOT is always necessary after the description text!!!

Changing my description from:

casual random speaker, casual talk, male 40s

to:

casual random speaker, casual talk, male 40s.

WOULD PERFECTLY SOLVE THE ISSUE!
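For anyone who wants to apply this automatically before building the prompt, here is a minimal sketch. Note that `ensure_trailing_period` is a hypothetical helper name, not part of the model's API, and treating `!` and `?` as acceptable endings is my own assumption; only the trailing dot was actually tried above.

```python
def ensure_trailing_period(description: str) -> str:
    """Return the description stripped of outer whitespace and ending in sentence punctuation.

    Hypothetical helper: the name and the accepted punctuation set are
    assumptions for illustration, not something the model requires.
    """
    cleaned = description.strip()
    if not cleaned:
        # Nothing to terminate; return the empty string unchanged.
        return cleaned
    # Leave descriptions that already end a sentence untouched.
    if cleaned[-1] in ".!?":
        return cleaned
    return cleaned + "."


description = ensure_trailing_period("casual random speaker, casual talk, male 40s")
print(description)
```

The helper only normalizes the description string; how the description and target text are combined into the final prompt is left as-is.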

simzhou changed discussion title from [Bug] The generated wav reads description transcript as well... to [Bug?] The generated wav reads description transcript as well...

Well....
However, after adding a dot, the model would STILL READ THE DESCRIPTION text, but with a VERY LOW PROBABILITY ...

I have the same issue as well. Thank you for the hint with the dot!

However, it would be nice to hear from the research team, because I can reproduce the same issue in the official Space at https://huggingface.co/spaces/maya-research/maya1. In my case it seems to happen only when the target text is short. I am currently testing with the short sentence "And you?". On the HF Space it does not happen every time, but with the GGUF model it happens every time. This is a hard limitation for me; I cannot afford this issue in production. I really like the model, so it would be nice if this could be resolved.

Thank you in advance!

Maya Research org

@bharathkumarK will help you on this

I too am having the same issue.

Maya Research org

To explain further: the model works best when your description is as verbose as possible, similar to the template in the description. That gives more stable and consistent results.
