Sync from GitHub repo
This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions there.
README_REPO.md CHANGED

```diff
@@ -112,7 +112,7 @@ docker container run --rm -it --gpus=all --mount 'type=volume,source=f5-tts,targ
 Deployment solution with Triton and TensorRT-LLM.
 
 #### Benchmark Results
-Decoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs.
+Decoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs, 16 NFE.
 
 | Model | Concurrency | Avg Latency | RTF | Mode |
 |---------------------|----------------|-------------|--------|-----------------|
```
pyproject.toml CHANGED

```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "f5-tts"
-version = "1.1.1"
+version = "1.1.2"
 description = "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
 readme = "README.md"
 license = {text = "MIT License"}
```
src/f5_tts/runtime/triton_trtllm/README.md CHANGED

````diff
@@ -57,7 +57,7 @@ benchmark.py --output-dir $log_dir \
 ```
 
 ### Benchmark Results
-Decoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs.
+Decoding on a single L20 GPU, using 26 different prompt_audio & target_text pairs, 16 NFE.
 
 | Model | Concurrency | Avg Latency | RTF | Mode |
 |---------------------|----------------|-------------|--------|-----------------|
````
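For context on the benchmark line and table columns: NFE is the number of function evaluations, i.e. flow-matching ODE steps per utterance, so fixing 16 NFE pins the compute budget behind the latency figures; RTF (real-time factor) is processing time divided by the duration of the audio produced, with RTF < 1 meaning faster-than-real-time synthesis. A minimal sketch of the RTF metric (the helper name is ours, not from the repo's benchmark script):

```python
def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: wall-clock processing time over generated audio duration.

    RTF < 1 means the system synthesizes speech faster than real time;
    e.g. producing 10 s of audio in 0.5 s gives an RTF of 0.05.
    """
    return processing_seconds / audio_seconds

print(realtime_factor(0.5, 10.0))  # -> 0.05
```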
src/f5_tts/runtime/triton_trtllm/benchmark.py CHANGED

```diff
@@ -168,7 +168,9 @@ def data_collator(batch, vocab_char_map, device="cuda", use_perf=False):
         ref_mel_list.append(ref_mel)
         ref_mel_len_list.append(ref_mel_len)
 
-        estimated_reference_target_mel_len.append(int(ref_mel.shape[0] * (1 + len(target_text) / len(prompt_text))))
+        estimated_reference_target_mel_len.append(
+            int(ref_mel.shape[0] * (1 + len(target_text.encode("utf-8")) / len(prompt_text.encode("utf-8"))))
+        )
 
     max_seq_len = max(estimated_reference_target_mel_len)
     ref_mel_batch = padded_mel_batch(ref_mel_list, max_seq_len)
```
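Both this collator and the Triton model below use the same heuristic: estimate how many mel frames the concatenated reference-plus-generated speech will need by scaling the reference mel length by the target-to-prompt text-length ratio. The patch switches that ratio from Python character counts to UTF-8 byte lengths, which weights multi-byte scripts such as CJK more heavily and so tracks spoken duration better when prompt and target scripts differ. A standalone sketch of the updated heuristic (the function name is ours):

```python
def estimate_total_mel_len(ref_mel_len: int, prompt_text: str, target_text: str) -> int:
    """Estimate mel frames for reference + generated speech.

    Scales the reference mel length by the UTF-8 byte-length ratio of
    target to prompt text, so multi-byte characters (e.g. CJK) count
    for more than they would under plain len() on the str.
    """
    ratio = len(target_text.encode("utf-8")) / len(prompt_text.encode("utf-8"))
    return int(ref_mel_len * (1 + ratio))

# A 250-frame reference whose target text is twice the prompt's byte
# length yields 250 * (1 + 2) = 750 frames.
print(estimate_total_mel_len(250, "hello world!", "hello world!" * 2))  # -> 750
```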
src/f5_tts/runtime/triton_trtllm/model_repo_f5_tts/f5_tts/1/model.py CHANGED

```diff
@@ -219,7 +219,9 @@ class TritonPythonModel:
 
             reference_mel_len.append(mel_features.shape[1])
             estimated_reference_target_mel_len.append(
-                int(mel_features.shape[1] * (1 + len(target_text) / len(reference_text)))
+                int(
+                    mel_features.shape[1] * (1 + len(target_text.encode("utf-8")) / len(reference_text.encode("utf-8")))
+                )
             )
 
         max_seq_len = min(max(estimated_reference_target_mel_len), self.max_mel_len)
```
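One difference from the offline benchmark is visible in the last context line: the server clamps the padded batch length to `self.max_mel_len`, so a single request with a very long target text cannot inflate the padded batch and exhaust GPU memory. A minimal sketch of that guard, with an illustrative cap value (the real cap comes from the model's configuration):

```python
MAX_MEL_LEN = 3000  # illustrative cap; the repo reads self.max_mel_len from config

def batch_seq_len(estimated_lens: list[int], cap: int = MAX_MEL_LEN) -> int:
    """Pad the batch to the longest per-item estimate, clamped to the cap."""
    return min(max(estimated_lens), cap)

print(batch_seq_len([750, 1200, 910]))   # -> 1200
print(batch_seq_len([750, 9001, 910]))   # -> 3000 (clamped)
```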