Upload folder using huggingface_hub
- .gitattributes +2 -0
- README.md +94 -0
- ckpts/LongCatAudioCodec_decoder_16k_4codebooks.pt +3 -0
- ckpts/LongCatAudioCodec_decoder_24k_2codebooks.pt +3 -0
- ckpts/LongCatAudioCodec_decoder_24k_4codebooks.pt +3 -0
- ckpts/LongCatAudioCodec_decoder_24k_4codebooks_aug_sft.pt +3 -0
- ckpts/LongCatAudioCodec_encoder.pt +3 -0
- ckpts/LongCatAudioCodec_encoder_cmvn.npy +3 -0
- images/Wechat.png +3 -0
- images/codec-Frame.png +3 -0
- images/longcat_logo.svg +1 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+images/codec-Frame.png filter=lfs diff=lfs merge=lfs -text
+images/Wechat.png filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,94 @@
+---
+license: mit
+---
+<div align="center">
+<img src="images/longcat_logo.svg" alt="LongCat-Audio-Codec Logo" width="400">
+</div>
+<div align="center">
+<h1 style="text-align: center; margin: 0;">LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models</h1>
+<br>
+<div style="display: flex; justify-content: center; flex-wrap: wrap; gap: 10px;">
+<a href="https://github.com/meituan-longcat/LongCat-Audio-Codec/">
+<img src="https://img.shields.io/badge/Github-LongCatAudioCodec-blue?style=flat" alt="Project Page">
+</a>
+<a href="https://arxiv.org/abs/2510.15227">
+<img src="https://img.shields.io/badge/Technical%20Report-LongCatAudioCodec-green?style=flat" alt="Technical Report">
+</a>
+</div>
+</div>
+
+<div align="center" style="line-height: 1;">
+<a href="https://github.com/meituan-longcat/LongCat-Audio-Codec/blob/main/images/Wechat.png" target="_blank" style="margin: 2px;">
+<img alt="Wechat" src="https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://x.com/Meituan_LongCat" target="_blank" style="margin: 2px;">
+<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+
+<div align="center" style="line-height: 1;">
+<a href="https://huggingface.co/meituan-longcat/LongCat-Flash-Chat/blob/main/LICENSE" style="margin: 2px;">
+<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+
+
+
+---
+
+We are excited to introduce **LongCat-Audio-Codec**, an audio tokenizer and detokenizer solution designed for speech large language models. It generates semantic and acoustic tokens in parallel, enabling high-fidelity audio reconstruction at extremely low bitrates while providing strong backend support for speech LLMs.
+
+This repository hosts the resources for **LongCat-Audio-Codec**. For complete documentation, installation guides, and usage examples, please visit our **[GitHub Repository](https://github.com/meituan-longcat/LongCat-Audio-Codec/)**.
+
+
+## 🗺️ Framework
+
+<div align="center">
+<img src="images/codec-Frame.png" alt="Framework Diagram" width="800">
+</div>
+
+
+### ✨ Key Features
+
+- **High Fidelity at Ultra-Low Bitrates:** As a codec, it achieves high-intelligibility audio reconstruction at extremely low bitrates.
+- **Low-Frame-Rate Tokenizer:** As a tokenizer, it extracts semantic and acoustic tokens in parallel at a low frame rate of 16.6 Hz, with flexible acoustic codebook configurations to adapt to different downstream tasks.
+- **Low-Latency Streaming Detokenizer:** Equipped with a specially designed streaming detokenizer that requires minimal future information to deliver high-quality audio output with low latency.
+- **Super-Resolution Capability:** Integrates audio super-resolution into the detokenizer, so it can generate high-quality audio at a higher sample rate when the original input's sample rate is below 24 kHz.
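To make the low-frame-rate claim concrete, the sketch below works out how many tokens per second the tokenizer emits at the stated 16.6 Hz frame rate for a given number of codebooks, and converts that into a bitrate. The README does not state the codebook size, so `codebook_size` is a hypothetical parameter, the value 1024 in the example is purely illustrative, and the calculation assumes every codebook has the same size.

```python
import math

FRAME_RATE_HZ = 16.6  # frame rate stated in the README


def tokens_per_second(num_codebooks: int, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """One token per codebook per frame."""
    if num_codebooks < 1:
        raise ValueError("num_codebooks must be at least 1")
    return frame_rate_hz * num_codebooks


def bitrate_bps(num_codebooks: int, codebook_size: int,
                frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Bits per second, assuming each token indexes a codebook of
    `codebook_size` entries (hypothetical; not given in the README)."""
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second(num_codebooks, frame_rate_hz) * bits_per_token


if __name__ == "__main__":
    # 2 codebooks = 1 semantic + 1 acoustic; 4 = 1 semantic + 3 acoustic.
    for n in (2, 4):
        print(n, tokens_per_second(n), bitrate_bps(n, codebook_size=1024))
```

Swapping in the real codebook size from the technical report would give the actual bitrate; the token rate itself depends only on the frame rate and the number of codebooks.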
+
+---
+
+
+### 📦 Resources
+
+| Resource | Notes |
+| ----------------------------- | ------------------------------------------------------------ |
+| LongCatAudioCodec_encoder | Encoder weights of LongCat-Audio-Codec, comprising the semantic encoder and the acoustic encoder |
+| LongCatAudioCodec_encoder_cmvn | Cepstral Mean and Variance Normalization (CMVN) coefficients, used by the encoder |
+| LongCatAudioCodec_decoder_16k_4codebooks | Native 16 kHz decoder; supports 1 semantic codebook and at most 3 acoustic codebooks |
+| LongCatAudioCodec_decoder_24k_2codebooks | Super-resolution 24 kHz decoder; supports 1 semantic codebook and 1 acoustic codebook |
+| LongCatAudioCodec_decoder_24k_4codebooks | Super-resolution 24 kHz decoder; supports 1 semantic codebook and at most 3 acoustic codebooks |
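The table above maps onto the checkpoint files added in this commit. As a minimal sketch, the helper below picks a decoder checkpoint from an output sample rate and a number of acoustic codebooks. The function names are hypothetical, the rule "prefer the smallest decoder that fits" is an assumption rather than documented behavior, and the `aug_sft` checkpoint is left out because the table does not describe it. The optional `download` wrapper uses `hf_hub_download` from `huggingface_hub`; the repository id is not stated in this diff, so the caller must supply it.

```python
# Checkpoint paths as they appear in this commit's file list,
# keyed by (output sample rate in kHz, total codebooks supported).
DECODERS = {
    (16, 4): "ckpts/LongCatAudioCodec_decoder_16k_4codebooks.pt",
    (24, 2): "ckpts/LongCatAudioCodec_decoder_24k_2codebooks.pt",
    (24, 4): "ckpts/LongCatAudioCodec_decoder_24k_4codebooks.pt",
}


def pick_decoder(sample_rate_khz: int, num_acoustic: int) -> str:
    """Return the decoder path for 1 semantic + `num_acoustic` acoustic codebooks.

    Assumption: when several decoders fit, the one with the fewest
    codebooks is preferred.
    """
    if not 1 <= num_acoustic <= 3:
        raise ValueError("the table lists decoders for 1 to 3 acoustic codebooks")
    total = 1 + num_acoustic  # one semantic codebook is always present
    candidates = [
        (max_codebooks, path)
        for (rate, max_codebooks), path in DECODERS.items()
        if rate == sample_rate_khz and max_codebooks >= total
    ]
    if not candidates:
        raise ValueError(f"no decoder listed for {sample_rate_khz} kHz")
    return min(candidates)[1]


def download(repo_id: str, filename: str) -> str:
    """Fetch one file from the Hub and return its local path.

    `repo_id` must be supplied by the caller; it is not given in this diff.
    """
    from huggingface_hub import hf_hub_download  # imported lazily; needs network

    return hf_hub_download(repo_id=repo_id, filename=filename)


if __name__ == "__main__":
    print(pick_decoder(24, 1))
    print(pick_decoder(16, 3))
```

Whichever decoder is chosen, the encoder side still needs `ckpts/LongCatAudioCodec_encoder.pt` and `ckpts/LongCatAudioCodec_encoder_cmvn.npy`.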
+
+
+## 📚 Citation
+
+If you find our work useful in your research, please consider citing:
+
+```
+@article{longcataudiocodec,
+title={LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models},
+author={Xiaohan Zhao and Hongyu Xiang and Shengze Ye and Song Li and Zhengkun Tian and Guanyu Chen and Ke Ding and Guanglu Wan},
+journal={arXiv preprint arXiv:2510.15227},
+organization={LongCat Team, Meituan},
+year={2025}
+}
+```
+
+
+## 📜 License
+
+The code and models in this repository are released under the **MIT License**. This grants you broad permissions to use, copy, modify, and distribute the software, provided you include the original copyright notice. We claim no ownership over any content you generate using these models.
+
+The software is provided "AS IS", without any warranty. You are fully accountable for your use of the models. Your usage must not involve creating or sharing any content that violates applicable laws, causes harm to individuals, disseminates personal information with harmful intent, spreads misinformation, or targets vulnerable groups.
+
+## 📩 Contact
+Please contact us at <a href="mailto:[email protected]">[email protected]</a> or open an issue if you have any questions.
ckpts/LongCatAudioCodec_decoder_16k_4codebooks.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:90c306487eb2858248f908956c3e60c6e58ad1d52bc7f675092666fd991336d6
+size 649899298
ckpts/LongCatAudioCodec_decoder_24k_2codebooks.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7e226519119f89677386a33d575a7545fa0857e798f9eae7be9f4571c4d74f3
+size 603407876
ckpts/LongCatAudioCodec_decoder_24k_4codebooks.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5d88f3603e84af2f553433c9b39c78b64c669a0acb6fe354ea23c5ce58887668
+size 603665034
ckpts/LongCatAudioCodec_decoder_24k_4codebooks_aug_sft.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0b2ed907fbefd6ca60cbb6078852233ce693ddddef05f1ca4e201b4fd3119189
+size 603637648
ckpts/LongCatAudioCodec_encoder.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0ee1bb5564b7aa07f338d309c445ad722340533af9e706f16a31ffe6ef949c16
+size 2189336877
ckpts/LongCatAudioCodec_encoder_cmvn.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98d53c6886478158ffabb1e6e8945eeb1279dc44c0a397b1a9fd17b4ef3f89ad
+size 776
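Each checkpoint above is committed as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded file can be checked against its pointer. The sketch below is one way to do that using only the standard library; the names `LfsPointer`, `parse_pointer`, and `matches_pointer` are hypothetical helpers, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path
from typing import NamedTuple


class LfsPointer(NamedTuple):
    oid: str   # hex SHA-256 digest of the real file
    size: int  # size of the real file in bytes


def parse_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return LfsPointer(oid=digest, size=int(fields["size"]))


def matches_pointer(path: Path, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's size and SHA-256."""
    if path.stat().st_size != pointer.size:
        return False  # cheap size check before hashing
    sha = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so multi-gigabyte checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer.oid


if __name__ == "__main__":
    # Pointer for ckpts/LongCatAudioCodec_encoder_cmvn.npy, copied from the diff above.
    pointer = parse_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:98d53c6886478158ffabb1e6e8945eeb1279dc44c0a397b1a9fd17b4ef3f89ad\n"
        "size 776\n"
    )
    print(pointer)
```

Comparing the size first avoids hashing a file that is obviously truncated, which matters for the roughly 2 GB encoder checkpoint.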
images/Wechat.png
ADDED
images/codec-Frame.png
ADDED
images/longcat_logo.svg
ADDED