update readme
README.md
CHANGED
@@ -73,10 +73,9 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github
 <img src="https://github.com/OpenBMB/MiniCPM-o/raw/main/assets/radar.jpg" width=90% />
 </div>
 
-
-<summary>Click to view visual understanding results.</summary>
+#### Visual understanding results
 
-**Image Understanding
+**Image Understanding:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -394,8 +393,10 @@ MiniCPM-o 2.6 can be easily used in various ways: (1) [llama.cpp](https://github
 Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.
 
 
-**Multi-image and Video Understanding
+**Multi-image and Video Understanding:**
 
+<details>
+<summary>click to view</summary>
 <div align="center">
 
 <table style="margin: 0px auto;">
@@ -497,10 +498,9 @@ Note: For proprietary models, we calculate token density based on the image enco
 </details>
 
 
-
-<summary>Click to view audio understanding and speech conversation results.</summary>
+#### Audio understanding and speech conversation results.
 
-**Audio Understanding
+**Audio Understanding:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -624,7 +624,7 @@ Note: For proprietary models, we calculate token density based on the image enco
 </div>
 * We evaluate officially released checkpoints by ourselves.<br><br>
 
-**Speech Generation
+**Speech Generation:**
 
 <div align="center">
 <table style="margin: 0px auto;">
@@ -790,12 +790,10 @@ All results are from AudioEvals, and the evaluation methods along with further d
 </table>
 </div>
 
-</details>
 
-
-<summary>Click to view multimodal live streaming results.</summary>
+#### Multimodal live streaming results.
 
-**Multimodal Live Streaming
+**Multimodal Live Streaming:** results on StreamingBench
 
 <table style="margin: 0px auto;">
 <thead>
@@ -922,7 +920,6 @@ All results are from AudioEvals, and the evaluation methods along with further d
 </tbody>
 </table>
 
-</details>
 
 
 ### Examples <!-- omit in toc -->