Update README.md
README.md
CHANGED
|
@@ -9,7 +9,8 @@ tags:
|
|
| 9 |
- llm
|
| 10 |
- phi
|
| 11 |
---
|
| 12 |
-
|
|
|
|
| 13 |
|
| 14 |
This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
|
| 15 |
|
|
@@ -27,6 +28,28 @@ This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offe
|
|
| 27 |
|
| 28 |
## Quick Start
|
| 29 |
| 30 |
### **VLM Agent** (WIP)
|
| 31 |
|
| 32 |
VLM's understanding of both text and visuals enables interactive generation and modification of plots/images, opening up new possibilities for GUI development and data visualization.
|
|
@@ -189,7 +212,7 @@ Generation: 8.56 tokens-per-sec (100 tokens / 11.6 sec)
|
|
| 189 |
### **LoRA Testing** (WIP)
|
| 190 |
|
| 191 |
```python
|
| 192 |
-
# from phi_3_vision_mlx import
|
| 193 |
|
| 194 |
test_lora(dataset_path="JosefAlbers/akemiH_MedQA_Reason")
|
| 195 |
```
|
|
@@ -321,4 +344,4 @@ This project is licensed under the [MIT License](LICENSE).
|
|
| 321 |
|
| 322 |
## Citation
|
| 323 |
|
| 324 |
-
<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>
|
|
|
|
| 9 |
- llm
|
| 10 |
- phi
|
| 11 |
---
|
| 12 |
+
|
| 13 |
+
# Phi-3-Vision for Apple MLX
|
| 14 |
|
| 15 |
This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
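
As a rough illustration of the weight quantization mentioned in the paragraph above, here is a minimal pure-Python sketch of group-wise affine quantization. It is not taken from this project or from MLX: the function names `quantize_group` and `dequantize_group`, the 4-bit width, and the sample values are all assumptions chosen for illustration.

```python
def quantize_group(values, bits=4):
    """Map a group of floats to integer codes in [0, 2**bits - 1].

    Returns the codes plus the (scale, offset) needed to reverse the mapping.
    Hypothetical helper for illustration only.
    """
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # A constant group has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, offset):
    """Recover approximate floats from integer codes."""
    return [c * scale + offset for c in codes]


# Made-up sample weights, not real model parameters.
weights = [-0.82, -0.10, 0.03, 0.47, 0.91, -0.35, 0.66, 0.12]
codes, scale, offset = quantize_group(weights)
restored = dequantize_group(codes, scale, offset)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each restored value differs from its original by at most half a quantization step (`scale / 2`), which is the accuracy cost of storing 4-bit codes instead of full-precision floats.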
|
| 16 |
|
|
|
|
| 28 |
|
| 29 |
## Quick Start
|
| 30 |
|
| 31 |
+
**1. Install Phi-3 Vision MLX:**
|
| 32 |
+
|
| 33 |
+
```bash
|
| 34 |
+
git clone https://github.com/JosefAlbers/Phi-3-Vision-MLX.git
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
**2. Launch Phi-3 Vision MLX:**
|
| 38 |
+
|
| 39 |
+
```bash
|
| 40 |
+
phi3v
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
Or,
|
| 44 |
+
|
| 45 |
+
```python
|
| 46 |
+
from phi_3_vision_mlx import chatui
|
| 47 |
+
|
| 48 |
+
chatui()
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
## Usage
|
| 52 |
+
|
| 53 |
### **VLM Agent** (WIP)
|
| 54 |
|
| 55 |
VLM's understanding of both text and visuals enables interactive generation and modification of plots/images, opening up new possibilities for GUI development and data visualization.
|
|
|
|
| 212 |
### **LoRA Testing** (WIP)
|
| 213 |
|
| 214 |
```python
|
| 215 |
+
# from phi_3_vision_mlx import test_lora
|
| 216 |
|
| 217 |
test_lora(dataset_path="JosefAlbers/akemiH_MedQA_Reason")
|
| 218 |
```
|
|
|
|
| 344 |
|
| 345 |
## Citation
|
| 346 |
|
| 347 |
+
<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>
|