---
library_name: coreml
tags:
- vision
- feature-extraction
- dinov3
- coreml
- apple-silicon
- fp16
pipeline_tag: feature-extraction
---

# DINOv3 ViT-B/16 CoreML FP16

CoreML conversion of [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m), optimized for Apple Silicon.

## Model Details

- **Base Model**: facebook/dinov3-vitb16-pretrain-lvd1689m
- **Framework**: CoreML
- **Precision**: FP16
- **Input Size**: 560×560
- **Model Size**: 163.9 MB

## Usage

### Python (CoreML)

```python
import coremltools as ct
import numpy as np
from PIL import Image

# Load the converted model
model = ct.models.MLModel("dinov3_vitb16_560x560_fp16.mlpackage")

# Prepare the image (the model expects a 560x560 RGB input)
image = Image.open("image.jpg").convert("RGB").resize((560, 560))

# Extract features
output = model.predict({"image": image})
features = output["features"]  # Shape: [1, embed_dim, grid_size, grid_size]
```
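A common next step is turning the `[1, embed_dim, grid, grid]` feature map into per-patch embeddings or a single image-level vector. The sketch below uses a random array as a stand-in for the model's output; the concrete dimensions (768 channels, a 35×35 grid for a 560×560 input with patch size 16) are assumptions about this conversion, not values read from the model:

```python
import numpy as np

# Stand-in for the model's "features" output: [1, embed_dim, grid, grid].
# embed_dim=768 and grid=35 (560/16) are assumptions for ViT-B/16 at 560x560.
features = np.random.rand(1, 768, 35, 35).astype(np.float32)

# Flatten the spatial grid into per-patch embeddings: [num_patches, embed_dim]
patches = features[0].reshape(768, -1).T

# Global image embedding: mean-pool the patches, then L2-normalize so that
# dot products between images become cosine similarities
global_embedding = patches.mean(axis=0)
global_embedding /= np.linalg.norm(global_embedding)

print(patches.shape)           # (1225, 768)
print(global_embedding.shape)  # (768,)
```

The per-patch matrix is what you would feed into dense tasks (segmentation, correspondence), while the pooled vector suits retrieval-style comparisons.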

### Swift/iOS

```swift
import CoreML
import UIKit

// Load model
guard let model = try? MLModel(contentsOf: modelURL) else {
    fatalError("Failed to load model")
}

// Prepare image
guard let image = UIImage(named: "image.jpg"),
      let cgImage = image.cgImage else {
    fatalError("Failed to load image")
}

// Extract features: build a feature value that matches the model's
// image input constraint, wrap it in a feature provider, then predict
let constraint = model.modelDescription
    .inputDescriptionsByName["image"]!.imageConstraint!
let imageValue = try MLFeatureValue(cgImage: cgImage,
                                    constraint: constraint,
                                    options: nil)
let provider = try MLDictionaryFeatureProvider(dictionary: ["image": imageValue])
let output = try model.prediction(from: provider)
let features = output.featureValue(for: "features")?.multiArrayValue
```

## Performance

Performance metrics on Apple Silicon:

### CoreML Performance

- **Throughput**: 12.19 FPS
- **Latency**: 82.04 ± 0.91 ms
- **Min Latency**: 80.72 ms
- **Max Latency**: 85.60 ms

### Speedup vs PyTorch

- **PyTorch**: 3.91 FPS
- **CoreML**: 12.19 FPS
- **Speedup**: 3.12× faster

### Feature Accuracy

- **Cosine Similarity**: 0.4406 (vs PyTorch)
- **Correlation**: 0.4406
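For reference, metrics like these are typically computed by flattening the feature maps from both backends and comparing them element-wise. A minimal sketch, using synthetic arrays in place of the real CoreML and PyTorch outputs:

```python
import numpy as np

# Stand-ins for the flattened feature maps from the two backends; in a real
# comparison these come from model.predict(...) and the PyTorch reference.
rng = np.random.default_rng(0)
coreml_feats = rng.normal(size=4096).astype(np.float32)
pytorch_feats = coreml_feats + rng.normal(scale=0.5, size=4096).astype(np.float32)

# Cosine similarity between the flattened feature vectors
cos = float(np.dot(coreml_feats, pytorch_feats) /
            (np.linalg.norm(coreml_feats) * np.linalg.norm(pytorch_feats)))

# Pearson correlation between the per-element activations
corr = float(np.corrcoef(coreml_feats, pytorch_feats)[0, 1])

print(f"cosine={cos:.4f}, correlation={corr:.4f}")
```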

### Model Specifications

- **Precision**: FP16
- **Input Size**: 560×560
- **Model Size**: 163.9 MB
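The exact benchmark script is not included here, but latency/FPS numbers of this kind are usually gathered with a warm-up phase followed by timed runs. The sketch below uses a placeholder workload in place of `model.predict(...)`; substitute the real CoreML call to reproduce the methodology:

```python
import statistics
import time

def predict(_):
    # Placeholder workload standing in for model.predict({"image": image})
    time.sleep(0.002)

# Warm up: the first predictions include model compilation/caching overhead
for _ in range(5):
    predict(None)

# Timed runs
latencies_ms = []
for _ in range(50):
    start = time.perf_counter()
    predict(None)
    latencies_ms.append((time.perf_counter() - start) * 1000)

mean = statistics.mean(latencies_ms)
std = statistics.stdev(latencies_ms)
fps = 1000 / mean
print(f"Latency: {mean:.2f} ± {std:.2f} ms, Throughput: {fps:.2f} FPS")
```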

## License

This model is released under the DINOv3 License. See [LICENSE.md](LICENSE.md) for details.

## Citation

```bibtex
@article{dinov3,
  title={DINOv3: A Versatile Vision Foundation Model},
  author={Meta AI Research},
  journal={arXiv preprint arXiv:2508.10104},
  year={2025}
}
```

**Reference**: [DINOv3 Paper](https://arxiv.org/pdf/2508.10104)

Key contributions:

- **Gram anchoring** strategy for high-quality dense feature maps
- Self-supervised learning on 1.689B images
- Superior performance on dense vision tasks
- Versatile across tasks and domains without fine-tuning

## Demo Images

### Input Image

<div align="center">
<img src="demo_image.png" alt="Demo Input Image" width="500"/>
</div>

*Sample input image for feature extraction demonstration*

### Feature Visualization

<div align="center">
<img src="dinov3_feature_comparison.png" alt="Feature Comparison Visualization" width="800"/>
</div>

The visualization shows:

- **PCA projection** of high-dimensional features (RGB visualization)
- **Feature channel activations** showing spatial patterns
- **Gram matrix analysis** for object similarity detection
- **Side-by-side comparison** with PyTorch reference implementation

This visualization demonstrates that the CoreML conversion preserves the semantic structure and feature quality of the original DINOv3 model.
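The PCA-projection panel can be reproduced with a few lines of NumPy: project each patch embedding onto the top three principal components and map those to RGB. The sketch below uses a random feature map as a stand-in, with assumed dimensions (768 channels, 35×35 grid):

```python
import numpy as np

# Stand-in for a patch-feature map [embed_dim, grid, grid]; a real run would
# take this from the model's "features" output.
rng = np.random.default_rng(0)
embed_dim, grid = 768, 35
features = rng.normal(size=(embed_dim, grid, grid)).astype(np.float32)

# Reshape to [num_patches, embed_dim] and center the columns
patches = features.reshape(embed_dim, -1).T
patches = patches - patches.mean(axis=0)

# Project onto the top-3 principal components via SVD
_, _, vt = np.linalg.svd(patches, full_matrices=False)
pca3 = patches @ vt[:3].T  # [num_patches, 3]

# Min-max normalize each component to [0, 1] and fold back into an RGB image
pca3 = (pca3 - pca3.min(axis=0)) / (pca3.max(axis=0) - pca3.min(axis=0))
rgb = pca3.reshape(grid, grid, 3)
print(rgb.shape)  # (35, 35, 3)
```

On real features, spatially coherent colors in the resulting image indicate that semantically similar patches share similar embeddings.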

## Powered By DINOv3

**This model is powered by DINOv3**

Converted by [Aegis AI](https://github.com/Aegis-AI/Aegis-AI) for optimized Apple Silicon deployment.

## Related Models

- Original PyTorch Model: [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m)
- DINOv3 License: https://ai.meta.com/resources/models-and-libraries/dinov3-license/

---

*Last updated: 2025-11-03*