DINOv3 VITB16 CoreML FP32
CoreML conversion of facebook/dinov3-vitb16-pretrain-lvd1689m optimized for Apple Silicon.
Model Details
- Base Model: facebook/dinov3-vitb16-pretrain-lvd1689m
- Framework: CoreML
- Precision: FP32
- Input Size: 448×448
- Model Size: 327.3 MB
Usage
Python (CoreML)
import coremltools as ct
import numpy as np
from PIL import Image
# Load model
model = ct.models.MLModel("dinov3_vitb16_448x448_fp32.mlpackage")
# Prepare image (force RGB, then resize to the model's 448×448 input)
image = Image.open("image.jpg").convert("RGB").resize((448, 448))
# Extract features
output = model.predict({"image": image})
features = output["features"] # Shape: [1, embed_dim, grid_size, grid_size]
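The feature map returned above can be reshaped into per-patch tokens or pooled into a single image embedding. The following is a minimal sketch, assuming the output really has the channel-first layout noted in the comment ([1, embed_dim, grid_size, grid_size]). The helper names are hypothetical, and a random array stands in for the model output. The 768 channels and 28×28 grid are what a ViT-B/16 at 448×448 would be expected to produce, so check the shape of your own output before relying on them.

```python
import numpy as np

def to_patch_tokens(features):
    """Reshape [1, C, H, W] features into an [H*W, C] matrix of patch tokens."""
    feats = np.asarray(features, dtype=np.float32)
    _, channels, height, width = feats.shape
    # Row h*W + w of the result holds the C-dimensional vector at grid cell (h, w)
    return feats[0].reshape(channels, height * width).T

def global_embedding(features):
    """Mean-pool the patch tokens and L2-normalize the pooled vector."""
    pooled = to_patch_tokens(features).mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)

# Stand-in for output["features"]; shape is an assumption (see above)
features = np.random.default_rng(0).standard_normal((1, 768, 28, 28)).astype(np.float32)

tokens = to_patch_tokens(features)
embedding = global_embedding(features)
print(tokens.shape, embedding.shape)
```

Mean pooling is only one possible way to get an image-level descriptor; the patch tokens themselves are usually what dense tasks consume.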
Swift/iOS
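import CoreML
import UIKit

// Load model (modelURL is assumed to point to the compiled .mlmodelc)
guard let model = try? MLModel(contentsOf: modelURL) else {
    fatalError("Failed to load model")
}
// Prepare image
guard let cgImage = UIImage(named: "image.jpg")?.cgImage else {
    fatalError("Failed to load image")
}
// Look up the size and pixel format the "image" input expects
guard let constraint = model.modelDescription
    .inputDescriptionsByName["image"]?.imageConstraint else {
    fatalError("Model has no image input named \"image\"")
}
// Extract features
let imageValue = try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
let input = try MLDictionaryFeatureProvider(dictionary: ["image": imageValue])
let output = try model.prediction(from: input)
let features = output.featureValue(for: "features")?.multiArrayValue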
Performance
Performance metrics on Apple Silicon:
CoreML Performance
- Throughput: 10.92 FPS
- Latency: 91.55 ± 2.77 ms
- Min Latency: 84.81 ms
- Max Latency: 96.56 ms
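The benchmarking procedure behind these numbers is not described, so the following is only a sketch of how comparable statistics could be collected. The helper names, the warm-up count, and the run count are assumptions, and a trivial placeholder workload stands in for the model call. Note that the reported throughput is consistent with 1000 / 91.55 ms ≈ 10.92 FPS.

```python
import statistics
import time

def benchmark(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies_ms):
    """Mean, sample standard deviation, min, max, and throughput."""
    mean = statistics.mean(latencies_ms)
    std = statistics.stdev(latencies_ms) if len(latencies_ms) > 1 else 0.0
    return {
        "mean_ms": mean,
        "std_ms": std,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "fps": 1000.0 / mean,
    }

# Placeholder workload; swap in e.g. lambda: model.predict({"image": image})
stats = summarize(benchmark(lambda: sum(range(10000))))
print(stats)
```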
Speedup vs PyTorch
- PyTorch: 7.65 FPS
- CoreML: 10.92 FPS
- Speedup: 1.43x faster ⚡
Feature Accuracy
- Cosine Similarity: 0.9846 (vs PyTorch)
- Correlation: 0.9846
- Quality: ⭐⭐⭐ Very good; features closely match the PyTorch reference
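How the two metrics above were computed is not stated. A minimal sketch of one common approach is to flatten both feature tensors and compare them directly. The function names are hypothetical, and synthetic arrays stand in for the PyTorch and CoreML outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened feature tensors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation, i.e. cosine similarity after mean-centering."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins: a reference tensor and a slightly perturbed copy
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 768, 28, 28))
converted = reference + 0.1 * rng.standard_normal(reference.shape)

print(cosine_similarity(reference, converted))
print(pearson_correlation(reference, converted))
```

Because Pearson correlation is the cosine similarity of mean-centered vectors, the two values nearly coincide whenever the features have close to zero mean, which may be why both are reported as 0.9846.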
Model Specifications
- Precision: FP32
- Input Size: 448×448
- Model Size: 327.3 MB
License
This model is released under the DINOv3 License. See LICENSE.md for details.
Citation
@article{dinov3,
title={DINOv3: A Versatile Vision Foundation Model},
author={Meta AI Research},
journal={arXiv preprint arXiv:2508.10104},
year={2025}
}
Reference: DINOv3 Paper
Key contributions:
- Gram anchoring strategy for high-quality dense feature maps
- Self-supervised learning on 1.689B images
- Superior performance on dense vision tasks
- Versatile across tasks and domains without fine-tuning
Demo Images
Input Image
Feature Visualization
The visualization shows:
- PCA projection of high-dimensional features (RGB visualization)
- Feature channel activations showing spatial patterns
- Gram matrix analysis for object similarity detection
- Side-by-side comparison with PyTorch reference implementation
This comprehensive visualization demonstrates that CoreML conversion preserves the semantic structure and feature quality of the original DINOv3 model.
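For readers who want to produce similar views from their own outputs, here is a rough sketch of a PCA-to-RGB projection and a patch-level Gram (cosine similarity) matrix. It is not necessarily how the figure was generated. The function names are hypothetical, the [1, C, H, W] layout is assumed from the usage section, and random data stands in for real features.

```python
import numpy as np

def pca_rgb(features):
    """Project [1, C, H, W] features onto their top 3 principal
    components and rescale each to [0, 1], giving an [H, W, 3] image."""
    feats = np.asarray(features, dtype=np.float64)
    _, channels, height, width = feats.shape
    tokens = feats[0].reshape(channels, -1).T            # [H*W, C]
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T                      # [H*W, 3]
    lo = projected.min(axis=0)
    hi = projected.max(axis=0)
    scaled = (projected - lo) / (hi - lo + 1e-12)
    return scaled.reshape(height, width, 3)

def patch_gram(features):
    """Cosine-similarity matrix between all patch tokens, [H*W, H*W]."""
    feats = np.asarray(features, dtype=np.float64)
    _, channels, _, _ = feats.shape
    tokens = feats[0].reshape(channels, -1).T
    tokens = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-12)
    return tokens @ tokens.T

# Random stand-in for output["features"]
features = np.random.default_rng(0).standard_normal((1, 768, 28, 28))
rgb = pca_rgb(features)
gram = patch_gram(features)
print(rgb.shape, gram.shape)
```

The `rgb` array can be upsampled to the input resolution and shown with any image viewer; a row of `gram` reshaped to the grid gives a similarity map for one query patch.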
Powered By DINOv3
This model is powered by DINOv3.
Converted by Aegis AI for optimized Apple Silicon deployment.
Related Models
- Original PyTorch Model: facebook/dinov3-vitb16-pretrain-lvd1689m
- DINOv3 License: https://ai.meta.com/resources/models-and-libraries/dinov3-license/
Last updated: 2025-11-03