---
library_name: coreml
tags:
- vision
- feature-extraction
- dinov3
- coreml
- apple-silicon
- fp32
pipeline_tag: feature-extraction
---

# DINOv3 ViT-B/16 CoreML FP32

CoreML conversion of [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m), optimized for Apple Silicon.

## Model Details

- **Base Model**: facebook/dinov3-vitb16-pretrain-lvd1689m
- **Framework**: CoreML
- **Precision**: FP32
- **Input Size**: 560×560
- **Model Size**: 327.5 MB

## Usage

### Python (CoreML)

```python
import coremltools as ct
from PIL import Image

# Load the CoreML model package
model = ct.models.MLModel("dinov3_vitb16_560x560_fp32.mlpackage")

# Prepare the input image (the model expects 560×560 RGB)
image = Image.open("image.jpg").convert("RGB").resize((560, 560))

# Extract dense features
output = model.predict({"image": image})
features = output["features"]  # Shape: [1, embed_dim, grid_size, grid_size]
# For ViT-B/16 at 560×560: [1, 768, 35, 35] (560 / 16 = 35 patches per side)
```
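The `features` output is a grid of patch features rather than a single vector. For image-level tasks such as retrieval, a common approach is to average-pool the grid into one embedding and L2-normalize it. A minimal sketch, continuing from the `features` array above (the pooling choice is an assumption for illustration, not part of the exported model):

```python
import numpy as np

# features: [1, embed_dim, grid_h, grid_w] from the CoreML prediction above
feats = np.asarray(features, dtype=np.float32)

# Average-pool the patch grid into a single image-level embedding
embedding = feats.mean(axis=(2, 3)).squeeze(0)  # [embed_dim]

# L2-normalize so dot products between embeddings equal cosine similarity
embedding /= np.linalg.norm(embedding) + 1e-12

print(embedding.shape)  # (768,) for ViT-B/16
```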
### Swift/iOS

```swift
import CoreML
import UIKit

// Load the model (modelURL points at the model package in your bundle)
guard let model = try? MLModel(contentsOf: modelURL) else {
    fatalError("Failed to load model")
}

// Prepare the input image
guard let image = UIImage(named: "image.jpg"), let cgImage = image.cgImage else {
    fatalError("Failed to load image")
}

// Wrap the image in a feature value matching the model's input constraint,
// then build a feature provider keyed by the input name
let constraint = model.modelDescription.inputDescriptionsByName["image"]!.imageConstraint!
let imageValue = try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
let input = try MLDictionaryFeatureProvider(dictionary: ["image": imageValue])

// Extract features
let output = try model.prediction(from: input)
let features = output.featureValue(for: "features")?.multiArrayValue
```

## Performance

Performance metrics measured on Apple Silicon:

### CoreML Performance

- **Throughput**: 4.44 FPS
- **Latency**: 225.17 ± 24.39 ms
- **Min Latency**: 192.45 ms
- **Max Latency**: 320.68 ms
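Latency numbers of this kind can be reproduced with a simple timing loop around `predict`. A minimal sketch, assuming the mlpackage and a 560×560 test image as in the usage example (warm-up and iteration counts are arbitrary choices):

```python
import time

import coremltools as ct
import numpy as np
from PIL import Image

model = ct.models.MLModel("dinov3_vitb16_560x560_fp32.mlpackage")
image = Image.open("image.jpg").convert("RGB").resize((560, 560))

# Warm up so one-time compilation/caching does not skew the timings
for _ in range(5):
    model.predict({"image": image})

# Time repeated predictions
latencies = []
for _ in range(50):
    start = time.perf_counter()
    model.predict({"image": image})
    latencies.append((time.perf_counter() - start) * 1000.0)  # ms

lat = np.array(latencies)
print(f"Latency: {lat.mean():.2f} ± {lat.std():.2f} ms "
      f"(min {lat.min():.2f}, max {lat.max():.2f})")
print(f"Throughput: {1000.0 / lat.mean():.2f} FPS")
```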
### Speedup vs PyTorch

- **PyTorch**: 3.22 FPS
- **CoreML**: 4.44 FPS
- **Speedup**: 1.38× faster ⚡

### Feature Accuracy

- **Cosine Similarity**: 0.9890 (vs PyTorch)
- **Correlation**: 0.9890
- **Quality**: ⭐⭐⭐ Excellent agreement with the PyTorch reference
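These agreement metrics can be checked by comparing the CoreML output against features from the original PyTorch model on the same image. A minimal sketch, where `features` comes from the Python usage example above and `pytorch_features.npy` is a hypothetical file holding reference features saved from the PyTorch model in the same layout:

```python
import numpy as np

# Flatten both feature tensors to vectors
coreml = np.asarray(features, dtype=np.float32).ravel()
torch_ref = np.load("pytorch_features.npy").astype(np.float32).ravel()

# Cosine similarity between the two feature vectors
cosine = np.dot(coreml, torch_ref) / (
    np.linalg.norm(coreml) * np.linalg.norm(torch_ref) + 1e-12
)

# Pearson correlation across all feature values
corr = np.corrcoef(coreml, torch_ref)[0, 1]

print(f"Cosine similarity: {cosine:.4f}, correlation: {corr:.4f}")
```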
### Model Specifications

- **Precision**: FP32
- **Input Size**: 560×560
- **Model Size**: 327.5 MB

## License

This model is released under the DINOv3 License. See [LICENSE.md](LICENSE.md) for details.

## Citation

```bibtex
@article{dinov3,
  title={DINOv3: A Versatile Vision Foundation Model},
  author={Meta AI Research},
  journal={arXiv preprint arXiv:2508.10104},
  year={2025}
}
```

**Reference**: [DINOv3 Paper](https://arxiv.org/pdf/2508.10104)

Key contributions:

- **Gram anchoring** strategy for high-quality dense feature maps
- Self-supervised training on 1.689B images
- Superior performance on dense vision tasks
- Versatile across tasks and domains without fine-tuning

## Demo Images

### Input Image

*Sample input image for feature extraction demonstration*

### Feature Visualization

*Feature comparison visualization*

The visualization shows:

- **PCA projection** of high-dimensional features (RGB visualization; sketched below)
- **Feature channel activations** showing spatial patterns
- **Gram matrix analysis** for object similarity detection
- **Side-by-side comparison** with the PyTorch reference implementation
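The PCA-to-RGB projection is a standard way to inspect DINO-style patch features: project each patch embedding onto its top three principal components and map those to RGB channels. A minimal sketch, reusing the `features` array from the Python usage example above (this illustrates the technique, not the exact script used to produce the figure):

```python
import numpy as np
from PIL import Image

# features: [1, embed_dim, grid_h, grid_w] from the CoreML prediction above
feats = np.asarray(features, dtype=np.float32)[0]  # [C, H, W]
C, H, W = feats.shape
patches = feats.reshape(C, H * W).T                # [H*W, C]

# PCA via SVD: project patch embeddings onto the top 3 principal components
centered = patches - patches.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[:3].T                         # [H*W, 3]

# Scale each component to [0, 1] and render the grid as an RGB image
proj -= proj.min(axis=0, keepdims=True)
proj /= proj.max(axis=0, keepdims=True) + 1e-12
rgb = (proj.reshape(H, W, 3) * 255).astype(np.uint8)
Image.fromarray(rgb).resize((560, 560), Image.NEAREST).save("pca_features.png")
```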
This visualization demonstrates that the CoreML conversion preserves the semantic structure and feature quality of the original DINOv3 model.

## Powered By DINOv3

🌟 **This model is powered by DINOv3** 🌟

Converted by [Aegis AI](https://github.com/Aegis-AI/Aegis-AI) for optimized Apple Silicon deployment.

## Related Models

- Original PyTorch Model: [facebook/dinov3-vitb16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m)
- DINOv3 License: https://ai.meta.com/resources/models-and-libraries/dinov3-license/

---

*Last updated: 2025-11-03*