---
library_name: coreml
tags:
  - vision
  - feature-extraction
  - dinov3
  - coreml
  - apple-silicon
  - fp16
pipeline_tag: feature-extraction
---

DINOv3 ViT-S/16 CoreML FP16

CoreML conversion of facebook/dinov3-vits16-pretrain-lvd1689m optimized for Apple Silicon.

Model Details

  • Base Model: facebook/dinov3-vits16-pretrain-lvd1689m
  • Framework: CoreML
  • Precision: FP16
  • Input Size: 448×448
  • Model Size: 41.6 MB

Usage

Python (CoreML)

import coremltools as ct
from PIL import Image

# Load the converted model
model = ct.models.MLModel("dinov3_vits16_448x448_fp16.mlpackage")

# Prepare the image at the model's expected input size
image = Image.open("image.jpg").convert("RGB").resize((448, 448))

# Extract features
output = model.predict({"image": image})
features = output["features"]  # Shape: [1, embed_dim, grid_size, grid_size]
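Once the feature map is extracted, a common next step is to collapse the spatial grid into a single global descriptor for retrieval or similarity search. A minimal sketch, assuming the documented output shape [1, embed_dim, grid_size, grid_size] (for ViT-S/16 at 448×448 that would be [1, 384, 28, 28]); the random array stands in for the real `model.predict(...)["features"]` output:

```python
import numpy as np

# Stand-in for the CoreML output; in practice use output["features"]
features = np.random.rand(1, 384, 28, 28).astype(np.float32)

# Mean-pool over the spatial grid to get one global descriptor per image
global_desc = features.mean(axis=(2, 3))  # shape [1, embed_dim]

# L2-normalize so dot products between descriptors act as cosine similarities
global_desc /= np.linalg.norm(global_desc, axis=1, keepdims=True)
print(global_desc.shape)
```

Normalized descriptors like this can be compared across images with a plain dot product.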

Swift/iOS

import CoreML
import UIKit

// Load model
guard let model = try? MLModel(contentsOf: modelURL) else {
    fatalError("Failed to load model")
}

// Prepare image
guard let cgImage = UIImage(named: "image.jpg")?.cgImage else {
    fatalError("Failed to load image")
}

// Wrap the image in a feature provider keyed by the model's input name
let constraint = model.modelDescription.inputDescriptionsByName["image"]!.imageConstraint!
let imageValue = try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
let inputProvider = try MLDictionaryFeatureProvider(dictionary: ["image": imageValue])

// Extract features
let output = try model.prediction(from: inputProvider)
let features = output.featureValue(for: "features")?.multiArrayValue

Performance

Performance metrics on Apple Silicon:

CoreML Performance

  • Throughput: 32.15 FPS
  • Latency: 31.10 ± 0.53 ms
  • Min Latency: 30.57 ms
  • Max Latency: 32.79 ms
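Numbers like these can be reproduced with a simple timing loop around the predict call. A minimal sketch; `predict` here is a stand-in (it just sleeps), since the actual call would be `model.predict({"image": image})` from the usage example above:

```python
import time
import statistics

def predict(_):
    # Stand-in for model.predict({"image": image}); substitute the real CoreML call
    time.sleep(0.001)

# Warm up once so one-time setup cost is not counted
predict(None)

latencies = []
for _ in range(50):
    start = time.perf_counter()
    predict(None)
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

mean_ms = statistics.mean(latencies)
std_ms = statistics.stdev(latencies)
print(f"Latency: {mean_ms:.2f} +/- {std_ms:.2f} ms, Throughput: {1000.0 / mean_ms:.2f} FPS")
```

Throughput in FPS is simply 1000 divided by the mean per-frame latency in milliseconds.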

Speedup vs PyTorch

  • PyTorch: 19.33 FPS
  • CoreML: 32.15 FPS
  • Speedup: 1.66x faster ⚡

Feature Accuracy

  • Cosine Similarity: 0.9847 (vs PyTorch)
  • Correlation: 0.9847
  • Quality: ⭐⭐⭐ Very Good (high similarity to the PyTorch reference)
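The cosine-similarity metric above compares the flattened CoreML and PyTorch feature maps as single vectors. A minimal sketch of that comparison; the two random arrays are stand-ins for the actual CoreML and PyTorch outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    # Flatten both feature maps and compare them as single vectors
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins; in practice these come from the CoreML and PyTorch runs
ref = np.random.rand(1, 384, 28, 28).astype(np.float32)
noisy = ref + 0.05 * np.random.rand(*ref.shape).astype(np.float32)

print(f"Cosine similarity: {cosine_similarity(ref, noisy):.4f}")
```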

Model Specifications

  • Precision: FP16
  • Input Size: 448×448
  • Model Size: 41.6 MB

License

This model is released under the DINOv3 License. See LICENSE.md for details.

Citation

@article{dinov3,
  title={DINOv3: A Versatile Vision Foundation Model},
  author={Meta AI Research},
  journal={arXiv preprint arXiv:2508.10104},
  year={2025}
}

Reference: DINOv3 Paper

Key contributions:

  • Gram anchoring strategy for high-quality dense feature maps
  • Self-supervised learning on 1.689B images
  • Superior performance on dense vision tasks
  • Versatile across tasks and domains without fine-tuning
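The Gram matrices referenced above are simply pairwise similarities between patch features. A minimal sketch of computing one, assuming patch features have already been flattened to [num_patches, embed_dim] (random stand-in values here):

```python
import numpy as np

# Hypothetical patch features: [num_patches, embed_dim] (28*28 = 784 patches at 448x448)
patches = np.random.rand(784, 384).astype(np.float32)

# L2-normalize each patch so the Gram matrix entries are cosine similarities
patches /= np.linalg.norm(patches, axis=1, keepdims=True)

# Gram matrix: similarity between every pair of patches
gram = patches @ patches.T  # [num_patches, num_patches]
```

Gram anchoring constrains these pairwise similarities during training, which is what keeps the dense feature maps clean.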

Demo Images

Input Image

[Image: sample input image for the feature-extraction demonstration]

Feature Visualization

[Image: side-by-side feature comparison visualization]

The visualization shows:

  • PCA projection of high-dimensional features (RGB visualization)
  • Feature channel activations showing spatial patterns
  • Gram matrix analysis for object similarity detection
  • Side-by-side comparison with PyTorch reference implementation
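The PCA-to-RGB projection in the first bullet can be sketched as follows: project each patch feature onto the top three principal components and scale the result to [0, 1] per channel. The random array stands in for a real [embed_dim, grid, grid] feature map:

```python
import numpy as np

# Stand-in patch features: [embed_dim, grid, grid]
feats = np.random.rand(384, 28, 28).astype(np.float32)
C, H, W = feats.shape

# Flatten to [num_patches, embed_dim] and center
X = feats.reshape(C, H * W).T
X = X - X.mean(axis=0, keepdims=True)

# PCA via SVD: project each patch onto the top-3 principal components
_, _, Vt = np.linalg.svd(X, full_matrices=False)
rgb = X @ Vt[:3].T  # [num_patches, 3]

# Min-max scale each channel to [0, 1] for display as an RGB image
rng = rgb.max(axis=0) - rgb.min(axis=0)
rgb = (rgb - rgb.min(axis=0)) / (rng + 1e-8)
rgb_image = rgb.reshape(H, W, 3)
print(rgb_image.shape)
```

Each pixel of `rgb_image` then encodes the dominant directions of variation in the patch features, which is why semantically similar regions share a color.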

This comprehensive visualization demonstrates that CoreML conversion preserves the semantic structure and feature quality of the original DINOv3 model.

Powered By DINOv3

🌟 This model is powered by DINOv3 🌟

Converted by Aegis AI for optimized Apple Silicon deployment.

Last updated: 2025-10-31