DINOv3 VITB16 CoreML FP32

CoreML conversion of facebook/dinov3-vitb16-pretrain-lvd1689m optimized for Apple Silicon.

Model Details

  • Base Model: facebook/dinov3-vitb16-pretrain-lvd1689m
  • Framework: CoreML
  • Precision: FP32
  • Input Size: 448×448
  • Model Size: 327.3 MB

Usage

Python (CoreML)

import coremltools as ct
from PIL import Image

# Load model (running predictions through coremltools requires macOS)
model = ct.models.MLModel("dinov3_vitb16_448x448_fp32.mlpackage")

# Prepare image: force RGB so grayscale or RGBA files are handled, then resize to 448×448
image = Image.open("image.jpg").convert("RGB").resize((448, 448))

# Extract features
output = model.predict({"image": image})
features = output["features"]  # Shape: [1, embed_dim, grid_size, grid_size]
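
Downstream code often wants one vector per patch rather than a channel-first map. The following is a minimal sketch, assuming `features` has the shape noted above; `to_patch_tokens` is a hypothetical helper, and the values 768 (ViT-B embedding width) and 28 (448 / 16 patches per side) are assumed rather than read from the model. A random array stands in for the model output so the snippet runs without the model file.

```python
import numpy as np

def to_patch_tokens(features):
    """Reshape [1, C, H, W] features into L2-normalized [H*W, C] patch tokens."""
    feats = np.asarray(features, dtype=np.float32)
    _, c, h, w = feats.shape
    tokens = feats[0].reshape(c, h * w).T  # one row per patch, row-major over the grid
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    return tokens / np.maximum(norms, 1e-12)  # guard against all-zero patches

# Stand-in for output["features"]; 768 and 28 are assumed values for ViT-B/16 at 448x448.
dummy = np.random.default_rng(0).standard_normal((1, 768, 28, 28)).astype(np.float32)
tokens = to_patch_tokens(dummy)
print(tokens.shape)
```

Normalizing each row means a dot product between two tokens is directly their cosine similarity, which is convenient for retrieval or patch matching.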

Swift/iOS

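import CoreML
import UIKit

// Load model (modelURL must point to the compiled model, e.g. an .mlmodelc in the app bundle)
guard let model = try? MLModel(contentsOf: modelURL) else {
    fatalError("Failed to load model")
}

// Prepare image
guard let cgImage = UIImage(named: "image.jpg")?.cgImage else {
    fatalError("Failed to load image")
}

// Look up the size and pixel format the "image" input expects
guard let constraint = model.modelDescription.inputDescriptionsByName["image"]?.imageConstraint else {
    fatalError("Model has no image input named \"image\"")
}

// Extract features (call from a throwing context)
let input = try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
let provider = try MLDictionaryFeatureProvider(dictionary: ["image": input])
let output = try model.prediction(from: provider)
let features = output.featureValue(for: "features")?.multiArrayValue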

Performance

Performance metrics on Apple Silicon:

CoreML Performance

  • Throughput: 10.92 FPS
  • Latency: 91.55 ± 2.77 ms
  • Min Latency: 84.81 ms
  • Max Latency: 96.56 ms
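
The reported throughput is consistent with the mean latency (1000 / 91.55 ≈ 10.92 FPS). The source does not say how the numbers were measured; the sketch below is one plausible way to collect the same statistics. The `benchmark` helper and its `warmup` and `runs` defaults are assumptions, and a trivial workload stands in for the model call.

```python
import statistics
import time

def benchmark(fn, warmup=5, runs=50):
    """Time fn() and return latency statistics in milliseconds plus throughput in FPS."""
    for _ in range(warmup):  # untimed calls so one-time setup cost is excluded
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples)
    return {
        "mean_ms": mean_ms,
        "std_ms": statistics.stdev(samples),  # needs runs >= 2
        "min_ms": min(samples),
        "max_ms": max(samples),
        "fps": 1000.0 / mean_ms,
    }

# Stand-in workload; on macOS, replace with: lambda: model.predict({"image": image})
stats = benchmark(lambda: sum(range(10_000)), warmup=2, runs=10)
print(stats)
```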

Speedup vs PyTorch

  • PyTorch: 7.65 FPS
  • CoreML: 10.92 FPS
  • Speedup: 1.43x faster ⚡

Feature Accuracy

  • Cosine Similarity: 0.9846 (vs PyTorch)
  • Correlation: 0.9846
  • Quality: ⭐⭐⭐ Very Good - Excellent similarity
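
Both metrics can be computed from the CoreML and PyTorch feature maps for the same input. The source does not state whether they were taken over the flattened map or averaged per patch; this sketch assumes the flattened form, and the function names are hypothetical. Synthetic arrays stand in for the two outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two arrays, treated as flat vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation, i.e. cosine similarity after subtracting each mean."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins for the PyTorch (reference) and CoreML (converted) features.
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 8, 4, 4))
converted = reference + 0.1 * rng.standard_normal(reference.shape)
print(cosine_similarity(reference, converted), pearson_correlation(reference, converted))
```

When the features have a mean close to zero, the two values nearly coincide, which would be consistent with both being reported as 0.9846.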

Model Specifications

  • Precision: FP32
  • Input Size: 448×448
  • Model Size: 327.3 MB

License

This model is released under the DINOv3 License. See LICENSE.md for details.

Citation

@article{dinov3,
  title={DINOv3: A Versatile Vision Foundation Model},
  author={Meta AI Research},
  journal={arXiv preprint arXiv:2508.10104},
  year={2025}
}

Reference: DINOv3 Paper

Key contributions:

  • Gram anchoring strategy for high-quality dense feature maps
  • Self-supervised learning on 1.689B images
  • Superior performance on dense vision tasks
  • Versatile across tasks and domains without fine-tuning

Demo Images

Input Image

Demo Input Image
Sample input image for feature extraction demonstration

Feature Visualization

Feature Comparison Visualization

The visualization shows:

  • PCA projection of high-dimensional features (RGB visualization)
  • Feature channel activations showing spatial patterns
  • Gram matrix analysis for object similarity detection
  • Side-by-side comparison with PyTorch reference implementation

The visualization indicates that the CoreML conversion preserves the semantic structure of the original DINOv3 features, consistent with the 0.9846 cosine similarity reported under Feature Accuracy.
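
The source does not describe how the visualization was produced. As a rough sketch of two of the listed views, the code below projects patch features onto their top three principal components for an RGB image and builds a patch-to-patch similarity matrix. The helper names are hypothetical, treating the "Gram matrix" as cosine similarity between patches is an assumption, and a small random array stands in for real features.

```python
import numpy as np

def pca_rgb(features):
    """Map [1, C, H, W] features to an [H, W, 3] image in [0, 1] via the top 3 principal components."""
    _, c, h, w = features.shape
    tokens = features[0].reshape(c, h * w).T.astype(np.float64)
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)  # rescale each component to [0, 1]
    return rgb.reshape(h, w, 3)

def gram_matrix(features):
    """Cosine similarity between every pair of patches, shape [H*W, H*W]."""
    _, c, h, w = features.shape
    tokens = features[0].reshape(c, h * w).T.astype(np.float64)
    tokens /= np.maximum(np.linalg.norm(tokens, axis=1, keepdims=True), 1e-12)
    return tokens @ tokens.T

# Small random stand-in; real features would be [1, embed_dim, grid_size, grid_size].
feats = np.random.default_rng(0).standard_normal((1, 16, 6, 6))
rgb = pca_rgb(feats)
gram = gram_matrix(feats)
print(rgb.shape, gram.shape)
```

Running both functions on the CoreML and PyTorch outputs for the same image gives a side-by-side comparison of the kind listed above.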

Powered By DINOv3

🌟 This model is powered by DINOv3 🌟

Converted by Aegis AI for optimized Apple Silicon deployment.

Last updated: 2025-11-03