simplify the usage example in the README, point to the other rad-dino version, and update the link to the license
README.md CHANGED
````diff
@@ -1,6 +1,8 @@
 ---
-library_name: transformers
 license: other
+license_name: msrla
+license_link: https://huggingface.co/microsoft/rad-dino-maira-2/blob/main/LICENSE
+library_name: transformers
 ---
 
 # Model card for RAD-DINO
@@ -17,7 +19,7 @@ RAD-DINO-MAIRA-2 is the version of RAD-DINO used in [MAIRA-2: Grounded Radiology
 
 - **Developed by:** Microsoft Health Futures
 - **Model type:** Vision transformer
-- **License:** MSRLA
+- **License:** [MSRLA](./LICENSE)
 - **Finetuned from model:** [`dinov2-base`](https://huggingface.co/facebook/dinov2-base)
 
 ## Uses
@@ -55,71 +57,13 @@ Underlying biases of the training datasets may not be well characterized.
 
 ## Getting started
 
-Let us first write an auxiliary function to download a chest X-ray.
-
-```python
->>> import requests
->>> from PIL import Image
->>> def download_sample_image() -> Image.Image:
-...     """Download chest X-ray with CC license."""
-...     base_url = "https://upload.wikimedia.org/wikipedia/commons"
-...     image_url = f"{base_url}/2/20/Chest_X-ray_in_influenza_and_Haemophilus_influenzae.jpg"
-...     headers = {"User-Agent": "RAD-DINO"}
-...     response = requests.get(image_url, headers=headers, stream=True)
-...     return Image.open(response.raw)
-...
-```
-
-
-
-```python
->>> import torch
->>> from transformers import AutoModel
->>> from transformers import AutoImageProcessor
->>>
->>> # Download the model
->>> repo = "microsoft/rad-dino-maira-2"
->>> model = AutoModel.from_pretrained(repo)
->>>
->>> # The processor takes a PIL image, performs resizing, center-cropping, and
->>> # intensity normalization using stats from MIMIC-CXR, and returns a
->>> # dictionary with a PyTorch tensor ready for the encoder
->>> processor = AutoImageProcessor.from_pretrained(repo)
->>>
->>> # Download and preprocess a chest X-ray
->>> image = download_sample_image()
->>> image.size # (width, height)
-(2765, 2505)
->>> inputs = processor(images=image, return_tensors="pt")
->>>
->>> # Encode the image!
->>> with torch.inference_mode():
-...     outputs = model(**inputs)
->>>
->>> # Look at the CLS embeddings
->>> cls_embeddings = outputs.pooler_output
->>> cls_embeddings.shape # (batch_size, num_channels)
-torch.Size([1, 768])
-```
-
-
-We will use [`einops`](https://einops.rocks/) (install with `pip install einops`) for this.
-
-```python
->>> def reshape_patch_embeddings(flat_tokens: torch.Tensor) -> torch.Tensor:
-...     """Reshape flat list of patch tokens into a nice grid."""
-...     from einops import rearrange
-...     image_size = processor.crop_size["height"]
-...     patch_size = model.config.patch_size
-...     embeddings_size = image_size // patch_size
-...     patches_grid = rearrange(flat_tokens, "b (h w) c -> b c h w", h=embeddings_size)
-...     return patches_grid
-...
->>> flat_patch_embeddings = outputs.last_hidden_state[:, 1:] # first token is CLS
->>> reshaped_patch_embeddings = reshape_patch_embeddings(flat_patch_embeddings)
->>> reshaped_patch_embeddings.shape # (batch_size, num_channels, height, width)
-torch.Size([1, 768, 37, 37])
-```
+```
+from transformers import pipeline
+pipe = pipeline(task="image-feature-extraction", model="microsoft/rad-dino-maira-2", pool=False)
+patch_features = pipe("https://www.bhf.org.uk/-/media/images/information-support/tests/chest-x-ray/normal-chest-x-ray-620x400.jpg")
+```
+
+Refer to [RAD-DINO](https://huggingface.co/microsoft/rad-dino) for a more detailed example.
 
 ## Training details
 
````
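
A note on the simplified example: with `pool=False`, the pipeline returns one feature vector per token rather than a single pooled embedding. Below is a minimal sketch of how those patch features can be rearranged into a spatial grid, combining the new pipeline call with the reshaping logic from the removed example. The `return_tensors=True` argument and the output layout (CLS token first, then a 37 × 37 patch grid, matching the removed example's `torch.Size([1, 768, 37, 37])`) are assumptions, not something this commit specifies.

```python
# Minimal sketch (not part of the commit): reshape pool=False patch features
# into a (batch, channels, height, width) grid, as the removed example did.
from einops import rearrange  # pip install einops
from transformers import pipeline

pipe = pipeline(
    task="image-feature-extraction",
    model="microsoft/rad-dino-maira-2",
    pool=False,           # keep per-token features instead of the pooled embedding
    return_tensors=True,  # assumption: return torch tensors rather than nested lists
)

# Same sample image URL as the new README example
url = "https://www.bhf.org.uk/-/media/images/information-support/tests/chest-x-ray/normal-chest-x-ray-620x400.jpg"
features = pipe(url)  # assumed shape: (batch_size, 1 + num_patches, num_channels)

patch_tokens = features[:, 1:]  # the first token is CLS, per the removed example
grid = rearrange(patch_tokens, "b (h w) c -> b c h w", h=37)  # 37 = 518 // 14
print(grid.shape)  # expected: torch.Size([1, 768, 37, 37])
```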
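
The pooled CLS embedding that the removed example read from `outputs.pooler_output` can likewise still be computed with the encoder directly. This sketch restates the removed example with a local image; `chest_x_ray.png` is a placeholder path, not a file shipped with the model.

```python
# Sketch restating the removed example: encode one image and take the CLS embedding.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

repo = "microsoft/rad-dino-maira-2"
model = AutoModel.from_pretrained(repo)
# Per the removed example, the processor resizes, center-crops, and normalizes
# the image with intensity statistics from MIMIC-CXR
processor = AutoImageProcessor.from_pretrained(repo)

image = Image.open("chest_x_ray.png")  # placeholder path; any chest X-ray image
inputs = processor(images=image, return_tensors="pt")
with torch.inference_mode():
    outputs = model(**inputs)

cls_embeddings = outputs.pooler_output
print(cls_embeddings.shape)  # torch.Size([1, 768]) per the removed example
```

Both sketches assume the behaviour documented in the removed example still holds for this checkpoint; the [RAD-DINO](https://huggingface.co/microsoft/rad-dino) card linked from the new README remains the authoritative walkthrough.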