ChexFract: Specialized Vision-Language Models for Fracture Detection in Chest X-rays

This repository contains the pre-trained models from our paper "ChexFract: From General to Specialized - Enhancing Fracture Description Generation in Medical AI".

πŸ“‹ Overview

ChexFract models are specialized vision-language models fine-tuned for accurate fracture detection and description in chest X-ray images. These models significantly outperform general-purpose radiology report generation systems on fracture-specific tasks.

πŸ† Model Performance

Released Models

We release the two best-performing models, one for each encoder architecture:

  1. ChexFract-MAIRA-2 (Best F1-Score with MAIRA-2 encoder)

    • Configuration: Templated text + Fine-tuned encoder (unfrozen)
    • ROC-AUC: 0.713
    • F1-Score: 0.629
    • Accuracy: 0.748
    • Precision: 0.682
    • Recall: 0.584
  2. ChexFract-CheXagent (Best F1-Score with CheXagent encoder)

    • Configuration: Templated text + Fine-tuned encoder (unfrozen)
    • ROC-AUC: 0.697
    • F1-Score: 0.591
    • Accuracy: 0.752
    • Precision: 0.750
    • Recall: 0.487

πŸš€ Quick Start

Installation

pip install torch torchvision transformers pillow

Basic Usage

Using CheXagent encoder model:

import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

# Load model and processor
model = AutoModelForCausalLM.from_pretrained("AIRI-Institute/chexfract-chexagent", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("AIRI-Institute/chexfract-chexagent", trust_remote_code=True)
model.eval()

# Load chest X-ray image
image = Image.open("chest_xray.png").convert("RGB")

# Build the prompt with the image placeholder expected by the processor
messages = [{"role": "user", "content": "<|image_1|>\nDescribe bones on this chest X-ray"}]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate fracture description
inputs = processor(prompt, image, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, max_new_tokens=1024)

# Decode only the newly generated tokens
description = processor.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(f"Fracture description: {description}")
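The classification metrics above imply that each generated free-text description is mapped to a binary fracture/no-fracture label. As a minimal illustrative sketch of such post-processing (a simple keyword heuristic, NOT the paper's evaluation protocol), one might do:

```python
import re

# Hypothetical post-processing: map a generated description to a binary
# fracture label. Illustrative heuristic only, not the paper's scoring method.
FRACTURE_TERMS = re.compile(r"\bfracture[sd]?\b", re.IGNORECASE)
NEGATIONS = re.compile(r"\b(no|without|negative for|free of)\b[^.]*\bfracture", re.IGNORECASE)

def has_fracture(description: str) -> bool:
    """Return True if the text mentions a fracture without negating it."""
    if not FRACTURE_TERMS.search(description):
        return False
    return not NEGATIONS.search(description)

print(has_fracture("Acute fracture of the left fifth rib."))  # True
print(has_fracture("No acute fracture or dislocation."))      # False
```

A real pipeline would use a more robust negation detector; this only shows the shape of the text-to-label step.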

πŸ“ˆ Performance Comparison

| Model | ROC-AUC | F1-Score | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| General MAIRA-2 (baseline) | 0.518 | 0.085 | 0.645 | 0.777 | 0.045 |
| ChexFract-MAIRA-2 | 0.713 | 0.629 | 0.748 | 0.682 | 0.584 |
| General CheXagent (baseline) | 0.604 | 0.376 | 0.700 | 0.791 | 0.246 |
| ChexFract-CheXagent | 0.697 | 0.591 | 0.752 | 0.750 | 0.487 |
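The reported metrics follow the standard binary-classification definitions. As a sanity check on how they relate, the formulas can be sketched with toy confusion-matrix counts (illustrative numbers, not the paper's test set):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Toy counts for illustration only
m = classification_metrics(tp=75, fp=25, fn=50, tn=100)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.7, 'precision': 0.75, 'recall': 0.6, 'f1': 0.667}
```

Note the pattern visible in the table: the baselines have high precision but very low recall (they rarely flag fractures), while the ChexFract models trade a little precision for much higher recall, which lifts F1.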

πŸ”¬ Model Architecture

Both models share the same architecture but use different visual encoders: ChexFract-MAIRA-2 uses the Rad-DINO encoder from MAIRA-2, while ChexFract-CheXagent uses the vision encoder from CheXagent-2-3b.
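As a rough illustration of this design, the two variants can be described by a small config in which only the visual encoder changes (names below are hypothetical, not the repository's actual config keys; the language model is assumed from the base-model license list):

```python
from dataclasses import dataclass

# Hypothetical sketch: the two released models differ only in which
# visual encoder feeds the shared language model. Field names and the
# component labels are illustrative, not the actual module names.
@dataclass(frozen=True)
class ChexFractConfig:
    visual_encoder: str   # the swapped component
    language_model: str   # the shared component
    encoder_frozen: bool  # both released models fine-tune (unfreeze) the encoder

maira2_variant = ChexFractConfig(
    visual_encoder="Rad-DINO (MAIRA-2)",
    language_model="Phi-3.5 (per the base-model license list)",
    encoder_frozen=False,
)
chexagent_variant = ChexFractConfig(
    visual_encoder="CheXagent-2-3b vision encoder",
    language_model="Phi-3.5 (per the base-model license list)",
    encoder_frozen=False,
)
```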

⚠️ Limitations and Clinical Use

Important: These models are designed for research purposes. They are NOT intended for standalone diagnostic use.

πŸ“ Citation

If you use these models in your research, please cite:

@article{chexfract2025,
  title={ChexFract: From General to Specialized - Enhancing Fracture Description Generation in Medical AI},
  author={Nechaev, Nikolay and Przhezdzetskaia, Evgeniia and Umerenkov, Dmitry and Dylov, Dmitry V.},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025},
  institution={Artificial Intelligence Research Institute (AIRI)}
}

πŸ“„ License

Model License

Important: These models are derivative works based on multiple pre-trained models, so they are distributed under the most restrictive terms among the base model licenses. Users must comply with ALL applicable base model licenses.

Base Model Licenses:

⚠️ IMPORTANT - Commercial Use Restrictions:

  • CheXagent-2-3b uses CC-BY-NC-4.0, which PROHIBITS commercial use without explicit permission
  • Rad-DINO (MAIRA-2) uses MSRLA, which typically has restrictions on commercial use without permission
  • Phi-3.5 uses MIT License, which allows commercial use

The most restrictive license applies: These models are NOT licensed for commercial use due to CC-BY-NC-4.0 and MSRLA restrictions. For commercial use, you must obtain appropriate licenses from the original model owners.

Before using these models, you must:

  1. Review the license terms of all base models in their original repositories
  2. Ensure your use case complies with all applicable licenses (especially for commercial purposes)
  3. Include appropriate attribution and copyright notices as required by each license
  4. Obtain commercial licenses if needed from model owners (Microsoft for MAIRA-2, Stanford for CheXagent)

Additional License Information

The fine-tuning code and modifications specific to this work may be subject to additional licensing terms. Please review all applicable licenses before commercial use.

πŸ‘₯ Authors

  • Nikolay Nechaev - Artificial Intelligence Research Institute (AIRI)
  • Evgeniia Przhezdzetskaia - Artificial Intelligence Research Institute (AIRI)
  • Dmitry Umerenkov - Artificial Intelligence Research Institute (AIRI)
  • Dmitry V. Dylov - Artificial Intelligence Research Institute (AIRI)

πŸ”— Related Resources

  • Paper: [arXiv link]

πŸ™ Acknowledgments

We thank the contributors to the MIMIC-CXR, PadChest, BIMCV-COVID19, CheXpert, and OpenI datasets for making their data publicly available. We also acknowledge the computational resources provided for this research.

πŸ“§ Contact

For questions or issues, please contact:

  • Email: [email protected]
  • Institution: Artificial Intelligence Research Institute (AIRI), Moscow, Russia

Disclaimer: These models are provided for research purposes only. They are not intended for clinical use without proper validation and regulatory approval.

Model size: 5B parameters (Safetensors, BF16).