tue-mps/coco_panoptic_eomt_base_640_dinov3

Image Segmentation


	---
	library_name: transformers
	license: mit
	tags:
	- vision
	- image-segmentation
	- pytorch
	---
	# EoMT

	[![PyTorch](https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white)](https://pytorch.org/)

	EoMT (Encoder-only Mask Transformer) is a Vision Transformer (ViT) architecture designed for high-quality and efficient image segmentation. It was introduced in the CVPR 2025 highlight paper:
	[Your ViT is Secretly an Image Segmentation Model](https://www.tue-mps.org/eomt)
	by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

	> Key Insight: Given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation without task-specific decoders or pixel fusion modules. The same model backbone supports semantic, instance, and panoptic segmentation, differing only in post-processing 🤗
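
To make the post-processing point concrete, here is a minimal pure-Python sketch of MaskFormer-style inference on toy data. It assumes the model emits per-query class logits (with a trailing "no object" class) and per-query mask logits, which this card does not spell out. The names `semantic_map` and `panoptic_map`, the 0.5 thresholds, and the toy inputs are hypothetical and are not part of the EoMT codebase or the `transformers` API. The panoptic variant is simplified: it omits the overlap-area filtering and stuff-region merging that full implementations typically apply.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_map(class_logits, mask_logits):
    """Per pixel, pick the class with the highest summed (class prob * mask prob).

    class_logits: Q x (C + 1) nested list, last column is "no object" (assumed).
    mask_logits:  Q x H x W nested list.
    Returns an H x W nested list of class ids.
    """
    num_classes = len(class_logits[0]) - 1
    probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [
                sum(probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(len(probs)))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        result.append(row)
    return result


def panoptic_map(class_logits, mask_logits, threshold=0.5):
    """Assign each pixel to at most one confident query (simplified).

    Returns (segment_ids, segments): an H x W nested list where 0 means
    unassigned, and a dict mapping each segment id to its class id.
    """
    num_classes = len(class_logits[0]) - 1
    kept = []  # (query index, class id, class score)
    for q, row in enumerate(class_logits):
        probs = softmax(row)
        label = max(range(len(probs)), key=probs.__getitem__)
        # Drop queries predicting "no object" or with low confidence.
        if label != num_classes and probs[label] > threshold:
            kept.append((q, label, probs[label]))

    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    segment_ids = [[0] * width for _ in range(height)]
    segments = {}
    for y in range(height):
        for x in range(width):
            best, best_score = None, 0.0
            for q, label, score in kept:
                pixel_score = score * sigmoid(mask_logits[q][y][x])
                if pixel_score > best_score:
                    best, best_score = (q, label), pixel_score
            # Keep the pixel only if the winning query's own mask covers it.
            if best is not None and sigmoid(mask_logits[best[0]][y][x]) >= 0.5:
                segment_ids[y][x] = best[0] + 1
                segments[best[0] + 1] = best[1]
    return segment_ids, segments


# Toy input: 3 queries, 2 classes plus "no object", and a 2 x 2 image.
class_logits = [
    [4.0, 0.0, 0.0],  # query 0 -> class 0
    [0.0, 4.0, 0.0],  # query 1 -> class 1
    [0.0, 0.0, 4.0],  # query 2 -> "no object"
]
mask_logits = [
    [[5.0, 5.0], [-5.0, -5.0]],  # query 0 covers the top row
    [[-5.0, -5.0], [5.0, 5.0]],  # query 1 covers the bottom row
    [[5.0, 5.0], [5.0, 5.0]],    # query 2 covers everything
]

semantic = semantic_map(class_logits, mask_logits)
segment_ids, segments = panoptic_map(class_logits, mask_logits)
print(semantic)
print(segment_ids, segments)
```

Both functions consume the same two tensors. The semantic variant sums the evidence from all queries for each class, while the panoptic variant keeps queries separate so that each surviving query becomes its own segment.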

	The original implementation can be found in this [repository](https://github.com/tue-mps/eomt).

	The Hugging Face paper page is available at this [link](https://huggingface.co/papers/2503.19108).

	---

	## Citation
	If you find our work useful, please consider citing us as:
	```bibtex
	@inproceedings{kerssies2025eomt,
	author = {Kerssies, Tommie and Cavagnero, Niccolò and Hermans, Alexander and Norouzi, Narges and Averta, Giuseppe and Leibe, Bastian and Dubbelman, Gijs and de Geus, Daan},
	title = {Your ViT is Secretly an Image Segmentation Model},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year = {2025},
	}
	```