MolecularDiffusion: Pre-trained Models and Datasets
Welcome to the repository for pre-trained models and datasets accompanying the MolecularDiffusion framework.
MolecularDiffusion is a unified generative AI framework designed to streamline the entire lifecycle of 3D molecular diffusion models, from efficient training to seamless deployment in data-driven computational chemistry pipelines.
Find more details in our paper:
Models
This repository hosts several pre-trained 3D molecular diffusion models described in our paper.
- Pre-trained General Model: A diffusion model pre-trained on our compiled dataset of 3D molecules (Compiled 3D Molecules, listed under Training Datasets).
- GEOM-Trained Models: Diffusion models trained on the GEOM dataset, which may differ in training methodology or model variant as described in the paper.
Datasets
We provide the datasets used for training our models, as well as novel datasets generated by our models.
Training Datasets
- QM9: Small organic molecules.
- FORMED: Synthesizable molecules from the Cambridge Structural Database (CSD).
- Compiled 3D Molecules: Our custom-compiled dataset used for pre-training, combining GEOM, QMugs, COMPAS1, COMPAS3, FORMED, and OSCAR.
- IFLP Dataset: A dataset of IFLPs derived from the CoRE MOF 2019 database.
Generated Datasets
These datasets were generated using the MolecularDiffusion models:
- Asymmetric Cp Dataset: A generated dataset focusing on asymmetric cyclopentadienyl ligands.
- Target IFLP Dataset: Generated IFLPs with the desired geometric features for the catalytic hydrogenation of CO$_2$.
- Singlet Fission Candidates: A curated dataset of generated candidate molecules for singlet fission applications.