---
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# Drug-target interaction predictor

An interactive deep learning application for predicting drug-target interactions using a novel cross-attention architecture. This model processes RNA sequences and drug SMILES representations to predict binding affinity scores (pKd values) with interpretability features.

## Features

- 🔮 **Binding affinity prediction**: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
- 📊 **Interactive visualizations**: Generate cross-attention heatmaps and contribution analysis plots
- 🧬 **RNA-drug interaction analysis**: Understand how different tokens contribute to binding predictions
- ⚙️ **Model management**: Load and configure different model checkpoints
- 🎯 **Interpretability tools**: Visualize attention weights and token-level contributions
- 📈 **Performance metrics**: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)

## How to use

### 1. Prediction tab

- **Load model**: The model loads automatically on startup (if available in the current directory)
- **Enter inputs**:
  - Target RNA sequence (nucleotides: A, U, G, C)
  - Drug SMILES string (molecular representation)
- **Get results**: Click "Predict Interaction" to receive binding affinity prediction (pKd value)

### 2. Visualizations tab

- **Generate analysis**: Use the same inputs to create detailed visualizations
- **Cross-attention heatmap**: Shows interaction patterns between drug and target tokens
- **Raw pKd contribution**: Displays signed contributions from each target token (only when pKd > 0)
- **Normalized pKd contribution**: Shows normalized contributions for all predictions

### 3. Model settings tab

- **Custom models**: Load your own trained models by specifying the model directory path
- **Status monitoring**: Check model loading status and configuration

## Model architecture

The model combines state-of-the-art language models with cross-attention mechanisms:

- **Target encoder**: RNA-BERTa model for processing RNA sequences
- **Drug encoder**: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
- **Cross-attention**: Single-head attention mechanism (384-dimensional embeddings)
- **Regression head**: Learnable weighted sum with scaling and bias parameters
- **Interpretability**: Built-in interpretation mode for attention analysis

## Performance on ROBIN test datasets

Evaluated on external ROBIN test datasets [2] across different RNA classes:

| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
|-----------|-----------|-------------|---------|-------|----------|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |

## Example usage

Try these example inputs to see the model in action:

**Example 1:**

- **Target**: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
- **Drug**: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`

**Example 2:**

- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`

## Input format requirements

- **Target sequence**:
  - RNA sequences using nucleotides A, U, G, C
  - Maximum length: 512 tokens
  - Automatically truncated/padded as needed
- **Drug SMILES**:
  - Standard SMILES notation for molecular structures
  - Maximum length: 512 tokens
  - Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (Ibuprofen)

## Technical specifications

- **Model size**: RNA-BERTa + ChemBERTa-77M-MTR backbone
- **Attention heads**: 1 (single-head cross-attention)
- **Embedding dimension**: 384 for cross-attention layer
- **Maximum sequence length**: 512 tokens for both inputs
- **Output range**: Continuous pKd values (can be negative)
- **Scaling**: Built-in StdScaler for target value normalization

## Visualization features

### Cross-attention heatmap

- Displays attention weights between drug and target tokens
- Helps identify which molecular features interact with specific RNA regions
- Color intensity represents attention strength

### Contribution analysis

- **Unnormalized contributions**: Signed values showing positive/negative token impacts
- **Normalized contributions**: Non-negative values showing relative token importance (only for pKd > 0)
- Token-level breakdown of final prediction components

## Limitations & considerations

- **RNA class variation**: Performance differs across RNA classes (miRNA shows lower precision)
- **Novel sequences**: May not generalize well to completely unseen RNA families or chemical scaffolds
- **Sequence length**: Limited to 512 tokens (longer sequences are truncated)
- **SMILES limitations**: May not capture all 3D molecular properties
- **Single attention head**: May limit capacity for complex interaction patterns

## Scientific applications

This tool can be used for:

- Drug discovery and design
- RNA-targeted therapeutics research
- Molecular interaction analysis
- Binding affinity prediction
- Structure-activity relationship studies
- Lead compound optimization

## Technical support

For technical issues or questions:

- Check model loading status in the Model Settings tab
- Ensure input sequences are properly formatted
- Verify SMILES notation validity
- Review example inputs for correct format

## Data sources

The model leverages:

- **RNA-BERTa**: Pre-trained on diverse RNA sequences
- **ChemBERTa-77M-MTR**: Trained on molecular property prediction tasks [1]
- **ROBIN Datasets**: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the [model repository](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).

### Citations

[1]
```bibtex
@article{ahmad2022chemberta,
  title={Chemberta-2: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}
```

[2]
```bibtex
@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}
```
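
### Checking inputs before prediction

The input format requirements and the technical support checklist earlier in this document ask for RNA sequences over A, U, G, C and well-formed SMILES strings. The following is a minimal sketch of how such inputs could be pre-checked before pasting them into the app. The helper names `validate_rna` and `check_smiles_syntax` are hypothetical and are not part of the application; the strict upper-case check is an assumption, and the SMILES check only looks at whitespace and bracket balance, so it does not establish chemical validity.

```python
# Hypothetical pre-check helpers; not taken from the application's code.

RNA_ALPHABET = set("AUGC")  # nucleotides listed in the input format requirements


def validate_rna(sequence: str) -> str:
    """Return the sequence with surrounding whitespace removed, or raise ValueError.

    Assumption: only upper-case A, U, G, C are accepted.
    """
    cleaned = sequence.strip()
    if not cleaned:
        raise ValueError("RNA sequence is empty")
    invalid = sorted(set(cleaned) - RNA_ALPHABET)
    if invalid:
        raise ValueError(f"Invalid nucleotide characters: {', '.join(invalid)}")
    return cleaned


def check_smiles_syntax(smiles: str) -> str:
    """Run a shallow syntax check on a SMILES string, or raise ValueError.

    Only checks for emptiness, embedded whitespace, and balanced
    parentheses and square brackets. It does not parse the molecule.
    """
    cleaned = smiles.strip()
    if not cleaned:
        raise ValueError("SMILES string is empty")
    if any(ch.isspace() for ch in cleaned):
        raise ValueError("SMILES string contains whitespace")
    closing_to_opening = {")": "(", "]": "["}
    stack = []
    for ch in cleaned:
        if ch in "([":
            stack.append(ch)
        elif ch in closing_to_opening:
            # A closing bracket must match the most recent opening bracket.
            if not stack or stack.pop() != closing_to_opening[ch]:
                raise ValueError("Unbalanced brackets in SMILES string")
    if stack:
        raise ValueError("Unclosed brackets in SMILES string")
    return cleaned
```

Both helpers return the cleaned string on success, so they can be applied to the example inputs (for instance the Example 1 target and drug) before submitting them. A cheminformatics toolkit would be needed for a real validity check of the SMILES string.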