---
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: updated_app.py
pinned: false
license: mit
---

# Drug-Target Interaction Predictor

An interactive deep learning application for predicting drug-target interactions using a novel cross-attention architecture. The model takes an RNA sequence and a drug SMILES string and predicts a binding affinity score (pKd value). It also provides built-in interpretability features.

## Features

- 🔮 **Binding Affinity Prediction**: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
- 📊 **Interactive Visualizations**: Generate cross-attention heatmaps and contribution analysis plots
- 🧬 **RNA-Drug Interaction Analysis**: Understand how individual tokens contribute to binding predictions
- ⚙️ **Model Management**: Load and configure different model checkpoints
- 🎯 **Interpretability Tools**: Visualize attention weights and token-level contributions
- 📈 **Performance Metrics**: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)

## How to Use

### 1. Prediction Tab

- **Load Model**: The model loads automatically on startup (if available in the current directory)
- **Enter Inputs**:
  - Target RNA sequence (nucleotides: A, U, G, C)
  - Drug SMILES string (molecular representation)
- **Get Results**: Click "Predict Interaction" to receive a binding affinity prediction (pKd value)

### 2. Visualizations Tab

- **Generate Analysis**: Use the same inputs to create detailed visualizations
- **Cross-Attention Heatmap**: Shows interaction patterns between drug and target tokens
- **Raw pKd Contribution**: Displays signed contributions from each target token (only when pKd > 0)
- **Normalized pKd Contribution**: Shows normalized contributions for all predictions

### 3. Model Settings Tab

- **Custom Models**: Load your own trained models by specifying the model directory path
- **Status Monitoring**: Check model loading status and configuration

## Model Architecture

The model combines pretrained transformer language models with a cross-attention mechanism:

- **Target Encoder**: RNA-BERTa model for processing RNA sequences
- **Drug Encoder**: ChemBERTa-77M-MTR model [1] for processing molecular SMILES
- **Cross-Attention**: Single-head attention mechanism (384-dimensional embeddings)
- **Regression Head**: Learnable weighted sum with scaling and bias parameters
- **Interpretability**: Built-in interpretation mode for attention analysis

## Performance on ROBIN Test Datasets

Evaluated on the external ROBIN test datasets [2] across different RNA classes:

| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
|-----------|-----------|-------------|--------|-------|----------|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |

## Example Usage

Try these example inputs to see the model in action:

**Example 1:**

- **Target**: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
- **Drug**: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`

**Example 2:**

- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`

## Input Format Requirements

- **Target Sequence**:
  - RNA sequences using nucleotides A, U, G, C
  - Maximum length: 512 tokens
  - Automatically truncated/padded as needed
- **Drug SMILES**:
  - Standard SMILES notation for molecular structures
  - Maximum length: 512 tokens
  - Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (Ibuprofen)

## Technical Specifications

- **Backbones**: RNA-BERTa + ChemBERTa-77M-MTR
- **Attention Heads**: 1 (single-head cross-attention)
- **Embedding Dimension**: 384 for the cross-attention layer
- **Maximum Sequence Length**: 512 tokens
  for both inputs
- **Output Range**: Continuous pKd values (can be negative)
- **Scaling**: Built-in StdScaler for target value normalization

## Visualization Features

### Cross-Attention Heatmap

- Displays attention weights between drug and target tokens
- Helps identify which molecular features interact with specific RNA regions
- Color intensity represents attention strength

### Contribution Analysis

- **Raw Contributions**: Signed values showing positive/negative token impacts (only for pKd > 0)
- **Normalized Contributions**: Non-negative values showing relative token importance
- Token-level breakdown of the final prediction's components

## Limitations & Considerations

- **RNA Class Variation**: Performance differs across RNA types (miRNA has the lowest precision, 0.373)
- **Low Specificity**: Specificity on the ROBIN test sets ranges from 0.002 to 0.095, meaning most true negatives are predicted as positives
- **Novel Sequences**: May not generalize well to completely unseen RNA families or chemical scaffolds
- **Sequence Length**: Limited to 512 tokens (longer sequences are truncated)
- **SMILES Limitations**: SMILES encodes connectivity and stereochemistry but not 3D conformation, so some 3D molecular properties may not be captured
- **Single Attention Head**: May limit capacity for complex interaction patterns

## Scientific Applications

This tool can be used for:

- Drug discovery and design
- RNA-targeted therapeutics research
- Molecular interaction analysis
- Binding affinity prediction
- Structure-activity relationship studies
- Lead compound optimization

## Technical Support

For technical issues or questions:

- Check the model loading status in the Model Settings tab
- Ensure input sequences are properly formatted
- Verify that the SMILES notation is valid
- Review the example inputs for the correct format

## Data Sources

The model leverages:

- **RNA-BERTa**: Pre-trained on diverse RNA sequences
- **ChemBERTa-77M-MTR**: Trained on molecular property prediction tasks [1]
- **ROBIN Datasets**: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the [model repository](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).

### Citations

[1]

```bibtex
@article{ahmad2022chemberta,
  title={{ChemBERTa-2}: Towards Chemical Foundation Models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}
```

[2]

```bibtex
@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of {RNA}-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}
```
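## Architecture Sketch

The Model Architecture, Technical Specifications, and Visualization Features sections describe several components only at a high level:

- the single-head cross-attention layer
- the learnable weighted-sum regression head
- StdScaler normalization
- the raw and normalized contribution plots

The pure-Python sketch below shows one plausible way these pieces fit together. It is not the model's actual code. Several details are illustrative assumptions:

- RNA tokens act as queries and drug tokens as keys/values.
- There are no learned Q/K/V projections.
- The names `regression_head`, `proj`, `token_weights`, `scale`, `bias`, `mean`, `std`, and `normalize_contributions` are hypothetical.

All numbers are toy values, not trained parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention (learned Q/K/V projections omitted).

    Assumption: target (RNA) tokens are the queries and drug tokens the keys/values,
    so weights[i][j] is how strongly target token i attends to drug token j.
    These weights are what a cross-attention heatmap would plot.
    """
    d = len(keys[0])
    weights = [
        softmax([sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys])
        for qv in queries
    ]
    outputs = [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


def regression_head(attended: Matrix, proj: Vector, token_weights: Vector,
                    scale: float, bias: float) -> Tuple[float, Vector]:
    """Hypothetical weighted-sum head: one scalar per target token, weighted and summed.

    Returns the prediction in standardized (scaled) space plus the signed
    per-token contributions, which sum to (prediction - bias).
    """
    per_token = [sum(p * x for p, x in zip(proj, row)) for row in attended]
    contributions = [scale * w * s for w, s in zip(token_weights, per_token)]
    return sum(contributions) + bias, contributions


def inverse_std_scale(y: float, mean: float, std: float) -> float:
    """Undo standard scaling to map a standardized prediction back to pKd units."""
    return y * std + mean


def normalize_contributions(contributions: Vector) -> Vector:
    """One possible non-negative normalization (an assumption): |c_i| / sum_j |c_j|."""
    total = sum(abs(c) for c in contributions)
    return [abs(c) / total if total else 0.0 for c in contributions]


# Toy example: 3 target tokens, 2 drug tokens, 4-dim embeddings (the real layer uses 384).
target = [[0.1, 0.2, 0.0, 0.5], [0.3, -0.1, 0.4, 0.0], [0.0, 0.0, 0.2, 0.2]]
drug = [[0.5, 0.1, -0.2, 0.3], [0.0, 0.4, 0.1, -0.1]]
attended, attn = cross_attention(target, drug, drug)
scaled_pred, contrib = regression_head(attended, proj=[1.0, -0.5, 0.2, 0.3],
                                       token_weights=[0.6, 0.3, 0.1], scale=2.0, bias=0.1)
pkd = inverse_std_scale(scaled_pred, mean=5.0, std=1.5)
print(f"pKd ~ {pkd:.3f}", normalize_contributions(contrib))
```

In this sketch the prediction is a sum of per-token terms plus a bias. Each target token's signed contribution can therefore be read off directly, which is presumably the kind of breakdown the contribution plots show. The absolute-value normalization is only one way to turn signed contributions into non-negative relative importances.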