---
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: updated_app.py
pinned: false
license: mit
---

# Drug-Target Interaction Predictor

An interactive deep learning application for predicting drug-target interactions with a novel cross-attention architecture. The model takes an RNA target sequence and a drug SMILES string and predicts a binding affinity score (pKd), with built-in interpretability features.

## Features

- 🔮 **Binding Affinity Prediction**: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
- 📊 **Interactive Visualizations**: Generate cross-attention heatmaps and contribution analysis plots
- 🧬 **RNA-Drug Interaction Analysis**: Understand how different tokens contribute to binding predictions
- ⚙️ **Model Management**: Load and configure different model checkpoints
- 🎯 **Interpretability Tools**: Visualize attention weights and token-level contributions
- 📈 **Performance Metrics**: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)

## How to Use

### 1. Prediction Tab
- **Load Model**: The model loads automatically on startup (if available in the current directory)
- **Enter Inputs**: 
  - Target RNA sequence (nucleotides: A, U, G, C)
  - Drug SMILES string (molecular representation)
- **Get Results**: Click "Predict Interaction" to receive a binding affinity prediction (pKd value)

### 2. Visualizations Tab
- **Generate Analysis**: Use the same inputs to create detailed visualizations
- **Cross-Attention Heatmap**: Shows interaction patterns between drug and target tokens
- **Raw pKd Contribution**: Displays signed contributions from each target token (only when pKd > 0)
- **Normalized pKd Contribution**: Shows normalized contributions for all predictions

### 3. Model Settings Tab
- **Custom Models**: Load your own trained models by specifying the model directory path
- **Status Monitoring**: Check model loading status and configuration

## Model Architecture

The model combines two pretrained transformer encoders with a single cross-attention layer:

- **Target Encoder**: RNA-BERTa model for processing RNA sequences
- **Drug Encoder**: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
- **Cross-Attention**: Single-head attention mechanism (384-dimensional embeddings)
- **Regression Head**: Learnable weighted sum with scaling and bias parameters
- **Interpretability**: Built-in interpretation mode for attention analysis
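To make the cross-attention step concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention, `softmax(Q K^T / sqrt(d)) V`. It assumes drug tokens act as queries and target tokens as keys and values, and it omits the learned query/key/value projections that would map each encoder's outputs into the 384-dimensional attention space. The function names are illustrative, not the app's API. The returned weight matrix has one row per drug token and one column per target token, which is the kind of matrix a cross-attention heatmap displays.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention (illustrative sketch).

    queries: one row per query token (assumed here: drug tokens)
    keys, values: one row per key token (assumed here: target tokens)
    Returns (output, weights); weights[i][j] is how strongly query i attends to key j.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    output = matmul(weights, values)
    return output, weights


# Toy example: 2 drug tokens and 3 target tokens with 2-dimensional embeddings.
drug = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = cross_attention(drug, target, target)
```

Each row of `attn` is a probability distribution over target tokens, so rows sum to 1; a real implementation would use a tensor library instead of nested lists.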

## Performance on ROBIN Test Datasets

Evaluated on external ROBIN test datasets [2] across different RNA classes:

| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
|-----------|-----------|-------------|---------|-------|----------|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
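The columns follow the standard binary-classification definitions, and F1 is the harmonic mean of precision and recall. The sketch below (hypothetical helper names) writes these out and checks that the reported F1 values are consistent with the reported precision and recall. Because the table entries are rounded to three decimals, agreement within about 0.001 is the best one can expect.

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of predicted binders that are true binders.
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    # Fraction of true binders that are predicted as binders (sensitivity).
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    # Fraction of true non-binders that are predicted as non-binders.
    return tn / (tn + fp)


def f1_from(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Aptamers": (0.648, 1.000, 0.787),
    "Riboswitch": (0.519, 0.972, 0.677),
    "Viral RNA": (0.562, 0.943, 0.704),
    "miRNA": (0.373, 0.991, 0.542),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from(p, r):.4f}, reported = {f1}")
```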

## Example Usage

Try these example inputs to see the model in action:

**Example 1:**
- **Target**: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
- **Drug**: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`

**Example 2:**
- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`

## Input Format Requirements

- **Target Sequence**: 
  - RNA sequences using nucleotides A, U, G, C
  - Maximum length: 512 tokens
  - Automatically truncated/padded as needed

- **Drug SMILES**: 
  - Standard SMILES notation for molecular structures
  - Maximum length: 512 tokens
  - Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (Ibuprofen)
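A minimal sketch of the kind of input checks described above, assuming whitespace should be stripped and lowercase letters accepted. `validate_rna` and `truncate_or_pad` are hypothetical helpers, not functions from the app, and `pad_id=0` is a placeholder since the real padding ID depends on each tokenizer. With a Hugging Face tokenizer, the same truncation and padding is usually requested with `tokenizer(text, truncation=True, padding="max_length", max_length=512)`.

```python
import re
from typing import List, Sequence, Tuple

RNA_PATTERN = re.compile(r"[AUGC]+")


def validate_rna(seq: str) -> str:
    """Normalize and validate an RNA target sequence (hypothetical helper)."""
    cleaned = "".join(seq.split()).upper()  # drop whitespace/newlines, uppercase
    if not RNA_PATTERN.fullmatch(cleaned):
        raise ValueError("Target must contain only the nucleotides A, U, G, C")
    return cleaned


def truncate_or_pad(
    token_ids: Sequence[int], max_len: int = 512, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (ids, attention_mask), both exactly max_len long.

    Tokens beyond max_len are dropped; shorter inputs are padded with pad_id,
    and the mask marks real tokens with 1 and padding with 0.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


target = validate_rna("AUGCUAGCUAGUACGUAUAUCUGCACUGC")  # Example 1 target
ids, mask = truncate_or_pad([5, 6], max_len=4)
```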

## Technical Specifications

- **Backbones**: RNA-BERTa (target encoder) + ChemBERTa-77M-MTR (drug encoder)
- **Attention Heads**: 1 (single-head cross-attention)
- **Embedding Dimension**: 384 for cross-attention layer
- **Maximum Sequence Length**: 512 tokens for both inputs
- **Output Range**: Continuous pKd values (can be negative)
- **Scaling**: Built-in StdScaler for target value normalization
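A standard scaler maps pKd values to z = (y - mean) / std for training and maps model outputs back with y = z * std + mean. The sketch below is a hypothetical stand-in for the app's StdScaler (population standard deviation assumed, toy values used for fitting). Because the inverse transform is an unbounded affine map, nothing in this sketch clamps the output to be positive, which is consistent with the note that predicted pKd values can be negative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StdScaler:
    """Standard scaler z = (y - mean) / std (hypothetical stand-in, not the app's class)."""

    mean: float = 0.0
    std: float = 1.0

    def fit(self, values: List[float]) -> "StdScaler":
        n = len(values)
        self.mean = sum(values) / n
        variance = sum((v - self.mean) ** 2 for v in values) / n  # population variance
        self.std = variance ** 0.5 or 1.0  # fall back to 1.0 if all values are equal
        return self

    def transform(self, y: float) -> float:
        # pKd -> normalized training target
        return (y - self.mean) / self.std

    def inverse_transform(self, z: float) -> float:
        # normalized model output -> pKd
        return z * self.std + self.mean


scaler = StdScaler().fit([4.0, 6.0, 8.0])  # toy pKd values, not real training data
print(scaler.inverse_transform(-5.0))  # a strongly negative output maps to a negative pKd
```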

## Visualization Features

### Cross-Attention Heatmap
- Displays attention weights between drug and target tokens
- Helps identify which molecular features interact with specific RNA regions
- Color intensity represents attention strength

### Contribution Analysis
- **Raw Contributions**: Signed values showing positive/negative token impacts (only for pKd > 0)
- **Normalized Contributions**: Non-negative values showing relative token importance
- Token-level breakdown of final prediction components
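One way a weighted-sum regression head can be decomposed into token-level contributions is sketched below. This is an assumption about the form of the head (prediction = scale * sum(w_i * s_i) + bias, with s_i a per-token score), not the model's confirmed formula, and normalizing by the sum of absolute values is likewise an illustrative choice. Under these assumptions the raw contributions are signed and sum to the prediction minus the bias, while the normalized ones are non-negative and sum to 1.

```python
from typing import List, Tuple


def token_contributions(
    scores: List[float], weights: List[float], scale: float, bias: float
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical decomposition of a weighted-sum regression head.

    Returns (prediction, raw, normalized):
      raw[i]        signed contribution of token i (scale * w_i * s_i)
      normalized[i] |raw[i]| / sum(|raw|), a non-negative relative importance
    """
    raw = [scale * w * s for w, s in zip(weights, scores)]
    prediction = sum(raw) + bias
    total = sum(abs(c) for c in raw)
    normalized = [abs(c) / total if total else 0.0 for c in raw]
    return prediction, raw, normalized


# Toy values for three target tokens; weights, scale, and bias are made up.
pred, raw, norm = token_contributions(
    scores=[1.0, -2.0, 0.5], weights=[0.5, 0.25, 1.0], scale=2.0, bias=0.1
)
```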

## Limitations & Considerations

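- **RNA Class Variation**: Performance differs across RNA types (miRNA has the lowest precision, 0.373)
- **Low Specificity**: On the ROBIN test sets, specificity ranges from 0.002 to 0.095 while recall is 0.943 or higher, so the model labels most pairs as binders and positive calls should be interpreted with caution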
- **Novel Sequences**: May not generalize well to completely unseen RNA families or chemical scaffolds
- **Sequence Length**: Limited to 512 tokens (longer sequences are truncated)
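- **SMILES Limitations**: SMILES strings encode molecular connectivity (and optionally stereochemistry) but not 3D conformations, so conformation-dependent binding effects may be missed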
- **Single Attention Head**: May limit capacity for complex interaction patterns

## Scientific Applications

This tool can be used for:
- Drug discovery and design
- RNA-targeted therapeutics research
- Molecular interaction analysis
- Binding affinity prediction
- Structure-activity relationship studies
- Lead compound optimization

## Technical Support

For technical issues or questions:
- Check model loading status in the Model Settings tab
- Ensure input sequences are properly formatted
- Verify SMILES notation validity
- Review example inputs for correct format

## Data Sources

The model leverages:
- **RNA-BERTa**: Pre-trained on diverse RNA sequences
- **ChemBERTa-77M-MTR**: Pre-trained with multi-task regression (MTR) on computed molecular properties [1]
- **ROBIN Datasets**: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the [model repository](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).

### Citations
[1] 
```bibtex
@article{ahmad2022chemberta,
  title={{ChemBERTa-2}: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}
```

[2] 
```bibtex
@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}
```