---
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# Drug-target interaction predictor

An interactive deep learning application that predicts drug-target interactions with a cross-attention architecture. The model takes an RNA sequence and a drug SMILES string as input and predicts a binding affinity score (pKd), alongside attention-based outputs that help interpret the prediction.

## Features

- 🔮 **Binding affinity prediction**: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
- 📊 **Interactive visualizations**: Generate cross-attention heatmaps and contribution analysis plots
- 🧬 **RNA-drug interaction analysis**: Understand how different tokens contribute to binding predictions
- ⚙️ **Model management**: Load and configure different model checkpoints
- 🎯 **Interpretability tools**: Visualize attention weights and token-level contributions
- 📈 **Performance metrics**: Reported for four RNA classes (aptamers, riboswitches, viral RNA, miRNA)

## How to use

### 1. Prediction tab
- **Load model**: The model loads automatically on startup (if available in the current directory)
- **Enter inputs**: 
  - Target RNA sequence (nucleotides: A, U, G, C)
  - Drug SMILES string (molecular representation)
- **Get results**: Click "Predict Interaction" to receive binding affinity prediction (pKd value)

### 2. Visualizations tab
- **Generate analysis**: Use the same inputs to create detailed visualizations
- **Cross-attention heatmap**: Shows interaction patterns between drug and target tokens
- **Raw pKd contribution**: Displays signed contributions from each target token for all predictions
- **Normalized pKd contribution**: Shows non-negative, normalized contributions (only when pKd > 0)

### 3. Model settings tab
- **Custom models**: Load your own trained models by specifying the model directory path
- **Status monitoring**: Check model loading status and configuration

## Model architecture

The model combines two pretrained language models with a cross-attention layer and a regression head:

- **Target encoder**: RNA-BERTa model for processing RNA sequences
- **Drug encoder**: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
- **Cross-attention**: Single-head attention mechanism (384-dimensional embeddings)
- **Regression head**: Learnable weighted sum with scaling and bias parameters
- **Interpretability**: Built-in interpretation mode for attention analysis
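
The bullets above can be illustrated with a small, dependency-free sketch of single-head cross-attention followed by a weighted-sum regression head. This is not the model's actual code. It assumes that target (RNA) tokens act as queries and drug tokens as keys and values, it omits the learned projection matrices, and the names `cross_attention` and `regression_head` are hypothetical. Toy 4-dimensional vectors stand in for the 384-dimensional embeddings.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections).

    Assumption: queries come from target (RNA) tokens, keys and values
    from drug (SMILES) tokens. weights[i][j] is how strongly target
    token i attends to drug token j.
    """
    dim = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights.append(softmax(scores))
    context = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(len(values[0]))]
        for row in weights
    ]
    return context, weights

def regression_head(context, feature_weights, scale, bias):
    """Weighted sum over features, then scale and bias (hypothetical form).

    Returns the prediction and one signed contribution per target token.
    """
    contributions = [
        scale * sum(w * x for w, x in zip(feature_weights, token_vec))
        for token_vec in context
    ]
    return sum(contributions) + bias, contributions

# Toy embeddings: 3 target tokens and 2 drug tokens, 4 dimensions each.
target_emb = [[0.1, 0.3, -0.2, 0.5], [0.4, -0.1, 0.2, 0.0], [0.0, 0.2, 0.1, -0.3]]
drug_emb = [[0.2, 0.1, 0.0, -0.1], [-0.3, 0.4, 0.1, 0.2]]

context, attn = cross_attention(target_emb, drug_emb, drug_emb)
prediction, per_token = regression_head(
    context, feature_weights=[0.5, -0.2, 0.1, 0.3], scale=2.0, bias=0.1
)
```

In this sketch `attn` is the matrix a heatmap would display, and `per_token` sums (together with the bias) to `prediction`, which is the sense in which a prediction can be broken down token by token.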

## Performance on ROBIN test datasets

Evaluated on external ROBIN test datasets [2] across different RNA classes:

| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
|-----------|-----------|-------------|---------|-------|----------|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
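
As a reading aid, F1 is the harmonic mean of precision and recall, so the last column can be recomputed from the second and fourth. The sketch below does that with the values copied from the table; the helper name `f1_score` is only illustrative, and small differences in the third decimal are expected because the inputs are already rounded.

```python
# (precision, recall, reported F1) copied from the table above.
reported = {
    "Aptamers": (0.648, 1.000, 0.787),
    "Riboswitch": (0.519, 0.972, 0.677),
    "Viral RNA": (0.562, 0.943, 0.704),
    "miRNA": (0.373, 0.991, 0.542),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

for rna_class, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{rna_class:<10} recomputed F1 = {f1:.3f} (reported {f1_reported:.3f})")
```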

## Example usage

Try these example inputs to see the model in action:

**Example 1:**
- **Target**: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
- **Drug**: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`

**Example 2:**
- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`

## Input format requirements

- **Target sequence**: 
  - RNA sequences using nucleotides A, U, G, C
  - Maximum length: 512 tokens
  - Automatically truncated/padded as needed

- **Drug SMILES**: 
  - Standard SMILES notation for molecular structures
  - Maximum length: 512 tokens
  - Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (Ibuprofen)
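
Inputs can be sanity-checked before they are pasted into the app. The following is a minimal sketch, not part of the application: `clean_rna_sequence` and `has_balanced_brackets` are hypothetical helpers. The bracket check catches only one class of SMILES typo and is no substitute for a cheminformatics parser, and neither helper enforces the 512-token limit, since token counts depend on the model's tokenizers.

```python
VALID_NUCLEOTIDES = set("AUGC")

def clean_rna_sequence(raw):
    """Remove whitespace, uppercase, and reject characters outside A, U, G, C."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("RNA sequence is empty")
    invalid = sorted(set(sequence) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"Unsupported characters in RNA sequence: {invalid}")
    return sequence

def has_balanced_brackets(smiles):
    """Return True if every '(' and '[' in the SMILES string is properly closed."""
    closers = {")": "(", "]": "["}
    stack = []
    for char in smiles:
        if char in "([":
            stack.append(char)
        elif char in closers:
            if not stack or stack.pop() != closers[char]:
                return False
    return not stack

rna = clean_rna_sequence("augcuagc uaguacg")
ok = has_balanced_brackets("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O")
```

A DNA-style sequence containing `T` raises a `ValueError` here, which is a reminder to convert it to `U` before use.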

## Technical specifications

- **Backbone**: RNA-BERTa + ChemBERTa-77M-MTR
- **Attention heads**: 1 (single-head cross-attention)
- **Embedding dimension**: 384 for cross-attention layer
- **Maximum sequence length**: 512 tokens for both inputs
- **Output range**: Continuous pKd values (can be negative)
- **Scaling**: Built-in StdScaler for target value normalization

## Visualization features

### Cross-attention heatmap
- Displays attention weights between drug and target tokens
- Helps identify which molecular features interact with specific RNA regions
- Color intensity represents attention strength

### Contribution analysis
- **Unnormalized contributions**: Signed values showing positive/negative token impacts
- **Normalized contributions**: Non-negative values showing relative token importance (only for pKd > 0)
- Token-level breakdown of final prediction components

## Limitations & considerations

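- **RNA class variation**: Performance differs across RNA classes (miRNA shows the lowest precision, 0.373)
- **Low specificity**: On the ROBIN test sets, specificity is below 0.10 for every class while recall is at least 0.943 and AUROC ranges from 0.571 to 0.596, so the model labels nearly all pairs as binders and separates binders from non-binders only weakly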
- **Novel sequences**: May not generalize well to completely unseen RNA families or chemical scaffolds
- **Sequence length**: Limited to 512 tokens (longer sequences are truncated)
- **SMILES limitations**: May not capture all 3D molecular properties
- **Single attention head**: May limit capacity for complex interaction patterns

## Scientific applications

This tool can be used for:
- Drug discovery and design
- RNA-targeted therapeutics research
- Molecular interaction analysis
- Binding affinity prediction
- Structure-activity relationship studies
- Lead compound optimization

## Technical support

For technical issues or questions:
- Check model loading status in the Model settings tab
- Ensure input sequences are properly formatted
- Verify SMILES notation validity
- Review example inputs for correct format

## Data sources

The model leverages:
- **RNA-BERTa**: Pre-trained on diverse RNA sequences
- **ChemBERTa-77M-MTR**: Trained on molecular property prediction tasks [1]
- **ROBIN datasets**: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the [model repository](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).

### Citations
[1] 
```bibtex
@article{ahmad2022chemberta,
  title={Chemberta-2: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}
```

[2] 
```bibtex
@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}
```