Spaces: IlPakoZ/DLRNA-BERTa


IlPakoZ committed on Aug 31

Commit 79111ac (verified) · 1 Parent(s): 7d1331e

Upload 3 files


Files changed (3)

app.py +103 -60
readme.md +191 -0
readme_spaces.md +169 -0

app.py CHANGED

@@ -18,7 +18,7 @@ from PIL import Image, ImageDraw, ImageFont
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
-def create_placeholder_image(width=400, height=300, text="No visualization available", bg_color=(0, 0, 0, 0)):
     """
     Create a transparent placeholder image with text
@@ -231,7 +231,6 @@ class DrugTargetInteractionApp:
                             logger.info(f"Target attention mask shape: {target_inputs['attention_mask'].shape}")
                             logger.info(f"Drug attention mask shape: {drug_inputs['attention_mask'].shape}")
                             cross_attention_img = plot_crossattention_weights(
                                 target_inputs["attention_mask"][0],
                                 drug_inputs["attention_mask"][0],
@@ -389,10 +388,7 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
                     lines=2
                 )
-                # Buttons side by side
-                with gr.Row():
-                    predict_btn = gr.Button("🚀 Predict Interaction", variant="primary", size="lg")
-                    visualize_btn = gr.Button("📊 Visualize Interaction", variant="secondary", size="lg")
             with gr.Column(scale=1):
                 prediction_output = gr.Textbox(
@@ -401,46 +397,6 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
                     lines=3
                 )
-        # Visualization outputs section
-        gr.HTML("<h3 style='margin-top: 30px; color: #2E86AB;'>📈 Interaction Visualizations</h3>")
-        with gr.Row():
-            with gr.Column():
-                viz_image1 = gr.Image(
-                    label="Cross-Attention Heatmap",
-                    type="pil",
-                    interactive=False,
-                    container=True,
-                    height=300,
-                    value=create_placeholder_image(text="Cross-Attention Heatmap\n(Click Visualize to generate)")
-                )
-            with gr.Column():
-                viz_image2 = gr.Image(
-                    label="Raw pKd Contribution Visualization",
-                    type="pil",
-                    interactive=False,
-                    container=True,
-                    height=300,
-                    value=create_placeholder_image(text="Raw pKd Contribution\n(Click Visualize to generate)")
-                )
-            with gr.Column():
-                viz_image3 = gr.Image(
-                    label="Normalized pKd Contribution Visualization",
-                    type="pil",
-                    interactive=False,
-                    container=True,
-                    height=300,
-                    value=create_placeholder_image(text="Normalized pKd Contribution\n(Click Visualize to generate)")
-                )
-        viz_status = gr.Textbox(
-            label="Visualization Status",
-            interactive=False,
-            lines=2
-        )
         # Example inputs
         gr.HTML("<h3 style='margin-top: 20px; color: #2E86AB;'>📚 Example Inputs:</h3>")
@@ -461,18 +417,100 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
             cache_examples=False
         )
-        # Button click events
         predict_btn.click(
             fn=predict_wrapper,
             inputs=[target_input, drug_input],
             outputs=prediction_output
         )
         visualize_btn.click(
             fn=visualize_wrapper,
-            inputs=[target_input, drug_input],
             outputs=[viz_image1, viz_image2, viz_image3, viz_status]
         )
     with gr.Tab("⚙️ Model Settings"):
         gr.HTML("<h3 style='color: #2E86AB;'>Model Configuration</h3>")
@@ -483,7 +521,7 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
             placeholder="Path to model directory"
         )
-        load_model_btn = gr.Button("🔥 Load Model", variant="secondary")
         model_status = gr.Textbox(
             label="Status",
             interactive=False,
@@ -502,13 +540,13 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         This application uses a deep learning model for predicting drug-target interactions. The model architecture includes:
-        - **Target Encoder**: Processes RNA sequences
-        - **Drug Encoder**: Processes molecular SMILES notation
         - **Cross-Attention Mechanism**: Captures interactions between drugs and targets
-        - **Regression Head**: Predicts binding affinity scores
         ### Input Requirements:
-        - **Target Sequence**: RNA sequence of the target
         - **Drug SMILES**: Simplified Molecular Input Line Entry System notation
         ### Model Features:
@@ -519,16 +557,21 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         ### Usage Tips:
         1. Load your trained model using the Model Settings tab
-        2. Enter a RNA sequence and drug SMILES
-        3. Click "Predict Interaction" to get binding affinity prediction
-        4. Click "Visualize Interaction" to see detailed interaction analysis
-        For best results, ensure your input sequences are properly formatted and within reasonable length limits.
         ### Visualization Features:
-        - **Cross-Attention Heatmap**: Shows cross-attention between drug and target tokens
-        - **Raw pKd Contribution**: Shows raw signed contributions (only when pKd > 0)
-        - **Normalized pKd Contribution**: Shows normalized non-negative contributions
         """)
 # Launch the app

 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
+def create_placeholder_image(width=600, height=400, text="No visualization available", bg_color=(0, 0, 0, 0)):
     """
     Create a transparent placeholder image with text
                             logger.info(f"Target attention mask shape: {target_inputs['attention_mask'].shape}")
                             logger.info(f"Drug attention mask shape: {drug_inputs['attention_mask'].shape}")
                             cross_attention_img = plot_crossattention_weights(
                                 target_inputs["attention_mask"][0],
                                 drug_inputs["attention_mask"][0],
                     lines=2
                 )
+                predict_btn = gr.Button("🚀 Predict Interaction", variant="primary", size="lg")
             with gr.Column(scale=1):
                 prediction_output = gr.Textbox(
                     lines=3
                 )
         # Example inputs
         gr.HTML("<h3 style='margin-top: 20px; color: #2E86AB;'>📚 Example Inputs:</h3>")
             cache_examples=False
         )
+        # Button click event
         predict_btn.click(
             fn=predict_wrapper,
             inputs=[target_input, drug_input],
             outputs=prediction_output
         )
+    with gr.Tab("📊 Visualizations"):
+        gr.HTML("""
+        <div style="text-align: center; margin-bottom: 20px;">
+            <h2 style="color: #2E86AB;">🔬 Interaction Analysis & Visualizations</h2>
+            <p style="font-size: 1.1em; color: #666;">
+                Generate detailed visualizations to understand drug-target interactions
+            </p>
+        </div>
+        """)
+        with gr.Row():
+            with gr.Column(scale=1):
+                viz_target_input = gr.Textbox(
+                    label="Target RNA Sequence",
+                    placeholder="Enter RNA sequence (e.g., AUGCUAGCUAGUACGUA...)",
+                    lines=4,
+                    max_lines=6
+                )
+                viz_drug_input = gr.Textbox(
+                    label="Drug SMILES",
+                    placeholder="Enter SMILES notation (e.g., CC(C)CC1=CC=C(C=C1)C(C)C(=O)O)",
+                    lines=2
+                )
+                visualize_btn = gr.Button("📊 Generate Visualizations", variant="primary", size="lg")
+                viz_status = gr.Textbox(
+                    label="Visualization Status",
+                    interactive=False,
+                    lines=3
+                )
+        # Visualization outputs - Large and vertically aligned
+        gr.HTML("<div style='margin-top: 30px;'></div>")
+        viz_image1 = gr.Image(
+            label="Cross-Attention Heatmap",
+            type="pil",
+            interactive=False,
+            container=True,
+            height=500,
+            value=create_placeholder_image(text="Cross-Attention Heatmap\n(Click Generate Visualizations to create)")
+        )
+        viz_image2 = gr.Image(
+            label="Raw pKd Contribution Visualization",
+            type="pil",
+            interactive=False,
+            container=True,
+            height=500,
+            value=create_placeholder_image(text="Raw pKd Contribution\n(Click Generate Visualizations to create)")
+        )
+        viz_image3 = gr.Image(
+            label="Normalized pKd Contribution Visualization",
+            type="pil",
+            interactive=False,
+            container=True,
+            height=500,
+            value=create_placeholder_image(text="Normalized pKd Contribution\n(Click Generate Visualizations to create)")
+        )
+        # Button click event for visualizations
         visualize_btn.click(
             fn=visualize_wrapper,
+            inputs=[viz_target_input, viz_drug_input],
             outputs=[viz_image1, viz_image2, viz_image3, viz_status]
         )
+        # Example inputs for visualization tab
+        gr.HTML("<h3 style='margin-top: 20px; color: #2E86AB;'>📚 Example Inputs:</h3>")
+        viz_examples = gr.Examples(
+            examples=[
+                [
+                    "AUGCUAGCUAGUACGUAUAUCUGCACUGC",
+                    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
+                ],
+                [
+                    "AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU",
+                    "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
+                ]
+            ],
+            inputs=[viz_target_input, viz_drug_input],
+            cache_examples=False
+        )
     with gr.Tab("⚙️ Model Settings"):
         gr.HTML("<h3 style='color: #2E86AB;'>Model Configuration</h3>")
             placeholder="Path to model directory"
         )
+        load_model_btn = gr.Button("📥 Load Model", variant="secondary")
         model_status = gr.Textbox(
             label="Status",
             interactive=False,
         This application uses a deep learning model for predicting drug-target interactions. The model architecture includes:
+        - **Target Encoder**: Processes RNA sequences using RNA-BERTa
+        - **Drug Encoder**: Processes molecular SMILES notation using ChemBERTa
         - **Cross-Attention Mechanism**: Captures interactions between drugs and targets
+        - **Regression Head**: Predicts binding affinity scores (pKd values)
         ### Input Requirements:
+        - **Target Sequence**: RNA sequence of the target (nucleotide sequences: A, U, G, C)
         - **Drug SMILES**: Simplified Molecular Input Line Entry System notation
         ### Model Features:
         ### Usage Tips:
         1. Load your trained model using the Model Settings tab
+        2. Enter an RNA sequence and drug SMILES in either the Prediction or Visualizations tab
+        3. Click "Predict Interaction" for binding affinity prediction only
+        4. Click "Generate Visualizations" for detailed interaction analysis with visual interpretations
+        For best results, ensure your input sequences are properly formatted and within reasonable length limits (max 512 tokens).
         ### Visualization Features:
+        - **Cross-Attention Heatmap**: Shows cross-attention weights between drug and target tokens
+        - **Raw pKd Contribution**: Shows raw signed contributions from each target token (only when pKd > 0)
+        - **Normalized pKd Contribution**: Shows normalized non-negative contributions from each target token
+        ### Performance Metrics:
+        - Trained on diverse drug-target interaction datasets
+        - Evaluated using RMSE, Pearson correlation, and Concordance Index
+        - Optimized for both predictive accuracy and interpretability
         """)
 # Launch the app
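
The help text in app.py states that targets are RNA sequences over A, U, G, C. A check along the following lines could reject malformed input before it reaches the tokenizer. This is only a sketch: `validate_rna` is a hypothetical helper that does not appear in the commit, and upper-casing and whitespace stripping are assumptions about what the tokenizer would tolerate.

```python
import re

# Hypothetical helper, not part of app.py: accepts only the four RNA
# nucleotides named in the help text (A, U, G, C).
_RNA_PATTERN = re.compile(r"[AUGC]+")


def validate_rna(sequence: str) -> str:
    """Return the cleaned, upper-cased sequence or raise ValueError."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and newlines
    if not cleaned:
        raise ValueError("RNA sequence is empty")
    if not _RNA_PATTERN.fullmatch(cleaned):
        bad = sorted(set(cleaned) - set("AUGC"))
        raise ValueError(f"Unexpected characters in RNA sequence: {bad}")
    return cleaned


print(validate_rna("augc uagc\nuagu"))  # AUGCUAGCUAGU
```

A DNA-style input such as `"ATGC"` raises `ValueError`, which a wrapper like `predict_wrapper` could turn into a status message instead of a model error.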

readme.md ADDED

	@@ -0,0 +1,191 @@

+# Drug-Target Interaction Prediction Model
+## Model Description
+This model predicts drug-target interactions using a novel cross-attention architecture that combines RNA sequence understanding with molecular representation learning. The model processes RNA target sequences and drug SMILES representations to predict binding affinity scores (pKd values).
+## Architecture
+The model consists of several key components:
+1. **Target Encoder**: RNA-BERTa model that processes RNA sequences (nucleotides A, U, G, C)
+2. **Drug Encoder**: ChemBERTa-77M-MTR model [1] that processes molecular SMILES representations
+3. **Cross-Attention Layer**: Single-head attention mechanism that models interactions between drug and target representations
+4. **Regression Head**: Predicts binding affinity scores with learnable scaling and bias parameters
+### Technical Specifications
+- **Model Size**: Combines RNA-BERTa (target encoder) + ChemBERTa-77M-MTR (drug encoder)
+- **Cross-Attention**: Single-head attention with 384-dimensional embeddings
+- **Maximum Sequence Length**: 512 tokens for both target and drug inputs
+- **Output**: Continuous binding affinity prediction (pKd values)
+- **Dropout**: Configurable attention dropout and hidden dropout for regularization
+- **Layer Normalization**: Applied for training stability
+## Performance Metrics
+Evaluated on external ROBIN test datasets [2] across different RNA classes:
+| Dataset | Precision | Specificity | Recall | AUROC | F1 Score |
+|---------|-----------|-------------|---------|-------|----------|
+| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
+| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
+| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
+| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
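
As a consistency check on the table above, the F1 column can be recomputed from the precision and recall columns using the standard harmonic-mean definition. The sketch below only re-derives numbers already in the table. It does not reproduce the evaluation, and `f1_score` is a local helper, not a function from the repository.

```python
# Precision, recall, and reported F1 copied from the table above.
reported = {
    "Aptamers": (0.648, 1.000, 0.787),
    "Riboswitch": (0.519, 0.972, 0.677),
    "Viral RNA": (0.562, 0.943, 0.704),
    "miRNA": (0.373, 0.991, 0.542),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1) in reported.items():
    print(f"{name:<10} recomputed={f1_score(p, r):.3f} reported={f1:.3f}")
```

The recomputed values agree with the reported column to within the rounding of the three-decimal inputs.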
+## Usage
+### Using the Gradio Interface
+```python
+import gradio as gr
+from updated_app import demo
+# Launch the interactive interface
+demo.launch()
+```
+### Programmatic Usage
+```python
+import torch
+from modeling_dlmberta import InteractionModelATTNForRegression, StdScaler
+from configuration_dlmberta import InteractionModelATTNConfig
+from transformers import AutoModel, RobertaModel, AutoConfig
+from chemberta import ChembertaTokenizer
+# Load model components
+config = InteractionModelATTNConfig.from_pretrained("path/to/model")
+# Load encoders
+target_encoder = AutoModel.from_pretrained("IlPakoZ/RNA-BERTa9700")
+drug_encoder_config = AutoConfig.from_pretrained("DeepChem/ChemBERTa-77M-MTR")
+drug_encoder_config.pooler = None
+drug_encoder = RobertaModel(config=drug_encoder_config, add_pooling_layer=False)
+# Load scaler (if available)
+scaler = StdScaler()
+scaler.load("path/to/model")
+# Initialize model
+model = InteractionModelATTNForRegression.from_pretrained(
+    "path/to/model",
+    config=config,
+    target_encoder=target_encoder,
+    drug_encoder=drug_encoder,
+    scaler=scaler
+)
+# Make predictions
+target_sequence = "AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU"
+drug_smiles = "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
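+# Tokenize inputs (target_tokenizer and drug_tokenizer must be created beforehand;
+# this snippet does not show how they are loaded)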
+target_inputs = target_tokenizer(target_sequence, padding="max_length", truncation=True, max_length=512, return_tensors="pt")
+drug_inputs = drug_tokenizer(drug_smiles, padding="max_length", truncation=True, max_length=512, return_tensors="pt")
+# Predict
+with torch.no_grad():
+    prediction = model(target_inputs, drug_inputs)
+    if model.scaler:
+        prediction = model.unscale(prediction)
+```
+## Model Inputs
+- **Target Sequence**: RNA sequence using nucleotides A, U, G, C (string)
+- **Drug SMILES**: Simplified Molecular Input Line Entry System notation (string)
+## Model Outputs
+- **Binding Affinity**: Predicted pKd binding affinity score (float)
+- **Attention Weights**: Cross-attention weights for interpretability analysis (when enabled)
+## Interpretability Features
+The model includes advanced interpretability capabilities:
+- **Cross-Attention Visualization**: Heatmaps showing interaction patterns between drug and target tokens
+- **Token-Level Contributions**: Visualization of individual token contributions to the final prediction
+- **Raw vs. Normalized Contributions**: Both scaled and unscaled contribution analysis
+- **Interpretation Mode**: Special mode for extracting attention weights and intermediate values
+### Enabling Interpretation Mode
+```python
+# Enable interpretation mode (evaluation only)
+model.INTERPR_ENABLE_MODE()
+# Make prediction with interpretation data
+prediction = model(target_inputs, drug_inputs)
+# Access interpretation data
+cross_attention_weights = model.model.crossattention_weights
+presum_contributions = model.model.presum_layer
+attention_scores = model.model.scores
+# Disable interpretation mode
+model.INTERPR_DISABLE_MODE()
+```
+## Training Details
+### Data Processing
+- **Scaling**: Uses StdScaler for target value normalization
+- **Tokenization**: Separate tokenizers for RNA sequences and SMILES strings
+- **Padding**: Max length padding to 512 tokens
+- **Masking**: Attention masks to handle variable-length sequences
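
The scaling step can be illustrated with a plain standard-score transform: labels are normalized for training, and predictions are mapped back to pKd units afterwards. `SimpleStdScaler` below is an illustrative stand-in written for this note. The repository's `StdScaler` and `model.unscale` may be implemented differently, and the pKd values are made up.

```python
import statistics


class SimpleStdScaler:
    """Illustrative stand-in; the repository's StdScaler may differ."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        # Fall back to 1.0 so constant labels do not cause division by zero.
        self.std = statistics.pstdev(values) or 1.0
        return self

    def scale(self, value):
        """Map a raw pKd label to a zero-mean, unit-variance target."""
        return (value - self.mean) / self.std

    def unscale(self, value):
        """Map a model output back to pKd units."""
        return value * self.std + self.mean


pkd_train = [4.2, 5.0, 6.3, 7.1]  # made-up pKd labels for illustration
scaler = SimpleStdScaler().fit(pkd_train)
z = scaler.scale(6.3)
print(z, scaler.unscale(z))
```

Because the regression head is trained on scaled targets, a raw model output is only meaningful as a pKd value after the inverse transform is applied.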
+### Architecture Details
+- **Embedding Dimension**: 384 for cross-attention layer
+- **Target Encoder Output**: 512 dimensions, mapped to 384
+- **Drug Encoder Output**: 384 dimensions (direct use)
+- **Attention Mechanism**: Single-head cross-attention with scaled dot-product
+- **Learnable Parameters**: Weighted sum with learnable scaling vector and bias
+- **Padding Handling**: Learnable padding value for masked positions
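
The single-head scaled dot-product step named above can be sketched in plain Python. This is a toy version under stated assumptions: the README does not say which sequence supplies the queries, the learned projections to 384 dimensions are omitted, and masked keys are simply given zero weight rather than the learnable padding value the model uses. `single_head_cross_attention` is a hypothetical name, not a function from `modeling_dlmberta.py`.

```python
import math


def single_head_cross_attention(queries, keys, values, key_mask):
    """Scaled dot-product attention for one head.

    queries: list of d-dim vectors from one sequence
    keys, values: lists of vectors from the other sequence
    key_mask: 1 for real tokens, 0 for padding (at least one 1 is assumed)
    Returns (outputs, weights); each weights row sums to 1.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Dot product scaled by sqrt(d); padded keys get a score of -inf.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) if m else -math.inf
            for k, m in zip(keys, key_mask)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) is 0.0
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
v = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
out, w = single_head_cross_attention(q, k, v, key_mask=[1, 1, 0])
print([round(x, 3) for x in w[0]])
```

The `weights` matrix is the kind of quantity a cross-attention heatmap would display, with one row per query token and one column per key token.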
+## Limitations
+- Precision varies across RNA classes, from 0.373 (miRNA) to 0.648 (aptamers), and specificity is below 0.1 for every class
+- May not generalize well to RNA sequences or chemical scaffolds not represented in training data
+- Computational requirements scale with sequence length (max 512 tokens)
+- Single attention head may limit capacity to capture diverse interaction patterns
+- SMILES representation may not capture all relevant molecular properties
+## Files in this Repository
+- `modeling_dlmberta.py`: Main model implementation with cross-attention architecture
+- `configuration_dlmberta.py`: Model configuration class
+- `chemberta.py`: Custom tokenizer for chemical SMILES processing
+- `updated_app.py`: Gradio application interface with visualization capabilities
+- `analysis.py`: Visualization functions for interpretability
+- `requirements.txt`: Python dependencies
+- `config.json`: Model configuration file
+## License
+This model is released under the MIT License.
+### Citations
+[1]
+```bibtex
+@article{ahmad2022chemberta,
+  title={Chemberta-2: Towards chemical foundation models},
+  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
+  journal={arXiv preprint arXiv:2209.01712},
+  year={2022}
+}
+```
+[2]
+```bibtex
+@article{krishnan2024reliable,
+  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
+  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
+  journal={Briefings in Bioinformatics},
+  volume={25},
+  number={2},
+  pages={bbae002},
+  year={2024},
+  publisher={Oxford University Press}
+}
+```

readme_spaces.md ADDED

	@@ -0,0 +1,169 @@

+---
+title: Drug-Target Interaction Predictor
+emoji: 🧬
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 4.0.0
+app_file: updated_app.py
+pinned: false
+license: mit
+---
+# Drug-Target Interaction Predictor
+An interactive deep learning application for predicting drug-target interactions using a novel cross-attention architecture. This model processes RNA sequences and drug SMILES representations to predict binding affinity scores (pKd values) with interpretability features.
+## Features
+- 🔮 **Binding Affinity Prediction**: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
+- 📊 **Interactive Visualizations**: Generate cross-attention heatmaps and contribution analysis plots
+- 🧬 **RNA-Drug Interaction Analysis**: Understand how different tokens contribute to binding predictions
+- ⚙️ **Model Management**: Load and configure different model checkpoints
+- 🎯 **Interpretability Tools**: Visualize attention weights and token-level contributions
+- 📈 **Performance Metrics**: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)
+## How to Use
+### 1. Prediction Tab
+- **Load Model**: The model loads automatically on startup (if available in the current directory)
+- **Enter Inputs**:
+  - Target RNA sequence (nucleotides: A, U, G, C)
+  - Drug SMILES string (molecular representation)
+- **Get Results**: Click "Predict Interaction" to receive binding affinity prediction (pKd value)
+### 2. Visualizations Tab
+- **Generate Analysis**: Enter the target RNA sequence and drug SMILES in this tab's own input fields, then click "Generate Visualizations"
+- **Cross-Attention Heatmap**: Shows interaction patterns between drug and target tokens
+- **Raw pKd Contribution**: Displays signed contributions from each target token (only when pKd > 0)
+- **Normalized pKd Contribution**: Shows normalized contributions for all predictions
+### 3. Model Settings Tab
+- **Custom Models**: Load your own trained models by specifying the model directory path
+- **Status Monitoring**: Check model loading status and configuration
+## Model Architecture
+The model combines two pre-trained language-model encoders with a cross-attention mechanism:
+- **Target Encoder**: RNA-BERTa model for processing RNA sequences
+- **Drug Encoder**: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
+- **Cross-Attention**: Single-head attention mechanism (384-dimensional embeddings)
+- **Regression Head**: Learnable weighted sum with scaling and bias parameters
+- **Interpretability**: Built-in interpretation mode for attention analysis
+## Performance on ROBIN Test Datasets
+Evaluated on external ROBIN test datasets [2] across different RNA classes:
+| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
+|-----------|-----------|-------------|---------|-------|----------|
+| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
+| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
+| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
+| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
+## Example Usage
+Try these example inputs to see the model in action:
+**Example 1:**
+- **Target**: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
+- **Drug**: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`
+**Example 2:**
+- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
+- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`
+## Input Format Requirements
+- **Target Sequence**:
+  - RNA sequences using nucleotides A, U, G, C
+  - Maximum length: 512 tokens
+  - Automatically truncated/padded as needed
+- **Drug SMILES**:
+  - Standard SMILES notation for molecular structures
+  - Maximum length: 512 tokens
+  - Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (Ibuprofen)
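
The "truncated/padded as needed" behaviour amounts to forcing every token list to a fixed length and recording which positions are real. The sketch below shows the idea with a hypothetical `pad_or_truncate` helper and an assumed `pad_id` of 0. In the actual pipeline, the tokenizers do this when called with `padding="max_length"` and `truncation=True`, using their own padding token.

```python
def pad_or_truncate(token_ids, max_length=512, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]   # truncate inputs that are too long
    mask = [1] * len(ids)                # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


ids, mask = pad_or_truncate([5, 6, 7], max_length=5)
print(ids, mask)  # [5, 6, 7, 0, 0] [1, 1, 1, 0, 0]
```

The attention mask is what lets the model ignore padded positions, so tokens beyond position 512 are dropped and cannot influence the prediction.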
+## Technical Specifications
+- **Model Size**: RNA-BERTa + ChemBERTa-77M-MTR backbone
+- **Attention Heads**: 1 (single-head cross-attention)
+- **Embedding Dimension**: 384 for cross-attention layer
+- **Maximum Sequence Length**: 512 tokens for both inputs
+- **Output Range**: Continuous pKd values (can be negative)
+- **Scaling**: Built-in StdScaler for target value normalization
+## Visualization Features
+### Cross-Attention Heatmap
+- Displays attention weights between drug and target tokens
+- Helps identify which molecular features interact with specific RNA regions
+- Color intensity represents attention strength
+### Contribution Analysis
+- **Raw Contributions**: Signed values showing positive/negative token impacts (only for pKd > 0)
+- **Normalized Contributions**: Non-negative values showing relative token importance
+- Token-level breakdown of final prediction components
+## Limitations & Considerations
+- **RNA Class Variation**: Precision ranges from 0.373 (miRNA) to 0.648 (aptamers) on the ROBIN test sets
+- **Novel Sequences**: May not generalize well to completely unseen RNA families or chemical scaffolds
+- **Sequence Length**: Limited to 512 tokens (longer sequences are truncated)
+- **SMILES Limitations**: May not capture all 3D molecular properties
+- **Single Attention Head**: May limit capacity for complex interaction patterns
+## Scientific Applications
+This tool can be used for:
+- Drug discovery and design
+- RNA-targeted therapeutics research
+- Molecular interaction analysis
+- Binding affinity prediction
+- Structure-activity relationship studies
+- Lead compound optimization
+## Technical Support
+For technical issues or questions:
+- Check model loading status in the Model Settings tab
+- Ensure input sequences are properly formatted
+- Verify SMILES notation validity
+- Review example inputs for correct format
+## Data Sources
+The model leverages:
+- **RNA-BERTa**: Pre-trained on diverse RNA sequences
+- **ChemBERTa-77M-MTR**: Trained on molecular property prediction tasks [1]
+- **ROBIN Datasets**: External validation across multiple RNA classes [2]
+For more detailed technical documentation, model architecture details, and programmatic usage, visit the [model repository](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).
+### Citations
+[1]
+```bibtex
+@article{ahmad2022chemberta,
+  title={Chemberta-2: Towards chemical foundation models},
+  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
+  journal={arXiv preprint arXiv:2209.01712},
+  year={2022}
+}
+```
+[2]
+```bibtex
+@article{krishnan2024reliable,
+  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
+  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
+  journal={Briefings in Bioinformatics},
+  volume={25},
+  number={2},
+  pages={bbae002},
+  year={2024},
+  publisher={Oxford University Press}
+}
+```