Spaces:

2dogey
/

VenusFactory

Runtime error

App Files Files Community

VenusFactory / WebUI_demo.md

2dogey

Upload folder using huggingface_hub

8918ac7 verified 7 months ago

preview code

raw

history blame contribute delete

3.86 kB

	# Quick Demo Guide

	This document provides a comprehensive guide to help you quickly understand the main features of VenusFactory and perform fine-tuning, evaluation, and prediction on a demo dataset for protein solubility prediction.

	## 1. Environment Preparation

	Before starting, please ensure that you have successfully installed VenusFactory and correctly configured the corresponding environment and Python dependencies. If not yet installed, please refer to the ✈️ Requirements section in [README.md](README.md) for installation instructions.

	## 2. Launch Web Interface

	Enter the following command in the command line to launch the Web UI:

	```bash
	python src/webui.py
	```

	## 3. Training (Training Tab)

	### 3.1 Select Pre-trained Model

	Choose a suitable pre-trained model from the Protein Language Model dropdown. It is recommended to start with ESM2-8M, which has lower computational cost and is suitable for beginners.

	### 3.2 Select Dataset

	In the Dataset Configuration section, select the Demo_Solubility dataset (default option). Click the Preview Dataset button to preview the dataset content.

	### 3.3 Set Task Parameters

	- Problem Type, Number of Labels, and Metrics options will be automatically filled when selecting a Pre-defined Dataset.

	- For Batch Processing Mode, it is recommended to select Batch Token Mode to avoid uneven batch processing due to high variance in protein sequence lengths.

	- Batch Token is recommended to be set to 4000. If you encounter CUDA memory errors, you can reduce this value accordingly.

	### 3.4 Choose Training Method

	In the Training Parameters section:

	- Training Method is a key selection. This Demo dataset does not currently support the SES-Adapter method (due to lack of structural sequence information).

	- You can choose the Freeze method to only fine-tune the classification head, or use the LoRA method for efficient parameter fine-tuning.

	### 3.5 Start Training

	- Click Preview Command to preview the command line script.

	- Click Start to begin training. The Web interface will display model statistics and real-time training monitoring.

	- After training is complete, the interface will show the model's Metrics on the test set to evaluate model performance.

	## 4. Evaluation (Evaluation Tab)

	### 4.1 Select Model Path

	In the Model Path option, enter the path of the trained model (under the `ckpt` root directory). Ensure that the selected PLM and method are consistent with those used during training.

	### 4.2 Evaluation Dataset Loading Rules

	- The evaluation system will automatically load the test set of the corresponding dataset.
	- If the test set cannot be found, data will be loaded in the order of validation set → training set.
	- For custom datasets uploaded to Hugging Face:
	- If only a single CSV file is uploaded, the evaluation system will automatically load that file, regardless of naming.
	- If training, validation, and test sets are uploaded, please ensure accurate file naming.

	### 4.3 Start Evaluation

	Click Start Evaluation to begin the evaluation.

	> Example Model
	> This project provides a model demo_provided.pt that has already been trained on the Demo_Solubility dataset using the Freeze method, which can be used directly for evaluation.

	## 5. Prediction (Prediction Tab)

	### 5.1 Single Sequence Prediction

	Enter a single amino acid sequence to directly predict its solubility.

	### 5.2 Batch Prediction

	- By uploading a CSV file, you can predict the solubility of proteins in batch and download the results (in CSV format).

	## 6. Download (Download Tab)

	For detailed instructions and examples regarding the Download Tab, please refer to the Download section in the Manual Tab.