Commit 18009c4 (parent bd6c122): Update README.md
### Model Sources

- **Repository:** [GanjinZero/biobart-v2-base]
- **Paper:** [BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model]
- **Demo:** [hamzamalik11/radiology_summarizer]

## Uses
## Training Details

### Training Data

- Data Source: a custom dataset of 70,000 radiology reports.
- Data Cleaning: the reports were cleaned to remove personal and confidential information, then tokenized and normalized.
- Data Split: the data was split into a training set of 63,000 reports and a validation set of 7,000 reports.
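The 63,000 / 7,000 split above is a plain 90/10 partition. A minimal sketch, using placeholder strings since the actual dataset is private:

```python
# Hypothetical sketch of the 90/10 train/validation split described above;
# the real data is a private collection of 70,000 radiology reports.
reports = [f"report {i}" for i in range(70_000)]  # placeholder records

split_point = int(len(reports) * 0.9)  # 90% train, 10% validation
train_set, val_set = reports[:split_point], reports[split_point:]

print(len(train_set), len(val_set))  # 63000 7000
```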
#### Training Hyperparameters

- **Training regime:**
  - `evaluation_strategy="epoch"`
  - `learning_rate=5.6e-5`
  - `per_device_train_batch_size=batch_size // 4`
  - `per_device_eval_batch_size=batch_size // 4`
  - `weight_decay=0.01`
  - `save_total_limit=3`
  - `num_train_epochs=num_train_epochs`
  - `predict_with_generate=True`
  - `logging_steps=logging_steps`
  - `push_to_hub=False`
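These keyword arguments match the signature of `transformers.Seq2SeqTrainingArguments`. A sketch of how they might be assembled — the card leaves `batch_size`, `num_train_epochs`, and `logging_steps` symbolic, so the values below are illustrative assumptions, not the actual settings:

```python
# Assumed values: the card does not state these, so they are placeholders.
batch_size = 16
num_train_epochs = 3
logging_steps = 500

# Keyword arguments as listed in the card; in practice they would be
# passed as transformers.Seq2SeqTrainingArguments(**training_kwargs).
training_kwargs = dict(
    evaluation_strategy="epoch",
    learning_rate=5.6e-5,
    per_device_train_batch_size=batch_size // 4,  # 16 // 4 = 4 per device
    per_device_eval_batch_size=batch_size // 4,
    weight_decay=0.01,
    save_total_limit=3,          # keep at most 3 checkpoints on disk
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,  # generate summaries during evaluation
    logging_steps=logging_steps,
    push_to_hub=False,
)
```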
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
The testing data consisted of 10,000 radiology reports.

#### Factors

The following factors were evaluated:

- ROUGE-1
- ROUGE-2
- ROUGE-L
- ROUGE-Lsum

#### Metrics

The following metrics were used to evaluate the model:

- ROUGE-1 score: 44.857
- ROUGE-2 score: 29.015
- ROUGE-L score: 42.032
- ROUGE-Lsum score: 42.038
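For context, ROUGE-1 measures unigram overlap between a generated summary and its reference. A minimal self-contained sketch of the idea — the reported scores were presumably computed with a full ROUGE implementation, which additionally handles stemming and the ROUGE-2/L/Lsum variants:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example in the style of the reports summarized by this model.
score = rouge1_f1("heart size normal", "heart size is normal")
print(round(score * 100, 3))  # → 85.714
```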
### Results
## Model Card Authors

- Name: Engr. Hamza Iqbal Malik
- LinkedIn: [www.linkedin.com/in/hamza-iqbal-malik-42366a239](https://www.linkedin.com/in/hamza-iqbal-malik-42366a239)
- GitHub: [https://github.com/hamza4344](https://github.com/hamza4344)

## Model Card Contact

- Name: Engr. Hamza Iqbal Malik
- LinkedIn: [www.linkedin.com/in/hamza-iqbal-malik-42366a239](https://www.linkedin.com/in/hamza-iqbal-malik-42366a239)
- GitHub: [https://github.com/hamza4344](https://github.com/hamza4344)