Spaces: robustness-gym/summvis


cbensimon (HF Staff) committed on Aug 19, 2021

Commit 68e5edd, 1 parent: 6124176

Add README.md configuration


Files changed (1)

README.md +21 -72


@@ -1,3 +1,13 @@
 # SummVis
 SummVis is an open-source visualization tool that supports fine-grained analysis of summarization models, data, and evaluation
@@ -95,14 +105,7 @@ is omitted for copyright reasons). The `preprocessing.py` script can be used for
 #### Deanonymize 10 examples:
 ```shell
-python preprocessing.py \
---deanonymize \
---dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \
---dataset cnn_dailymail \
---version 3.0.0 \
---split validation \
---processed_dataset_path data/10:cnn_dailymail_1000.validation \
---n_samples 10
 ```
 This will take either a few seconds or a few minutes depending on whether you've previously loaded CNN/DailyMail from
 the Datasets library.
@@ -149,48 +152,22 @@ Set the `--n_samples` argument and name the `--processed_dataset_path` output fi
 #### Example: Deanonymize 100 examples from CNN / Daily Mail:
 ```shell
-python preprocessing.py \
---deanonymize \
---dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \
---dataset cnn_dailymail \
---version 3.0.0 \
---split validation \
---processed_dataset_path data/100:cnn_dailymail_1000.validation \
---n_samples 100
 ```
 #### Example: Deanonymize all pre-loaded examples from CNN / Daily Mail (1000 examples dataset):
 ```shell
-python preprocessing.py \
---deanonymize \
---dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \
---dataset cnn_dailymail \
---version 3.0.0 \
---split validation \
---processed_dataset_path data/full:cnn_dailymail_1000.validation \
---n_samples 1000
 ```
 #### Example: Deanonymize all pre-loaded examples from CNN / Daily Mail (full dataset):
 ```shell
-python preprocessing.py \
---deanonymize \
---dataset_rg preprocessing/cnn_dailymail.validation.anonymized \
---dataset cnn_dailymail \
---version 3.0.0 \
---split validation \
---processed_dataset_path data/full:cnn_dailymail.validation
 ```
 #### Example: Deanonymize all pre-loaded examples from XSum (1000 examples dataset):
 ```shell
-python preprocessing.py \
---deanonymize \
---dataset_rg preprocessing/xsum_1000.validation.anonymized \
---dataset xsum \
---split validation \
---processed_dataset_path data/full:xsum_1000.validation \
---n_samples 1000
 ```
 ### 3. Run SummVis
@@ -244,10 +221,7 @@ You may run `preprocessing.py` to precompute all data required in the interface
 1. Run preprocessing script to generate cache file
     ```shell
-    python preprocessing.py \
-    --workflow \
-    --dataset_jsonl path/to/my_dataset.jsonl \
-    --processed_dataset_path path/to/my_cache_file
     ```
      You may wish to first try it with a subset of your data by adding the following argument: `--n_samples <number_of_samples>`.
@@ -278,20 +252,12 @@ standardized format with columns for `document` and `summary:reference`.
 ##### Example: Save CNN / Daily Mail validation split to disk as a jsonl file.
 ```shell
-python preprocessing.py \
---standardize \
---dataset cnn_dailymail \
---version 3.0.0 \
---split validation \
---save_jsonl_path preprocessing/cnn_dailymail.validation.jsonl
 ```
 ##### Example: Load custom `my_dataset.jsonl`, standardize, and save.
 ```shell
-python preprocessing.py \
---standardize \
---dataset_jsonl path/to/my_dataset.jsonl \
---save_jsonl_path preprocessing/my_dataset.jsonl
 ```
 Expected format of `my_dataset.jsonl`:
@@ -313,17 +279,7 @@ You may also generate your own predictions using this [this script](generation.p
 ##### Example: Add 6 prediction files for PEGASUS and BART to the dataset.
 ```shell
-python preprocessing.py \
---join_predictions \
---dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \
---prediction_jsonls \
-predictions/bart-cnndm.cnndm.validation.results.anonymized \
-predictions/bart-xsum.cnndm.validation.results.anonymized \
-predictions/pegasus-cnndm.cnndm.validation.results.anonymized \
-predictions/pegasus-multinews.cnndm.validation.results.anonymized \
-predictions/pegasus-newsroom.cnndm.validation.results.anonymized \
-predictions/pegasus-xsum.cnndm.validation.results.anonymized \
---save_jsonl_path preprocessing/cnn_dailymail.validation.jsonl
 ```
 #### 3. Run the preprocessing workflow and save the dataset.
@@ -333,19 +289,12 @@ and stores the processed dataset back to disk.
 ##### Example: Autorun with default settings on a few examples to try it.
 ```shell
-python preprocessing.py \
---workflow \
---dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \
---processed_dataset_path data/cnn_dailymail.validation \
---try_it
 ```
 ##### Example: Autorun with default settings on all examples.
 ```shell
-python preprocessing.py \
---workflow \
---dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \
---processed_dataset_path data/cnn_dailymail
 ```

+---
+title: Summvis
+emoji: 📚
+colorFrom: yellow
+colorTo: green
+sdk: streamlit
+app_file: app.py
+pinned: false
+---
 # SummVis
 SummVis is an open-source visualization tool that supports fine-grained analysis of summarization models, data, and evaluation
 #### Deanonymize 10 examples:
 ```shell
+python preprocessing.py \\n--deanonymize \\n--dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \\n--dataset cnn_dailymail \\n--version 3.0.0 \\n--split validation \\n--processed_dataset_path data/10:cnn_dailymail_1000.validation \\n--n_samples 10
 ```
 This will take either a few seconds or a few minutes depending on whether you've previously loaded CNN/DailyMail from
 the Datasets library.
 #### Example: Deanonymize 100 examples from CNN / Daily Mail:
 ```shell
+python preprocessing.py \\n--deanonymize \\n--dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \\n--dataset cnn_dailymail \\n--version 3.0.0 \\n--split validation \\n--processed_dataset_path data/100:cnn_dailymail_1000.validation \\n--n_samples 100
 ```
 #### Example: Deanonymize all pre-loaded examples from CNN / Daily Mail (1000 examples dataset):
 ```shell
+python preprocessing.py \\n--deanonymize \\n--dataset_rg preprocessing/cnn_dailymail_1000.validation.anonymized \\n--dataset cnn_dailymail \\n--version 3.0.0 \\n--split validation \\n--processed_dataset_path data/full:cnn_dailymail_1000.validation \\n--n_samples 1000
 ```
 #### Example: Deanonymize all pre-loaded examples from CNN / Daily Mail (full dataset):
 ```shell
+python preprocessing.py \\n--deanonymize \\n--dataset_rg preprocessing/cnn_dailymail.validation.anonymized \\n--dataset cnn_dailymail \\n--version 3.0.0 \\n--split validation \\n--processed_dataset_path data/full:cnn_dailymail.validation
 ```
 #### Example: Deanonymize all pre-loaded examples from XSum (1000 examples dataset):
 ```shell
+python preprocessing.py \\n--deanonymize \\n--dataset_rg preprocessing/xsum_1000.validation.anonymized \\n--dataset xsum \\n--split validation \\n--processed_dataset_path data/full:xsum_1000.validation \\n--n_samples 1000
 ```
 ### 3. Run SummVis
 1. Run preprocessing script to generate cache file
     ```shell
+    python preprocessing.py \\n    --workflow \\n    --dataset_jsonl path/to/my_dataset.jsonl \\n    --processed_dataset_path path/to/my_cache_file
     ```
      You may wish to first try it with a subset of your data by adding the following argument: `--n_samples <number_of_samples>`.
 ##### Example: Save CNN / Daily Mail validation split to disk as a jsonl file.
 ```shell
+python preprocessing.py \\n--standardize \\n--dataset cnn_dailymail \\n--version 3.0.0 \\n--split validation \\n--save_jsonl_path preprocessing/cnn_dailymail.validation.jsonl
 ```
 ##### Example: Load custom `my_dataset.jsonl`, standardize, and save.
 ```shell
+python preprocessing.py \\n--standardize \\n--dataset_jsonl path/to/my_dataset.jsonl \\n--save_jsonl_path preprocessing/my_dataset.jsonl
 ```
 Expected format of `my_dataset.jsonl`:
 ##### Example: Add 6 prediction files for PEGASUS and BART to the dataset.
 ```shell
+python preprocessing.py \\n--join_predictions \\n--dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \\n--prediction_jsonls \\npredictions/bart-cnndm.cnndm.validation.results.anonymized \\npredictions/bart-xsum.cnndm.validation.results.anonymized \\npredictions/pegasus-cnndm.cnndm.validation.results.anonymized \\npredictions/pegasus-multinews.cnndm.validation.results.anonymized \\npredictions/pegasus-newsroom.cnndm.validation.results.anonymized \\npredictions/pegasus-xsum.cnndm.validation.results.anonymized \\n--save_jsonl_path preprocessing/cnn_dailymail.validation.jsonl
 ```
 #### 3. Run the preprocessing workflow and save the dataset.
 ##### Example: Autorun with default settings on a few examples to try it.
 ```shell
+python preprocessing.py \\n--workflow \\n--dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \\n--processed_dataset_path data/cnn_dailymail.validation \\n--try_it
 ```
 ##### Example: Autorun with default settings on all examples.
 ```shell
+python preprocessing.py \\n--workflow \\n--dataset_jsonl preprocessing/cnn_dailymail.validation.jsonl \\n--processed_dataset_path data/cnn_dailymail
 ```
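
Note on the added lines: each `+` command in this diff appears on a single line with a literal `\\n` where the removed version had a backslash followed by a line break. Pasted into a shell as-is, that sequence does not act as a line continuation; it yields a backslash and the letter `n` glued to the next flag. Assuming the README really contains these characters (and that this is not an artifact of the diff view), the following is a minimal sketch of how the multi-line form could be recovered. The helper name `restore_continuations` is hypothetical and not part of the SummVis repository.

```python
def restore_continuations(command: str) -> str:
    """Replace each literal backslash-backslash-n with a backslash and a real newline."""
    # r"\\n" is three characters: backslash, backslash, 'n'.
    # "\\\n" is two characters: backslash, newline (a shell line continuation).
    return command.replace(r"\\n", "\\\n")


# Shortened version of one collapsed line from the diff.
collapsed = r"python preprocessing.py \\n--workflow \\n--try_it"
restored = restore_continuations(collapsed)
print(restored)
```

The replacement is purely textual, so it would also rewrite any genuine `\\n` inside an argument; none of the commands in this diff contain one.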