---
language: en
license: apache-2.0
model_name: bertsquad-10.onnx
tags:
- validated
- text
- machine_comprehension
- bert-squad
---
<!--- SPDX-License-Identifier: Apache-2.0 -->

# BERT-Squad

## Use cases
This model answers questions based on the context of the given input paragraph.

## Description
BERT (Bidirectional Encoder Representations from Transformers) applies the Transformer, a popular attention architecture, to language modelling. The Transformer has an encoder that reads the input text and a decoder that produces a prediction for the task. This model masks out some of the words in the input and then conditions each word bidirectionally to predict the masked words. BERT also learns to model relationships between sentences by predicting whether two sentences follow each other or not.

## Model

|Model |Download |Download (with sample test data)| ONNX version |Opset version| Accuracy|
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
|BERT-Squad| [416 MB](model/bertsquad-8.onnx) | [385 MB](model/bertsquad-8.tar.gz) | 1.3 | 8| |
|BERT-Squad| [416 MB](model/bertsquad-10.onnx) | [384 MB](model/bertsquad-10.tar.gz) | 1.5 | 10| |
|BERT-Squad| [416 MB](model/bertsquad-12.onnx) | [384 MB](model/bertsquad-12.tar.gz) | 1.9 | 12| 80.67171|
|BERT-Squad-int8| [119 MB](model/bertsquad-12-int8.onnx) | [101 MB](model/bertsquad-12-int8.tar.gz) | 1.9 | 12| 80.43519|
> Compared with the fp32 BERT-Squad, BERT-Squad-int8's accuracy drop ratio is 0.29% and its performance improvement is 1.81x.
>
> Note: performance depends on the test hardware.
>
> Performance data here was collected with an Intel® Xeon® Platinum 8280 Processor (1 socket, 4 cores per instance), CentOS Linux 8.3, and a data batch size of 1.

Dependencies:
* [tokenization.py](dependencies/tokenization.py)
* [run_onnx_squad.py](dependencies/run_onnx_squad.py)

## Inference
We used [ONNX Runtime](https://github.com/microsoft/onnxruntime) to perform the inference.

### Input
The input is a paragraph and questions relating to that paragraph. The model uses the WordPiece tokenization method to split the input paragraph and questions into a list of tokens that are available in its vocabulary (30,522 words). These tokens are then converted into the following features:
* input_ids: a list of numerical ids for the tokenized text
* input_mask: set to 1 for real tokens and 0 for padding tokens
* segment_ids: in our case, a list of ones
* label_ids: one-hot encoded labels for the text
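
For illustration, this is roughly what the first three features look like for a short input padded to length 8 (the token ids here are made up for the sketch, not real vocabulary indices):
```python
# Tokens:      [CLS] what   is   this [SEP]  (padding)
input_ids   = [ 101, 2054, 2003, 2023,  102, 0, 0, 0]  # illustrative WordPiece ids
input_mask  = [   1,    1,    1,    1,    1, 0, 0, 0]  # 1 = real token, 0 = padding
segment_ids = [   1,    1,    1,    1,    1, 1, 1, 1]  # all ones in this setup
```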

### Preprocessing
Write an inputs.json file that includes the context paragraph and the questions.
```python
%%writefile inputs.json
{
  "version": "1.4",
  "data": [
    {
      "paragraphs": [
        {
          "context": "In its early years, the new convention center failed to meet attendance and revenue expectations.[12] By 2002, many Silicon Valley businesses were choosing the much larger Moscone Center in San Francisco over the San Jose Convention Center due to the latter's limited space. A ballot measure to finance an expansion via a hotel tax failed to reach the required two-thirds majority to pass. In June 2005, Team San Jose built the South Hall, a $6.77 million, blue and white tent, adding 80,000 square feet (7,400 m2) of exhibit space",
          "qas": [
            {
              "question": "where are the businesses choosing to go?",
              "id": "1"
            },
            {
              "question": "how many votes did the ballot measure need?",
              "id": "2"
            },
            {
              "question": "By what year were many Silicon Valley businesses choosing the Moscone Center?",
              "id": "3"
            }
          ]
        }
      ],
      "title": "Conference Center"
    }
  ]
}
```
Get the parameters and convert the input examples into features.
```python
# preprocess input
import os

import tokenization
from run_onnx_squad import read_squad_examples, convert_examples_to_features, write_predictions

predict_file = 'inputs.json'

# Use the read_squad_examples method from run_onnx_squad to read the input file
eval_examples = read_squad_examples(input_file=predict_file)

max_seq_length = 256
doc_stride = 128
max_query_length = 64
batch_size = 1
n_best_size = 20
max_answer_length = 30

vocab_file = os.path.join('uncased_L-12_H-768_A-12', 'vocab.txt')
tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

# Use the convert_examples_to_features method from run_onnx_squad to get parameters from the input
input_ids, input_mask, segment_ids, extra_data = convert_examples_to_features(eval_examples, tokenizer,
                                                                              max_seq_length, doc_stride, max_query_length)
```
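
The postprocessing step below consumes a list of per-example results, `all_results`. As a minimal sketch of the inference step itself (not the canonical tutorial code; see BertTutorial.ipynb for the full version), assuming the tensor names produced by the TensorFlow-to-ONNX export and the `RawResult` namedtuple from run_onnx_squad:
```python
import numpy as np
import onnxruntime as ort
from run_onnx_squad import RawResult  # namedtuple: unique_id, start_logits, end_logits

session = ort.InferenceSession('bertsquad-12.onnx')

all_results = []
for idx in range(len(input_ids)):
    # The tensor names below are assumptions based on the TensorFlow export;
    # confirm them with session.get_inputs() and session.get_outputs().
    data = {
        'unique_ids_raw_output___9:0': np.array([idx], dtype=np.int64),
        'input_ids:0': input_ids[idx:idx + 1],      # batch_size = 1
        'input_mask:0': input_mask[idx:idx + 1],
        'segment_ids:0': segment_ids[idx:idx + 1],
    }
    result = session.run(['unique_ids:0', 'unstack:0', 'unstack:1'], data)
    all_results.append(RawResult(unique_id=len(all_results),
                                 start_logits=[float(x) for x in result[1][0].flat],
                                 end_logits=[float(x) for x in result[2][0].flat]))
```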

### Output
For each question about the context paragraph, the model predicts a start and an end token in the paragraph that most likely answer the question.

### Postprocessing
Write the predictions (the answers to the questions) to a file.
```python
# postprocess results
output_dir = 'predictions'
os.makedirs(output_dir, exist_ok=True)
output_prediction_file = os.path.join(output_dir, "predictions.json")
output_nbest_file = os.path.join(output_dir, "nbest_predictions.json")
write_predictions(eval_examples, extra_data, all_results,
                  n_best_size, max_answer_length,
                  True, output_prediction_file, output_nbest_file)
```
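
The answers can then be read back from the prediction file, for example:
```python
import json

with open(output_prediction_file) as f:
    predictions = json.load(f)  # maps question id -> predicted answer text
for qid, answer in predictions.items():
    print(qid, '->', answer)
```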

## Dataset (Train and Validation)
The model was trained on the [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/) dataset, which contains 100,000+ question-answer pairs on 500+ articles.

## Validation accuracy
The metric is Exact Match (EM): 80.7 for this ONNX model, computed over the SQuAD v1.1 dev data.
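
For reference, EM counts a prediction as correct only if it equals a gold answer after normalization. A sketch of the normalization used by the official SQuAD evaluation script (lowercasing, stripping punctuation and articles, collapsing whitespace):
```python
import re
import string

def normalize_answer(s: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)
```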

## Training
The model was fine-tuned on the SQuAD v1.1 dataset. See [BertTutorial.ipynb](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb) for more information on converting the model from TensorFlow to ONNX and on fine-tuning.

## Quantization
BERT-Squad-int8 is obtained by quantizing the fp32 BERT-Squad model (opset 12). We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. View the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/language_translation/onnx_model_zoo/bert-squad/quantization/ptq/readme.md) to understand how to use Intel® Neural Compressor for quantization.

### Environment
* onnx: 1.9.0
* onnxruntime: 1.8.0

### Prepare model
```shell
wget https://github.com/onnx/models/raw/main/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
```

### Model quantize
```bash
# --input_model: model path as *.onnx
bash run_tuning.sh --input_model=/path/to/model \
                   --output_model=/path/to/model_tune \
                   --dataset_location=/path/to/SQuAD/dataset \
                   --config=bert.yaml
```

## References
* **BERT** model from the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
* [BERT Tutorial](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb)
* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)

## Contributors
* [Kundana Pillari](https://github.com/kundanapillari)
* [mengniwang95](https://github.com/mengniwang95) (Intel)
* [airMeng](https://github.com/airMeng) (Intel)
* [ftian1](https://github.com/ftian1) (Intel)
* [hshen14](https://github.com/hshen14) (Intel)

## License
Apache 2.0