lsalsi committed on
Commit 27c7690 · verified · 1 Parent(s): 9ee2ab8

Upload EsmForMaskedLM

Files changed (4)
  1. README.md +199 -0
  2. config.json +36 -0
  3. esm_config.py +379 -0
  4. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "_name_or_path": "/shared/pretrained_models/overlap_multi_species_sh_gc/checkpoint-12000/",
+   "add_bias_fnn": false,
+   "architectures": [
+     "EsmForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "esm_config.EsmConfig",
+     "AutoModelForMaskedLM": "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species--modeling_esm.EsmForMaskedLM",
+     "AutoModelForSequenceClassification": "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species--modeling_esm.EsmForSequenceClassification",
+     "AutoModelForTokenClassification": "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species--modeling_esm.EsmForTokenClassification"
+   },
+   "emb_layer_norm_before": false,
+   "esmfold_config": null,
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 512,
+   "initializer_range": 0.02,
+   "intermediate_size": 2048,
+   "is_folding_model": false,
+   "layer_norm_eps": 1e-12,
+   "mask_token_id": 2,
+   "max_position_embeddings": 2050,
+   "model_type": "esm",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "rotary",
+   "tie_word_embeddings": false,
+   "token_dropout": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.49.0",
+   "use_cache": false,
+   "vocab_list": null,
+   "vocab_size": 4107
+ }
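The following is a minimal sketch of how a few values in this config relate to each other. It uses a trimmed copy of the JSON above and only the standard library. The helper `split_auto_map_entry` is hypothetical and not part of any library; it assumes the `auto_map` strings follow a `repo--module.Class` layout, where an entry without `--` points at a file in the same repository.

```python
import json

# Trimmed copy of the committed config.json; only the keys used below are kept.
CONFIG_JSON = """
{
  "auto_map": {
    "AutoConfig": "esm_config.EsmConfig",
    "AutoModelForMaskedLM": "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species--modeling_esm.EsmForMaskedLM"
  },
  "hidden_size": 512,
  "intermediate_size": 2048,
  "num_attention_heads": 16
}
"""


def split_auto_map_entry(entry):
    """Hypothetical helper: split 'repo--module.Class' into (repo, module, class).

    The repo part is None when the entry has no '--' separator.
    """
    repo, sep, ref = entry.partition("--")
    if not sep:
        repo, ref = None, entry
    module, _, cls = ref.rpartition(".")
    return repo, module, cls


config = json.loads(CONFIG_JSON)

# Each attention head works on hidden_size / num_attention_heads channels.
head_dim = config["hidden_size"] // config["num_attention_heads"]
# Ratio between the feed-forward width and the hidden width.
ffn_ratio = config["intermediate_size"] // config["hidden_size"]
print("head_dim:", head_dim, "ffn_ratio:", ffn_ratio)

for name, entry in config["auto_map"].items():
    print(name, split_auto_map_entry(entry))
```

Under that assumption, the config class comes from the `esm_config.py` file added in this commit, while the model classes are referenced from the `InstaDeepAI/nucleotide-transformer-v2-50m-multi-species` repository.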
esm_config.py ADDED
@@ -0,0 +1,379 @@
+ # coding=utf-8
+ # Copyright 2022 Meta and The HuggingFace Inc. team. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """ ESM model configuration"""
+
+ from dataclasses import asdict, dataclass
+ from typing import Optional
+
+ from transformers import PretrainedConfig, logging
+
+ logger = logging.get_logger(__name__)
+
+ # TODO Update this
+ ESM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+     "facebook/esm-1b": "https://huggingface.co/facebook/esm-1b/resolve/main/config.json",
+     # See all ESM models at https://huggingface.co/models?filter=esm
+ }
+
+
+ class EsmConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of an [`EsmModel`]. It is used to instantiate an ESM model
+     according to the specified arguments, defining the model architecture. Instantiating a configuration with the
+     defaults will yield a similar configuration to that of the ESM
+     [facebook/esm-1b](https://huggingface.co/facebook/esm-1b) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+
+     Args:
+         vocab_size (`int`, *optional*):
+             Vocabulary size of the ESM model. Defines the number of different tokens that can be represented by the
+             `input_ids` passed when calling [`EsmModel`].
+         mask_token_id (`int`, *optional*):
+             The index of the mask token in the vocabulary. This must be included in the config because of the
+             "mask-dropout" scaling trick, which will scale the inputs depending on the number of masked tokens.
+         pad_token_id (`int`, *optional*):
+             The index of the padding token in the vocabulary. This must be included in the config because certain parts
+             of the ESM code use this instead of the attention mask.
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+         hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout ratio for the attention probabilities.
+         max_position_embeddings (`int`, *optional*, defaults to 1026):
+             The maximum sequence length that this model might ever be used with. Typically set this to something large
+             just in case (e.g., 512 or 1024 or 2048).
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+             The epsilon used by the layer normalization layers.
+         position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
+             Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`, `"rotary"`.
+             For positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
+             [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
+             For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
+             with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+         is_decoder (`bool`, *optional*, defaults to `False`):
+             Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
+         use_cache (`bool`, *optional*, defaults to `True`):
+             Whether or not the model should return the last key/values attentions (not used by all models). Only
+             relevant if `config.is_decoder=True`.
+         emb_layer_norm_before (`bool`, *optional*):
+             Whether to apply layer normalization after embeddings but before the main stem of the network.
+         token_dropout (`bool`, defaults to `False`):
+             When this is enabled, masked tokens are treated as if they had been dropped out by input dropout.
+
+     Examples:
+
+     ```python
+     >>> from transformers import EsmModel, EsmConfig
+     >>> # Initializing an ESM facebook/esm-1b style configuration
+     >>> configuration = EsmConfig()
+     >>> # Initializing a model from the configuration
+     >>> model = EsmModel(configuration)
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+     model_type = "esm"
+
+     def __init__(
+         self,
+         vocab_size=None,
+         mask_token_id=None,
+         pad_token_id=None,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_dropout_prob=0.1,
+         attention_probs_dropout_prob=0.1,
+         max_position_embeddings=1026,
+         initializer_range=0.02,
+         layer_norm_eps=1e-12,
+         position_embedding_type="absolute",
+         use_cache=True,
+         emb_layer_norm_before=None,
+         token_dropout=False,
+         is_folding_model=False,
+         esmfold_config=None,
+         vocab_list=None,
+         add_bias_fnn=True,
+         **kwargs,
+     ):
+         super().__init__(
+             pad_token_id=pad_token_id, mask_token_id=mask_token_id, **kwargs
+         )
+
+         self.vocab_size = vocab_size
+         self.hidden_size = hidden_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.intermediate_size = intermediate_size
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.max_position_embeddings = max_position_embeddings
+         self.initializer_range = initializer_range
+         self.layer_norm_eps = layer_norm_eps
+         self.position_embedding_type = position_embedding_type
+         self.use_cache = use_cache
+         self.emb_layer_norm_before = emb_layer_norm_before
+         self.token_dropout = token_dropout
+         self.is_folding_model = is_folding_model
+         # Arguments needed for Dalmatian
+         self.add_bias_fnn = add_bias_fnn
+         if is_folding_model:
+             if esmfold_config is None:
+                 logger.info(
+                     "No esmfold_config supplied for folding model, using default values."
+                 )
+                 esmfold_config = EsmFoldConfig()
+             elif isinstance(esmfold_config, dict):
+                 esmfold_config = EsmFoldConfig(**esmfold_config)
+             self.esmfold_config = esmfold_config
+             if vocab_list is None:
+                 logger.warning(
+                     "No vocab_list supplied for folding model, assuming the ESM-2 vocabulary!"
+                 )
+                 self.vocab_list = get_default_vocab_list()
+             else:
+                 self.vocab_list = vocab_list
+         else:
+             self.esmfold_config = None
+             self.vocab_list = None
+         if self.esmfold_config is not None and getattr(
+             self.esmfold_config, "use_esm_attn_map", False
+         ):
+             raise ValueError(
+                 "The HuggingFace port of ESMFold does not support use_esm_attn_map at this time!"
+             )
+
+     def to_dict(self):
+         """
+         Serializes this instance to a Python dictionary. Override the default [`~PretrainedConfig.to_dict`].
+
+         Returns:
+             `Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance,
+         """
+         output = super().to_dict()
+         if isinstance(self.esmfold_config, EsmFoldConfig):
+             output["esmfold_config"] = self.esmfold_config.to_dict()
+         return output
+
+
+ @dataclass
+ class EsmFoldConfig:
+     esm_type: str = None
+     fp16_esm: bool = True
+     use_esm_attn_map: bool = False
+     esm_ablate_pairwise: bool = False
+     esm_ablate_sequence: bool = False
+     esm_input_dropout: float = 0
+
+     embed_aa: bool = True
+     bypass_lm: bool = False
+
+     lddt_head_hid_dim: int = 128
+     trunk: "TrunkConfig" = None
+
+     def __post_init__(self):
+         if self.trunk is None:
+             self.trunk = TrunkConfig()
+         elif isinstance(self.trunk, dict):
+             self.trunk = TrunkConfig(**self.trunk)
+
+     def to_dict(self):
+         """
+         Serializes this instance to a Python dictionary. Override the default [`~PretrainedConfig.to_dict`].
+
+         Returns:
+             `Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance,
+         """
+         output = asdict(self)
+         output["trunk"] = self.trunk.to_dict()
+         return output
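The `__post_init__` and `to_dict` pair above lets a nested config survive a round trip through plain dictionaries (for example, through JSON). A minimal standalone sketch of that pattern follows; `Inner` and `Outer` are hypothetical stand-ins for `TrunkConfig` and `EsmFoldConfig`, reduced to one field each.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Inner:  # hypothetical stand-in for TrunkConfig
    num_blocks: int = 48


@dataclass
class Outer:  # hypothetical stand-in for EsmFoldConfig
    lddt_head_hid_dim: int = 128
    trunk: Optional[Inner] = None

    def __post_init__(self):
        # Accept None (use defaults), a plain dict (e.g. parsed JSON), or an Inner.
        if self.trunk is None:
            self.trunk = Inner()
        elif isinstance(self.trunk, dict):
            self.trunk = Inner(**self.trunk)

    def to_dict(self):
        # asdict() recurses into nested dataclasses and returns plain dicts.
        return asdict(self)


original = Outer(trunk={"num_blocks": 4})
restored = Outer(**original.to_dict())
print(original.to_dict())
print(restored == original)
```

Because the dict form is coerced back into a dataclass on construction, the restored object compares equal to the original.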
+
+
+ @dataclass
+ class TrunkConfig:
+     num_blocks: int = 48
+     sequence_state_dim: int = 1024
+     pairwise_state_dim: int = 128
+     sequence_head_width: int = 32
+     pairwise_head_width: int = 32
+     position_bins: int = 32
+     dropout: float = 0
+     layer_drop: float = 0
+     cpu_grad_checkpoint: bool = False
+     max_recycles: int = 4
+     chunk_size: Optional[int] = 128
+     structure_module: "StructureModuleConfig" = None
+
+     def __post_init__(self):
+         if self.structure_module is None:
+             self.structure_module = StructureModuleConfig()
+         elif isinstance(self.structure_module, dict):
+             self.structure_module = StructureModuleConfig(**self.structure_module)
+
+         if self.max_recycles <= 0:
+             raise ValueError(
+                 f"`max_recycles` should be positive, got {self.max_recycles}."
+             )
+         if self.sequence_state_dim % self.sequence_head_width != 0:
+             raise ValueError(
+                 "`sequence_state_dim` should be a round multiple of `sequence_head_width`, got"
+                 f" {self.sequence_state_dim} and {self.sequence_head_width}."
+             )
+         if self.pairwise_state_dim % self.pairwise_head_width != 0:
+             raise ValueError(
+                 "`pairwise_state_dim` should be a round multiple of `pairwise_head_width`, got"
+                 f" {self.pairwise_state_dim} and {self.pairwise_head_width}."
+             )
+
+         sequence_num_heads = self.sequence_state_dim // self.sequence_head_width
+         pairwise_num_heads = self.pairwise_state_dim // self.pairwise_head_width
+
+         if self.sequence_state_dim != sequence_num_heads * self.sequence_head_width:
+             raise ValueError(
+                 "`sequence_state_dim` should be equal to `sequence_num_heads * sequence_head_width`, got"
+                 f" {self.sequence_state_dim} != {sequence_num_heads} * {self.sequence_head_width}."
+             )
+         if self.pairwise_state_dim != pairwise_num_heads * self.pairwise_head_width:
+             raise ValueError(
+                 "`pairwise_state_dim` should be equal to `pairwise_num_heads * pairwise_head_width`, got"
+                 f" {self.pairwise_state_dim} != {pairwise_num_heads} * {self.pairwise_head_width}."
+             )
+         if self.pairwise_state_dim % 2 != 0:
+             raise ValueError(
+                 f"`pairwise_state_dim` should be even, got {self.pairwise_state_dim}."
+             )
+
+         if self.dropout >= 0.4:
+             raise ValueError(
+                 f"`dropout` should not be greater than 0.4, got {self.dropout}."
+             )
+
+     def to_dict(self):
+         """
+         Serializes this instance to a Python dictionary. Override the default [`~PretrainedConfig.to_dict`].
+
+         Returns:
+             `Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance,
+         """
+         output = asdict(self)
+         output["structure_module"] = self.structure_module.to_dict()
+         return output
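The head-count validation in `TrunkConfig.__post_init__` can be sketched in isolation: a state dimension is usable only when it is a whole multiple of the head width, and the quotient is the number of heads. The function `num_heads` below is a hypothetical helper written for illustration, not part of the committed file.

```python
def num_heads(state_dim: int, head_width: int) -> int:
    """Return state_dim // head_width, raising if the division is not exact."""
    if state_dim % head_width != 0:
        raise ValueError(
            f"state_dim should be a whole multiple of head_width, "
            f"got {state_dim} and {head_width}."
        )
    return state_dim // head_width


# TrunkConfig defaults: sequence 1024 / 32, pairwise 128 / 32.
sequence_num_heads = num_heads(1024, 32)
pairwise_num_heads = num_heads(128, 32)
print(sequence_num_heads, pairwise_num_heads)
```

With the defaults shown in `TrunkConfig`, this gives 32 sequence heads and 4 pairwise heads.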
+
+
+ @dataclass
+ class StructureModuleConfig:
+     """
+     Args:
+         sequence_dim:
+             Single representation channel dimension
+         pairwise_dim:
+             Pair representation channel dimension
+         ipa_dim:
+             IPA hidden channel dimension
+         resnet_dim:
+             Angle resnet (Alg. 23 lines 11-14) hidden channel dimension
+         num_heads_ipa:
+             Number of IPA heads
+         num_qk_points:
+             Number of query/key points to generate during IPA
+         num_v_points:
+             Number of value points to generate during IPA
+         dropout_rate:
+             Dropout rate used throughout the layer
+         num_blocks:
+             Number of structure module blocks
+         num_transition_layers:
+             Number of layers in the single representation transition (Alg. 23 lines 8-9)
+         num_resnet_blocks:
+             Number of blocks in the angle resnet
+         num_angles:
+             Number of angles to generate in the angle resnet
+         trans_scale_factor:
+             Scale of single representation transition hidden dimension
+         epsilon:
+             Small number used in angle resnet normalization
+         inf:
+             Large number used for attention masking
+     """
+
+     sequence_dim: int = 384
+     pairwise_dim: int = 128
+     ipa_dim: int = 16
+     resnet_dim: int = 128
+     num_heads_ipa: int = 12
+     num_qk_points: int = 4
+     num_v_points: int = 8
+     dropout_rate: float = 0.1
+     num_blocks: int = 8
+     num_transition_layers: int = 1
+     num_resnet_blocks: int = 2
+     num_angles: int = 7
+     trans_scale_factor: int = 10
+     epsilon: float = 1e-8
+     inf: float = 1e5
+
+     def to_dict(self):
+         return asdict(self)
+
+
+ def get_default_vocab_list():
+     return (
+         "<cls>",
+         "<pad>",
+         "<eos>",
+         "<unk>",
+         "L",
+         "A",
+         "G",
+         "V",
+         "S",
+         "E",
+         "R",
+         "T",
+         "I",
+         "D",
+         "P",
+         "K",
+         "Q",
+         "N",
+         "F",
+         "Y",
+         "M",
+         "H",
+         "W",
+         "C",
+         "X",
+         "B",
+         "U",
+         "Z",
+         "O",
+         ".",
+         "-",
+         "<null_1>",
+         "<mask>",
+     )
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1cfd963e726f5fe2d272e93f6b52047acb1fb62946f27a521a3acc9f656eb2a9
+ size 223642688
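The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: each line is a `key value` pair. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper, and the parameter estimate assumes every weight is stored as 4-byte float32 (as `torch_dtype` in `config.json` suggests) and ignores the safetensors header, so it is only an upper bound.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1cfd963e726f5fe2d272e93f6b52047acb1fb62946f27a521a3acc9f656eb2a9
size 223642688
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: turn 'key value' pointer lines into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20
# Rough upper bound on parameter count, assuming 4 bytes per float32 weight.
approx_params = pointer["size"] // 4
print(f"{size_mib:.1f} MiB, at most about {approx_params / 1e6:.1f}M float32 values")
```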