---
license: cc-by-sa-4.0
language:
- ja
---
# Model card for Japanese T5 v1.1 Large

This is a T5 v1.1 model, pre-trained on a Japanese corpus.

## Model details

T5 is a Transformer-based encoder-decoder model. Version 1.1 includes the following improvements over the original T5:
- GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202 and the sketch below).
- Dropout was turned off in pre-training (a quality win); dropout should be re-enabled during fine-tuning.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes are slightly different: larger d_model and smaller num_heads and d_ff.

This model is based on T5 v1.1 and was pre-trained on a Japanese corpus consisting of Japanese Wikipedia and mC4/ja.
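
As a concrete illustration of the GEGLU feed-forward block mentioned above, here is a minimal PyTorch sketch. The layer names (`wi_0`, `wi_1`, `wo`) mirror the Hugging Face T5 implementation, and the default sizes are the usual T5 v1.1 Large shapes; both are illustrative assumptions, not taken from this card.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GEGLUFeedForward(nn.Module):
    """Sketch of the gated-GELU (GEGLU) feed-forward block used by T5 v1.1.

    Layer names follow the Hugging Face T5 implementation (wi_0, wi_1, wo);
    the default sizes below are the usual T5 v1.1 "large" shapes, stated
    here as assumptions rather than taken from this card.
    """

    def __init__(self, d_model: int = 1024, d_ff: int = 2816):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GEGLU(x) = (GELU(x @ W0) * (x @ W1)) @ Wo -- elementwise gating
        return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))
```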

### Model Description

- **Developed by:** Retrieva, Inc.
- **Model type:** T5 v1.1
- **Language(s) (NLP):** Japanese
- **License:** CC-BY-SA 4.0
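
This card does not state the Hub repository id, so the snippet below uses a placeholder. It is a minimal sketch of loading the converted checkpoint with Hugging Face Transformers; since the model is pre-trained only (span-corruption objective), generation here is just a smoke test before fine-tuning.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# "<org>/<model-id>" is a placeholder; the actual Hub repository id is
# not stated in this card.
repo_id = "<org>/<model-id>"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# The checkpoint is pre-trained with the span-corruption objective only,
# so raw generation mainly serves as a smoke test before fine-tuning.
inputs = tokenizer("こんにちは。<extra_id_0>です。", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```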

## Training Details

We used T5X (https://github.com/google-research/t5x) to train this model and then converted the checkpoint to the Hugging Face Transformers format.

## Training Data

The training data used is:
- The Japanese portion of multilingual C4 (mC4/ja).
- Japanese Wikipedia (20220920 dump).

#### Preprocessing

The following filtering was applied:
- Remove documents that contain no hiragana characters; this filters out English-only documents and documents in Chinese (a sketch of this filter follows the list).
- Whitelist-style filtering by the top-level domain (TLD) of each document's URL, to remove affiliate sites.
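
As an illustration of the hiragana filter described above, here is a minimal re-implementation (the actual preprocessing code is not included in this card):

```python
import re

# Keep a document only if it contains at least one hiragana letter
# (U+3041..U+3096). Documents with none, e.g. English-only or
# Chinese-only text, are dropped.
HIRAGANA = re.compile(r"[\u3041-\u3096]")

def keep_document(text: str) -> bool:
    return HIRAGANA.search(text) is not None

assert keep_document("これはテストです。")           # contains hiragana
assert not keep_document("This document is English only.")
assert not keep_document("这是一份中文文档。")        # Chinese, no hiragana
```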

#### Training Hyperparameters

- dropout rate: 0.0
- batch size: 256
- precision: fp32
- input length: 512
- output length: 114
- Otherwise, the T5X defaults (https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin) are followed, including:
  - optimizer: Adafactor
  - base_learning_rate: 1.0
  - warmup steps: 10000
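
Since dropout was disabled for pre-training, it should be re-enabled when fine-tuning. With Hugging Face Transformers, one way is to override the config value at load time; a sketch, again with a placeholder repository id and an assumed rate of 0.1 (the stock T5 default, not a value stated in this card):

```python
from transformers import T5ForConditionalGeneration

# Override the pre-training value (0.0) with a typical fine-tuning rate;
# 0.1 is the stock T5 default, chosen here as an illustrative assumption.
model = T5ForConditionalGeneration.from_pretrained(
    "<org>/<model-id>",  # placeholder; repository id not stated in this card
    dropout_rate=0.1,
)
```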

#### Speeds, Sizes, Times

We trained for 2,097,152 (= 2^21) steps.

## Technical Specifications

### Model Architecture and Objective

Model architecture:
- T5 v1.1 (https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511)
- Size: Large (approximately 770 million parameters)

### Compute Infrastructure

Google Cloud TPU v4-8.

#### Software

- T5X (https://github.com/google-research/t5x)

## More Information

https://note.com/retrieva/n/n7b4186dc5ada (in Japanese)

## Model Card Authors

Jiro Nishitoba

## Model Card Contact