expand abbreviations of TLD.
README.md
CHANGED
@@ -40,7 +40,7 @@ The training data used is
 #### Preprocessing
 The following filtering is done
 - Remove documents that do not use a single hiragana character. This removes English-only documents and documents in Chinese.
-- Whitelist-style filtering using
+- Whitelist-style filtering using the top level domain of URL to remove affiliate sites.

 #### Training Hyperparameters

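
For readers of this change, the two preprocessing filters described in the README hunk could be sketched roughly as follows. This is only an illustration, not the repository's actual code: the function names, the `ALLOWED_TLDS` set, and its contents are assumptions, since the README does not list which top-level domains are whitelisted.

```python
import re
from urllib.parse import urlparse

# Hiragana letters occupy U+3041..U+3096 in Unicode.
HIRAGANA = re.compile(r"[\u3041-\u3096]")

# Hypothetical allowlist: the README does not say which TLDs are kept.
ALLOWED_TLDS = {"jp", "com", "org"}


def has_hiragana(text: str) -> bool:
    """Return True if the text contains at least one hiragana character."""
    return HIRAGANA.search(text) is not None


def tld_of(url: str) -> str:
    """Return the last dot-separated label of the URL's host, or '' if none."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def keep_document(text: str, url: str, allowed_tlds=ALLOWED_TLDS) -> bool:
    """Keep a document only if it passes both filters."""
    return has_hiragana(text) and tld_of(url) in allowed_tlds
```

Under these assumptions, `keep_document("こんにちは", "https://example.jp/")` keeps the document, while an English-only or Chinese-only text, or a URL whose top-level domain is outside the allowlist, is dropped.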