Update README_EN.md
README_EN.md CHANGED (+3 -71)
|
@@ -124,75 +124,7 @@ Here [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC
|
|
| 124 |
|
| 125 |
|
| 126 |
|
| 127 |
-
# 4.
|
| 128 |
-
|
| 129 |
-
| Name | Download | Quantity | Description |
|
| 130 |
-
| ---------------------- | ------------------------------------------------------------ | -------- | ------------------------------------------------------------ |
|
| 131 |
-
| InstructIE | [Google drive](https://drive.google.com/file/d/1raf0h98x3GgIhaDyNn1dLle9_HvwD6wT/view?usp=sharing) <br/> [Baidu Netdisk](https://pan.baidu.com/s/1-u8bD85H1Otbzk-gjLxaFw?pwd=c1i6) | 200,000+ | InstructIE dataset (bilingual in Chinese and English) |
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
The `InstructIE` dataset contains two core files: `InstructIE-zh.json` and `InstructIE-en.json`. Both files cover a range of fields that provide detailed descriptions of different aspects of the dataset:
|
| 136 |
-
|
| 137 |
-
- `'id'`: A unique identifier for each data entry, ensuring the independence and traceability of the data items.
|
| 138 |
-
- `'cate'`: The text's subject category, which provides a high-level categorical label for the content (there are 12 categories in total).
|
| 139 |
-
- `'text'`: The text to be extracted.
|
| 140 |
-
- `'relation'`: The **relationship triples** (head, relation, tail) annotated for the text. Together with the fields above, it lets users freely construct instructions and expected outputs for information extraction.
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
<details>
|
| 145 |
-
<summary><b>Explanation of each field</b></summary>
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
| Field | Description |
|
| 149 |
-
| ----------- | ---------------------------------------------------------------- |
|
| 150 |
-
| id | The unique identifier for each data point. |
|
| 151 |
-
| cate | The category of the text's subject, with a total of 12 different thematic categories. |
|
| 152 |
-
| input | The input text for the model, with the goal of extracting all the involved relationship triples. |
|
| 153 |
-
| instruction | Instructions guiding the model to perform information extraction tasks. |
|
| 154 |
-
| output | The expected output result of the model. |
|
| 155 |
-
| relation | Describes the relationship triples contained in the text, i.e., the connections between entities (head, relation, tail). |
|
| 156 |
-
|
| 157 |
-
</details>
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
<details>
|
| 161 |
-
<summary><b>Example of data</b></summary>
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
```json
|
| 165 |
-
{
|
| 166 |
-
"id": "6e4f87f7f92b1b9bd5cb3d2c3f2cbbc364caaed30940a1f8b7b48b04e64ec403",
|
| 167 |
-
"cate": "Person",
|
| 168 |
-
"input": "Dionisio Pérez Gutiérrez (born 1872 in Grazalema (Cádiz) - died 23 February 1935 in Madrid) was a Spanish writer, journalist, and gastronome. He has been called \"one of Spain's most authoritative food writers\" and was an early adopter of the term Hispanidad.\nHis pen name, \"Post-Thebussem\", was chosen as a show of support for Mariano Pardo de Figueroa, who went by the handle \"Dr. Thebussem\".",
|
| 169 |
-
"entity": [
|
| 170 |
-
{"entity": "Dionisio Pérez Gutiérrez", "entity_type": "human"},
|
| 171 |
-
{"entity": "Post-Thebussem", "entity_type": "human"},
|
| 172 |
-
{"entity": "Grazalema", "entity_type": "geographic_region"},
|
| 173 |
-
{"entity": "Cádiz", "entity_type": "geographic_region"},
|
| 174 |
-
{"entity": "Madrid", "entity_type": "geographic_region"},
|
| 175 |
-
{"entity": "gastronome", "entity_type": "event"},
|
| 176 |
-
{"entity": "Spain", "entity_type": "geographic_region"},
|
| 177 |
-
{"entity": "Hispanidad", "entity_type": "architectural_structure"},
|
| 178 |
-
{"entity": "Mariano Pardo de Figueroa", "entity_type": "human"},
|
| 179 |
-
{"entity": "23 February 1935", "entity_type": "time"}
|
| 180 |
-
],
|
| 181 |
-
"relation": [
|
| 182 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "country of citizenship", "tail": "Spain"},
|
| 183 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "place of birth", "tail": "Grazalema"},
|
| 184 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "place of death", "tail": "Madrid"},
|
| 185 |
-
{"head": "Mariano Pardo de Figueroa", "relation": "country of citizenship", "tail": "Spain"},
|
| 186 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "alternative name", "tail": "Post-Thebussem"},
|
| 187 |
-
{"head": "Dionisio Pérez Gutiérrez", "relation": "date of death", "tail": "23 February 1935"}
|
| 188 |
-
]
|
| 189 |
-
}
|
| 190 |
-
```
|
| 191 |
-
|
| 192 |
-
</details>
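
A minimal sketch of how a record shaped like the example above could be turned into an instruction/output pair. The instruction wording, the output string format, and the helper name `build_example` are assumptions made for illustration; they are not taken from the repository's conversion script.

```python
import json

# Hypothetical instruction wording; the README does not prescribe one.
INSTRUCTION = "Extract all (head, relation, tail) triples from the input text."


def build_example(record):
    """Turn one InstructIE-style record into an instruction/input/output dict."""
    # The field list calls the text field 'text', while the sample record
    # uses 'input', so accept either one.
    text = record.get("input", record.get("text", ""))
    triples = [
        (r["head"], r["relation"], r["tail"])
        for r in record.get("relation", [])
    ]
    # Assumed output format: one "(head, relation, tail)" line per triple.
    output = "\n".join(f"({h}, {rel}, {t})" for h, rel, t in triples)
    return {"instruction": INSTRUCTION, "input": text, "output": output}


# Shortened version of the sample record, for illustration only.
sample = {
    "id": "example-id",
    "cate": "Person",
    "input": "Dionisio Pérez Gutiérrez was born in Grazalema and died in Madrid.",
    "relation": [
        {"head": "Dionisio Pérez Gutiérrez", "relation": "place of birth", "tail": "Grazalema"},
        {"head": "Dionisio Pérez Gutiérrez", "relation": "place of death", "tail": "Madrid"},
    ],
}

example = build_example(sample)
print(json.dumps(example, ensure_ascii=False, indent=2))
```

Keeping the triples as structured tuples until the last step makes it easy to swap in a different output format without touching the rest of the function.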
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
# 5.Convert script
|
| 196 |
|
| 197 |
**Training Data Transformation**
|
| 198 |
|
|
@@ -306,7 +238,7 @@ After data conversion, you will obtain structured data containing the `input` te
|
|
| 306 |
|
| 307 |
|
| 308 |
|
| 309 |
-
#
|
| 310 |
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference using the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
|
| 311 |
|
| 312 |
```bash
|
|
@@ -322,7 +254,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.
|
|
| 322 |
|
| 323 |
|
| 324 |
|
| 325 |
-
#
|
| 326 |
|
| 327 |
We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), that converts the model's string output into a list and calculates the F1 score.
|
| 328 |
|
|
|
|
| 124 |
|
| 125 |
|
| 126 |
|
| 127 |
+
# 4.Convert script
|
| 128 |
|
| 129 |
**Training Data Transformation**
|
| 130 |
|
|
|
|
| 238 |
|
| 239 |
|
| 240 |
|
| 241 |
+
# 5.Usage
|
| 242 |
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference using the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
|
| 243 |
|
| 244 |
```bash
|
|
|
|
| 254 |
|
| 255 |
|
| 256 |
|
| 257 |
+
# 6.Evaluate
|
| 258 |
|
| 259 |
We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), that converts the model's string output into a list and calculates the F1 score.
|
| 260 |
|
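
To make the F1 step concrete, here is a minimal sketch of an F1 score over sets of extracted triples. It assumes exact-match comparison of `(head, relation, tail)` tuples; the function name `triple_f1` is hypothetical, and the actual logic in `evaluate.py` may differ (for example in how it parses or normalizes the model's string output).

```python
def triple_f1(predicted, gold):
    """Exact-match F1 between predicted and gold (head, relation, tail) triples."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [
    ("Dionisio Pérez Gutiérrez", "place of birth", "Grazalema"),
    ("Dionisio Pérez Gutiérrez", "place of death", "Madrid"),
    ("Dionisio Pérez Gutiérrez", "country of citizenship", "Spain"),
]
predicted = [
    ("Dionisio Pérez Gutiérrez", "place of birth", "Grazalema"),
    ("Dionisio Pérez Gutiérrez", "place of birth", "Madrid"),  # wrong relation
]

score = triple_f1(predicted, gold)
print(f"F1 = {score:.2f}")
```

With one correct triple out of two predictions and three gold triples, precision is 1/2 and recall is 1/3.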