Yannis Katsis committed · Commit 55473a2 · 1 parent: 2c0ff00

Update README.md for citations

citations/README.md CHANGED (+8 -17):
````diff
--- a/citations/README.md
+++ b/citations/README.md
@@ -3,18 +3,13 @@ license: apache-2.0
 language:
 - en
 pipeline_tag: text-generation
-base_model: ibm-granite/granite-3.3-8b-instruct
 library_name: peft
 library_name: transformers
 ---
 
 # Intrinsics for Citation Generation
 
-
-
-Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.
-
-# Model Summary
+## Model Summary
 
 This is a RAG-specific family of intrinsics fine-tuned for the citation generation task. Given a multi-turn conversation between a user and an AI assistant ending with an assistant response and a set of documents/passages on which the last assistant response is supposed to be based, each intrinsic in the family generates citations for the last assistant response from the provided documents/passages. The intrinsic has the following features:
 1. **Fine-grained citations:** The intrinsic generates citations for each sentence in the assistant response (when available). Moreover, each citation consists of a set of sentences from the documents/passages that support the corresponding sentence in the assistant response.
@@ -29,7 +24,7 @@ We provide two intrinsics implemented as LoRA adapters trained over Granite-3.3-
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 ## Intended use
-This is a a family of citation generation intrinsics that give the ability to ge
+This is a family of citation generation intrinsics that give the ability to generate citations for the last assistant response in a multi-turn RAG conversation based on a set of provided documents/passages. They can be used to generate post-hoc citations for assistant responses generated by any LLM in a RAG setting.
 
 > [!TIP]
 > Note: While you can invoke a citation generation intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation/validation tasks that would be otherwise required (incl. splitting the input documents and assistant response into sentences before calling the intrinsic as well as validating the intrinsic's output and transforming the returned sentence IDs into spans over the documents and the response). We next describe the input/output of the citation generation intrinsics when invoked through granite-common.
@@ -42,7 +37,7 @@ This is a a family of citation generation intrinsics that give the ability to ge
 
 ## Quickstart Example
 
-To run the citation generation intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM or (b) use the Hugging Face
+To run the citation generation intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM or (b) use the Hugging Face Transformers library. We provide below instructions for each of the two approaches. Note that running inference using vLLM or another scalable OpenAI-compatible inference backend should be significantly faster than using the Hugging Face Transformers library directly.
 
 ### Using an OpenAI-Compatible Inference Backend
 
@@ -50,8 +45,7 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 
 1. Install the granite-common library:
 ```
-pip install
-pip install granite_common[nltk]
+pip install granite-common[nltk]
 ```
 
 2. Install the Hugging Face CLI:
@@ -137,8 +131,7 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 request_json["temperature"] = 0.0
 
 # Apply input processor
-
-rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
+rewritten_request = rewriter.transform(request_json)
 
 # Run inference
 client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
@@ -157,12 +150,11 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 
 ### Using the Hugging Face Transformers Library
 
-To run the intrinsic using the Hugging Face
+To run the intrinsic using the Hugging Face Transformers library directly, follow the steps below. We recommend using Python 3.11 or higher.
 
 1. Install the granite-common library:
 ```
-pip install
-pip install granite_common[nltk]
+pip install granite-common[nltk]
 ```
 
 2. Install the Hugging Face CLI:
@@ -239,8 +231,7 @@ To run the intrinsic using the Hugging Face transformers library directly, follo
 request_json["temperature"] = 0.0
 
 # Apply input processor
-
-rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
+rewritten_request = rewriter.transform(request_json)
 
 # Load the base model and merge LoRA weights
 model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
````
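For readers following the quickstart that this commit edits, below is a minimal sketch of the OpenAI-compatible flow around the corrected `rewriter.transform(request_json)` call. It is a sketch only, not the README's full example: it assumes `rewriter` is the granite-common input processor created in quickstart steps not shown in this diff, that a vLLM server hosting the citation LoRA is reachable at `base_url`, and that the rewritten request can be passed directly as chat-completions arguments; the endpoint URL and API key are placeholders.

```python
# Minimal sketch of the OpenAI-compatible flow edited by this commit.
# Assumptions (not shown in this diff): `rewriter` is the granite-common input
# processor built in the quickstart's earlier steps, and `request_json` already
# holds the conversation and grounding documents in the format granite-common expects.
import openai


def generate_citations(rewriter, request_json: dict,
                       base_url: str = "http://localhost:8000/v1",  # placeholder vLLM endpoint
                       api_key: str = "EMPTY"):                     # vLLM ignores the key by default
    # Greedy decoding, as in the README example
    request_json["temperature"] = 0.0

    # Apply the input processor: splits the documents and the assistant response
    # into sentences and rewrites the request for the citation intrinsic
    rewritten_request = rewriter.transform(request_json)

    # Run inference against the OpenAI-compatible backend.
    # Assumes the rewritten request is a plain dict of chat-completions arguments;
    # the full README shows the exact call and the output processing that follows.
    client = openai.OpenAI(base_url=base_url, api_key=api_key)
    return client.chat.completions.create(**rewritten_request)
```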
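Similarly, here is a sketch of the Hugging Face Transformers path, which the diff shows loading the base model and merging the LoRA weights via `granite_common.util.load_transformers_lora`. The step that turns the rewritten request into model input is an assumption here (the full README covers it, along with the output processing): the code below supposes the rewritten request exposes chat-completions-style `"messages"` and feeds them through the tokenizer's chat template; `lora_dir` and the generation settings are illustrative.

```python
# Minimal sketch of the Transformers flow referenced in the second half of the diff.
# Assumption: `rewritten_request` was produced by rewriter.transform(request_json)
# as above; its exact structure is defined by granite-common and only sketched here.
import granite_common.util


def run_citation_intrinsic(lora_dir: str, rewritten_request: dict) -> str:
    # Load the base model and merge LoRA weights (as in the README)
    model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)

    # Turn the rewritten request into model input via the tokenizer's chat template
    input_ids = tokenizer.apply_chat_template(
        rewritten_request["messages"], add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Greedy decoding, mirroring temperature=0.0 in the OpenAI-compatible path
    output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=512)

    # Return only the newly generated tokens (the intrinsic's raw output)
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
```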