---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: peft
---

# Intrinsics for Hallucination Detection

## Model Summary

This is a RAG-specific family of intrinsics fine-tuned for the hallucination detection task. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, along with a set of documents/passages on which that response is supposed to be based, the adapter outputs a hallucination label (faithful/partial/unfaithful/NA) for each sentence in the assistant response.

We provide two intrinsics, implemented as LoRA adapters trained over Granite-3.3-2b-instruct and Granite-3.3-8b-instruct, respectively.

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct) and [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use
This is a family of hallucination detection intrinsics that identify hallucination risks for the sentences of the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages.

> [!TIP]
> Note: While you can invoke the hallucination detection intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation and validation tasks that would otherwise be required, including splitting the input documents and assistant response into sentences before calling the intrinsic and validating the intrinsic's output. We next describe the input and output of the hallucination detection intrinsics when invoked through granite-common.

**Intrinsic input**: The hallucination detection intrinsic takes as input an OpenAI-compatible chat completion request. This request includes a list of conversation turns that ends with the assistant response to be checked for hallucinations, and a list of reference documents that this response should be grounded on. See the code snippets in the Quickstart Example section below for examples of how to format the chat completion request as a JSON object.

**Intrinsic output**: The output of the hallucination detection intrinsic is formatted as the result of the original chat completion request, containing the hallucinations detected in the last assistant response. The hallucinations are provided as a JSON array whose items each include the text and begin/end offsets of a response span (sentence), the faithfulness_likelihood of that sentence, and an explanation for the faithfulness_likelihood.
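
For illustration, here is a minimal sketch of what one item of the parsed output array might look like. The values are hypothetical, and the exact field names and value types (e.g., whether faithfulness_likelihood is a categorical label or a numeric score) are determined by the granite-common I/O configuration:

```
[
  {
    "response_text": "Forensic pathologists believed that Dennis experienced shallow-water blackout just before his death.",
    "response_begin": 175,
    "response_end": 276,
    "faithfulness_likelihood": "unfaithful",
    "explanation": "The provided documents do not mention forensic pathologists or shallow-water blackout."
  }
]
```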

**Going from input to output**: When calling the intrinsic through granite-common, one should follow the steps below to transform the intrinsic input into the corresponding output. These steps are also exemplified in the code snippets included in the Quickstart Example section below.

1. Pass the input chat completion request to the input processor (also referred to as IntrinsicsRewriter) provided by granite-common. It converts the request to the format expected by the underlying hallucination detection model; among other things, it splits the last assistant response and the documents into sentences, prepends them with sentence IDs, and introduces an appropriate task-specific instruction.
2. Pass the input processor's result to the underlying hallucination detection model for inference. The model identifies hallucinations using a compact representation consisting of sentence IDs in the last assistant response and documents.
3. Pass the model output to the output processor (also referred to as IntrinsicsResultProcessor) provided by granite-common. It converts the low-level raw model output to the final output, mapping the sentence IDs back to response and document spans, yielding an application-friendly format ready for consumption by downstream applications.

## Quickstart Example

To run the hallucination detection intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging Face Transformers library. We provide instructions for each of the two approaches below. Note that running inference using vLLM or another scalable OpenAI-compatible inference backend should be significantly faster than using the Hugging Face Transformers library directly.

### Using an OpenAI-Compatible Inference Backend

To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM, follow the steps below. We recommend using Python 3.11 or higher.

1. Install the granite-common library and nltk:
```
pip install git+https://github.com/ibm-granite/granite-common.git
pip install nltk
```

2. Install the Hugging Face CLI:
```
pip install -U "huggingface_hub[cli]"
```

3. Install vLLM:
```
pip install vllm
```

4. Download the intrinsics library:
```
hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
```

5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh` using your favorite editor:

Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the base model on which the desired LoRA adapter has been trained. Optionally, edit the constant `PORT` to change the port on which vLLM will run. Save the modified file and exit the editor. An illustrative sketch of the edited constants is shown below.
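
For illustration, the relevant constants in `run_vllm.sh` might look as follows after editing; the values shown (organization, model name, and port) are assumptions to adapt to your setup, and the exact contents of the script may differ:

```
# Constants in ./rag-intrinsics-lib/run_vllm.sh (values are illustrative)
BASE_MODEL_ORG=ibm-granite
BASE_MODEL_NAME=granite-3.3-8b-instruct
PORT=55555
```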

6. Start vLLM through the startup script. The first time you run the script, you may have to change the permissions to allow execution:
```
cd rag-intrinsics-lib
chmod u+x ./run_vllm.sh
./run_vllm.sh &
```
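
You can optionally verify that the server is up by querying its OpenAI-compatible model listing endpoint. The port and API key below are the illustrative values used elsewhere in this example; if the startup script does not configure an API key, the header can be omitted:
```
curl -H "Authorization: Bearer rag_intrinsics_1234" http://localhost:55555/v1/models
```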

7. Run the following code snippet:

```
import json
import openai
import granite_common

intrinsic_name = "hallucination_detection"

# Change the following constant to select a different base model
base_model_name = "granite-3.3-8b-instruct"

# Change the following constants as needed to reflect the location of the vLLM server.
# The selected port should be identical to the one you specified in the vLLM startup script.
openai_base_url = "http://localhost:55555/v1"
openai_api_key = "rag_intrinsics_1234"

# Fetch IO configuration file from Hugging Face Hub
io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
    intrinsic_name, base_model_name
)

# Instantiate input/output processors
rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

# Sample request
request_json = {
    "messages": [
        {
            "role": "user",
            "content": "What happened to Dennis Wilson of the Beach Boys in 1983?"
        },
        {
            "role": "assistant",
            "content": "On December 28, 1983, Dennis Wilson of the Beach Boys drowned in Marina del Rey while diving from a friend’s boat in an attempt to retrieve belongings he had earlier thrown. Forensic pathologists believed that Dennis experienced shallow-water blackout just before his death."
        }
    ],
    "extra_body": {
        "documents": [
            {
                "doc_id": "0",
                "text": "The Beach Boys are an American rock band formed in Hawthorne, California, in 1961. The group's original lineup consisted of brothers Brian, Dennis, and Carl Wilson; their cousin Mike Love; and their friend Al Jardine. Distinguished by their vocal harmonies and early surf songs, they are one of the most influential acts of the rock era. The band drew on the music of jazz-based vocal groups, 1950s rock and roll, and black R&B to create their unique sound, and with Brian as composer, arranger, producer, and de facto leader, often incorporated classical or jazz elements and unconventional recording techniques in innovative ways. In 1983, tensions between Dennis and Love escalated so high that each obtained a restraining order against each other. With the rest of the band fearing that he would end up like Brian, Dennis was given an ultimatum after his last performance in November 1983 to check into rehab for his alcohol problems or be banned from performing live with them. Dennis checked into rehab for his chance to get sober, but on December 28, 1983, he fatally drowned in Marina del Rey while diving from a friend's boat trying to recover items that he had previously thrown overboard in fits of rage."
            },
            {
                "doc_id": "1",
                "text": "A cigarette smoker since the age of 13, Carl was diagnosed with lung cancer after becoming ill at his vacation home in Hawaii, in early 1997. Despite his illness, Carl continued to perform while undergoing chemotherapy. He played and sang throughout the Beach Boys' entire summer tour which ended in the fall of 1997. During the performances, he sat on a stool, but he stood while singing \"God Only Knows\". Carl died of lung cancer in Los Angeles, surrounded by his family, on February 6, 1998, just two months after the death of his mother, Audree Wilson. He was interred at Westwood Village Memorial Park Cemetery in Los Angeles."
            },
            {
                "doc_id": "2",
                "text": "Carl Dean Wilson (December 21, 1946 - February 6, 1998) was an American musician, singer, and songwriter who co-founded the Beach Boys. He is best remembered as their lead guitarist, as the youngest brother of bandmates Brian and Dennis Wilson, and as the group's de facto leader in the early 1970s. He was also the band's musical director on stage from 1965 until his death. Influenced by the guitar playing of Chuck Berry and the Ventures, Carl's initial role in the group was that of lead guitarist and backing vocals, but he performed lead vocals on several of their later hits, including \"God Only Knows\" (1966), \"Good Vibrations\" (1966), and \"Kokomo\" (1988). By the early 1980s the Beach Boys were in disarray; the band had split into several camps. Frustrated with the band's sluggishness to record new material and reluctance to rehearse, Wilson took a leave of absence in 1981. He quickly recorded and released a solo album, Carl Wilson, composed largely of rock n' roll songs co-written with Myrna Smith-Schilling, a former backing vocalist for Elvis Presley and Aretha Franklin, and wife of Wilson's then-manager Jerry Schilling. The album briefly charted, and its second single, \"Heaven\", reached the top 20 on Billboard's Adult Contemporary chart."
            }
        ]
    }
}

# Add other parameters
request_json["model"] = intrinsic_name
request_json["temperature"] = 0.0

# Apply input processor
rewritten_request = rewriter.transform(request_json)

# Run inference
client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
chat_completion = client.chat.completions.create(**rewritten_request.model_dump())

# Apply output processor
processed_chat_completion = result_processor.transform(
    chat_completion, rewritten_request
)

# Verify that the content of the completion is valid JSON and pretty-print the JSON.
parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
print("JSON output:")
print(json.dumps(parsed_contents, indent=2))
```

### Using the Hugging Face Transformers Library

To run the intrinsic using the Hugging Face Transformers library directly, follow the steps below. We recommend using Python 3.11 or higher.

1. Install the granite-common library and nltk:
```
pip install git+https://github.com/ibm-granite/granite-common.git
pip install nltk
```

2. Install the Hugging Face CLI:
```
pip install -U "huggingface_hub[cli]"
```

3. Install PEFT:
```
pip install peft
```

4. Install xgrammar:
```
pip install xgrammar
```

5. Run the following code snippet:

```
import json
import granite_common.util
import peft

intrinsic_name = "hallucination_detection"

# Change the following constant to select a different base model
base_model_name = "granite-3.3-8b-instruct"

use_cuda = True  # Set to False to use default PyTorch device for this machine + model

# Fetch IO configuration file from Hugging Face Hub
io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
    intrinsic_name, base_model_name
)

# Fetch LoRA directory from Hugging Face Hub
lora_dir = granite_common.intrinsics.util.obtain_lora(
    intrinsic_name, base_model_name
)

# Instantiate input/output processors
rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

# Sample request
request_json = {
    "messages": [
        {
            "role": "user",
            "content": "What is the visibility level of Git Repos and Issue Tracking projects?"
        },
        {
            "role": "assistant",
            "content": "Git Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. Private projects are visible only to project members, internal projects are visible to all users that are logged in to IBM Cloud, and public projects are visible to anyone. By default, new projects are set to private visibility level, which is the most secure for your data."
        }
    ],
    "extra_body": {
        "documents": [
            {
                "doc_id": "0",
                "text": "Git Repos and Issue Tracking is an IBM-hosted component of the Continuous Delivery service. All of the data that you provide to Git Repos and Issue Tracking, including but not limited to source files, issues, pull requests, and project configuration properties, is managed securely within Continuous Delivery. However, Git Repos and Issue Tracking supports various mechanisms for exporting, sending, or otherwise sharing data to users and third parties. The ability of Git Repos and Issue Tracking to share information is typical of many social coding platforms. However, such sharing might conflict with regulatory controls that apply to your business. After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that you deem necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project membership \n\nGit Repos and Issue Tracking is a cloud hosted social coding environment that is available to all Continuous Delivery users. If you are a Git Repos and Issue Tracking project Maintainer or Owner, you can invite any user and group members to the project. IBM Cloud places no restrictions on who you can invite to a project."
            },
            {
                "doc_id": "1",
                "text": "After you create a project in Git Repos and Issue Tracking, but before you entrust any files, issues, records, or other data with the project, review the project settings and change any settings that are necessary to protect your data. Settings to review include visibility levels, email notifications, integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \n\nGit Repos and Issue Tracking projects can have one of the following visibility levels: private, internal, or public. * Private projects are visible only to project members. This setting is the default visibility level for new projects, and is the most secure visibility level for your data. * Internal projects are visible to all users that are logged in to IBM Cloud. * Public projects are visible to anyone. To limit project access to only project members, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the General Settings page, click Visibility > project features > permissions. 3. Locate the Project visibility setting. 4. Select Private, if it is not already selected. 5. Click Save changes. Project email settings \n\nBy default, Git Repos and Issue Tracking notifies project members by way of email about project activities. These emails typically include customer-owned data that was provided to Git Repos and Issue Tracking by users. For example, if a user posts a comment to an issue, Git Repos and Issue Tracking sends an email to all subscribers. The email includes information such as a copy of the comment, the user who posted it, and when the comment was posted. To turn off all email notifications for your project, complete the following steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the **General Settings **page, click Visibility > project features > permissions. 3. Select the Disable email notifications checkbox. 4. Click Save changes. Project integrations and webhooks"
            }
        ]
    }
}

# Add additional parameters
request_json["model"] = intrinsic_name
request_json["temperature"] = 0.0

# Apply input processor
rewritten_request = rewriter.transform(request_json)

# Load the base model and merge LoRA weights
model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
if use_cuda:
    model = model.cuda()

# Convert the chat completion request into the Transformers library's proprietary format.
generate_input, other_input = (
    granite_common.util.chat_completion_request_to_transformers_inputs(
        rewritten_request,
        tokenizer,
        model,
    )
)

# Use the Transformers library's APIs to generate one or more completions,
# then convert those completions into OpenAI-compatible chat completion responses.
responses = granite_common.util.generate_with_transformers(
    tokenizer, model, generate_input, other_input
)

# Apply output processor
transformed_responses = result_processor.transform(responses, rewritten_request)

# Verify that the content of the completion is valid JSON and pretty-print the JSON.
parsed_contents = json.loads(transformed_responses.choices[0].message.content)
print("JSON output:")
print(json.dumps(parsed_contents, indent=2))
```

## Training Details

The process of generating the training data for the hallucination detection intrinsic consisted of two main steps:

- **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpus. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).

- **Faithfulness label generation:** To create the faithfulness labels for the responses, we used a multi-step synthetic pipeline that generates hallucination labels together with the corresponding reasoning. This process resulted in ~50K data instances, which were used to train the LoRA adapters.


### Training Data

The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:

- [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
- [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
- [QuAC](https://huggingface.co/datasets/allenai/quac)


## Evaluation

We evaluated the LoRA adapters on the QA portion of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) benchmark, comparing their response-level hallucination detection performance against the methods reported in the RAGTruth paper. A response that obtains a faithfulness label of `partial` or `unfaithful` for at least one sentence is considered a hallucinated response, as sketched below.
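
For concreteness, here is a minimal sketch of this response-level aggregation rule, assuming the per-sentence labels are available as a list of strings (the helper name and input format are hypothetical):
```
def is_hallucinated_response(sentence_labels: list[str]) -> bool:
    """A response counts as hallucinated if at least one of its sentences
    is labeled partial or unfaithful."""
    return any(label in ("partial", "unfaithful") for label in sentence_labels)

# Example: a single unfaithful sentence marks the whole response as hallucinated.
print(is_hallucinated_response(["faithful", "unfaithful", "faithful"]))  # True
```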

The results are shown in the table below; the numbers for the baselines are extracted from the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) paper.

| Model | Precision | Recall | F1 |
|---|---|---|---|
| GPT 4o mini (prompted) | 46.8 | 59.6 | 52.4 |
| GPT 4o (prompted) | 49.5 | 60.1 | 54.3 |
| gpt-4-turbo (prompted) | 33.2 | 90.6 | 45.6 |
| [SelfCheckGPT](https://aclanthology.org/2023.emnlp-main.557.pdf) | 35.0 | 58.0 | 43.7 |
| [LMvLM](https://aclanthology.org/2023.emnlp-main.778.pdf) | 18.7 | 76.9 | 30.1 |
| Granite 3.3-2b_hallucination-detection_LoRA | 55.8 | 74.9 | 63.9 |
| Granite 3.3-8b_hallucination-detection_LoRA | 58.1 | 77.6 | 66.5 |

## Model Card Author

[Chulaka Gunasekara](mailto:[email protected])