Yannis Katsis committed · Commit 55473a2 · 1 parent: 2c0ff00

Update README.md for citations

citations/README.md CHANGED (+8 -17):
````diff
--- a/citations/README.md
+++ b/citations/README.md
@@ -3,18 +3,13 @@ license: apache-2.0
 language:
 - en
 pipeline_tag: text-generation
-base_model: ibm-granite/granite-3.3-8b-instruct
 library_name: peft
 library_name: transformers
 ---
 
 # Intrinsics for Citation Generation
 
-
-
-Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.
-
-# Model Summary
+## Model Summary
 
 This is a RAG-specific family of intrinsics fine-tuned for the citation generation task. Given a multi-turn conversation between a user and an AI assistant ending with an assistant response and a set of documents/passages on which the last assistant response is supposed to be based, each intrinsic in the family generates citations for the last assistant response from the provided documents/passages. The intrinsic has the following features:
 1. **Fine-grained citations:** The intrinsic generates citations for each sentence in the assistant response (when available). Moreover, each citation consists of a set of sentences from the documents/passages that support the corresponding sentence in the assistant response.
@@ -29,7 +24,7 @@ We provide two intrinsics implemented as LoRA adapters trained over Granite-3.3-
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 ## Intended use
-This is a a family of citation generation intrinsics that give the ability to ge
+This is a family of citation generation intrinsics that give the ability to generate citations for the last assistant response in a multi-turn RAG conversation based on a set of provided documents/passages. They can be used to generate post-hoc citations for assistant responses generated by any LLM in a RAG setting.
 
 > [!TIP]
 > Note: While you can invoke a citation generation intrinsic directly, it is strongly recommended to call it through [granite-common](https://github.com/ibm-granite/granite-common), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation/validation tasks that would be otherwise required (incl. splitting the input documents and assistant response into sentences before calling the intrinsic as well as validating the intrinsic's output and transforming the returned sentence IDs into spans over the documents and the response). We next describe the input/output of the citation generation intrinsics when invoked through granite-common.
@@ -42,7 +37,7 @@ This is a a family of citation generation intrinsics that give the ability to ge
 
 ## Quickstart Example
 
-To run the citation generation intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM or (b) use the Hugging Face
+To run the citation generation intrinsics through granite-common, you can either (a) use an OpenAI-compatible inference backend, such as vLLM or (b) use the Hugging Face Transformers library. We provide below instructions for each of the two approaches. Note that running inference using vLLM or another scalable OpenAI-compatible inference backend should be significantly faster than using the Hugging Face Transformers library directly.
 
 ### Using an OpenAI-Compatible Inference Backend
 
@@ -50,8 +45,7 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 
 1. Install the granite-common library:
 ```
-pip install
-pip install granite_common[nltk]
+pip install granite-common[nltk]
 ```
 
 2. Install the Hugging Face CLI:
@@ -137,8 +131,7 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 request_json["temperature"] = 0.0
 
 # Apply input processor
-
-rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
+rewritten_request = rewriter.transform(request_json)
 
 # Run inference
 client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
@@ -157,12 +150,11 @@ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
 
 ### Using the Hugging Face Transformers Library
 
-To run the intrinsic using the Hugging Face
+To run the intrinsic using the Hugging Face Transformers library directly, follow the steps below. We recommend using Python 3.11 or higher.
 
 1. Install the granite-common library:
 ```
-pip install
-pip install granite_common[nltk]
+pip install granite-common[nltk]
 ```
 
 2. Install the Hugging Face CLI:
@@ -239,8 +231,7 @@ To run the intrinsic using the Hugging Face transformers library directly, follo
 request_json["temperature"] = 0.0
 
 # Apply input processor
-
-rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
+rewritten_request = rewriter.transform(request_json)
 
 # Load the base model and merge LoRA weights
 model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
````
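For readers following the quickstart that this commit edits, below is a minimal sketch of the OpenAI-compatible flow around the corrected `rewriter.transform(request_json)` call. It is a sketch only, not the README's full example: it assumes `rewriter` is the granite-common input processor created in quickstart steps not shown in this diff, that a vLLM server hosting the citation LoRA is reachable at `base_url`, and that the rewritten request can be passed directly as chat-completions arguments; the endpoint URL and API key are placeholders.

```python
# Minimal sketch of the OpenAI-compatible flow edited by this commit.
# Assumptions (not shown in this diff): `rewriter` is the granite-common input
# processor built in the quickstart's earlier steps, and `request_json` already
# holds the conversation and grounding documents in the format granite-common expects.
import openai


def generate_citations(rewriter, request_json: dict,
                       base_url: str = "http://localhost:8000/v1",  # placeholder vLLM endpoint
                       api_key: str = "EMPTY"):                     # vLLM ignores the key by default
    # Greedy decoding, as in the README example
    request_json["temperature"] = 0.0

    # Apply the input processor: splits the documents and the assistant response
    # into sentences and rewrites the request for the citation intrinsic
    rewritten_request = rewriter.transform(request_json)

    # Run inference against the OpenAI-compatible backend.
    # Assumes the rewritten request is a plain dict of chat-completions arguments;
    # the full README shows the exact call and the output processing that follows.
    client = openai.OpenAI(base_url=base_url, api_key=api_key)
    return client.chat.completions.create(**rewritten_request)
```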
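Similarly, here is a sketch of the Hugging Face Transformers path, which the diff shows loading the base model and merging the LoRA weights via `granite_common.util.load_transformers_lora`. The step that turns the rewritten request into model input is an assumption here (the full README covers it, along with the output processing): the code below supposes the rewritten request exposes chat-completions-style `"messages"` and feeds them through the tokenizer's chat template; `lora_dir` and the generation settings are illustrative.

```python
# Minimal sketch of the Transformers flow referenced in the second half of the diff.
# Assumption: `rewritten_request` was produced by rewriter.transform(request_json)
# as above; its exact structure is defined by granite-common and only sketched here.
import granite_common.util


def run_citation_intrinsic(lora_dir: str, rewritten_request: dict) -> str:
    # Load the base model and merge LoRA weights (as in the README)
    model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)

    # Turn the rewritten request into model input via the tokenizer's chat template
    input_ids = tokenizer.apply_chat_template(
        rewritten_request["messages"], add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Greedy decoding, mirroring temperature=0.0 in the OpenAI-compatible path
    output_ids = model.generate(input_ids, do_sample=False, max_new_tokens=512)

    # Return only the newly generated tokens (the intrinsic's raw output)
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
```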