fanqiNO1 committed
Commit 96ea080 · verified · 1 parent: 7709819

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ examples/image.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,355 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ ---
4
+
5
+ # LEGION-8B-replicate
6
+
7
+ ## Overview
8
+
9
+ The project [LEGION: Learning to Ground and Explain for Synthetic Image Detection](https://arxiv.org/abs/2503.15264) open-sourced its code repository but did not release pre-trained weights, so we replicated the model from the open-source code and the paper and are releasing our replicated weights here.
10
+
11
+ > [!NOTE]
12
+ > Due to potential discrepancies in the replication process, the released weights may achieve lower scores than officially reported results on certain benchmarks.
13
+
14
+ ### Training Details
15
+
16
+ We trained on 4× A100 40 GB GPUs.
17
+
18
+ For the first training stage, the official configuration uses 8 GPUs with a global batch size of 16 (batch size per device = 2). To maintain the same global batch size, we used 4 GPUs with a per-device batch size of 4.
19
+
20
+ For the second training stage, the official configuration uses 8 GPUs with a global batch size of 512 (batch size per device = 64). We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps, giving an effective per-device batch size of 128 and the same global batch size of 512, as the sketch below verifies.
21
+
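+ The batch-size equivalence can be checked with a quick calculation (a minimal sketch; the GPU counts and batch sizes are those stated above, and the helper function is purely illustrative):
+
+ ```python
+ # Global batch size = number of GPUs x per-device batch size x gradient accumulation steps.
+ def global_batch_size(num_gpus: int, per_device_bs: int, grad_accum: int = 1) -> int:
+     return num_gpus * per_device_bs * grad_accum
+
+ # Stage 1: official 8 GPUs x 2 per device vs. ours 4 GPUs x 4 per device
+ assert global_batch_size(8, 2) == global_batch_size(4, 4) == 16
+
+ # Stage 2: official 8 GPUs x 64 per device vs. ours 4 GPUs x 8 per device with 16 accumulation steps
+ assert global_batch_size(8, 64) == global_batch_size(4, 8, grad_accum=16) == 512
+ ```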
22
+ ### Inference Usage
23
+
24
+ A simple inference script is provided at [infer.py](./infer.py).
25
+
26
+ Usage instructions are as follows:
27
+
28
+ ```bash
29
+ cp infer.py /path/to/LEGION
30
+ cd /path/to/LEGION
+ python infer.py --model_path /path/to/LEGION-8B-replicate --image_root /path/to/images --save_root /path/to/results
31
+ ```
32
+
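+ After copying `infer.py` into the LEGION repository as shown above, the `LEGION` wrapper class can also be used directly from Python (a minimal sketch; it must be run from the LEGION repository root so the `model`, `eval`, and `tools` packages are importable, a CUDA GPU is required, and the paths are placeholders):
+
+ ```python
+ from infer import LEGION
+
+ # Load the replicated checkpoint with the default image size (1024) and max sequence length (512).
+ legion = LEGION(model_path="/path/to/LEGION-8B-replicate")
+
+ # Run detection, localization, and explanation on a single image.
+ result = legion.infer("/path/to/images/example.png")
+ print(result["detection"])    # "real" or "fake"
+ print(result["explanation"])  # artifact explanation text
+ result["localization"].save("example_mask.png")  # binary artifact mask (PIL Image, mode "L")
+ ```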
33
+ ### Examples
34
+
35
+ <table>
36
+ <tr>
37
+ <td><img src="./examples/image.png" alt="Original Image" style="max-width:100%;"></td>
38
+ <td><img src="./examples/image_mask.png" alt="Mask generated by LEGION-8B-replicate" style="max-width:100%;"></td>
39
+ </tr>
40
+ </table>
41
+
42
+ The explanation generated for the example image above: "Upon examining the image, I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw: The cat's right front paw is missing. Cat's left front paw: The cat's left front paw is deformed."
43
+
44
+ ## Performance
45
+
46
+ > [!NOTE]
47
+ > Because the evaluation and metric code has not been open-sourced, the test results below may be inaccurate.
48
+ > The mask IoU metric may also be affected by the mask post-processing applied during inference (see the sketch below), which can lower the reported scores.
49
+
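+ For reference, the inference script merges the per-phrase masks into a single binary mask before saving. A sketch of that post-processing step (here `pred_masks` stands for the stacked per-phrase masks returned by the model):
+
+ ```python
+ import torch
+
+ def merge_masks(pred_masks: torch.Tensor) -> torch.Tensor:
+     """Union of per-phrase masks -> one binary artifact mask of shape (H, W)."""
+     binary = pred_masks > 0                 # threshold each predicted mask at zero
+     return torch.any(binary, dim=0).int()   # union over the mask dimension
+ ```
+
+ Because all region masks are collapsed into one union mask, the resulting IoU may differ from an evaluation that scores each predicted region separately.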
50
+ ### Localization
51
+
52
+ <table>
53
+ <tr>
54
+ <th rowspan="2">Method</th>
55
+ <th colspan="2">SynthScars</th>
56
+ <th colspan="2">LOKI</th>
57
+ <th colspan="2">RichHF-18K</th>
58
+ </tr>
59
+ <tr>
60
+ <th>mIoU</th>
61
+ <th>F1</th>
62
+ <th>mIoU</th>
63
+ <th>F1</th>
64
+ <th>mIoU</th>
65
+ <th>F1</th>
66
+ </tr>
67
+ <tr>
68
+ <td>HiFi-Net</td>
69
+ <td>45.65</td>
70
+ <td>0.57</td>
71
+ <td>39.60</td>
72
+ <td>2.41</td>
73
+ <td>44.96</td>
74
+ <td>0.39</td>
75
+ </tr>
76
+ <tr>
77
+ <td>TruFor</td>
78
+ <td>48.60</td>
79
+ <td>15.29</td>
80
+ <td>46.55</td>
81
+ <td>16.70</td>
82
+ <td>48.41</td>
83
+ <td>18.03</td>
84
+ </tr>
85
+ <tr>
86
+ <td>PAL4VST</td>
87
+ <td>56.10</td>
88
+ <td>29.21</td>
89
+ <td>47.34</td>
90
+ <td>11.58</td>
91
+ <td>49.88</td>
92
+ <td>14.78</td>
93
+ </tr>
94
+ <tr>
95
+ <td>Ferret</td>
96
+ <td>27.09</td>
97
+ <td>15.24</td>
98
+ <td>24.50</td>
99
+ <td>18.88</td>
100
+ <td>26.52</td>
101
+ <td>16.22</td>
102
+ </tr>
103
+ <tr>
104
+ <td>Griffon</td>
105
+ <td>27.68</td>
106
+ <td>16.67</td>
107
+ <td>21.96</td>
108
+ <td>20.41</td>
109
+ <td>28.13</td>
110
+ <td>18.19</td>
111
+ </tr>
112
+ <tr>
113
+ <td>LISA-v1-7B</td>
114
+ <td>34.51</td>
115
+ <td>18.77</td>
116
+ <td>31.10</td>
117
+ <td>9.29</td>
118
+ <td>35.90</td>
119
+ <td>21.94</td>
120
+ </tr>
121
+ <tr>
122
+ <td>InternVL2-8B</td>
123
+ <td>41.25</td>
124
+ <td>6.39</td>
125
+ <td>42.03</td>
126
+ <td>10.06</td>
127
+ <td>39.90</td>
128
+ <td>9.58</td>
129
+ </tr>
130
+ <tr>
131
+ <td>Qwen2-VL-72B</td>
132
+ <td>30.20</td>
133
+ <td>17.50</td>
134
+ <td>26.62</td>
135
+ <td>20.99</td>
136
+ <td>27.58</td>
137
+ <td>19.02</td>
138
+ </tr>
139
+ <tr style="background-color: #e6ffe6;">
140
+ <td>LEGION (Official)</td>
141
+ <td>58.13</td>
142
+ <td>34.54</td>
143
+ <td>48.66</td>
144
+ <td>16.71</td>
145
+ <td>50.07</td>
146
+ <td>17.41</td>
147
+ </tr>
148
+ <tr style="background-color: #e6ffe6;">
149
+ <td>LEGION (Replicate)</td>
150
+ <td>23.92</td>
151
+ <td>33.47</td>
152
+ <td>-</td>
153
+ <td>-</td>
154
+ <td>-</td>
155
+ <td>-</td>
156
+ </tr>
157
+ </table>
158
+
159
+ ### Explanation
160
+
161
+ <table>
162
+ <tr>
163
+ <th rowspan="2">Method</th>
164
+ <th rowspan="2">Params</th>
165
+ <th colspan="2">SynthScars</th>
166
+ <th colspan="2">LOKI</th>
167
+ </tr>
168
+ <tr>
169
+ <th>ROUGE-L ↑</th>
170
+ <th>CSS ↑</th>
171
+ <th>ROUGE-L ↑</th>
172
+ <th>CSS ↑</th>
173
+ </tr>
174
+ <tr>
175
+ <td>Qwen2-VL</td>
176
+ <td>72B</td>
177
+ <td>25.84</td>
178
+ <td>58.15</td>
179
+ <td>11.80</td>
180
+ <td>37.64</td>
181
+ </tr>
182
+ <tr>
183
+ <td>LLaVA-v1.6</td>
184
+ <td>7B</td>
185
+ <td>29.61</td>
186
+ <td>61.75</td>
187
+ <td>16.07</td>
188
+ <td>41.07</td>
189
+ </tr>
190
+ <tr>
191
+ <td>InternVL2</td>
192
+ <td>8B</td>
193
+ <td>25.93</td>
194
+ <td>56.89</td>
195
+ <td>10.10</td>
196
+ <td>39.62</td>
197
+ </tr>
198
+ <tr>
199
+ <td>Deepseek-VL2</td>
200
+ <td>27B</td>
201
+ <td>25.50</td>
202
+ <td>47.77</td>
203
+ <td>6.70</td>
204
+ <td>28.76</td>
205
+ </tr>
206
+ <tr>
207
+ <td>GPT-4o</td>
208
+ <td>-</td>
209
+ <td>22.43</td>
210
+ <td>53.55</td>
211
+ <td>9.61</td>
212
+ <td>38.98</td>
213
+ </tr>
214
+ <tr style="background-color: #e6ffe6;">
215
+ <td>LEGION (Official)</td>
216
+ <td>8B</td>
217
+ <td>39.50</td>
218
+ <td>72.60</td>
219
+ <td>18.55</td>
220
+ <td>45.96</td>
221
+ </tr>
222
+ <tr style="background-color: #e6ffe6;">
223
+ <td>LEGION (Replicate)</td>
224
+ <td>8B</td>
225
+ <td>50.57</td>
226
+ <td>-</td>
227
+ <td>-</td>
228
+ <td>-</td>
229
+ </tr>
230
+ </table>
231
+
232
+ ### Detection
233
+
234
+ <table>
235
+ <tr>
236
+ <th rowspan="2">Method</th>
237
+ <th rowspan="2">GANs</th>
238
+ <th rowspan="2">Deepfakes</th>
239
+ <th colspan="2">Perceptual Loss</th>
240
+ <th colspan="2">Low Level Vision</th>
241
+ <th rowspan="2">Diffusion</th>
242
+ </tr>
243
+ <tr>
244
+ <th>CRN</th>
245
+ <th>IMLE</th>
246
+ <th>SITD</th>
247
+ <th>SAN</th>
248
+ </tr>
249
+ <tr>
250
+ <td>Co-occurence</td>
251
+ <td>75.17</td>
252
+ <td>59.14</td>
253
+ <td>73.06</td>
254
+ <td>87.21</td>
255
+ <td>68.98</td>
256
+ <td>60.42</td>
257
+ <td>85.53</td>
258
+ </tr>
259
+ <tr>
260
+ <td>Freq-spec</td>
261
+ <td>75.28</td>
262
+ <td>45.18</td>
263
+ <td>53.61</td>
264
+ <td>50.98</td>
265
+ <td>47.46</td>
266
+ <td>57.12</td>
267
+ <td>69.00</td>
268
+ </tr>
269
+ <tr>
270
+ <td>CNNSpot</td>
271
+ <td>85.29</td>
272
+ <td>53.47</td>
273
+ <td>86.31</td>
274
+ <td>86.26</td>
275
+ <td>66.67</td>
276
+ <td>48.69</td>
277
+ <td>58.63</td>
278
+ </tr>
279
+ <tr>
280
+ <td>Patchfor</td>
281
+ <td>69.97</td>
282
+ <td>75.54</td>
283
+ <td>72.33</td>
284
+ <td>55.30</td>
285
+ <td>75.14</td>
286
+ <td>75.28</td>
287
+ <td>72.54</td>
288
+ </tr>
289
+ <tr>
290
+ <td>UniFD</td>
291
+ <td>95.25</td>
292
+ <td>66.60</td>
293
+ <td>59.50</td>
294
+ <td>72.00</td>
295
+ <td>63.00</td>
296
+ <td>57.50</td>
297
+ <td>82.02</td>
298
+ </tr>
299
+ <tr>
300
+ <td>LDGard</td>
301
+ <td>89.17</td>
302
+ <td>58.00</td>
303
+ <td>50.74</td>
304
+ <td>50.78</td>
305
+ <td>62.50</td>
306
+ <td>50.00</td>
307
+ <td>89.79</td>
308
+ </tr>
309
+ <tr>
310
+ <td>FreqNet</td>
311
+ <td>94.23</td>
312
+ <td>97.40</td>
313
+ <td>71.92</td>
314
+ <td>67.35</td>
315
+ <td>88.92</td>
316
+ <td>59.04</td>
317
+ <td>83.34</td>
318
+ </tr>
319
+ <tr>
320
+ <td>NPR</td>
321
+ <td>94.16</td>
322
+ <td>76.89</td>
323
+ <td>50.00</td>
324
+ <td>50.00</td>
325
+ <td>66.94</td>
326
+ <td>98.63</td>
327
+ <td>94.54</td>
328
+ </tr>
329
+ <tr style="background-color: #e6ffe6;">
330
+ <td>LEGION (Official)</td>
331
+ <td>97.01</td>
332
+ <td>63.37</td>
333
+ <td>90.78</td>
334
+ <td>98.93</td>
335
+ <td>79.44</td>
336
+ <td>57.76</td>
337
+ <td>83.10</td>
338
+ </tr>
339
+ <tr style="background-color: #e6ffe6;">
340
+ <td>LEGION (Replicate)</td>
341
+ <td>91.48</td>
342
+ <td>79.16</td>
343
+ <td>84.73</td>
344
+ <td>96.71</td>
345
+ <td>78.06</td>
346
+ <td>53.70</td>
347
+ <td>-</td>
348
+ </tr>
349
+ </table>
350
+
351
+ ## Acknowledgements
352
+
353
+ Thanks to [Gennadiyev](https://github.com/Gennadiyev) for providing computational resources and moral support, and for helping us complete the reproduction.
354
+
355
+ Thanks to [draw-your-dream/LEGION](https://github.com/draw-your-dream/LEGION/tree/main) for fixing bugs in the first-stage training.
added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "</p>": 32006,
3
+ "<bbox>": 32002,
4
+ "<im_end>": 32001,
5
+ "<im_start>": 32000,
6
+ "<p>": 32005,
7
+ "<point>": 32003,
8
+ "[SEG]": 32004
9
+ }
config.json ADDED
@@ -0,0 +1,59 @@
1
+ {
2
+ "architectures": [
3
+ "LegionForCls"
4
+ ],
5
+ "bbox_token_idx": 32002,
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "freeze_mlp_adapter": true,
9
+ "freeze_mm_mlp_adapter": false,
10
+ "freeze_mm_vision_resampler": false,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 4096,
13
+ "image_aspect": "square",
14
+ "image_aspect_ratio": "square",
15
+ "image_grid_pinpoints": null,
16
+ "image_grid_points": null,
17
+ "initializer_range": 0.02,
18
+ "intermediate_size": 11008,
19
+ "max_length": 4096,
20
+ "max_position_embeddings": 4096,
21
+ "mm_hidden_size": 1024,
22
+ "mm_projector_type": "mlp2x_gelu",
23
+ "mm_resampler_type": null,
24
+ "mm_use_im_patch_token": false,
25
+ "mm_use_im_start_end": true,
26
+ "mm_use_image_start_end": true,
27
+ "mm_vision_module": "openai/clip-vit-large-patch14-336",
28
+ "mm_vision_select_feature": "patch",
29
+ "mm_vision_select_layer": -2,
30
+ "mm_vision_tower": "openai/clip-vit-large-patch14-336",
31
+ "model_type": "llava",
32
+ "num_attention_heads": 32,
33
+ "num_hidden_layers": 32,
34
+ "num_key_value_heads": 32,
35
+ "num_level_reg_features": 4,
36
+ "num_reg_features": 4,
37
+ "out_dim": 256,
38
+ "pad_token_id": 0,
39
+ "pretrain_mm_mlp_adapter": null,
40
+ "pretraining_tp": 1,
41
+ "rms_norm_eps": 1e-05,
42
+ "rope_scaling": null,
43
+ "select_feature_type": "patch",
44
+ "tie_word_embeddings": false,
45
+ "torch_dtype": "bfloat16",
46
+ "train_mask_decoder": true,
47
+ "transformers_version": "4.28.0",
48
+ "tune_mlp_adapter": false,
49
+ "tune_mm_mlp_adapter": false,
50
+ "tune_mm_vision_resampler": false,
51
+ "unfreeze_mm_vision_tower": false,
52
+ "use_cache": false,
53
+ "use_image_patch_token": false,
54
+ "use_mm_proj": true,
55
+ "vision_module": "openai/clip-vit-large-patch14-336",
56
+ "vision_tower": "openai/clip-vit-large-patch14-336",
57
+ "vocab_size": 32007,
58
+ "with_region": true
59
+ }
examples/image.png ADDED

Git LFS Details

  • SHA256: 779932f4595b0795ae02a113f363f78b0ce07e7a85a4a0b9c53adad6981ff7ae
  • Pointer size: 131 Bytes
  • Size of remote file: 372 kB
examples/image_mask.png ADDED
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "max_length": 4096,
6
+ "pad_token_id": 0,
7
+ "transformers_version": "4.28.0",
8
+ "use_cache": false
9
+ }
infer.py ADDED
@@ -0,0 +1,235 @@
1
+ import argparse
2
+ import os
3
+ import re
4
+
5
+ import bleach
6
+ import cv2
7
+ import jsonlines
8
+ import numpy as np
9
+ import torch
10
+ from loguru import logger
11
+ from PIL import Image
12
+ from tqdm import tqdm
13
+ from transformers import AutoTokenizer, CLIPImageProcessor, PreTrainedTokenizer
14
+
15
+ from eval.utils import grounding_image_ecoder_preprocess
16
+ from model.Legion import LegionForCls
17
+ from model.llava import conversation as conversation_lib
18
+ from model.llava.mm_utils import tokenizer_image_token
19
+ from model.SAM.utils.transforms import ResizeLongestSide
20
+ from tools.utils import DEFAULT_IM_END_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
21
+
22
+
23
+ def parse_args():
24
+ parser = argparse.ArgumentParser(description="LEGION Inference")
25
+ # model related
26
+ parser.add_argument("--model_path", required=True, help="The directory to your legion ckpt")
27
+ parser.add_argument("--image_size", default=1024, type=int, help="image size")
28
+ parser.add_argument("--model_max_length", default=512, type=int)
29
+ # data related
30
+ parser.add_argument("--image_root", required=True, help="The directory containing images to run inference.")
31
+ parser.add_argument("--save_root", required=True, help="The directory to store the inference result.")
32
+
33
+ args = parser.parse_args()
34
+ return args
35
+
36
+
37
+ class LEGION:
38
+ """A simple wrapper for LEGION model loading and inference.
39
+
40
+ Args:
41
+ model_path (str): Path to the model checkpoint.
42
+ image_size (int): Size of the input images.
43
+ model_max_length (int): Maximum length of the model input sequence.
44
+ """
45
+
46
+ INSTRUCTION = (
47
+ "Please provide a detailed analysis of artifacts in this photo, considering "
48
+ "physical artifacts (e.g., optical display issues, violations of physical laws, "
49
+ "and spatial/perspective errors), structural artifacts (e.g., deformed objects, asymmetry, or distorted text), "
50
+ "and distortion artifacts (e.g., color/texture distortion, noise/blur, artistic style errors, and material misrepresentation). "
51
+ "Output with interleaved segmentation masks for the corresponding parts of the answer."
52
+ )
53
+
54
+ def __init__(self, model_path: str, image_size: int = 1024, model_max_length: int = 512):
55
+ self.model_path = model_path
56
+ self.image_size = image_size
57
+ self.model_max_length = model_max_length
58
+
59
+ # load tokenizer
60
+ self.tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained(
61
+ self.model_path,
62
+ cache_dir=None,
63
+ model_max_length=self.model_max_length,
64
+ padding_side="right",
65
+ use_fast=False
66
+ )
67
+ self.tokenizer.pad_token = self.tokenizer.unk_token
68
+ seg_token_idx = self.tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
69
+ logger.info("Tokenizer loaded successfully.")
70
+
71
+ # load model
72
+ self.model: LegionForCls = LegionForCls.from_pretrained(
73
+ self.model_path,
74
+ low_cpu_mem_usage=True,
75
+ seg_token_idx=seg_token_idx,
76
+ torch_dtype=torch.bfloat16
77
+ )
78
+ # update model config
79
+ self.model.config.eos_token_id = self.tokenizer.eos_token_id
80
+ self.model.config.bos_token_id = self.tokenizer.bos_token_id
81
+ self.model.config.pad_token_id = self.tokenizer.pad_token_id
82
+ # init global image encoder (CLIP)
83
+ self.model.get_model().initialize_vision_modules(self.model.get_model().config)
84
+ vision_tower = self.model.get_model().get_vision_tower()
85
+ vision_tower.to(dtype=torch.bfloat16)
86
+ # transfer the model to GPU
87
+ self.model = self.model.bfloat16().cuda()
88
+ vision_tower.to(device="cuda")
89
+ self.model.eval()
90
+ logger.info("Model loaded successfully.")
91
+
92
+ # init image processor for global image encoder (CLIP)
93
+ self.image_processor = CLIPImageProcessor.from_pretrained(self.model.config.vision_tower)
94
+ self.transform = ResizeLongestSide(self.image_size)
95
+ logger.info("Image processor initialized successfully.")
96
+
97
+ @torch.inference_mode()
98
+ def _infer(self, raw_image: np.ndarray):
99
+ """Run inference on a single image.
100
+
101
+ Args:
102
+ raw_image (np.ndarray): The input image in numpy array format.
103
+
104
+ Returns:
105
+ tuple: A tuple containing the explanation string, the predicted masks, and the classification result.
106
+ """
107
+ # clean instructions
108
+ instructions = bleach.clean(LEGION.INSTRUCTION)
109
+ instructions = instructions.replace('&lt;', '<').replace('&gt;', '>')
110
+
111
+ # prepare prompt
112
+ conv = conversation_lib.conv_templates["llava_v1"].copy()
113
+ conv.messages = []
114
+ prompt = f"The {DEFAULT_IM_START_TOKEN}{DEFAULT_IMAGE_TOKEN}{DEFAULT_IM_END_TOKEN} provides an overview of the picture.\n" + instructions
115
+ conv.append_message(conv.roles[0], prompt)
116
+ conv.append_message(conv.roles[1], "")
117
+ prompt = conv.get_prompt()
118
+
119
+ # preprocess image (CLIP)
120
+ image_np = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB)
121
+ original_size_list = [image_np.shape[:2]]
122
+ image_clip = (self.image_processor.preprocess(image_np, return_tensors="pt")["pixel_values"][0].unsqueeze(0).cuda())
123
+ image_clip = image_clip.bfloat16()
124
+
125
+ # preprocess image (Grounding image encoder)
126
+ image = self.transform.apply_image(image_np)
127
+ resize_list = [image.shape[:2]]
128
+ image = (grounding_image_ecoder_preprocess(torch.from_numpy(image).permute(2, 0, 1).contiguous()).unsqueeze(0).cuda())
129
+ image = image.bfloat16()
130
+
131
+ # prepare inputs for inference
132
+ input_ids = tokenizer_image_token(prompt, self.tokenizer, return_tensors="pt")
133
+ input_ids = input_ids.unsqueeze(0).cuda()
134
+
135
+ # generate output
136
+ output_ids, pred_masks = self.model.evaluate(
137
+ image_clip,
138
+ image,
139
+ input_ids,
140
+ resize_list,
141
+ original_size_list,
142
+ max_tokens_new=512,
143
+ bboxes=None # No box/region is input in GCG task
144
+ )
145
+ output_ids = output_ids[0][output_ids[0] != IMAGE_TOKEN_INDEX]
146
+
147
+ # post-processing
148
+ text_output = self.tokenizer.decode(output_ids, skip_special_tokens=False)
149
+ text_output = text_output.replace("\n", "").replace("  ", " ")
150
+ text_output = text_output.split("ASSISTANT: ")[-1]
151
+ cleaned_str = re.sub(r'<.*?>', '', text_output)
152
+ # remove [SEG] token and unnecessary spaces
153
+ cleaned_str = cleaned_str.replace('[SEG]', '')
154
+ # strip unnecessary spaces
155
+ cleaned_str = ' '.join(cleaned_str.split()).strip("'")
156
+ cleaned_str = cleaned_str.strip()
157
+
158
+ # infer detection head
159
+ logits = self.model(global_enc_images=image_clip, inference_cls=True)['logits'].cpu()
160
+ _, pred_cls = torch.max(logits, dim=1)
161
+ pred_cls = int(pred_cls)
162
+ return cleaned_str, pred_masks, pred_cls
163
+
164
+ @torch.inference_mode()
165
+ def infer(self, image_path: str):
166
+ """Run inference on a single image.
167
+
168
+ Args:
169
+ image_path (str): Path to the input image.
170
+
171
+ Returns:
172
+ dict: A dictionary containing the explanation, the localization mask (PIL Image), and the detection result.
173
+ """
174
+ raw_image = cv2.imread(image_path)
175
+ explanation, localization, detection = self._infer(raw_image.astype(np.uint8))
176
+
177
+ # post-process localization mask
178
+ localization = localization[0].cpu()
179
+ binary_localization = localization > 0
180
+ binary_localization = torch.any(binary_localization, dim=0).int()
181
+ localization = (binary_localization.numpy() * 255).astype(np.uint8)
182
+ localization = Image.fromarray(localization, mode="L")
183
+
184
+ # post-process detection
185
+ detection = "real" if detection == 1 else "fake"
186
+
187
+ return {
188
+ "explanation": explanation,
189
+ "localization": localization,
190
+ "detection": detection
191
+ }
192
+
193
+
194
+ def main(args):
195
+ # get images
196
+ suffixes = [".jpg", ".jpeg", ".png"]
197
+ image_paths = sorted(os.listdir(args.image_root))
198
+ image_paths = [p for p in image_paths if os.path.splitext(p)[-1].lower() in suffixes]
199
+ logger.info(f"Found {len(image_paths)} images for inference.")
200
+
201
+ # init legion
202
+ legion = LEGION(args.model_path, args.image_size, args.model_max_length)
203
+
204
+ # check save root
205
+ os.makedirs(args.save_root, exist_ok=True)
206
+ localization_save_dir = os.path.join(args.save_root, "localization")
207
+ os.makedirs(localization_save_dir, exist_ok=True)
208
+ explanation_save_path = os.path.join(args.save_root, "explanations.jsonl")
209
+
210
+ # prepare resume
211
+ num_processed_images = 0
212
+ if os.path.exists(explanation_save_path):
213
+ num_processed_images = len(list(jsonlines.open(explanation_save_path)))
214
+ logger.info(f"Resuming from {num_processed_images} processed images.")
215
+ image_paths = image_paths[num_processed_images:]
216
+
217
+ # run inference
218
+ with jsonlines.open(explanation_save_path, mode="a", flush=True) as writer:
219
+ for image_path in tqdm(image_paths):
220
+ image_name = os.path.splitext(image_path)[0]
221
+ full_image_path = os.path.join(args.image_root, image_path)
222
+ result = legion.infer(full_image_path)
223
+ # save localization
224
+ this_localization_save_path = os.path.join(localization_save_dir, f"{image_name}_mask.png")
225
+ result["localization"].save(this_localization_save_path)
226
+ result["localization"] = this_localization_save_path
227
+ # add original image path
228
+ result["image_path"] = full_image_path
229
+ # write to jsonl
230
+ writer.write(result)
231
+
232
+
233
+ if __name__ == "__main__":
234
+ args = parse_args()
235
+ main(args)
pytorch_model-00001-of-00002.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:52bedf3c0f9c51c46511a732449dc08dfa36241639bd21e261fee7030a108be4
3
+ size 9976695294
pytorch_model-00002-of-00002.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8aa2a634e2e0667569d2f607b9038eda50fb970c2388932c4a9975941094220c
3
+ size 6070091263
pytorch_model.bin.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "bos_token": {
5
+ "__type": "AddedToken",
6
+ "content": "<s>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "clean_up_tokenization_spaces": false,
13
+ "eos_token": {
14
+ "__type": "AddedToken",
15
+ "content": "</s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "legacy": false,
22
+ "model_max_length": 1536,
23
+ "pad_token": null,
24
+ "padding_side": "right",
25
+ "sp_model_kwargs": {},
26
+ "tokenizer_class": "LlamaTokenizer",
27
+ "unk_token": {
28
+ "__type": "AddedToken",
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ }
35
+ }