---
license: mit
---

# LEGION-8B-replicate

## Overview

The project [LEGION: Learning to Ground and Explain for Synthetic Image Detection](https://arxiv.org/abs/2503.15264) open-sourced its code repository but did not release pre-trained weights. We replicated the model from the open-source code and the paper, and are releasing our replicated weights here.

> [!NOTE]
> Because the replication may differ from the original training process, the released weights may score lower than the officially reported results on some benchmarks.

### Training Details

We trained on 4× A100 40 GB GPUs.

For the first training stage, the official configuration uses 8 GPUs with a global batch size of 16 (batch size per device = 2). To maintain the same global batch size, we used 4 GPUs with a per-device batch size of 4.

For the second training stage, the official configuration uses 8 GPUs with a global batch size of 512 (batch size per device = 64). We used 4 GPUs with a per-device batch size of 8 and 16 gradient accumulation steps. This gives an effective per-device batch size of 128 (8 × 16) and keeps the global batch size at 512 (4 × 128).
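
The batch-size arithmetic for both stages can be checked with a few lines of Python. This is only an illustrative sketch: the helper `global_batch_size` is a hypothetical name and is not part of the LEGION code base.

```python
def global_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step across all devices."""
    return num_gpus * per_device_batch * grad_accum_steps


# Stage 1: official setup (8 GPUs x 2) vs. this replication (4 GPUs x 4)
stage1_official = global_batch_size(8, 2)
stage1_replica = global_batch_size(4, 4)

# Stage 2: official setup (8 GPUs x 64) vs. this replication
# (4 GPUs x 8 per device x 16 gradient accumulation steps)
stage2_official = global_batch_size(8, 64)
stage2_replica = global_batch_size(4, 8, 16)

print(stage1_official, stage1_replica)  # 16 16
print(stage2_official, stage2_replica)  # 512 512
```

Matching the global batch size keeps the number of samples per optimizer step the same. It does not guarantee identical training dynamics.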

### Inference Usage

A simple inference script is provided at [infer.py](./infer.py).

Copy the script into a local clone of the official LEGION repository and run it from there:

```bash
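cp infer.py /path/to/LEGION
# Run from inside the LEGION repository so the copied script is the one executed
cd /path/to/LEGION
python infer.py --model_path /path/to/LEGION-8B-replicate --image_root /path/to/images --save_root /path/to/results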
```
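
To run the script over several image folders, the command above can be assembled programmatically. The sketch below only builds the argument list from the three flags shown above. The helper name `build_infer_command` is hypothetical, and the paths are the same placeholders as in the shell example.

```python
import shlex
import sys


def build_infer_command(model_path, image_root, save_root, script="infer.py"):
    """Return the argv list for one infer.py run, using the flags from the usage example."""
    return [
        sys.executable,  # the current Python interpreter
        script,
        "--model_path", str(model_path),
        "--image_root", str(image_root),
        "--save_root", str(save_root),
    ]


cmd = build_infer_command(
    "/path/to/LEGION-8B-replicate",
    "/path/to/images",
    "/path/to/results",
)
# Print the command in a shell-safe form so it can be inspected before running it.
print(shlex.join(cmd))
```

To execute it, pass `cmd` to `subprocess.run(cmd, check=True, cwd="/path/to/LEGION")`, so that the script runs inside the repository clone.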

### Examples

<table>
  <tr>
    <td><img src="./examples/image.png" alt="Original Image" style="max-width:100%;"></td>
    <td><img src="./examples/image_mask.png" alt="Mask generated by LEGION-8B-replicate" style="max-width:100%;"></td>
  </tr>
</table>

Explanation generated for the image above:

> Upon examining the image. I have found: A cat sits on a rooftop at sunset, with its right front paw missing and the left front paw appearing deformed. To elaborate, I have found the following artifacts. Cat's right front paw :The cat's right front paw is missing. Cat's left front paw :The cat's left front paw is deformed.

## Performance

> [!NOTE]
> Because the official evaluation and metric code has not been open-sourced, our test results may be inaccurate.
> In particular, the mask IoU may be affected by how masks are processed during inference, which can lower the score.

### Localization

<table>
  <tr>
    <th rowspan="2">Method</th>
    <th colspan="2">SynthScars</th>
    <th colspan="2">LOKI</th>
    <th colspan="2">RichHF-18K</th>
  </tr>
  <tr>
    <th>mIoU</th>
    <th>F1</th>
    <th>mIoU</th>
    <th>F1</th>
    <th>mIoU</th>
    <th>F1</th>
  </tr>
  <tr>
    <td>HiFi-Net</td>
    <td>45.65</td>
    <td>0.57</td>
    <td>39.60</td>
    <td>2.41</td>
    <td>44.96</td>
    <td>0.39</td>
  </tr>
  <tr>
    <td>TruFor</td>
    <td>48.60</td>
    <td>15.29</td>
    <td>46.55</td>
    <td>16.70</td>
    <td>48.41</td>
    <td>18.03</td>
  </tr>
  <tr>
    <td>PAL4VST</td>
    <td>56.10</td>
    <td>29.21</td>
    <td>47.34</td>
    <td>11.58</td>
    <td>49.88</td>
    <td>14.78</td>
  </tr>
  <tr>
    <td>Ferret</td>
    <td>27.09</td>
    <td>15.24</td>
    <td>24.50</td>
    <td>18.88</td>
    <td>26.52</td>
    <td>16.22</td>
  </tr>
  <tr>
    <td>Griffon</td>
    <td>27.68</td>
    <td>16.67</td>
    <td>21.96</td>
    <td>20.41</td>
    <td>28.13</td>
    <td>18.19</td>
  </tr>
  <tr>
    <td>LISA-v1-7B</td>
    <td>34.51</td>
    <td>18.77</td>
    <td>31.10</td>
    <td>9.29</td>
    <td>35.90</td>
    <td>21.94</td>
  </tr>
  <tr>
    <td>InternVL2-8B</td>
    <td>41.25</td>
    <td>6.39</td>
    <td>42.03</td>
    <td>10.06</td>
    <td>39.90</td>
    <td>9.58</td>
  </tr>
  <tr>
    <td>Qwen2-VL-72B</td>
    <td>30.20</td>
    <td>17.50</td>
    <td>26.62</td>
    <td>20.99</td>
    <td>27.58</td>
    <td>19.02</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Official)</td>
    <td>58.13</td>
    <td>34.54</td>
    <td>48.66</td>
    <td>16.71</td>
    <td>50.07</td>
    <td>17.41</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Replicate)</td>
    <td>23.92</td>
    <td>33.47</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
  </tr>
</table>
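
Since the official metric code is not available, it helps to state explicitly what a mask IoU and F1 computation looks like. The following is a minimal, dependency-free sketch, not the official evaluation. It assumes binary masks given as nested lists of 0/1, scores only the foreground (artifact) class, and treats two empty masks as a perfect match. Whether the benchmark averages IoU over foreground and background, and how predicted masks are thresholded or resized, are not specified here.

```python
def mask_iou_f1(pred, target):
    """Foreground IoU and F1 between two same-shaped binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # pixel marked in both masks
            elif p:
                fp += 1  # predicted but not in the ground truth
            elif t:
                fn += 1  # in the ground truth but missed
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, f1 = mask_iou_f1(pred, target)
print(f"IoU={iou:.2f}, F1={f1:.2f}")  # IoU=0.50, F1=0.67
```

Different conventions (foreground-only versus class-averaged IoU, per-image versus dataset-level aggregation) can shift the reported numbers substantially, so scores are only comparable when the convention matches.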

### Explanation

<table>
  <tr>
    <th rowspan="2">Method</th>
    <th rowspan="2">Params</th>
    <th colspan="2">SynthScars</th>
    <th colspan="2">LOKI</th>
  </tr>
  <tr>
    <th>ROUGE-L ↑</th>
    <th>CSS ↑</th>
    <th>ROUGE-L ↑</th>
    <th>CSS ↑</th>
  </tr>
  <tr>
    <td>Qwen2-VL</td>
    <td>72B</td>
    <td>25.84</td>
    <td>58.15</td>
    <td>11.80</td>
    <td>37.64</td>
  </tr>
  <tr>
    <td>LLaVA-v1.6</td>
    <td>7B</td>
    <td>29.61</td>
    <td>61.75</td>
    <td>16.07</td>
    <td>41.07</td>
  </tr>
  <tr>
    <td>InternVL2</td>
    <td>8B</td>
    <td>25.93</td>
    <td>56.89</td>
    <td>10.10</td>
    <td>39.62</td>
  </tr>
  <tr>
    <td>Deepseek-VL2</td>
    <td>27B</td>
    <td>25.50</td>
    <td>47.77</td>
    <td>6.70</td>
    <td>28.76</td>
  </tr>
  <tr>
    <td>GPT-4o</td>
    <td>-</td>
    <td>22.43</td>
    <td>53.55</td>
    <td>9.61</td>
    <td>38.98</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Official)</td>
    <td>8B</td>
    <td>39.50</td>
    <td>72.60</td>
    <td>18.55</td>
    <td>45.96</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Replicate)</td>
    <td>8B</td>
    <td>50.57</td>
    <td>-</td>
    <td>-</td>
    <td>-</td>
  </tr>
</table>
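
For reference, ROUGE-L is based on the longest common subsequence (LCS) between a generated explanation and a reference text. The sketch below computes a sentence-level ROUGE-L F1 under simplifying assumptions: lowercase whitespace tokenization and equal weighting of precision and recall. The tokenization and weighting used for the scores in the table may differ, and the example sentences are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 in [0, 1], using whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "the cat's right front paw is missing",
    "the right front paw of the cat is missing",
)
print(f"{score:.2f}")  # 0.75
```

The table appears to report scores on a 0 to 100 scale, so a value from this sketch would be multiplied by 100 before comparison.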

### Detection

<table>
  <tr>
    <th rowspan="2">Method</th>
    <th rowspan="2">GANs</th>
    <th rowspan="2">Deepfakes</th>
    <th colspan="2">Perceptual Loss</th>
    <th colspan="2">Low Level Vision</th>
    <th rowspan="2">Diffusion</th>
  </tr>
  <tr>
    <th>CRN</th>
    <th>IMLE</th>
    <th>SITD</th>
    <th>SAN</th>
  </tr>
  <tr>
    <td>Co-occurrence</td>
    <td>75.17</td>
    <td>59.14</td>
    <td>73.06</td>
    <td>87.21</td>
    <td>68.98</td>
    <td>60.42</td>
    <td>85.53</td>
  </tr>
  <tr>
    <td>Freq-spec</td>
    <td>75.28</td>
    <td>45.18</td>
    <td>53.61</td>
    <td>50.98</td>
    <td>47.46</td>
    <td>57.12</td>
    <td>69.00</td>
  </tr>
  <tr>
    <td>CNNSpot</td>
    <td>85.29</td>
    <td>53.47</td>
    <td>86.31</td>
    <td>86.26</td>
    <td>66.67</td>
    <td>48.69</td>
    <td>58.63</td>
  </tr>
  <tr>
    <td>Patchfor</td>
    <td>69.97</td>
    <td>75.54</td>
    <td>72.33</td>
    <td>55.30</td>
    <td>75.14</td>
    <td>75.28</td>
    <td>72.54</td>
  </tr>
  <tr>
    <td>UniFD</td>
    <td>95.25</td>
    <td>66.60</td>
    <td>59.50</td>
    <td>72.00</td>
    <td>63.00</td>
    <td>57.50</td>
    <td>82.02</td>
  </tr>
  <tr>
    <td>LGrad</td>
    <td>89.17</td>
    <td>58.00</td>
    <td>50.74</td>
    <td>50.78</td>
    <td>62.50</td>
    <td>50.00</td>
    <td>89.79</td>
  </tr>
  <tr>
    <td>FreqNet</td>
    <td>94.23</td>
    <td>97.40</td>
    <td>71.92</td>
    <td>67.35</td>
    <td>88.92</td>
    <td>59.04</td>
    <td>83.34</td>
  </tr>
  <tr>
    <td>NPR</td>
    <td>94.16</td>
    <td>76.89</td>
    <td>50.00</td>
    <td>50.00</td>
    <td>66.94</td>
    <td>98.63</td>
    <td>94.54</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Official)</td>
    <td>97.01</td>
    <td>63.37</td>
    <td>90.78</td>
    <td>98.93</td>
    <td>79.44</td>
    <td>57.76</td>
    <td>83.10</td>
  </tr>
  <tr style="background-color: #e6ffe6;">
    <td>LEGION (Replicate)</td>
    <td>91.48</td>
    <td>79.16</td>
    <td>84.73</td>
    <td>96.71</td>
    <td>78.06</td>
    <td>53.70</td>
    <td>-</td>
  </tr>
</table>

## Acknowledgements

Thanks to [Gennadiyev](https://github.com/Gennadiyev) for providing computational resources and moral support, and for helping me complete the reproduction.

Thanks to [draw-your-dream/LEGION](https://github.com/draw-your-dream/LEGION/tree/main) for fixing bugs in the first-stage training.