Commit 6f02751 by Uppaal · verified · 1 parent: 8e6c3db

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -36,12 +36,13 @@ base_model:
 
 # ProFS Editing for Safety
 
+This model is an edited version of [`EleutherAI/gpt-j-6b`](https://huggingface.co/EleutherAI/gpt-j-6b).
+Editing is applied through ProFS, to reduce toxicity.
 
-This model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
+ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors by identifying and projecting out harmful subspaces in model weights.
+The model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
 published at ICLR 2025 (previously released under the preprint title “DeTox: Toxic Subspace Projection for Model Editing”; both refer to the same work).
 
-ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors—such as toxicity—by identifying and projecting out harmful subspaces in model weights.
-
 **Key Features:**
 
 - Training-free & plug-and-play: edits weights directly, no gradient steps or architectural changes needed.
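The README text describes ProFS as removing undesired behavior by projecting a harmful subspace out of the model weights, with no gradient steps. The sketch below shows only the generic linear-algebra step behind "projecting out" a subspace. It assumes a set of unwanted direction vectors is already given, builds an orthonormal basis U for their span, and returns (I - U U^T) W, so that every column of W loses its component inside span(U). The helpers `orthonormal_basis` and `project_out`, the toy matrix, and the direction vector are hypothetical illustrations. This is not the authors' implementation. How ProFS actually estimates the subspace from data, and which layers and weight dimension it edits, are not shown here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j] is row i, column j


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(directions: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of `directions`.

    Directions that are (numerically) linearly dependent on earlier ones are dropped.
    """
    basis: List[Vector] = []
    for d in directions:
        v = list(d)
        for u in basis:
            c = dot(v, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def project_out(weight: Matrix, basis: List[Vector]) -> Matrix:
    """Return (I - U U^T) W for an orthonormal basis U (hypothetical helper).

    `weight` has shape (d_out, d_in); each basis vector has length d_out,
    so the projection acts on the output dimension of every column.
    """
    d_out, d_in = len(weight), len(weight[0])
    result = [row[:] for row in weight]
    for j in range(d_in):
        col = [weight[i][j] for i in range(d_out)]
        for u in basis:
            # Basis is orthonormal, so each component can be removed independently.
            c = dot(col, u)
            for i in range(d_out):
                result[i][j] -= c * u[i]
    return result


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy 3x2 weight matrix
    unwanted = orthonormal_basis([[0.0, 1.0, 0.0]])  # hypothetical "harmful" direction
    print(project_out(W, unwanted))  # prints [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]
```

Projection is idempotent: applying `project_out` a second time with the same basis leaves the matrix unchanged. The edit is also a fixed linear map on the weights, which fits the "training-free" description, since no optimization loop is involved.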