---
title: Safe O Bot
emoji: 💂
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
short_description: Complete moderation tool blocking harmful links, spam, etc.
---

# Text Safety Analyzer – Multi-model pipeline

A Hugging Face Space / project template that analyzes input text for multiple safety signals:
- Harm/toxicity detection (who is harmed: author, reader, or target – via a multi-model ensemble)
- AI jailbreak / filter-bypass pattern detection (heuristics + optional model)
- Filter-obfuscation detection (homoglyphs, separators, zero-width characters)
- Hidden/obfuscated URL detection (heuristics + malicious-URL model)
- ASCII-art / low-entropy payload detection

This project intentionally focuses on **detection** and explanation. It does NOT provide ways to bypass safety protections.

---

## Files
- `classifier.py` – Core pipeline: normalization, heuristics, multi-model inference, aggregation, and explanations.
- `app.py` – Gradio demo ready for Hugging Face Spaces.
- `requirements.txt` – Python dependencies.
- `examples/` – Not included by default; add labeled examples here for tuning thresholds and for unit tests.

---

## Architecture & design
1. **Normalization step** – homoglyph mapping, zero-width removal, whitespace collapse (first sketch below).
2. **Heuristic detectors** – regex-based detectors for obfuscated URLs, ASCII art, and jailbreak patterns, plus low-entropy checks (second sketch below).
3. **Model ensemble** – several models can be loaded for specific tasks:
   - Harm / toxicity models (English and multilingual)
   - URL malicious classifier
4. **Aggregation & explanation** – model outputs and heuristic flags are combined into explainable reasons that name each contributing model and its score (third sketch below).
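
To make step 1 concrete, here is a minimal normalization sketch. The homoglyph table and the `normalize` helper are illustrative assumptions about what `classifier.py` might contain, not its actual code:

```python
import re
import unicodedata

# Illustrative homoglyph table; a real mapping would be much larger.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "0": "o",       # digit-for-letter swaps often used to dodge word filters
    "1": "l",
}

# Zero-width space/joiners and the BOM, all invisible in rendered text.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

def normalize(text: str) -> str:
    """Fold compatibility forms, strip invisible characters, map homoglyphs,
    and collapse whitespace so downstream detectors see canonical text."""
    text = unicodedata.normalize("NFKC", text)
    text = ZERO_WIDTH.sub("", text)
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize("  h\u200b\u0435llo   w0rld "))  # -> "hello world"
```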
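
Step 2 can be pictured as small, independent detectors that each return a flag with a human-readable reason. The pattern below, covering defanged schemes and bracketed dots, is a hypothetical example of one such detector:

```python
import re
from dataclasses import dataclass

@dataclass
class Flag:
    name: str
    reason: str

# Hypothetical patterns for common URL-obfuscation tricks; real rules would be broader.
OBFUSCATED_URL = re.compile(
    r"hxxps?://"                            # defanged scheme, e.g. hxxp://bad.example
    r"|\bh\s+t\s+t\s+p\s*s?\s*:"            # spaced-out scheme, e.g. "h t t p :"
    r"|\w+\s*(?:\[\.\]|\(dot\))\s*\w{2,}",  # bracketed dots, e.g. evil[.]com
    re.IGNORECASE,
)

def detect_obfuscated_urls(text: str) -> list[Flag]:
    """Return one flag per suspicious URL fragment found in the text."""
    return [
        Flag("obfuscated_url", f"suspicious URL fragment: {m.group(0)!r}")
        for m in OBFUSCATED_URL.finditer(text)
    ]

print(detect_obfuscated_urls("claim your prize at evil(dot)com or hxxp://bad.example"))
```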
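
Step 4 can be as simple as taking the strongest model score and attaching heuristic flags as reasons. The interface, threshold, and model names below are invented for illustration:

```python
def aggregate(model_scores: dict[str, float], flags: list[str],
              threshold: float = 0.5) -> dict:
    """Combine per-model scores and heuristic flags into one explainable verdict.

    The interface (a name->probability dict plus a list of reason strings)
    is an assumption about classifier.py, as is the default threshold.
    """
    top_model, top_score = max(model_scores.items(), key=lambda kv: kv[1])
    # A single heuristic flag is enough to mark the text, even on low model scores.
    unsafe = top_score >= threshold or bool(flags)
    return {
        "unsafe": unsafe,
        "score": round(top_score, 2),
        "reasons": [f"{top_model} scored {top_score:.2f}"] + flags,
    }

print(aggregate({"harm-model-en": 0.82, "harm-model-multi": 0.31},
                ["obfuscated_url: 'evil(dot)com'"]))
# -> unsafe verdict citing harm-model-en (0.82) and the heuristic flag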

The app is intentionally modular: add additional models by editing `HARM_MODELS` or `URL_MODEL` in `classifier.py` and reloading.
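
For instance, the extension point could look like the sketch below. The structure and the model IDs are placeholders for illustration; check `classifier.py` for the actual values:

```python
from transformers import pipeline

# Sketch of the registry in classifier.py; the IDs below are placeholders.
HARM_MODELS = [
    "some-org/toxicity-model-en",  # English harm/toxicity
    "some-org/toxicity-model-xx",  # multilingual harm/toxicity
]
URL_MODEL = "some-org/malicious-url-model"

def load_models():
    """Build one text-classification pipeline per configured model."""
    harm = {name: pipeline("text-classification", model=name) for name in HARM_MODELS}
    url = pipeline("text-classification", model=URL_MODEL)
    return harm, url
```

Appending a Hugging Face model ID to `HARM_MODELS` and reloading is then all it takes to grow the ensemble.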

---

## How to run locally
1. Create a virtual environment and install dependencies:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt