---
title: Safe O Bot
emoji: π
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
short_description: Complete moderation tool, blocking harmful links, spam, etc.
---
# Text Safety Analyzer – multi-model pipeline

A Hugging Face Space / project template that analyzes input text for multiple safety signals:
- Harm/toxicity detection (who is harmed: author, reader, or target – via a multi-model ensemble)
- AI jailbreak / filter-bypass pattern detection (heuristics + optional model)
- Filter-obfuscation detection (homoglyphs, separators, zero-width)
- Hidden/obfuscated URL detection (heuristics + malicious-URL model)
- ASCII-art / low-entropy payload detection
This project intentionally focuses on detection and explanation. It does NOT provide ways to bypass safety protections.
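As a rough illustration of the hidden-URL heuristics listed above, a check for defanged or spaced-out URL schemes could look like the sketch below. The pattern names and rules are illustrative only; the actual regexes live in `classifier.py` and may differ.

```python
import re

# Hypothetical heuristics for hidden/obfuscated URLs (illustrative, not the
# exact rules from classifier.py).
DEFANGED = re.compile(r"\bhxxps?://|\[\.\]|\(\.\)|\s+dot\s+", re.IGNORECASE)
# Requires whitespace between letters, so plain "https://" is NOT flagged here.
SPACED_SCHEME = re.compile(r"\bh\s+t\s+t\s+p(?:\s*s)?\s*:\s*/\s*/", re.IGNORECASE)

def has_hidden_url(text: str) -> bool:
    """Flag defanged ('hxxp', '[.]', ' dot ') or spaced-out URL schemes."""
    return bool(DEFANGED.search(text) or SPACED_SCHEME.search(text))
```

Ordinary well-formed URLs deliberately pass this check; they would be handled by the separate malicious-URL model rather than the obfuscation heuristics.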
## Files

- `classifier.py` – core pipeline: normalization, heuristics, multi-model inference, aggregation, and explanations.
- `app.py` – Gradio demo, ready for Hugging Face Spaces.
- `requirements.txt` – Python dependencies.
- `examples/` – (not included by default) place labeled examples here for tuning thresholds and unit tests.
## Architecture & design
- Normalization step – homoglyph mapping, zero-width removal, whitespace collapse.
- Heuristic detectors – regex-based detection of obfuscated URLs, ASCII art, and jailbreak patterns, plus low-entropy checks.
- Model ensemble – several models can be loaded for specific tasks:
  - Harm/toxicity models (English and multilingual)
  - Malicious-URL classifier
- Aggregation & explanation – combines model outputs and heuristic flags, presenting explainable reasons with model names and scores.
The app is intentionally modular: add additional models by editing `HARM_MODELS` or `URL_MODEL` in `classifier.py` and reloading.
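The aggregation step above can be sketched as follows. The names (`Finding`, `aggregate`, the example model IDs) are hypothetical and only illustrate the shape of the logic, not the actual API in `classifier.py`.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    source: str   # model name or heuristic id, e.g. "harm_model_en"
    label: str    # signal type, e.g. "toxicity" or "obfuscated_url"
    score: float  # confidence in [0.0, 1.0]

def aggregate(findings: list[Finding], threshold: float = 0.5) -> dict:
    """Combine model outputs and heuristic flags into one explainable verdict."""
    triggered = [f for f in findings if f.score >= threshold]
    return {
        "flagged": bool(triggered),
        "max_score": max((f.score for f in findings), default=0.0),
        "reasons": [f"{f.label} ({f.source}, score={f.score:.2f})"
                    for f in triggered],
    }
```

Keeping the source name and score in every reason string is what makes the final verdict explainable: the UI can show *which* model or heuristic fired and how confident it was.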
## How to run locally

- Create a virtual environment and install the dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Start the Gradio demo with `python app.py`, then open the local URL it prints (http://127.0.0.1:7860 by default).