
---
title: Safe O Bot
emoji: 💂
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
short_description: Complete moderation tool blocking harmful links, spam, etc.
---

Text Safety Analyzer: Multi-model pipeline

A Hugging Face Space / project template that analyzes input text for multiple safety signals:

  • Harm/toxicity detection, identifying who is harmed (the author, the reader, or a target) via a multi-model ensemble
  • AI jailbreak / filter-bypass pattern detection (heuristics + optional model)
  • Filter-obfuscation detection (homoglyphs, separator characters, zero-width characters)
  • Hidden/obfuscated URL detection (heuristics + malicious-URL model)
  • ASCII-art / low-entropy payload detection
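The heuristic side of these checks can be illustrated with a small, standard-library-only sketch. The regex patterns, the flag names, and the `symbol_ratio` / `max_entropy` thresholds below are illustrative assumptions, not the rules or values used in classifier.py.

```python
import math
import re
from collections import Counter

# Illustrative patterns (assumption): "defanged" URLs written to slip past
# naive link filters, e.g. "hxxp://", "example[.]com", "example(dot)com".
OBFUSCATED_URL = re.compile(
    r"h[x*]{2}ps?://"                                      # hxxp:// or h**ps://
    r"|\w+\s*(?:\[\.\]|\(\.\)|\[dot\]|\(dot\))\s*\w{2,}",  # bracketed dots
    re.IGNORECASE,
)

# Illustrative jailbreak phrasings (assumption); real lists are longer and tuned.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(
        r"pretend\s+(?:you\s+are|to\s+be)\b.*\b(?:no|without)\s+(?:rules|restrictions|filters)",
        re.IGNORECASE,
    ),
]


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def looks_like_ascii_art(text: str, symbol_ratio: float = 0.5, max_entropy: float = 3.0) -> bool:
    """Flag multi-line blocks dominated by a few repeated non-alphanumeric characters."""
    stripped = "".join(text.split())  # drop all whitespace
    if len(stripped) < 20 or text.count("\n") < 2:
        return False
    symbols = sum(1 for ch in stripped if not ch.isalnum())
    return symbols / len(stripped) >= symbol_ratio and shannon_entropy(stripped) <= max_entropy


def heuristic_flags(text: str) -> list[str]:
    """Return the names of all heuristic detectors that fire on `text`."""
    flags = []
    if OBFUSCATED_URL.search(text):
        flags.append("obfuscated_url")
    if any(p.search(text) for p in JAILBREAK_PATTERNS):
        flags.append("jailbreak_pattern")
    if looks_like_ascii_art(text):
        flags.append("ascii_art")
    return flags
```

For example, `heuristic_flags("visit example[.]com today")` returns `["obfuscated_url"]`. Heuristics like these are cheap and explainable, which is why a pipeline of this kind would typically run them alongside, rather than instead of, the model ensemble.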

This project intentionally focuses on detection and explanation. It does NOT provide ways to bypass safety protections.


Files

  • classifier.py: core pipeline (normalization, heuristics, multi-model inference, aggregation, and explanations).
  • app.py: Gradio demo ready for Hugging Face Spaces.
  • requirements.txt: Python dependencies.
  • examples/: not included by default; place labeled examples here for tuning thresholds and writing unit tests.

Architecture & design

  1. Normalization: homoglyph mapping, zero-width character removal, and whitespace collapsing.
  2. Heuristic detectors: regex-based detection of obfuscated URLs, ASCII art, and jailbreak patterns, plus low-entropy checks.
  3. Model ensemble: several models can be loaded, each for a specific task:
    • Harm / toxicity models (English and multilingual)
    • URL malicious classifier
  4. Aggregation and explanation: model outputs and heuristic flags are combined and presented as explainable reasons, with model names and scores.
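The normalization step can be sketched with `str.translate`. The homoglyph table below is a tiny illustrative subset chosen as an assumption (a real table would cover far more characters), and the `normalize` name is hypothetical rather than taken from classifier.py.

```python
import re

# Illustrative homoglyph subset (assumption): Cyrillic/Greek letters that
# render like Latin ones. A production table would be much larger.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u03bf": "o",  # Greek small omicron
})

# Zero-width / invisible characters mapped to None, i.e. deleted by translate().
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"), None)


def normalize(text: str) -> str:
    """Remove zero-width characters, map homoglyphs, and collapse whitespace."""
    text = text.translate(ZERO_WIDTH)
    text = text.translate(HOMOGLYPHS)
    return re.sub(r"\s+", " ", text).strip()
```

With this sketch, `normalize("fr\u200bee  m\u043eney\n")` yields `"free money"`: the zero-width space and the Cyrillic "о" no longer hide the words from later keyword or model checks.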

The app is intentionally modular: to add models, edit HARM_MODELS or URL_MODEL in classifier.py and reload the app.
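One way such a registry and the aggregation step could fit together is sketched below with stub functions standing in for models. The shape of HARM_MODELS here (a dict from model name to a scoring function), the `Verdict` and `aggregate` names, and the "max score over the ensemble, 0.5 threshold" rule are all hypothetical; classifier.py may organize this differently, and in practice each entry would wrap a loaded model (for example a `transformers` text-classification pipeline). URL_MODEL could be wired in the same way.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical registry shape: model name -> function returning a harm score in [0, 1].
# The stub lambdas stand in for real models so the sketch runs offline.
Scorer = Callable[[str], float]

HARM_MODELS: dict[str, Scorer] = {
    "stub-toxicity-en": lambda t: 0.9 if "idiot" in t.lower() else 0.05,
    "stub-toxicity-multi": lambda t: 0.7 if "idiot" in t.lower() else 0.1,
}


@dataclass
class Verdict:
    flagged: bool
    score: float
    reasons: list[str] = field(default_factory=list)


def aggregate(text: str, heuristic_flags: list[str], threshold: float = 0.5) -> Verdict:
    """Combine ensemble scores and heuristic flags into one explainable verdict."""
    scores = {name: score_fn(text) for name, score_fn in HARM_MODELS.items()}
    # Max over the ensemble: one confident model is enough to flag the text.
    top_score = max(scores.values(), default=0.0)
    # Explanations name each model that crossed the threshold, with its score.
    reasons = [f"{name}: {s:.2f}" for name, s in scores.items() if s >= threshold]
    reasons += [f"heuristic: {flag}" for flag in heuristic_flags]
    return Verdict(flagged=bool(reasons), score=top_score, reasons=reasons)
```

Under this layout, adding a model is a one-line change to the registry, and `aggregate("You idiot", [])` returns a flagged verdict whose reasons list both stub models with their scores. Taking the maximum favors recall; averaging or voting would be reasonable alternatives if false positives matter more.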


How to run locally

  1. Create a virtual environment and install dependencies:
     ```bash
     python -m venv .venv
     source .venv/bin/activate
     pip install -r requirements.txt
     ```