Clickbait Detector

This model is a machine learning classifier trained to detect clickbait headlines. It uses a Random Forest algorithm with TF-IDF vectorization to classify news headlines as either "clickbait" or "real".

Model Details

Model Description

Model type: Random Forest Classifier
Task: Text Classification (Clickbait Detection)
Input: News headlines (text strings)
Output: Binary classification ("clickbait" or "real")
Language(s) covered: English
License: MIT

Model Sources

Repository: Devishetty100/clickbait-detector
Paper or resources: N/A
Demo: N/A

Uses

Direct Use

This model can be used to classify news headlines and flag potentially misleading or sensationalized ones. It can be integrated into content moderation systems, news aggregators, or educational tools to help users distinguish genuine news from clickbait.

Downstream Use

Content filtering and moderation
Journalism education
Social media analysis
Research on media manipulation

Out-of-Scope Use

This model should not be used for:

Automated content removal without human oversight
Making decisions that affect individuals' livelihoods or rights
Classifying content in languages other than English

Bias, Risks, and Limitations

Recommendations

Users should be aware that:

The model may have biases based on the training data
Performance may vary across different domains or writing styles
False positives/negatives can occur
The model is trained on English text only
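Because false positives and false negatives can occur, one option is to send low-confidence predictions to a human reviewer instead of acting on them automatically. The sketch below is illustrative only: the `triage` helper, the 0.8 threshold, and the toy training data are assumptions, not part of the published model. The toy model stands in for the downloaded artifacts so the example runs offline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the published model and vectorizer (assumption: the real
# artifacts expose the same scikit-learn interface).
toy_headlines = [
    "You won't believe what happened next",
    "10 tricks doctors don't want you to know",
    "This one weird trick will change your life",
    "What she did next will shock you",
    "Parliament passes annual budget bill",
    "Central bank holds interest rates steady",
    "Court upholds ruling in patent dispute",
    "City council approves new transit plan",
]
toy_labels = ["clickbait"] * 4 + ["real"] * 4

vectorizer = TfidfVectorizer(stop_words="english")
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(vectorizer.fit_transform(toy_headlines), toy_labels)


def triage(headlines, model, vectorizer, threshold=0.8):
    """Return (headline, label, confidence, needs_review) tuples.

    `threshold` is a hypothetical cut-off: predictions whose top class
    probability falls below it are marked for human review.
    """
    probabilities = model.predict_proba(vectorizer.transform(headlines))
    results = []
    for headline, row in zip(headlines, probabilities):
        best = row.argmax()  # index of the most probable class
        confidence = float(row[best])
        label = str(model.classes_[best])
        results.append((headline, label, confidence, confidence < threshold))
    return results


results = triage(
    ["You won't believe this weird trick", "Central bank approves budget"],
    model,
    vectorizer,
)
for headline, label, confidence, needs_review in results:
    print(f"{label:9s} {confidence:.2f} review={needs_review}  {headline}")
```

A suitable threshold depends on the cost of each error type and would need to be tuned on held-out data.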

Known Limitations

Trained on a specific dataset which may not represent all types of clickbait or real news
May not perform well on very short or very long headlines
Does not consider context beyond the headline text itself
Binary classification may not capture nuanced cases
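Some of these limitations can be checked cheaply before a headline reaches the model. The helper below is a rough sketch: `scope_warnings`, its word-count bounds, and the ASCII-letter heuristic for non-English text are all hypothetical choices, not thresholds derived from the training data.

```python
def scope_warnings(headline, min_words=3, max_words=30):
    """Return reasons a headline may fall outside the model's tested range.

    The bounds are illustrative assumptions; an empty list means no
    warning was raised, not that the prediction will be correct.
    """
    warnings = []
    words = headline.split()
    if len(words) < min_words:
        warnings.append("very short headline")
    elif len(words) > max_words:
        warnings.append("very long headline")

    # Crude language check: share of alphabetic characters that are ASCII.
    letters = [ch for ch in headline if ch.isalpha()]
    if letters:
        ascii_share = sum(ord(ch) < 128 for ch in letters) / len(letters)
        if ascii_share < 0.9:
            warnings.append("mostly non-ASCII letters, possibly not English")
    return warnings


print(scope_warnings("Wow"))
print(scope_warnings("Central bank holds interest rates steady"))
print(scope_warnings("Новости дня: парламент принял бюджет"))
```

The ASCII check cannot tell English apart from other Latin-script languages, so a proper language-identification step would be needed for anything beyond a first filter.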

Training Details

Training Data

The model was trained on the Clickbait Dataset from Kaggle, which contains news headlines labeled as clickbait or real.

Dataset size: 32,000 samples (16,000 clickbait, 16,000 real)
Data preprocessing: Text cleaning, then TF-IDF vectorization with English stop words removed and the vocabulary capped at 5,000 features
Train/test split: 80/20 stratified split (25,600 train, 6,400 test)
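The split and vectorization described above might look roughly like the following with scikit-learn. This is a sketch under assumptions: the toy headlines stand in for the Kaggle data, the split's `random_state` is a guess, and the card does not say whether the vectorizer was fitted before or after splitting (here it is fitted on the training portion only, which avoids leaking test vocabulary).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the 32,000-headline dataset.
clickbait = [f"You won't believe secret number {i} about celebrities" for i in range(10)]
real = [f"Council approves budget item {i} after public hearing" for i in range(10)]
headlines = clickbait + real
labels = ["clickbait"] * len(clickbait) + ["real"] * len(real)

# 80/20 stratified split: each class keeps the same share in both parts.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF with English stop words removed and at most 5,000 features.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X_train = vectorizer.fit_transform(X_train_text)  # learn vocabulary on train only
X_test = vectorizer.transform(X_test_text)        # reuse that vocabulary

print(len(X_train_text), len(X_test_text))
print(X_train.shape, X_test.shape)
```

Applied to the full dataset, the same 80/20 ratio gives the 25,600 and 6,400 samples quoted above.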

Training Procedure

Architecture: Random Forest with 200 estimators
Hyperparameters: Default parameters except n_estimators=200, random_state=42
Training time: [Not specified]
Hardware: [Not specified]
Software: scikit-learn, pandas, numpy
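A minimal sketch of this training procedure, using the stated hyperparameters (`n_estimators=200`, `random_state=42`) on toy data, is shown below. The toy headlines are made up, and saving with `joblib.dump` is an assumption inferred from the fact that the artifacts are loaded with `joblib.load`. The original training script is not part of this card.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy training set (the real model used 25,600 headlines).
train_headlines = [
    "You won't believe what happened next",
    "This one weird trick will change your life",
    "What she did next will shock you",
    "Parliament passes annual budget bill",
    "Central bank holds interest rates steady",
    "Court upholds ruling in patent dispute",
]
train_labels = ["clickbait"] * 3 + ["real"] * 3

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
features = vectorizer.fit_transform(train_headlines)

# All other RandomForestClassifier parameters are left at their defaults.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(features, train_labels)

print(len(model.estimators_))  # number of trees in the ensemble
print(list(model.classes_))    # class labels in the order the model uses

# Round-trip through joblib, using the same file name as the published model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "clickbait_detector.pkl")
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

same_predictions = (
    restored.predict(features).tolist() == model.predict(features).tolist()
)
```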

Evaluation

Metrics

The model achieves the following performance on the test set:

Accuracy: 91.45%
Precision: 0.92 (macro avg)
Recall: 0.91 (macro avg)
F1-Score: 0.91 (macro avg)

Testing Data, Factors & Metrics

Testing Data

Held-out test split (20% of the dataset, 6,400 headlines), drawn from the same dataset as the training data
Stratified sampling to maintain class balance

Factors

Headline length and complexity
Use of sensational language
Topic domain

Metrics

Accuracy, Precision, Recall, F1-Score
Confusion Matrix
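These metrics can all be computed with `sklearn.metrics`. The example below uses eight made-up labels rather than the real test set, so its numbers are illustrative only. In the confusion matrix, rows are true classes and columns are predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical ground truth and predictions (not the model's actual output).
y_true = ["clickbait", "clickbait", "clickbait", "clickbait", "real", "real", "real", "real"]
y_pred = ["clickbait", "clickbait", "clickbait", "real", "real", "real", "real", "clickbait"]

labels = ["clickbait", "real"]

accuracy = accuracy_score(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, labels=labels)
report = classification_report(y_true, y_pred, labels=labels, digits=2)

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
print(report)
```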

Results

              precision    recall  f1-score   support

   clickbait       0.89      0.95      0.92      3200
        real       0.94      0.88      0.91      3200

    accuracy                           0.91      6400
   macro avg       0.92      0.91      0.91      6400
weighted avg       0.92      0.91      0.91      6400
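As a quick consistency check on the table, each F1 score should equal the harmonic mean of that row's precision and recall. The snippet below recomputes it from the rounded values in the table. Small differences from the reported figures are possible elsewhere in the table, because the table itself was computed from unrounded values.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table above.
reported = {
    "clickbait": (0.89, 0.95, 0.92),
    "real": (0.94, 0.88, 0.91),
}

for label, (precision, recall, f1_reported) in reported.items():
    recomputed = f1_from(precision, recall)
    print(f"{label:9s} recomputed F1 = {recomputed:.4f}, reported = {f1_reported:.2f}")
```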

Environmental Impact

Estimated Emissions: Not calculated

Hardware Type: Standard CPU training

Hours used: [Not specified]

Technical Specifications

Model Architecture and Objective

Architecture: Ensemble of decision trees (Random Forest)
Objective: Binary classification using TF-IDF features
Input preprocessing: TF-IDF vectorization
Output postprocessing: Class prediction

Compute Infrastructure

Hardware: CPU-based training
Software: Python, scikit-learn

How to Use

Loading the Model

from huggingface_hub import hf_hub_download
import joblib

# Download model and vectorizer
model_path = hf_hub_download(repo_id="Devishetty100/clickbait-detector", filename="clickbait_detector.pkl")
vectorizer_path = hf_hub_download(repo_id="Devishetty100/clickbait-detector", filename="tfidf_vectorizer.pkl")

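# Load the pickled objects. joblib.load unpickles arbitrary Python objects,
# so only load files from a repository you trust.
model = joblib.load(model_path)
vectorizer = joblib.load(vectorizer_path)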

Making Predictions

# Example headline
headline = "You won't believe what happened next!"

# Transform and predict
features = vectorizer.transform([headline])
prediction = model.predict(features)[0]

print(f"Prediction: {prediction}")  # Output: 'clickbait' or 'real'

Requirements

Python 3.6+
scikit-learn
joblib
huggingface_hub

Citation

If you use this model, please cite:

@misc{clickbait-detector,
  title={Clickbait Detector},
  author={Devishetty100},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Devishetty100/clickbait-detector}
}

Contact

For questions or issues, please open an issue on the repository.
