---
license: mit
language:
- en
base_model:
- distilbert/distilbert-base-uncased
datasets:
- cwinkler/patents_green_plastics
---
|
|
# Model Card for `aslan-ng/lora-green-patents` |
|
|
|
|
|
This model classifies patents and product descriptions as green (eco-friendly) or not green. It was fine-tuned with LoRA on a binary-labeled dataset of patent descriptions.
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
- Developed by: Jennifer Evans, Aslan Noorghasemi
- Model type: Text classifier (binary classification)
- Language (NLP): English
- Finetuned from model: distilbert/distilbert-base-uncased, loaded as a DistilBertForSequenceClassification and adapted with LoRA
|
|
### Model Sources

- Training dataset: https://huggingface.co/datasets/cwinkler/patents_green_plastics

### Training Details

- Train/test split: 80/20
- LoRA alpha: 16
- LoRA dropout: 0.1
- Eval steps: 500
- Learning rate: 2e-4
- Training epochs: 10
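For reference, the hyperparameters above can be expressed as a PEFT/Transformers configuration. This is a reconstruction from the card, not the original training script: the LoRA rank `r`, the target modules, and the output directory are not reported here, so the values below are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Reconstructed from the hyperparameters listed above.
lora_config = LoraConfig(
    task_type="SEQ_CLS",   # sequence classification
    lora_alpha=16,
    lora_dropout=0.1,
    r=8,                   # assumed; the rank is not stated in this card
)

training_args = TrainingArguments(
    output_dir="lora-green-patents",  # assumed; not stated in this card
    learning_rate=2e-4,
    num_train_epochs=10,
    eval_steps=500,
)
```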
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
Use this model to classify whether input text is considered green (eco-friendly) or not. It takes patent or product descriptions as text inputs and returns a predicted binary label and probabilities. |
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
It can be incorporated into larger text-evaluation systems (e.g., patent and product analysis pipelines) as a pre-screening classifier.
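A minimal sketch of that pre-screening role; the `green_prob` callable is a hypothetical stand-in for this model's predicted probability of the green class:

```python
def prescreen(descriptions, green_prob, threshold=0.5):
    """Keep only descriptions whose predicted green probability clears the threshold.

    `green_prob` is any callable mapping text -> probability of the green class,
    e.g. a wrapper around this model's softmaxed logits.
    """
    return [d for d in descriptions if green_prob(d) >= threshold]
```

Items below the threshold can then be dropped or routed to a human reviewer.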
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
Not intended for: |
|
|
- Safety-critical deployment without further validation. |
|
|
- Predicting labels other than green / not green.
|
|
- Applications outside of evaluating patent and product descriptions. |
|
|
|
|
|
### Bias, Risks, and Limitations |
|
|
|
|
|
The model was trained on a single, domain-specific dataset. It may:

- Misclassify unusual or ambiguous eco-friendly descriptions.
- Perform poorly on non-U.S. descriptions if similar text was absent from the training data.
- Inherit any biases present in the training text.
|
|
|
|
|
### Recommendations |
|
|
|
|
|
Always test on your target data before deployment. Combine with additional checks in safety-critical scenarios. |
|
|
|
|
|
### How to Get Started with the Model |
|
|
The model is used by loading both the base model and the LoRA adapter:

- Base model name: "distilbert-base-uncased"
- Tokenizer: AutoTokenizer.from_pretrained(REPO_ID_LORA_GREEN_PATENTS)
- Base model: AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
- Model: PeftModel.from_pretrained(base_model, REPO_ID_LORA_GREEN_PATENTS)
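Putting those steps together, a minimal sketch. It assumes `REPO_ID_LORA_GREEN_PATENTS` is this repository's id, and which label index corresponds to "green" should be verified against the training dataset:

```python
import math

REPO_ID_LORA_GREEN_PATENTS = "aslan-ng/lora-green-patents"  # assumed: this repository

def load_model():
    # Local imports so the helpers can be defined without transformers/peft installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID_LORA_GREEN_PATENTS)
    base_model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )
    model = PeftModel.from_pretrained(base_model, REPO_ID_LORA_GREEN_PATENTS)
    return tokenizer, model

def classify(text, tokenizer, model):
    # Forward pass, then a plain-Python softmax over the two logits.
    logits = model(**tokenizer(text, return_tensors="pt", truncation=True)).logits[0].tolist()
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return max(range(len(probs)), key=probs.__getitem__), probs

# Usage:
# tokenizer, model = load_model()
# label, probs = classify("A biodegradable polymer blend for packaging...", tokenizer, model)
```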