---
license: mit
language:
- en
base_model:
- distilbert/distilbert-base-uncased
datasets:
- cwinkler/patents_green_plastics
---
|
|
# Model Card for `aslan-ng/lora-green-patents` |
|
|
|
|
|
This model classifies patents and product descriptions as green (eco-friendly) or not green. It was fine-tuned with LoRA on a binary-labeled dataset of patent descriptions.
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
- Developed by: Jennifer Evans, Aslan Noorghasemi
- Model type: Text classifier (binary classification)
- Language (NLP): English
- Finetuned from model: distilbert/distilbert-base-uncased, loaded as a DistilBertForSequenceClassification and adapted with LoRA
|
|
### Model Sources

- Training dataset: https://huggingface.co/datasets/cwinkler/patents_green_plastics

### Training Details

- Train/test split: 80/20
- LoRA alpha: 16
- LoRA dropout: 0.1
- Eval steps: 500
- Learning rate: 2e-4
- Training epochs: 10
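For reference, the hyperparameters above can be expressed as a PEFT/Transformers configuration. This is a reconstruction from the card, not the original training script: the LoRA rank `r`, the target modules, and the output directory are not reported here, so the values below are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Reconstructed from the hyperparameters listed above.
lora_config = LoraConfig(
    task_type="SEQ_CLS",   # sequence classification
    lora_alpha=16,
    lora_dropout=0.1,
    r=8,                   # assumed; the rank is not stated in this card
)

training_args = TrainingArguments(
    output_dir="lora-green-patents",  # assumed; not stated in this card
    learning_rate=2e-4,
    num_train_epochs=10,
    eval_steps=500,
)
```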
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
Use this model to classify whether input text is considered green (eco-friendly) or not. It takes patent or product descriptions as text inputs and returns a predicted binary label and probabilities. |
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
It can be incorporated into larger text-evaluation systems (e.g., patent and product analysis pipelines) as a pre-screening classifier.
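A minimal sketch of that pre-screening role; the `green_prob` callable is a hypothetical stand-in for this model's predicted probability of the green class:

```python
def prescreen(descriptions, green_prob, threshold=0.5):
    """Keep only descriptions whose predicted green probability clears the threshold.

    `green_prob` is any callable mapping text -> probability of the green class,
    e.g. a wrapper around this model's softmaxed logits.
    """
    return [d for d in descriptions if green_prob(d) >= threshold]
```

Items below the threshold can then be dropped or routed to a human reviewer.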
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
Not intended for: |
|
|
- Safety-critical deployment without further validation. |
|
|
- Predicting labels other than green / not green.
|
|
- Applications outside of evaluating patent and product descriptions. |
|
|
|
|
|
### Bias, Risks, and Limitations |
|
|
|
|
|
The model was trained on a single, domain-specific dataset. It may:

- Misclassify unusual or ambiguous eco-friendly descriptions.
- Perform poorly on non-U.S. descriptions if similar text was absent from the training data.
- Inherit any biases present in the training text.
|
|
|
|
|
### Recommendations |
|
|
|
|
|
Always test on your target data before deployment. Combine with additional checks in safety-critical scenarios. |
|
|
|
|
|
### How to Get Started with the Model |
|
|
The model is used by loading both the base model and the LoRA adapter:

- Base model name: "distilbert-base-uncased"
- Tokenizer: AutoTokenizer.from_pretrained(REPO_ID_LORA_GREEN_PATENTS)
- Base model: AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
- Model: PeftModel.from_pretrained(base_model, REPO_ID_LORA_GREEN_PATENTS)
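Putting those steps together, a minimal sketch. It assumes `REPO_ID_LORA_GREEN_PATENTS` is this repository's id, and which label index corresponds to "green" should be verified against the training dataset:

```python
import math

REPO_ID_LORA_GREEN_PATENTS = "aslan-ng/lora-green-patents"  # assumed: this repository

def load_model():
    # Local imports so the helpers can be defined without transformers/peft installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID_LORA_GREEN_PATENTS)
    base_model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )
    model = PeftModel.from_pretrained(base_model, REPO_ID_LORA_GREEN_PATENTS)
    return tokenizer, model

def classify(text, tokenizer, model):
    # Forward pass, then a plain-Python softmax over the two logits.
    logits = model(**tokenizer(text, return_tensors="pt", truncation=True)).logits[0].tolist()
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    return max(range(len(probs)), key=probs.__getitem__), probs

# Usage:
# tokenizer, model = load_model()
# label, probs = classify("A biodegradable polymer blend for packaging...", tokenizer, model)
```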