LFM2-VL GUI Assistant

This model is a fine-tuned version of LiquidAI/LFM2-VL-450M on the maharshpatelx/realGUI-800K dataset.

Model Description

A vision-language model specialized in GUI understanding and automation tasks. The model can analyze screenshots and provide guidance on GUI interactions, element identification, and navigation tasks.

Training Details

  • Base Model: LiquidAI/LFM2-VL-450M
  • Dataset: maharshpatelx/realGUI-800K
  • Training Method: Supervised Fine-Tuning (SFT) with LoRA
  • LoRA Config: r=8, alpha=16, dropout=0.05
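For reference, these LoRA settings mean the following. Each adapted weight W stays frozen and learns a rank-r update B·A. That update is scaled by alpha/r, which is 16/8 = 2 here, and dropout is applied to the adapter's input during training. The pure-Python sketch below illustrates only that arithmetic. It is not the training code used for this model. The layer sizes and matrices are hypothetical, because the card does not say which modules were adapted.

```python
# Sketch of what r=8, alpha=16, dropout=0.05 mean for one adapted linear layer.
# Pure Python; the matrices and layer sizes used below are hypothetical,
# not taken from LFM2-VL-450M.
import random

R = 8
ALPHA = 16
DROPOUT = 0.05


def lora_scaling(r: int, alpha: int) -> float:
    """LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r; the frozen W (d_out x d_in) is not trained."""
    return r * d_in + d_out * r


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, r, alpha, dropout=0.0, training=False, rng=None):
    """y = W x + (alpha / r) * B (A dropout(x)); dropout is only active in training."""
    base = matvec(W, x)  # frozen pretrained path
    if training and dropout > 0:
        rng = rng or random.Random(0)
        keep = 1.0 - dropout
        # Inverted dropout: zero some inputs, rescale the survivors by 1 / keep.
        x_drop = [xi / keep if rng.random() >= dropout else 0.0 for xi in x]
    else:
        x_drop = x
    update = matvec(B, matvec(A, x_drop))  # low-rank adapter path
    s = lora_scaling(r, alpha)
    return [b + s * u for b, u in zip(base, update)]


if __name__ == "__main__":
    print(lora_scaling(R, ALPHA))  # 2.0
    # Hypothetical 1024 x 1024 projection: full weight vs. LoRA adapter size.
    print(1024 * 1024, lora_trainable_params(1024, 1024, R))  # 1048576 16384
```

With r=8, the adapter for a 1024 x 1024 projection has about 1.6% of the parameters of the full weight. This ratio is why LoRA fine-tuning of the 450M base model is comparatively cheap.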

Usage

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("maharshpatelx/lfm2-vl-gui-sft", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "maharshpatelx/lfm2-vl-gui-sft", 
    device_map="auto", 
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Prepare input
image = Image.open("screenshot.png").convert('RGB')
conversation = [
    {"role": "system", "content": [
        {"type": "text", "text": "You are a GUI automation assistant specialized in understanding user interfaces and providing guidance on GUI interactions. Analyze the screenshot and provide accurate responses about GUI elements, actions, or navigation tasks."}
    ]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "what is bbox location of 'google chrome'?"}
    ]}
]

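# Generate response (move the tokenized inputs to the model's device)
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = processor.batch_decode(outputs[:, prompt_length:], skip_special_tokens=True)[0]
print(response)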
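The usage example asks for a bounding box but only prints the raw text. Automation usually needs numeric coordinates. The helpers below are a hypothetical sketch, because the card does not document the response format. `parse_bbox` assumes the answer contains four comma-separated numbers in brackets or parentheses. The caller decides whether those numbers are pixels or normalized [0, 1] values by setting the `normalized` flag. Check a few real outputs before relying on either assumption.

```python
# Hypothetical post-processing helpers; the model's bbox output format is assumed,
# not documented in this card.
import re
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]

_NUM = r"(-?\d+(?:\.\d+)?)"
# Four numbers inside [...] or (...), e.g. "[120, 40, 180, 90]".
_BOX = re.compile(r"[\[(]\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*[\])]")


def parse_bbox(text: str) -> Optional[BBox]:
    """Return the first bracketed (x1, y1, x2, y2) box found in text, or None."""
    match = _BOX.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    # Normalize ordering so (x1, y1) is the top-left corner.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def to_pixels(box: BBox, width: int, height: int, normalized: bool) -> BBox:
    """Scale a [0, 1]-normalized box to pixel coordinates; no-op if already in pixels."""
    if not normalized:
        return box
    x1, y1, x2, y2 = box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


def bbox_center(box: BBox) -> Tuple[float, float]:
    """Center point of a box, e.g. as a click target."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

Using `response` and `image` from the usage example, `box = parse_bbox(response)` followed by `bbox_center(to_pixels(box, *image.size, normalized=False))` gives a click point when a box was found. `image.size` is `(width, height)` for a PIL image.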

Intended Use

This model is designed for:

  • GUI automation and testing
  • User interface analysis
  • Accessibility assistance
  • Educational purposes in HCI research

Limitations

  • Performance may vary on UI designs significantly different from training data
  • May not generalize well to non-English interfaces
  • Should not be used for malicious automation or unauthorized access

Model Format

  • Format: Safetensors
  • Model size: 0.5B params
  • Tensor type: F32