microsoft/SWE-Sharp-Bench
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs