Hi, I’m Boyu, a Ph.D. student at The Ohio State University. I’m fortunate to be co-advised by Prof. Yu Su and Prof. Huan Sun at the OSU NLP Group.
My research interests lie in Language Agents, with a specific focus on autonomous GUI agents that can use computers or browse the web.
We built the first multimodal web agent with all code open-sourced (SeeAct, Multimodal-Mind2Web). After that, we proposed building purely vision-based GUI agents that operate as humans do and demonstrated their power despite their minimal design (UGround). We also built benchmarks to better assess cutting-edge web agents on short- to medium-horizon tasks (Online-Mind2Web) and on long-horizon agentic web search tasks (Mind2Web 2), introducing novel LLM-as-a-Judge and rubric-based Agent-as-a-Judge methods.
If you are interested in a research internship or collaboration, or just a short chat, feel free to email me or reach out on LinkedIn/Twitter to discuss ideas.
Teaching Assistant at ShanghaiTech University: