Mingyu Ouyang

Ph.D. Student in Computer Science, National University of Singapore

I am a second-year Ph.D. student in Computer Science at the National University of Singapore, supervised by Prof. Hwee Tou Ng and Prof. Mike Zheng Shou. I completed my Master's in Data Science and Machine Learning at NUS, and obtained my Bachelor's degree from Wuhan University.

I am affiliated with ShowLab, NUS and NUSNLP, where I work on multimodal and GUI agent research. During my undergraduate years, I worked with IIP Lab under the supervision of Prof. Zhenzhong Chen. I previously interned at SenseTime Research Singapore and Tencent.

I am interested in the interaction and productivity gains enabled by multimodal digital agents. I have built computer-use/GUI agents, with recent work focusing on UI perception and grounding. Currently I work on game agents, as games offer an ideal environment with rich visual observations, long-horizon goals, and real-time interaction. Improvements in fine-grained perception, planning, and precise control will benefit the future embodiment of digital agents.

[Opportunity Note] I am currently looking for a research internship on multimodal agents (including computer-use agents, game agents, etc.) starting in summer 2026. Feel free to contact me!

Email: yyyangwhu@gmail.com


Updates
  • [2026.04] We released GameWorld, a standardized and verifiable benchmark for multimodal game agents in browser environments. Check: [Project Page] [Technical Report].
  • [2026.03] Our GUI perception + grounding work FocusUI was accepted by CVPR 2026. See you in Denver!
  • [2025.09] Personalized slide generation work SlideTailor was accepted by AAAI 2026.
  • [2025.01] Started Ph.D. in School of Computing, NUS.
  • [2024.06] JPEG coefficient recovery work DCTransformer was published in IEEE TIP.
  • [2024.03] AssistGUI was accepted by CVPR 2024.
  • [2023.09] I joined ShowLab, NUS and started my internship at SenseTime Research Singapore.
  • [2023.08] Started my Master's studies in Data Science and Machine Learning at Faculty of Science, NUS.

Selected Publications (*equal contribution)
TR '26 GameWorld teaser

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Mingyu Ouyang*, Siyuan Hu*, Kevin Qinghong Lin, Hwee Tou Ng, Mike Zheng Shou
Technical Report 2026

Standardizes multimodal game-agent evaluation across 34 browser games and 170 tasks with paused sandbox execution, dual agent interfaces, and deterministic state-verifiable scoring.

CVPR '26 FocusUI teaser

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng
CVPR 2026

Selects instruction-relevant UI tokens while preserving positional continuity, improving grounding accuracy and efficiency under aggressive token reduction.

AAAI '26 SlideTailor teaser

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
Wenzheng Zeng*, Mingyu Ouyang*, Langyuan Cui*, Hwee Tou Ng
AAAI 2026

Generates user-aligned editable slides from papers using example-based preference conditioning, visual templates, and a chain-of-speech planning mechanism.

arXiv '24 The Dawn of GUI Agent teaser

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use (Proj. Computer-use OOTB)
Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou
Technical Report 2024

1.9k stars on GitHub. An out-of-the-box framework for desktop GUI agents, supporting both Windows and macOS.

CVPR '24 AssistGUI teaser

AssistGUI: Task-Oriented PC Graphical User Interface Automation
D. Gao, L. Ji, Z. Bai, M. Ouyang, P. Li, D. Mao, Q. Wu, W. Zhang, P. Wang, X. Guo, H. Wang, L. Zhou, Mike Z. Shou
CVPR 2024

A Windows GUI agent framework for task decomposition, GUI parsing, action generation, and reflection.

TIP '24 DCTransformer teaser

JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer
Mingyu Ouyang, Zhenzhong Chen
IEEE Transactions on Image Processing 2024

Introduces DCTransformer to recover JPEG-quantized DCT coefficients with joint spatial-frequency modeling across varied compression quality factors.


Education
National University of Singapore
Ph.D. in Computer Science
National University of Singapore
M.Sc. in Data Science and Machine Learning
Wuhan University
B.Sc. in Geospatial Informatics and Digitalized Technology. First-class Undergraduate Scholarship.

Experience
UseIt AI
Co-founded a startup focusing on creating, editing, and deploying personal agents. Led GUI agent development and agent workflow orchestration.
SenseTime Research Singapore
Developed efficient low-level deep learning models for RAW image processing, including noise synthesis and learned ISP modules for camera and phone systems.
Tencent Technology
Implemented GPS data processing on Hadoop and Spark, and maintained an internal service handling over 10 billion callbacks per hour.

Miscellaneous
  • I enjoy football (Visca Barça! 💙❤️). I played for and served as captain of the Wuhan University football team. We won the championship at the 16th Hubei Province Games (2022). "Salid y disfrutad."