In recent years, the development of multimodal large language models has given rise to a new class of intelligent agents, GUI agents, that can autonomously operate computers and smartphones through natural language, enabling automated office tasks and cross-application task execution. However, existing methods reason through every action step by step, which incurs high computational cost and low execution efficiency, especially on repetitive tasks.
To overcome this bottleneck, we introduce AppAgent X, an evolutionary GUI agent framework. Unlike traditional agents that repeatedly reason through each step, AppAgent X learns from its own operational experience, continuously generalizing and optimizing efficient behavior patterns. This enables the agent to become more efficient and intelligent with use.
This presentation will cover the core mechanisms of AppAgent X, experimental results, and its potential applications in the field of intelligent agents.
Speaker Bio
Dr. Chi Zhang received his Ph.D. from the School of Computer Science and Engineering at Nanyang Technological University, Singapore. In 2024, he joined the School of Engineering at Westlake University as an Assistant Professor (PI, Ph.D. Supervisor) and founded the Artificial General Intelligence (AGI) Lab. Prior to this, he worked as a Research Scientist at Tencent from 2022 to 2024.
Dr. Zhang’s research focuses on multimodal models and generative artificial intelligence (GenAI). To date, he has published over 30 papers in top-tier AI conferences and journals, including CVPR, ICCV, NeurIPS, and TPAMI. He was named among the World’s Top 2% Scientists by Stanford University in both 2023 and 2024.
More Details
- When: Wednesday, 2 April 2025, 1-2 pm (Brisbane time)
- Speaker: Prof Chi Zhang (Westlake University)
- Host: Dr Yujun Cai
- Zoom: https://uqz.zoom.us/j/88065580162