All Talks

AppAgent X—Making GUI Agents Smarter with Use

In recent years, the development of multimodal large language models has given rise to a new class of intelligent agents—GUI Agents—that can autonomously operate computers and smartph...

From Westlake University, Apr 02, 2025

Using Large Language Models for Cross-Language Information Access

One interesting aspect of today’s generative Large Language Models (LLMs) is that they are natural polyglots, facile in many languages. These new multi-dexterous capabilities offer ...

From University of Maryland, Mar 28, 2025

Leveraging semantics for recommendation at scale

In this talk, we present some of our recent work conducted at Amazon International Machine Learning Australia. First, we present a simple approach to address cold-start recommendation...

From Amazon, Mar 26, 2025

Towards Multimodal Intelligence: Bridging Vision, Language, and Large-Scale Models

Multimodal intelligence is revolutionizing document understanding by enabling AI to process and reason across vision and language. This talk explores how large-scale models integrate ...

From Adobe, Mar 07, 2025

From Next Token Prediction to Compliant AI Assistants: A Systematic Path toward Trustworthy Large Language Models

“Language models are systems that can predict upcoming words” - this classical definition of NLP models forms the basis of LLMs becoming responsive text completion models. However, suc...

From UC Merced, Feb 28, 2025
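
As a minimal illustration of the “predict upcoming words” framing in this abstract, the sketch below queries a language model for its next-token distribution over a prompt. GPT-2 and the Hugging Face transformers library are illustrative assumptions here, not the models or systems discussed in the talk.

    # A minimal sketch of next-token prediction, assuming the Hugging Face
    # transformers library and GPT-2 purely for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "Language models are systems that can predict"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # Distribution over the vocabulary for the word that comes next.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(idx))!r}: {prob.item():.3f}")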

No.25-01 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Exciting models have been developed in multimodal video understanding and generation, such as video LLMs and video diffusion models. One emerging pathway to the ultimate intelligence is...

From NUS, Feb 27, 2025

No.24-20 Controllable Visual Synthesis via Structural Representation

End-to-end neural approaches have revolutionized visual generation, producing stunning outputs from natural language prompts. However, precise controls remain challenging through dire...

From Stanford, Dec 13, 2024

No.24-19 When LLMs Meet Recommendations: Scalable Hybrid Approaches to Enhance User Experiences

While LLMs offer powerful reasoning and generalization capabilities for user understanding and long-term planning in recommendation systems, their latency and cost hinder direct appli...

From Deepmind, Dec 09, 2024

No.24-18 Developing Effective Long-Context Language Models

In this talk, I will share our journey behind developing an effective long-context language model. I’ll begin by introducing our initial approach of using parallel context encoding (C...

From Princeton, Dec 04, 2024
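
The method name in this abstract is cut off, so the sketch below only illustrates the general idea of parallel context encoding under assumed details: split a long input into chunks, encode each chunk independently (hence in parallel), and concatenate the chunk states into a memory that a decoder could attend over. The encoder checkpoint and chunk size are placeholders, not the configuration described in the talk.

    # A conceptual sketch of parallel context encoding with placeholder choices.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    long_text = "Some document far longer than a single encoder window ..."
    chunk_size = 512

    ids = tokenizer(long_text, return_tensors="pt")["input_ids"][0]
    chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]

    with torch.no_grad():
        # Each chunk is encoded on its own; batching these calls makes them parallel.
        chunk_states = [encoder(c.unsqueeze(0)).last_hidden_state for c in chunks]

    memory = torch.cat(chunk_states, dim=1)  # (1, total_tokens, hidden_size)
    print(memory.shape)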

No.24-17 Mitigating Distribution Shifts in Using Pre-trained Vision-Language Models

Benefiting from large-scale image-text pair datasets, powerful pre-trained vision-language models (VLMs, such as CLIP) enable many real-world applications, e.g., zero-shot classificat...

From UniMelb, Dec 02, 2024
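
As a small, hedged illustration of the zero-shot classification that this abstract attributes to CLIP-style VLMs, the sketch below scores an image against a handful of candidate label prompts using a pre-trained CLIP checkpoint from Hugging Face transformers; the label prompts and image path are placeholders, not data from the talk.

    # A minimal sketch of zero-shot classification with a pre-trained CLIP checkpoint.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
    image = Image.open("example.jpg")  # placeholder image

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)

    # Image-text similarity scores, normalized into a distribution over the labels.
    probs = outputs.logits_per_image.softmax(dim=-1)
    for label, p in zip(labels, probs[0]):
        print(f"{label}: {p.item():.3f}")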
