AppAgent X—Making GUI Agents Smarter with Use
In recent years, the development of multimodal large language models has given rise to a new class of intelligent agents—GUI Agents—that can autonomously operate computers and smartphones...
From Westlake University, Apr 02, 2025

Using Large Language Models for Cross-Language Information Access
One interesting aspect of today’s generative Large Language Models (LLMs) is that they are natural polyglots, facile in many languages. These new multi-dexterous capabilities offer ...
From University of Maryland, Mar 28, 2025

Leveraging semantics for recommendation at scale
In this talk, we present some of our recent work conducted at Amazon International Machine Learning Australia. First, we describe a simple approach to address cold-start recommendation...
From Amazon, Mar 26, 2025

Towards Multimodal Intelligence: Bridging Vision, Language, and Large-Scale Models
Multimodal intelligence is revolutionizing document understanding by enabling AI to process and reason across vision and language. This talk explores how large-scale models integrate ...
From Adobe, Mar 07, 2025

From Next Token Prediction to Compliant AI Assistants: A Systematic Path toward Trustworthy Large Language Models
“Language models are systems that can predict upcoming words” - this classical definition of NLP models forms the basis of LLMs becoming responsive text completion models. However, suc...
From UC Merced, Feb 28, 2025

No.25-01 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Exciting models have been developed in multimodal video understanding and generation, such as video LLMs and video diffusion models. One emerging pathway to the ultimate intelligence is...
From NUS, Feb 27, 2025

No.24-20 Controllable Visual Synthesis via Structural Representation
End-to-end neural approaches have revolutionized visual generation, producing stunning outputs from natural language prompts. However, precise controls remain challenging through dire...
From Stanford, Dec 13, 2024

No.24-19 When LLMs Meet Recommendations: Scalable Hybrid Approaches to Enhance User Experiences
While LLMs offer powerful reasoning and generalization capabilities for user understanding and long-term planning in recommendation systems, their latency and cost hinder direct appli...
From DeepMind, Dec 09, 2024

No.24-18 Developing Effective Long-Context Language Models
In this talk, I will share our journey behind developing an effective long-context language model. I’ll begin by introducing our initial approach of using parallel context encoding (C...
From Princeton, Dec 04, 2024

No.24-17 Mitigating Distribution Shifts in Using Pre-trained Vision-Language Models
Benefiting from large-scale image-text pair datasets, powerful pre-trained vision-language models (VLMs, such as CLIP) enable many real-world applications, e.g., zero-shot classificat...
From UniMelb, Dec 02, 2024