All Talks
Would prompt work for graph learning? An exploration of few-shot learning on graphs
Graph structures are prevalent across a variety of fields, including social networks, e-commerce, transportation, and biological systems. Within these graphs, numerous analytical and ...
From Singapore Management University (SMU), Apr 17, 2025
Towards Knowledgeable Foundation Models
Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance on knowledge reasoning tasks, owing to their implicit knowledge derived from ex...
From University of Illinois Urbana-Champaign, Apr 10, 2025
AppAgent X—Making GUI Agents Smarter with Use
In recent years, the development of multimodal large language models has given rise to a new class of intelligent agents—GUI Agents—that can autonomously operate computers and smartph...
From Westlake University, Apr 02, 2025
Using Large Language Models for Cross-Language Information Access
One interesting aspect of today’s generative Large Language Models (LLMs) is that they are natural polyglots, facile in many languages. These new multi-dexterous capabilities offer ...
From University of Maryland, Mar 28, 2025
Leveraging semantics for recommendation at scale
In this talk, we present some of our recent work conducted at Amazon International Machine Learning Australia. First, we present a simple approach to address cold-start recommendation...
From Amazon, Mar 26, 2025
Towards Multimodal Intelligence: Bridging Vision, Language, and Large-Scale Models
Multimodal intelligence is revolutionizing document understanding by enabling AI to process and reason across vision and language. This talk explores how large-scale models integrate ...
From Adobe, Mar 07, 2025
From Next Token Prediction to Compliant AI Assistants: A Systematic Path toward Trustworthy Large Language Models
“Language models are systems that can predict upcoming words” - this classical definition of NLP models forms the basis of LLMs becoming responsive text completion models. However, suc...
From UC Merced, Feb 28, 2025
No.25-01 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Exciting models have been developed in multimodal video understanding and generation, such as video LLM and video diffusion model. One emerging pathway to the ultimate intelligence is...
From NUS, Feb 27, 2025
No.24-20 Controllable Visual Synthesis via Structural Representation
End-to-end neural approaches have revolutionized visual generation, producing stunning outputs from natural language prompts. However, precise controls remain challenging through dire...
From Stanford, Dec 13, 2024
No.24-19 When LLMs Meet Recommendations: Scalable Hybrid Approaches to Enhance User Experiences
While LLMs offer powerful reasoning and generalization capabilities for user understanding and long-term planning in recommendation systems, their latency and cost hinder direct appli...
From DeepMind, Dec 09, 2024