All Talks

Towards Multimodal Intelligence: Bridging Vision, Language, and Large-Scale Models

Multimodal intelligence is revolutionizing document understanding by enabling AI to process and reason across vision and language. This talk explores how large-scale models integrate ...

From Adobe, Mar 07, 2025

From Next Token Prediction to Compliant AI Assistants: A Systematic Path toward Trustworthy Large Language Models

“Language models are systems that can predict upcoming words” - this classical definition of NLP models forms the basis of LLMs becoming responsive text completion models. However, suc...

From UC Merced, Feb 28, 2025

No.25-01 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Exciting models have been developed in multimodal video understanding and generation, such as video LLMs and video diffusion models. One emerging pathway to the ultimate intelligence is...

From NUS, Feb 27, 2025

No.24-20 Controllable Visual Synthesis via Structural Representation

End-to-end neural approaches have revolutionized visual generation, producing stunning outputs from natural language prompts. However, precise controls remain challenging through dire...

From Stanford, Dec 13, 2024

No.24-19 When LLMs Meet Recommendations: Scalable Hybrid Approaches to Enhance User Experiences

While LLMs offer powerful reasoning and generalization capabilities for user understanding and long-term planning in recommendation systems, their latency and cost hinder direct appli...

From DeepMind, Dec 09, 2024

No.24-18 Developing Effective Long-Context Language Models

In this talk, I will share our journey behind developing an effective long-context language model. I’ll begin by introducing our initial approach of using parallel context encoding (C...

From Princeton, Dec 04, 2024

No.24-17 Mitigating Distribution Shifts in Using Pre-trained Vision-Language Models

Benefiting from large-scale image-text pair datasets, powerful pre-trained vision-language models (VLMs, such as CLIP) enable many real-world applications, e.g., zero-shot classificat...

From UniMelb, Dec 02, 2024

No.24-16 Towards Graph Machine Learning in the Wild

Learning on graphs is a long-standing and fundamental challenge in machine learning, and recent work has demonstrated solid progress in this area. However, most existing models tacit...

From MIT, Nov 27, 2024

No.24-15 Evaluation and Reasoning in Real-world Scenarios

User queries in natural settings, such as “provide a design for a disk topology for a NAS built on TrueNAS Scale, as well as a dataset layout,” differ significantly from those produce...

From Cornell University, Nov 20, 2024

No.24-14 Long-Range Meets Scalability: Unveiling a Linear-Time Graph Neural Network for Recommendation at Scale

Recommender systems play a central role in shaping our daily digital experiences, yet achieving both scalability and expressive power remains a significant challenge. While Graph Neur...

From Pennsylvania State University, Nov 13, 2024
