
No.24-05 Efficient and Elastic Large Models

May 17, 2024

Generative LLMs are transforming multiple industries and have proven robust across a multitude of use cases and settings. Two key impediments to their widespread deployment are the cost of serving them and the difficulty of deploying them across different devices and settings. In this talk, we will discuss the key challenges in improving the efficiency of LLM serving and then give an overview of several techniques that address them. In particular, we will discuss tandem transformers and HIRE, two techniques for speeding up decoding in LLMs.

Speaker Bio

Prateek Jain is a Principal Scientist at Google Research India, where he is also the director of Machine Learning and Optimization. He obtained his doctorate from UT Austin and his BTech from IIT Kanpur. He has conducted foundational research on efficient and elastic large models as well as on large-scale and non-convex optimization. Prateek regularly serves on the senior PC of top ML conferences and is on the editorial boards of top ML journals, including JMLR and SIMODS. He has won multiple best paper awards, including the 2020 Best Paper by IEEE Signal Processing Society. He also received the Young Alumnus Award from IIT Kanpur in 2021 and the ACM India Early Career Researcher Award in 2022.

More Details

  • When: Fri 17 May 2024, 3:00 - 4:30 pm (GMT+10)
  • Speaker: Dr Prateek Jain (Google Research India)
  • Host: Dr Mahsa Baktashmotlagh & Prof Guido Zuccon
  • Venue: 50-N201 - Hawken Engineering Building
  • Zoom: Physical only