Train and Deploy High-Performing AI Model Development at Scale

In this session, we will explore the end-to-end workflow of managing foundation model (FM) development on Amazon SageMaker HyperPod. The discussion covers distributed model training and inference with frameworks such as PyTorch and KubeRay. We will also dive into operational aspects, including system observability and the resiliency features that support scale and cost-performance when using Amazon EKS on SageMaker HyperPod. By the end of this hands-on session, you will understand how to train and deploy FMs efficiently on AWS, and how to apply these techniques and tools to achieve high-performance, reliable, and scalable FM development.

Location: Room 206

Duration: 1 hour

Speaker(s):

Mark Vinciguerra, Assoc. WW Solution Architect, AWS GenAI

Aravind Neelakantan, WW Solution Architect, AWS GenAI

Aman Shanbhag, WW Solution Architect, AWS GenAI

Aman Shanbhag is a Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners deploy ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.

Session Type:

Workshop