In this session, we will explore the end-to-end workflow for foundation model (FM) development on Amazon SageMaker HyperPod. We will cover both distributed model training and inference using frameworks such as PyTorch and KubeRay. We will also dive into operational aspects, including system observability and the resiliency features that support scale and cost-performance when using Amazon EKS on SageMaker HyperPod. By the end of this hands-on session, you will have a solid understanding of how to train and deploy FMs efficiently on AWS, and you will learn techniques and tools for high-performance, reliable, and scalable FM development.
Location: Room 206
Duration: 1 hour

Mark Vinciguerra

Aravind Neelakantan

Aman Shanbhag
Aman Shanbhag is a Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners deploy ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.