| Page 36 | Kisaco Research

In this session, we will explore the end-to-end workflow of managing foundation model (FM) development on Amazon SageMaker HyperPod. Our discussion will cover both distributed model training and inference using frameworks such as PyTorch and KubeRay. We will also dive into operational aspects, including system observability and the resiliency features that support scale and cost-performance with Amazon EKS on SageMaker HyperPod. By the end of this hands-on session, you will have a robust understanding of training and deploying FMs efficiently on AWS, and you will learn to leverage cutting-edge techniques and tools to ensure high-performance, reliable, and scalable FM development.
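To make the training-orchestration side concrete, here is a minimal sketch of how a KubeRay RayJob manifest for an EKS-backed cluster might be assembled in Python. The image name, entrypoint script, and resource values are illustrative assumptions, not details from the session.

```python
import json

def build_rayjob_manifest(job_name: str, image: str, workers: int, gpus_per_worker: int) -> dict:
    """Sketch of a KubeRay RayJob manifest for distributed training on EKS.

    Field values (image, group name, entrypoint) are illustrative placeholders.
    """
    worker_spec = {
        "replicas": workers,
        "groupName": "gpu-workers",
        "template": {
            "spec": {
                "containers": [{
                    "name": "ray-worker",
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
                }]
            }
        },
    }
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": job_name},
        "spec": {
            "entrypoint": "python train.py",  # hypothetical training script
            "rayClusterSpec": {"workerGroupSpecs": [worker_spec]},
        },
    }

manifest = build_rayjob_manifest("fm-pretrain", "rayproject/ray:latest", workers=4, gpus_per_worker=8)
print(json.dumps(manifest, indent=2))
```

In practice such a manifest would be applied to the cluster with `kubectl apply`, letting KubeRay provision the Ray cluster and run the job.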

Location: Room 206

Duration: 1 hour

Author:

Mark Vinciguerra

Assoc. WW Solution Architect
AWS GenAI


Author:

Aravind Neelakantan

WW Solution Architect
AWS GenAI


Author:

Aman Shanbhag

WW Solution Architect
AWS GenAI

Aman Shanbhag is a Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners deploy ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.


Anna Doherty

Partner
G9 Ventures


The rapid evolution of high-performance computing (HPC) clusters has been instrumental in driving transformative advancements in AI research and applications. These sophisticated systems enable the processing of complex datasets and support groundbreaking innovation. However, as their adoption grows, so do the critical security challenges they face, particularly when handling sensitive data in multi-tenant environments where diverse users and workloads coexist. Organizations are increasingly turning to Confidential Computing as a framework to protect AI workloads, emphasizing the need for robust HPC architectures that incorporate runtime attestation capabilities to ensure trust and integrity.

In this session, we present an advanced HPC cluster architecture designed to address these challenges, focusing on how runtime attestation of critical components – such as the kernel, Trusted Execution Environments (TEEs), and eBPF layers – can effectively fortify HPC clusters for AI applications operating across disjoint tenants. This architecture leverages cutting-edge security practices, enabling real-time verification and anomaly detection without compromising the performance essential to HPC systems.

Through use cases and examples, we will illustrate how runtime attestation integrates seamlessly into HPC environments, offering a scalable and efficient solution for securing AI workloads. Participants will leave this session equipped with a deeper understanding of how to leverage runtime attestation and Confidential Computing principles to build secure, reliable, and high-performing HPC clusters tailored for AI innovations.
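The core attestation idea described above can be sketched in a few lines: a verifier takes fresh measurements (cryptographic digests) of critical components and compares them against known-good golden values. This is a minimal illustration only; real runtime attestation involves hardware roots of trust and signed evidence, and the component names here are just examples.

```python
import hashlib
import hmac

def measure(component_bytes: bytes) -> str:
    # A runtime measurement is typically a cryptographic digest of a
    # component's state; SHA-256 stands in for the real measurement here.
    return hashlib.sha256(component_bytes).hexdigest()

def attest(measurements: dict, golden: dict) -> dict:
    """Compare fresh measurements of each component (e.g. kernel, TEE, eBPF
    layer) against known-good golden digests. Names are illustrative."""
    return {
        name: hmac.compare_digest(measure(blob), golden.get(name, ""))
        for name, blob in measurements.items()
    }

golden = {"kernel": measure(b"kernel-text-v1"), "ebpf": measure(b"ebpf-progs-v1")}
result = attest({"kernel": b"kernel-text-v1", "ebpf": b"tampered"}, golden)
print(result)  # the kernel verifies; the tampered eBPF layer does not
```

Using a constant-time comparison (`hmac.compare_digest`) rather than `==` avoids leaking digest information through timing.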

Location: Room 201

Duration: 1 hour

Author:

Jason Rogers

CEO
Invary

Jason Rogers is the Chief Executive Officer of Invary, a cybersecurity company that ensures the security and confidentiality of critical systems by verifying their Runtime Integrity. Leveraging NSA-licensed technology, Invary detects hidden threats and reinforces confidence in an existing security posture. Previously, Jason served as the Vice President of Platform at Matterport, successfully launched a consumer-facing IoT platform for Lowe's, and developed numerous IoT and network security software products for Motorola.


Author:

Ayal Yogev

CEO & Co-founder
Anjuna


Dive into a hands-on workshop designed exclusively for AI developers. Learn to leverage the power of Google Cloud TPUs, the custom accelerators behind Google Gemini, for highly efficient LLM inference using vLLM. In this trial run for Google Developer Experts (GDEs), you'll build and deploy Gemma 3 27B on Trillium TPUs with vLLM and Google Kubernetes Engine (GKE). Explore advanced tooling like Dynamic Workload Scheduler (DWS) for TPU provisioning, Google Cloud Storage (GCS) for model checkpoints, and essential observability and monitoring solutions. Your live feedback will directly shape the future of this workshop, and we encourage you to share your experience with the vLLM/TPU integration on your social channels.
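As a rough preview of the serving step, the sketch below assembles an illustrative `vllm serve` invocation. The flag names follow vLLM's CLI (`--tensor-parallel-size`, `--download-dir`), but the model ID and checkpoint path are assumptions, not details taken from the workshop.

```python
def vllm_serve_command(model: str, tp_size: int, download_dir: str) -> list:
    """Assemble an illustrative `vllm serve` command line for TPU serving on GKE.

    The model ID and download directory are placeholder assumptions; in the
    workshop, checkpoints would live in a GCS-backed location.
    """
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp_size),  # shard across TPU chips
        "--download-dir", download_dir,          # e.g. a GCS-backed mount
    ]

cmd = vllm_serve_command("google/gemma-3-27b-it", tp_size=4, download_dir="/gcs/checkpoints")
print(" ".join(cmd))
```

On GKE this command would typically appear as the container args of a Deployment, with DWS handling the TPU slice provisioning.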

Location: Room 207

Duration: 1 hour

Author:

Niranjan Hira

Senior Product Manager
Google Cloud

As a Product Manager in our AI Infrastructure team, Hira looks out for how Google Cloud offerings can help customers and partners build more helpful AI experiences for users. With over 30 years of experience building applications and products across multiple industries, he likes to hog the whiteboard and tell developer tales.


Julianne Kur

Principal
Alliance Consumer Growth


Location: Room 206

Duration: 1 hour

Experience the future of GenAI inference architecture with NeuReality’s fully integrated, enterprise-ready NR1® Inference Appliance. In this hands-on workshop, you'll go from cold start to live GenAI applications in under 30 minutes using our AI-CPU-powered system. The NR1® Chip, the world’s first AI-CPU purpose-built for inference, pairs with any GPU or AI accelerator and optimizes any AI data workload. We’ll walk you through setup, deployment, and real-time inference using models like LLaMA, Mistral, and DeepSeek on our disaggregated architecture, built for smooth scalability, superior price/performance, and near 100% GPU utilization (vs. under 50% with a traditional CPU/NIC architecture). Join us to see how NeuReality eliminates infrastructure complexity and delivers enterprise-ready performance and ROI today.
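The utilization claim above is easy to translate into effective throughput. The sketch below uses purely illustrative numbers (the peak rate is invented) to show why moving a GPU from under-50% to near-100% utilization roughly doubles delivered tokens per second.

```python
def effective_throughput(peak_tokens_per_s: float, utilization: float) -> float:
    """Tokens/s actually delivered by an accelerator at a given utilization."""
    return peak_tokens_per_s * utilization

# Illustrative numbers only: the session contrasts near-100% utilization on a
# disaggregated architecture with <50% on a traditional CPU/NIC host.
traditional = effective_throughput(10_000, 0.45)
disaggregated = effective_throughput(10_000, 0.95)
print(f"traditional: {traditional:.0f} tok/s, disaggregated: {disaggregated:.0f} tok/s "
      f"({disaggregated / traditional:.1f}x)")
```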

Location: Room 201

Duration: 1 hour

Author:

Paul Piezzo

Enterprise Sales Director
NeuReality


Author:

Gaurav Shah

VP of Business Development
NeuReality


Author:

Naveh Grofi

Customer Success Engineer
NeuReality


Join us in this hands-on workshop to learn how to deploy and optimize large language models (LLMs) for scalable inference at enterprise scale. Participants will learn to orchestrate distributed LLM serving with vLLM on Amazon EKS, enabling robust, flexible, and highly available deployments. The session demonstrates how to utilize AWS Trainium hardware within EKS to maximize throughput and cost efficiency, leveraging Kubernetes-native features for automated scaling, resource management, and seamless integration with AWS services.
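Once vLLM is serving inside the cluster, clients talk to it through its OpenAI-compatible HTTP API. The sketch below builds (but does not send) a request against the standard `/v1/completions` endpoint; the in-cluster service URL and model name are placeholder assumptions for whatever the EKS Service would expose.

```python
import json
import urllib.request

def completion_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST against vLLM's OpenAI-compatible /v1/completions endpoint.

    The endpoint path is standard vLLM; the base URL and model name are
    illustrative placeholders.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": 64}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = completion_request("http://vllm-service.default.svc:8000", "my-llm", "Hello")
print(req.full_url)
# In-cluster, the response would be fetched with urllib.request.urlopen(req).
```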

Location: Room 206

Duration: 1 hour

Author:

Asheesh Goja

Principal GenAI Solutions Architect
AWS


Author:

Pinak Panigrahi

Sr. Machine Learning Architect - Annapurna ML
AWS


As AI workload demands continue to accelerate, Cloud Service Providers, System OEMs, and IP/Silicon vendors require a scalable, high-performance solution to support advanced workloads. By enhancing performance, optimizing power and cost efficiency, and promoting interoperability and supply chain diversity, the UALink 200G 1.0 Specification delivers a low-latency, high-bandwidth interconnect designed for efficient communication between accelerators and switches within AI computing pods.
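As a back-of-envelope illustration of the headline signaling rate, the sketch below converts a 200 Gbps-per-lane link into raw unidirectional bandwidth for a few lane counts. The lane counts shown are examples, and the figures ignore encoding and protocol overhead, so they are ceilings rather than delivered numbers.

```python
def link_bandwidth_gbytes(lanes: int, lane_rate_gbps: float = 200.0) -> float:
    """Raw unidirectional bandwidth of a multi-lane link in GB/s.

    Assumes the headline 200 Gbps-per-lane rate; encoding and protocol
    overhead are deliberately ignored, so this is a back-of-envelope figure.
    """
    return lanes * lane_rate_gbps / 8  # bits -> bytes

for lanes in (1, 2, 4):
    print(f"x{lanes}: {link_bandwidth_gbytes(lanes):.0f} GB/s")
```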

Location: Room 201

Duration: 40 minutes

GIGABYTE AI TOP is a groundbreaking desktop solution that empowers developers to train their own AI models locally. Featuring advanced memory offloading technology and support for open-source LLMs, LMMs, and other machine learning models, it delivers enterprise-grade performance in a compact desktop form factor. This solution enables both AI beginners and professionals to build, fine-tune, and deploy state-of-the-art models with enhanced privacy, flexibility, and security.
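To see why memory offloading is the enabling trick for local training, the sketch below estimates how a model's weights split between GPU memory and host-side offload. All numbers are illustrative, and the estimate deliberately ignores activations, optimizer state, and KV cache, which offloading must also handle in practice.

```python
def offload_split(params_billion: float, bytes_per_param: int, gpu_mem_gb: float):
    """Rough split of model weights between GPU memory and CPU/NVMe offload.

    Illustrative only: 1e9 params at N bytes each is roughly N GB per billion
    params; activations, optimizer state, and KV cache are ignored.
    """
    total_gb = params_billion * bytes_per_param
    on_gpu = min(total_gb, gpu_mem_gb)
    return on_gpu, total_gb - on_gpu

# Example: a 70B-parameter model in 16-bit weights on a 24 GB desktop GPU.
on_gpu, offloaded = offload_split(params_billion=70, bytes_per_param=2, gpu_mem_gb=24)
print(f"on GPU: {on_gpu} GB, offloaded to CPU/NVMe: {offloaded} GB")
```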

Author:

Charles Le

CTO, Channel AI Solutions
GIGABYTE

Dr. Charles Le currently serves as Chief Technology Officer of Channel AI Solutions at GIGABYTE. He leads the AI software division and is the architect behind GIGABYTE’s flagship platform, AI TOP Utility, which empowers developers and enterprises to train and deploy large AI models with ease.

He is an expert in the training, fine-tuning, and inference of LLMs, LMMs, and other machine learning models, with deep knowledge across algorithm design, hardware acceleration, and system integration.

 

Before joining GIGABYTE, Dr. Le spent four years applying deep learning to the development of radiative cooling materials for marine robotics. He also has six years of experience in structural health monitoring and modal identification for infrastructure under dynamic loads such as earthquakes and wind. More recently, he has applied AI to enhance business intelligence, hardware R&D, and service AI assistants using tools like LangChain and LLM deployment.
