Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing
Location
Oxford e-Research Centre Teaching room (277), 7 Keble Road, Oxford, OX1 3QG
Contact
events@oerc.ox.ac.uk
Date & Time
Thursday 23 Jan 2025, 12:30 - 13:00
Abstract: Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, efficiently allocating expensive GPU resources in multi-tenant environments like Kubernetes clusters is challenging because the usage patterns of these workloads are unpredictable. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach to optimizing resource utilization for interactive GPU workloads using container checkpointing. This approach enables dynamic reallocation of GPU resources based on real-time workload demands, without modifying existing applications. We demonstrate the effectiveness of our approach through experimental evaluations with a variety of interactive GPU workloads and present preliminary results that highlight its potential.
Speakers: Radostin Stoyanov is a DPhil student at the Oxford e-Research Centre. His research focuses on improving the resilience and performance of HPC and cloud computing systems. Viktória Spišaková is a PhD student at the Faculty of Informatics at Masaryk University. Viktória will join the talk virtually. Radostin and Viktória are preparing to present this work at FOSDEM 2025 in Belgium.