Optimizing Resource Utilization for Interactive GPU Workloads with Transparent Container Checkpointing

Location

Oxford e-Research Centre Teaching room (277), 7 Keble Road, Oxford, OX1 3QG

Date & Time

Thursday 23 Jan 2025, 12:30 - 13:00

Availability

Open to all. You are welcome to bring lunch. Tea, coffee and biscuits will be provided in the room.

Abstract: Interactive GPU workloads, such as Jupyter notebooks and generative AI inference, are becoming increasingly popular in scientific research and data analysis. However, efficiently allocating expensive GPU resources in multi-tenant environments like Kubernetes clusters is challenging because these workloads have unpredictable usage patterns. Container checkpointing was recently introduced as a beta feature in Kubernetes and has been extended to support GPU-accelerated applications. In this talk, we present a novel approach to optimizing resource utilization for interactive GPU workloads using container checkpointing. This approach enables dynamic reallocation of GPU resources based on real-time workload demands, without modifying existing applications. We demonstrate the effectiveness of our approach through experimental evaluations with a variety of interactive GPU workloads and present preliminary results that highlight its potential.

Speakers: Radostin Stoyanov is a DPhil student at the Oxford e-Research Centre. His research focuses on improving the resilience and performance of HPC and cloud computing systems. Viktória Spišaková is a PhD student at the Faculty of Informatics at Masaryk University. Viktória will join the talk virtually. Radostin and Viktória are preparing to present this work at FOSDEM 2025 in Belgium.