Granny: Managing Compute-Intensive Cloud Applications using Granules. Carlos Segarra
Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG
Date & Time
Wednesday 24 May 2023, 12:30 - 13:30
Join us for a talk by Carlos Segarra entitled Granny: Managing Compute-Intensive Cloud Applications using Granules.
The talk is open to all. Please contact firstname.lastname@example.org if you would like to attend. A sandwich/meal deal lunch will be provided in the room.
Compute-intensive applications (e.g. weather forecasting and molecular dynamics simulation) rely on scale-up (using more CPU cores) and scale-out (using more nodes). To scale up or out, these applications are typically implemented using shared memory and message passing programming models (e.g. OpenMP and MPI). Although clouds, with their abundant resources, should offer an attractive execution environment for such applications, existing shared memory/message passing runtimes assume fixed parallelism. This removes control from the cloud provider when managing resources: it prevents providers from consolidating resources onto fewer machines to exploit locality, elastically changing application resources, or transparently making long-running applications fault-tolerant.
In this talk, I will present Granny, a new cloud runtime for shared memory/message passing (OpenMP/MPI) applications that allows the cloud provider to manage compute resources. Granny achieves this through the new abstraction of Granules, which unify the thread- and process-based execution semantics of OpenMP/MPI. Granules are WebAssembly modules that interact through OpenMP/MPI APIs. Since all execution state is captured by WebAssembly's linear memory model, Granny can efficiently migrate Granules between nodes to increase application locality. It can also vary the parallelism of OpenMP/MPI constructs by changing the number of CPU threads associated with a Granule. Finally, Granny periodically checkpoints Granules to provide fault tolerance through rollback recovery. Granny achieves all of this by interrupting Granule execution at control points (system or OpenMP/MPI calls) to change the distribution and parallelism, or to checkpoint the state.
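As a rough illustration of the idea in the abstract, the toy Python sketch below models a Granule as a flat byte buffer (standing in for a WebAssembly linear memory) and a runtime that acts only at control points. Every name in it (Granule, Runtime, control_point, migrate, rollback) is hypothetical and chosen for this example; it is not Granny's API and says nothing about how Granny is actually implemented.

```python
# Toy model only: all class and method names are hypothetical, not Granny's API.
from dataclasses import dataclass, field


@dataclass
class Granule:
    """A unit of execution whose whole state lives in one flat byte buffer,
    standing in for a WebAssembly linear memory."""
    memory: bytearray
    threads: int = 1

    def snapshot(self) -> bytes:
        # In this toy model, copying the buffer captures all execution state.
        return bytes(self.memory)

    @classmethod
    def restore(cls, image: bytes, threads: int = 1) -> "Granule":
        return cls(memory=bytearray(image), threads=threads)


@dataclass
class Runtime:
    checkpoints: dict = field(default_factory=dict)

    def control_point(self, name: str, granule: Granule, new_threads=None) -> None:
        """Invoked where a Granule enters the runtime (e.g. a system or
        OpenMP/MPI call). Checkpoints the state, then optionally changes
        the parallelism."""
        self.checkpoints[name] = granule.snapshot()
        if new_threads is not None:
            granule.threads = new_threads

    def migrate(self, granule: Granule) -> Granule:
        # "Migration" here just rebuilds the Granule from its byte image,
        # as if the image had been shipped to another node.
        return Granule.restore(granule.snapshot(), granule.threads)

    def rollback(self, name: str, threads: int = 1) -> Granule:
        # Recover the state saved at the last checkpoint for this Granule.
        return Granule.restore(self.checkpoints[name], threads)


runtime = Runtime()
g = Granule(memory=bytearray(8))
g.memory[0] = 42
runtime.control_point("g0", g, new_threads=4)  # checkpoint, then scale to 4 threads
g.memory[0] = 99                               # progress made after the checkpoint
moved = runtime.migrate(g)                     # same state, new object
recovered = runtime.rollback("g0", threads=g.threads)  # post-checkpoint progress is lost
```

The point of the sketch is that migration, scaling, and checkpointing all reduce to operations on a single byte image plus a thread count, and that the runtime only needs to act when the Granule calls into it.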
Carlos is a third-year PhD student in the Large-Scale Data & Systems group at Imperial College London. His research interests include the design and implementation of secure and efficient cloud runtimes. In particular, he is interested in lightweight isolation mechanisms and confidential computing. All of his work is open source and available on GitHub.
The talk is free of charge.