PhD Thesis Defense: Walid A. Hanafy, Carbon-aware Resource Management for Cloud Computing Platforms
Abstract
The rising demand for computing has led to a significant increase in energy consumption. In 2023, data centers consumed 176 terawatt-hours (TWh), and estimates predict that consumption will rise to between 325 and 580 TWh by 2028. As fossil fuels continue to be the primary energy source, the increased energy demand leads to higher carbon emissions, making cloud data centers significant contributors to global emissions. In response, cloud platforms are increasingly focusing on sustainability and on minimizing their operational carbon footprint. While earlier strategies centered on enhancing the energy efficiency of data centers, I advocate for treating carbon emissions as a first-class objective.
In this thesis, I propose novel resource management techniques that allow cloud users and operators to reduce their operational carbon emissions.
First, I propose an optimal scheduling algorithm that leverages the elasticity of cloud batch applications, where a job dynamically varies its server allocation based on fluctuations in the carbon intensity of the grid's energy. I quantify the resulting savings and show that the proposed technique significantly reduces carbon emissions without performance overheads or possible cost overheads.
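As a rough illustration of the idea above, and not the algorithm proposed in the thesis, the sketch below greedily assigns servers to time slots in order of work done per unit of carbon. It assumes a hypothetical per-slot carbon-intensity forecast, a hypothetical marginal-throughput curve with diminishing returns (the extra work contributed by the k-th server), and one unit of energy per server per slot. The function names `plan_allocation` and `emissions` are invented for this example.

```python
def plan_allocation(carbon_intensity, marginal_throughput, total_work):
    """Illustrative greedy sketch: choose servers per slot by work per unit carbon.

    carbon_intensity:    assumed forecast, one positive value per time slot
    marginal_throughput: assumed work added by the 1st, 2nd, ... server in a slot
    total_work:          work the job must complete within the forecast window
    """
    if any(ci <= 0 for ci in carbon_intensity):
        raise ValueError("carbon intensity must be positive")
    # Diminishing returns guarantee that server k in a slot is never ranked
    # ahead of server k-1 in the same slot.
    if any(a < b for a, b in zip(marginal_throughput, marginal_throughput[1:])):
        raise ValueError("marginal_throughput must be non-increasing")

    # One candidate per (slot, k-th server), scored by work per unit carbon.
    candidates = [
        (gain / ci, slot, k, gain)
        for slot, ci in enumerate(carbon_intensity)
        for k, gain in enumerate(marginal_throughput, start=1)
    ]
    # Best score first; ties go to earlier slots, then to fewer servers.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    allocation = [0] * len(carbon_intensity)
    remaining = total_work
    for _, slot, k, gain in candidates:
        if remaining <= 0:
            break
        allocation[slot] = k
        remaining -= gain
    if remaining > 0:
        raise ValueError("not enough capacity within the forecast window")
    return allocation


def emissions(allocation, carbon_intensity):
    """Carbon emitted, assuming one energy unit per server per slot."""
    return sum(n * ci for n, ci in zip(allocation, carbon_intensity))


forecast = [100, 50, 200, 50]   # hypothetical carbon intensity per slot
curve = [1.0, 0.8, 0.5]         # hypothetical marginal throughput per server
plan = plan_allocation(forecast, curve, total_work=3.0)
print(plan, emissions(plan, forecast))  # [0, 2, 0, 2] 200
```

With these toy inputs, a single server running continuously from the first slot would emit 100 + 50 + 200 = 350 units for the same work, while the greedy plan emits 200 units and still finishes within the same four-slot window.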
Second, I evaluate the three-way trade-off between carbon emissions, performance, and cost for batch schedulers in hybrid cloud clusters. I show how carbon-aware scheduling can increase job completion times due to delayed execution and how carbon-aware adjustments change the demand pattern by periodically leaving resources idle, resulting in a trade-off between carbon emissions and cost.
Third, I analyze the conflict between carbon efficiency—work done per unit of carbon—and energy efficiency—work done per unit of energy—for batch workloads and show how optimizing for carbon efficiency often reduces energy efficiency. Next, I experimentally quantify the carbon-cost-performance trade-offs for both interactive and batch cloud applications, examining the extent of stampede effects that cloud providers may encounter as clients become more carbon-conscious.
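The conflict between carbon efficiency and energy efficiency described above can be made concrete with a small worked example. The sketch below is only an illustration under stated assumptions: the carbon-intensity values, the diminishing-returns throughput curve, the two allocation plans, and the helper name `efficiencies` are all hypothetical, and each server is assumed to use one unit of energy per slot.

```python
def efficiencies(allocation, carbon_intensity, marginal_throughput):
    """Return (energy efficiency, carbon efficiency) for a servers-per-slot plan.

    Energy efficiency is work per unit of energy; carbon efficiency is work
    per unit of carbon. One energy unit per server per slot is assumed.
    """
    # Work in a slot is the sum of the marginal throughput of its servers.
    work = sum(sum(marginal_throughput[:n]) for n in allocation)
    energy = sum(allocation)
    carbon = sum(n * ci for n, ci in zip(allocation, carbon_intensity))
    return work / energy, work / carbon


forecast = [100, 50, 200, 50]   # hypothetical carbon intensity per slot
curve = [1.0, 0.8, 0.5]         # hypothetical marginal throughput per server

steady = [1, 1, 1, 0]           # one server, carbon-agnostic
bursty = [0, 2, 0, 2]           # two servers, only in low-carbon slots

for name, plan in (("steady", steady), ("bursty", bursty)):
    energy_eff, carbon_eff = efficiencies(plan, forecast, curve)
    print(f"{name}: {energy_eff:.2f} work/energy, {carbon_eff:.4f} work/carbon")
# steady: 1.00 work/energy, 0.0086 work/carbon
# bursty: 0.90 work/energy, 0.0180 work/carbon
```

In this toy setting, the plan that scales up during low-carbon slots roughly doubles carbon efficiency while lowering energy efficiency, because the second server in each slot contributes less work for the same energy.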
Finally, I propose a cluster resource manager for carbon-aware resource provisioning and scheduling of cloud-based batch workloads. The scheduler leverages continuous learning over historical cluster-level data to drive near-optimal runtime resource provisioning and job scheduling. I show that, in online settings, my continuous learning approach can mimic the decisions of an oracle scheduler with perfect knowledge of future carbon intensity and job lengths.
Advisor
Prashant Shenoy