Content

Speaker

Walid Abdelrahman Hanafy

Abstract

The rising demand for computing has led to a significant increase in energy consumption. Data centers are estimated to consume up to 377 terawatt-hours of energy. Because fossil fuels remain the primary energy source, this growing demand leads to higher carbon emissions, and cloud data centers are becoming significant contributors to the global total. In response, cloud platforms are focusing more on sustainability and on reducing their operational carbon footprint. Although previous approaches have focused on optimizing the energy efficiency of data centers, I argue for using carbon emissions as a first-class objective.

In this thesis, I propose novel resource management techniques that allow cloud users and operators to reduce their operational carbon emissions.

First, I propose an optimal scheduling algorithm that leverages the elasticity of cloud batch applications, where a job dynamically varies its server allocation based on fluctuations in the carbon cost of the grid's energy. I quantify and show how the proposed technique significantly reduces carbon emissions without performance overheads and possible cost overheads.
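
To make the idea of carbon-aware elastic scaling concrete, the following is a minimal sketch, not the algorithm proposed in the thesis. It assumes a per-slot carbon-intensity forecast, identical servers that draw the same energy per slot, and a non-increasing marginal-throughput curve (each added server contributes no more work than the previous one). The function name `carbon_aware_schedule` and all parameters are hypothetical. The sketch greedily picks the (slot, server) pairs with the lowest carbon per unit of work until the job's work is covered.

```python
from typing import List


def carbon_aware_schedule(
    carbon: List[float],         # assumed forecast: carbon intensity per time slot
    marginal_work: List[float],  # work added by the (k+1)-th server in one slot, non-increasing
    total_work: float,           # total work the job must complete
) -> List[int]:
    """Return the number of servers to run in each time slot (greedy sketch)."""
    # Every (slot, k-th server) pair is a candidate, ranked by carbon per unit of work.
    # Because marginal_work is non-increasing, the k-th server in a slot never
    # sorts ahead of the (k-1)-th server in the same slot.
    candidates = sorted(
        (carbon[t] / marginal_work[k], t, k)
        for t in range(len(carbon))
        for k in range(len(marginal_work))
        if marginal_work[k] > 0
    )

    alloc = [0] * len(carbon)
    done = 0.0
    for _, t, k in candidates:
        if done >= total_work:
            break
        alloc[t] += 1
        done += marginal_work[k]

    if done < total_work:
        raise ValueError("not enough capacity in the given slots to finish the job")
    return alloc


if __name__ == "__main__":
    # Illustrative numbers only: four slots, at most two servers per slot.
    print(carbon_aware_schedule([100, 50, 200, 50], [1.0, 0.8], total_work=3.0))
```

With these illustrative inputs, the job runs two servers in each of the two low-carbon slots and stays idle in the others. The second server in a slot contributes less work than the first, which is why scaling up in clean periods can cost extra server time.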

Second, I evaluate the conflict between carbon efficiency—work done per unit of carbon—and energy efficiency—work done per unit of energy—and show how optimizing for carbon efficiency often reduces energy efficiency. I also assess the overheads associated with carbon-aware scheduling from the perspectives of both cloud users and providers.
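
The tension between the two metrics can be illustrated with a small calculation. This is a sketch under stated assumptions, not a result from the thesis: the carbon intensities, the marginal-throughput curve, and the helper `efficiencies` are all hypothetical, and every server is assumed to draw one unit of energy per slot.

```python
from typing import List, Tuple


def efficiencies(
    alloc: List[int],            # servers used in each time slot
    carbon: List[float],         # assumed carbon intensity per time slot
    marginal_work: List[float],  # work added by the (k+1)-th server in one slot
    energy_per_server_slot: float = 1.0,
) -> Tuple[float, float]:
    """Return (energy efficiency, carbon efficiency) for an allocation."""
    work = sum(sum(marginal_work[:n]) for n in alloc)
    energy = sum(n * energy_per_server_slot for n in alloc)
    emissions = sum(n * energy_per_server_slot * c for n, c in zip(alloc, carbon))
    return work / energy, work / emissions


if __name__ == "__main__":
    carbon = [100, 50, 200, 50]
    marginal_work = [1.0, 0.8]  # the second server adds less work than the first

    # Carbon-agnostic: one server in each of the first three slots.
    agnostic = efficiencies([1, 1, 1, 0], carbon, marginal_work)
    # Carbon-aware: two servers in each of the two low-carbon slots.
    aware = efficiencies([0, 2, 0, 2], carbon, marginal_work)

    print("carbon-agnostic (energy eff., carbon eff.):", agnostic)
    print("carbon-aware    (energy eff., carbon eff.):", aware)
```

In this toy setting the carbon-aware allocation does more work per unit of carbon but less work per unit of energy, because scaling out in low-carbon slots runs servers at lower marginal throughput.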

Third, I evaluate the three-way trade-off between carbon emissions, performance, and cost in cloud-based batch schedulers. I show how carbon-aware scheduling can increase job completion times due to delayed execution and how carbon-aware adjustments change the demand pattern by periodically leaving resources idle, which creates a trade-off between carbon emissions and cost.

Finally, I propose leveraging resource elasticity in cluster-wide settings. Specifically, I will design and implement a cluster-wide system and propose provisioning and scheduling policies that optimize the operational emissions and cost of cloud-based batch systems.