Abstract

In recent years, an important new class of applications has emerged that is characterized by tight latency requirements. Examples of such latency-sensitive applications include autonomous driving, mobile augmented and virtual reality (AR/VR), online gaming, and the Internet of Things (IoT). These applications pose new challenges to cloud providers because cloud data centers are often geographically distant from users. In response, the computing industry has proposed edge computing as a solution to these challenges. Edge computing promises lower response times by bringing server clusters closer to end users and devices. Conventional wisdom therefore holds that the edge is better than the cloud from a latency perspective.

However, from the perspective of applications, the end-to-end latency of a request comprises three components: network latency, queueing delay, and service time. While edge data centers have the advantage of lower network latency, applications deployed at the edge are often vulnerable to longer queueing delays, caused either by the resource-constrained nature of edge clusters or, less obviously, by inefficient resource multiplexing. As a result, proper resource allocation techniques are needed to ensure that latency-sensitive applications deployed in edge environments achieve their full performance potential. In this thesis, I address this gap by presenting model-driven resource allocation algorithms for latency-sensitive applications deployed at the edge in various contexts.
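This decomposition can be written compactly as follows, with symbols chosen here purely for illustration:

$$T_{\text{end-to-end}} = T_{\text{network}} + T_{\text{queueing}} + T_{\text{service}}$$

Moving an application to the edge shrinks the network term, but the queueing term can grow when edge resources are scarce or multiplexed inefficiently, which is the gap this thesis targets.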

First, I design and implement a framework for running latency-sensitive serverless functions on edge resources. My approach can allocate the appropriate number of containers for each function to meet service-level objectives (SLOs) in the absence of resource pressure, while also providing fairness guarantees during resource overload.
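As a rough illustration of the kind of sizing decision involved, and not of the framework's actual algorithm, the sketch below estimates a per-function container count from a simple capacity model; the per-container service rate (assumed to come from profiling) and the headroom factor are assumptions introduced here.

```python
import math

def containers_needed(arrival_rate, per_container_rate, slo_headroom=0.8):
    """Illustrative capacity-model sizing (not the thesis's algorithm).

    arrival_rate: incoming requests/sec for one serverless function
    per_container_rate: requests/sec one container can serve while
        staying within the latency SLO (assumed known from profiling)
    slo_headroom: target utilization per container; keeping it below 1.0
        leaves slack so queueing delay stays small
    """
    effective_rate = per_container_rate * slo_headroom
    return max(1, math.ceil(arrival_rate / effective_rate))

# Example: 450 req/s offered load, each container sustains 60 req/s within the SLO
print(containers_needed(450, 60))  # -> 10
```

Under resource overload, a model like this would be paired with a fairness policy that decides how to shrink each function's allocation, which is the setting the framework addresses.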

Second, I study the problem of edge performance inversion, which describes scenarios where edge servers provide worse end-to-end latencies than cloud servers despite having lower network latency. I develop inversion-aware resource allocation and workload scheduling algorithms for latency-sensitive applications deployed in distributed edge-cloud environments.

Finally, I investigate the problem of container allocation for serverless applications structured as directed acyclic graphs (DAGs). I propose a workflow-aware container allocation algorithm for serverless DAGs, with the goal of providing SLO guarantees on tail end-to-end latencies while minimizing the total amount of resources allocated to the application.

Advisor

Prashant Shenoy
