Abstract:
Data analytics applications can often be modeled as a directed acyclic graph (DAG),
where the nodes are fine-grained tasks and the edges are task dependencies. A DAG scheduler
can be used to distribute the tasks to cloud computing resources where they can be
executed in parallel to speedup work
ow applications. Serverless computing is a cloud computing
platform that enables the decomposition of traditionally monolithic, server-based
applications into a collection of fine-grained cloud functions. Developers write the function
logic while the service provider takes care of provisioning, scaling, and managing the
back-end servers or virtual machines (VMs) that the functions run on. Creating a serverlessoriented
DAG scheduler poses a major challenge, as executing complex, burst-parallel DAG
jobs requires rapid scaling and high task throughput while minimizing data movement across
tasks.
Despite these challenges, data analytics workloads are well-suited for serverless computing.
The auto-scaling property of serverless computing platforms accommodates short tasks
and bursty workloads, while the pay-per-use billing model of serverless computing providers
keeps the cost of short tasks low.
In this thesis, we thoroughly investigate the problem space of DAG scheduling in serverless
computing. Our goal is to demonstrate that serverless-oriented, parallel computing
frameworks can support fast and efficient, DAG-based, parallel-computation work
ows that
are easy to deploy and manage. To accomplish this, we identify and evaluate a set of techniques
to make DAG schedulers serverless-aware, and we implement these techniques in
Wukong, a serverless DAG engine built atop AWS Lambda.
Our techniques and optimizations bring multiple benefits, including enhanced data locality,
reduced network I/Os, automatic resource elasticity, and improved cost effectiveness.
We show that when comparing Wukong to numpywren, a serverless system for linear algebra,
Wukong achieves near-ideal scalability, executes parallel computation jobs up to
68:17x faster, reduces network I/O by multiple orders of magnitude, and achieves 92:96%
tenant-side cost savings compared to numpywren, a serverless linear algebra library.
This thesis contains two modified, published papers along with an additional chapter
describing the latest, work-in-progress version of Wukong. In Chapter 3, we describe
and evaluate the initial prototype of Wukong. This first version of Wukong delivered
competitive performance compared to a comparable, serverful Dask cluster. In Chapter 4,
we present the second version of Wukong. We present a series of optimizations that greatly
improve the cost-effectiveness and performance of Wukong. Finally, we present the current
work-in-progress version of Wukong in Chapter 5. The latest version of Wukong is
completely serverless, requiring no user-deployed or user-managed servers. As a result, this
version of Wukong is extremely simple to use while still delivering competitive performance
and cost-effectiveness.
This thesis establishes that serverless computing is an appropriate setting for creating a
serverless DAG engine. We show that, by designing a DAG engine that takes into account
both the benefits and the challenges of serverless computing platforms, it is possible to
create a fast, cost-effective, and easy-to-use serverless parallel computing framework.