After completing this course, you will be able to:
1. Explain the basic concepts, objectives, and functions of distributed
computing (eco)systems, e.g.,
communication, resource management and scheduling, data consistency,
2. Compare the basic characteristics of distributed computing with other
computing paradigms (i.e., centralized, parallel).
3. Identify proficiently the different flavors of modern distributed
ecosystems (i.e., peer-to-peer systems, cluster computing, grid
computing, cloud computing, datacenters, distributed HPC, SDN, Big Data
systems, IoT systems).
4. Analyze proficiently the trade-offs inherent in the design of modern
5. Design your portfolio distributed system, with many basic and some
complex operations of modern distributed systems.
6. Implement and deploy your portfolio distributed system.
7. Analyze your portfolio distributed system.
This course focuses on distributed computing systems and ecosystems. In
general, debugging and tuning existing systems and ecosystems, and
designing, implementing, and analyzing new distributed computing systems,
remain vital and challenging for both industry and academia.
Since the mid-1990s, computing has been undergoing a revolution, in
which collections of independent computers appear to users as a single,
albeit distributed, computing system. Motivated by the advent of the
Internet, by the increase in the computation capacity of consumer
computers, by the commoditization of server-grade machines, by energy
constraints, etc., the distributed computing paradigm has permeated all
fields using computers. Current distributed computing applications range
from social networks to banking, from peer-to-peer file-sharing to
high-performance computing used in research, from massively multiplayer
online games to business-critical workloads, etc. Important advances
have helped to fuse heterogeneous resources into truly global
distributed systems and ecosystems, for example in scientific computing,
where distributed computation uses Big Data and distributed sensors
to produce meaningful progress for humankind. We will focus in this
course on a number of these modern examples of distributed computing
Although so many distributed systems and ecosystems already exist, the
list of conceptual and technical challenges they pose is long. Depending
on requirements, even trivial communication between nodes of the
distributed system can be challenging. The failure of a single node, or
sometimes even a performance hiccup, can bring an entire system down;
with it, other nodes or entire other systems may also crash,
experiencing correlated and catastrophic failures. Data consistency and
node coordination remain important challenges, made worse by the
large scale of real-world deployments. Poor resource management and
naive scheduling can lead to orders-of-magnitude higher operational
costs and consumption of energy that we simply cannot spare. It is not
uncommon for a modern distributed system to quickly rise and then fall
in popularity, as Pokémon Go exemplified in 2016. We will
present in this course real-world situations where modern distributed
systems have behaved poorly.
Addressing these challenges requires unique approaches and concepts.
Separating concerns and breaking down problems into smaller cases often
lead to limited success, because many properties of distributed systems
can only be achieved end-to-end. Can anyone imagine a perfectly reliable
production pipeline, if even one of its key stages can suffer failures?
Building capability by adding resources is often offset by the
distributed nature of the system. Can anyone ignore the physical
limitations of communication around the globe? In this course, we will
focus on the unique approaches and principles of distributed systems,
from specific architectures and communication protocols, to specific
concepts in resource management and scheduling, data consistency,
fault-tolerance, and performance.
Form of tuition
The course is gamified. The course is taught as a series of lectures, in
combination with self-study and with a large practical assignment. The
course also has important design and seminar components.
Type of assessment
Written exam. Depending on enrollment, an oral exam may also be
available. Report on the practical assignment. In-class assignments
include: design, seminar, Q&A. There are other gamification elements.
The course uses as its textbook: Maarten van Steen and Andrew S.
Tanenbaum, Distributed Systems, 3rd ed., online edition, 2017 (free
for all) [Online] Available:
Recent material is only available as lecture slides and selected
publications. The lecture slides also recommend additional literature.
The study guide available on Canvas also indicates other worthwhile
sources of information.
Ability to work in teams for the practical assignment. Ability to
develop code using modern software engineering practices, e.g., setting
up your own GitHub repository, co-editing using tools such as Overleaf
or ShareLaTeX, etc., is a big plus.
Recommended background knowledge
Students should have taken standard courses on:
- Computer networks.
- Programming paradigms, in particular OOP and/or actor-based
- Software engineering.
Prior experience with Internet, web, distributed, or parallel
programming and algorithms is helpful.
Prior experience with operating systems development and analysis, and in
general experience with computer systems courses, is a big plus.
This course uses the gamification technique developed by prof. Iosup and