MapReduce
Python
Project details
Description
- Implemented a MapReduce framework in Python inspired by Google’s original MapReduce paper. The framework executes MapReduce programs with distributed processing across a cluster of computers, in the manner of managed services such as AWS EMR, Google Dataproc, or Azure HDInsight.
- The learning goals of this project include:
  - MapReduce program execution
  - Basic distributed systems
  - Fault tolerance
  - OS-provided concurrency facilities (threads and processes)
  - Networking (sockets)
- The MapReduce framework consists of two major pieces of code. A Manager listens for user-submitted MapReduce jobs and distributes the work among Workers. Multiple Worker instances receive instructions from the Manager and execute map and reduce tasks that combine to form a MapReduce program.
- The Manager communicates with the Workers over TCP. Each Worker sends a UDP heartbeat to the Manager every 2 seconds to signal that it is still alive.
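The way map and reduce tasks combine into one MapReduce program can be illustrated with a small single-process sketch. It is not the project's code: the word-count job, the function names (`map_words`, `reduce_counts`, `partition`, `run_job`), and the hash-based partitioning scheme are all assumptions chosen for illustration. In the real framework each map task and each reduce partition would be handed to a Worker instead of running in a loop.

```python
from collections import defaultdict
import hashlib


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def partition(key, num_reducers):
    """Pick a reduce partition from a stable hash of the key (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers


def run_job(lines, num_reducers=2):
    """Run map, partition/group, and reduce sequentially in one process."""
    # One dict per reduce partition, mapping key -> list of values.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_words(line):
            partitions[partition(key, num_reducers)][key].append(value)

    # Each partition stands in for one reduce task a Worker would execute.
    results = {}
    for groups in partitions:
        for key in sorted(groups):
            out_key, out_value = reduce_counts(key, groups[key])
            results[out_key] = out_value
    return results


counts = run_job(["the quick brown fox", "the lazy dog", "The fox"])
```

Because every occurrence of a key hashes to the same partition, each reduce task sees all values for its keys, which is what lets the reduce tasks run independently of one another.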
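The heartbeat mechanism described above might look roughly like the following sketch. The 2-second interval comes from the description; everything else is an assumption made for illustration, including the JSON message shape, the `worker_id` field, the `HeartbeatTracker` class, and the rule that a Worker counts as dead after 5 missed heartbeats.

```python
import json
import socket
import time

HEARTBEAT_INTERVAL = 2.0  # seconds, as stated in the description
MISSED_LIMIT = 5          # assumption: the description gives no threshold


def send_heartbeat(manager_host, manager_port, worker_id):
    """Worker side: send one UDP heartbeat datagram to the Manager."""
    message = json.dumps({"message_type": "heartbeat", "worker_id": worker_id})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (manager_host, manager_port))


class HeartbeatTracker:
    """Manager side: remember when each Worker was last heard from."""

    def __init__(self, interval=HEARTBEAT_INTERVAL, missed_limit=MISSED_LIMIT):
        self.timeout = interval * missed_limit
        self.last_seen = {}

    def record(self, raw_datagram, now=None):
        """Parse one datagram and update the sender's timestamp."""
        message = json.loads(raw_datagram.decode("utf-8"))
        if message.get("message_type") != "heartbeat":
            return  # ignore anything that is not a heartbeat
        stamp = time.monotonic() if now is None else now
        self.last_seen[message["worker_id"]] = stamp

    def dead_workers(self, now=None):
        """Return Workers whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            worker for worker, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Usage with explicit timestamps so the behaviour is deterministic.
tracker = HeartbeatTracker()
tracker.record(b'{"message_type": "heartbeat", "worker_id": "w1"}', now=0.0)
tracker.record(b'{"message_type": "heartbeat", "worker_id": "w2"}', now=0.0)
tracker.record(b'{"message_type": "heartbeat", "worker_id": "w2"}', now=8.0)
late = tracker.dead_workers(now=11.0)
```

UDP suits heartbeats because a lost datagram is harmless: the next one arrives 2 seconds later, and only a sustained silence marks a Worker as dead. A Manager would then typically reassign that Worker's unfinished tasks, which is where the fault tolerance goal comes in.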
Start Date: June 5th, 2024
End Date: June 14th, 2024
GitHub: Private
Course: EECS485
Course Topic: Web Systems