This project implements a multithreaded MapReduce runtime in C using POSIX threads. It includes a thread pool for concurrency, sorted intermediate key-value structures, and Shortest Job First (SJF) scheduling for both Map and Reduce phases. The example application is a distributed word count program (distwc.c).
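As a rough illustration of the SJF policy, the scheduler can dequeue the pending job with the smallest size estimate (input-file length for Map jobs, partition size for Reduce jobs). This is a sketch only; `Job` and `pick_shortest` are hypothetical names, not the project's actual API:

```c
#include <stddef.h>

/* Hypothetical job record: size_estimate orders the queue (SJF). */
typedef struct {
    size_t size_estimate;  /* file length (Map) or partition bytes (Reduce) */
    void *arg;             /* opaque job argument */
} Job;

/* Return the index of the shortest pending job, or -1 if the queue is empty. */
int pick_shortest(const Job *jobs, int njobs) {
    if (njobs <= 0)
        return -1;
    int best = 0;
    for (int i = 1; i < njobs; i++)
        if (jobs[i].size_estimate < jobs[best].size_estimate)
            best = i;
    return best;
}
```

A linear scan is adequate here because the queue holds at most a few dozen jobs; a min-heap would be the natural choice at larger scale.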
- pthread_mutex_t mutex: guards the shared job queue, idle counters, and shutdown flag.
- pthread_cond_t cond: signals worker threads when new jobs are available.
- pthread_cond_t cond_all_idle: used to synchronize between map and reduce phases.
- pthread_mutex_t (per partition): protects each partition’s sorted key-value list during concurrent MR_Emit calls.
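The cond_all_idle handoff between phases can be sketched as an idle-count barrier: each worker increments the idle counter when the job queue drains, and the last one to go idle signals the coordinator. The names below are hypothetical (the real pool also tracks the job queue and shutdown flag):

```c
#include <pthread.h>

/* Minimal sketch of the Map -> Reduce phase barrier. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond_all_idle;
    int idle_count;   /* workers currently waiting for jobs */
    int nworkers;     /* total worker threads in the pool   */
} Pool;

/* Called by a worker when it finds the job queue empty. */
void worker_goes_idle(Pool *p) {
    pthread_mutex_lock(&p->mutex);
    if (++p->idle_count == p->nworkers)
        pthread_cond_signal(&p->cond_all_idle);  /* last one in: wake coordinator */
    pthread_mutex_unlock(&p->mutex);
}

/* Called by the coordinator before starting the Reduce phase. */
void wait_all_idle(Pool *p) {
    pthread_mutex_lock(&p->mutex);
    while (p->idle_count < p->nworkers)          /* guard against spurious wakeups */
        pthread_cond_wait(&p->cond_all_idle, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}
```

The while-loop around pthread_cond_wait is required by POSIX, since condition waits may wake spuriously.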
Each partition maintains sorted key-value pairs:
```c
typedef struct ValueNode {
    char *value;
    struct ValueNode *next;
} ValueNode;

typedef struct KVNode {
    char *key;
    ValueNode *values_head, *values_tail;
    struct KVNode *next;
} KVNode;

typedef struct Partition {
    pthread_mutex_t mutex;
    KVNode *kv_head;
    size_t bytes_used;
} Partition;
```

Keys are inserted in ascending order (strcmp) during MR_Emit. Each partition’s size is tracked in bytes_used for SJF scheduling.
- Inputs: all testcase/sample*.txt (20 files).
- Partitions: 10 (fixed).
- Threads: varied as shown below.
- Correctness checks:
  - No duplicate keys within or across partitions.
  - Each key’s count equals the expected total (e.g., 5000 in provided cases).
- Tools: valgrind --tool=memcheck (no leaks) and valgrind --tool=helgrind (no races).
- Command: make time-test (uses /usr/bin/time).
| Threads | Elapsed |
|---|---|
| 1 | 0:00.14 |
| 2 | 0:00.15 |
| 4 | 0:00.13 |
| 8 | 0:00.13 |
| 16 | 0:00.15 |
Discussion (why speedup is limited here):
- Workload size is tiny, so thread overhead and I/O dominate; adding threads doesn’t substantially reduce wall time.
- Amdahl’s Law: serial sections remain (job submission, Map→Reduce barrier, one reducer per partition).
- Contention: brief but frequent locking on the job queue and on hot partitions during MR_Emit.
- Load balance & tail effects: some partitions finish later; total time is gated by the slowest partition’s reducer.
- Cache & scheduling overheads grow with more threads, offsetting any potential gains on such a small dataset.
On larger inputs you should see clearer improvements from 1→4→8 threads until the factors above dominate.
```
make           # build
make run       # run on testcase/*.txt
make memcheck  # check for leaks
make helgrind  # check for data races
```

Result files (result-*.txt) are deleted before each run to prevent duplicates.
- Course handout/spec (CMPUT 379 A2).
- man pthreads, man pthread_mutex, man pthread_cond.
- Valgrind documentation (memcheck, helgrind).
No external code was copied; all implementation is original.