This project implements a multithreaded MapReduce runtime in C using POSIX threads. It includes a thread pool for concurrency, sorted intermediate key-value structures, and Shortest Job First (SJF) scheduling for both Map and Reduce phases. The example application is a distributed word count program (distwc.c).
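As a rough illustration of the SJF policy, the scheduler can dequeue the pending job with the smallest size estimate (input-file length for Map jobs, partition size for Reduce jobs). This is a sketch only; `Job` and `pick_shortest` are hypothetical names, not the project's actual API:

```c
#include <stddef.h>

/* Hypothetical job record: size_estimate orders the queue (SJF). */
typedef struct {
    size_t size_estimate;  /* file length (Map) or partition bytes (Reduce) */
    void *arg;             /* opaque job argument */
} Job;

/* Return the index of the shortest pending job, or -1 if the queue is empty. */
int pick_shortest(const Job *jobs, int njobs) {
    if (njobs <= 0)
        return -1;
    int best = 0;
    for (int i = 1; i < njobs; i++)
        if (jobs[i].size_estimate < jobs[best].size_estimate)
            best = i;
    return best;
}
```

A linear scan is adequate here because the queue holds at most a few dozen jobs; a min-heap would be the natural choice at larger scale.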
- pthread_mutex_t mutex: guards the shared job queue, idle counters, and shutdown flag.
- pthread_cond_t cond: signals worker threads when new jobs are available.
- pthread_cond_t cond_all_idle: used to synchronize between map and reduce phases.
- pthread_mutex_t (per partition): protects each partition’s sorted key-value list during concurrent MR_Emit calls.
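The cond_all_idle handoff between phases can be sketched as an idle-count barrier: each worker increments the idle counter when the job queue drains, and the last one to go idle signals the coordinator. The names below are hypothetical (the real pool also tracks the job queue and shutdown flag):

```c
#include <pthread.h>

/* Minimal sketch of the Map -> Reduce phase barrier. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond_all_idle;
    int idle_count;   /* workers currently waiting for jobs */
    int nworkers;     /* total worker threads in the pool   */
} Pool;

/* Called by a worker when it finds the job queue empty. */
void worker_goes_idle(Pool *p) {
    pthread_mutex_lock(&p->mutex);
    if (++p->idle_count == p->nworkers)
        pthread_cond_signal(&p->cond_all_idle);  /* last one in: wake coordinator */
    pthread_mutex_unlock(&p->mutex);
}

/* Called by the coordinator before starting the Reduce phase. */
void wait_all_idle(Pool *p) {
    pthread_mutex_lock(&p->mutex);
    while (p->idle_count < p->nworkers)          /* guard against spurious wakeups */
        pthread_cond_wait(&p->cond_all_idle, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}
```

The while-loop around pthread_cond_wait is required by POSIX, since condition waits may wake spuriously.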
Each partition maintains sorted key-value pairs:
```c
typedef struct ValueNode {
    char *value;
    struct ValueNode *next;
} ValueNode;

typedef struct KVNode {
    char *key;
    ValueNode *values_head, *values_tail;
    struct KVNode *next;
} KVNode;

typedef struct Partition {
    pthread_mutex_t mutex;
    KVNode *kv_head;
    size_t bytes_used;
} Partition;
```

Keys are inserted in ascending order (strcmp) during MR_Emit. Each partition’s size is tracked in bytes_used for SJF scheduling.
- Inputs: all testcase/sample*.txt (20 files).
- Partitions: 10 (fixed).
- Threads: varied as shown below.
- Correctness checks:
  - No duplicate keys within or across partitions.
  - Each key’s count equals the expected total (e.g., 5000 in provided cases).
- Tools: valgrind --tool=memcheck (no leaks) and valgrind --tool=helgrind (no races).
- Command: make time-test (uses /usr/bin/time).
| Threads | Elapsed |
|---|---|
| 1 | 0:00.14 |
| 2 | 0:00.15 |
| 4 | 0:00.13 |
| 8 | 0:00.13 |
| 16 | 0:00.15 |
Discussion (why speedup is limited here):
- Workload size is tiny, so thread overhead and I/O dominate; adding threads doesn’t substantially reduce wall time.
- Amdahl’s Law: serial sections remain (job submission, Map→Reduce barrier, one reducer per partition).
- Contention: brief but frequent locking on the job queue and on hot partitions during MR_Emit.
- Load balance & tail effects: some partitions finish later; total time is gated by the slowest partition’s reducer.
- Cache & scheduling overheads grow with more threads, offsetting any potential gains on such a small dataset.
On larger inputs you should see clearer improvements from 1→4→8 threads until the factors above dominate.
```
make           # build
make run       # run on testcase/*.txt
make memcheck  # check for leaks
make helgrind  # check for data races
```

Result files (result-*.txt) are deleted before each run to prevent duplicates.
- Course handout/spec (CMPUT 379 A2).
- man pthreads, man pthread_mutex, man pthread_cond.
- Valgrind documentation (memcheck, helgrind).
No external code was copied; all implementation is original.