
Google Trace

This document describes how we use the Google Cluster Trace Data to plot Figure 4(a) in the paper.

The following description was contributed by Yilun Chen. If you have any questions, please open a GitHub issue.

Figure 4(a) was generated with the following steps:

(1) Divide the entire tracing period (29 days) into fixed-length time slots of length L (we used L = 5 minutes).
(2) For each time slot, calculate the available normalized memory from the machine events table. Initialize a vector of zeros with one element per time slot. If machine A, with memory M, was added at time t0 and removed at time t1 (use t_end, the end of the trace, if it was never removed), increment by M every element whose time slot falls in the range [t0, t1). Repeat for every machine.
(3) For each time slot, calculate the cumulative canonical memory usage. Initialize a second vector of zeros, again with one element per time slot. If a record shows that task A used memory M' between t2 and t3, increment the element for the time slot that contains t2 by M' * ((t3 - t2) / L). Repeat for every record.
(4) Once both vectors are complete, divide the vector from step (3) element-wise by the vector from step (2) to obtain the final vector, and plot it.
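The four steps above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the code used for the paper: the function names (`ceil_div`, `available_memory`, `memory_usage`, `memory_utilization`) are hypothetical, and it assumes integer timestamps starting at 0, machine lifetimes already extracted from the machine events table as `(t0, t1, mem)` tuples (`t1 = None` if the machine was never removed), and task usage records given as `(t2, t3, mem)` tuples of canonical memory usage. It also assumes one reading of step (2): a slot counts as available if its start time falls in [t0, t1).

```python
import numpy as np

# Assumptions (hypothetical, not taken from the original analysis code):
# - timestamps are non-negative integers and the trace starts at time 0
# - machines: (t0, t1, mem) tuples, t1 = None if the machine was never removed
# - records: (t2, t3, mem) tuples of canonical memory usage for one task


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def available_memory(machines, t_end, L):
    """Step (2): total normalized memory available in each time slot."""
    num_slots = ceil_div(t_end, L)
    capacity = np.zeros(num_slots)
    for t0, t1, mem in machines:
        if t1 is None:  # never removed: treat the machine as alive until t_end
            t1 = t_end
        # Assumed reading: slot k counts if its start time k * L is in [t0, t1).
        first = ceil_div(t0, L)
        last = min(ceil_div(t1, L), num_slots)
        capacity[first:last] += mem  # empty slice if first >= last
    return capacity


def memory_usage(records, t_end, L):
    """Step (3): cumulative canonical memory usage per time slot."""
    num_slots = ceil_div(t_end, L)
    usage = np.zeros(num_slots)
    for t2, t3, mem in records:
        slot = t2 // L  # the whole record goes to the slot containing t2
        if slot < num_slots:
            usage[slot] += mem * (t3 - t2) / L
    return usage


def memory_utilization(machines, records, t_end, L):
    """Step (4): element-wise usage / capacity; slots with no capacity get 0."""
    capacity = available_memory(machines, t_end, L)
    usage = memory_usage(records, t_end, L)
    return np.divide(usage, capacity, out=np.zeros_like(usage), where=capacity > 0)


if __name__ == "__main__":
    L = 300  # toy time unit; 5-minute slots on microsecond timestamps would be 300 * 10**6
    machines = [(0, None, 0.5), (0, 600, 0.25)]
    records = [(0, 300, 0.3), (300, 450, 0.2), (600, 900, 0.25)]
    print(memory_utilization(machines, records, t_end=900, L=L))
```

As in step (3), each usage record is attributed entirely to the slot containing its start time, so a record that crosses a slot boundary is not split. Slots with zero capacity are set to 0 instead of triggering a division by zero. With the toy inputs above, the result is roughly 0.4, 0.13 and 0.5; the returned array can be passed to any plotting library.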