This repository is intended solely for evaluating the artifacts presented in the ATC 2025 paper entitled: On-Demand Container Partitioning for Distributed ML
- The 2DFS-builder and CLI implementation is available at: https://github.com/2DFS/2dfs-builder
- The 2DFS-registry implementation is available at: https://github.com/2DFS/2dfs-registry
Each tool comes with instructions on how to install and use it. The 2DFS-builder is a CLI tool that builds OCI images using the 2DFS technology, while the 2DFS-registry is a container registry, based on the OCI distribution spec, that stores and manages OCI images compliant with the 2DFS technology.
The code is not intended for production use and is not supported. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the code or the use or other dealings in the code.
Below is a guide to reproducing the results presented in the paper. The guide is divided into sections, each corresponding to a figure in the paper. Each section contains a description of the figure, the expected behavior of the script, and the results of the experiment.
If something is not clear from the guide below, we also made a short YouTube tutorial on how to set up a VM and run the experiments. The video is available here:
- Looking for a VM to replicate the results?
- Requirements
- Experiment setup
- Evaluation Scripts
- Figure 8: Build time for a single image with increasing number of splits per layer.
- Figure 9: Build time for different split partitions where each partition is packaged as a separate image.
- Figure 10: Resources consumption during image build.
- Figures 11, 12, and 13: Download of partitioned vs prebuilt images.
- Figure 14: Build time after model updates with image caching.
- How to interpret the results
- What about Fig 15?
Looking for a VM to replicate the results?
If you are looking for a VM to replicate the results, we tested the code on an AWS VM with the following configuration:
- OS: Ubuntu 22.04 Server SSD Volume type
- Instance type: c5.4xlarge
- vCPUs: 4x (Important: the more concurrency capability you have, the faster the experiments will run and the larger the gap between TDFS and Docker will be)
- RAM: 8 GB
- Disk: 40 GB
Requirements
- Ubuntu 22.04 or newer. This code has been tested on Ubuntu 22.04.
- Python 3.8 or higher installed. You can check your Python version by running:

  ```bash
  python3 --version
  ```
- Install Docker, pip3, venv, and ifstat by running the following commands:

  ```bash
  wget https://raw.githubusercontent.com/2DFS/artifacts-evaluation/refs/heads/main/install_requirements.sh && chmod +x install_requirements.sh && sudo ./install_requirements.sh
  sudo groupadd docker
  sudo usermod -aG docker $USER
  newgrp docker
  ```
Requirements installation troubleshooting
If the install-requirements script fails, the requirements can be installed manually by following these steps:
- Docker installation (please follow this guide)
- Rootless Docker installation. This is required to run the evaluation scripts without `sudo`. To do this, you need to add your user to the `docker` group. Please refer to the Docker Rootless documentation.
- `pip3` installed. You can install pip3 using: `sudo apt install python3-pip`
- `venv` Python virtual environment. You can install venv using: `sudo apt install python3-venv`
- Install `ifstat`. Simply run: `sudo apt-get install ifstat`
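Before moving on, it can help to verify the installation. The following quick check is not part of the official setup, just a minimal Python sketch confirming that the required tools are reachable and that Docker works without sudo:

```python
import shutil
import subprocess

# Check that the tools the evaluation scripts rely on are on the PATH.
for tool in ("docker", "ifstat", "python3"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")

# `docker info` succeeds only if the daemon is running and your user can
# reach it without sudo (i.e., the docker group membership took effect).
result = subprocess.run(["docker", "info"], capture_output=True)
print("docker usable without sudo:", result.returncode == 0)
```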
Experiment setup
Q: How to evaluate the 2DFS artifacts presented in the ATC 2025 Paper?
A: The evaluation is based on the following steps:
1. **Setup the environment**: Set up the environment of the machine where you want to run the evaluation scripts by following the steps below:

   - (1.1) Clone this repository and navigate to its root directory:

     ```bash
     git clone https://github.com/2DFS/artifacts-evaluation ATC25-2dfs-artifacts-evaluation && cd ATC25-2dfs-artifacts-evaluation
     ```

   - (1.2) Install the latest `tdfs` CLI utility and the Python packages, configure the local registry access and the Docker containerd snapshotter, and download the dataset using the following command:

     ```bash
     ./setup_environment.sh
     ```

   - (1.3) Enter the virtual environment:

     ```bash
     source ./venv/bin/activate
     ```

     At the end of the experiments you can deactivate the virtual environment using `deactivate` and clean up the container registry with the `./cleanup_environment.sh` command.
2. **Run the evaluation scripts**: For each of the figures in our paper, we include a script to run its evaluation. The scripts assume that both Docker and `tdfs` are installed and that the `splits/` folder containing the models and splits is in the same directory as the evaluation scripts, so make sure you completed the step above. The scripts to reproduce each figure are listed below.

3. **Get the results**: The results of each experiment will be saved in the current directory both as `.csv` files and as `.pdf` files, reproducing the results and figures of the paper. The files use the common filename structure `results_fig<fig-number>.csv` and `fig<fig-number>_reproduced.pdf`. For example, the results of Figure 8 will be saved in `results_fig8.csv`, and the plot will be saved in `fig8_reproduced.pdf`.

   If you're using a remote VM, you can use `scp` to copy the results to your local machine. For example, if you're using an AWS VM, you can run the following command from your local machine:

   ```bash
   scp -i <path-to-your-aws-key> ubuntu@<your-aws-ip>:~/ATC25-2dfs-artifacts-evaluation/fig8_reproduced.pdf .
   ```

   where `<path-to-your-aws-key>` is the path to your AWS key and `<your-aws-ip>` is the public IP of your AWS VM.

   The results might differ slightly in scale from the values in the paper due to different machine configurations and environments. For example, a lower degree of parallelism or a slower read/write speed on the disk compared to the machines used in the paper's evaluation will lead to longer build times. Please refer to the How to interpret the results section below for more details.
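Beyond the generated PDFs, you can also inspect the `.csv` result files directly. Here is a minimal sketch using pandas (assuming pandas is available in the virtual environment; the exact column layout depends on the figure, so the code prints it rather than assuming it):

```python
import pandas as pd

# Load one of the result files produced by the evaluation scripts,
# e.g. the Figure 8 results.
df = pd.read_csv("results_fig8.csv")

# The column layout differs per figure, so inspect what is actually
# there before drawing any conclusions.
print(df.columns.tolist())
print(df.describe())
```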
Evaluation Scripts
These artifact evaluation scripts reproduce all the results presented in the Evaluation section of the paper, specifically from Fig. 8 to Fig. 14.

N.b.: to reduce the time overhead of this evaluation, by default each experiment is executed only once. If you want to repeat each experiment multiple times to increase accuracy, you can set the `EXPERIMENT_REPEAT` environment variable to the number of repetitions. For example, to repeat each experiment 2 times, run:

```bash
export EXPERIMENT_REPEAT=2
```
Figure 8: Build time for a single image with increasing number of splits per layer.
- Script execution time: ~10 minutes
- To run the evaluation for Figure 8, run the following command:

  ```bash
  python3 fig8.py
  ```

  Please make sure you have stable connectivity and that the experiment is not interrupted. In case the experiment is interrupted because of a VM disconnection, make sure to follow the preparation step 1. **Setup the environment** again.
During the experiment, for each experiment configuration, the script will print messages like the following:
```text
###TDFS EXPERIMENT##
Total time: 5.92 Download time 4.336642265319824 Layering time 0.3461027145385742
###DOCKER EXPERIMENT##
Total time: 12.2 Download time 1.4 Layering time 7.099999999999991
```

These lines report the total build time, download time (from the local registry), and container layering time for each experiment configuration. These results already give the user an idea of the performance of TDFS vs Docker. N.b.: the download time is not considered in the final results of the paper, as it is not deemed relevant, but it is included in the output for completeness.
At the end of the execution, the script will save the results in a file called `results_fig8.csv` in the current directory. The script will also generate a plot of the results and save it in a file called `fig8_reproduced.pdf` in the current directory, in the same format as the one presented in the paper.
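If you capture the script output, e.g. with `python3 fig8.py | tee fig8.log` (the log filename here is just an example), the per-configuration timings can also be extracted programmatically. A small sketch matching the line format shown above:

```python
import re

# Matches lines of the form printed by the evaluation scripts, e.g.
# "Total time: 5.92 Download time 4.3366... Layering time 0.3461..."
PATTERN = re.compile(
    r"Total time:\s*([\d.]+)\s+Download time\s+([\d.]+)\s+Layering time\s+([\d.]+)"
)

def parse_timings(log_path: str):
    """Yield (total, download, layering) tuples from a captured log."""
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                yield tuple(float(x) for x in m.groups())

for total, download, layering in parse_timings("fig8.log"):
    print(f"total={total:.2f}s download={download:.2f}s layering={layering:.2f}s")
```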
Figure 9: Build time for different split partitions where each partition is packaged as a separate image.
- Script execution time: ~15 minutes
- To run the evaluation for Figure 9, run the following command:

  ```bash
  python3 fig9.py
  ```

  Please make sure you have stable connectivity and that the experiment is not interrupted. In case the experiment is interrupted because of a VM disconnection, make sure to follow the preparation step 1. **Setup the environment** again.
During the experiment, for each experiment configuration, the script will print messages like the following:
```text
###TDFS EXPERIMENT##
Total time: 5.92 Download time 4.336642265319824 Layering time 0.3461027145385742
###DOCKER EXPERIMENT##
Total time: 12.2 Download time 1.4 Layering time 7.099999999999991
```

These lines report the total build time, download time (from the local registry), and container layering time for each experiment configuration. These results already give the user an idea of the performance of TDFS vs Docker. N.b.: the download time is not considered in the final results of the paper, as it is not deemed relevant, but it is included in the output for completeness.
At the end of the execution, the script will save the results in a file called `results_fig9.csv` in the current directory. The script will also generate a plot of the results and save it in a file called `fig9_reproduced.pdf` in the current directory.
Figure 10: Resources consumption during image build.
- Script execution time: ~5 minutes
- To run the evaluation for Figure 10, run the following command:

  ```bash
  python3 fig10.py
  ```

  Please make sure you have stable connectivity and that the experiment is not interrupted. In case the experiment is interrupted because of a VM disconnection, make sure to follow the preparation step 1. **Setup the environment** again.
During the experiment, for each experiment configuration, the script will print messages like the following:
```text
###TDFS EXPERIMENT##
Total time: 5.92 Download time 4.336642265319824 Layering time 0.3461027145385742
###DOCKER EXPERIMENT##
Total time: 12.2 Download time 1.4 Layering time 7.099999999999991
```

These lines report the total build time, download time (from the local registry), and container layering time for each experiment configuration. These results already give the user an idea of the performance of TDFS vs Docker. N.b.: the download time is not considered in the final results of the paper, as it is not deemed relevant, but it is included in the output for completeness.
At the end of the execution, the script will save the results in files called `results_fig10.csv` and `cpumemoryusage.csv` in the current directory. The former contains the build output, as in Figs. 8 and 9, while the latter contains the CPU and memory consumption measurements taken during the experiments. The script will also generate a plot of the results and save it in a file called `fig10_reproduced.pdf` in the current directory.
Disclaimer: CPU and memory measurements fluctuate based on machine usage in real time. Please consider this when interpreting the results. Additionally, in this single-VM experiment, the CPU and memory measurements are affected by both the runtime making the request and the registry serving the images. Therefore, the memory and CPU consumption might be slightly higher than the values presented in the paper, where the registry was hosted on an isolated machine.
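For a quick visual sanity check of the resource samples, a sketch along these lines can be used (assuming pandas and matplotlib are available; the column names depend on the script version, so the code discovers them rather than assuming them):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless VMs have no display
import matplotlib.pyplot as plt

# Load the CPU/memory samples collected during the builds.
usage = pd.read_csv("cpumemoryusage.csv")
print(usage.columns.tolist())

# Plot every numeric column over the sample index as a quick look;
# this is not the paper's plotting code, just a rough inspection aid.
usage.select_dtypes("number").plot(subplots=True)
plt.savefig("cpumemory_quicklook.pdf")
```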
Figures 11, 12, and 13: Download of partitioned vs prebuilt images.
- Script execution time: ~10 minutes
Figures 11, 12, and 13 are generated together as part of the same experiment. To run the evaluation for Figures 11, 12, and 13, follow these steps:
- To run the evaluation for Figures 11, 12, and 13, run the following command:

  ```bash
  python3 fig11-13.py
  ```

  Please make sure you have stable connectivity and that the experiment is not interrupted. In case the experiment is interrupted because of a VM disconnection, make sure to follow the preparation step 1. **Setup the environment** again.
The script should run multiple Docker and TDFS builds, each one with a different configuration. At the end of each build, it pushes the artifacts to the local registry. Then, it performs different pulls for each image partition.
At the end of the execution, the script will save the results in files called `results_fig11.csv`, `cpumemoryusage.csv`, and `bandwidth-result.log` in the current directory. The first contains the build output results, the second contains the CPU and memory consumption measurements taken during the experiments, and the third contains the bandwidth measurements at the Docker bridge. The script will also generate the plots of the results and save them respectively in the files `fig11_reproduced.pdf`, `fig12_reproduced.pdf`, and `fig13_reproduced.pdf` in the current directory.
Disclaimer: CPU, memory, and bandwidth measurements fluctuate based on machine usage in real time. In this script, the registry runs locally alongside the builder, so expect additional background noise compared to the paper. Please consider this when interpreting the results. Additionally, due to high fluctuations, the standard deviation of the measurements can be high; this effect is mitigated by running the experiment multiple times.
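ifstat's default output interleaves header lines with rows of numeric KB/s samples. Here is a rough sketch for summarizing `bandwidth-result.log` under that assumption (adjust it if the script invokes ifstat with different flags):

```python
def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False

# Keep only the purely numeric rows; ifstat prints interface names and
# "KB/s in / KB/s out" header lines in between, which are skipped here.
rows = []
with open("bandwidth-result.log") as f:
    for line in f:
        fields = line.split()
        if fields and all(is_number(x) for x in fields):
            rows.append([float(x) for x in fields])

# Average and peak per column (typically alternating in/out readings).
for i, col in enumerate(zip(*rows)):
    print(f"column {i}: avg {sum(col) / len(col):.2f} KB/s, peak {max(col):.2f} KB/s")
```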
Figure 14: Build time after model updates with image caching.
- Script execution time: ~10 minutes
- To run the evaluation for Figures 14a and 14b, run the following command:

  ```bash
  python3 fig14.py
  ```

  Please make sure you have stable connectivity and that the experiment is not interrupted. In case the experiment is interrupted because of a VM disconnection, make sure to follow the preparation step 1. **Setup the environment** again.
During the experiment, the script initially runs a TDFS build and a Docker build for the base images. Then, the script will perform the layer updates according to the experiment configuration and re-perform the build with the new layers. The script will print messages like the following:
```text
Change allotments...
###TDFS EXPERIMENT##
Total time: 5.59 Download time 0.3019580841064453 Layering time 2.4875688552856445
###DOCKER EXPERIMENT##
Total time: 84.81 Download time 12.299999999999983 Layering time 15.29999999999998
```

These lines report the total build time, download time (from the local registry), and container layering time for each experiment configuration. These results already give the user an idea of the performance of TDFS vs Docker. N.b.: the download time is not considered in the final results of the paper, as it is not deemed relevant, but it is included in the output for completeness.
At the end of the execution, the script will save the results in a file called `results_fig14.csv` in the current directory. This file contains the execution results. The script will also generate plots of the results and save them respectively in `fig14_a_reproduced.pdf` and `fig14_b_reproduced.pdf` in the current directory.
How to interpret the results
The 2DFS utility is designed to exploit CPU parallelism to speed up the image build. The results of the experiments are therefore highly affected by the number of cores available on the machine as well as by the maximum read/write speed of the disk. For example, a machine with 16 cores gives much more parallelism to 2DFS than a machine with 4, while Docker builds are affected almost exclusively by processing speed rather than parallelism. While the results are reproducible and consistent in trend, the scale of the gap and the absolute build and caching times may vary with the machine configuration.
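As a rough way to gauge what to expect on your machine, the following sketch reports the visible core count and a ballpark sequential write speed (a crude probe, not a proper benchmark):

```python
import os
import time

# Core count drives how much parallelism the 2DFS builder can exploit.
print(f"CPU cores visible: {os.cpu_count()}")

# Crude sequential-write probe with a 256 MiB zero buffer; page cache and
# filesystem effects make this a ballpark figure only.
buf = b"\0" * (256 * 1024 * 1024)
start = time.time()
with open("io_probe.tmp", "wb") as f:
    f.write(buf)
    f.flush()
    os.fsync(f.fileno())
elapsed = time.time() - start
os.remove("io_probe.tmp")
print(f"~{256 / elapsed:.0f} MiB/s sequential write")
```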
Another consideration: while in the paper we evaluate using a 2DFS-compliant registry on isolated machines, in this proposed experimental setup we install everything locally to simplify the installation. This means that the registry is not isolated from the builder, and the measurements are affected by both the runtime making the request and the registry serving the images. Therefore, the memory and CPU consumption might be slightly higher than the values presented in the paper.
What about Fig 15?
Figure 15 requires a complex environment involving multiple machines and a distributed setup. Since Figure 15 does not directly evaluate our solution in terms of core contributions but primarily analyzes the benefits from a distributed ML perspective, we decided not to include it in the artifact evaluation.