Assess the viability of adopting Docker Containers vs Conda as the standard execution environment for the package #534

@gcroci2

Description

In the course of PR #528, we recognized the benefits of building the environment inside a Docker container, which spares users from manually installing the various dependencies that deeprank2 requires. This raises the possibility of making Docker containers the default execution environment for the package. However, there are some concerns about this direction.

PROs

  • Dependency management: Docker simplifies dependency management. All the required dependencies are specified in the Dockerfile, making it easier for users to set up the environment without manually installing each component.
  • Ease of reproducibility and consistency across environments: Docker images are essentially snapshots of an environment, making it easy to reproduce the same environment on different machines or at different points in time. Additionally, Docker allows for consistent deployment across different environments. It helps ensure that the package works the same way regardless of the underlying system, which is particularly useful in production scenarios.
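To make the first PRO concrete, a Dockerfile typically pins the base image and creates the environment from a spec file committed to the repo, so every build reproduces the same snapshot. A minimal sketch (the image tag, environment name, and file paths are illustrative, not taken from PR #528):

```dockerfile
# Hypothetical sketch: pin the base image so builds are reproducible
FROM condaforge/miniforge3:24.3.0-0

# Create the environment from a pinned spec committed to the repository
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml && conda clean --all --yes

# Install the package itself into the environment (name is illustrative)
RUN conda run -n deeprank2 pip install deeprank2

# Run commands inside the environment by default
ENTRYPOINT ["conda", "run", "-n", "deeprank2"]
```

A user then only needs `docker build` and `docker run` instead of installing each dependency by hand.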

CONs

  • Learning curve: there may be a learning curve for users unfamiliar with Docker, adding complexity to the process of setting up and running the package.
  • Limited resource access: Docker containers are inherently isolated and do not have direct access to all of the host machine's resources. This can be partially changed in the Docker settings, but it must be configured correctly by the user and can be a drawback when an application needs low-level access to specific hardware or system resources. On my Mac, for example, not all CPUs can be used (at most 8 out of 10). Additionally, the official Docker docs advise limiting containers' resources to avoid both system instability and security risks.
  • GPU access: Docker traditionally has limitations when it comes to accessing GPUs directly. This can be a significant drawback for applications that heavily rely on GPU processing power, such as our machine learning pipeline.
  • Supercomputers: Realistically, users will mainly run deeprank2 on supercomputers, where they won't have sudo permissions. In such cases, running Docker requires extra actions and considerations on the user's side (e.g., system administrator assistance, compatibility with the job scheduler, containerized job scripts, and availability of the Docker image on the supercomputer's file system or in a container registry accessible to the compute nodes).
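For reference, the resource and GPU limitations above are usually worked around with explicit `docker run` flags, which is itself part of the learning curve. A sketch of the relevant invocations (the `deeprank2:latest` image name is hypothetical; the GPU flag only works if the NVIDIA Container Toolkit is installed on the host):

```
# Raise/cap CPU and memory explicitly (defaults depend on host and
# Docker Desktop settings, e.g. the 8-of-10 CPU cap mentioned above)
docker run --cpus=8 --memory=16g deeprank2:latest python train.py

# GPU passthrough requires the NVIDIA Container Toolkit on the host;
# --gpus exposes the host GPUs inside the container
docker run --gpus all deeprank2:latest nvidia-smi
```

None of this applies on a supercomputer without root access, which is why the last CON weighs heavily.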

In my opinion, Docker is not the best choice for our community's use cases, especially given the last two cons listed above. But please let me know your thoughts :) and also whether you know of alternatives that overcome Docker's limitations.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

Status

Done

Milestone

No milestone
