Inspiration 💡

The CORTX GitHub repository contains a list of suggested integrations. Three entries on that list caught our attention: Elasticsearch, PyTorch and TensorFlow. We wanted to build something that integrates all three into CORTX at the same time. That sounded really challenging, and since we are hackathonists, it also sounded like fun. Seagate’s CORTX provides data management and cataloging, so an integration has to do more than just save and load data. Every cloud service can store data, so a good integration also handles metadata, with features like cataloging and searching. That is the goal we set out to achieve.

What it does ❓

It integrates PyTorch, TensorFlow and Elasticsearch into CORTX at the same time, using the functionality of DoF. DoF is a file format; the name is an acronym for Deep Model Core Output Framework. It is a continuously developed, hackathon-winning project that provides fast dataset sharing and data security at the same time. It helps data scientists handle sensitive data and work with large datasets easily. A DoF file can contain different stages of a dataset, such as raw data, preprocessed data and the output of a headless pretrained model. It can also hold model data such as weights and biases. Furthermore, it is a container for any additional information, like the license, the author's contact details or a description of the model. Many prebuilt keys help you store the most important details about your dataset, model or training process, and the list of keys is extensible.

The main functions of cortx_dof are:

  • uploading with S3

  • searching with Elasticsearch

  • downloading with S3
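The three functions above can be sketched as a round trip. This is only a minimal illustration, not the actual cortx_dof API: the function names, the index name "dof" and the metadata fields are hypothetical. It assumes an S3 client with boto3-style `upload_file` / `download_file` methods and an Elasticsearch client whose `index` and `search` methods accept the `document=` and `query=` keywords (as in recent versions of the official Python client; older versions use `body=` instead).

```python
"""Sketch of an upload / search / download round trip (not the cortx_dof API)."""


def upload_with_metadata(s3_client, es_client, bucket, key, path, metadata,
                         index="dof"):
    """Store the file in S3 and index its metadata so it can be searched."""
    s3_client.upload_file(path, bucket, key)
    # Keep bucket and key in the document so a search hit can be downloaded.
    document = dict(metadata, bucket=bucket, key=key)
    es_client.index(index=index, id=key, document=document)
    return document


def search_keys(es_client, field, text, index="dof"):
    """Return the (bucket, key) pairs of documents matching a full-text query."""
    response = es_client.search(index=index, query={"match": {field: text}})
    return [(hit["_source"]["bucket"], hit["_source"]["key"])
            for hit in response["hits"]["hits"]]


def download_hits(s3_client, hits, target_dir="."):
    """Download every (bucket, key) pair and return the local file paths."""
    paths = []
    for bucket, key in hits:
        # Flatten the object key into a single file name (hypothetical choice).
        local_path = f"{target_dir}/{key.replace('/', '_')}"
        s3_client.download_file(bucket, key, local_path)
        paths.append(local_path)
    return paths
```

Passing the clients in as arguments keeps the sketch independent of any particular endpoint. With boto3, an S3 client for a CORTX endpoint would typically be created with `boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=...)`, using the values of your own installation.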

How we built it 🔹

We installed CORTX-VM locally. Frankly, we installed it three times, since we wanted to test its functionality and flexibility. The third installation went beyond the scope of this hackathon: we really like this software and will keep using it after the hackathon. We even built a computer from parts to host the VM.

We installed boto3 and elasticsearch. The code is written in Python. We used Spyder and Atom for writing, and Spyder and the command line for testing.

We created examply.py to demonstrate the functionality of the code. The code is well documented with docstrings.

Challenges we ran into 🏆

At first sight, CORTX-VM was hard to understand, so we decided to examine the code on its GitHub. After that, everything was clear. We wrote a step-by-step guide for ourselves, based on the original tutorial.

We used VirtualBox to run CORTX-VM. It is important to consider the following things:

The default CPU setting is 8 cores. You might not have that many; we definitely don’t. In our experience, the best practice is to leave at least one core for the host's normal workload. The same applies to the memory and display settings, so it is worth considering changing them too.
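The rule of thumb about cores can be written down as a tiny helper. This is just a sketch of the arithmetic; the function name and its parameters are hypothetical and not part of CORTX-VM or cortx_dof.

```python
import os


def suggested_vm_cores(host_cores=None, vm_default=8, reserved_for_host=1):
    """Cap the VM's default core count so the host keeps some cores."""
    if host_cores is None:
        # os.cpu_count() may return None, so fall back to a single core.
        host_cores = os.cpu_count() or 1
    available = max(1, host_cores - reserved_for_host)
    return min(vm_default, available)


if __name__ == "__main__":
    print("Cores to assign to the VM:", suggested_vm_cores())
```

For example, `suggested_vm_cores(host_cores=4)` gives 3, while a 16-core host keeps the default of 8.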

Setting the time inside CORTX-VM is not always easy, so if you can set the VM's clock to UTC from the host computer, that seems to be a good solution.

The default password of CORTX-VM is opensource! (with the exclamation mark). If you have ever used a non-English keyboard layout, you can imagine that the position of the exclamation mark is not always obvious. It is therefore worth changing the password right at the beginning. If you are afraid of mistyping the new password, you can check it in a new terminal before logging out of the current session.

The folder containing the ifcfg files is named network-scripts, not network_scripts. It is worth setting static IP addresses for all network cards to ensure a connection to the VM.
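As an illustration of the static IP setup, the sketch below renders the content of an ifcfg file. The device name and the addresses are hypothetical placeholders, and the keys are the usual ones for ifcfg files under /etc/sysconfig/network-scripts on CentOS-style systems; check them against your own VM before using them.

```python
import ipaddress


def render_ifcfg(device, ip_address, prefix, gateway=None):
    """Return the text of an ifcfg file that assigns a static IPv4 address."""
    # Validate the address and prefix early; raises ValueError if malformed.
    ipaddress.ip_interface(f"{ip_address}/{prefix}")
    lines = [
        f"DEVICE={device}",
        "BOOTPROTO=none",  # no DHCP, the address below is used as-is
        "ONBOOT=yes",      # bring the interface up at boot
        f"IPADDR={ip_address}",
        f"PREFIX={prefix}",
    ]
    if gateway:
        ipaddress.ip_address(gateway)
        lines.append(f"GATEWAY={gateway}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device name and addresses, for illustration only.
    print(render_ifcfg("ens33", "192.168.56.10", 24, "192.168.56.1"))
```

The rendered text would go into a file named ifcfg-<device> inside the network-scripts folder, one file per network card.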

In our experience, bootstrap.sh never runs to completion from a remote terminal.

It is important to run some commands every time the VM is started. It is worth creating a start.sh file containing the three commands needed.

Accomplishments that we're proud of 😎

  • Elasticsearch, PyTorch and TensorFlow integration at the same time

  • Pylint rates our code 10.0 / 10.0

  • We installed and used CORTX-VM

  • We wrote a how-to guide for installing CORTX-VM.

  • The code is well documented with docstrings.

What we learned 📖

  • We learned how to install CORTX-VM.

  • We learned about data schemas and the ideas behind Elasticsearch.

What's next for cortx_dof? ⏱

DoF is a continuously developed, hackathon-winning project, but we would like to build new features independently of the original DoF functionality. For example, we plan to cover more AI/ML frameworks, not just PyTorch and TensorFlow. We think cortx_dof has great potential.
