docs: add Dockerfile and .yml for building conda env#528
Conversation
Pull Request Test Coverage Report for Build 7277606585
💛 - Coveralls |
dsmits
left a comment
There was a problem hiding this comment.
Very nice, couple of remarks
Dockerfile
Outdated
| WORKDIR /home/deeprank2 | ||
|
|
||
| # Define default command | ||
| CMD ["bash"] |
There was a problem hiding this comment.
| CMD ["bash"] | |
| CMD ["/opt/conda/envs/deeprank2/bin/jupyter", "notebook"] |
There was a problem hiding this comment.
This wasn't working (no jupyter server was started)
I used the following instead:
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
Without disabling the password, the browser was asking for the token (and the token from the terminal wasn't working).
Anyway, I'd avoid using this if it's unsafe and there are no safe ways to disable the token request. The "bash" version seems easier to me from a user perspective in that case. @dsmits
There was a problem hiding this comment.
I will merge soon the PR, but will leave this conversation unsolved so you can comment when you come back :) @dsmits
DaniBodor
left a comment
There was a problem hiding this comment.
I'm not done reviewing, but will already release these comments, as we will need some changes already. I will continue from here with a new review
DaniBodor
left a comment
There was a problem hiding this comment.
I'm getting an error in the trainer notebook if I run it now (in the cell where GNN is tested). I get the same error if I run it locally from this branch and the latest env, but not from my working repo/env.
UnboundLocalError Traceback (most recent call last)
Cell In[9], line 7
4 earlystop_maxgap = 0.1
5 min_epoch = 10
----> 7 trainer.train(
8 nepoch = epochs,
9 batch_size = batch_size,
10 earlystop_patience = earlystop_patience,
11 earlystop_maxgap = earlystop_maxgap,
12 min_epoch = min_epoch,
13 validate = True,
14 filename = os.path.join(output_path, f"gnn_{task}", "model.pth.tar"))
16 epoch = trainer.epoch_saved_model
17 print(f"Model saved at epoch {epoch}")File /opt/conda/envs/deeprank2/lib/python3.10/site-packages/deeprank2/trainer.py:640, in Trainer.train(self, nepoch, batch_size, shuffle, earlystop_patience, earlystop_maxgap, min_epoch, validate, num_workers, best_model, filename)
638 # Now that the training loop is over, save the model
639 if filename:
--> 640 torch.save(checkpoint_model, filename)
641 self.opt_loaded_state_dict = checkpoint_model["optimizer_state"]
642 self.model_load_state_dict = checkpoint_model["model_state"]UnboundLocalError: local variable 'checkpoint_model' referenced before assignment
The issue is that if the loss is nan (such as in this case using very few datapoints), |
The
Dockerfileproposed here downloads the executable file from dssp repo, and installs it in the container context. It also uses a yml file (together with a txt file) for installing conda and pip dependencies, respectively.@DaniBodor for reviewing this PR you can follow the new docker instructions and the new installation instructions.
I opened a separate issue for creating an action for testing the Dockerfile #529.
When this PR is going to be merged, remember to update the JOSS' reviewers (openjournals/joss-reviews#5983)