
Usepbscluster #1

Closed
jedwards4b wants to merge 7 commits into mrocklin:pbs-cluster from jedwards4b:usepbscluster

Conversation

@jedwards4b

Hi Matt, I wonder if you could take a look at this and help me debug. The first question is: why is pbs.py not recognizing my logger? The second is that it doesn't seem to be submitting jobs to the queue; any idea why?

from dask.distributed import Client
from pangeo import PBSCluster

global logger
Owner


You may not need to annotate this as global. It's already top-level scope.
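
As an aside, a minimal sketch of when `global` is actually needed (the names here are illustrative, not from the PR): a name assigned at module top level is already global, and the `global` statement only matters when rebinding that name inside a function.

```python
# A name assigned at module top level is already in global scope;
# `global` is only needed to rebind such a name from inside a function.
logger = None            # top-level assignment: no `global` required

def configure_logging():
    global logger        # required here, because we reassign the name
    logger = "configured"

configure_logging()
print(logger)  # configured
```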

import logging

ch = logging.StreamHandler()
formatter = logging.Formatter(' - '.join(
    ["%(asctime)s", "%(name)s", "%(levelname)s", "%(message)s"]))
ch.setFormatter(formatter)
logger = logging.getLogger(__file__)
Owner


I typically use __name__ here rather than __file__, much as you did above. Perhaps this is your problem?
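
A short illustration of why the choice matters, using hard-coded stand-ins for __file__ and __name__ (the path and module name below are hypothetical): logging configuration propagates down the dotted-name hierarchy, which a file path does not fit.

```python
import logging

# Hypothetical stand-ins: __file__ would be a filesystem path,
# __name__ the module's dotted import name (e.g. "pangeo.pbs").
path_logger = logging.getLogger("/path/to/pbs.py")   # getLogger(__file__)
name_logger = logging.getLogger("pangeo.pbs")        # getLogger(__name__)

# Levels and handlers propagate down the dotted-name hierarchy,
# so configuring the "pangeo" parent reaches name_logger only.
logging.getLogger("pangeo").setLevel(logging.DEBUG)

print(name_logger.getEffectiveLevel() == logging.DEBUG)    # True
print(path_logger.getEffectiveLevel() == logging.WARNING)  # True (root default)
```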

Author


I was initializing logger twice, but I removed this one and it still doesn't work.

@mrocklin
Owner

> The second is that it doesn't seem to be submitting jobs to the queue, any idea why?

Nothing strikes me at first glance. I'll admit that I haven't taken a very deep look at this though.

@jedwards4b
Author

I found the problem: _exit is being called, which in turn calls qdel and cancels the job.
If I comment out the exit call, I get a worker job and I can connect to a Jupyter notebook, but I
don't seem to get the dask dashboard. It seems like I just want this script to run, set things up, and then exit; is your idea that this would persist as long as the dask job is running?
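
A minimal sketch of that exit-handler behaviour, with illustrative names standing in for the real _exit/qdel code: a handler registered during setup fires as soon as the interpreter exits, right after setup completes, which is why the freshly submitted job gets cancelled.

```python
import subprocess
import sys

# Child script standing in for the setup script: it registers an exit
# handler (like the _exit hook discussed above) that would run qdel.
child = '''\
import atexit

def shutdown():
    print("qdel (job cancelled)")  # stand-in for the real qdel call

atexit.register(shutdown)
print("cluster setup finished")
'''

out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True).stdout
print(out)
```

The "qdel" line appears immediately after "cluster setup finished", mirroring the observed behaviour of the job being torn down as soon as the script returns.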

@mrocklin
Owner

The PBSCluster object was intended for interactive use in the notebook. It was not intended for setting up long-running clusters and may not be appropriate for that.

@jedwards4b
Author

I still think that our objectives are similar. I don't need the session to last any longer than the notebook does. I have a script that sets up a dask scheduler and worker(s) on cheyenne and then connects a notebook to them; PBSCluster has some functionality that could make that easier.
Are you saying that PBSCluster would let you start the dask scheduler and workers from the notebook session directly? I don't see much difference except the order of operations, and perhaps the ability to add or delete workers. Maybe it would help if you provided an example notebook?

@mrocklin
Owner

I've answered questions on pangeo-data#56

I'm going to close this for now. For future comments and debugging help I recommend opening a PR on the pangeo-data fork so that others are aware and can engage in conversation.

@mrocklin closed this Dec 30, 2017