Sets Run to terminal state if it has been deleted from Databricks first#158
Conversation
| return err | ||
| // Databricks API 2.0/jobs/runs/get-output returns error 500 if the Run has already been | ||
| // deleted from Databricks. Set k8s Run to a terminal state | ||
| if strings.Contains(err.Error(), "Response from server (500)") { |
There was a problem hiding this comment.
Do you think this error could happen even if it's not deleted? If it does, wouldn't we be setting a non deleted run to be deleted on the cluster?
There was a problem hiding this comment.
This is a good point, and unfortunately the status code and response are pretty unhelpful here!
I just tested https://northeurope.azuredatabricks.net/api/2.0/jobs/runs/get?run_id=123 and that gives a 400 response with "Run 123 does not exist" in the JSON message
Whereas https://northeurope.azuredatabricks.net/api/2.0/jobs/runs/get-output?run_id=123 gives a 500 respones with an empty message!
There was a problem hiding this comment.
I agree with @EliiseS that we don't want to stop reconciling for runs that still exist when the API is having issues and returning status code 500 responses.
My initial thought was if we get a 500 repsonse to then call /jobs/runs/get and test the error message. However this adds an extra call in a reconciliation loop which could have a negative impact if the API is having issues and returning genuine 500 responses
There was a problem hiding this comment.
@EliiseS this change sets the Run in k8s to a final state, but doesn't set if for deletion in k8s. You would still need to delete it afterwards (which will work fine even without this fix). Still, as @stuartleeks mentioned, we could use a call to /jobs/runs/get to determine the "type" of error 500 we are getting.
There was a problem hiding this comment.
ok, I improved the handling of the error to tell if the 500 happens because the Run was deleted in Databricks or the API is failing for some other reason
There was a problem hiding this comment.
We should probably run a load test on this before this is merged
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |





Fix for issue #156: Reconciler error when refreshing a Run that has been deleted in Databricks.