Skip to content

Conversation

@allencloud
Copy link
Contributor

@allencloud allencloud commented Mar 27, 2017

Signed-off-by: allencloud allen.sun@daocloud.io

Since docker daemon use gRPC to communicate with swarmkit, so when docker daemon accepts http request and transfers to swarmkit side, the at last, swarmkit will return a gRPC code to docker daemon. For the user's request, we definitely need to return a response with an HTTP status code. This PR hopes to determine the http status code from the gRPC code.

this fixes #31909

- What I did

  1. choose rpc code to determine status code
  2. make secret operation use the correct status code. currently I only change the secret side.
  3. add two test cases for issue Secrets API: Name conflict returns 500, should be 409 #31909
  4. update docker-py to the version 2.3.0 in Dockerfiles

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to not propagate this usage.
At the very least we can catch grpc errors in the API package and produce the correct error code from there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback. Like you mentioned, I tried to move that part into api/server/httputils/errors.go. PTAL
@cpuguy83

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch from a1ecc6f to 946ce45 Compare March 28, 2017 02:12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mouthful.

newHTTPErrorFromGRPC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. 408 is only used when the client doesn't send a complete request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 504 would work here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict is a specific http error. This should be a 400 or some other error. There is already an open issue about the incorrect usage of 409. @justincormack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a specific http error that doesn't map to the same thing as the GRPC error code.

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 2 times, most recently from cd2e853 to d2d607a Compare April 1, 2017 03:41
@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 4 times, most recently from 6263cdc to 6b4de0f Compare April 10, 2017 09:13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo s/non-exsiting/non-existing/ :joy:

@thaJeztah
Copy link
Member

Looks like some tests need updating

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving this, seems like a better place to handle it, and in line with the changes requested in #32144

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this up to top of the default case and instead pass the error directly rather than checking the error string?

Then we can gate the horrid string checking pattern currently above this with something like:

if statusCode == http.StatusInternalServerError {
    // horrible string parser
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can gate the horrid string checking pattern currently above this with something like:

Sorry, actually I cannot follow this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, move this above this stuff:

			{"not found", http.StatusNotFound},
			{"cannot find", http.StatusNotFound},
			{"no such", http.StatusNotFound},
			{"bad parameter", http.StatusBadRequest},
			{"no command", http.StatusBadRequest},
			{"conflict", http.StatusConflict},
			{"impossible", http.StatusNotAcceptable},
			{"wrong login/password", http.StatusUnauthorized},
			{"unauthorized", http.StatusUnauthorized},
			{"hasn't been activated", http.StatusForbidden},
			{"this node", http.StatusServiceUnavailable},
			{"needs to be unlocked", http.StatusServiceUnavailable},
			{"certificates have expired", http.StatusServiceUnavailable},

And only check that stuff if the parsed statusCode from the grpc check is 500.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not implemented? Wouldn't this be StatusForbidden?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem like the correct code.
This is used when the server is acting as a proxy. In this case something just went really wrong. I think this would just be a 500.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the engine may technically be acting as a proxy to containerd/swarm, I don't think this is something that should propagate up to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you propose instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in failed test:

11:17:28 docker_api_swarm_test.go:342:
11:17:28     c.Assert(status, checker.Equals, http.StatusInternalServerError, check.Commentf("deadline exceeded", string(out)))
11:17:28 ... obtained int = 504
11:17:28 ... expected int = 500
11:17:28 ... deadline exceeded%!(EXTRA string={"message":"rpc error: code = 4 desc = context deadline exceeded"}
11:17:28 )
11:17:28 
11:17:28 [d0f11ccd4b298] exiting daemon

Original code is 500

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be conflict, I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 400?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400 works too, just not 500.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, technically I think 400 is for the HTTP server and shouldn't be used by us, but we do :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't make up new status code usages here. The description of this status does not seem like anything we'd ever require.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutOfRange should just be mapped to 400 Bad Request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is swarm or containerd returning this ever? If so, is the HTTP code documented in the API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move this to the default case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this would ever happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, updated. PTAL

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 7 times, most recently from f29dbe1 to 02b47ad Compare April 13, 2017 07:38
@allencloud
Copy link
Contributor Author

Yes, some tests need updating. @thaJeztah
I think POST /secrets/{id}/update introduces a status code of 400, which is included in PR #32464

And POST /nodes/{id}/update adds a case whose status code is originally 500, now to be 400.

I think I still need to update docs/api/version-history.md, WDYT?

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch from 02b47ad to b0936f9 Compare April 14, 2017 02:11
@allencloud
Copy link
Contributor Author

I am afraid such kind of change may influence docker-py, here is some evidence which docekr-py test failed:

10:32:09 =================================== FAILURES ===================================
10:32:09 _______________________ SwarmTest.test_remove_main_node ________________________
10:32:09 /docker-py/tests/integration/api_swarm_test.py:171: in test_remove_main_node
10:32:09     assert e.value.response.status_code == 500
10:32:09 E   AssertionError: assert 400 == 500
10:32:09 E    +  where 400 = <Response [400]>.status_code
10:32:09 E    +    where <Response [400]> = APIError(HTTPError('400 Client Error: Bad Request',),).response
10:32:09 E    +      where APIError(HTTPError('400 Client Error: Bad Request',),) = <ExceptionInfo APIError tblen=6>.value
10:32:09 ======== 1 failed, 205 passed, 31 skipped, 2 xfailed in 408.65 seconds =========
10:32:09 ---> Making bundle: .integration-daemon-stop (in bundles/17.06.0-dev/test-docker-py)

Any input for this, or what should I do for this? @shin-

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 2 times, most recently from 4101773 to 30a8589 Compare May 23, 2017 02:14
@thaJeztah
Copy link
Member

docker/docker-py#1621 was merged; can you try updating this PR?

@allencloud
Copy link
Contributor Author

Thanks for your feedback, @thaJeztah
Actually this pr has been updated yesterday.

The status code test error in docker-py never happens. While another error in docker-py occurs:

04:31:10 =================================== FAILURES ===================================
04:31:10 ______________________ PruneImagesTest.test_prune_images _______________________
04:31:10 /docker-py/tests/integration/api_image_test.py:306: in test_prune_images
04:31:10     result = self.client.prune_images()
04:31:10 /docker-py/docker/utils/decorators.py:35: in wrapper
04:31:10     return f(self, *args, **kwargs)
04:31:10 /docker-py/docker/api/image.py:300: in prune_images
04:31:10     return self._result(self._post(url, params=params), True)
04:31:10 /docker-py/docker/utils/decorators.py:47: in inner
04:31:10     return f(self, *args, **kwargs)
04:31:10 /docker-py/docker/api/client.py:179: in _post
04:31:10     return self.post(url, **self._set_request_timeout(kwargs))
04:31:10 /usr/lib/python2.7/dist-packages/requests/sessions.py:500: in post
04:31:10     return self.request('POST', url, data=data, json=json, **kwargs)
04:31:10 /usr/lib/python2.7/dist-packages/requests/sessions.py:457: in request
04:31:10     resp = self.send(prep, **send_kwargs)
04:31:10 /usr/lib/python2.7/dist-packages/requests/sessions.py:569: in send
04:31:10     r = adapter.send(request, **kwargs)
04:31:10 /usr/lib/python2.7/dist-packages/requests/adapters.py:407: in send
04:31:10     raise ConnectionError(err, request=request)
04:31:10 E   ConnectionError: ('Connection aborted.', BadStatusLine("''",))
04:31:10 ========= 1 failed, 241 passed, 7 skipped, 3 xfailed in 377.00 seconds =========

I am still searching the error's root cause.

@alfred-landrum
Copy link
Contributor

@allencloud - I have a PR that is also blocked on the test_prune_images failure, which seems to be unrelated to the content of my PR. I haven't been able to reproduce it locally, but will share anything I find.
#33005

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch from 30a8589 to 56fae16 Compare May 27, 2017 05:28
@allencloud
Copy link
Contributor Author

allencloud commented May 27, 2017

Thanks @alfred-landrum
I think it is good for us to see all green. And Is there anything more I could do to meet your change request. @cpuguy83

In addition, I think we still to add more status code change in version_history.md, right?
While I still do not know it will be included in which version. 1.30?
Currently I updated the 1.30 API changes in version_history.md. @thaJeztah

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 2 times, most recently from dffd995 to 3b5d5c4 Compare May 29, 2017 04:31
@allencloud
Copy link
Contributor Author

Any update? ☀️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string check is probably more expensive, and certainly more fragile, than just checking statusCodeFromGRPCError in all cases.

If we move this to the top of the default case, in many cases we won't have to mess around with this string handling.
so...

default:
    statusCode = statusCodeFromGRPCError(err)
    if statusCode == http.StatusInternalServerError {
        // nasty string comparison stuff
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback. @cpuguy83 I think what you said is reasonable. Thanks a lot.
While I think CI fails according to accident. Can you re-trigger that? @thaJeztah

@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch 2 times, most recently from 93429fc to 224dec0 Compare June 5, 2017 08:32
Signed-off-by: allencloud <allen.sun@daocloud.io>
@allencloud allencloud force-pushed the choose-rpc-code-to-determine-status-code branch from 224dec0 to f257f77 Compare June 6, 2017 02:09
@allencloud
Copy link
Contributor Author

Does the change I added seem OK for you, @cpuguy83 ?
And feel free to tell me if I could do anything else? ☀️

* `DELETE /secrets/(name)` now returns status code 404 instead of 500 when the secret does not exist.
* `POST /secrets/create` now returns status code 409 instead of 500 when creating an already existing secret.
* `POST /secrets/(name)/update` now returns status code 400 instead of 500 when updating a secret's content which is not the labels.
* `POST /nodes/(name)/update` now returns status code 400 instead of 500 when demoting last node fails.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we need to update the swagger.yml as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need to update swagger.yml.

@cpuguy83
Copy link
Member

cpuguy83 commented Jun 8, 2017

@allencloud perfect!

@LK4D4
Copy link
Contributor

LK4D4 commented Jun 8, 2017

@cpuguy83 LGTY? feel free to merge if yes

@cpuguy83
Copy link
Member

cpuguy83 commented Jun 8, 2017

Just needs the changes requested by @thaJeztah

@allencloud
Copy link
Contributor Author

I think there is no need to update swagger.yml, since in swagger.yml no change will be involved.
For example,
DELETE /secrets/(name) already has code 404 in swagger.yml, but in old version of docker, like docker-17.05, it has no code of 404, but 404 is still in swagger.yml;
POST /secrets/create already has code 409;
POST /secrets/(name)/update already has two code 400 and 500, but one case changing from 500 to 400;
POST /nodes/(name)/update already has two code 400 and 500, but one case changing from 500 to 400;

So there will be no change for swagger.yml.
@thaJeztah @cpuguy83

@thaJeztah
Copy link
Member

So there will be no change for swagger.yml.

Even better, thanks for checking 👍

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, LGTM as well 😅

@thaJeztah thaJeztah merged commit 27ca921 into moby:master Jun 9, 2017
@allencloud allencloud deleted the choose-rpc-code-to-determine-status-code branch June 9, 2017 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Secrets API: Name conflict returns 500, should be 409

8 participants