Unskip all but test_redeploy_single_replica in test_deploy.py#21391
Unskip all but test_redeploy_single_replica in test_deploy.py#21391simon-mo merged 16 commits intoray-project:masterfrom
test_redeploy_single_replica in test_deploy.py#21391Conversation
| rets = ret.split("|") | ||
| if len(rets) == 2: | ||
| return rets[0], rets[1] | ||
| elif len(rets) == 1: | ||
| return rets[0], "<NULL>" | ||
| else: | ||
| return "<NULL>", "<NULL>" |
There was a problem hiding this comment.
There was a problem hiding this comment.
Do you happen to remember what value ret was? Was the IndexError caused because ret.split("|")[1] failed? I want to make sure that the corner case wasn't actually an issue with the test's correctness.
There was a problem hiding this comment.
I want to make sure that the corner case wasn't actually an issue with the test's correctness.
Yes. I will re-run the tests tomorrow (without this change) and post the logs here.
There was a problem hiding this comment.
File "python\ray\_raylet.pyx", line 640, in ray._raylet.execute_task
outputs = function_executor(*args, **kwargs)
File "C:\Users\gagan\ray_project\ray\python\ray\serve\tests\test_deploy.py", line 675, in call
return ret.split("|")[0], ret.split("|")[1]
IndexError: list index out of range
pid=2196) 2022-01-06 08:14:44,896 INFO deployment_state.py:939 -- Removing 4 replicas from deployment 'test'. component=serve deployment=test
pid=2588) ret: Path '/test' not found. Please ping http://.../-/routes for route table.
pid=7964) ret: 2|564
pid=3992) ret: Path '/test' not found. Please ping http://.../-/routes for route table.
pid=4020) ret: Path '/test' not found. Please ping http://.../-/routes for route table.
pid=11184) ret: 2|7256
pid=10512) ret: Path '/test' not found. Please ping http://.../-/routes for route table.
pid=4988) ret: Path '/test' not found. Please ping http://.../-/routes for route table.
pid=7044) ret: 2|7076
pid=6088) ret: 2|9404
pid=7968) ret: 2|9404
ray\serve\tests\test_deploy.py::test_redeploy_scale_up[False]
I think Path '/test' not found. Please ping http://.../-/routes for route table. means that deployment didn't complete when requests.get got executed and accessed "http://localhost:8000/test"? I think I should also return timestamps for debugging purposes. That might help in figuring out when the v1 and v2 objects were deployed and when actually the requests.get accessed "http://localhost:8000/test" the location. Though we also need to see if Path '/test' not found. Please ping http://.../-/routes for route table. is anyways affecting the test.
P.S. Result of another race condition may be?
test_redeploy_scale_downtest_redeploy_single_replica in test_deploy.py
| responses = defaultdict(set) | ||
| start = time.time() | ||
| while time.time() - start < 30: | ||
| while time.time() - start < 200: |
There was a problem hiding this comment.
I can make this change Windows specific. Let me know if the idea behind this bump is acceptable. Windows is slow so may be not much can be done about this other than increasing time limits.
There was a problem hiding this comment.
Yep, the bump is OK. Can we make it Windows specific?
There was a problem hiding this comment.
Sure. I will do that.
| }).text | ||
|
|
||
| return ret.split("|")[0], ret.split("|")[1] | ||
| rets = {key: value for key, value in enumerate(ret.split("|"))} |
There was a problem hiding this comment.
What's the reason we need to use a list comprehension and enumerate here instead of directly using split? Is it a Windows-related change?
There was a problem hiding this comment.
N.B. - #21391 (comment)
I have explained the reason for this change in the above comment. I used dict for a two line change. I was using if-else if checks earlier.
There was a problem hiding this comment.
Got it, I added a question to that thread. I want to make sure the corner case doesn't affect the test's correctness.
| responses = defaultdict(set) | ||
| start = time.time() | ||
| while time.time() - start < 30: | ||
| while time.time() - start < 200: |
There was a problem hiding this comment.
Yep, the bump is OK. Can we make it Windows specific?
| responses = defaultdict(set) | ||
| start = time.time() | ||
| while time.time() - start < 30: | ||
| while time.time() - start < 100: |
There was a problem hiding this comment.
Could we make this Windows-specific also?
| responses = defaultdict(set) | ||
| start = time.time() | ||
| while time.time() - start < 30: | ||
| while time.time() - start < 200: |
There was a problem hiding this comment.
Could we make this Windows-specific also?
| responses = defaultdict(set) | ||
| start = time.time() | ||
| while time.time() - start < 30: | ||
| while time.time() - start < 100: |
There was a problem hiding this comment.
Could we make this Windows specific?
| val, pid = ray.get(ref) | ||
| responses[val].add(pid) | ||
| if val != "<NULL>" and pid != "<NULL>": | ||
| responses[val].add(pid) |
There was a problem hiding this comment.
I have added this check so that behaviour before this change is retained without throwing into IndexError.
|
By keeping the timeout limit 500 the tests pass everywhere in |
|
The timeout sounds good. However I'm worry about the readability of the change. The not found error can be programmatically detected via |
| ret_candidate = ret.text.split("|") | ||
|
|
||
| # Set return value to (None, None) | ||
| # if status_code is 404 or if there are | ||
| # not enough values to return | ||
| if ret.status_code == 404 or len(ret_candidate) != 2: | ||
| ret = (None, None) | ||
| else: | ||
| ret = ret_candidate | ||
|
|
||
| return ret[0], ret[1] |
There was a problem hiding this comment.
Let me know if this approach looks good. I will apply this to other test cases as well.
simon-mo
left a comment
There was a problem hiding this comment.
There's still <NULL> in different places. It should be changed
from
rets = {key: value for key, value in enumerate(ret.split("|"))}
return rets.get(0, "<NULL>"), rets.get(1, "<NULL>")
to
if requests.get(f"http://localhost:8000/{name}").status_code != 200:
return None, None
|
Please ping when this is ready to merge! Thank you |
|
Windows and lint test pass on my latest commit. |
|
@simon-mo Anything else to be done here? Lint tests are failing across all my (and I think other PRs too). It is due to some SphinxError. |
|
|
@czgdp1807 Would you mind perform the code style change and merge master? The diff looks good to me and we just need to get it merged. |
…a` in `test_deploy.py` (ray-project#21391)" (ray-project#22299) This reverts commit 000c56f.
…a` in `test_deploy.py` (ray-project#21391)" (ray-project#22299) This reverts commit 000c56f.
Why are these changes needed?
Related issue number
Checks
scripts/format.shto lint the changes in this PR.