Skip to content

Unskip all but test_redeploy_single_replica in test_deploy.py#21391

Merged
simon-mo merged 16 commits intoray-project:masterfrom
czgdp1807:deploy_601
Feb 9, 2022
Merged

Unskip all but test_redeploy_single_replica in test_deploy.py#21391
simon-mo merged 16 commits intoray-project:masterfrom
czgdp1807:deploy_601

Conversation

@czgdp1807
Copy link
Copy Markdown
Contributor

@czgdp1807 czgdp1807 commented Jan 5, 2022

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@czgdp1807 czgdp1807 added windows testing topics about testing labels Jan 5, 2022
Comment on lines +616 to +622
rets = ret.split("|")
if len(rets) == 2:
return rets[0], rets[1]
elif len(rets) == 1:
return rets[0], "<NULL>"
else:
return "<NULL>", "<NULL>"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An IndexError was being raised in one of the processes executing this task. I added this to handle the corner cases. I can create a utility function to do this and replace return ret.split("|")[0], ret.split("|")[1] with a call to that function. Let me know. cc: @pcmoritz @simon-mo

Copy link
Copy Markdown
Contributor

@shrekris-anyscale shrekris-anyscale Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to remember what value ret was? Was the IndexError caused because ret.split("|")[1] failed? I want to make sure that the corner case wasn't actually an issue with the test's correctness.

Copy link
Copy Markdown
Contributor Author

@czgdp1807 czgdp1807 Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure that the corner case wasn't actually an issue with the test's correctness.

Yes. I will re-run the tests tomorrow (without this change) and post the logs here.

Copy link
Copy Markdown
Contributor Author

@czgdp1807 czgdp1807 Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  File "python\ray\_raylet.pyx", line 640, in ray._raylet.execute_task
    outputs = function_executor(*args, **kwargs)
  File "C:\Users\gagan\ray_project\ray\python\ray\serve\tests\test_deploy.py", line 675, in call
    return ret.split("|")[0], ret.split("|")[1]
IndexError: list index out of range
 pid=2196) 2022-01-06 08:14:44,896      INFO deployment_state.py:939 -- Removing 4 replicas from deployment 'test'. component=serve deployment=test
 pid=2588) ret:  Path '/test' not found. Please ping http://.../-/routes for route table.
 pid=7964) ret:  2|564
 pid=3992) ret:  Path '/test' not found. Please ping http://.../-/routes for route table.
 pid=4020) ret:  Path '/test' not found. Please ping http://.../-/routes for route table.
 pid=11184) ret:  2|7256
 pid=10512) ret:  Path '/test' not found. Please ping http://.../-/routes for route table.
 pid=4988) ret:  Path '/test' not found. Please ping http://.../-/routes for route table.
 pid=7044) ret:  2|7076
 pid=6088) ret:  2|9404
 pid=7968) ret:  2|9404
 ray\serve\tests\test_deploy.py::test_redeploy_scale_up[False]

I think Path '/test' not found. Please ping http://.../-/routes for route table. means that deployment didn't complete when requests.get got executed and accessed "http://localhost:8000/test"? I think I should also return timestamps for debugging purposes. That might help in figuring out when the v1 and v2 objects were deployed and when actually the requests.get accessed "http://localhost:8000/test" the location. Though we also need to see if Path '/test' not found. Please ping http://.../-/routes for route table. is anyways affecting the test.

P.S. Result of another race condition may be?

@czgdp1807 czgdp1807 changed the title Unskip test_redeploy_scale_down Unskip all but test_redeploy_single_replica in test_deploy.py Jan 5, 2022
responses = defaultdict(set)
start = time.time()
while time.time() - start < 30:
while time.time() - start < 200:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this change Windows specific. Let me know if the idea behind this bump is acceptable. Windows is slow so may be not much can be done about this other than increasing time limits.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the bump is OK. Can we make it Windows specific?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will do that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

}).text

return ret.split("|")[0], ret.split("|")[1]
rets = {key: value for key, value in enumerate(ret.split("|"))}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we need to use a list comprehension and enumerate here instead of directly using split? Is it a Windows-related change?

Copy link
Copy Markdown
Contributor Author

@czgdp1807 czgdp1807 Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N.B. - #21391 (comment)

I have explained the reason for this change in the above comment. I used dict for a two line change. I was using if-else if checks earlier.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I added a question to that thread. I want to make sure the corner case doesn't affect the test's correctness.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responses = defaultdict(set)
start = time.time()
while time.time() - start < 30:
while time.time() - start < 200:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the bump is OK. Can we make it Windows specific?

responses = defaultdict(set)
start = time.time()
while time.time() - start < 30:
while time.time() - start < 100:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this Windows-specific also?

responses = defaultdict(set)
start = time.time()
while time.time() - start < 30:
while time.time() - start < 200:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this Windows-specific also?

responses = defaultdict(set)
start = time.time()
while time.time() - start < 30:
while time.time() - start < 100:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this Windows specific?

val, pid = ray.get(ref)
responses[val].add(pid)
if val != "<NULL>" and pid != "<NULL>":
responses[val].add(pid)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this check so that behaviour before this change is retained without throwing into IndexError.

@czgdp1807
Copy link
Copy Markdown
Contributor Author

By keeping the timeout limit 500 the tests pass everywhere in test_deploy.py on Windows.

@simon-mo
Copy link
Copy Markdown
Contributor

simon-mo commented Jan 7, 2022

The timeout sounds good. However I'm worry about the readability of the change.

The not found error can be programmatically detected via requests.get().status_code == 404. In this case, you can just return (None, None) instead of special value here. And in the accumulation of responses[val].add(pid) you can just skip it.

Comment on lines +396 to +406
ret_candidate = ret.text.split("|")

# Set return value to (None, None)
# if status_code is 404 or if there are
# not enough values to return
if ret.status_code == 404 or len(ret_candidate) != 2:
ret = (None, None)
else:
ret = ret_candidate

return ret[0], ret[1]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this approach looks good. I will apply this to other test cases as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to me. cc @simon-mo

Copy link
Copy Markdown
Contributor

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still <NULL> in different places. It should be changed
from

rets = {key: value for key, value in enumerate(ret.split("|"))}
        return rets.get(0, "<NULL>"), rets.get(1, "<NULL>")

to

if requests.get(f"http://localhost:8000/{name}").status_code != 200:
    return None, None

@simon-mo simon-mo added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Jan 14, 2022
@simon-mo
Copy link
Copy Markdown
Contributor

Please ping when this is ready to merge! Thank you

@czgdp1807
Copy link
Copy Markdown
Contributor Author

Windows and lint test pass on my latest commit.

@simon-mo simon-mo removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Jan 14, 2022
@czgdp1807
Copy link
Copy Markdown
Contributor Author

@simon-mo Anything else to be done here? Lint tests are failing across all my (and I think other PRs too). It is due to some SphinxError.

@bveeramani
Copy link
Copy Markdown
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

Copy link
Copy Markdown
Contributor

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@simon-mo
Copy link
Copy Markdown
Contributor

simon-mo commented Feb 1, 2022

@czgdp1807 Would you mind perform the code style change and merge master? The diff looks good to me and we just need to get it merged.

@czgdp1807 czgdp1807 requested a review from simon-mo February 2, 2022 10:09
@simon-mo simon-mo merged commit 000c56f into ray-project:master Feb 9, 2022
architkulkarni added a commit that referenced this pull request Feb 10, 2022
simon-mo pushed a commit that referenced this pull request Feb 10, 2022
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Mar 8, 2022
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing topics about testing windows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants