Skip to content

[external codegen] better yaml error messaging, added explicit error message tests#56597

Closed
bdhirsh wants to merge 12 commits intogh/bdhirsh/106/basefrom
gh/bdhirsh/106/head
Closed

[external codegen] better yaml error messaging, added explicit error message tests#56597
bdhirsh wants to merge 12 commits intogh/bdhirsh/106/basefrom
gh/bdhirsh/106/head

Conversation

@bdhirsh
Copy link
Copy Markdown
Collaborator

@bdhirsh bdhirsh commented Apr 21, 2021

3 small changes, all centered around error messaging.

  1. Improved error messages when gen_backend_stubs.py receives invalid yaml

  2. Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

  3. I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:

  • The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
  • Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!

Stack from ghstack:

Differential Revision: D28474363

@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Apr 21, 2021

💊 CI failures summary and remediations

As of commit 52dc0c5 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@bdhirsh bdhirsh requested a review from ezyang April 21, 2021 17:52
cpp_namespace: torch_xla
supported:
- abs
''': 'AssertionError: You must provide a value for "backend"',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use expect tests! Future test writers will thank you! You can see a pile of examples of expect tests in test/test_dispatch.py

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you're not saving that much boilerplate by having a dictionary of YAML with errors. Better to give each a separate test so that you can run tests individually if they fail.

Copy link
Copy Markdown
Collaborator Author

@bdhirsh bdhirsh Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that putting each test yaml in its own test will be way clearer.

Haha so after struggling with expect tests for a bit I also realized that with my current setup it'll fail CI, because the full output includes a stack trace containing paths specific to my filesystem. That doesn't seem sustainable, because even if all CI machines run the tests from the same directory, that'll be different from our individual filesystems (/scratch/hirsheybar/.... in my case).

Rather than try to parse out and remove the stack trace, I think the whole thing will be less fragile if I just refactor gen_backend_stubs.py to give it a clear entry point function, and call that function directly from the tests. Something like:

def main():
   # gather up all command line args
   ....
   run(options.input_yaml, options.output_path, ...)

def run(input_yaml: str, output_path: str, ....) -> None
   # real work happens here. This is also the function that the tests call directly.

Downside: it's kind of a less faithful test since it's not "directly" running gen_backend_stubs.py and makes minor assumptions about it's internals, but I think that'll make these tests easier to work with. Does that all sound reasonable to you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, and will also make it easier to attach pdb to debug tests (rather than chasing a command line argument). However, parsing is not completely out of the question either; we do it for cpp stack traces here, for example:

    def assertMultiLineEqualMaybeCppStack(self, expect, actual, *args, **kwargs):
        self.assertGreaterEqual(len(actual), len(expect), *args, **kwargs)
        if hasattr(self, "assertMultiLineEqual"):
            self.assertMultiLineEqual(expect, actual[:len(expect)], *args, **kwargs)
        else:
            self.assertEqual(expect, actual[:len(expect)], *args, **kwargs)
        if len(actual) > len(expect):
            cpp_stacktrace_header = "\nException raised from"
            end_header = len(expect) + len(cpp_stacktrace_header)
            self.assertEqual(actual[len(expect): end_header], cpp_stacktrace_header)

In general, the price you have to pay for getting to use expect tests is you have to ensure, one way or another, that the output you're expecting on is deterministic. This has usually be worth it, in my experience.

with tempfile.NamedTemporaryFile(mode='w') as fp:
fp.write(yaml_str)
fp.flush()
command = ['python',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this, use sys.executable instead

fp.flush()
command = ['python',
gen_backend_stubs_path,
'--output_dir=""',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... what does this mean? Dump in CWD?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just got this from the docs- from some examples I assumed that this was the canonical way to pass arguments to a python script. Does that look wrong?

If you agree with the change I suggested in the above comment though, then this doesn't matter

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, you're passing empty string as output dir. So does that mean the script will dump the outputs in the current working directory?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is a user API thing I should have been more explicit about.

The dry-run flag seems like a reasonable addition to gen_backend_stubs, because (a) that allows these tests to run without writing to the file system for no reason, and (b) that makes it more similar to gen.py.

The dry-run flag effectively means that we ignore the output directory though. So as it's set up now, output_dir="" doesn't do anything because I've specified the dry_run flag. Another option would be to kill the dry_run flag, and just make the output_dir flag optional, where not specifying it does a dry run. I was hesitant about that because I figured that if you're an external backend using this script, you probably don't care about dry-run, and it would be better to have explicit error messages if you pass in wrong flags.

'--dry_run=True',
f'--source_yaml={fp.name}']

subprocess_output = subprocess.run(command, encoding='utf-8', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about error code?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, since all of these tests are explicitly for error checking, I should assert the error code.

But if I switch to not using subprocesses, I can just assert that an exception was raised.

…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
@bdhirsh
Copy link
Copy Markdown
Collaborator Author

bdhirsh commented Apr 22, 2021

@ezyang copying here since the comment is outdated now:

What I mean is, you're passing empty string as output dir. So does that mean the script will dump the outputs in the current working directory?

yeah, this is a user API thing I should have been more explicit about.

The dry-run flag seems like a reasonable addition to gen_backend_stubs, because (a) that allows these tests to run without writing to the file system for no reason, and (b) that makes it more similar to gen.py.

The dry-run flag effectively means that we ignore the output directory though. So as it's set up now, output_dir="" doesn't do anything because I've specified the dry_run flag. Another option would be to kill the dry_run flag, and just make the output_dir flag optional, where not specifying it does a dry run. I was hesitant about that because I figured that if you're an external backend using this script, you probably don't care about dry-run, and it would be better to have explicit error messages if you pass in wrong flags.

I should probably just make output_dir optional in the case when dry_run is specified

supported:
- abs_BAD'''
output_error = self.get_errors_from_gen_backend_stubs(yaml_str)
self.assertExpectedInline(output_error, '''Found an invalid operator name: abs_BAD. For the list of valid operator names, see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml''')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# noqa: E501, I think. (I used to just # noqa these but it looks like @samestep doesn't like that in #56272). Actually, I noticed people have been manually adding line breaks to the expect tests in test/test_dispatch.py which is bad, because it means the automatic update mechanism will no longer work.

Copy link
Copy Markdown
Contributor

@samestep samestep Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For long lines I think it's # noqa: B950 (at least in the PyTorch codebase; maybe it's E501 elsewhere)

if op_name not in native_functions_map:
raise AssertionError(f"Found an invalid operator name: {op_name}")
assert op_name in native_functions_map, f"Found an invalid operator name: {op_name}. \
For the list of valid operator names, see https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably wouldn't link it here, because master might not be the same as the version release

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bleh yeah that's fair

supported:
'''
output_error = self.get_errors_from_gen_backend_stubs(yaml_str)
self.assertExpectedInline(output_error, '''expected "supported" to be a list, but got: None (of type <class 'NoneType'>)''')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably this should work lol

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's fair (I'll just get that case working before I land). I assume the use case you're picturing a new backend that wants to set up all of the boilerplate required for integration with us, without being required to write any kernels first?

Separately from that is the singleton case that I have a test for: supported: __. I tentatively think it's fine to leave this as an error, since (a) it doesn't seem like a useful thing to support (more "flexible" yaml?), and (b) If you do something dumb like supported: XLA, you'll get a worse error.

…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
bdhirsh added 2 commits April 23, 2021 16:24
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request Apr 30, 2021
…message tests

ghstack-source-id: 499d664
Pull Request resolved: pytorch#56597
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 3, 2021
…message tests

ghstack-source-id: 9f085e9
Pull Request resolved: pytorch#56597
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
bdhirsh added 2 commits May 12, 2021 07:05
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
…icit error message tests"

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!





[ghstack-poisoned]
@bdhirsh
Copy link
Copy Markdown
Collaborator Author

bdhirsh commented May 17, 2021

@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@bdhirsh merged this pull request in 3d9f10f.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
…message tests (pytorch#56597)

Summary:
Pull Request resolved: pytorch#56597

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474363

Pulled By: bdhirsh

fbshipit-source-id: 8b5ec804b388dbbc0350a20c053da657fad0474f
bdhirsh added a commit that referenced this pull request May 20, 2021
…message tests (#56597)

Summary:
Pull Request resolved: #56597

3 small changes, all centered around error messaging.

1) Improved error messages when `gen_backend_stubs.py` receives invalid yaml

2) Added error message tests. I wasn't sure if there was a canonical way to do this, so I just wrote a test that takes in a list of (yaml input, expected error message) pairs and runs the codegen pipeline on each of them.

3) I also removed the LineLoader from the yaml parsing bit that reads in the external backend yaml file. Two reasons that I took it out:
 - The main reason we use it with native_functions.yaml is to easily pinpoint problems with new ops as they're added, that the codegen can pick up. 99% of these problems have to do with schema, which is irrelevant to the external yaml since it pulls the schema from native_functions
 - Not all operators have to appear in the external yaml. We could do something like "line: -1", but that's kind of weird.

If you think the line numbers would actually be of more use than I'm thinking of in the external yaml though, let me know!

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474363

Pulled By: bdhirsh

fbshipit-source-id: 8b5ec804b388dbbc0350a20c053da657fad0474f
This was referenced May 20, 2021
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/106/head branch May 21, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants