Intoduce torch.testing._internal.opinfos#59871
Intoduce torch.testing._internal.opinfos#59871antocuni wants to merge 13 commits intopytorch:masterfrom
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 07eabdc (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
mruberry
left a comment
There was a problem hiding this comment.
What do you think, @kshitij12345, @pmeier?
common_methods_invocations.py is currently 7500 lines, and we like to keep files below 10k lines. So I think we have time before we have to pursue a split. If/when we do pursue a split I agree this seems like the best approach.
Should we do this now, @antocuni? What do you see as the pros/cons? My attempt at a list:
PROS
- nicer file names (because common_methods_invocations.py is a terrible name)
- can put sample inputs next to ops, at least for some operators
- moves classes like UnaryUfuncInfo into their own files, so files have clearer purposes
CONS
- can no longer search all opinfos in one file
- sample inputs next to ops or an op after op pattern both seem valid, and some classes of operator (unary elementwise, binary elementwise, reductions) will use the same sample inputs for the entire class, so at least a few will use an op after op pattern; this makes PRO #2 above a little dubious
What other PROS/CONS should we consider, @antocuni?
just my 2 cents here, but generally speaking 10k lines seems a LOT to me, also considering that Python source code is typically more dense than other languages. It makes it impossible to read the file from top to bottom and the only reasonable way to navigate around it is to search for a known string. Even 7.5k is already too much, at least for me. Also, it seems that the
+1 for this :)
this can be mitigated by using a sane naming convention for the files, so that one can easily
yes, but there are others cases in which there are ~6000 lines between the definition of the
Extra PROS
Extra CONS
|
+1
I agree with @antocuni that this structure gives us more flexibility going forward. Also from my personal experience, just looking at Thanks @antocuni! |
|
Very much in favor of having multiple smaller files. What about a structure like this: We could group every operator in the |
+1 for this.
Defining an from . import unary, binary, linalg, misc # ...
op_db = OpInfoDB.merge('all', unary.op_db, binary.op_db, linalg.op_db, misc.op_db, ...)No side effects, no need to do imports at the end, everything straightforward. @mruberry what do you think? |
|
This sounds good. What about keeping our lives simpler for now and still just using a list? Lists in Python can be appended with each new OpInfo instead of declaring them in it. The central repo can just import the lists and extend or add them or whatever to create its own list. For files, we could start with a new folder in _internal, like proposed above, called "opinfos". The files in it can mirror different test files and classes. We could start with init.py, core.py or opinfo.py, unary_elementwise.py, and foreach.py for example. These could define the lists "all_opinfos" (replacing the name "op_db"), define the OpInfo-related classes, a "unary_elementwise_opinfos" list, and the special "foreach_opinfos" list, which contains OpInfos that don't appear in "all_opinfos". That seems simple and flexible while achieving our organizational goals? It also lets us create a more specialized data structure in the future if we want to without locking us into anything today. We could add the additional files like binary_elementwise.py and linalg.py etc. in the same PR or a follow-up. |
9f96068 to
42d8d72
Compare
@mruberry I changed the name of this PR to reflect the this new plan and implemented it.
This PR is going to be a conflict nightmare since I propose to continue the refactoring in follow up PRs, but assuming that tests are green, hopefully this one should land as quickly as possible, because the more we wait the more likely it is to become out of date. |
…s used also elsewhere
Cool; let me think if we can sneak this in this weekend. |
|
fyi @antocuni I haven't forgotten about this but need to find the right time to make the update, thanks for being patient |
no problem, thank you. |
|
Right now Gentle ping @mruberry |
Yeah it's become too big |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
|
We should not let this go to waste. Working on the file is a pain. @antocuni is no longer with Quansight, but I can take this up to at least fix the merge conflicts or do additional changes if requested. |
@mruberry this PR is still not complete but I'd like to get some feedback before continuing, to avoid doing unnecessary work.
I started to experiment with what we discussed in #55159, and in particular:
op_dbis not an instance ofOpInfoDBinstead of being a simple list. This allows toregister()OpInfos from multiple places, instead of having to list them explicitly into a big long listlinalg.detandlinalg.condin their own file, to have a feeling on how the new system looks like. A good pro is that now we can implement thesample_inputs_*functions in the proximity of theOpInfoitself, instead of having them many lines above.Looking forward to get some feedback!