Introduce torch.testing._internal.opinfos #59871

Closed
antocuni wants to merge 13 commits into pytorch:master from antocuni:antocuni/refactor-opdb

Conversation

@antocuni
Contributor

@antocuni antocuni commented Jun 11, 2021

@mruberry this PR is still not complete but I'd like to get some feedback before continuing, to avoid doing unnecessary work.
I started to experiment with what we discussed in #55159, and in particular:

  1. op_db is now an instance of OpInfoDB instead of being a simple list. This allows registering OpInfos from multiple places, instead of having to list them explicitly in one big long list
  2. I moved linalg.det and linalg.cond into their own file, to get a feel for how the new system looks. A good pro is that now we can implement the sample_inputs_* functions in the proximity of the OpInfo itself, instead of having them many lines above.
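The register() idea in point 1 could look roughly like the sketch below. This is a minimal illustration, not the actual API from the PR; OpInfo here is a stand-in stub.

```python
# Minimal sketch of the register() idea from point 1 above; the real
# OpInfoDB in this PR may differ, and OpInfo here is a stand-in stub.
class OpInfo:
    def __init__(self, name):
        self.name = name

class OpInfoDB:
    """A registry that OpInfos can be added to from multiple files."""
    def __init__(self):
        self._ops = []

    def register(self, opinfo):
        self._ops.append(opinfo)

    def __iter__(self):
        return iter(self._ops)

# e.g. inside a hypothetical opinfo_linalg.py:
op_db = OpInfoDB()
op_db.register(OpInfo("linalg.det"))
op_db.register(OpInfo("linalg.cond"))

print([op.name for op in op_db])  # ['linalg.det', 'linalg.cond']
```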

Looking forward to getting some feedback!

@antocuni antocuni added the module: testing Issues related to the torch.testing module (not tests) label Jun 11, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Jun 11, 2021

💊 CI failures summary and remediations

As of commit 07eabdc (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 30 19:10:04 RuntimeError: test_quantization failed!
Jun 30 19:10:02 Generated XML report: test-reports/dist-gloo/test_quantization/TEST-quantization.core.test_quantized_op.TestQuantizedOps-20210630190112.xml
Jun 30 19:10:02 Generated XML report: test-reports/dist-gloo/test_quantization/TEST-quantization.core.test_quantized_tensor.TestQuantizedTensor-20210630190112.xml
Jun 30 19:10:02 Generated XML report: test-reports/dist-gloo/test_quantization/TEST-quantization.core.test_workflow_module.TestRecordHistogramObserver-20210630190112.xml
Jun 30 19:10:02 Generated XML report: test-reports/dist-gloo/test_quantization/TEST-quantization.bc.test_backward_compatibility.TestSerialization-20210630190112.xml
Jun 30 19:10:02 Generated XML report: test-reports/dist-gloo/test_quantization/TEST-quantization.core.test_quantized_module.TestStaticQuantizedModule-20210630190112.xml
Jun 30 19:10:04 Traceback (most recent call last):
Jun 30 19:10:04   File "test/run_test.py", line 1308, in <module>
Jun 30 19:10:04     main()
Jun 30 19:10:04   File "test/run_test.py", line 1287, in main
Jun 30 19:10:04     raise RuntimeError(err_message)
Jun 30 19:10:04 RuntimeError: test_quantization failed!
Jun 30 19:10:04 + cleanup
Jun 30 19:10:04 + retcode=1
Jun 30 19:10:04 + set +x
Jun 30 19:10:04 =================== sccache compilation log ===================
Jun 30 19:10:04 =========== If your build fails, please take a look at the log above for possible reasons ===========
Jun 30 19:10:04 Compile requests                      0
Jun 30 19:10:04 Compile requests executed             0
Jun 30 19:10:04 Cache hits                            0
Jun 30 19:10:04 Cache misses                          0
Jun 30 19:10:04 Cache timeouts                        0

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun) ❄️

fatal: Could not read from remote repository.
remote: Enumerating objects: 216, done.
[git counting/compressing/receiving/delta-resolution progress omitted]
Receiving objects: 100% (124/124), 104.97 KiB | 3.75 MiB/s, done.
Resolving deltas: 100% (88/88), completed with 75 local objects.
From ssh://github.com/NVlabs/cub
 * branch            d106ddb991a56c3df1b6d51b2409e36ba8181ce4 -> FETCH_HEAD
remote: Total 124 (delta 88), reused 27 (delta 7), pack-reused 0
ssh: connect to host github.com port 22: Connection timed out

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Fetched in submodule path 'third_party/cudnn_frontend', but it did not contain 51e60d891b689d618e7a623509a779c422a420f7. Direct fetching of that commit failed.


Exited with code exit status 1


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Collaborator

@mruberry mruberry left a comment


What do you think, @kshitij12345, @pmeier?

common_methods_invocations.py is currently 7500 lines, and we like to keep files below 10k lines. So I think we have time before we have to pursue a split. If/when we do pursue a split I agree this seems like the best approach.

Should we do this now, @antocuni? What do you see as the pros/cons? My attempt at a list:

PROS

  • nicer file names (because common_methods_invocations.py is a terrible name)
  • can put sample inputs next to ops, at least for some operators
  • moves classes like UnaryUfuncInfo into their own files, so files have clearer purposes

CONS

  • can no longer search all opinfos in one file
  • sample inputs next to ops or an op after op pattern both seem valid, and some classes of operator (unary elementwise, binary elementwise, reductions) will use the same sample inputs for the entire class, so at least a few will use an op after op pattern; this makes PRO #2 above a little dubious

What other PROS/CONS should we consider, @antocuni?

@antocuni
Contributor Author

What do you think, @kshitij12345, @pmeier?

common_methods_invocations.py is currently 7500 lines, and we like to keep files below 10k lines. So I think we have time before we have to pursue a split. If/when we do pursue a split I agree this seems like the best approach.

Just my 2 cents, but generally speaking 10k lines seems like a LOT to me, especially considering that Python source code is typically denser than that of other languages. It makes it impossible to read the file from top to bottom, and the only reasonable way to navigate it is to search for a known string. Even 7.5k is already too much, at least for me.

Also, it seems that the OpInfo approach is really successful and we will continue to add new features and new operators, so eventually we will reach the 10k limit anyway. So, the sooner we do the split, the less work we will have to do later when it inevitably becomes necessary.

Should we do this now, @antocuni? What do you see as the pros/cons? My attempt at a list:

PROS

  • nicer file names (because common_methods_invocations.py is a terrible name)

+1 for this :)

  • can put sample inputs next to ops, at least for some operators
  • moves classes like UnaryUfuncInfo into their own files, so files have clearer purposes

CONS

  • can no longer search all opinfos in one file

this can be mitigated by using a sane naming convention for the files, so that one can easily grep them. In this PR I started using opinfo_*.py, but we could even use a separate package opinfo/*.py (e.g., opinfo/linalg.py).

  • sample inputs next to ops or an op after op pattern both seem valid, and some classes of operator (unary elementwise, binary elementwise, reductions) will use the same sample inputs for the entire class, so at least a few will use an op after op pattern; this makes PRO #2 above a little dubious

yes, but there are other cases in which there are ~6000 lines between the definition of the sample_input_* function and the corresponding OpInfo. Even if PRO #2 does not apply everywhere, it's still a big improvement when it does.

What other PROS/CONS should we consider, @antocuni?

Extra PROS

  • with this approach, we can easily combine multiple OpInfoDB into one. For example, right now we start from a single big op_db and from there we create smaller dbs by filtering out what we don't want, like unary_ufuncs, spectral_funcs, etc. We could do the opposite and have opinfo/unary.py to define unary_db, opinfo/spectral.py to define spectral_func_db and opinfo/all.py to be defined as unary_db + spectral_db + other_ops_db + ... (but we need to find a better name for other_ops_db)

  • it makes it possible to split the logic and the code of OpInfo, OpInfoDB, SampleInput etc. from the actual definition of the DB itself (which will be in the opinfo/*.py files)

  • it enables us to experiment with new/better syntax to declare OpInfos, either now or in the future. See e.g. the class-based syntax I proposed in #55159 (Better syntax for OpInfo)

Extra CONS

  • we need to repeat a lot of import boilerplate at the beginning of each file

@kshitij12345
Collaborator

nicer file names (because common_methods_invocations.py is a terrible name)

+1

sample inputs next to ops or an op after op pattern both seem valid, and some classes of operator (unary elementwise, binary elementwise, reductions) will use the same sample inputs for the entire class, so at least a few will use an op after op pattern; this makes PRO #2 above a little dubious

I agree with @antocuni that this structure gives us more flexibility going forward.

Also, from my personal experience: as a new contributor, just looking at test_torch.py was intimidating (back when it was over 15K lines). I really like that with this approach we can split the OpInfos into coherent files.

Thanks @antocuni!

@pmeier
Collaborator

pmeier commented Jun 15, 2021

Very much in favor of having multiple smaller files. What about a structure like this:

torch/
└── testing
    └── _internal
        └── op_info
            ├── binary.py
            ├── __init__.py
            ├── linalg.py
            ├── misc.py
            ├── reduction.py
            ├── unary.py
            └── utils.py

We could group every operator in the torch namespace into unary, binary, reduction, and misc (or split this further if we like). For operators that have their own namespace, for example linalg, I would just keep it. Stuff that is used by all files, for example the base OpInfo class, could be put in utils. As @antocuni proposed, every file could then define its own small database, which could be fused in __init__.

@antocuni
Contributor Author

Very much in favor of having multiple smaller files. What about a structure like this:

torch/
└── testing
    └── _internal
        └── op_info
            ├── binary.py
            ├── __init__.py
            ├── linalg.py
            ├── misc.py
            ├── reduction.py
            ├── unary.py
            └── utils.py

+1 for this.

We could group every operator in the torch namespace into unary, binary, reduction, and misc (or split this further if we like). For operators that have their own namespace, for example linalg, I would just keep it. Stuff that is used by all files, for example the base OpInfo class, could be put in utils. As @antocuni proposed, every file could then define its own small database, which could be fused in __init__.

Defining an OpInfoDB in each file and merging them inside __init__ has another advantage: in the current version of the branch, import torch.testing._internal.opinfo_linalg has the side effect of mutating _internal.op_db (by calling .register() on it) and thus has to be done at the end of the file.
By reversing the approach, we can do a much simpler and easier to understand composition. E.g. inside op_info/__init__.py we will have something like this:

from . import unary, binary, linalg, misc # ...
op_db = OpInfoDB.merge('all', unary.op_db, binary.op_db, linalg.op_db, misc.op_db, ...)

No side effects, no need to do imports at the end, everything straightforward.
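A minimal sketch of what that composition could look like in practice; OpInfoDB.merge is the hypothetical method from the snippet above and does not exist in PyTorch as-is.

```python
# Hypothetical sketch of the OpInfoDB.merge composition described above;
# neither this class nor merge() exists in PyTorch as-is.
class OpInfoDB:
    def __init__(self, name, ops=None):
        self.name = name
        self.ops = list(ops or [])

    @classmethod
    def merge(cls, name, *dbs):
        # Concatenate the OpInfos of several per-file databases into one.
        merged = cls(name)
        for db in dbs:
            merged.ops.extend(db.ops)
        return merged

# Strings stand in for OpInfo instances.
unary_db = OpInfoDB("unary", ["sin", "cos"])
linalg_db = OpInfoDB("linalg", ["linalg.det"])
op_db = OpInfoDB.merge("all", unary_db, linalg_db)
print(op_db.ops)  # ['sin', 'cos', 'linalg.det']
```

No import-time side effects are involved: each per-file db is built locally and the merge happens explicitly in one place.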

@mruberry what do you think?

@mruberry
Collaborator

mruberry commented Jun 20, 2021

This sounds good. What about keeping our lives simpler for now and still just using a list? Lists in Python can be appended with each new OpInfo instead of declaring them in it. The central repo can just import the lists and extend or add them or whatever to create its own list.

For files, we could start with a new folder in _internal, like proposed above, called "opinfos". The files in it can mirror different test files and classes. We could start with __init__.py, core.py or opinfo.py, unary_elementwise.py, and foreach.py for example. These could define the list "all_opinfos" (replacing the name "op_db"), define the OpInfo-related classes, a "unary_elementwise_opinfos" list, and the special "foreach_opinfos" list, which contains OpInfos that don't appear in "all_opinfos".

That seems simple and flexible while achieving our organizational goals? It also lets us create a more specialized data structure in the future if we want to without locking us into anything today. We could add the additional files like binary_elementwise.py and linalg.py etc. in the same PR or a follow-up.
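The plain-list approach described above could be as simple as the following sketch; the file and list names follow the comment, but the contents are illustrative strings rather than real OpInfos.

```python
# Sketch of the plain-list approach described above; list names follow
# the comment, contents are illustrative strings standing in for OpInfos.

# hypothetical opinfos/unary_elementwise.py
unary_elementwise_opinfos = ["abs", "sin"]

# hypothetical opinfos/foreach.py -- deliberately NOT part of all_opinfos
foreach_opinfos = ["_foreach_add"]

# hypothetical opinfos/__init__.py (the central module)
all_opinfos = []
all_opinfos.extend(unary_elementwise_opinfos)

print(all_opinfos)  # ['abs', 'sin']
```

Because everything is an ordinary list, consumers keep working unchanged, and a richer data structure can still replace the lists later without committing to one now.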

@antocuni antocuni force-pushed the antocuni/refactor-opdb branch from 9f96068 to 42d8d72 Compare June 30, 2021 10:11
@antocuni antocuni changed the title WIP: Introduce OpInfoDB Introduce torch.testing._internal.opinfos Jun 30, 2021
@antocuni
Contributor Author

This sounds good. What about keeping our lives simpler for now and still just using a list? Lists in Python can be appended with each new OpInfo instead of declaring them in it. The central repo can just import the lists and extend or add them or whatever to create its own list.

For files, we could start with a new folder in _internal, like proposed above, called "opinfos". The files in it can mirror different test files and classes.

@mruberry I changed the name of this PR to reflect this new plan and implemented it.
This PR:

  • introduces torch.testing._internal.opinfos
  • moves the majority of the old common_methods_invocations.py into multiple files in opinfos/
  • fixes the imports accordingly

This PR is going to be a conflict nightmare since common_methods_invocations.py is modified very often, so to mitigate the problem for now I tried to minimize the changes. In particular, I did not rename anything, and op_db is still defined as a big single list inside opinfos/db.py.

I propose to continue the refactoring in follow-up PRs but, assuming tests are green, this one should hopefully land as quickly as possible, because the longer we wait the more likely it is to become out of date.

@antocuni antocuni marked this pull request as ready for review June 30, 2021 14:35
@mruberry
Collaborator

mruberry commented Jul 9, 2021

This sounds good. What about keeping our lives simpler for now and still just using a list? Lists in Python can be appended with each new OpInfo instead of declaring them in it. The central repo can just import the lists and extend or add them or whatever to create its own list.
For files, we could start with a new folder in _internal, like proposed above, called "opinfos". The files in it can mirror different test files and classes.

@mruberry I changed the name of this PR to reflect this new plan and implemented it.
This PR:

  • introduces torch.testing._internal.opinfos
  • moves the majority of the old common_methods_invocations.py into multiple files in opinfos/
  • fixes the imports accordingly

This PR is going to be a conflict nightmare since common_methods_invocations.py is modified very often, so to mitigate the problem for now I tried to minimize the changes. In particular, I did not rename anything, and op_db is still defined as a big single list inside opinfos/db.py.

I propose to continue the refactoring in follow-up PRs but, assuming tests are green, this one should hopefully land as quickly as possible, because the longer we wait the more likely it is to become out of date.

Cool; let me think if we can sneak this in this weekend.

@mruberry
Collaborator

fyi @antocuni I haven't forgotten about this but need to find the right time to make the update, thanks for being patient

@antocuni
Contributor Author

fyi @antocuni I haven't forgotten about this but need to find the right time to make the update, thanks for being patient

no problem, thank you.
I don't think it's worth trying to fix the conflicts continuously, so please ping me when you are ready to do the merge, and I'll fix them.

@kshitij12345
Collaborator

Right now common_methods_invocations is around 15k lines and GitHub has issues rendering it.

Gentle ping @mruberry

@mruberry
Collaborator

Right now common_methods_invocations is around 15k lines and GitHub has issues rendering it.

Gentle ping @mruberry

Yeah it's become too big

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Apr 13, 2022
@pmeier
Collaborator

pmeier commented Apr 13, 2022

We should not let this go to waste. Working on the file is a pain. @antocuni is no longer with Quansight, but I can take this up to at least fix the merge conflicts, or make additional changes if requested.


Labels

cla signed module: testing Issues related to the torch.testing module (not tests) open source Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants