Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136987
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit d3b8428 with merge base dfe1d45. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Before this change, the test failed with compilation errors, as `bfloat16` requires an explicit cast. Tested in #136987. Pull Request resolved: #136981. Approved by: https://github.com/Skylion007
Just adds instantiation of the kernels and, in some cases, explicit casts. Tested in #136987. Pull Request resolved: #136982. Approved by: https://github.com/Skylion007. ghstack dependencies: #136981
By simply adding explicit instantiation. Tested in #136987. Pull Request resolved: #136983. Approved by: https://github.com/Skylion007. ghstack dependencies: #136981, #136982
By further reducing the precision expectations for imprecise FP16 ops, introducing a new BF16_LOW_PRECISION_OPS category, and marking BF16 tests as xfail for `divfloor_rounding`, `floor_divide` and `remainder`.
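Marking an op's BF16 test as an expected failure can be sketched with stock `unittest`; this is a stand-in for PyTorch's own OpInfo/xfail machinery, and the op list and test body here are illustrative, not the real suite:

```python
import unittest

# Illustrative stand-in for the BF16_LOW_PRECISION_OPS category described
# above; the real list lives in PyTorch's MPS test configuration.
BF16_LOW_PRECISION_OPS = {"divfloor_rounding", "floor_divide", "remainder"}

class TestBF16Ops(unittest.TestCase):
    # xfail: expected to fail until the backend accumulates in fp32.
    @unittest.expectedFailure
    def test_remainder_bf16(self):
        # Placeholder computation that exhibits a rounding mismatch,
        # standing in for comparing MPS bf16 output against a reference.
        self.assertEqual(0.1 + 0.2, 0.3)
```

A run of this suite reports the test as an expected failure rather than an error, which is the effect the xfail markers above aim for.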
For Metal cast ops to compile, one needs to explicitly cast to/from `bfloat`, unlike for other dtypes. Tested in #136987
albanD
left a comment
Any chance we can fix that behavior? This will most likely change the numerics for end users significantly.
@kulinseth ?
For Metal cast ops to compile, one needs to explicitly cast to/from `bfloat`, unlike for other dtypes. Tested in #136987. Pull Request resolved: #137070. Approved by: https://github.com/Skylion007
By further reducing the precision expectations for imprecise FP16 ops, introducing a new BF16_LOW_PRECISION_OPS category, and marking BF16 tests as xfail for `divfloor_rounding`, `floor_divide` and `remainder`. I guess the nature of the low-precision results is that MPSGraph, unlike the rest of PyTorch, does not accumulate over fp32 for reduction operations.
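The fp32-accumulation point can be illustrated with a small self-contained sketch. It emulates bfloat16 in pure Python by truncating a float32 to its upper 16 bits (a simplification: real hardware typically rounds to nearest-even, and the MPS backend's exact behavior is not modeled here), then compares a reduction that keeps its accumulator in bf16 against one that accumulates in a wider type:

```python
import struct

def to_bf16(x: float) -> float:
    # Emulate bfloat16 by truncating a float32 to its upper 16 bits.
    bits, = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def sum_bf16_accum(values):
    # Reduction whose running total is rounded back to bf16 after every
    # step, as a backend without fp32 accumulation would behave.
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc

def sum_fp32_accum(values):
    # Reduction that only rounds the inputs and accumulates in a wider type.
    acc = 0.0
    for v in values:
        acc += to_bf16(v)
    return acc

values = [0.001] * 10000
# The bf16 accumulator stalls once 0.001 falls below one bf16 ulp of the
# running total, landing far below the ~10.0 the wide accumulator reaches.
print(sum_bf16_accum(values))
print(sum_fp32_accum(values))
```

With only 8 bits of mantissa, the bf16 accumulator stops changing long before the true sum is reached, which is why the BF16 reduction tests need looser tolerances or xfail markers.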
@pytorchbot merge -f "MPS + Lint are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
By adding explicit instantiation. Tested in pytorch#136987 Pull Request resolved: pytorch#136984 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#136981, pytorch#136982, pytorch#136983
By further reducing the precision expectations for imprecise FP16 ops, introducing a new BF16_LOW_PRECISION_OPS category, and marking BF16 tests as xfail for `divfloor_rounding`, `floor_divide` and `remainder`. I guess the nature of the low-precision results is that MPSGraph, unlike the rest of PyTorch, does not accumulate over fp32 for reduction operations. Pull Request resolved: pytorch#136987. Approved by: https://github.com/albanD. ghstack dependencies: pytorch#137070
Stack from ghstack (oldest at bottom):
By further reducing the precision expectations for imprecise FP16 ops, introducing a new BF16_LOW_PRECISION_OPS category, and marking BF16 tests as xfail for `divfloor_rounding`, `floor_divide` and `remainder`. I guess the nature of the low-precision results is that MPSGraph, unlike the rest of PyTorch, does not accumulate over fp32 for reduction operations.