Conversation
|
FYI #9503 is merged |
ml_dtypes (bfloat16) support
|
This is now in a state where a look makes sense. Not small anymore, but also not that big. One larger change is that there is now a … Some open points/homework for me:
EDIT: I won't claim there won't be holes, but I think I covered all of those points 🎉. The one thing I would probably still do is rename …
Title changed: ml_dtypes (bfloat16) support → ml_dtypes.bfloat16 support
|
/test mini |
leofang
left a comment
Looks great! Awesome progress, Sebastian!
This can wait, but it'd be nice later to add no_bfloat16=False to for_all_dtypes, for_float_dtypes, ... in cupy/testing/_loops.py to get it auto-tested
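The suggested flag could thread through the dtype-loop decorators roughly like the sketch below. This is an illustrative guess at the shape of such a helper, not CuPy's actual `cupy/testing/_loops.py` implementation; all names besides `no_bfloat16` are assumptions:

```python
# Hypothetical sketch of a no_bfloat16 flag on a dtype-loop decorator,
# mirroring the spirit of cupy.testing's for_float_dtypes. Illustrative only.
import functools

try:
    import ml_dtypes
    _BFLOAT16 = ml_dtypes.bfloat16
except ImportError:  # ml_dtypes is optional; skip bfloat16 when absent
    _BFLOAT16 = None


def for_float_dtypes(name='dtype', no_float16=False, no_bfloat16=False):
    dtypes = ['float32', 'float64']
    if not no_float16:
        dtypes.append('float16')
    if not no_bfloat16 and _BFLOAT16 is not None:
        dtypes.append(_BFLOAT16)  # auto-test bfloat16 only when installed

    def decorator(impl):
        @functools.wraps(impl)
        def wrapper(self, *args, **kwargs):
            # run the test body once per dtype in the loop
            for dtype in dtypes:
                kwargs[name] = dtype
                impl(self, *args, **kwargs)
        return wrapper
    return decorator
```

Defaulting `no_bfloat16=False` (as suggested) means bfloat16 gets auto-tested everywhere `ml_dtypes` is installed, and individual tests opt out explicitly.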
|
/test mini |
|
One other thing to confirm here: I am not yet sure the |
cupy/_core/_routines_math.pyx
Outdated
('F->f', 'out0 = arg(in0) * (180.0 / M_PI)'),
('D->d', 'out0 = arg(in0) * (180.0 / M_PI)')),
- 'out0 = in0 >= 0 ? 0 : 180.0',
+ 'out0 = in0 >= decltype(in0){0} ? 0 : 180.0',
Hmmm, I need to undo (or change) this, but it needs some header fix as well (it thinks the __half comparison may also match).
FWIW, I will also check the other CI jobs testing bfloat16; I suspect I added ml_dtypes to the test matrix but still have to make sure it actually gets installed.
In the end, I made this two lines, but really just used `decltype(+in0){0}`. The `+` seemed like the easiest way to fix fusion, which uses e.g. `double &in0`: unary `+` yields a prvalue, so `decltype(+in0)` drops the reference where `decltype(in0)` would not.
I couldn't think of a way to keep this code unchanged that would avoid adding the operator to the header, and I didn't want to add a full family of them...
(As an aside: the type_in0_raw mechanism isn't compatible with fusion; maybe fusion could also define it. I do like the decltype approach, at least if it weren't for the `+in0` hack.)
|
Want to see if this works (or if I might have to adapt the minimum version in the tests). /test cuda122,linux |
|
/test cuda122,linux |
|
/test mini |
|
/test windows,cuda120 |
|
/test windows,cuda129 |
    }
    return float16(ret_raw_);
}
#endif  // #ifdef __HIPCC_RTC__
Comment only, no need to address in this PR: Shouldn't this be __HIPCC__?
|
/test force-skip |
|
The following tests were force-skipped:
|
ENH: `ml_dtypes.bfloat16` support
|
LGTM! Thanks! |
Add optional support for `bfloat16` via `ml_dtypes`. The mechanism for this is:

- Look for `ml_dtypes` and, if it can be imported, add it to our C type maps. (The corresponding scalars are supported now via an earlier refactor.)
- … `bfloat16` is used in any of the kernel arguments.
- A `*bf16_loop()` helper, which expands to the loop if `ml_dtypes` can be found; otherwise the loop is omitted. (This should be fine, in part because there will always be an "e" loop in front, so we don't have to worry much about weirder promotion corner cases.)

I think this should be considered experimental, in the sense that corner cases/differences probably exist and it may not always be obvious what the computation type should be for `bfloat16` input (or e.g. half input explicitly uses float32 as the computation type, and that is missing for now).