Skip to content

Prevent silent overflow in lapack worker size calculations.#19288

Merged
copybara-service[bot] merged 1 commit intojax-ml:mainfrom
pearu:pearu/int32-overflow
Feb 20, 2024
Merged

Prevent silent overflow in lapack worker size calculations.#19288
copybara-service[bot] merged 1 commit intojax-ml:mainfrom
pearu:pearu/int32-overflow

Conversation

@pearu
Copy link
Copy Markdown
Collaborator

@pearu pearu commented Jan 10, 2024

As in the title.

Addresses the issues of wrong results and crashes reported in #10420, #10411, etc.

@pearu pearu self-assigned this Jan 10, 2024
@pearu pearu added CPU Issues related to the CPU compiler/runtime bug Something isn't working labels Jan 10, 2024
Copy link
Copy Markdown
Collaborator

@hawkinsp hawkinsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good improvement, although it would not catch the original issue, in which I think there was an integer overflow inside LAPACK from multiplying m * n when m and n were close to 2**16. To catch that we'd need some additional defensive checking limiting maximum sizes.

I would probably add that checking in Python here, if desired:
https://github.com/google/jax/blob/main/jaxlib/lapack.py

One other change is needed also. Please add:

    copts = ["-fexceptions"],
    features = ["-use_header_modules"],

to the lapack_kernels build target.
https://github.com/google/jax/blob/adf05d520a9ec95b2b05b6ba28b627ae34df4a02/jaxlib/cpu/BUILD#L32

By default we disable exceptions in our C++ builds, and they have to be enabled file by file. (Mostly we use absl::Status to communicate failure, but that's probably overkill here, where these workspace size functions are called by pybind11 wrappers and pybind11 wants failures as C++ exceptions.)

@hawkinsp
Copy link
Copy Markdown
Collaborator

Actually you're probably right: the workspace size check would probably catch the original problem by itself. So only the BUILD file change is needed.

@pearu
Copy link
Copy Markdown
Collaborator Author

pearu commented Jan 10, 2024

One other change is needed also. Please add: ...

Done. Thanks for the hint, @hawkinsp!

Add -fexceptions to building lapack_kernels
@pearu pearu force-pushed the pearu/int32-overflow branch from 4496a25 to 3fa1033 Compare February 20, 2024 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working CPU Issues related to the CPU compiler/runtime pull ready Ready for copybara import and testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants