MAINT Fix GCC warnings from Cython#14077
Conversation
sklearn/neighbors/dist_metrics.pyx
Outdated
|
|
||
| cdef inline DTYPE_t rdist(self, DTYPE_t* x1, DTYPE_t* x2, | ||
| ITYPE_t size) nogil except -1: | ||
| return self._rdist(x1, x2, size) |
There was a problem hiding this comment.
But this is definitely uglier and makes the code harder to understand. I think it at least deserves a comment referencing this pull request
There was a problem hiding this comment.
The other way to make this work is to not have inline. The generated C code for any rdist and dist are not marked with CYTHON_INLINE. I suspect because the base class DistanceMetric does not mark rdist and dist as inline. DistanceMetric can not mark them as inline because it would make them "final".
TLDR: Remove inline everywhere since Cython does not generated code that inlines rdist and dist anyways.
There was a problem hiding this comment.
I am fine with the last options but please check with a short micro-benchmark.
There was a problem hiding this comment.
For future reference, the Cython docs state:
Note that class-level cdef functions are handled via a virtual function table, so the compiler won’t be able to inline them in almost all cases.
https://cython.readthedocs.io/en/latest/src/userguide/pyrex_differences.html#cdef-inline
|
This PR was updated:
|
|
Error is unrelated to this PR. |
sklearn/linear_model/cd_fast.pyx
Outdated
| cdef floating dual_norm_XtA | ||
| cdef unsigned int ii | ||
| cdef unsigned int n_iter = 0 | ||
| cdef int n_iter = 0 |
There was a problem hiding this comment.
Do you remember what was the warning? uint seems like the right type for n_iter..
There was a problem hiding this comment.
n_iter is used to iterate later:
for n_iter in range(max_iter):
...where max_iter is an int. When n_iter is a unsigned int we get warning similar to this:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_29; __pyx_t_10+=1) {
There was a problem hiding this comment.
Alternatively we can cast max_iter to an unsigned int.
There was a problem hiding this comment.
Casting seems like a bad idea, there isn't any checking of max_iter when enet_path calls enet_coordinate_descent.
sklearn/utils/sparsefuncs_fast.pyx
Outdated
| # and their corresponding values are stored in: | ||
| # data[indptr[i]:indptr[i+1]] | ||
| cdef np.npy_intp i, j | ||
| cdef unsigned long long i |
There was a problem hiding this comment.
intp seems like the right dtype here.
Integer used for indexing (same as C ssize_t; normally either int32 or int64)
What was the warning?
There was a problem hiding this comment.
Later we use i as an index for n_samples which is an unsigned long long. If i does not match this type, we get a comparison warning like above.
j will iterate over X_indptr[i] to X_indptr[i + 1] which is an integral type. If i does not match this type, we get a comparison warning like above.
Traditionally np.npy_intp is used as an index for numpy arrays, which allows for negative indexing, i.e. x[-3]. In this case we set wraparound=False, thus only accepting positive integers as indices. In this case, I think this allows for large sparse matrices with a size of MAX(unsigned long long)
There was a problem hiding this comment.
Alternatively, we can fully embrace np.intp and have:
cdef np.npy_intp n_samples = shape[0]
sklearn/utils/sparsefuncs_fast.pyx
Outdated
| # cannot be declared directly and can only be passed as function arguments | ||
| cdef: | ||
| np.npy_intp i | ||
| unsigned long long i |
There was a problem hiding this comment.
Same as above in the PR (below in the code)
|
@rth I reverted the cython change regarding indexing to make this PR smaller. (I will open another PR for the index changes) |
|
this is fixing gcc warnings rather than cython warnings, right? |
Yup - Change title to reflect this. |
A 9 hour train ride is the best time to fix Cython warnings:
PyArray_NewFromDescr.original_median = 0.0, they are not usedupdate_median_parameters_post_removeorupdate_median_parameters_post_pushwhen the size is 0.