Skip to content

MAINT Fix GCC warnings from Cython#14077

Merged
rth merged 17 commits intoscikit-learn:masterfrom
thomasjpfan:cython-compile-fixes
Jul 1, 2019
Merged

MAINT Fix GCC warnings from Cython#14077
rth merged 17 commits intoscikit-learn:masterfrom
thomasjpfan:cython-compile-fixes

Conversation

@thomasjpfan
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan commented Jun 12, 2019

A 9 hour train ride is the best time to fix Cython warnings:

  1. Updates the signature of PyArray_NewFromDescr.
  2. Initializes original_median = 0.0, they are not used update_median_parameters_post_remove or update_median_parameters_post_push when the size is 0.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm


cdef inline DTYPE_t rdist(self, DTYPE_t* x1, DTYPE_t* x2,
ITYPE_t size) nogil except -1:
return self._rdist(x1, x2, size)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is definitely uglier and makes the code harder to understand. I think it at least deserves a comment referencing this pull request

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other way to make this work is to not have inline. The generated C code for any rdist and dist are not marked with CYTHON_INLINE. I suspect because the base class DistanceMetric does not mark rdist and dist as inline. DistanceMetric can not mark them as inline because it would make them "final".

TLDR: Remove inline everywhere since Cython does not generated code that inlines rdist and dist anyways.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the last options but please check with a short micro-benchmark.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, the Cython docs state:

Note that class-level cdef functions are handled via a virtual function table, so the compiler won’t be able to inline them in almost all cases.

https://cython.readthedocs.io/en/latest/src/userguide/pyrex_differences.html#cdef-inline

@thomasjpfan
Copy link
Copy Markdown
Member Author

This PR was updated:

  1. Remove some warnings with gcc regarding unsigned/signed ints.
  2. Reverted the dist_metrics.pyx changes. This should be in its own PR. Currently the inline for dist may not be respected and it would be nice to have it inline (somehow).

@thomasjpfan
Copy link
Copy Markdown
Member Author

Error is unrelated to this PR.

cdef floating dual_norm_XtA
cdef unsigned int ii
cdef unsigned int n_iter = 0
cdef int n_iter = 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember what was the warning? uint seems like the right type for n_iter..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_iter is used to iterate later:

for n_iter in range(max_iter):
    ...

where max_iter is an int. When n_iter is a unsigned int we get warning similar to this:

warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_29; __pyx_t_10+=1) {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can cast max_iter to an unsigned int.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting seems like a bad idea, there isn't any checking of max_iter when enet_path calls enet_coordinate_descent.

# and their corresponding values are stored in:
# data[indptr[i]:indptr[i+1]]
cdef np.npy_intp i, j
cdef unsigned long long i
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intp seems like the right dtype here.

Integer used for indexing (same as C ssize_t; normally either int32 or int64)

What was the warning?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later we use i as an index for n_samples which is an unsigned long long. If i does not match this type, we get a comparison warning like above.

j will iterate over X_indptr[i] to X_indptr[i + 1] which is an integral type. If i does not match this type, we get a comparison warning like above.

Traditionally np.npy_intp is used as an index for numpy arrays, which allows for negative indexing, i.e. x[-3]. In this case we set wraparound=False, thus only accepting positive integers as indices. In this case, I think this allows for large sparse matrices with a size of MAX(unsigned long long)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can fully embrace np.intp and have:

cdef np.npy_intp  n_samples = shape[0]

# cannot be declared directly and can only be passed as function arguments
cdef:
np.npy_intp i
unsigned long long i
Copy link
Copy Markdown
Member

@rth rth Jun 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above in the PR (below in the code)

@thomasjpfan
Copy link
Copy Markdown
Member Author

thomasjpfan commented Jun 30, 2019

@rth I reverted the cython change regarding indexing to make this PR smaller. (I will open another PR for the index changes)

@NicolasHug
Copy link
Copy Markdown
Member

this is fixing gcc warnings rather than cython warnings, right?

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomasjpfan

@thomasjpfan thomasjpfan changed the title [MRG] MAINT Fixes Cython warnings [MRG] MAINT Fixes GCC warnings from Cython Jul 1, 2019
@thomasjpfan
Copy link
Copy Markdown
Member Author

this is fixing gcc warnings rather than cython warnings, right?

Yup - Change title to reflect this.

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomasjpfan !

@rth rth changed the title [MRG] MAINT Fixes GCC warnings from Cython MAINT Fix GCC warnings from Cython Jul 1, 2019
@rth rth merged commit 581b0e1 into scikit-learn:master Jul 1, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants