MAINT Refactor vector sentinel into utils#22728
MAINT Refactor vector sentinel into utils#22728jeremiedbb merged 5 commits intoscikit-learn:mainfrom
Conversation
| StdVectorSentinel sentinel = _create_sentinel(vect_ptr) | ||
| np.ndarray arr = np.PyArray_SimpleNewFromData( | ||
| 1, &size, sentinel.get_typenum(), sentinel.get_data()) |
There was a problem hiding this comment.
I think this is slightly more clear than the current implementation. The get_data makes it clear that arr points to the data owned by the sentinel.
The implementation on main defines arr with the buffer from vect_ptr and then the sentinel would set the internal pointer in sentinel.vec to vect_ptr. Because only the pointers were swapped, arr is pointing to the correct place in memory.
jjerphan
left a comment
There was a problem hiding this comment.
Thank you for the refactoring.
I am glad that this set of features helps solving issues and improving others implementations.
This LGTM modulo a minor discussion: should we move coerce_vectors_to_nd_arrays here potentially renaming it to mention ragged arrays?
| StdVectorSentinel sentinel = _create_sentinel(vect_ptr) | ||
| np.ndarray arr = np.PyArray_SimpleNewFromData( | ||
| 1, &size, sentinel.get_typenum(), sentinel.get_data()) |
I think ragged arrays in In the future, if there is a need in other parts of the codebase for ragged arrays, I can see placing it into |
| config.add_extension( | ||
| "_vector_sentinel", | ||
| sources=["_vector_sentinel.pyx"], | ||
| libraries=libraries, | ||
| language="c++", | ||
| ) |
There was a problem hiding this comment.
I'm always confused about when is include_dirs=[numpy.get_include()] necessary.. I thought it was whenever compiling against numpy c api.
There was a problem hiding this comment.
Yea, it's strange. I added the include_dirs here to be safe.
Reference Issues/PRs
Follow up to #22320
What does this implement/fix? Explain your changes.
This PR refactors the
StdVectorSentinelinto it's own file and puts it intosklearn.utils. With this PR, adding new vectors entail._typedefvector_typedStdVectorSentinel*_create_sentinelThe caller of
vector_to_nd_arraydoes not need to know anything about sentinels. They can pass in a vector and an ndarray is returned.Any other comments?
Running benchmarks from #22320 (review) I do not see any performance difference with this refactor.
I initially had a PR ready to resolve #11540 by using
vector[int64_t]+StdVectorSentinelInt64. But I think the refactoring itself deserves it's own PR.CC @jjerphan