Conversation
seberg
left a comment
There was a problem hiding this comment.
Cool! One small comment for simplifying a bit, but the code looks great on first sight!
My two main questions are:
- Did we decide what the output dtype is? The default integer, the same integer?
- While I am in favor of adding the
bit_countufunc (which still may need more discussion unfortunately), I would still need a bit convincing on adding a method, it doesn't seem as clear to me. There could be an argument of where to add the ufunc, but hopefully we don't lose track of this because of such questions.
numpy/core/src/umath/funcs.inc.src
Outdated
|
|
||
| /* Try to use inbuilt popcount if available */ | ||
| static PyObject *builtin_popcount_func = NULL; | ||
| builtin_popcount_func = PyObject_GetAttrString(obj, "bit_count"); |
There was a problem hiding this comment.
Hmm, most ufuncs already try to call a method of the same name when it exists? There should be a shorthand to do this in generate_umath without the need to define this function.
There was a problem hiding this comment.
ohh yeah, let me see if we can auto-gen this, will make things easier.
There was a problem hiding this comment.
Ohh also makes sense now, why not all umath functions are here.
There was a problem hiding this comment.
Hey @seberg , I dug a bit deeper here and it seems all the o supported dtypes have a function in funcs.inc.srs and it does not seem automatic. Am I missing something here?
There was a problem hiding this comment.
Ah, there is the "P" character code that means: "Object, but just call the method".
There was a problem hiding this comment.
Ah nice, I missed it. Thanks! Added via b86a322
|
Thanks for the review @seberg
For the scalar version we decided to always output an 8 bit (
This brings a very good point on if we can create a |
015a1a2 to
abcfc59
Compare
|
This PR seems so close. Anything I can do to help get this over the finish line @ganesh-k13? For certain similarity functions (like hamming distance) the bit count function is the bottleneck, and this could really unlock pure python solutions without needing to resort to something in a lower level language. |
|
Hey @softwaredoug , we support Well to answer your question on moving it forward, implementation wise, perhaps the So in conclusion, if you feel this has a genuine use case for many in the community (the UFunc of course, the scalar version is already in), you can start a thread in the mailing list. Now I can prioritize this to completion if everyone is ok with adding this UFunc and if you want to contribute, I can help you in any way with terms of navigating this bit. |
|
I will give my 👍 for the ufunc, but not for the method (on the array object). Feel free to announce to the mailing list that pending no substantial objects we will move forward with adding it. The return dtype has to be resolved, but I guess it was defined as uint8 for the scalars? In which case we have prior art. |
|
Awesome - thanks @ganesh-k13 and @seberg Maybe this is my numpy ignorance, but is there a way to efficiently apply the scalar function to a whole array? Without something like |
|
No, there isn't an efficient way (maybe a JIT'er like |
abcfc59 to
050127a
Compare
|
Hey @BvB93 , basic question: If we fix the return type of UFunc to one dtype, where do we make the typing changes? Now I do get |
6738683 to
bcd7a6f
Compare
bcd7a6f to
1731966
Compare
|
Updated |
seberg
left a comment
There was a problem hiding this comment.
Thanks. Lots of nitpicks of the optional kind. I have to think about the naming a bit more.
Overall, I lean against adding any methods, which is the only meaningful change besides possible name bike-shedding.
numpy/core/src/multiarray/methods.c
Outdated
| METH_NOARGS, NULL}, | ||
| {"bitwise_count", | ||
| (PyCFunction)array_bitwise_count, | ||
| METH_VARARGS | METH_KEYWORDS, NULL}, |
There was a problem hiding this comment.
Same as for bit_count, I would just drop the method. I could see a point to bit_count() for supporting the nested array case, but overall I am leaning against both.
There was a problem hiding this comment.
This might be handy? If we want to support it on other sequences? I'm guessing there will be some other way though without this addition? happy to do both
There was a problem hiding this comment.
I removed it via d7f364f. I'm a bit confused now, I thought this adds bitwise_count to np namespace?
| UNARY_UFUNCS = [obj for obj in np.core.umath.__dict__.values() | ||
| if isinstance(obj, np.ufunc)] | ||
| UNARY_OBJECT_UFUNCS = [uf for uf in UNARY_UFUNCS if "O->O" in uf.types] | ||
| UNARY_OBJECT_UFUNCS.remove(getattr(np, 'bitwise_count')) |
There was a problem hiding this comment.
Just in case someone wonders: it seems the path assumes float64 loops to exist, which is not the case for bitwise.
1731966 to
0ff8230
Compare
bit_count UFuncsbitwise_count UFuncs
0ff8230 to
47ad3bd
Compare
47ad3bd to
e985c36
Compare
|
Resolved merge conflicts (rebased with latest main).
|
|
Long overdue look at the Array API standard. Two callouts:
|
e985c36 to
5928575
Compare
|
Are any more changes needed here @seberg? |
|
I have made few changes to the code to allow auto-vectorization, and here the result on x86 with avx2 enabled: ❯ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
❯ gcc --version
gcc (GCC) 13.2.1 20230801
❯ spin bench --compare enh_16325_popcount_ufunc_before -t bitwise_count
| Change | Before [03fc9291] <enh_16325_popcount_ufunc_before> | After [fc6b5911] <enh_16325_popcount_ufunc> | Ratio | Benchmark (Parameter) |
|----------|-------------------------------------------------------|-----------------------------------------------|---------|-----------------------------------------------------------------------------------|
| - | 253±0.1μs | 170±0.1μs | 0.67 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'H') |
| - | 180±0.4μs | 58.9±0.6μs | 0.33 | bench_ufunc.UFunc.time_ufunc_types('bitwise_count') |
| - | 353±0.8μs | 117±1μs | 0.33 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'h') |
| - | 564±0.5μs | 121±0.2μs | 0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'l') |
| - | 566±4μs | 123±0.4μs | 0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'q') |
| - | 621±3μs | 108±1μs | 0.17 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'i') |
| - | 562±0.5μs | 79.6±0.2μs | 0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'L') |
| - | 562±0.4μs | 80.5±0.2μs | 0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'Q') |
| - | 679±3μs | 75.1±0.06μs | 0.11 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'I') |
| - | 269±1μs | 11.9±0.08μs | 0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'B') |
| - | 309±0.2μs | 12.8±0.05μs | 0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'b') | |
|
Wow! Thanks Sayed! |
Not that good, the GCC compiler was expected to apply something similar to Harley-Seal popcount algorithm for optimized machine code generation. However, it defaulted to using the standard popcnt instruction without additional optimizations. To potentially enhance performance, we could improve the existing "fast macros" to provides better hints for the compiler maybe by manually unrolling specialized loops counting on C++ meta or/and implementing the algorithm with handwritten generic SIMD intrinsics. |
|
@ganesh-k13 seems like nobody but me has much squirms about diverging from Python naming. Could you rebase/merge to fix the |
03e77b7 to
a7cb8a9
Compare
| assert_type(np.bitwise_count.signature, None) | ||
| assert_type(np.bitwise_count.identity, None) | ||
| assert_type(np.bitwise_count(i8), Any) | ||
| assert_type(np.bitwise_count(AR_i8), Any) |
There was a problem hiding this comment.
This is failing now. I'll take a look tomorrow.
There was a problem hiding this comment.
I think npt.NDArray[Any] is correct?
There was a problem hiding this comment.
Correct, I'm somewhat surprised that it even produced an Any here at one point. Though it could have been accepted due to the somewhat broad substring matching used previously.
There was a problem hiding this comment.
Ah I see. That makes sense why it was passing before.
a7cb8a9 to
9e052cd
Compare
|
@seberg I've resolved the conflicts and UT failures, let me know if anything else needed here. |
|
@seberg any other changes needed here? |
|
Sorry, do you have time to resolve the conflict, I think that was really all that was left here, just I was in and out a bit much. |
ENH, DOC: Added countbits (popcount)
ENH: Popcount implementation
ENH: Add popcount to umath
ENH: Added countbits (popcount) to umath `__all__`
ENH: Refined popcount logic
DOC: Added `bit_count`
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
MAINT: Renamed `countbits` to `bit_count`
MAINT: Fixed 4 1s magic number
DOC: Added `popcount` to docstring
ENH: Added bit_count annotations
ENH: Added GNU/CLANG popcount
DOC: Added `popcount` language example
ENH, BUG: Moved `bitcount` to npy_math.h as `popcount` | Fixed final right shift
ENH: Enable `popcount` for signed
TST: Tests for `bit_count`
BUG, DOC: (BUG) Added missing typecast causing an unwanted upcast
(DOC) Added more details on `popcount` implementation
MAINT, BUG: (MAINT) Refined `popcount` TC to use typecode
(BUG) Fixed ufunc.ntypes to include signed ints
ENH: Added windows builtin support
ENH: Added `popcount` implementation for big python ints natively
[1/2] `popcount` object loop changes
ENH: Object loop for `bit_count`
[2/2] `popcount` object loop changes
TST: Refined `bit_count` tests and added object type
ENH: Added `bit_count` to `np.int*`
DOC: Added `np.bit_count` (numpy#19355)
MAINT: Various linting and minor fixes:
1. Fixed passing all args to _internals umath bitcount.
Note: We use kwargs here that might hinder performance
2. Fixed linting errors.
3. Improved verbosity of logs
4. Made a generic TO_BITS_LEN macro to accomdate more length based
functions in future
BENCH: Added bit_count (popcount)
MAINT: Style nits | Added signed case
DOC, MAINT: Improved example
ENH: Added annotations for bit_count
TST: Added annotations tests for bit_count
MAINT: Fixed linting errors
MAINT: Moved Magic constants to npy_math_internal
MAINT: Remove python implementation | Added 3.10 check to tests
DOC: Added abs value usage to doc
MAINT: Resolved merge conflicts
WIP: Fix the typing
* Changed popcount -> bitwise_count * Added comment on why we remove certain function in float64 object loops * Comment on when we skip bitwise_count
* Added return type for integer inputs * Refined release note * Refined external references for popcount
9e052cd to
e8c8824
Compare
|
@seberg it seems to pass now 👍 , good to go from my end. |
|
Thanks, let's get it in before it diverges again. |
|
This is probably the longest I've worked on a PR :). Now time to get hex/fromhex in 😉 |
|
This is great! One observation – the implementation does not appear to support general output dtypes: >>> import numpy as np
>>> np.bitwise_count(2, dtype='int32')
# ---------------------------------------------------------------------------
# TypeError Traceback (most recent call last)
# <ipython-input-6-f7427f7dde09> in <cell line: 2>()
# 1 import numpy as np
# ----> 2 np.bitwise_count(2, dtype='int32')
# TypeError: No loop matching the specified signature and casting was found for ufunc bitwise_countIt would be nice if we could support the |
|
Probably could do something about it, but the problem is that So to make it work, you would have to explicitly teach the ufunc that |
bitwise_countUFuncsExpose
bit_countas an UFunc.This is a raw port of #19355, specifically from 600947e to 4ee8a40
Side Note:
Anyone new can use this a template to add new UFuncs (I guess). Related to #19494
TODO:
generate_umathContinues: #19355
Resolves: #16325