ENH: Added `bitwise_count` UFuncs by ganesh-k13 · Pull Request #21429 · numpy/numpy

ganesh-k13 · 2022-05-03T11:59:48Z

`bitwise_count` UFuncs

Expose bit_count as an UFunc.

>>> a = (2**np.arange(20) - 1)
>>> a
array([     0,      1,      3,      7,     15,     31,     63,    127,
          255,    511,   1023,   2047,   4095,   8191,  16383,  32767,
        65535, 131071, 262143, 524287])
>>> np.bitwise_count(a)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

This is a raw port of #19355, specifically from 600947e to 4ee8a40

Side Note:

Anyone new can use this a template to add new UFuncs (I guess). Related to #19494

TODO:

Figure out if it's up to date with latest main
Add bencmarks (at least now :) )
Commits are a mess, fix it
Figure out the return dtype
Clean generate_umath

Continues: #19355
Resolves: #16325

seberg

Cool! One small comment for simplifying a bit, but the code looks great on first sight!

My two main questions are:

Did we decide what the output dtype is? The default integer, the same integer?
While I am in favor of adding the bit_count ufunc (which still may need more discussion unfortunately), I would still need a bit convincing on adding a method, it doesn't seem as clear to me. There could be an argument of where to add the ufunc, but hopefully we don't lose track of this because of such questions.

seberg · 2022-05-04T19:05:19Z

numpy/core/src/umath/funcs.inc.src

+
+    /* Try to use inbuilt popcount if available */
+    static PyObject *builtin_popcount_func = NULL;
+    builtin_popcount_func = PyObject_GetAttrString(obj, "bit_count");


Hmm, most ufuncs already try to call a method of the same name when it exists? There should be a shorthand to do this in generate_umath without the need to define this function.

ohh yeah, let me see if we can auto-gen this, will make things easier.

Ohh also makes sense now, why not all umath functions are here.

Hey @seberg , I dug a bit deeper here and it seems all the o supported dtypes have a function in funcs.inc.srs and it does not seem automatic. Am I missing something here?

Ah, there is the "P" character code that means: "Object, but just call the method".

Ah nice, I missed it. Thanks! Added via b86a322

ganesh-k13 · 2022-05-05T03:43:52Z

Thanks for the review @seberg

Did we decide what the output dtype is? The default integer, the same integer?

For the scalar version we decided to always output an 8 bit (uint8_t). Although here it might be the case, will need to change it.

There could be an argument of where

This brings a very good point on if we can create a np.bits namespace of sorts and club the bits stuff there, what do you think?

softwaredoug · 2022-11-24T14:32:59Z

This PR seems so close. Anything I can do to help get this over the finish line @ganesh-k13? For certain similarity functions (like hamming distance) the bit count function is the bottleneck, and this could really unlock pure python solutions without needing to resort to something in a lower level language.

ganesh-k13 · 2022-11-24T14:42:02Z

Hey @softwaredoug , we support bit_count today and was added via #19355. This PR is mainly to get a bit_count UFunc so we can operate it on arrays, etc.

Well to answer your question on moving it forward, implementation wise, perhaps the generate_umath can be made simpler. But the bigger hurdle would be convince the mailing list that this function is needed and doesn't add more to the already bloated namespace.

So in conclusion, if you feel this has a genuine use case for many in the community (the UFunc of course, the scalar version is already in), you can start a thread in the mailing list. Now I can prioritize this to completion if everyone is ok with adding this UFunc and if you want to contribute, I can help you in any way with terms of navigating this bit.

seberg · 2022-11-24T14:44:59Z

I will give my 👍 for the ufunc, but not for the method (on the array object). Feel free to announce to the mailing list that pending no substantial objects we will move forward with adding it.

The return dtype has to be resolved, but I guess it was defined as uint8 for the scalars? In which case we have prior art.

softwaredoug · 2022-11-24T15:02:19Z

Awesome - thanks @ganesh-k13 and @seberg

Maybe this is my numpy ignorance, but is there a way to efficiently apply the scalar function to a whole array? Without something like apply_along_axis which in my mental model wouldn't be particularly fast?

seberg · 2022-11-24T15:58:16Z

No, there isn't an efficient way (maybe a JIT'er like numba can do it, not sure). You could write your own ufunc, its not actually that terribly hard, but the learning curve is just quite steep, especially if you are not used to C (ufunclab is my goto example for external ufuncs currently).

ganesh-k13 · 2022-11-26T14:20:52Z

Hey @BvB93 , basic question: If we fix the return type of UFunc to one dtype, where do we make the typing changes?

Now I do get ValueError: Unexpected reveal line format: 23: error: Name "ctypes._Pointer" is not defined in numpy/typing/tests/test_typing.py, but I don't know how to interpret this. I guess we need to make some change in numpy/typing/tests/data/reveal/ufuncs.pyi?

ganesh-k13 · 2022-12-05T11:19:00Z

Updated bit_count (just the UFunc) to bitwise_count. Reference to discussion regarding the same: discussions

seberg

Thanks. Lots of nitpicks of the optional kind. I have to think about the naming a bit more.
Overall, I lean against adding any methods, which is the only meaningful change besides possible name bike-shedding.

numpy/core/src/multiarray/scalartypes.c.src

doc/release/upcoming_changes/19355.new_feature.rst

numpy/core/code_generators/ufunc_docstrings.py

seberg · 2022-12-05T12:54:00Z

numpy/core/src/multiarray/methods.c

        METH_NOARGS, NULL},
+    {"bitwise_count",
+        (PyCFunction)array_bitwise_count,
+        METH_VARARGS | METH_KEYWORDS, NULL},


Same as for bit_count, I would just drop the method. I could see a point to bit_count() for supporting the nested array case, but overall I am leaning against both.

This might be handy? If we want to support it on other sequences? I'm guessing there will be some other way though without this addition? happy to do both

I removed it via d7f364f. I'm a bit confused now, I thought this adds bitwise_count to np namespace?

seberg · 2022-12-05T12:58:58Z

numpy/core/tests/test_ufunc.py

 UNARY_UFUNCS = [obj for obj in np.core.umath.__dict__.values()
                    if isinstance(obj, np.ufunc)]
 UNARY_OBJECT_UFUNCS = [uf for uf in UNARY_UFUNCS if "O->O" in uf.types]
+UNARY_OBJECT_UFUNCS.remove(getattr(np, 'bitwise_count'))


Just in case someone wonders: it seems the path assumes float64 loops to exist, which is not the case for bitwise.

numpy/core/tests/test_umath.py

ganesh-k13 · 2023-03-18T06:10:58Z

Resolved merge conflicts (rebased with latest main).

~~[EDIT] Should have built locally before pushing, checking...~~ Bad merge, fixed now

ganesh-k13 · 2023-03-18T06:13:59Z

Long overdue look at the Array API standard. Two callouts:

elementwise_functions section does not include bitwise_count, but a note: All other functions that operates on bits use bitwise_*
bitwise_operators specification only talks of: Bitwise operators should be defined for arrays having integer and boolean data types., which we do follow in this change list.

numpy/core/code_generators/ufunc_docstrings.py

ganesh-k13 · 2023-04-30T18:14:28Z

Are any more changes needed here @seberg?

seiko2plus · 2023-08-31T01:04:27Z

I have made few changes to the code to allow auto-vectorization, and here the result on x86 with avx2 enabled:

❯ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

❯ gcc --version
gcc (GCC) 13.2.1 20230801

❯ spin bench --compare enh_16325_popcount_ufunc_before -t bitwise_count
| Change   | Before [03fc9291] <enh_16325_popcount_ufunc_before>   | After [fc6b5911] <enh_16325_popcount_ufunc>   |   Ratio | Benchmark (Parameter)                                                             |
|----------|-------------------------------------------------------|-----------------------------------------------|---------|-----------------------------------------------------------------------------------|
| -        | 253±0.1μs                                             | 170±0.1μs                                     |    0.67 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'H') |
| -        | 180±0.4μs                                             | 58.9±0.6μs                                    |    0.33 | bench_ufunc.UFunc.time_ufunc_types('bitwise_count')                               |
| -        | 353±0.8μs                                             | 117±1μs                                       |    0.33 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'h') |
| -        | 564±0.5μs                                             | 121±0.2μs                                     |    0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'l') |
| -        | 566±4μs                                               | 123±0.4μs                                     |    0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'q') |
| -        | 621±3μs                                               | 108±1μs                                       |    0.17 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'i') |
| -        | 562±0.5μs                                             | 79.6±0.2μs                                    |    0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'L') |
| -        | 562±0.4μs                                             | 80.5±0.2μs                                    |    0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'Q') |
| -        | 679±3μs                                               | 75.1±0.06μs                                   |    0.11 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'I') |
| -        | 269±1μs                                               | 11.9±0.08μs                                   |    0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'B') |
| -        | 309±0.2μs                                             | 12.8±0.05μs                                   |    0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'b') |

ganesh-k13 · 2023-08-31T01:06:57Z

Wow! Thanks Sayed!

seiko2plus · 2023-08-31T02:17:41Z

Wow! Thanks Sayed!

Not that good, the GCC compiler was expected to apply something similar to Harley-Seal popcount algorithm for optimized machine code generation. However, it defaulted to using the standard popcnt instruction without additional optimizations. To potentially enhance performance, we could improve the existing "fast macros" to provides better hints for the compiler maybe by manually unrolling specialized loops counting on C++ meta or/and implementing the algorithm with handwritten generic SIMD intrinsics.

seberg · 2023-09-06T09:23:03Z

@ganesh-k13 seems like nobody but me has much squirms about diverging from Python naming. Could you rebase/merge to fix the ufuncs.pyi conflict so that we can get this in?

ganesh-k13 · 2023-09-06T18:08:09Z

numpy/typing/tests/data/reveal/ufuncs.pyi

+assert_type(np.bitwise_count.signature, None)
+assert_type(np.bitwise_count.identity, None)
+assert_type(np.bitwise_count(i8), Any)
+assert_type(np.bitwise_count(AR_i8), Any)


This is failing now. I'll take a look tomorrow.

I think npt.NDArray[Any] is correct?

Correct, I'm somewhat surprised that it even produced an Any here at one point. Though it could have been accepted due to the somewhat broad substring matching used previously.

Ah I see. That makes sense why it was passing before.

ganesh-k13 · 2023-09-12T09:36:49Z

@seberg I've resolved the conflicts and UT failures, let me know if anything else needed here.

ganesh-k13 · 2023-09-20T00:56:03Z

@seberg any other changes needed here?

seberg · 2023-09-27T17:03:15Z

Sorry, do you have time to resolve the conflict, I think that was really all that was left here, just I was in and out a bit much.

ENH, DOC: Added countbits (popcount) ENH: Popcount implementation ENH: Add popcount to umath ENH: Added countbits (popcount) to umath `__all__` ENH: Refined popcount logic DOC: Added `bit_count` Co-authored-by: Eric Wieser <wieser.eric@gmail.com> MAINT: Renamed `countbits` to `bit_count` MAINT: Fixed 4 1s magic number DOC: Added `popcount` to docstring ENH: Added bit_count annotations ENH: Added GNU/CLANG popcount DOC: Added `popcount` language example ENH, BUG: Moved `bitcount` to npy_math.h as `popcount` | Fixed final right shift ENH: Enable `popcount` for signed TST: Tests for `bit_count` BUG, DOC: (BUG) Added missing typecast causing an unwanted upcast (DOC) Added more details on `popcount` implementation MAINT, BUG: (MAINT) Refined `popcount` TC to use typecode (BUG) Fixed ufunc.ntypes to include signed ints ENH: Added windows builtin support ENH: Added `popcount` implementation for big python ints natively [1/2] `popcount` object loop changes ENH: Object loop for `bit_count` [2/2] `popcount` object loop changes TST: Refined `bit_count` tests and added object type ENH: Added `bit_count` to `np.int*` DOC: Added `np.bit_count` (numpy#19355) MAINT: Various linting and minor fixes: 1. Fixed passing all args to _internals umath bitcount. Note: We use kwargs here that might hinder performance 2. Fixed linting errors. 3. Improved verbosity of logs 4. Made a generic TO_BITS_LEN macro to accomdate more length based functions in future BENCH: Added bit_count (popcount) MAINT: Style nits | Added signed case DOC, MAINT: Improved example ENH: Added annotations for bit_count TST: Added annotations tests for bit_count MAINT: Fixed linting errors MAINT: Moved Magic constants to npy_math_internal MAINT: Remove python implementation | Added 3.10 check to tests DOC: Added abs value usage to doc MAINT: Resolved merge conflicts

WIP: Fix the typing

* Changed popcount -> bitwise_count * Added comment on why we remove certain function in float64 object loops * Comment on when we skip bitwise_count

* Added return type for integer inputs * Refined release note * Refined external references for popcount

ganesh-k13 · 2023-09-28T00:52:46Z

@seberg it seems to pass now 👍 , good to go from my end.

seberg · 2023-09-28T06:04:18Z

Thanks, let's get it in before it diverges again.

ganesh-k13 · 2023-09-28T06:07:34Z

This is probably the longest I've worked on a PR :). Now time to get hex/fromhex in 😉

jakevdp · 2023-10-02T16:52:57Z

This is great! One observation – the implementation does not appear to support general output dtypes:

>>> import numpy as np
>>> np.bitwise_count(2, dtype='int32')
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# <ipython-input-6-f7427f7dde09> in <cell line: 2>()
#       1 import numpy as np
# ----> 2 np.bitwise_count(2, dtype='int32')

# TypeError: No loop matching the specified signature and casting was found for ufunc bitwise_count

It would be nice if we could support the dtype argument.

seberg · 2023-10-04T09:30:54Z

Probably could do something about it, but the problem is that dtype= doesn't really mean output_dtype= which can only be achieved by out=.... dtype= means more something like "operation dtype".

So to make it work, you would have to explicitly teach the ufunc that dtype=int32 should just use the int8 loop (and cast). Which is actually possible by adding a promoter.
(Or it is be possible via some slight extensions/changes, by saying that it is really an to integer loop that happens to use (u)int8.)

github-actions bot added the 01 - Enhancement label May 3, 2022

seberg reviewed May 4, 2022

View reviewed changes

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 015a1a2 to abcfc59 Compare September 18, 2022 03:44

seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Nov 24, 2022

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from abcfc59 to 050127a Compare November 26, 2022 13:16

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch 3 times, most recently from 6738683 to bcd7a6f Compare December 2, 2022 13:31

ganesh-k13 marked this pull request as ready for review December 3, 2022 12:22

ganesh-k13 requested a review from seberg December 3, 2022 12:22

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from bcd7a6f to 1731966 Compare December 5, 2022 11:13

seberg reviewed Dec 5, 2022

View reviewed changes

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 1731966 to 0ff8230 Compare December 11, 2022 06:47

ganesh-k13 changed the title ~~ENH: Added bit_count UFuncs~~ ENH: Added bitwise_count UFuncs Dec 12, 2022

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 0ff8230 to 47ad3bd Compare January 25, 2023 16:30

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 47ad3bd to e985c36 Compare March 18, 2023 06:10

ganesh-k13 commented Mar 18, 2023

View reviewed changes

numpy/core/code_generators/ufunc_docstrings.py Show resolved Hide resolved

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from e985c36 to 5928575 Compare March 18, 2023 06:27

ganesh-k13 requested a review from seberg March 28, 2023 03:49

neworderofjamie mentioned this pull request Aug 31, 2023

Spike count record genn-team/ml_genn#73

Merged

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch 2 times, most recently from 03e77b7 to a7cb8a9 Compare September 6, 2023 18:02

ganesh-k13 commented Sep 6, 2023

View reviewed changes

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from a7cb8a9 to 9e052cd Compare September 8, 2023 14:22

ganesh-k13 and others added 10 commits September 27, 2023 23:28

ENH: Changed bit_count UFunc return to int8

8867a45

WIP: Fix the typing

MAINT: Remove custom o bit_count and use P

b2158f9

MAINT: Change bit_count UFunc to bitwise_count

5d299c4

MAINT: Removed method addition of bitwise_count

8cc17a7

TST: Misc bitwise_count changes

08047d5

* Changed popcount -> bitwise_count * Added comment on why we remove certain function in float64 object loops * Comment on when we skip bitwise_count

DOC: Improved bitwise_count documentation

6a7a231

* Added return type for integer inputs * Refined release note * Refined external references for popcount

MAINT: Added bitwise_count to init import

fcfeb45

ENH, SIMD: Allow auto-vectorzing bitwise_count

4886a7a

BENCH: Add deep datatype based benchmark for bitwise_count

e8c8824

ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 9e052cd to e8c8824 Compare September 27, 2023 23:33

seberg merged commit 5d4d05a into numpy:main Sep 28, 2023

jakevdp mentioned this pull request Oct 2, 2023

Add jax.numpy.bitwise_count() jax-ml/jax#17885

Merged

Uh oh!

Conversation

ganesh-k13 commented May 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

bitwise_count UFuncs

Side Note:

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ganesh-k13 commented May 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

softwaredoug commented Nov 24, 2022

Uh oh!

ganesh-k13 commented Nov 24, 2022

Uh oh!

seberg commented Nov 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

softwaredoug commented Nov 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Nov 24, 2022

Uh oh!

ganesh-k13 commented Nov 26, 2022

Uh oh!

ganesh-k13 commented Dec 5, 2022

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ganesh-k13 commented Mar 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ganesh-k13 commented Mar 18, 2023

Uh oh!

Uh oh!

ganesh-k13 commented Apr 30, 2023

Uh oh!

seiko2plus commented Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ganesh-k13 commented Aug 31, 2023

Uh oh!

seiko2plus commented Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Sep 6, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ganesh-k13 commented May 3, 2022 •

edited

Loading

`bitwise_count` UFuncs

ganesh-k13 commented May 5, 2022 •

edited

Loading

seberg commented Nov 24, 2022 •

edited

Loading

softwaredoug commented Nov 24, 2022 •

edited

Loading

ganesh-k13 commented Mar 18, 2023 •

edited

Loading

seiko2plus commented Aug 31, 2023 •

edited

Loading

seiko2plus commented Aug 31, 2023 •

edited

Loading