ENH: Implement faster argument parsing#15099
Conversation
This replaces the PyArgs_ParseTupleAndKeywords in most cases.
this also moves asarray/asanyarray, etc. to C to make up for the slight loss in argument parsing speeds it should now be preferred to use these functions when possible. This also uses the preferred functions in a few places
|
To what extent can you use |
|
Hmmm, I had not looked at it too much, since I thought it was not supported. But I suppose the important thing is that it is not "officially" supported, and probably everyone supports it, and they are making it official, so we can just use it on any python version. I will look into it, adapting the code to it should be fairly straight forwrad. |
|
Using argument clinic might be another way to get some performance boosts |
|
I looked at argument clinic at some point. But it is pretty baked into Python itself and seemed like we would have to extract it completely; Plus, quite frankly, I am not sure I like the large change in the actual code that it seemed like it does (although maybe I looked too much at more complex cases defining a whole class). Not sure I feel like going there. OTOH, of course this is 200 lines of code that is hand written for us... |
numpy/core/src/multiarray/common.c
Outdated
| * if no match was found), or -1 on failure. | ||
| */ | ||
| static NPY_INLINE npy_intp | ||
| locate_key(PyObject *const *kwnames, PyObject *key) |
There was a problem hiding this comment.
Hmm, this was the ufunc code, my code was slightly different... A bit of a mess up here, so it might change (although making it into an inlined function may be nicer.
|
Hehe, you are right of course... shoudl have move seriously considered it earlier. |
|
Hmmm, it might be that the main reason that I did not consider clinic is that I stored it as using private API, which would have been the FASTCALL switch. This is pretty nice for now for sure, I wonder if clinic really does all that much better... |
I think the benefit is it generates the code that parses the arguments, which saves you from doing so. Note also that FASTCALL is not supported on 3.6, so this would need to wait a release or two first. |
Ah, than that is why I did not consider it. Note that the changes in the code are fairly limited. I could easily solve this with something like 3-4 |
|
Here are the timing (with Master vs. `METH_FASTCALL` comparison UPDATE: Including ufunc (not including `tp_vectorcall` since python 3.7 |
8471383 to
d8c77f6
Compare
It might strip off a few ns for some reason, but probably not worth the trouble.
|
Hehehe, python 3.6 has |
|
Probably |
A lot of this is test fixes, since many things that were ValueErrors should really be TypeErrors
Frankly, this probably only helps a real amount for keyword arguments, which is not the most common thing. But METH_FASTCALL makes a lot of sense for reductions where kwargs are much more common.
92fba24 to
e52e247
Compare
|
It probably needs quite a bit of cleanup (and splitting up), but it now includes some cleanup of the ufunc code, unfortunately with a lot of |
8363955 to
6250e74
Compare
…umented and not always there?)
6250e74 to
e064932
Compare
It probably is just not worth the trouble of jumping through hoops to achieve it on other platforms...
|
It seems there is more than one PR here. Maybe you could split it into "code reorganization", "smaller cleanups" and "faster argument parsing" which would make reviewing easier, and would allow benchmarking the cleanups separately |
|
Not sure how easy it is to split up all the way, but I can at least split it up into: Faster argparsing + simple example; array coercion and UFunc related fixes. I am planning to do that, but was going to go back to the refcounting now first. |
This speeds up argument parsing quite a bit. Unfortunately, actually the local code for argparsing within
np.arrayitself is faster for an intermediate number of kwargs... Timing these speedups is tricky, in%timeitit is very clear, e.g. tryarr.astype(arr.dtype, copy=False).This currently works by adding a static struct to the calling function; There are probably a few micro-optimization tries in there that should be cleanedup. So for now marking it "draft" to see what people think.
One thing missing: ufunc argparsing does the same thing, so that duplicates code (that is quite a bit of cleanup which should go in here; I am dreading the merge conflict with the dtypes branch though...).