PERF: Optimize array check for bounded 0,1 values #20643
Conversation
xref #20636
Much less clean, since it retains the Python-API path to handle anything that isn't a double or contiguous array, in case these are hit. I think this check is only ever used on contiguous doubles, but it seemed prudent to leave a slow path to avoid subtle bugs in case something else ever reaches it.
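The fast path described above can be sketched roughly as follows (a minimal illustration with hypothetical names, not the PR's actual Cython code): a single pass over a C-contiguous buffer of doubles, with anything else left to the existing generic checks.

```c
#include <stddef.h>

/* Sketch of the fast path: one pass over a C-contiguous buffer of
 * doubles, verifying every value lies in the closed interval [0, 1].
 * Returns 1 if all values are in bounds, 0 otherwise. */
static int all_in_unit_interval(const double *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Written so that NaN makes the condition false and is
         * therefore rejected, matching a bounds check's intent. */
        if (!(data[i] >= 0.0 && data[i] <= 1.0)) {
            return 0;
        }
    }
    return 1;
}
```

Anything that is not a contiguous double array would keep going through the slower Python-API path, as the comment above describes.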
Optimize frequent check for probabilities when they are doubles
Did not merge yet because I find it mildly annoying how much complexity it adds. I realize that the existing check adds at least 2µs of overhead, though; I assume this is a big deal for the functions in question?
I'm not certain it should be merged either. Where does this optimization end? It would take a lot of lines to cover all of the cases, and there are probably 3 or so that have 4+ API calls.
Well, if it is a big deal somewhere... maybe. The other thing is that we could start adding gufuncs for common use-cases (somewhere slightly hidden). For the thing here: I don't like it, but if it makes a big difference in practice, the code isn't terrible to understand. We can't do it everywhere, but I guess here we have a few things that make it an extreme case, such as probabilities typically being a very short list.
It does get performance back to 1.16 levels in the case that started this. It is possible this one is particularly important, since this check is commonly used when passing vectors of probabilities, such as when using choice with non-uniform sampling. The key advantage is that I think the safety path is never hit, since this check is only used on well-behaved arrays due to earlier conversion of array-likes. We can always wait on the others until we get more complaints. Another thought would be to generalize slightly and use anonymous functions to do the comparison, where the function would take a double and return a boolean.
Thanks Kevin, let's get it in! I still have little love for it, but looking at this again, it seems it can make a pretty massive performance difference (the super simple case I tried sped up by a factor of almost 2). I think this could be better, e.g. it could support simple strides, it could be a ufunc, and probably more... but it should cover almost all interesting cases, so not sure it matters. Let's just agree not to add this for every single case :).
There could be scope for a more general-purpose approach. In this specific case I think all arrays that end up here are non-strided, since they are generally pre-processed to be C contiguous to simplify generating the variates with loops over 0,...,n. The "holy grail" of random would be to use ufuncs broadly, which would add a lot of uniformity (dtype, out, simplified broadcasting). This would require another full rewrite, and the long-term gains are probably not that great.
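For reference, supporting simple strides, as suggested above, would only change the indexing in the loop. A hypothetical sketch (with the stride given in elements rather than bytes for simplicity; NumPy itself works with byte strides):

```c
#include <stddef.h>

/* Sketch of a strided variant of the [0, 1] check: `stride` is the
 * distance between consecutive elements, in units of doubles.
 * With stride == 1 this degenerates to the contiguous fast path. */
static int all_in_unit_interval_strided(const double *data, size_t n,
                                        ptrdiff_t stride)
{
    for (size_t i = 0; i < n; i++) {
        double x = data[(ptrdiff_t)i * stride];
        if (!(x >= 0.0 && x <= 1.0)) {
            return 0;
        }
    }
    return 1;
}
```

As noted above, the arrays reaching this check are already made C contiguous, so the strided variant would mostly buy generality rather than speed here.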