Speed up parsing simple unit strings with Unit#17004
Conversation
|
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
|
mhvk
left a comment
There was a problem hiding this comment.
This seems like an excellent idea! But also a bit complex. How about extracting what is done to the top and doing it only for generic? I.e., something like,
# seems more logical to switch order
if isinstance(s, bytes):
s = s.decode("ascii")
# might as well speed this up too.
f = unit_format.Generic if format is None else unit_format.get_format(format)
# For Generic, try to short-circuit simple units.
if (f is unit_format.Generic
and (unit := get_current_unit_registry().registry.get(s))):
return unit
Of course, this could still keep the simplification you had in _do_parse, since really the short-circuit there is no longer needed.
|
|
The benefit is not as obvious here - it is a slight speed-up, but mostly is there to ensure that #16892 will not slow things down. |
|
I don't know what the definition of "significant" is in our benchmarks, but if you dig through the logs you will see that the difference between the relevant timings is much larger than their uncertainty, i.e. the change is significant: https://github.com/astropy/astropy/actions/runs/10836066679/job/30068962593?pr=17004#step:7:636 https://github.com/astropy/astropy/actions/runs/10836066679/job/30068962593?pr=17004#step:7:1143 |
92da66a to
6ca0c90
Compare
|
Replying to the suggestions by @mhvk
There was no significant speedup from placing this check after the shortcut.
Changing this part had no noticeable effect on performance, but getting rid of the redundant
My updates speed up parsing with all the |
|
The main benefit I saw of my suggestion is that it does not rely on private methods at all. But it should also be (marginally) faster for the by-far most common case of |
|
Re: #17004 (comment) -- Maybe @MridulS knows since he implemented the infrastructure. |
|
The following patch causes multiple test failures: --- a/astropy/units/core.py
+++ b/astropy/units/core.py
@@ -2091,12 +2091,10 @@ class _UnitMetaClass(type):
if isinstance(s, bytes):
s = s.decode("ascii")
- try:
- return f._parse_unit(s, detailed_exception=False) # Try a shortcut
- except (AttributeError, ValueError):
- # No `f._parse_unit()` (AttributeError)
- # or `s` was a composite unit or had weird Unicode in it (ValueError).
- pass
+ if f is unit_format.Generic and (
+ unit := get_current_unit_registry().registry.get(s)
+ ):
+ return unit
try:
return f.parse(s) |
|
That's weird, I tested it yesterday. Will check. |
The reason is that your other change is interfering; if starting from your PR, I remove the try/except you added in |
|
My original suggestion was to move a shortcut from |
|
I'm actually not sure what is the best way right now. Certainly, the failure I mentioned above shows that current implementation is a bit broken - see #17010. I'll see if I can find some time over the weekend to solve that - that may also make clear what the best path forward is. |
|
I found that removing the short-circuit showed quite a few bugs - those also show up if one tries to create composite units, and are fixed in #17011. I think this in itself makes the case stronger not to try too much short-circuiting in the parsers - they should just parse! So, I had a trial for moving it up in #17012. |
A shortcut for non-composite units from the `Generic._do_parse()` method (inherited by all `astropy` unit formats that can parse) is made more effective by moving it to `Unit` because the shortcut is now attempted earlier. This does slow down parsing simple strings with the unit parser `parse()` methods, but those are not meant to be called by users anyways. They are public mostly to define our API for downstream developers.
6ca0c90 to
ecdd92b
Compare
mhvk
left a comment
There was a problem hiding this comment.
Looking at this again, this really is a fine & elegant solution. Let's just get it in!
Description
The updates do not slow down parsing strings representing composite units. The tradeoff is that they do slow down parsing simple strings with the unit parser
parse()methods, but those are not meant to be called by users. They are public mostly to define our API for downstream developers.Setup:
$ ipython -i -c 'from astropy import units as u; from astropy.units.format import Generic, CDS, FITS, OGIP, VOUnit'On
main:In this pull request: