Use gufunc fast parser in Time by mhvk · Pull Request #11718 · astropy/astropy

mhvk · 2021-05-09T01:47:33Z

@taldcroft - some time ago I wrote a gufunc replacement for the fast parser, in part because that means that shapes are automatically dealt with, but also because it can automatically deal with masked input (as it can be overridden with __array_ufunc__). I also found it nice that it more cleanly separates the parsing and loading parser information, dealing with status codes, etc. I cleaned it up a bit today, so we can discuss.

One thing I did then which in hindsight I probably shouldn't have, is change the format for the parser information, to a structured array. This is helpful for the ufunc, as it means I can just store the array inside, but obviously breaks API. So, at the very least I should add support for interpreting the old format. But I can also just keep the old format and convert it to my structured array format in __init_subclass__. Anyway, before I do anything I thought I'd better let you have a look and see which one you prefer.

EDIT: In this version, I replaced the time structured array with a tuple, as it seemed slightly easier (if only just in the erfa call at the end); I have another version where the structured array is kept.

github-actions · 2021-05-09T01:48:22Z

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

taldcroft · 2021-05-12T09:57:40Z

@mhvk - As mentioned in another PR, this seems to be basically a complete rewrite. Which is OK, but we do need to maintain the same external API. I have code that uses this API.

I do worry slightly that the C code is going from pretty simple to something that I honestly can't understand nor review beyond checking tests.

mhvk · 2021-05-12T12:42:52Z

@taldcroft - OK, I'll make sure that the format used to indicate how to parse remains supported.

Note that the actual parsing code is not really changed, it is just that the iteration is now done differently. Most of the rest on the C side is the boilerplate needed for ufuncs.

One other advantage of the ufunc is that it goes towards being able to support things like dask arrays in Time.

taldcroft · 2021-05-12T13:06:24Z

OK, sounds good! Thanks for bring this into the future.

I'll make sure that the format used to indicate how to parse remains supported.

This is a little ambiguous and I'm not sure if you want to introduce support for another format?

mhvk · 2021-05-12T14:16:05Z

@taldcroft - OK, I pushed the new version, which simply uses the old format, converting it as needed for the gufunc in __init_subclass__.

I did add a test that if a user defines their own _fast_parser, then that will be used. But it's simply relying on the private attribute being checked and I don't think we should expose it until we find more general use for it.

mhvk · 2021-05-12T16:47:39Z

astropy/time/src/parse_times.c

+    npy_bool break_allowed;
 };

-// Distutils on Windows automatically exports ``PyInit__parse_times``,


Hmm, clearly this is needed for ufuncs also. Going to put it back...

Same question, why is this no longer needed?

mhvk · 2021-05-12T20:24:52Z

@taldcroft - OK, tests pass now (or, rather, the failures are unrelated). While considering the windows failure (which turned out to be a forgotten bit in setup_package), I also hopefully addressed your comment that the code had become less readable, by adding more comments to the parser loop. I think/hope that if you now look at the C file as a whole, it is reasonably clear... Am definitely happy to add more comments, since obviously you and others have to be able to look at it and know what is going on!

taldcroft

I just started looking at this and made some trivial comments.

The real question I have is what is the driver for the big changes in parse_ymdhms_times which turned into parse_loop. If I understand they are doing the same thing and the main point of this PR was to put in the gufunc machinery outside of that inner loop.

What I note is that the original parse_ymdhms_times is extremely simple and it is pretty straightforward to understand the logic. Your new version is much more programmatically complex and I find it difficult to verify correctness by just reading.

Things like changing an explicit time structure to char **out = args+1; reduces readability IMO (and I prefer for clarity the pre-initialization to setting values within the logic). I don't think that the strategy of trying to loop over the elements in a generic way (vs. the original "unrolled loop") is winning anything for performance.

taldcroft · 2021-05-20T09:51:16Z

astropy/time/setup_package.py

 SRC_FILES = [os.path.join(C_TIME_PKGDIR, filename)
             for filename in ['src/parse_times.c']]

-if sys.platform.startswith('win'):


What changed so that these are no longer necessary? I know that it failed to build on Windows when I was developing this change originally.

I removed it initially since I have not needed it for other ufuncs (like in erfa), but must admit I don't really know -- I'm essentially relying on CI here...

taldcroft · 2021-05-20T09:53:54Z

astropy/time/src/parse_times.c

+    npy_bool break_allowed;
 };

-// Distutils on Windows automatically exports ``PyInit__parse_times``,


Same question, why is this no longer needed?

taldcroft · 2021-05-20T10:04:54Z

astropy/time/src/parse_times.c

+    "standard time strings."
+#define CREATE_PARSER_DOCSTRING \
+    "create_parser()\n\n" \
+    "Create a gufunc that parsers according to the passed in parameters.\n\n" \


that parses

taldcroft · 2021-05-20T12:45:32Z

One interesting thing is that I did some profiling to check the performance. For large input arrays (e.g. 1e6) there is no difference in timing. For small arrays (10), I see that your new version is substantially faster:

# 1000000: Main 265 ms, 264 ms for new
# 10: Main: 240 us, New 98.5 us
%timeit t = Time(times, format='iso')

That's great. Do you have an idea where that is coming from?

mhvk · 2021-05-20T18:53:52Z

@taldcroft - thanks for doing the profiling - I meant to do that myself, but never got around to it. I had actually been worried that the ufunc would be slower for small N, given the what I thought would be larger call overhead. But perhaps that is offset by pre-calculating the start/stop/deliminator array? Anyway, not quite sure but happy to see it actually is a true performance improvement!

mhvk · 2021-05-21T00:43:39Z

On the larger question, maybe I was indeed wrong to move away from the time struct: it felt cleaner but I can see that it has become less clear. I'll think a bit more.

mhvk · 2021-05-27T22:48:31Z

closing in favour of #11768

mhvk added time Performance labels May 9, 2021

mhvk added this to the v5.0 milestone May 9, 2021

mhvk requested a review from taldcroft May 9, 2021 01:47

mhvk added 3 commits May 12, 2021 09:09

Write a fast parsing gufunc

f11efea

Use gufunc parser.

37f25ae

Create gufunc at class creation time.

12b0aff

mhvk force-pushed the time-fast-parser-gufunc branch from 7fe2f47 to 6b64588 Compare May 12, 2021 14:12

mhvk added the no-changelog-entry-needed label May 12, 2021

mhvk commented May 12, 2021

View reviewed changes

mhvk added 2 commits May 12, 2021 13:31

Simpler gufunc without time struct.

74477cb

Might as well check that a user-defined fast parser works.

87d6d4c

mhvk force-pushed the time-fast-parser-gufunc branch from 6b64588 to 87d6d4c Compare May 12, 2021 17:32

Remove windows work-arounds

c48daa9

mhvk marked this pull request as ready for review May 12, 2021 23:27

taldcroft reviewed May 20, 2021

View reviewed changes

mhvk mentioned this pull request May 21, 2021

Use gufunc fast parser in Time (alternative) #11768

Merged

mhvk closed this May 27, 2021

mhvk deleted the time-fast-parser-gufunc branch May 27, 2021 22:48

Uh oh!

Conversation

mhvk commented May 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented May 9, 2021

Uh oh!

taldcroft commented May 12, 2021

Uh oh!

mhvk commented May 12, 2021

Uh oh!

taldcroft commented May 12, 2021

Uh oh!

mhvk commented May 12, 2021

Uh oh!

mhvk May 12, 2021

Choose a reason for hiding this comment

Uh oh!

taldcroft May 20, 2021

Choose a reason for hiding this comment

Uh oh!

mhvk commented May 12, 2021

Uh oh!

taldcroft left a comment

Choose a reason for hiding this comment

Uh oh!

taldcroft May 20, 2021

Choose a reason for hiding this comment

Uh oh!

mhvk May 21, 2021

Choose a reason for hiding this comment

Uh oh!

taldcroft May 20, 2021

Choose a reason for hiding this comment

Uh oh!

taldcroft May 20, 2021

Choose a reason for hiding this comment

Uh oh!

mhvk May 21, 2021

Choose a reason for hiding this comment

Uh oh!

taldcroft commented May 20, 2021

Uh oh!

mhvk commented May 20, 2021

Uh oh!

mhvk commented May 21, 2021

Uh oh!

mhvk commented May 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mhvk commented May 9, 2021 •

edited

Loading