-
Notifications
You must be signed in to change notification settings - Fork 523
implement next_jump, prev_jump to fix first-century cases #883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dateutil/parser/_parser.py
Outdated
| return year, month, day | ||
|
|
||
|
|
||
| def next_jump(tokens, idx, skip_comma=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a method somewhere, no?
I definitely think it shouldn't be a public, top-level function yet. Can you move it onto either parser or parserinfo (whichever you think is best), and start it out as private (_next_jump)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be a method somewhere, no? [...] Can you move it onto either
parserorparserinfo
Downstream I actually have these as methods in a TokenStream class (subclasses list) and tokens = TokenStream(timestr) takes the place of l = _timelex(timestr). In #884 I'll describe some of the other virtues of the TokenStream class. If this is made into a method, I think TokenStream is the right abstraction for where it belongs.
For the time being I've privatized the function.
pganssle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm fine with the logic of this change, to the extent that I understand it, just have some comments about the specifics of the implementation.
dateutil/test/test_parser.py
Outdated
| res = parse(dstr) | ||
| assert res.year == 2001, res | ||
|
|
||
| def test_first_century(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a parametrized test? I prefer "one assert per test" wherever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
dateutil/parser/_parser.py
Outdated
| Get the next sibling and its index, skipping over whitespace | ||
| and (if specified) commas. | ||
| Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inconsistent with the rest of the documentation, where we're using the standard sphinx style. I'm not opposed to the idea of converting over the documentation to using this numpy-style, but I don't know if I want it to be a mishmash, which confuses new contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, will try to convert (I'm much less familiar with the sphinx style, so will bear review). Other than "I like it better" Im not sure if there is a compelling reason to convert to the numpy-style. Regardless, that would be its own Issue.
dateutil/parser/_parser.py
Outdated
| return sib, sib_idx | ||
|
|
||
|
|
||
| def prev_jump(tokens, idx, skip_comma=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like you're actually using this function anywhere, other than in the function itself. Is it necessary to keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need it at the moment, but I thought I'd port it since it goes with next_jump. I'll remove it now and can re-introduce it if/when it becomes necessary.
|
|
||
| def next_jump(tokens, idx, skip_comma=True): | ||
| """ | ||
| Get the next sibling and its index, skipping over whitespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of what this is looking for would probably be useful to future software archaeologists. I think you have one somewhere in the comments of the function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do.
| # dropping leading zeros, e.g. "0031" represents 31 AD, | ||
| # not 2031 | ||
| # Note: this check must come before the check for info.jump below | ||
| ymd.append(value_repr, 'Y') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mildly surprised that we can put strings in the ymd tuple like this without it breaking anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ymd.append starts with a check for hasattr(val, '__len__') which I guess is intended to catch strings (not sure why it doesnt just check for strings directly...)
08b0774 to
58fea3b
Compare
58fea3b to
74db519
Compare
Fixes (with tests) currently-failing cases "0031 Jun 11" and "0031Jun11".
next_jumpandprev_jumpare from a fork I made ages ago to handle some corner cases. If accepted, they can be used to clean up a bunch of existing checks in a follow-up.