[WIP] Fix word-number phrases parsing (e.g., "two days later", "two days from now")#1317
Co-authored-by: serhii73 <24910277+serhii73@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | fix/word-number-phrases-parsing | #1317 | +/- |
|---|---|---|---|
| Coverage | 96.63% | 96.63% | |
| Files | 235 | 235 | |
| Lines | 2915 | 2915 | |
| Hits | 2817 | 2817 | |
| Misses | 98 | 98 | |
Pull request overview
Fixes English relative phrase parsing for expressions like “two days later” / “two days from now” by adjusting the English simplification rules so word-number quantities are recognized and “from now” is simplified correctly before parsing.
Changes:
- Update English simplifications to match word numbers in the `… later` pattern.
- Replace the broken `from now` lookbehind simplification with a full-phrase match.
- Add parameterized tests covering word numbers, numeric values, and pluralization variants for both “later” and “from now”.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/test_date_parser.py` | Adds regression tests for word-number + “later/from now” relative parsing. |
| `dateparser_data/supplementary_language_data/date_translation_data/en.yaml` | Updates English simplification regexes for “from now” and “later”. |
| `dateparser/data/date_translation_data/en.py` | Mirrors the English simplification updates in the Python translation data. |
| - from\s+now: in | ||
| - less than 1 minute ago: 45 second ago | ||
| - (\d+[.,]?\d*) (decade|year|month|week|day|hour|minute|second)s? later: in \1 \2 | ||
| - (\d+[.,]?\d*|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve) (decade|year|month|week|day|hour|minute|second)s? later: in \1 \2 |
The later simplification now hardcodes the supported word-number list (one…twelve) even though the same word→digit mappings are defined immediately below. This duplication is easy to desync if more number words get added later. Consider keeping the later regex numeric-only and moving it after the word→digit simplifications (or otherwise centralizing the number-word pattern) so future additions don’t require editing multiple rules.
| { | ||
| "(\\d+[.,]?\\d*) (decade|year|month|week|day|hour|minute|second)s? later": "in \\1 \\2" | ||
| "(\\d+[.,]?\\d*|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve) (decade|year|month|week|day|hour|minute|second)s? later": "in \\1 \\2" | ||
| }, |
The later simplification regex now embeds a word-number alternation (one…twelve) while separate simplifications below already convert those words to digits. This duplicates the source of truth and can drift if more number words are added later. Consider reordering so word→digit conversions run before the later rule (keeping the later pattern numeric-only), or otherwise centralizing the number-word pattern.
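The reordering suggested in this comment can be sketched as follows. This is a minimal illustration using the standard `re` module, not the project's actual simplification code: the `WORD_TO_DIGIT` table and the `simplify` helper are hypothetical names, and the real mappings live in the `en.yaml` / `en.py` translation data.

```python
import re

# Hypothetical word-to-digit table standing in for the mappings in en.yaml / en.py.
WORD_TO_DIGIT = {
    "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9", "ten": "10", "eleven": "11", "twelve": "12",
}

UNITS = "decade|year|month|week|day|hour|minute|second"

# Numeric-only `later` rule, i.e. the pattern as it was before this PR.
LATER_PATTERN = rf"(\d+[.,]?\d*) ({UNITS})s? later"
LATER_REPLACEMENT = r"in \1 \2"


def simplify(text: str) -> str:
    """Apply word->digit rules first, then the numeric-only `later` rule."""
    for word, digit in WORD_TO_DIGIT.items():
        text = re.sub(rf"\b{word}\b", digit, text)
    return re.sub(LATER_PATTERN, LATER_REPLACEMENT, text)


print(simplify("two days later"))     # in 2 day
print(simplify("3 weeks later"))      # in 3 week
print(simplify("twelve hours later")) # in 12 hour
```

With this ordering the number words are listed in one place only, so adding a new word does not require touching the `later` rule.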
…ocumentation-examples
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| result, | ||
| f"{description}: Expected {expected}, got {result}", | ||
| ) | ||
|
|
There’s trailing whitespace on this blank line; please remove it to keep the file clean (and avoid potential lint/pre-commit failures).
| def test_word_numbers_advanced(self, date_string, expected_delta, description): | ||
| """Test number parsing with word numbers (1-12) in 'from now' phrases.""" |
The PR description says a test named test_word_numbers_with_later_and_from_now was added, but the new test here is named test_word_numbers_advanced. Please either rename the test to match its scope (e.g., explicitly mention from now) or update the PR description so they stay in sync.
* Fix word-number phrases parsing (e.g., 'two days later') Close #1314
* Fix word-number phrases parsing (e.g., "two days later", "two days from now") (#1317)
* Initial plan
* Fix word-number phrases parsing for 'later' and 'from now' patterns

Co-authored-by: serhii73 <24910277+serhii73@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: serhii73 <24910277+serhii73@users.noreply.github.com>
Co-authored-by: Serhii A <aserhii@protonmail.com>

* Update tests/test_date_parser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply pre-commit
* Fix approximate day counts in word number tests
  - Update 'four months later' from 120 to 122 days (June 15 to Oct 15, 2025)
  - Update 'six years later' from 2190 to 2191 days (accounts for leap year in 2028)
  - These values now match the actual calendar arithmetic performed by the parser

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: serhii73 <24910277+serhii73@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
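The corrected day counts in the commit message can be checked with plain calendar arithmetic. The sketch below assumes a base date of June 15, 2025, which is implied by the "June 15 to Oct 15, 2025" note; it uses only the standard library and does not call the parser itself.

```python
from datetime import date

# Base date assumed from the commit message (June 15, 2025).
base = date(2025, 6, 15)

# "four months later" lands on the same day of the month, four months on.
four_months_later = date(2025, 10, 15)

# "six years later" crosses one leap day (February 29, 2028).
six_years_later = date(2031, 6, 15)

print((four_months_later - base).days)  # 122, not 4 * 30 = 120
print((six_years_later - base).days)    # 2191, not 6 * 365 = 2190
```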
Phrases like `"two days later"` and `"two days from now"` returned `None` due to two bugs in the English simplification pipeline.

Root Causes

- `later` pattern: the regex required `\d+`, but word-to-number substitutions (`"two"` → `"2"`) run after it, so the pattern never matched word numbers.
- `from now` simplification: `(?<=from\s+)now → in` used a lookbehind that replaced only `now`, leaving `from` in the string (e.g., `"two days from in"`), which is unparseable regardless of ordering.

Changes

- `en.yaml` / `en.py`, `later` pattern: Extended the capture group to match word numbers alongside digits.
- `en.yaml` / `en.py`, `from now` simplification: Replaced the broken lookbehind with a full-phrase match.
- Tests: Added `test_word_numbers_with_later_and_from_now` covering both patterns with word numbers, numeric values, and singular/plural variants. Calendar-based units (months, years) assert only `result > base_date` since they aren't fixed-length deltas.

Before / After
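The two root causes can be reproduced in isolation with a small sketch. It applies each old and new rule directly with the standard `re` module rather than running the real simplification pipeline. One assumption to note: stdlib `re` only accepts fixed-width lookbehinds, so a single space stands in for `\s+` in the old `from now` rule.

```python
import re

UNITS = "decade|year|month|week|day|hour|minute|second"
WORDS = "one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve"

# `later` rule before and after the change.
OLD_LATER = rf"(\d+[.,]?\d*) ({UNITS})s? later"
NEW_LATER = rf"(\d+[.,]?\d*|{WORDS}) ({UNITS})s? later"

# `from now` rule before and after the change.
# Assumption: "(?<=from )" approximates the original "(?<=from\s+)" lookbehind.
OLD_FROM_NOW = r"(?<=from )now"
NEW_FROM_NOW = r"from\s+now"

# Old `later` rule: no digits in the input, so nothing is rewritten.
before_later = re.sub(OLD_LATER, r"in \1 \2", "two days later")
# New `later` rule: the word number is captured as-is.
after_later = re.sub(NEW_LATER, r"in \1 \2", "two days later")

# Old `from now` rule: only "now" is replaced, "from" is left behind.
before_from_now = re.sub(OLD_FROM_NOW, "in", "two days from now")
# New `from now` rule: the whole phrase is replaced.
after_from_now = re.sub(NEW_FROM_NOW, "in", "two days from now")

print(before_later)     # two days later
print(after_later)      # in two day
print(before_from_now)  # two days from in
print(after_from_now)   # two days in
```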