Description
I wanted to raise this as an issue instead of as a PR in case you have a preference on whether to fix it in the data ordering or in the application logic. I am using version 1.30.0 as of this issue.
Phrases like "two days from now" and "two days later" return None, but "2 days later" works fine. The problem is that the word-to-number simplifications ("two" -> "2", etc.) are applied after the patterns that depend on numeric digits.
Reproduction
import dateparser
from datetime import datetime
base = datetime(2025, 6, 15, 12, 0, 0)
settings = {'RELATIVE_BASE': base}
dateparser.parse('2 days later', settings=settings) # 2025-06-17 (works)
dateparser.parse('two days later', settings=settings) # None
dateparser.parse('two days from now', settings=settings) # None
Cause
In dateparser/data/date_translation_data/en.py, the simplifications list is ordered like this:
index 5: '(?<=from\\s+)now' -> 'in'
index 7: '(\\d+[.,]?\\d*) (decade|year|month|...|second)s? later' -> 'in \\1 \\2'
...
index 9: 'two' -> '2'
index 10: 'three' -> '3'
...
The later pattern at index 7 requires \d+, but the word-to-number conversions don't run until index 8 onward. By the time "two" becomes "2", the later regex has already run and failed to match, and it is never retried.
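The ordering effect can be reproduced in isolation. The following is a minimal sketch using the standard-library re module and a two-entry stand-in for the simplifications list; the apply_in_order helper and the shortened unit alternation are illustrative assumptions, not dateparser's actual code or data.

```python
import re

# Stand-ins for two entries of the simplifications list (unit alternation shortened).
LATER = (re.compile(r'(\d+[.,]?\d*) (day|hour)s? later'), r'in \1 \2')
TWO = (re.compile(r'two'), '2')


def apply_in_order(text, rules):
    """Apply each (pattern, replacement) pair once, in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


# Current order: the "later" rule runs while the string still contains "two".
current = apply_in_order('two days later', [LATER, TWO])

# Swapped order: the digit is present when the "later" rule runs.
swapped = apply_in_order('two days later', [TWO, LATER])

print(current)  # 2 days later
print(swapped)  # in 2 day
```

In the current order the string ends up as "2 days later" but is never rewritten into the "in N unit" form, because each rule is applied only once.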
The from now simplification at index 5 has a separate problem on top of this: the lookbehind (?<=from\s+)now only replaces now with in, leaving from in place. So "two days from now" becomes "two days from in", which is not parseable regardless of ordering.
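The lookbehind behaviour can be checked with a short sketch. Python's built-in re module only accepts fixed-width lookbehinds, so the pattern below uses a single literal space in place of \s+; that substitution is an assumption made for this demo only.

```python
import re

# Fixed-width stand-in for (?<=from\s+)now: the lookbehind checks for "from "
# but does not consume it, so only "now" is replaced.
lookbehind = re.compile(r'(?<=from )now')
result = lookbehind.sub('in', 'two days from now')

print(result)  # two days from in
```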
The application logic is in dateparser/languages/locale.py lines 438-442, which iterates through simplifications in list order:
def _apply_simplifications(self, date_string, simplifications):
    for simplification in simplifications:
        pattern, replacement = list(simplification.items())[0]
        date_string = pattern.sub(replacement, date_string).lower()
    return date_string
Suggested fix
- Move the word-to-number simplifications ("one" through "twelve") above the later and from now patterns so that digits are available when those regexes run.
- Change the from now simplification from (?<=from\s+)now -> in to something like from\s+now -> in so that both words get replaced.
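Taken together, the two changes could behave as in the sketch below. It uses the standard-library re module with a shortened, hypothetical rule list (FIXED_RULES and simplify are names invented here), so it only illustrates the intended intermediate strings and is not the actual contents of en.py.

```python
import re

# Hypothetical, shortened rule list in the proposed order:
# number words first, then the patterns that need digits.
FIXED_RULES = [
    (re.compile(r'two'), '2'),
    (re.compile(r'from\s+now'), 'in'),  # consumes "from" as well as "now"
    (re.compile(r'(\d+[.,]?\d*) (day|hour)s? later'), r'in \1 \2'),
]


def simplify(text):
    """Apply every rule once, in list order."""
    for pattern, replacement in FIXED_RULES:
        text = pattern.sub(replacement, text)
    return text


from_now = simplify('two days from now')
later = simplify('two days later')

print(from_now)  # 2 days in
print(later)     # in 2 day
```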
Note that the suggested fix produces an intermediate string like 'N days in', but I verified that the full pipeline handles this correctly.
If you would like, I can make the changes and open a PR.