
Word-number phrases like "two days later" and "two days from now" return None due to simplification ordering #1314

@OscarRunsCode

Description


I'm raising this as an issue rather than a PR in case you have a preference between fixing it in the data ordering or in the application logic. I am using version 1.30.0.

Phrases like "two days from now" and "two days later" return None, but "2 days later" works fine. The problem is that the word-to-number simplifications ("two" -> "2", etc.) are applied after the patterns that depend on numeric digits.

Reproduction

import dateparser
from datetime import datetime

base = datetime(2025, 6, 15, 12, 0, 0)
settings = {'RELATIVE_BASE': base}

dateparser.parse('2 days later', settings=settings)       # 2025-06-17 (works)
dateparser.parse('two days later', settings=settings)      # None
dateparser.parse('two days from now', settings=settings)   # None

Cause

In dateparser/data/date_translation_data/en.py, the simplifications list is ordered like this:

index 5:  '(?<=from\\s+)now' -> 'in'
index 7:  '(\\d+[.,]?\\d*) (decade|year|month|...|second)s? later' -> 'in \\1 \\2'
...
index 9:  'two' -> '2'
index 10: 'three' -> '3'
...

The "later" pattern at index 7 requires \d+, but the word-to-number conversions don't run until index 8 and beyond. By the time "two" becomes "2", the "later" regex has already run and failed to match.
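To illustrate the ordering effect, here is a minimal sketch using Python's built-in re module. The two rules are simplified stand-ins for the entries above (the unit alternation is shortened and the number word uses \b boundaries), not the exact patterns from en.py:

```python
import re

# Simplified stand-ins for two entries of the English simplifications list.
# The unit alternation is shortened; the real pattern lists more units.
LATER = (re.compile(r"(\d+[.,]?\d*) (year|month|week|day|hour|minute|second)s? later"), r"in \1 \2")
TWO = (re.compile(r"\btwo\b"), "2")


def apply_in_order(text, rules):
    """Apply (pattern, replacement) pairs one after another in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


# Current order: the digit-based "later" rule runs before "two" -> "2", so it never matches.
current = apply_in_order("two days later", [LATER, TWO])
# Proposed order: the number word is converted first, so the "later" rule sees a digit.
proposed = apply_in_order("two days later", [TWO, LATER])

print(current)   # 2 days later
print(proposed)  # in 2 day
```

In the current order the string does end up as "2 days later", but only after the one rule that could have rewritten it has already been skipped.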

The "from now" simplification at index 5 has a separate problem on top of this: the lookbehind (?<=from\s+)now replaces only "now" with "in", leaving "from" in place. So "two days from now" becomes "two days from in", which cannot be parsed regardless of ordering.
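The lookbehind behaviour can be reproduced in isolation. This sketch uses Python's built-in re module, which only accepts fixed-width lookbehind, so a single literal space stands in for the \s+ of the original pattern:

```python
import re

text = "two days from now"

# Stand-in for the current rule: the lookbehind only checks for "from",
# so "from" is not part of the match and survives the substitution.
lookbehind_rule = re.compile(r"(?<=from )now")
# Stand-in for the suggested rule: consume both words so "from" is removed too.
consuming_rule = re.compile(r"from\s+now")

print(lookbehind_rule.sub("in", text))  # two days from in
print(consuming_rule.sub("in", text))   # two days in
```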

The application logic is in dateparser/languages/locale.py lines 438-442, which iterates through simplifications in list order:

def _apply_simplifications(self, date_string, simplifications):
    for simplification in simplifications:
        pattern, replacement = list(simplification.items())[0]
        date_string = pattern.sub(replacement, date_string).lower()
    return date_string
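For comparison, an application-logic fix could leave en.py untouched and instead run the number-word rules first. The sketch below is hypothetical: apply_simplifications_numbers_first and is_number_word_rule are illustrative names, and it assumes number-word rules can be identified by a replacement that is a plain digit string, which may not hold for every entry in the real list:

```python
import re


def is_number_word_rule(simplification):
    """Assumption: number-word rules are the ones whose replacement is a bare digit string."""
    _pattern, replacement = next(iter(simplification.items()))
    return replacement.isdigit()


def apply_simplifications_numbers_first(date_string, simplifications):
    """Hypothetical variant of _apply_simplifications: number-word rules first, the rest after.

    Both groups keep their original relative order.
    """
    number_rules = [s for s in simplifications if is_number_word_rule(s)]
    other_rules = [s for s in simplifications if not is_number_word_rule(s)]
    for simplification in number_rules + other_rules:
        pattern, replacement = next(iter(simplification.items()))
        date_string = pattern.sub(replacement, date_string).lower()
    return date_string


# Stand-in rules in the current (problematic) order: "later" before "two".
rules = [
    {re.compile(r"(\d+[.,]?\d*) (day|week)s? later"): r"in \1 \2"},
    {re.compile(r"\btwo\b"): "2"},
]

print(apply_simplifications_numbers_first("two days later", rules))  # in 2 day
```

The trade-off is that the ordering constraint would live in code rather than in the data file, so it would presumably affect every locale's simplification list, not just English.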

Suggested fix

  1. Move the word-to-number simplifications ("one" through "twelve") above the "later" and "from now" patterns so that digits are available when those regexes run.

  2. Change the "from now" simplification from (?<=from\s+)now -> in to something like from\s+now -> in so that both words are replaced.
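A trimmed-down sketch of the list after both changes, again with Python's built-in re and stand-in patterns rather than the real en.py entries, shows the intermediate string each phrase produces:

```python
import re

# Trimmed stand-in for the reordered list: number words first, then the
# suggested "from now" rule, then the digit-based "later" rule.
REORDERED = [
    (re.compile(r"\btwo\b"), "2"),
    (re.compile(r"\bthree\b"), "3"),
    (re.compile(r"from\s+now"), "in"),
    (re.compile(r"(\d+[.,]?\d*) (year|month|week|day|hour|minute|second)s? later"), r"in \1 \2"),
]


def simplify(text):
    """Apply the stand-in rules in order, lowercasing after each step."""
    for pattern, replacement in REORDERED:
        text = pattern.sub(replacement, text).lower()
    return text


for phrase in ("two days later", "two days from now", "three weeks later"):
    print(f"{phrase!r} -> {simplify(phrase)!r}")
# 'two days later' -> 'in 2 day'
# 'two days from now' -> '2 days in'
# 'three weeks later' -> 'in 3 week'
```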

Note that my suggested fix turns "N days from now" into an intermediate string like 'N days in', but I verified that the full pipeline handles this correctly.
If you would like, I can make the changes and open the PR.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests