Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1280 +/- ##
==========================================
+ Coverage 97.47% 97.49% +0.01%
==========================================
Files 234 234
Lines 2776 2793 +17
==========================================
+ Hits 2706 2723 +17
Misses 70 70 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull Request Overview
This PR fixes incorrect date parsing for multi-word number representations in Russian by enhancing the date parsing logic to properly handle compound ordinal numbers (e.g., "двадцать первое" = "twenty first" = 21st).
- Expands Russian language simplification patterns to include ordinal forms and additional grammatical cases for numbers 1-50
- Implements specialized processing logic for Russian compound ordinals that mathematically combines separate number components
- Adds comprehensive test coverage for multi-word date expressions in Russian (20th-31st of various months)
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tests/test_search.py |
Adds parameterized test cases for Russian multi-word date expressions |
dateparser_data/supplementary_language_data/date_translation_data/ru.yaml |
Expands Russian number simplification patterns with ordinal forms and grammatical cases |
dateparser/languages/locale.py |
Implements Russian-specific compound ordinal processing logic |
dateparser/data/date_translation_data/ru.py |
Updates compiled Russian translation data with expanded number patterns |
Comments suppressed due to low confidence (1)
tests/test_search.py:1110
- The test cases only cover February and March dates. Consider adding test cases for other months to ensure the parsing logic works consistently across all months.
text="Ужасное событие произошло в тот день. Двадцатое февраля. Вспоминаю тот день с ужасом.",
| number_pair_pattern = r"\b(\d+)\s+(\d+)\b" | ||
| date_string = re.sub(number_pair_pattern, replace_number_pairs, date_string) |
There was a problem hiding this comment.
The regex pattern should be compiled as a module-level constant since it's used repeatedly, which would improve performance and maintainability.
| number_pair_pattern = r"\b(\d+)\s+(\d+)\b" | |
| date_string = re.sub(number_pair_pattern, replace_number_pairs, date_string) | |
| date_string = re.sub(NUMBER_PAIR_PATTERN, replace_number_pairs, date_string) |
| ( | ||
| expected_text, | ||
| datetime.datetime( | ||
| datetime.datetime.now().year, expected_month, expected_day, 0, 0 |
There was a problem hiding this comment.
Using datetime.datetime.now().year in tests can make them non-deterministic and fail when run across year boundaries. Consider using a fixed year value instead.
| datetime.datetime.now().year, expected_month, expected_day, 0, 0 | |
| 2025, expected_month, expected_day, 0, 0 |
Close #1260