DateTimeParser Validation and Performance Improvements#4593
DateTimeParser Validation and Performance Improvements#4593matejk merged 5 commits intopocoproject:mainfrom
Conversation
Improve the validation of DateTimeParser by validating the numbers parsed for day, month, year, hour, minute, second, millis and micros. Improve performance by caching the RegularExpressions created in DateTimeFormat::isValid.
Additional change to validate milliseconds field if the . or , exists.
Foundation/src/DateTimeFormat.cpp
Outdated
| RegularExpression(DateTimeFormat::SORTABLE_REGEX) | ||
| }; | ||
| // make sure the regex list and the array of regexes are in sync | ||
| poco_assert((sizeof(regs) / sizeof(regs[0])) == REGEX_LIST.size()); |
There was a problem hiding this comment.
REGEX_LIST is used only in this place. I propose to remove it as a class member and have it as a static member in this function.
There was a problem hiding this comment.
I left it where it is because it's part of the public API of this class, unfortunately.
Wasn't sure if this would be 1.13.x or 1.14 and didn't want to make a breaking change just in case.
Foundation/src/DateTimeParser.cpp
Outdated
| using parse_iter = std::string::const_iterator; | ||
|
|
||
| namespace Poco { | ||
| [[nodiscard]] parse_iter skip_non_digits(parse_iter it, parse_iter end) |
There was a problem hiding this comment.
Shall these helper functions be inlined?
There was a problem hiding this comment.
No. inline is used in headers to avoid ODR issues. Free functions in cpp files should be either static or in an anonymous namespace to avoid IFNDR issues.
There was a problem hiding this comment.
@andrewauclair please don't introduce new naming conventions, we have our own
Foundation/src/DateTimeParser.cpp
Outdated
| using parse_iter = std::string::const_iterator; | ||
|
|
||
| namespace Poco { | ||
| [[nodiscard]] parse_iter skip_non_digits(parse_iter it, parse_iter end) |
There was a problem hiding this comment.
@andrewauclair please don't introduce new naming conventions, we have our own
REGEX_LIST was only used in isValid, which now uses a cached list of RegularExpressions instead.
|
@andrewauclair Do you have any performance measurements with these changes compared to 1.13? |
Came up with the following when running in WSL with the code from issue #4592: 1.12.5: 0m0.063s Switching to a format string that doesn't match one of the expected ones, to avoid regex matches ( 1.12.5: 0m0.063s |
Improvements for #4592.
Improve the validation of
DateTimeParserby validating the numbers parsed forday,month,year,hour,minute,second,millis, andmicros. The value 0 will be used if no digits were found formicros.These changes create new instances of
SyntaxExceptionthrown byDateTimeParser::parseandDateTimeParser::parseTZDwhen the previously listed fields do not contain the required number of digits or the string of digits is not a valid number.Improve performance by caching the
RegularExpressions created inDateTimeFormat::isValid.