Description
The regular expression that detects sequences of digits is parameterized by the thousands separator. In the Python library it is generated as:
IntegerRegexDefinition = lambda placeholder, thousandsmark: f'(((?<!\\d+\\s*)-\\s*)|((?<=\\b)(?<!(\\d+\\.|\\d+,))))\\d{{1,3}}({thousandsmark}\\d{{3}})+(?={placeholder})'

Unfortunately, in Spanish the thousands separator is a dot (`.`). When the template is instantiated, the dot is inserted verbatim, and because `.` is a regex metacharacter, it matches any character.
As a result, the regex matches unintended sequences: anything of the form `\d{1,3}.\d{3}` is treated as a plain number, so `2r345`, `23#310` and `329@120` are all numbers according to this regex. The problem also extends to multiple separators: `2Q423W532` is matched as a number too.
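A minimal sketch reproducing the bug with Python's standard `re` module. It assumes that only the separator part of the pattern matters here: the leading sign/boundary lookarounds and the `(?={placeholder})` lookahead are dropped, and `integer_core` is a hypothetical helper, not part of the library.

```python
import re

def integer_core(thousands_mark: str) -> str:
    # Simplified core of IntegerRegexDefinition (assumption: lookarounds and
    # the placeholder lookahead are omitted). The mark is inserted unescaped,
    # exactly as the generated code does.
    return r'\d{1,3}(' + thousands_mark + r'\d{3})+'

pattern = re.compile(integer_core('.'))  # Spanish thousands separator

samples = ['2.345', '2r345', '23#310', '329@120', '2Q423W532']
matched = [s for s in samples if pattern.fullmatch(s)]
print(matched)  # every sample matches, not just '2.345'
```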
In Python this is easy to fix by escaping the parameter:

IntegerRegexDefinition = lambda placeholder, thousandsmark: f'(((?<!\\d+\\s*)-\\s*)|((?<=\\b)(?<!(\\d+\\.|\\d+,))))\\d{{1,3}}(' + re.escape(thousandsmark) + f'\\d{{3}})+(?={placeholder})'

However, since that file is automatically generated, this should be patched upstream. The problem likely affects the other language implementations as well.
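A quick check of the escaped definition, sketched under one assumption: the `placeholder` value `\b` is hypothetical and used only to instantiate the template. The sketch inspects the generated pattern string rather than compiling it, because the variable-width lookbehind `(?<!\d+\s*)` is rejected by the standard `re` module.

```python
import re

# Escaped definition proposed above: the separator goes through re.escape.
IntegerRegexDefinition = lambda placeholder, thousandsmark: (
    f'(((?<!\\d+\\s*)-\\s*)|((?<=\\b)(?<!(\\d+\\.|\\d+,))))\\d{{1,3}}('
    + re.escape(thousandsmark)
    + f'\\d{{3}})+(?={placeholder})'
)

# Hypothetical placeholder, only used to fill in the template.
pattern = IntegerRegexDefinition(r'\b', '.')

# The separator group now holds a literal dot (\.) instead of the
# "any character" metacharacter.
print(pattern)
```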