Conversation
> add comment for `fill_range` > rename some variables > type: bool~>int > refine names for the two function; simplify comments > better message
| UNICODE_TABLE_SIZE: int = MAX_UNICODE_POINT + 1 | ||
| class UnicodeWidthTable: | ||
| # A valid Unicode code point won't exceed MAX_CODE_POINT. | ||
| MAX_CODE_POINT: int = 0x10FFFF |
| Whenever a code point's width differs from the previous one, | ||
| the function print the code point to indicate the start of a new range. | ||
| """ | ||
| def print_width_estimate_intervals(self): |
There was a problem hiding this comment.
As the [I]!=[I-1]->print I logic and the assertion already made things much clearer, with renaming I think it's ok not to add extra comments.
| if line and not line.startswith("#"): | ||
| match = LINE_REGEX.fullmatch(line) | ||
| assert match, "invalid line" | ||
| assert match, line # invalid line |
There was a problem hiding this comment.
This will print line details when things goes wrong.
(Actually, I've found that the regex has been slightly outdated due to some added whitespaces in EastAsianWidth-15.1.0.txt (in https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt; which is recently updated))
(I think we don't have to update the regex eagerly in this pr as it still works for 15.0.0)

(Was 0000..001F;N # Cc [32] <control-0000>..<control-001F> ,
|now 0000..001F ; N # Cc [32] <control-0000>..<control-001F>)
|
|
||
| # Read explicitly assigned ranges. | ||
| # The lines that are not empty or pure comment are uniformly of the format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment". | ||
| LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*") |
There was a problem hiding this comment.
This should be moved back to read_from as it is used and documented here.
| from pathlib import Path | ||
|
|
||
|
|
||
| LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*") |
(will add comment for each change below)