Skip to content

pr3994 review suggestions#1

Merged
fsb4000 merged 1 commit intofsb4000:fix3908from
achabense:review3994
Sep 21, 2023
Merged

pr3994 review suggestions#1
fsb4000 merged 1 commit intofsb4000:fix3908from
achabense:review3994

Conversation

@achabense
Copy link

(will add comment for each change below)

> add comment for `fill_range`
> rename some variables
> type: bool~>int
> refine names for the two function; simplify comments
> better message
UNICODE_TABLE_SIZE: int = MAX_UNICODE_POINT + 1
class UnicodeWidthTable:
# A valid Unicode code point won't exceed MAX_CODE_POINT.
MAX_CODE_POINT: int = 0x10FFFF
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(code point is a thing)

Whenever a code point's width differs from the previous one,
the function print the code point to indicate the start of a new range.
"""
def print_width_estimate_intervals(self):
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the [I]!=[I-1]->print I logic and the assertion already made things much clearer, with renaming I think it's ok not to add extra comments.

if line and not line.startswith("#"):
match = LINE_REGEX.fullmatch(line)
assert match, "invalid line"
assert match, line # invalid line
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will print line details when things goes wrong.
(Actually, I've found that the regex has been slightly outdated due to some added whitespaces in EastAsianWidth-15.1.0.txt (in https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt; which is recently updated))
(I think we don't have to update the regex eagerly in this pr as it still works for 15.0.0)
image
(Was 0000..001F;N # Cc [32] <control-0000>..<control-001F> ,
|now 0000..001F ; N # Cc [32] <control-0000>..<control-001F>)


# Read explicitly assigned ranges.
# The lines that are not empty or pure comment are uniformly of the format "HEX(..HEX)?;(A|F|H|N|Na|W) #comment".
LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved back to read_from as it is used and documented here.

from pathlib import Path


LINE_REGEX = re.compile(r"([0-9A-Z]+)(\.\.[0-9A-Z]+)?;(A|F|H|N|Na|W) *#.*")
Copy link
Author

@achabense achabense Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved here

@fsb4000 fsb4000 merged commit c1f1aa6 into fsb4000:fix3908 Sep 21, 2023
@achabense achabense deleted the review3994 branch September 22, 2023 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants