add (content-) line wrapping to stay within the 75 char limit #295

Merged
N-Coder merged 1 commit into main from linewrap
Sep 21, 2021

@N-Coder
Member

@N-Coder N-Coder commented Sep 12, 2021

fixes #215

@codecov-commenter

codecov-commenter commented Sep 12, 2021

Codecov Report

Merging #295 (d16593e) into main (2727988) will decrease coverage by 0.18%.
The diff coverage is 69.44%.

@@            Coverage Diff             @@
##             main     #295      +/-   ##
==========================================
- Coverage   80.69%   80.51%   -0.19%     
==========================================
  Files          30       30              
  Lines        2829     2858      +29     
==========================================
+ Hits         2283     2301      +18     
- Misses        546      557      +11     
Impacted Files Coverage Δ
src/ics/contentline/container.py 84.57% <69.44%> (-3.80%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2727988...d16593e.

@make-github-pseudonymous-again
Contributor

👍

Question: how are compound emoji and utf8 sequences handled? I can see the tests and it seems to be magically handled by TextWrapper but I do not understand how it works.

@N-Coder
Member Author

N-Coder commented Sep 13, 2021

TextWrapper does wrap inside compound characters, which will break emoji and possibly umlauts (when they are written with combining characters), but it never splits a multi-byte code point, because it operates on Python str objects (sequences of code points) rather than on encoded bytes. So the wrapped document will be valid UTF-8 instead of byte garbage. At worst it contains symbols that wouldn't be rendered correctly while folded, which is fixed by unwrapping the lines during parsing.

from textwrap import TextWrapper
WRAP = TextWrapper(
    width=4, initial_indent="", subsequent_indent=" ", break_long_words=True, break_on_hyphens=True,
    expand_tabs=False, replace_whitespace=False, fix_sentence_endings=False, drop_whitespace=False
)
# https://emojipedia.org/couple-with-heart-woman-man-light-skin-tone-dark-skin-tone/
EMOJI = '\U0001f469\U0001f3fb\u200d\u2764\ufe0f\u200d\U0001f468\U0001f3ff'
print(EMOJI, len(EMOJI))
print(WRAP.fill(EMOJI * 10))
# https://en.wikipedia.org/wiki/Zalgo_text
ZALGO = '\u0074\u0334\u0313\u031b\u0307\u0351\u030d\u0309\u031a\u0309\u0300\u0308\u0307\u030c\u033f\u030c\u0355\u034d\u0316\u032e\u031f\u033a\u035a\u0326\u0322\u0326\u0320\u0316\u0317\u031c\u0318\u0348\u0355\u0317\u035c\u033c\u034d\u032b\u0325\u0354\u033b\u032f\u0331\u031e\u0333\u031c\u0332\u032b\u0356\u0359\u0333\u0348\u031f\u0354\u0326\u0353\u0359\u0329\u0328\u0326\u0065\u0335\u0358\u0308\u034b\u0311\u0308\u0352\u0306\u0303\u0303\u030f\u030b\u0351\u0343\u0300\u0304\u032f\u033a\u0322\u0349\u032e\u031c\u031f\u0333\u0317\u0353\u033c\u0353\u0317\u032d\u0317\u033a\u0353\u035c\u032f\u0326\u032f\u0353\u034d\u0073\u0337\u030e\u0303\u0346\u035d\u0307\u0307\u0312\u0360\u0301\u033f\u0304\u0307\u030f\u0313\u030f\u0308\u0302\u0341\u0314\u0315\u031b\u0308\u0303\u0305\u031a\u0341\u035d\u0310\u0313\u0358\u0344\u0351\u034c\u030b\u030d\u034b\u0352\u0340\u0333\u034e\u0356\u032e\u0316\u031f\u0347\u0355\u0353\u032d\u033c\u0320\u0331\u0345\u032d\u0353\u034d\u0316\u0359\u0326\u031e\u0324\u0323\u0320\u0319\u0332\u0329\u0345\u032e\u0330\u0328\u033b\u0325\u0355\u031e\u033b\u0356\u031d\u0354\u032e\u031f\u0322\u0349\u0074\u0334\u0307\u0343\u034e\u032f\u032b\u0333\u0323\u035a\u034e\u0321\u031c\u0339\u0318\u032d\u0316\u0327\u031f\u0354\u035a\u0317\u032c\u032a\u035c\u034d\u031f\u0331\u034d\u0323\u0321\u0349\u032e\u0329\u0319'
print(ZALGO, len(ZALGO))
print(WRAP.fill(ZALGO))
Output:
👩🏻‍❤️‍👨🏿 8
👩🏻‍❤
️‍👨
🏿👩🏻
‍❤️
‍👨🏿
👩🏻‍
❤️‍
👨🏿👩
🏻‍❤
️‍👨
🏿👩🏻
‍❤️
‍👨🏿
👩🏻‍
❤️‍
👨🏿👩
🏻‍❤
️‍👨
🏿👩🏻
‍❤️
‍👨🏿
👩🏻‍
❤️‍
👨🏿👩
🏻‍❤
️‍👨
🏿
t̴̢̨̛͕͍̖̮̟̺͚̦̦̠̖̗̜̘͈͕̗̼͍̫̥͔̻̯̱̞̳̜̲̫͖͙̳͈̟͔̦͓͙̩̦̓̇͑̍̉̉̀̈̇̌̿̌̚͜ë̵̢̯̺͉̮̜̟̳̗͓̼͓̗̭̗̺͓̯̦̯͓͍͋̑̈͒̆̃̃̏̋͑̓̀̄͘͜s̷̨̢̛̳͎͖̮̖̟͇͕͓̭̼̠̱̭͓͍̖͙̦̞̤̣̠̙̲̩̮̰̻̥͕̞̻͖̝͔̮̟͉̎̃͆̇̇̒́̿̄̇̏̓̏̈̂́̔̈̃̅́̐̓̈́͑͌̋̍͋͒̀̕̚͘͝͠͝ͅͅṫ̴̡̧̡͎̯̫̳̣͚͎̜̹̘̭̖̟͔͚̗̬̪͍̟̱͍̣͉̮̩̙̓͜ 218
t̴̛̓
̇͑̍
̉̉̚
̀̈̇
̌̿̌
͕͍̖
̮̟̺
̢͚̦
̦̠̖
̗̜̘
͈͕̗
̼͍͜
̫̥͔
̻̯̱
̞̳̜
̲̫͖
͙̳͈
̟͔̦
͓͙̩
̨̦e
̵̈͘
͋̑̈
͒̆̃
̃̏̋
͑̓̀
̯̺̄
̢͉̮
̜̟̳
̗͓̼
͓̗̭
̗̺͓
̯̦͜
̯͓͍
s̷̎
̃͆͝
̇̇̒
́̿͠
̄̇̏
̓̏̈
̂́̔
̛̈̕
̃̅̚
́̐͝
̓̈́͘
͑͌̋
̍͋͒
̳͎̀
͖̮̖
̟͇͕
͓̭̼
̠̱ͅ
̭͓͍
̖͙̦
̞̤̣
̠̙̲
̩̮ͅ
̨̰̻
̥͕̞
̻͖̝
͔̮̟
̢͉t
̴̇̓
͎̯̫
̳̣͚
̡͎̜
̹̘̭
̧̖̟
͔͚̗
̬̪͜
͍̟̱
̡͍̣
͉̮̩
̙

@make-github-pseudonymous-again
Contributor

OK, so TextWrapper correctly handles UTF-8 and that is all we care about? Whatever is embedded in the UTF-8 may not be readable while folded, but we only interpret it once unfolded.
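
To make the distinction concrete, here is a minimal sketch (not code from this PR) that cuts the same emoji in two ways: by slicing the `str`, which is roughly what TextWrapper does, and by slicing the encoded bytes. The helper `is_valid_utf8` and the chunk size of 3 are made up for illustration.

```python
# The compound emoji from the comment above: 8 code points, 1 visible glyph.
EMOJI = "\U0001f469\U0001f3fb\u200d\u2764\ufe0f\u200d\U0001f468\U0001f3ff"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Slicing the str can only cut *between* code points.
str_pieces = [EMOJI[i:i + 3] for i in range(0, len(EMOJI), 3)]

# Slicing the encoded bytes can cut in the *middle* of a code point.
raw = EMOJI.encode("utf-8")
byte_pieces = [raw[i:i + 3] for i in range(0, len(raw), 3)]

print(len(EMOJI), len(raw))                                        # 8 28
print(all(is_valid_utf8(p.encode("utf-8")) for p in str_pieces))   # True
print(all(is_valid_utf8(p) for p in byte_pieces))                  # False
```

Every `str` piece encodes to valid UTF-8 even though the glyph is torn apart, while the very first 3-byte piece already ends inside the 4-byte encoding of the first code point.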

@N-Coder
Member Author

N-Coder commented Sep 14, 2021

Exactly! The intermediate representation still seems to be valid, so no other parser should have problems reading it, and the tests show that we always get the exact same data back after unwrapping.
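
The round trip described here can be sketched roughly as follows. This is an illustration under assumptions, not the code in `src/ics/contentline/container.py`: the names `fold` and `unfold` are hypothetical, `width=75` is taken from the PR title, and `"\n"` stands in for the CRLF that RFC 5545 uses. Note also that TextWrapper counts code points, whereas RFC 5545 states its 75 limit in octets.

```python
from textwrap import TextWrapper

# Same settings as the snippet earlier in this thread, but with width=75
# (an assumption based on the PR title).
WRAP = TextWrapper(
    width=75, initial_indent="", subsequent_indent=" ",
    break_long_words=True, break_on_hyphens=True,
    expand_tabs=False, replace_whitespace=False,
    fix_sentence_endings=False, drop_whitespace=False,
)


def fold(line: str) -> str:
    """Hypothetical helper: break a long line, indenting continuations by one space."""
    return WRAP.fill(line)


def unfold(folded: str) -> str:
    """Hypothetical helper: undo fold() by removing each newline-plus-space pair."""
    return folded.replace("\n ", "")


EMOJI = "\U0001f469\U0001f3fb\u200d\u2764\ufe0f\u200d\U0001f468\U0001f3ff"
text = "DESCRIPTION:Dinner with " + EMOJI * 20  # no newlines in the input

folded = fold(text)
parts = folded.split("\n")

print(len(parts))                          # number of physical lines
print(max(len(part) for part in parts))    # longest line, in code points
print(unfold(folded) == text)              # True: nothing was lost
```

Because `drop_whitespace=False` and `replace_whitespace=False`, the wrapper only inserts characters and never removes or alters any, which is what makes the simple `replace` in `unfold` sufficient for input that contains no newlines of its own.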

@N-Coder N-Coder merged commit 81b77a2 into main Sep 21, 2021
@N-Coder N-Coder deleted the linewrap branch September 21, 2021 09:08
Development

Successfully merging this pull request may close these issues.

ics does not support 5545 3.1 (long line folding)