Skip to content

Adds ReaderTypography (replaces ReaderHyphenation)#6072

Merged
poire-z merged 4 commits intokoreader:masterfrom
poire-z:readertypography
Apr 21, 2020
Merged

Adds ReaderTypography (replaces ReaderHyphenation)#6072
poire-z merged 4 commits intokoreader:masterfrom
poire-z:readertypography

Conversation

@poire-z
Copy link
Copy Markdown
Contributor

@poire-z poire-z commented Apr 19, 2020

Replace Hyphenation menu with Typography menu.
This works mostly like before:

  • typography/hyphenation is chosen according to the book metadata language,
  • one can set a default or a fallback language,
  • hyphenation is now just a subset of typography, and can be disabled while still setting a language and enabling the other features,
  • the typography language enables newly added features to crengine (TextLangMan for text typography by language, use libunibreak crengine#337): per-language line breaking rules and per-language Harfbuzz glyph selection.
  • by default (this can be disabled), we follow the books HTML lang= attributes to adapt typography/hyphenation/line breaking rules to which language the book says this section is in:
    image

People who'd like some behaviour fixes/tweaks for their own usual languages can add or suggest stuff (alas to be hardcoded in the sources) in https://github.com/koreader/crengine/blob/master/crengine/src/textlang.cpp.

This should solve a few issues (even if we got no report for them :):

Screenshots, for the usual rewording suggestions :):

image

image

image
(I added some icons from nerdfont - because I needed a break and that's always some fun chosing - for the typography features per languages, dunno if it's a good idea or if its confusing, or if nobody will care :)

Typography language submenu:

image

image

First item shows the book metadata lang tag, for easy going back to it:
image

Hyphenation submenu:
image
For a typography language, it allows switching between these last 3 hyphenation methods (didn't know if I could get rid of Algorithmic, so, well, it's available as an option with all typography languages).


This change is Reviewable

Replace Hyphenation menu with Typography menu.
This works mostly like before:
- typography/hyphenation is chosen according to the book
  metadata language,
- one can set a default or a fallback language,
- hyphenation is now just a subset of typography, and
  can be disabled while still setting a language and
  enabling the other features,
- the typography language enables newly added features
  to crengine: per-language line breaking rules and
  per-language Harfbuzz glyph selection.
@NiLuJe
Copy link
Copy Markdown
Member

NiLuJe commented Apr 19, 2020

I'm kind of on the fence about "Typography" on its own and/or when paired with "language".

I don't hate it, but something still bothers me a bit with it, not sure what, exactly...

What about "Typography rules"? (in both cases, I think when a language name is shown, it should be pretty obvious that it's a language ;p. The hyphenation dictionary entry follows a similar logic ;).).

-- B = language specific additional line breaking tweaks
-- Update them when language tweaks and features are added to crengine/src/textlang.cpp
local LANGUAGES = {
-- lang-tag aliases features menu title
Copy link
Copy Markdown
Member

@NiLuJe NiLuJe Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random fun fact: 2 letters language codes are legacy ISO 639-1 (-ish, I think the language_DIALECT bastardization is a GNU thing) while the three letters ones are ISO 639-2 (where I learned that fre is technically for Belgian French ;)).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or it's actually https://tools.ietf.org/html/rfc5646, as ReaderTypography:parseLanguageTag() mentions below ;).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I did not dig into that much (or at all). Not much interested in learning about that - so, if anyone wants to specialize and add stuff so we do the right thing with tags we'll meet in the wild, please jump on it :) (here and in textlang.cpp).
One thing I get is that Harfbuzz has lots of code to handle that/translate, and I hope it will do the right thing with what we provide as is. (Except may be with the 3 letters language code like "fre" that we translate here...)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment: it should be “Ukrainian” on line 49

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have at least one Belgian French book that uses “ ”. I don't know if that's specifically Belgian and if it actually makes any difference to the algorithm.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment: it should be “Ukrainian” on line 49

That's just a filename (well, that's how that file is named for ages), that might be saved in current users settings.
I was going to say it won't be shown, but it will, as the selected hyph dict.
So, dunno if I should rename it (may be later, when things are migrated), or if we should have another table for translating typos in filenames...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z
Sorry I didn’t know that :) Looks like it comes all the way from the first ones who PR’ed Ukrainian language files

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mergen3107 On the contrary, thanks for pointing it out!

@poire-z If I go to hyphenation it already says Ukrainian, which probably implies we already have that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
We have the other mapping lang_tag => language shows alongside typography, but we can't really use it, as zh-CN would show "Chinese (Simplified)" while it uses English_US.pattern.

I don't mind a small typo table or better: a function to remove the .pattern and fix some known typos, to be used instead of:
hyph_dict_name = hyph_dict_name:gsub(".pattern$", "")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess I'll update the HYPH_DICT_NAME_TO_LANG_TAG table to include a translatable string of the language - instead of using the filename stripped from .pattern, which won't be translatable.
If we remove the .pattern for pretty menus, I guess there's no more reason to keep the relation to the hyph dict filename.

-- one is set, no fallback will ever be used - if a fallback one
-- is set, no default is wanted; so when we set one below, we
-- remove the other).
text = T( _("Would you like %1 to be used as the default (★) or fallback (�) typography language?\n\nDefault will always take precedence while fallback will only be used if the language of the book can't be automatically determined."), BD.wrap(lang_name)),
Copy link
Copy Markdown
Member

@NiLuJe NiLuJe Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that one's mildly trickier with my initial comment about typography vs. typography rules...

...) language for typography rules?

Maybe?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think language for typography rules is clearer than typography language.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the previous feedback.
One thing just to be sure: it's really typography rules/languages, and not typographic rules like I'd be inclined to write in french?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say they're both correct. Google Books seems to agree.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much like règles de typographie and règles typographiques in French :).

})

table.insert(self.menu_table, {
text = _("Follow document embedded lang tags"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document's

Or simply Honor embedded lang tags?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like use instead of honor is probably a bit easier to understand.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough ;).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use embedded language tags ?
Use document's embedded language tags ?
language tags (for gramatically english) or lang tags (for HTMLically english :) ?

I'm not fond of the word "honor", but it's clearer that just "use". "Use" is a bit vague.
It's not like we "use" them, we "use" something else that will do according to them.
We don't use road signs :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just leave out the verb; the checkmark already says use/honor/etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ ] Document's embedded lang tags looks very very strange and incomplete to me :)

hold_callback = function()
local text_lang_embedded_langs = G_reader_settings:nilOrTrue("text_lang_embedded_langs")
UIManager:show(MultiConfirmBox:new{
text = text_lang_embedded_langs and _("Would you like to enable or disable support for document embedded lang tags by default?\n\nThe current default (★) is enabled.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_("Honor embedded lang tags")
Would you like to honor or ignore embedded lang tags by default? is fine by me, because Ignore | Honor buttons rhymes likes Disable | Enable and I like when things sing :)
Fine with you ?

Copy link
Copy Markdown
Member

@Frenzie Frenzie Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this stuff about honoring somewhat Frenglish? Or are we just very corteous? :-P

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno, It's understandable in that sense in french, and @NiLuJe uses it quite often in his comments. What would be another word for that, less generic than "use" ?
(But in french, it's also used in a meaning a bit more further than corteous, rather meaning intercour-teou-se :) which is the only reason I'm a bit not fond of using it for our text :)

Copy link
Copy Markdown
Contributor Author

@poire-z poire-z Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, what we want is a positive antonym to "ignore".
And there's not many of them in the various thesaurus and dict on the web.
"Honorer" is the right one in french - didn't see "Honor" among the english antonyms to "ignore"...
The only one I found that could work in english in our context is: Obey (or Follow, that I had initially).

Copy link
Copy Markdown
Member

@NiLuJe NiLuJe Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poire-z: I think we can get away without rules since it's in a two-option section between separators with the one right above explicitly saying Typography rules ;).

I find the other suggestions way too wordy, and possibly not quite as clear (But FWIW, of those, my vote would be for @Frenzie's ;)).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my vote would be for @Frenzie

Which one ?

Copy link
Copy Markdown
Member

@NiLuJe NiLuJe Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one ?

I meant as far as the wordy ones are concerned: Typography rules as per embedded lang tags.

But, to be clear, I much prefer Respect embedded lang tags ;).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with all of these:
Typography rules as per embedded lang tags
Rules as per embedded lang tags
As per embedded lang tags
Respect embedded lang tags

So, please you both agree on one :)

(Really nothing could work with an initial verb like Adapt/adjust - to not have the feeling that if you check that box, and the document has no embedded lang tags, you will get no rule applied?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The respect one is fine by me. (I thought I already posted this. :-) )

@poire-z
Copy link
Copy Markdown
Contributor Author

poire-z commented Apr 19, 2020

Update screenshots:

image
No problem with it being longer than all others?
(Would [ ] Hanging punctuation be better placed inside Typography rules > ? May be not now as it's in ReaderTypeset and may need more work - but in principle?)

image

image

image

I updated this for consistency, but tell me if I should let the first one to keep already made translations:

-                    choice2_text = C_("Hyphenation", "Fallback"),
+                    choice2_text = C_("Typography", "Fallback"),

(Note: not to be merged before tomorrow, I want a nightly with just the crengine bump.)

@NiLuJe
Copy link
Copy Markdown
Member

NiLuJe commented Apr 19, 2020

(Would [ ] Hanging punctuation be better placed inside Typography rules > ? May be not now as it's in ReaderTypeset and may need more work - but in principle?)

Agreed (on both counts ;)).

@poire-z
Copy link
Copy Markdown
Contributor Author

poire-z commented Apr 20, 2020

image

Alternate wording for their language..their content?

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Apr 20, 2020

use relevant typographic rules while rendering their content

use language-specific/tailored typography rules while rendering their content

@poire-z
Copy link
Copy Markdown
Contributor Author

poire-z commented Apr 20, 2020

image

No problem with the double "while" here ? :)
And mix of typography/typographic rules here - should we stick on one only? Typography rules like in the menu, or this little variation is ok?

@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Apr 20, 2020

to render their content
for rendering their content

Not sure about the typography vs typographic but it doesn't seem like it'd hurt while adding some variation?

@NiLuJe
Copy link
Copy Markdown
Member

NiLuJe commented Apr 20, 2020

You can avoid the double "while" by keeping the first "while", and switching the second to "whereas" (which is used to emphasize contrast in these kind of constructs).

EDIT: Actually, no, that doesn't quite work here, it's not a "while blah blah" vs. "whereas bleh bleh" contrasting construct.

TL;DR: +1 for to render ;).

@poire-z poire-z merged commit 6dc8bbc into koreader:master Apr 21, 2020
@poire-z poire-z deleted the readertypography branch April 21, 2020 19:30
@Frenzie Frenzie added this to the 2020.05 milestone Apr 21, 2020
@poire-z
Copy link
Copy Markdown
Contributor Author

poire-z commented Aug 22, 2020

@roshavagarga , just answering your question on Gitter (even if I'm not our font guy/girl/entity :)
https://gitter.im/koreader/koreader?at=5f3e1df4855be416a23eec0e

I want to figure out whether Koreader supports or can enforce through settings specific styles of Cyrillic. Example: Go to the glyphs section at https://manropefont.com/ and check out Local Bulgarian Cyrillic, which uses a unique style compared to the Soviet-styled Cyrillic that's predominant. As far as I'm aware, a subpart of the font files would need to be triggered (locl.BGR).

See top post here and koreader/crengine#337 .

You can get Bulgarian glyphs provided:

  • you're reading EPUBs or FB2 (needs crengine, won't work with PDF)
  • you have kerning mode (bottom menu) set to "best" (so we use HarfBuzz, that does that magic for us)
  • your font (or the publisher font used) has the locl.BGR feature (= provides specific glyphs) (not much have them, the only one I found that have them is Vollkorn)
  • either:
    -- your book has "bg" or "bul" set at its language
    -- you have set Typography language> bulgarian
    -- some individual HTML tags in the book have <div lang=bg>

image
The last line here has the same content has the one before, but has lang=bg

@cajeiri
Copy link
Copy Markdown
Contributor

cajeiri commented Aug 24, 2020

@poire-z Thanks for the information. Would it be able to add a check somewhere? If there's no font that support Bulgarian typography, but it's been enabled manually or automatically, then Koreader should send out a message noting the lack of a compatible font? Option B is to just include 1 font that supports Bulgarian typography by default.

@poire-z
Copy link
Copy Markdown
Contributor Author

poire-z commented Aug 24, 2020

Would it be able to add a check somewhere? If there's no font that support Bulgarian typography, but it's been enabled manually or automatically, then Koreader should send out a message noting the lack of a compatible font?

No, we can't know that. That's handled by low level HarfBuzz, that will give us the glyphs it decides without telling us how/why it did.
But a bulgarian reader will probably notice it visually and know something is wrong and act :)

Option B is to just include 1 font that supports Bulgarian typography by default.

That's for our our font guy/girl/entity :) (@NiLuJe) although if the Noto fonts do not, it would be sad to have to put another one before these that would be used instead for anything latin...
I guess bulgarian readers, given the rarity of support, must have a prefered font that does provide bulgarian, and may as well add it as a user font and use it as their default font.
(And publishers of bulgarian EPUBs should add such font as embedded fonts, otherwise their books might render bad on most devices.)

@cajeiri
Copy link
Copy Markdown
Contributor

cajeiri commented Aug 24, 2020

Ah, I see, thanks for the explanation once again!

Edit:
Free Serif should have these, while Noto never added them, but I don't see it working in Free Serif, not sure which version we have. More information is available here - https://github.com/googlefonts/noto-fonts/issues/89
Libertinus should have built-in support for Bulgarian, Serbian and Macedonian Cyrillic forms.

@NiLuJe
Copy link
Copy Markdown
Member

NiLuJe commented Aug 24, 2020

Might be a job for #6272 then ;).

@strn
Copy link
Copy Markdown
Contributor

strn commented Aug 30, 2020

@roshavagarga: I can confirm that Libertinus font correctly supports Serbian Cyrillic specific glyphs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants