bump crengine: multiple fallback fonts #6090
Includes:
- Simplify libunibreak includes
- Text: fix read/write outside array bounds
- lvtextfm: don't adjust space after initial quotation mark/dash (rework)
- Fonts: allow providing and using multiple fallback fonts

Users can set their preferred fallback font, which will be completed with a few of our shipped fonts for maximum coverage.
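To illustrate the idea of a user-chosen fallback font being completed by shipped fonts, here is a minimal sketch. It is not crengine's implementation: the function names `build_fallback_chain` and `pick_font`, the coverage table, and the choice of shipped fonts are all assumptions made up for the example.

```python
from typing import Dict, List, Optional, Set


def build_fallback_chain(user_fonts: List[str], shipped_fonts: List[str]) -> List[str]:
    """Put the user's fonts first, then append shipped fonts not already listed."""
    chain: List[str] = []
    for name in user_fonts + shipped_fonts:
        if name not in chain:  # avoid duplicates, keep first position
            chain.append(name)
    return chain


def pick_font(chain: List[str], coverage: Dict[str, Set[int]], char: str) -> Optional[str]:
    """Return the first font in the chain that covers the character, if any."""
    codepoint = ord(char)
    for name in chain:
        if codepoint in coverage.get(name, set()):
            return name
    return None  # no font in the chain has a glyph for this character


# Toy coverage data, invented for illustration only (not real font coverage).
coverage = {
    "Noto Naskh Arabic": set(range(0x0600, 0x0700)),
    "FreeSerif": set(range(0x0020, 0x0250)) | set(range(0x0600, 0x0700)),
}

# Hypothetical setup: the user prefers Noto Naskh Arabic, FreeSerif is shipped.
chain = build_fallback_chain(["Noto Naskh Arabic"], ["FreeSerif", "Noto Naskh Arabic"])
```

With this chain, an Arabic letter resolves to the user's font, while a Latin letter falls through to the shipped one. The ordering is the whole point: the user's preference wins wherever it has a glyph.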
|
@WaseemAlkurdi : for the text of EPUB books, what would be the preferred font for Arabic? Noto Sans Arabic UI (sans, a bit rounded & bold) or FreeSerif (thinner, straight, looks more serious to me :)? Dunno if we should really include Noto Sans Arabic and Devanagari. |
|
@poire-z Long time no see!
FreeSerif is quite the jack of all trades, master of none ... it sure is universal, but its individual characters don't look that nice (letters especially thin on E Ink)
To be honest, serif fonts are more book-like than sans :-) |
|
Thanks for the feedback. |
|
Just copy-pasting crengine's HarfBuzz drawing debug output, in case some nastaliq expert comes around some day (or someone who at least knows the letter names :) and can find out where/why/how offsets might be wrong). For these first 2 sentences (from the Urdu section in https://r12a.github.io/scripts/phrases) With crengine, with split on space disabled, so HB gets each full line and context as a single run: Some possibly related issues elsewhere: |
You're welcome!
Cool, I'll be waiting. But basically, that would be the only addition needed for Arabic fonts: a good serif font (Noto Naskh Arabic) and a good sans font (Noto Sans Arabic UI).
There's definitely something wrong in there. Though Urdu uses a script akin to "cursive" (nastaliq), it's still part of the Arabic family of scripts. And I can't help but appreciate your eyesight, having spotted the lack of continuity. The letters, which normally "hold hands", are all over the place in relation to the "line". Don't know if this will help, but I have a feeling that this bug is related to another very minor bug in KOReader (more of a nitpick than a bug), where the Arabic diacritics (in both menus and rendering) are drawn too "low", enough for them to mix with the dots above the letters and sometimes with the letters themselves, depending on the context. Initially I ignored this, but now I feel that it might have something to do with the issue at hand, so I'm bringing it up. This is the word In KOReader, the diacritic would be "mixed" or "joined" with the letter itself ... it only needs some padding.
To understand nastaliq, one has to understand the etymology of the word. It's a portmanteau of naskh (literally "copy", meaning "copy script" here, an example of which is the passage in simple script you copied above) and ta'liq, meaning "hanging". The letters "hang" from their edge and "cascade", as opposed to copy script where they "flow" along the same line (hoping my explanation makes any sense). The issue is very visible once you have that in mind. The first word from the right If there's anything else you want, you can as always ask me and I'll explain! :D |
|
Yep, the UI variants are specifically designed for constrained vertical space, so it's a likely interpretation ;). |
Same section with The vertical displacements are increasing in the same way as with my code (cre or xtext). @WaseemAlkurdi : just to be sure I'm not messing up my Arabic reading :) can you confirm that in this: |
|
I do remember that some stuff is top-to-bottom while other is bottom-to-top where metrics/bbox are involved, but I don't recall the details OTOH :s. (i.e., if oversight there was, I'd only expect it to be on the y axis). |
|
Yep, https://www.freetype.org/freetype2/docs/glyphs/glyphs-2.html
I guess I didn't think much: everything must get an offset_y of 0 from HB in our Latin world, so I was just using the font origin_y, which had the correct sign. |
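For anyone following along, the sign issue can be sketched as below. This is only an illustration under stated assumptions, not crengine's or KOReader's actual code: the function name `glyph_top_left` and its parameters are hypothetical. It assumes the usual conventions, where font metrics and HarfBuzz offsets use a y axis pointing up, while the screen buffer has y growing downwards.

```python
def glyph_top_left(pen_x, baseline_y, bitmap_left, bitmap_top, x_offset=0, y_offset=0):
    """Return the (x, y) top-left pixel where a glyph bitmap should be blitted.

    pen_x, baseline_y: pen position in buffer coordinates (y grows downwards).
    bitmap_left, bitmap_top: bearings of the rendered bitmap (y grows upwards).
    x_offset, y_offset: shaper offsets in pixels (y grows upwards).
    """
    x = pen_x + x_offset + bitmap_left
    # Both bitmap_top and y_offset are "upwards" quantities, so they are
    # subtracted when converting to a buffer whose y axis points down.
    y = baseline_y - (y_offset + bitmap_top)
    return x, y
```

With `y_offset` always 0 (typical for Latin text), a wrong sign on that term goes unnoticed. It only shows up with scripts where the shaper moves glyphs vertically, such as marks and cascading nastaliq letters.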
|
OK, let's merge this one. (xtext does not need any fixing, textwidget.lua and textboxwidget.lua do, that's where we use y_offset) |
|
You've literally fixed it!
And exactly, the dots are misplaced! (For instance, look at the second line, last word from the left. The dots should be shifted about one letter to the right)
Again, this is correct! :-)
Naskh fonts are what is used in all (paper) books, the Arabic parallel of a serif font (not a serif font itself, but the typographical equivalent of serif). Therefore, it is pretty enough for Arabic as most books are printed in that. |
|
Thanks a lot for embedded lang tags support! Now I will try to make Calibre detect foreign languages during conversion :D Btw, does the tag mean html tag, or also html attribute (e.g. Edit: having read the long press description, I assume they're just meta tags ;> |
|
@ptrm I think you're looking for #6069 and/or koreader/crengine#337
|
|
That might be explained in a bit more detail in the "typography" PRs: #6069 and #6072. Btw, any feedback on the added typography rules for Polish? No bad side effects?
Damned :( We were so bad at describing the feature then ?! :) |
That's what I wanted, yes. You mean it's already working :D ? Because to test it I would need some semi-automatic work on my epubs, most publishers don't care about adding lang attributes ;)
So far I'm awestruck with the multifallbacks, paging through books with math and phonetic notations without changing any settings :D And Polish books look cool so far (though they're all published with non-breaking spaces after one-letter words, so testing is harder this way ;)
The plural "tags" misled me. Now I know it refers to all potentially opened books, but I thought it referred to many tags in one book ;) Hard to avoid, I think. |
re koreader/koreader#6090 (comment) https://github.com/googlefonts/noto-fonts @ 0fa1dfabd6e3746bb7463399e2813f60d3f1b256 (i.e., hinted, not the Phase III WIP builds).

Includes koreader/crengine#339 :
Users can set their preferred fallback font, which will be completed with a few of our shipped fonts for maximum coverage.
Ref #5277 (comment). Closes #5277.