Add characters count in the table of contents#14589
Add characters count in the table of contents#14589Jackie6 wants to merge 9 commits intoWordPress:trunkfrom
Conversation
3864006 to
bde47d3
Compare
|
With the addition of this item, we'll also want to adjust the spacing for the Two options:
Option 2 seems like the simplest fix to me. |
|
Hi, @kjellr Thanks for your comments. I think the option two looks better for me. But we may still need to wait for some design feedback. |
talldan
left a comment
There was a problem hiding this comment.
Hey @Jackie6, this looks good, thanks so much for your contribution. I did some manual testing, and the character count looks good. I tried a few edge cases, like characters that are represented by multiple code points, and the character count looks correct.
I've noticed a couple of minor things with the code, if you're able to tackle those and the discussed style changes it'd be much appreciated.
| const charCountType = _x( 'characters_excluding_spaces', 'Word count type. Do not translate!' ); | ||
|
|
||
| return ( | ||
| <span className="word-count">{ wordCount( content, charCountType ) }</span> |
There was a problem hiding this comment.
word-count isn't the right class name here. Instead I think it should be called char-count, reflecting the component name. To get the styling to work an extra selector will have to be added here for the new .char-count class:
gutenberg/packages/editor/src/components/table-of-contents/style.scss
Lines 33 to 39 in 4fc05d3
It is a bit unusual that these components (CharCount, WordCount) receive their style from a completely different component (TableOfContentsPanel), but I think it's outside of the scope of this PR to fix that.
| */ | ||
| import { withSelect } from '@wordpress/data'; | ||
| import { _x } from '@wordpress/i18n'; | ||
| import { count as wordCount } from '@wordpress/wordcount'; |
There was a problem hiding this comment.
The wordCount alias is a bit misleading, given it returns that characterCount when used in this file. It's minor, but it'd be good to either leave it as count or alias it as charCount.
|
Just thought I'd also comment that the feedback from @kjellr was the design feedback. Your PR was discussed during the design triage meeting: It seems like you both agree on option 2, so that looks like a good way forward. |
|
Hi @ellatrix, thanks for clarifying this pr further. This pr still does not fix the counting for mixed script post. But it can count the correct number of Chinese words. |
|
@Jackie6 Thanks, this is looking great! One final thing: for Latin scripts it might make sense to include both word count and character count, but does it also make sense for e.g. a Chinese translation? Should word count not be removed in that case (perhaps by allowing an empty translation)? |
|
Hi @ellatrix thanks for comments. Do you mean we need to remove the word count when the language is Chinese? I think it will cause the inconsistency of UI. That is sometimes both word count and char count show but sometimes only char count show. |
|
@Jackie6 Yes, the UI will be different, but is word count useful at all in a Chinese translation? If it's not useful, it shouldn't be there? |
|
@ellatrix But apart from Chinese, Korean and Japenese are also character-based rather than word-based. For languages like these, we also need to remove the word count? |
|
Just a random drive-by, but even though we have characters in CJK languages that by themselves are words it doesn't mean that there aren't multi-character words. Other languages such as early Greek, Linear B, early Egyption also don't use spaces to separate words. Okay, we'll ignore the fact that these earlier languages could be equally written left-to-right, right-to-left, top-to-bottom, bottom-to-top, or changing text direction at intervals to form spirals and other patterns. Word segmentation is a real problem with an extremely complicated solution. Check out this post from the Solr project talking about ambiguity in Chinese word segmentation - ambiguity that exists because there are multiple valid ways to segment the words in the sentence.
it's impractical to think that we have the energy and time at the moment to conclusively solve this generally-unsolved problem but we do have some reasonable ways we can move forward in the meantime:
my shallow thoughts on the matter are that it'd be more helpful to show the word count with some disclaimer - maybe a popover tooltip explaining that we think it could be wrong - because if we hide it altogether at the first detection of a Chinese character then we'll be degrading the experience without an explanation for not only Chinese posts but also posts that quote Chinese text, even if it's just a small one. for inclusionary purposes I think it would make more sense to develop some basic heuristics to detect when we think the word count could be wrong; possible indicators based on no research could be:
|
|
Thanks @dmsnell for you comment. My (small) problem with this change is that each of these counts can be translated. So e.g. a Chinese translating might be:
So you'd just have two counts that are the same thing. Possible solution:
|
|
Another possible solution is to display all three possible counters, and don't translate anything. |
@ellatrix this suggestion is much more favorable in my opinion because it's starting at least with some kind of principle: if the translator says that these are the same then only show one special-casing specific languages in groups I'm sure will cause more stir and degradation than we intend. having looked at I'm a fan of adding a disclaimer popup but I can see where that could introduce needless confusion - for the time being checking if |
|
@ellatrix I am a little confused. Why would |
|
@Jackie6 If it is different, it would still be displayed. If it's translated the same, one of them is removed. It doesn't make sense to have both counts if they are the same. This may not be the case in Chinese, but maybe it is in another language. |
There was a problem hiding this comment.
Fixed. Sorry about that.
| tabIndex="0" | ||
| > | ||
| { | ||
| __( 'Words' ) !== __( 'Characters' ) && |
There was a problem hiding this comment.
What is the goal of this? Do you rely on translators on changing this? That's not possible as there is no context.
There was a problem hiding this comment.
We should export the translation in both count components, then compare the types here. https://github.com/WordPress/gutenberg/pull/14589/files#r282803952
|
@ellatrix I assume with |
| * Two options available for counting: 'characters_excluding_spaces' or 'characters_including_spaces' | ||
| * Do not translate into your own language. | ||
| */ | ||
| const charCountType = _x( 'characters_including_spaces', 'Character count type. Do not translate literally!' ); |
There was a problem hiding this comment.
It would be good to export this, as well in word-count, to be used table-of-contents to compare.
|
If I'm not mistaken this one can be closed now that #24823 was merged? |
|
I tried to understand what this PR is trying to solve. The character count is implemented by #24823. I think the rest of what this PR is trying to do is to add an option to control the presence or absence of spaces in the character count. In word count, we can control the count type by gutenberg/packages/editor/src/components/word-count/index.js Lines 19 to 24 in aa9b1d6 In character count, on the logic is fixed on Is this the right goal for this PR? |
|
Doing some cleanup in the repository. I was wondering if this PR still worth keeping open. Probably needs to be redone entirely if we ever get to it. Let's close it. Please let me know if you think otherwise. |






Description
fix #13796
Screenshot
How has this been tested?
Types of changes
Add the count of characters(no spaces) in the table of contents
Testing
Steps to reproduce the behavior:
Checklist: