Add characters count in the table of contents by Jackie6 · Pull Request #14589 · WordPress/gutenberg

Jackie6 · 2019-03-22T20:56:29Z

Description

Fixes #13796

When writing a post in Chinese, the word count shown in the content structure is not accurate.

Screenshot

How has this been tested?

Local
Browser testing

Types of changes

Add a count of characters (excluding spaces) to the table of contents

Testing

Steps to reproduce the behavior:

Use http://generator.lorem-ipsum.info/_chinese to generate some sample Chinese text
Create new page in Gutenberg
Paste in content
Click the info icon at the top
See if the number of characters is correct

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.
I've included developer documentation if appropriate.

kjellr · 2019-04-02T20:14:09Z

With the addition of this item, we'll also want to adjust the spacing for the table-of-contents__count items. Currently, it looks a little weird to have that last "Blocks" item sitting off on its own:

Two options:

Widen the modal, remove the existing 25% width rule, and give each item some right-padding instead:

Change the 25% width rule to 33.33% instead, to force the 4th and 5th items onto the next line. We'll also want to add a tiny bit of bottom padding to these items. This keeps things compact, but also avoids having just one item all alone:

Option 2 seems like the simplest fix to me.

Jackie6 · 2019-04-02T21:25:25Z

Hi @kjellr, thanks for your comments. Option two looks better to me. But we may still need to wait for some design feedback.

talldan

Hey @Jackie6, this looks good, thanks so much for your contribution. I did some manual testing, and the character count looks good. I tried a few edge cases, like characters that are represented by multiple code points, and the character count looks correct.

I've noticed a couple of minor things with the code, if you're able to tackle those and the discussed style changes it'd be much appreciated.
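For readers unfamiliar with the multi-code-point edge case mentioned above, here is a minimal sketch of counting characters while excluding whitespace. It is not the @wordpress/wordcount implementation: the helper name countCharactersExcludingSpaces is hypothetical, and the sketch assumes a runtime that provides Intl.Segmenter.

```javascript
// Hypothetical helper for illustration only; not the code from this PR.
// Counts user-perceived characters (grapheme clusters), skipping whitespace,
// so that a letter followed by a combining mark counts once.
function countCharactersExcludingSpaces( text ) {
	// Assumption: Intl.Segmenter is available in the runtime.
	const segmenter = new Intl.Segmenter( undefined, { granularity: 'grapheme' } );
	let total = 0;
	for ( const { segment } of segmenter.segment( text ) ) {
		// Skip segments made up entirely of whitespace.
		if ( ! /^\s+$/u.test( segment ) ) {
			total++;
		}
	}
	return total;
}

// Example usage: 'e' followed by a combining acute accent is two code points.
console.log( countCharactersExcludingSpaces( 'e\u0301' ) );
console.log( countCharactersExcludingSpaces( '你好 世界' ) );
```

Counting by string length instead would report UTF-16 code units, which overcounts combining sequences and characters outside the Basic Multilingual Plane.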

talldan · 2019-04-03T03:45:01Z

packages/editor/src/components/char-count/index.js

+	const charCountType = _x( 'characters_excluding_spaces', 'Word count type. Do not translate!' );
+
+	return (
+		<span className="word-count">{ wordCount( content, charCountType ) }</span>


word-count isn't the right class name here. Instead I think it should be called char-count, reflecting the component name. To get the styling to work an extra selector will have to be added here for the new .char-count class:

gutenberg/packages/editor/src/components/table-of-contents/style.scss

Lines 33 to 39 in 4fc05d3

.table-of-contents__number,
.table-of-contents__popover .word-count {
	font-size: 21px;
	font-weight: 400;
	line-height: 30px;
	color: $dark-gray-500;
}

It is a bit unusual that these components (CharCount, WordCount) receive their style from a completely different component (TableOfContentsPanel), but I think it's outside of the scope of this PR to fix that.

talldan · 2019-04-03T03:46:21Z

packages/editor/src/components/char-count/index.js

+ */
+import { withSelect } from '@wordpress/data';
+import { _x } from '@wordpress/i18n';
+import { count as wordCount } from '@wordpress/wordcount';


The wordCount alias is a bit misleading, given that it returns the character count when used in this file. It's minor, but it'd be good to either leave it as count or alias it as charCount.

talldan · 2019-04-03T03:54:23Z

Just thought I'd also comment that the feedback from @kjellr was the design feedback.

Your PR was discussed during the design triage meeting:
(https://wordpress.slack.com/archives/C02S78ZAL/p1554221572033000 - you may have to sign up to slack to view the discussion).

It seems like you both agree on option 2, so that looks like a good way forward.

packages/editor/src/components/char-count/index.js

Jackie6 · 2019-04-10T18:54:48Z

Hi @ellatrix, thanks for clarifying this PR further. This PR still does not fix counting for mixed-script posts, but it can count the correct number of Chinese words.

I plan to add a new feature to support counting words in mixed-script posts in the future. For now, I think it is worth adding a character count to the content structure, as most editors already provide this. An example is MS Word:

ellatrix · 2019-04-24T10:58:37Z

@Jackie6 Thanks, this is looking great!

One final thing: for Latin scripts it might make sense to include both word count and character count, but does it also make sense for e.g. a Chinese translation? Should word count not be removed in that case (perhaps by allowing an empty translation)?

Jackie6 · 2019-04-25T04:27:16Z

Hi @ellatrix, thanks for the comments. Do you mean we need to remove the word count when the language is Chinese? I think that would make the UI inconsistent: sometimes both the word count and the character count would show, and sometimes only the character count.

ellatrix · 2019-04-25T09:47:22Z

@Jackie6 Yes, the UI will be different, but is word count useful at all in a Chinese translation? If it's not useful, it shouldn't be there?

Jackie6 · 2019-04-25T15:53:20Z

@ellatrix But apart from Chinese, Korean and Japanese are also character-based rather than word-based. For languages like these, do we also need to remove the word count?

dmsnell · 2019-04-25T17:28:10Z

Just a random drive-by, but even though we have characters in CJK languages that by themselves are words it doesn't mean that there aren't multi-character words.

Other languages such as early Greek, Linear B, early Egyptian also don't use spaces to separate words. Okay, we'll ignore the fact that these earlier languages could be equally written left-to-right, right-to-left, top-to-bottom, bottom-to-top, or changing text direction at intervals to form spirals and other patterns.

Word segmentation is a real problem with an extremely complicated solution. Check out this post from the Solr project talking about ambiguity in Chinese word segmentation - ambiguity that exists because there are multiple valid ways to segment the words in the sentence.

the first segmentation means “I like New Zealand Flowers.” The second segmentation means “I like fresh broccoli”

it's impractical to think that we have the energy and time at the moment to conclusively solve this generally-unsolved problem but we do have some reasonable ways we can move forward in the meantime:

cut out word counts entirely: a drastic choice that excludes being helpful to a large swath of cases where it otherwise would be
cut out word counts for any language we think might be troublesome: this is the conservative option to prevent making basic mistakes but may not be super helpful
add some qualifying notice to the word count when we think it could be wrong, such as when we detect characters from languages that don't use spaces for segmentation or when the ratio of spaces to characters is below a given threshold

my shallow thoughts on the matter are that it'd be more helpful to show the word count with some disclaimer - maybe a popover tooltip explaining that we think it could be wrong - because if we hide it altogether at the first detection of a Chinese character then we'll be degrading the experience without an explanation for not only Chinese posts but also posts that quote Chinese text, even if it's just a small one.

for inclusionary purposes I think it would make more sense to develop some basic heuristics to detect when we think the word count could be wrong; possible indicators based on no research could be:

word-to-character ratio is out of whack, less than 1:20 (spaces to other characters)
we encounter a long stream of characters in non-roman letters, we could treat the blob as a whole or split it based on approximate character-per-words estimates
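The first indicator above can be sketched as a small check. This is only an illustration of the idea: the function name wordCountMayBeUnreliable is hypothetical, and the 1:20 default is the unresearched figure from this comment, not a tested threshold.

```javascript
// Hypothetical heuristic for illustration; not part of @wordpress/wordcount.
// Returns true when the ratio of whitespace to other characters falls below
// the threshold, hinting that a space-based word count may be wrong.
function wordCountMayBeUnreliable( text, threshold = 1 / 20 ) {
	// Number of whitespace characters in the text.
	const spaces = ( text.match( /\s/gu ) || [] ).length;
	// Array.from iterates by code point, so astral characters count once.
	const others = Array.from( text ).length - spaces;
	if ( others === 0 ) {
		// Empty or whitespace-only text: nothing to warn about.
		return false;
	}
	return spaces / others < threshold;
}

console.log( wordCountMayBeUnreliable( 'The quick brown fox jumps over the lazy dog' ) );
console.log( wordCountMayBeUnreliable( '这是一个没有空格的句子' ) );
```

A check this simple also flags a single long word typed without spaces, so a real implementation would likely need a minimum length or a script check before showing any disclaimer.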

ellatrix · 2019-04-25T20:23:10Z

Thanks @dmsnell for your comment.

My (small) problem with this change is that each of these counts can be translated.

So e.g. a Chinese translation might be:

words => characters_excluding_spaces.
characters => characters_excluding_spaces.

So you'd just have two counts that are the same thing.

Possible solution:

Remove word count if __( 'words' ) === __( 'characters' ).
OR remove word count if __( 'words' ) === 'characters_excluding_spaces' && __( 'characters' ) === 'characters_excluding_spaces', and if __( 'words' ) === 'characters_excluding_spaces', force character count to be characters_including_spaces, or if __( 'words' ) === 'characters_including_spaces', remove character count.

What do you think @dmsnell and @Jackie6?

ellatrix · 2019-04-25T20:24:57Z

Another possible solution is to display all three possible counters, and don't translate anything.

dmsnell · 2019-04-26T00:13:49Z

Remove word count if __( 'words' ) === __( 'characters' ).

@ellatrix this suggestion is much more favorable in my opinion because it's starting at least with some kind of principle: if the translator says that these are the same then only show one

special-casing specific languages in groups I'm sure will cause more stir and degradation than we intend.

having looked at @wordpress/wordcount it became clear that we're starting from the point of poking-at-holes vs. building a solid ground. that is, making this robustly-solved is a much bigger project than just showing characters and so I think getting character counts out even when word counts will be wrong is a positive step forward

I'm a fan of adding a disclaimer popup but I can see where that could introduce needless confusion - for the time being checking if __( 'words' ) === __( 'characters' ) or whatever form is used here __( 'words', { context: 'label for word-count' } ) is a good tradeoff between getting it perfect and getting it shipped

Jackie6 · 2019-04-26T19:05:30Z

@ellatrix I am a little confused. Why would __( 'words' ) === __( 'characters' ) ? In Chinese, the translated strings of words and characters would be different. And I guess we cannot control the localization here, right? From my understanding, we just wrap translatable strings in the __() function, and translators will translate the strings.

ellatrix · 2019-05-01T11:33:32Z

@Jackie6 If it is different, it would still be displayed. If it's translated the same, one of them is removed. It doesn't make sense to have both counts if they are the same. This may not be the case in Chinese, but maybe it is in another language.

dmsnell · 2019-05-01T15:40:07Z

packages/editor/src/components/table-of-contents/panel.js

oops! Word vs. Words

Fixed. Sorry about that.

swissspidy · 2019-05-09T17:13:17Z

packages/editor/src/components/table-of-contents/panel.js

 				tabIndex="0"
 			>
+				{
+					__( 'Words' ) !== __( 'Characters' ) &&


What is the goal of this? Do you rely on translators on changing this? That's not possible as there is no context.

@swissspidy Ah, yes, I meant #14589 (comment)

We should export the translation in both count components, then compare the types here. https://github.com/WordPress/gutenberg/pull/14589/files#r282803952

swissspidy · 2019-05-09T17:15:01Z

@ellatrix I assume with Remove word count if __( 'words' ) === __( 'characters' ) you meant charCountType === wordCountType? Because __( 'words' ) is totally different from _x( 'words', 'Word count type. Do not translate!' ).

ellatrix · 2019-05-10T09:04:08Z

packages/editor/src/components/char-count/index.js

+	 * Two options available for counting: 'characters_excluding_spaces' or 'characters_including_spaces'
+	 * Do not translate into your own language.
+	 */
+	const charCountType = _x( 'characters_including_spaces', 'Character count type. Do not translate literally!' );


It would be good to export this, as well as the equivalent in word-count, so that table-of-contents can compare the two.
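A minimal sketch of the comparison being suggested, assuming each count component exposes its resolved count type as a plain string. The function name getVisibleCounts and the returned labels are hypothetical and do not exist in the PR.

```javascript
// Hypothetical sketch; the arguments stand in for the values that the
// word-count and char-count components would export after translation.
function getVisibleCounts( wordCountType, charCountType ) {
	// The character count is always shown.
	const counts = [ 'characters' ];
	// Show the word count only when it measures something different.
	if ( wordCountType !== charCountType ) {
		counts.unshift( 'words' );
	}
	return counts;
}

// Default (untranslated) types: both counts are shown.
console.log( getVisibleCounts( 'words', 'characters_including_spaces' ) );
// A locale whose translators set both types to the same value: one count.
console.log( getVisibleCounts( 'characters_excluding_spaces', 'characters_excluding_spaces' ) );
```

Comparing the resolved count types rather than the translated labels avoids relying on translators to make 'Words' and 'Characters' identical, which is the concern raised earlier in this thread.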

aristath · 2020-08-31T14:02:04Z

If I'm not mistaken this one can be closed now that #24823 was merged?

swissspidy · 2020-08-31T14:13:35Z

@aristath Sort of. #24823 doesn't really fully address #13796, which this PR aimed to do.

t-hamano · 2022-12-15T13:40:28Z

I tried to understand what this PR is trying to solve.

The character count is implemented by #24823. I think the rest of what this PR is trying to do is to add an option to control the presence or absence of spaces in the character count.

In word count, we can control the count type by words, characters_excluding_spaces and characters_including_spaces.

gutenberg/packages/editor/src/components/word-count/index.js

Lines 19 to 24 in aa9b1d6

    
	/*
	 * translators: If your word count is based on single characters (e.g. East Asian characters),
	 * enter 'characters_excluding_spaces' or 'characters_including_spaces'. Otherwise, enter 'words'.
	 * Do not translate into your own language.
	 */
	const wordCountType = _x( 'words', 'Word count type. Do not translate!' );

In character count, the logic is fixed to characters_including_spaces.

gutenberg/packages/editor/src/components/character-count/index.js

Line 18 in aa9b1d6

return characterCount( content, 'characters_including_spaces' );

Is this the right goal for this PR?

youknowriad · 2024-04-25T18:25:49Z

Doing some cleanup in the repository. I was wondering if this PR is still worth keeping open. It probably needs to be redone entirely if we ever get to it. Let's close it. Please let me know if you think otherwise.

Jackie6 requested review from gziolo, noisysocks, talldan and youknowriad as code owners March 22, 2019 20:56

Add characters count in the table of contents

bde47d3

Jackie6 force-pushed the update/word-count branch from 3864006 to bde47d3 Compare March 22, 2019 23:52

gziolo requested review from jasmussen and jorgefilipecosta March 25, 2019 07:25

gziolo added the Internationalization (i18n) and Needs Design Feedback labels Mar 25, 2019

gziolo mentioned this pull request Mar 25, 2019

Word Count in content structures does not count Chinese words properly #13796

Open

gziolo added the [Type] Enhancement label Mar 25, 2019

talldan removed the Needs Design Feedback label Apr 3, 2019

talldan reviewed Apr 3, 2019

Jackie6 added 3 commits April 3, 2019 00:41

Merge branch 'update/word-count' of https://github.com/Jackie6/gutenberg into update/word-count

525a67a

Change the class of char-count and add new css class

5669662

Make the table-of-contents have three coloums

f33aa39

Jackie6 requested review from adamsilverstein, aduth, ajitbohra, chrisvanpatten, dmsnell, ellatrix, karmatosed, mapk, mkaz and mmtr as code owners April 3, 2019 05:00

ellatrix reviewed Apr 10, 2019

packages/editor/src/components/char-count/index.js (outdated)

ellatrix reviewed Apr 10, 2019

packages/editor/src/components/char-count/index.js

List two options available for char-count

9663b1d

gziolo removed the Needs Decision label Apr 24, 2019

dmsnell reviewed May 1, 2019

Remove word count if it equals to char count

f869445

Jackie6 force-pushed the update/word-count branch from 41e7fc2 to f869445 Compare May 1, 2019 15:42

Jackie6 closed this May 2, 2019

Jackie6 reopened this May 2, 2019

swissspidy reviewed May 9, 2019

ellatrix reviewed May 10, 2019

swissspidy mentioned this pull request Aug 31, 2020

Add character count to info panel #24823

Merged

6 tasks

Base automatically changed from master to trunk March 1, 2021 15:42

youknowriad closed this Apr 25, 2024

11 participants
