feat(languages): Add english-ngrams #109
max-niederman merged 1 commit into max-niederman:main from heysokam:patch-1
I agree that this is a useful dictionary to add, but I'm not sure about the naming. "N-gram" can refer to sequences of any kinds of symbols, including words. In fact, all of our dictionaries are lists of common 1-grams, where the symbols are words. I suggest english-nchars, unless you have another idea.
They are still n-grams, not n-chars, even if they use characters as their symbols. I think the name is intuitive: it shows up on Google for someone who doesn't know what n-grams are, and Wikipedia itself gives the right description of the concept (and even explains the context of unigrams, where the symbols are words). So I would say the more intuitive, pre-existing meaning should be kept.
This is not true; neither is more technically correct because "n-gram" is a very broad term and applies to both. That's why I'm hesitant to call only one "n-gram" as its distinguishing feature.
This is a valid point, though. "N-gram" is more searchable, at the very least because of ngram-type. I'm going to go ahead and merge this, although in v2 I think this'll need to be replaced by n-gram generation, which is already planned.
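To make the naming point concrete, here is a minimal sketch of how the same n-gram routine applies to both words and characters, and how a character n-gram list might be generated from a wordlist. The function names (`ngrams`, `common_char_ngrams`) are hypothetical and chosen for illustration; this is not the project's planned implementation, nor how ngram-type builds its lists.

```python
from collections import Counter
from typing import Iterable, Sequence


def ngrams(symbols: Sequence, n: int) -> list:
    """Return every contiguous run of n symbols, in order.

    The symbols can be characters (pass a string) or words (pass a list
    of strings); the definition of an n-gram is the same either way.
    """
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def common_char_ngrams(words: Iterable[str], n: int, top: int) -> list[str]:
    """Hypothetical helper: the `top` most frequent character n-grams in a wordlist."""
    counts = Counter()
    for word in words:
        # Count n-grams within each word only, never across word boundaries.
        counts.update("".join(gram) for gram in ngrams(word, n))
    return [gram for gram, _ in counts.most_common(top)]
```

Calling `ngrams(["the", "quick", "fox"], 1)` yields word 1-grams, which is what the existing dictionaries contain, while `ngrams("the", 2)` yields the character bigrams `th` and `he`.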
Based on the app and wordlist from: https://github.com/ranelpadon/ngram-type