
Conversation

@JRMeyer (Contributor) commented Aug 31, 2020

This PR enables hot-word boosting (with immediate support in the C and Python clients) via the new `--hot_words` flag.

The flag takes a string of word:boost pairs, with pairs separated by commas and each word separated from its boost by a colon, like so: `--hot_words "friend:1.5,enemy:20.4"`. Each boost is a floating point number and may range from -inf to inf.

The boosting is applied as an addition to the log likelihood of a candidate word sequence, as given by the KenLM language model. Since the LM score is a log probability, a score of 0.0 means 100% likelihood and negative infinity means 0% likelihood. As such, we will always get some negative number from the KenLM model.

For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese" and we add 3 to this number, we get -0.5, thereby increasing the likelihood of that sequence. On the other hand, if we add -3 to the likelihood, we decrease the likelihood of that sequence. Adding a negative number as a boost will make the decoder "avoid" certain words.
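
To make the arithmetic concrete, here is a standalone sketch of the additive scheme (illustrative only, not the PR's decoder code; it uses natural-log scores for simplicity, whereas KenLM actually scores in log10):

    #include <cmath>
    #include <cstdio>

    int main() {
      double lm_score = -3.5;  // LM log likelihood of "i like cheese"
      double boost = 3.0;      // additive hot-word boost

      // Adding in log space moves the score toward 0.0 (100% likelihood)
      // and scales the underlying probability by a factor of exp(boost).
      double boosted = lm_score + boost;  // -0.5
      std::printf("p: %.4f -> %.4f (x%.1f)\n",
                  std::exp(lm_score), std::exp(boosted), std::exp(boost));
      return 0;
    }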

@lissyx (Collaborator) commented Sep 1, 2020

> This PR enables hot-word boosting (from the C client) with the two new flags --hot_words and --boost_coefficient.

Can we avoid limiting this to the C client? It's very likely people will want to use this part of the API from elsewhere, and in its current state it's completely unknown whether that works or not.

    if (!hot_words_.empty()) {
      // increase prob of prefix for every word
      // that matches a word in the hot-words list
      for (std::string word : ngram) {

Collaborator:
have you measured perf impact with scorers?

Contributor Author:
perf as in WER? or perf as in latency?

Collaborator:
latency

Contributor Author:
I have not measured the latency effects yet, no.

Are there any TC jobs that do this, or should I profile locally? What do you recommend?

Collaborator:
Unfortunately, you'd have to do it locally. Using perf should be quite easy.
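
For example, a local profiling session with Linux perf might look like this (binary name and file paths are illustrative, assuming the C client with the new flag):

    # record a call graph while decoding with a scorer and hot-words enabled
    perf record -g ./deepspeech --model model.pbmm --scorer kenlm.scorer \
        --audio test.wav --hot_words "friend:1.5,enemy:20.4"
    # then inspect where the decode time goes
    perf report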

@lissyx (Collaborator) left a comment:

Please expose it in the API as a real list of words, and please add:

  • basic CI testing for that feature
  • usage in the different bindings (Python, JS, .Net, Java), if you can; that would really be a good thing

Also, it looks like your current code breaks training and the CTC decoder, so please fix that.

@reuben (Contributor) commented Sep 1, 2020

> For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese", if we multiply this number by 0.5, we get -1.75, therefore doubling the likelihood of that sequence.

This isn't how log probabilities work, you're making exponential increases in the probability here. exp(-3.5) ~= 0.03 and exp(-1.75) ~= 0.17. This, combined with the fact that a single word will be boosted several times in the same beam as it appears in multiple n-grams, makes it hard to reason about the behavior of the coefficient. It should probably be an additive factor (multiplication in probability space).
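
To illustrate the distinction, here is a small standalone sketch (illustrative only; natural-log base assumed):

    #include <cmath>
    #include <cstdio>

    int main() {
      double log_p = -3.5;  // exp(-3.5) ~= 0.030

      // Multiplying the log score changes the probability exponentially:
      double scaled = log_p * 0.5;   // exp(-1.75) ~= 0.174, a ~5.8x increase, not 2x
      // Adding to the log score multiplies the probability by a constant factor:
      double shifted = log_p + 0.5;  // exp(-3.0) ~= 0.050, exactly exp(0.5)x

      std::printf("multiply by 0.5: %.3f -> %.3f\n", std::exp(log_p), std::exp(scaled));
      std::printf("add 0.5:         %.3f -> %.3f\n", std::exp(log_p), std::exp(shifted));
      return 0;
    }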

@lissyx (Collaborator) commented Sep 1, 2020

@JRMeyer To keep your API simpler, I suggest you move to a single entry point:

    DEEPSPEECH_EXPORT
    int DS_AddHotWord(ModelState* aCtx, const char* word, float boostCoefficient);

This entry point would add a new word to your std::vector (or maybe a set, because it would guarantee uniqueness). If the hot word does not exist, we add it with the given boost; if it is already in the set, we update its coefficient.

Depending on the use case, it could also be cool to expose (though I'm unsure it is really required):

    DEEPSPEECH_EXPORT
    int DS_ClearHotWords(ModelState* aCtx);

This would simply re-initialize the set of hot words.

With this API, you could more easily expose and update all our bindings to make the feature available (`const char **` is a bit painful via SWIG).
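
For what it's worth, a minimal sketch of how those two entry points might sit on top of a std::map (the hot_words_ member, the map type, and the return convention are assumptions for illustration, not the PR's actual code):

    #include <map>
    #include <string>

    // Hypothetical stand-in for the real ModelState defined by the library.
    struct ModelState {
      std::map<std::string, float> hot_words_;
    };

    int DS_AddHotWord(ModelState* aCtx, const char* word, float boostCoefficient) {
      // Insert the word, or update its coefficient if it is already present.
      aCtx->hot_words_[word] = boostCoefficient;
      return 0;  // 0 on success
    }

    int DS_ClearHotWords(ModelState* aCtx) {
      aCtx->hot_words_.clear();  // simply re-init the set of hot words
      return 0;
    }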

@JRMeyer (Contributor Author) commented Sep 10, 2020

>> For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese", if we multiply this number by 0.5, we get -1.75, therefore doubling the likelihood of that sequence.

> This isn't how log probabilities work, you're making exponential increases in the probability here. exp(-3.5) ~= 0.03 and exp(-1.75) ~= 0.17. This, combined with the fact that a single word will be boosted several times in the same beam as it appears in multiple n-grams, makes it hard to reason about the behavior of the coefficient. It should probably be an additive factor (multiplication in probability space).

Even though my initial intuition was wrong about how the boosting compounds, I still like the UX. Namely, if you're using this feature and trying to find the right boosting coefficient for your data, you know to sweep between 0 and 1, which isn't hard.

With an additive effect, the search space goes from (0, 1) to (0, infinity). The math is better, but the UX seems worse. I made the changes in 184189c, but I still have doubts. Thoughts?

@carlfm01 (Collaborator) left a comment:

Nice @JRMeyer, just missing the following on the IDeepSpeech interface:

        /// <summary>
        /// Add a hot-word.
        /// </summary>
        /// <param name="aWord">Some word</param>
        /// <param name="aBoost">Some boost</param>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void AddHotWord(string aWord, float aBoost);

        /// <summary>
        /// Erase entry for a hot-word.
        /// </summary>
        /// <param name="aWord">Some word</param>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void EraseHotWord(string aWord);

        /// <summary>
        /// Clear all hot-words.
        /// </summary>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void ClearHotWords();

@JRMeyer (Contributor Author) commented Sep 22, 2020

> Nice @JRMeyer, just missing the following on the IDeepSpeech interface: […]

I set these as `unsafe void` in 5432f56

@carlfm01 (Collaborator) commented Sep 22, 2020

> I set these as `unsafe void` in 5432f56

Sorry, I forgot to delete the `public`; you did it right.

@lissyx self-requested a review on September 24, 2020
@lissyx (Collaborator) left a comment:

This is now looking quite good; just fix the Android test execution, and make sure to squash into one commit.

@JRMeyer merged commit 1eb155e into mozilla:master on Sep 24, 2020
@JRMeyer (Contributor Author) commented Sep 24, 2020

@lissyx -- it's one commit, but all the previous commit messages got appended into the single commit message :/ It doesn't look pretty, but yes, it is one commit.

@lissyx mentioned this pull request on Sep 25, 2020