
Conversation

@JRMeyer (Contributor) commented Aug 31, 2020

This PR enables hot-word boosting (with immediate support in the C and Python clients) via the new `--hot_words` flag.

The flag takes a string of word:boost pairs, with pairs separated by commas and each word separated from its boost by a colon, like so: `--hot_words "friend:1.5,enemy:20.4"`. Each boost is a floating point number and may range from -inf to inf.

The boosting is applied as an addition to the log likelihood of a candidate word sequence, as given by the KenLM language model. Since the LM score is a log probability, a score of 0.0 means 100% likelihood and negative infinity means 0% likelihood. As such, we will always get some negative number from the KenLM model.

For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese" and we add 3 to this number, we get -0.5, thereby increasing the likelihood of that sequence. On the other hand, if we add -3 to the likelihood, we decrease the likelihood of that sequence. Adding a negative number as a boost will make the decoder "avoid" certain words.
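
To make the arithmetic concrete, here is a standalone sketch of the additive scheme (illustrative only, not the PR's decoder code; it uses natural-log scores for simplicity, whereas KenLM actually scores in log10):

    #include <cmath>
    #include <cstdio>

    int main() {
      double lm_score = -3.5;  // LM log likelihood of "i like cheese"
      double boost = 3.0;      // additive hot-word boost

      // Adding in log space moves the score toward 0.0 (100% likelihood)
      // and scales the underlying probability by a factor of exp(boost).
      double boosted = lm_score + boost;  // -0.5
      std::printf("p: %.4f -> %.4f (x%.1f)\n",
                  std::exp(lm_score), std::exp(boosted), std::exp(boost));
      return 0;
    }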

@lissyx (Collaborator) commented Sep 1, 2020

> This PR enables hot-word boosting (from the C client) with the two new flags --hot_words and --boost_coefficient.

Can we avoid limiting this to the C client? It's very likely people will want to use this part of the API from elsewhere, and in its current state it's completely unknown whether that works or not.

    if (!hot_words_.empty()) {
      // increase prob of prefix for every word
      // that matches a word in the hot-words list
      for (std::string word : ngram) {

Collaborator:
have you measured perf impact with scorers?

Contributor Author:
perf as in WER? or perf as in latency?

Collaborator:
latency

Contributor Author:
I have not measured the latency effects yet, no.

Are there any TC jobs that do this, or should I profile locally? What do you recommend?

Collaborator:
Unfortunately, you'd have to do it locally. Using perf should be quite easy.
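
For example, a local profiling session with Linux perf might look like this (binary name and file paths are illustrative, assuming the C client with the new flag):

    # record a call graph while decoding with a scorer and hot-words enabled
    perf record -g ./deepspeech --model model.pbmm --scorer kenlm.scorer \
        --audio test.wav --hot_words "friend:1.5,enemy:20.4"
    # then inspect where the decode time goes
    perf report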

@lissyx (Collaborator) left a comment:

Please expose it in the API as a real list of words, and please add:

  • basic CI testing for that feature
  • usage in the different bindings (Python, JS, .Net, Java), if you can; that would really be a good thing

Also, it looks like your current code breaks training and the CTC decoder, so please fix that.

@reuben (Contributor) commented Sep 1, 2020

> For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese", if we multiply this number by 0.5, we get -1.75, therefore doubling the likelihood of that sequence.

This isn't how log probabilities work, you're making exponential increases in the probability here. exp(-3.5) ~= 0.03 and exp(-1.75) ~= 0.17. This, combined with the fact that a single word will be boosted several times in the same beam as it appears in multiple n-grams, makes it hard to reason about the behavior of the coefficient. It should probably be an additive factor (multiplication in probability space).
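
To illustrate the distinction, here is a small standalone sketch (illustrative only; natural-log base assumed):

    #include <cmath>
    #include <cstdio>

    int main() {
      double log_p = -3.5;  // exp(-3.5) ~= 0.030

      // Multiplying the log score changes the probability exponentially:
      double scaled = log_p * 0.5;   // exp(-1.75) ~= 0.174, a ~5.8x increase, not 2x
      // Adding to the log score multiplies the probability by a constant factor:
      double shifted = log_p + 0.5;  // exp(-3.0) ~= 0.050, exactly exp(0.5)x

      std::printf("multiply by 0.5: %.3f -> %.3f\n", std::exp(log_p), std::exp(scaled));
      std::printf("add 0.5:         %.3f -> %.3f\n", std::exp(log_p), std::exp(shifted));
      return 0;
    }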

@lissyx (Collaborator) commented Sep 1, 2020

@JRMeyer To keep your API simpler, I suggest you move to a single entry point:

    DEEPSPEECH_EXPORT
    int DS_AddHotWord(ModelState* aCtx, const char* word, float boostCoefficient);

This entry point would add a new word to your std::vector (or maybe a set, because it would guarantee uniqueness). If the hot word does not exist, we add it with the given boost; if it is already in the set, we update its coefficient.

Depending on the use case, it could also be cool to expose (though I'm unsure it is really required):

    DEEPSPEECH_EXPORT
    int DS_ClearHotWords(ModelState* aCtx);

This would simply re-initialize the set of hot words.

With this API, you could more easily expose and update all our bindings to make the feature available (`const char **` is a bit painful via SWIG).
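
For what it's worth, a minimal sketch of how those two entry points might sit on top of a std::map (the hot_words_ member, the map type, and the return convention are assumptions for illustration, not the PR's actual code):

    #include <map>
    #include <string>

    // Hypothetical stand-in for the real ModelState defined by the library.
    struct ModelState {
      std::map<std::string, float> hot_words_;
    };

    int DS_AddHotWord(ModelState* aCtx, const char* word, float boostCoefficient) {
      // Insert the word, or update its coefficient if it is already present.
      aCtx->hot_words_[word] = boostCoefficient;
      return 0;  // 0 on success
    }

    int DS_ClearHotWords(ModelState* aCtx) {
      aCtx->hot_words_.clear();  // simply re-init the set of hot words
      return 0;
    }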

@JRMeyer (Contributor Author) commented Sep 10, 2020

>> For example, if KenLM returns -3.5 as the likelihood for the word sequence "i like cheese", if we multiply this number by 0.5, we get -1.75, therefore doubling the likelihood of that sequence.

> This isn't how log probabilities work, you're making exponential increases in the probability here. exp(-3.5) ~= 0.03 and exp(-1.75) ~= 0.17. This, combined with the fact that a single word will be boosted several times in the same beam as it appears in multiple n-grams, makes it hard to reason about the behavior of the coefficient. It should probably be an additive factor (multiplication in probability space).

Even though my initial intuition was wrong about how the boosting compounds, I still like the UX. Namely, if you're using this feature and trying to find the right boosting coefficient for your data, you know to sweep between 0 and 1, which isn't hard.

With an additive effect, the search space goes from (0, 1) to (0, infinity). The math is better, but the UX seems worse. I made the changes in 184189c, but I still have doubts. Thoughts?

@carlfm01 (Collaborator) left a comment:

Nice @JRMeyer, just missing the following on the IDeepSpeech interface:

        /// <summary>
        /// Add a hot-word.
        /// </summary>
        /// <param name="aWord">Some word</param>
        /// <param name="aBoost">Some boost</param>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void AddHotWord(string aWord, float aBoost);

        /// <summary>
        /// Erase entry for a hot-word.
        /// </summary>
        /// <param name="aWord">Some word</param>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void EraseHotWord(string aWord);

        /// <summary>
        /// Clear all hot-words.
        /// </summary>
        /// <exception cref="ArgumentException">Thrown on failure.</exception>
        public void ClearHotWords();

@JRMeyer (Contributor Author) commented Sep 22, 2020

> Nice @JRMeyer, just missing the following on the IDeepSpeech interface: […]

I set these as `unsafe void` in 5432f56

@carlfm01 (Collaborator) commented Sep 22, 2020

> I set these as `unsafe void` in 5432f56

Sorry, I forgot to delete the `public`; you did it right.

@lissyx self-requested a review on September 24, 2020
@lissyx (Collaborator) left a comment:

This is now looking quite good; just fix the Android test execution, and make sure to squash into one commit.

@JRMeyer merged commit 1eb155e into mozilla:master on Sep 24, 2020
@JRMeyer (Contributor Author) commented Sep 24, 2020

@lissyx -- it's one commit, but all the previous commit messages got appended into the single commit message :/ It doesn't look pretty, but yes, it is one commit.

@lissyx mentioned this pull request on Sep 25, 2020