Skip to content

feat(common/core/web): engine correction-prep optimizations#5319

Merged
jahorton merged 8 commits intomasterfrom
feat/common/core/web/correction-prep-optimizations
Jun 24, 2021
Merged

feat(common/core/web): engine correction-prep optimizations#5319
jahorton merged 8 commits intomasterfrom
feat/common/core/web/correction-prep-optimizations

Conversation

@jahorton
Copy link
Copy Markdown
Contributor

@jahorton jahorton commented Jun 21, 2021

An alternate approach from #5314.

Rather than outright turning off base fat-finger modeling, why not just threshold it within a tight time window? As seen here, it's quite simple to do.

Furthermore, after a bit of investigation as to what could possibly be responsible for #5314's point 3 making such a huge difference... I realized that the core Mock.from function was likely responsible.

  1. Mock.from did not get the sliding-context-window treatment
  2. Mock.from relies on the kmwstring.ts utility functions.

The thing is... there's no reason for point 2 to be necessary if provided an initial Mock. Which is the case for modeling potential fat-fingers, even before applying engine rules - so this "simple" optimization should actually have a very significant impact.

In testing, I've noticed that touch-alias elements may still be slow to 'load' whenever they receive focus. TouchAliasElement caret-positioning logic doesn't appear optimized for long text. That's a separate, though related, issue.


While everything needed for a proper "sliding window" approach isn't yet fully clear, I will say that all alternates should probably use the same base for their window. In particular, the starting index (within the context) of all windows should match, even if some keys output longer context than others. (So, no 'extra' sliding based on output.)

Such a rule should help simplify other aspects of the engine - particularly, application of suggestions and reversions - if maintaining existing behavior turns out to be non-trivial when that is introduced.

@mcdurdin
Copy link
Copy Markdown
Member

In testing, I've noticed that touch-alias elements may still be slow to 'load' whenever they receive focus. TouchAliasElement caret-positioning logic doesn't appear optimized for long text. That's a separate, though related, issue.

See #5312!

@mcdurdin
Copy link
Copy Markdown
Member

While everything needed for a proper "sliding window" approach isn't yet fully clear, I will say that all alternates should probably use the same base for their window.

In my experiment, the sliding window was internal to the algorithm -- stored values were all relative to start of string. It was just narrowing the search space. See #5314.

@jahorton
Copy link
Copy Markdown
Contributor Author

jahorton commented Jun 21, 2021

While everything needed for a proper "sliding window" approach isn't yet fully clear, I will say that all alternates should probably use the same base for their window.

In my experiment, the sliding window was internal to the algorithm -- stored values were all relative to start of string. It was just narrowing the search space. See #5314.

Refer to applySuggestion and applyReversion within the LanguageProcessor class. If we keep Mock.from reliant on the full context, this may not be an issue, but if we instead give Mock.from the sliding window treatment, the sliding window cannot be internal to the algorithm, as comparisons between "before" and "after" windows are likely necessary.

Note: #5314 does not include a sliding window on Transcription.preInput, which is what actually represents (tracked) context. The modifications (re: sliding window) only affected construction of the Transform, so it's thus very little surprise to me that #5314's step 1 didn't have much impact.

If we do give Mock.from the sliding-context-window treatment, we will need special handling to build a transform from the 'captured' before-state to the desired final-state, as their sliding windows will require non-trivial alignment.


Later edit:

After some extra analysis, I think we can safely build the fat-finger Transforms from a sliding context window. The biggest point of concern would be if we implemented a sliding window on tracking for the actual context state - Transcription.preInput. Fortunately, we can do the former without having to do the latter.

@mcdurdin
Copy link
Copy Markdown
Member

I've run some performance tests on my mobile. I manually set the textarea to display:none in order to take that out of the equation. Performance does not appear to be degrading noticably for longer texts now; see 20ms or so per touchend on a very long test context on my Android phone.

@jahorton jahorton marked this pull request as ready for review June 22, 2021 05:33
@jahorton
Copy link
Copy Markdown
Contributor Author

jahorton commented Jun 23, 2021

Bonus! New block of fun:

new unit tests

The two ~130ms tests shown there will outright fail at 250ms, rather than the default 2000ms. That may need a bit of tweaking if it turns out my dev machine has more performance than the build agents, of course. Also... that text? It's the full, compiled Web-core script. Figured it was an appropriate candidate for "extremely long text" that's already within the repo.

I added in some tests as a mild profiling-check for keystroke rule processing + fat-finger performance. As seen in the screenshot above... by far, the largest impact comes from the base keystroke. The base keystroke still doesn't use a sliding context during rule processing, as we need the full context to provide absolute indexing for determining the Transform from the previous context to the new context. Even the code in #5314 requires an absolute index for this.

I do have work on a local branch aimed to start addressing that if desired, but the work already done is probably an okay place to park things for the moment.

@jahorton
Copy link
Copy Markdown
Contributor Author

Good news on that last front:

Screen Shot 2021-06-23 at 9 52 00 AM

The variance there is a bit interesting, but both finished more quickly than on my development machine.

@mcdurdin
Copy link
Copy Markdown
Member

The variance there is a bit interesting, but both finished more quickly than on my development machine.

The VMs for Linux and Windows builds are virtual machines, so performance is likely to vary considerably.

@jahorton
Copy link
Copy Markdown
Contributor Author

jahorton commented Jun 23, 2021

The variance there is a bit interesting, but both finished more quickly than on my development machine.

The VMs for Linux and Windows builds are virtual machines, so performance is likely to vary considerably.

That's fair. Any preference for the timeout, then? Go with a 'loose' 1 sec?

Oh, and BTW - that was under the Linux VM.

@mcdurdin
Copy link
Copy Markdown
Member

That's fair. Any preference for the timeout, then? Go with a 'loose' 1 sec?

Let's leave it at 250ms and tweak if it proves necessary.

}

let context = new TranscriptionContext(Mock.from(target), this.configuration);
let context = new ContextWindow(Mock.from(target), this.configuration);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a new reference at the top of the file?

Copy link
Copy Markdown
Contributor Author

@jahorton jahorton Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely one of the "quirks" of the whole <reference path="" /> setup. Since a file that references this makes the new reference first... no, not in a literal sense.

Not saying it's a bad idea, though - adding it may indeed be more clear.

Comment on lines +60 to +61
if(coreSourceCode._kmwLength() > 1000000) {
coreSourceCode = coreSourceCode._kmwSubstring(0, 1000000);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(coreSourceCode._kmwLength() > 1000000) {
coreSourceCode = coreSourceCode._kmwSubstring(0, 1000000);
if(coreSourceCode.length() > 1000000) {
coreSourceCode = coreSourceCode.substring(0, 1000000);

There's really no reason to use kmw forms here is there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, that's fair. I used it mostly for cross-unit-test consistency. True, KMW's source code doesn't actually have SMP characters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length is a property, not a method, though...

@jahorton jahorton merged commit b395eab into master Jun 24, 2021
@jahorton jahorton deleted the feat/common/core/web/correction-prep-optimizations branch June 24, 2021 04:20
@keyman-server
Copy link
Copy Markdown
Collaborator

Changes in this pull request will be available for download in Keyman version 15.0.74-alpha

@keyman-server
Copy link
Copy Markdown
Collaborator

Changes in this pull request will be available for download in Keyman version 15.0.83-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants