
spec: Suggestions respect capitalization #3720

@mcdurdin

Description


This is part of a broader specification for Caps Lock support on touch devices. Development of this touches multiple components; each component will be implemented in a separate PR, referencing this issue.

Related features

Introduction

When a user starts typing at the start of a sentence, KeymanWeb currently does not respect start-of-sentence capitalization in its suggestions. Because the lexical model dictionary ignores case when matching, we need a way to adjust the suggestions to match the casing of the input: Initial case or ALL CAPS.

A flag named languageUsesCasing on the model determines whether casing rules should be applied to suggestions. If this flag is not set, none of the following functionality applies.

A model can define an applyCasing() method consumed by wordform2Key():

```typescript
// Separately declared, safe for models to reference
declare type CasingForm = 'lower' | 'initial' | 'upper';

applyCasing(form: CasingForm, text: string): string
```

Initial case uppercases only the first letter of the string and leaves the rest unchanged. Lower case lowercases the whole string, and upper case uppercases the whole string.

A default implementation uses toLowerCase(), toUpperCase() and a simple toInitialCase() function. Developers may override applyCasing(); an override should handle its special cases and then delegate to defaultApplyCasing() for everything else.
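As a minimal sketch of what such a default could look like (not the shipped implementation), the block below implements the three forms using only ES5 string methods. The helper toInitialCase() is hypothetical; it takes two UTF-16 code units when the string starts with a surrogate pair, so a supplementary-plane letter is not split the way charAt(0) would split it.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical helper: uppercase the first code point, leave the rest untouched.
function toInitialCase(text: string): string {
  const code = text.charCodeAt(0);
  // A high surrogate followed by its low surrogate forms one code point.
  const len = code >= 0xD800 && code <= 0xDBFF && text.length > 1 ? 2 : 1;
  return text.substring(0, len).toUpperCase() + text.substring(len);
}

// Sketch of a default casing function using locale-independent methods only.
function defaultApplyCasing(form: CasingForm, text: string): string {
  switch (form) {
    case 'lower':
      return text.toLowerCase();
    case 'upper':
      return text.toUpperCase();
    case 'initial':
      return toInitialCase(text);
    default:
      return text;
  }
}
```

Note that 'initial' does not lowercase the remainder, so "iPhone" becomes "IPhone" rather than "Iphone", matching the definition above.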

Example for Turkish

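```typescript
// Turkish needs explicit dotted/dotless i mappings, because the default
// toUpperCase()/toLowerCase() use locale-independent Unicode rules.
function applyCasing(form: CasingForm, text: string,
                     defaultApplyCasing: (form: CasingForm, text: string) => string): string {
    switch(form) {
    case 'lower':
        // 'I' -> 'ı' (dotless) and 'İ' -> 'i' before the default lowercasing
        return defaultApplyCasing(form, text
                    .replace(/I/g, 'ı')
                    .replace(/İ/g, 'i'));
    case 'upper':
        // 'ı' -> 'I' and 'i' -> 'İ' (dotted) before the default uppercasing
        return defaultApplyCasing(form, text
                    .replace(/ı/g, 'I')
                    .replace(/i/g, 'İ'));
    case 'initial':
        // Uppercase only the first character, reusing the Turkish 'upper' rule
        return applyCasing('upper', text.charAt(0), defaultApplyCasing) + text.substring(1);
    default:
        return text;
    }
}
```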

Note: toLocaleUpperCase()/toLocaleLowerCase() only accept a locale argument from Chrome M58 on Android (we currently support M35), so we cannot rely on them at this time.
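To illustrate why the Turkish example maps the dotted and dotless i explicitly, this small sketch shows what the locale-independent methods return for those letters. The values follow the default Unicode case mappings, which are not the Turkish ones.

```typescript
// Default (non-Turkish) Unicode case mappings for the letter i.
const upperI = 'i'.toUpperCase();   // 'I'      (Turkish expects 'İ', U+0130)
const lowerI = 'I'.toLowerCase();   // 'i'      (Turkish expects 'ı', U+0131)
// U+0130 lowercases to 'i' followed by U+0307 COMBINING DOT ABOVE,
// i.e. two UTF-16 code units rather than the single Turkish 'i'.
const lowerDottedI = '\u0130'.toLowerCase();

console.log(upperI, lowerI, lowerDottedI.length); // prints "I i 2"
```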

Mechanism

Start-of-token predictions: KeymanWeb should report the current on-screen keyboard layer to the LMLayer. At the start of a word, this allows predictions to be displayed with ‘initial’ casing.

To support this, the start of a token will trigger a predictive round of fat-finger execution based on the current on-screen keyboard layer. The LMLayer will request this from KeymanWeb.
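One possible way to turn the reported layer into a casing form for start-of-token predictions is sketched below. The function casingForLayer() and the layer ids 'shift' and 'caps' are hypothetical; real keyboards may name their layers differently, and mapping a Caps Lock layer to 'upper' is an assumption beyond the ‘initial’ case described above.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical mapping from the on-screen layer id reported by KeymanWeb.
// 'lower' here means "leave the model's casing unchanged", as in the
// Testing Casing section.
function casingForLayer(layerId: string): CasingForm {
  switch (layerId) {
    case 'caps':
      return 'upper';    // assumed Caps Lock layer: ALL CAPS suggestions
    case 'shift':
      return 'initial';  // assumed Shift layer: capitalize the first letter
    default:
      return 'lower';    // any other layer: no case modification
  }
}
```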

Mid-word predictions: the LMLayer should check the current context token to determine the casing requirements for suggestions. If the first character is lowercase, predictions should be provided without case modification. This preserves proper names, acronyms, and other special casing stored in the model, which is sufficient for the majority of languages.

Else, if the token is only one character long, or only its first character is uppercase, applyCasing(initial) should be applied to each suggestion, unless the first letter of the suggestion is already uppercase (again allowing for acronyms and special cases).

Otherwise, if the token is all uppercase, applyCasing(upper) should be applied to each suggestion.
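The three mid-word rules can be sketched as follows. This is an illustration, not the LMLayer code: casingForToken(), adjustSuggestion() and isUpper() are hypothetical names, and treating uncased first characters (digits, punctuation) like lowercase, and returning null for mixed case such as "McD", are assumptions the rules above do not spell out.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// A character counts as uppercase if lowercasing changes it and uppercasing does not.
const isUpper = (ch: string): boolean =>
  ch !== ch.toLowerCase() && ch === ch.toUpperCase();

// Returns the casing to apply to suggestions, or null for "do not modify".
function casingForToken(token: string): CasingForm | null {
  if (token.length === 0 || !isUpper(token.charAt(0))) {
    return null;        // first character lowercase (or uncased): keep model casing
  }
  const rest = token.substring(1);
  if (rest.length === 0 || rest === rest.toLowerCase()) {
    return 'initial';   // one character long, or only the first is uppercase
  }
  if (token === token.toUpperCase()) {
    return 'upper';     // whole token is uppercase
  }
  return null;          // mixed case, e.g. "McD": not covered by the rules
}

function adjustSuggestion(token: string, suggestion: string,
                          applyCasing: (form: CasingForm, text: string) => string): string {
  const form = casingForToken(token);
  if (form === null) return suggestion;
  // Skip 'initial' when the suggestion already starts uppercase (acronyms, names).
  if (form === 'initial' && isUpper(suggestion.charAt(0))) return suggestion;
  return applyCasing(form, suggestion);
}
```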

Testing Casing

As long as the input text follows one of the three standard casing patterns, testing it against itself (input_text == applyCasing(case, input_text)) should be mostly sufficient to detect its casing. However, if the input is all lowercase (the default case), we should not modify the suggestion’s base casing: if a suggestion’s base form in the lexical model is initial or upper case, that implies the wordform is not valid in lowercase.

So, in broad-stroke pseudocode…

```
if get_case_of(input_text) == lower
    return suggestion_text
else
    return applyCasing(get_case_of(input_text), suggestion_text)
```
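A runnable sketch of the pseudocode above follows, using the self-test for detection. The applyCasing() here is a minimal stand-in for the model's function, and getCaseOf()/matchCasing() are hypothetical names. The order of the tests matters: "ABC" is unchanged by both 'upper' and 'initial', so 'upper' is tested first, but only for input longer than one character, so that a single capital letter is treated as initial case per the mid-word rules. Returning the suggestion unchanged when no pattern matches is an assumption.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// Minimal stand-in for the model's applyCasing().
function applyCasing(form: CasingForm, text: string): string {
  switch (form) {
    case 'lower': return text.toLowerCase();
    case 'upper': return text.toUpperCase();
    default:      return text.charAt(0).toUpperCase() + text.substring(1); // 'initial'
  }
}

// Detect casing by testing the input against itself.
function getCaseOf(input: string): CasingForm | null {
  if (input === applyCasing('lower', input)) return 'lower';
  if (input.length > 1 && input === applyCasing('upper', input)) return 'upper';
  if (input === applyCasing('initial', input)) return 'initial';
  return null; // e.g. "aB": matches none of the standard patterns
}

function matchCasing(input: string, suggestion: string): string {
  const form = getCaseOf(input);
  // Lowercase input keeps the model's base casing; so does unrecognized casing.
  if (form === null || form === 'lower') return suggestion;
  return applyCasing(form, suggestion);
}
```

As the text notes, the self-test is only "mostly sufficient": a token such as "McD" is unchanged by 'initial' and is therefore detected as initial case.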

C3.1 Changes to .model.ts format

Additions:

  • applyCasing
  • languageUsesCasing
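For illustration only, the two additions might appear in a model source roughly as below. The interface CasingModelFields is hypothetical and simplified; the format and sources fields are assumed to follow the existing 'trie-1.0' model source, and the real LexicalModelSource type may differ.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical, simplified shape of the two new .model.ts fields.
interface CasingModelFields {
  languageUsesCasing?: boolean;
  applyCasing?: (form: CasingForm, text: string,
                 defaultApplyCasing: (form: CasingForm, text: string) => string) => string;
}

const source: CasingModelFields & { format: string; sources: string[] } = {
  format: 'trie-1.0',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
  // Optional override; this one has no special cases and simply delegates.
  applyCasing: (form, text, defaultApplyCasing) => defaultApplyCasing(form, text),
};
```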

C3.2 Changes to Compiler

The compiler must support everything listed in C3.1.

It may need additional validation: the new applyCasing function must compile correctly.

C3.3 Changes to LMLayer

  • adds case-matching
    • is called toward the end of the prediction process
    • is a transform on the most likely suggestions
    • requires merging of duplicate suggestions after the transform (e.g. god -> God duplicates an existing God)
  • uses languageUsesCasing to decide whether case-matching applies
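The LMLayer step described above can be sketched as a post-processing pass: recase the suggestions, then merge duplicates the transform creates. The Suggestion shape and caseMatchSuggestions() are hypothetical, and summing the probabilities of merged entries is an assumption; keeping the maximum would be an equally plausible choice.

```typescript
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical, simplified suggestion: display text plus probability.
interface Suggestion { text: string; p: number; }

function caseMatchSuggestions(suggestions: Suggestion[], form: CasingForm | null,
                              languageUsesCasing: boolean,
                              applyCasing: (form: CasingForm, text: string) => string): Suggestion[] {
  // Casing disabled for this model, or lowercase/unknown input: leave as-is.
  if (!languageUsesCasing || form === null || form === 'lower') {
    return suggestions;
  }
  const merged = new Map<string, Suggestion>();
  for (const s of suggestions) {
    const text = applyCasing(form, s.text);
    const existing = merged.get(text);
    if (existing) {
      existing.p += s.p;               // e.g. "god" -> "God" collides with "God"
    } else {
      merged.set(text, { text, p: s.p });
    }
  }
  // Re-rank after merging, most likely first.
  return Array.from(merged.values()).sort((a, b) => b.p - a.p);
}
```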

Additional notes

Future functionality plans remain in the Google Doc. We will revisit if/when we decide to implement.

Related Issues and discussions
