spec: Suggestions respect capitalization

This is part of a broader specification for Caps Lock support on touch devices. Development of this touches multiple components; each component will be implemented in a separate PR, referencing this issue.

## Related features

* #3620: spec: Caps Lock layer for touch layouts
* #3621: spec: Start of text/sentence selects shift layer

## Introduction

When a user starts typing at the start of a sentence, KeymanWeb currently does not respect start-of-sentence capitalization. Because the lexical model dictionary ignores case when matching, we need a way to adjust suggestions to match the input casing: Initial case or ALL CAPS.

A flag named `languageUsesCasing` on the model determines if casing rules should be applied to suggestions. If this flag is not set, none of the following functionality should apply.

A model can define an `applyCasing()` method consumed by `wordform2Key()`:

```ts
// Separately declared, safe for models to reference
declare type CasingForm = 'lower' | 'initial' | 'upper';

applyCasing(form: CasingForm, text: string): string
``` 

Initial case uppercases only the first letter of the string and leaves the rest untouched. Lower case lowercases the whole string, and upper case uppercases the whole string.

A default implementation uses `toLowerCase()`, `toUpperCase()` and a simple `toInitialCase()` function. Developers may override `applyCasing()`; an override should handle its special cases first and then simply call `defaultApplyCasing()`, which is passed to the override as in the Turkish example below.
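A minimal sketch of what such a default could look like, assuming nothing beyond the built-in string methods named above. The body of `toInitialCase()` and the surrogate-pair handling are assumptions for illustration, not the shipped implementation.

```ts
type CasingForm = 'lower' | 'initial' | 'upper';

// Assumed helper: uppercases only the first character, leaves the rest untouched.
function toInitialCase(text: string): string {
    // Take two code units when the string starts with a surrogate pair,
    // so a character outside the BMP is not split in half.
    const code = text.charCodeAt(0);
    const headLength = (code >= 0xD800 && code <= 0xDBFF && text.length > 1) ? 2 : 1;
    return text.substring(0, headLength).toUpperCase() + text.substring(headLength);
}

// Sketch of the default: locale-independent casing for each of the three forms.
function defaultApplyCasing(form: CasingForm, text: string): string {
    switch (form) {
        case 'lower':
            return text.toLowerCase();
        case 'upper':
            return text.toUpperCase();
        case 'initial':
            return toInitialCase(text);
        default:
            return text;
    }
}
```

For example, `defaultApplyCasing('initial', 'hello world')` gives `'Hello world'`, with only the first letter changed.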

## Example for Turkish

```ts
function applyCasing(form: CasingForm, text: string,
                     defaultApplyCasing: (form: CasingForm, text: string) => string): string {
    switch(form) {
    case 'lower':
        // Map the Turkish capitals first; the default then handles every other letter.
        return defaultApplyCasing(form, text
                    .replace(/I/g, 'ı')
                    .replace(/İ/g, 'i'));
    case 'upper':
        return defaultApplyCasing(form, text
                    .replace(/ı/g, 'I')
                    .replace(/i/g, 'İ'));
    case 'initial':
        // Uppercase only the first character, using the Turkish rules above.
        return applyCasing('upper', text.charAt(0), defaultApplyCasing) + text.substring(1);
    default:
        return text;
    }
}
```

Note: `toLocaleUpperCase`/`toLocaleLowerCase` are only supported in Chrome M58 and later on Android (we currently support M35), so we cannot use them at this time.
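To illustrate why the special cases are needed, the sketch below shows what the locale-independent `toUpperCase()`/`toLowerCase()` do to the Turkish dotted and dotless i, and how pre-mapping those letters gives the expected result. The names `turkishUpper` and `turkishLower` are illustrative only.

```ts
// Locale-independent casing, as a default implementation would use it.
const upperDotted = 'i'.toUpperCase();         // 'I' (U+0049), not Turkish 'İ' (U+0130)
const lowerDotless = 'I'.toLowerCase();        // 'i' (U+0069), not Turkish 'ı' (U+0131)
const lowerCapitalDotted = 'İ'.toLowerCase();  // 'i' followed by U+0307 (combining dot above)

// Pre-mapping the affected letters before the default call, as in the Turkish example.
function turkishUpper(text: string): string {
    return text.replace(/ı/g, 'I').replace(/i/g, 'İ').toUpperCase();
}

function turkishLower(text: string): string {
    return text.replace(/I/g, 'ı').replace(/İ/g, 'i').toLowerCase();
}
```

With the pre-mapping, `turkishUpper('istanbul')` yields `'İSTANBUL'` and `turkishLower('ISPARTA')` yields `'ısparta'`.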

## Mechanism

**Start of token predictions:** KeymanWeb should report the current on-screen keyboard layer to the LMLayer. At start-of-word, this allows predictions to be displayed with ‘initial’ casing.

To assist with this, ‘start of token’ will trigger a _predictive_ round of fat-finger execution based on the current on-screen keyboard layer. The LMLayer will request this of KeymanWeb.
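As a rough illustration of how the reported layer could drive start-of-token casing, the sketch below maps a layer id to a `CasingForm`. The layer ids (`shift`, `caps`) and the name `casingForLayer` are assumptions made for this example, not part of the spec.

```ts
type CasingForm = 'lower' | 'initial' | 'upper';

// Assumed layer ids; a real touch layout may name its layers differently.
function casingForLayer(layerId: string): CasingForm {
    switch (layerId) {
        case 'shift':
            // Shift layer at start of token: capitalize the first letter only.
            return 'initial';
        case 'caps':
            // Caps Lock layer: the whole word is expected in upper case.
            return 'upper';
        default:
            // Any other layer: leave suggestions in the model's own casing.
            return 'lower';
    }
}
```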

**Mid-word predictions:** The LMLayer should check the current context token to determine casing requirements for suggestions. If the first character is lowercase, then predictions should be provided without case modification. This preserves proper names, acronyms, and special casing provided in the model, which covers the majority of languages.

Else, if the token is only one character long, or only the first character is upper case, then `applyCasing('initial', ...)` should be applied to each suggestion, unless the first letter of the suggestion is already uppercase (again, allowing for acronyms and special cases).

Otherwise, if the token is all upper case, then `applyCasing('upper', ...)` should be applied to each suggestion.
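The three mid-word rules can be sketched as a single function. This is an illustration under assumptions: `caseSuggestion`, `isUpper`, and `simpleApplyCasing` are hypothetical names, `simpleApplyCasing` stands in for the model's `applyCasing()`, and leaving mixed-case tokens such as `ThE` unchanged is a choice the spec does not make explicitly.

```ts
type CasingForm = 'lower' | 'initial' | 'upper';
type ApplyCasing = (form: CasingForm, text: string) => string;

// Stand-in for the model's applyCasing(); a real model may override it.
const simpleApplyCasing: ApplyCasing = (form, text) => {
    switch (form) {
        case 'lower': return text.toLowerCase();
        case 'upper': return text.toUpperCase();
        case 'initial': return text.charAt(0).toUpperCase() + text.substring(1);
        default: return text;
    }
};

// A character counts as uppercase if it is cased and uppercasing leaves it unchanged.
function isUpper(ch: string, applyCasing: ApplyCasing): boolean {
    return ch === applyCasing('upper', ch) && ch !== applyCasing('lower', ch);
}

function caseSuggestion(token: string, suggestion: string, applyCasing: ApplyCasing): string {
    // Rule 1: lowercase (or caseless) first character, keep the model's casing.
    if (!isUpper(token.charAt(0), applyCasing)) {
        return suggestion;
    }
    // Rule 2: one character long, or only the first character is upper case.
    const rest = token.substring(1);
    if (rest === applyCasing('lower', rest)) {
        return isUpper(suggestion.charAt(0), applyCasing)
            ? suggestion
            : applyCasing('initial', suggestion);
    }
    // Rule 3: the whole token is upper case.
    if (token === applyCasing('upper', token)) {
        return applyCasing('upper', suggestion);
    }
    // Mixed casing: not covered by the rules above, so leave unchanged (assumption).
    return suggestion;
}
```

For instance, with the token `Th` the suggestion `the` becomes `The`, while with the token `th` the suggestion `Paris` is returned as is.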

## Testing Casing

So long as the input text adheres to one of the three standard casing patterns, testing it against itself (via `input_text == applyCasing(case, input_text)`) should be mostly sufficient. However, if the input is all lowercase (the default case), we should not modify the suggestion’s base casing: if the suggestion’s base form in the lexical model is either `initial` or `upper`, that implies the wordform is invalid for `lower`-casing.

So, in broad-stroke pseudocode…

```ts
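// get_case_of() stands for the self-test described above; match_casing is an illustrative name.
function match_casing(input_text: string, suggestion_text: string): string {
    const input_case = get_case_of(input_text);
    if (input_case == 'lower') {
        return suggestion_text;  // default case: keep the model's base casing
    }
    return applyCasing(input_case, suggestion_text);
}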
```
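One possible shape for the `get_case_of()` step, using the test-against-itself approach. The name `getCaseOf` and the stand-in `simpleApplyCasing` are hypothetical. Note one assumption: because initial casing only changes the first letter, the `initial` check is made against the lowercased text, otherwise an all-caps token such as `TH` would also pass it.

```ts
type CasingForm = 'lower' | 'initial' | 'upper';
type ApplyCasing = (form: CasingForm, text: string) => string;

// Stand-in for the model's applyCasing(); a real model may override it.
const simpleApplyCasing: ApplyCasing = (form, text) => {
    switch (form) {
        case 'lower': return text.toLowerCase();
        case 'upper': return text.toUpperCase();
        case 'initial': return text.charAt(0).toUpperCase() + text.substring(1);
        default: return text;
    }
};

// Returns the casing pattern of the input, or null if it matches none of the three.
function getCaseOf(inputText: string, applyCasing: ApplyCasing): CasingForm | null {
    // Checked first so that caseless text (digits, empty string) counts as the default.
    if (inputText === applyCasing('lower', inputText)) {
        return 'lower';
    }
    // Strict initial pattern: first letter upper, everything else lower.
    if (inputText === applyCasing('initial', applyCasing('lower', inputText))) {
        return 'initial';
    }
    if (inputText === applyCasing('upper', inputText)) {
        return 'upper';
    }
    return null; // mixed casing such as "tHe"
}
```

With this ordering a single uppercase letter such as `T` is reported as `initial`, which matches the one-character rule in the Mechanism section.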

## C3.1 Changes to .model.ts format

Additions:

*   `applyCasing`
*   `languageUsesCasing`
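A sketch of how the two additions might appear in a model source. The `CasingOptions` interface is a hypothetical, partial shape written for this example; the other fields of a real `.model.ts` file are omitted, and the three-argument form of `applyCasing` follows the Turkish example above.

```ts
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical partial shape: only the two new properties are shown.
interface CasingOptions {
    languageUsesCasing?: boolean;
    applyCasing?: (form: CasingForm, text: string,
                   defaultApplyCasing: (form: CasingForm, text: string) => string) => string;
}

// In a real model file this object would be part of the default export.
const source: CasingOptions = {
    // Without this flag, none of the casing behaviour applies.
    languageUsesCasing: true,
    applyCasing: function (form, text, defaultApplyCasing) {
        // Language-specific special cases would be handled here,
        // before falling back to the default behaviour.
        return defaultApplyCasing(form, text);
    }
};
```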

## C3.2 Changes to Compiler

Incorporates everything listed in C3.1.

May need additional validation:  the new function (`applyCasing`) must compile properly.

## C3.3 Changes to LMLayer

*   Adds case-matching, which
    *   is called toward the end of the prediction process
    *   is a transform on the most likely suggestions
    *   requires merging duplicate suggestions after the transform (e.g. god->God duplicates an existing God)
*   Uses `languageUsesCasing`
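The duplicate-merging step could look roughly like the sketch below. The `Suggestion` shape, the name `mergeDuplicates`, and the choice to sum the probabilities of merged entries are all assumptions for illustration; the spec only states that duplicates must be merged.

```ts
// Hypothetical suggestion shape: display text plus a probability.
interface Suggestion {
    text: string;
    p: number;
}

// Merges suggestions whose text is identical after the casing transform.
function mergeDuplicates(suggestions: Suggestion[]): Suggestion[] {
    const merged: Suggestion[] = [];
    const indexByText: { [text: string]: number } = {};
    for (const s of suggestions) {
        if (Object.prototype.hasOwnProperty.call(indexByText, s.text)) {
            // Same text seen before: combine the probabilities (assumed policy).
            merged[indexByText[s.text]].p += s.p;
        } else {
            indexByText[s.text] = merged.length;
            merged.push({ text: s.text, p: s.p });
        }
    }
    // Most likely suggestion first.
    return merged.sort((a, b) => b.p - a.p);
}
```

For example, if `god` has been transformed to `God` and the model also supplied `God` directly, the two entries collapse into one.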

## Additional notes

Future functionality plans remain in the Google Doc. We will revisit if/when we decide to implement.

## Related Issues and discussions

* #2638
* https://community.software.sil.org/t/problem-with-caps-lock-output/3718
* [Original design document](https://docs.google.com/document/d/1J1WX41trb7GtPMM_VkfnV3XNOd3WFfHRlXdbf6YR7U8/edit#heading=h.vn6hrktkdivc)
* [RFC](https://community.software.sil.org/t/request-for-comment-specification-for-suggestions-respect-capitalization/3973)

spec: Suggestions respect capitalization #3720
