
feat(core): support normalization in Core 🌱 #9999

@mcdurdin

Description

Impacted modules

  • Core Consumers:
    • Keyman Engine for Windows
    • Keyman Engine for macOS
    • Keyman Engine for Linux
    • Keyman Developer Debugger
  • Keyman Core
  • Core Keyboard Processors:
    • LDML Keyboard Processor
  • Unit tests

Steps

  1. Add a Core API to convert km_core_actions to km_core_action_item* (mirrors action_item_list_to_actions_object)

  2. Update Engines to always call km_core_state_context_set_if_needed, and stop managing their own cached context.

  3. LDML processor can then generate km_core_actions and existing engines can consume it with no further modification.

  4. Normalization support for LDML keyboards:

  • input context (km_core_state_context_set_if_needed)
  • and action output (km_core_state_get_actions and actions_object_to_action_item_list)

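A toy model of the conversion in step 1, with hypothetical shapes for both representations (the real km_core_actions and km_core_action_item definitions live in Core's headers; the tags and field names here are purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class Actions:
    # Hypothetical mirror of km_core_actions: one consolidated struct.
    code_points_to_delete: int = 0
    output: str = ""
    emit_keystroke: bool = False

def actions_to_items(a: Actions) -> list[tuple[str, object]]:
    """Flatten the consolidated struct into an ordered action-item
    queue, roughly what a km_core_actions -> km_core_action_item*
    converter would produce (tag names are illustrative)."""
    items: list[tuple[str, object]] = []
    for _ in range(a.code_points_to_delete):
        items.append(("BACK", 1))
    if a.output:
        items.append(("CHAR", a.output))
    if a.emit_keystroke:
        items.append(("EMIT_KEYSTROKE", None))
    items.append(("END", None))
    return items
```

Because the struct is a summary and the queue is a sequence, the conversion is lossy-free in this direction; the reverse direction (mirroring action_item_list_to_actions_object) has to coalesce runs of items instead.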
17.0 target API usage is:

  • kmn: emit queue
  • ldml: emit struct
  • windows: read queue
  • mac: read struct
  • linux: read queue
  • debugger: read queue
  • ldml unit tests: read struct
  • Core will translate between the two representations only as needed; this way, the big blocker (the debugger) can wait until 18.0.

17.0 TODO

Move to km_core_state_context_set_if_needed

Longer term

  1. Move to the action struct in Engine for Windows, Developer Debugger, and Engine for Linux (#10353) {tech debt}
  • Removes dead-wood APIs
  • Possible difficulties with the Keyman Developer debugger (it will need sideband data on the cached context, as the debugger is, and must continue to be, marker-aware)
  2. Move to the action struct for output from the kmn Keyboard Processor. The Action Queue then becomes an internal component of the kmn Keyboard Processor (and the debugger will query the kmn Keyboard Processor directly).

Implementation notes

In km_core_state_context_set_if_needed, we'll need to do something like this:

  // TODO: normalization
  km_core_cp* text;
  switch(get_keyboard_processor_normalization()) {
    case NFD: text = normalize_nfd(application_context); break;
    case NFC: text = normalize_nfc(application_context); break;
    default: /* NFU: pass the application context through unchanged */ text = application_context;
  }
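A rough Python model of that dispatch, using `unicodedata` from the standard library (the `Normalization` enum and `normalize_context` name are illustrative, not the Core API):

```python
import unicodedata
from enum import Enum

class Normalization(Enum):
    NFU = 0   # Normalization Form Unknown: pass through untouched
    NFC = 1
    NFD = 2

def normalize_context(app_context: str, form: Normalization) -> str:
    """Mirror of the switch in km_core_state_context_set_if_needed:
    transform the application context into the form the keyboard
    processor expects, or leave it alone for NFU."""
    if form is Normalization.NFD:
        return unicodedata.normalize("NFD", app_context)
    if form is Normalization.NFC:
        return unicodedata.normalize("NFC", app_context)
    return app_context  # NFU: keep the app's exact text
```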

In km_core_state_get_actions and actions_object_to_action_item_list, Core will need to track the context back to the last base codepoint (per Unicode spec) to give a starting anchor for normalization of output.
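A minimal sketch of finding that anchor, approximating "base codepoint" as any starter (canonical combining class 0); a full implementation would also consult NFC_Quick_Check per UAX #15:

```python
import unicodedata

def normalization_anchor(context: str) -> int:
    """Walk back from the end of the context to the last base
    codepoint (ccc == 0), which can serve as the starting anchor
    for renormalizing output.  ccc == 0 is an approximation of a
    normalization-stable position."""
    i = len(context)
    while i > 0 and unicodedata.combining(context[i - 1]) != 0:
        i -= 1
    return max(i - 1, 0)  # index of the anchor codepoint
```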

Core will need to maintain a copy of the exact text from the app as well as the NFD cached context for LDML, and sync the two up to determine the number of codepoints to delete from the app. Remember that the input context is in NFU ("Normalization Form Unknown") and may well have mixed content; we need to know exactly how many codepoints to delete from that mixed content.

Application context synchronization

Note: this discussion is working in UTF32; if working in UTF16 then consideration needs to be given to whether counts are in codepoints or codeunits.

Given an application context in NFU and a cached context in NFD (for the purposes of this discussion, stripped of markers), we need to be able to calculate the number of characters to delete.

  • The transform function has two parameters: del (number of characters) and ins (string to insert).
  • cached_context is always NFD.
  • app_context is always NFU.

The LDML keyboard processor will return a number of NFD codepoints to delete. However, this number cannot be passed straight to the app, because the normalization of the string may not match up. We also need to consider normalization with string concatenation. This requires a two-step process:

  1. Normalize the insertion side of the concatenation boundary (https://unicode.org/reports/tr15/#Concatenation):
  • remove del characters from cached_context
  • while cached_context is not empty and the last character of cached_context is not a stable character, prepend that character to ins and increment del.
let chopped_cached_context = cached_context.substr(0, cached_context.length - del)
while(chopped_cached_context != "" && !is_stable(chopped_cached_context.last())) {
  let ch = chopped_cached_context.pop()
  ins.prepend(ch)
  del++
}
  2. Calculate the number of characters to delete from app_context, plus any fixups that need to be prepended to the inserted text ins because the change happened over a normalization boundary:
let app_del = 0
let new_app_context = app_context
let chopped_cached_context = cached_context.substr(0, cached_context.length - del)

while(nfd(new_app_context).length > chopped_cached_context.length) {
  new_app_context.pop()
  app_del++
}

ins.prepend(chopped_cached_context.substr(nfd(new_app_context).length))
  • app_del will then be the number of characters to delete from the document
  • ins will be the text to insert
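Under the same assumptions as the pseudocode above (strings of codepoints, markers already stripped, and `is_stable` approximated by ccc == 0), the two steps can be sketched as runnable Python:

```python
import unicodedata

def is_stable(ch: str) -> bool:
    # Approximation: treat any starter (ccc == 0) as a stable anchor.
    # A full check would also require NFC_Quick_Check == Yes (UAX #15).
    return unicodedata.combining(ch) == 0

def adjust_transform(cached_context: str, del_count: int, ins: str):
    """Step 1: widen the deletion back to a stable character so the
    insertion side of the concatenation boundary normalizes cleanly."""
    chopped = cached_context[: len(cached_context) - del_count]
    while chopped and not is_stable(chopped[-1]):
        ins = chopped[-1] + ins
        chopped = chopped[:-1]
        del_count += 1
    return chopped, del_count, ins

def app_deletions(app_context: str, chopped_cached_context: str, ins: str):
    """Step 2: compute how many codepoints to delete from the NFU
    application context, plus any fixup text to prepend to ins when
    the change straddled a normalization boundary."""
    new_app_context = app_context
    app_del = 0
    while len(unicodedata.normalize("NFD", new_app_context)) > len(chopped_cached_context):
        new_app_context = new_app_context[:-1]
        app_del += 1
    fixup = chopped_cached_context[len(unicodedata.normalize("NFD", new_app_context)):]
    return app_del, fixup + ins
```

For example, with an NFC app context "café" and NFD cached context "cafe\u0301", an LDML transform deleting one NFD codepoint (the U+0301) must become "delete one codepoint (é) from the app, insert e".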

UTF-16 Considerations

Core has enough data to provide both the number of codepoints and the number of UTF-16 code units to delete. We recommend providing both in the actions object to consumers, as that covers both of our deletion use cases (injecting backspaces vs. direct string manipulation).
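A small sketch of computing both counts for a deletion span, assuming Python strings of codepoints (supplementary-plane characters cost two UTF-16 code units):

```python
def deletion_counts(deleted_text: str) -> tuple[int, int]:
    """Return (codepoints, utf16_code_units) for a span of text to be
    deleted, so a consumer can either inject that many backspaces
    (codepoint-based) or adjust a UTF-16 buffer directly."""
    codepoints = len(deleted_text)
    utf16_units = sum(2 if ord(ch) > 0xFFFF else 1 for ch in deleted_text)
    return codepoints, utf16_units
```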
