Impacted modules
- Core Consumers:
  - Keyman Engine for Windows
  - Keyman Engine for macOS
  - Keyman Engine for Linux
  - Keyman Developer Debugger
- Keyman Core:
  - Keyboard Processors
  - Unit tests
Steps
- Add a Core API to convert km_core_actions to km_core_action_item* (mirrors action_item_list_to_actions_object)
- Update Engines to always call km_core_state_context_set_if_needed, and stop managing their own cached context.
- The LDML processor can then generate km_core_actions, and existing engines can consume it with no further modification.
- Normalization support for LDML keyboards:
  - input context (km_core_state_context_set_if_needed)
  - action output (km_core_state_get_actions and actions_object_to_action_item_list)
17.0 target API usage is:
- kmn: emit queue
- ldml: emit struct
- windows: read queue
- mac: read struct
- linux: read queue
- debugger: read queue
- ldml unit tests: read struct
- Core will translate only as needed; this way, the big blocker (the debugger) can wait until 18.0.
17.0 TODO
- Move to km_core_state_context_set_if_needed:
  - ➡ chore(windows): remove cached context management from Keyman Engine #10052
  - ➡ feat(mac): invoking Keyman Core from Keyman Engine for Mac 🍕 #8403
  - #10212
  - #10213
  - 🌱 #10215
Longer term
- Move to the action struct in Engine for Windows, Developer Debugger, and Engine for Linux (#10353) {tech debt}
  - Removes dead-wood APIs
  - Possible difficulties with the Keyman Developer debugger (it will need sideband data on the cached context, as the debugger is, and must continue to be, marker-aware)
- Move to the action struct for output from the kmn Keyboard Processor. The concept of the Action Queue then moves to an internal component of the kmn Keyboard Processor (and the debugger will query the kmn Keyboard Processor directly).
Implementation notes
In km_core_state_context_set_if_needed we'll need to do something like this:

```c
// TODO: normalization
km_core_cp* text;
switch(get_keyboard_processor_normalization()) {
  case NFD: text = normalize_nfd(application_context); break;
  case NFC: text = normalize_nfc(application_context); break;
  default: /* NFU */ text = application_context;
}
```
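In Python terms, the same dispatch can be sketched as follows (illustrative only; `normalize_context` and the NFU passthrough are assumptions mirroring the C fragment above, not Core APIs):

```python
import unicodedata

def normalize_context(application_context: str, form: str) -> str:
    # Mirror of the C switch: normalize for NFD/NFC keyboards, and pass
    # the text through untouched for NFU ("unknown") keyboards.
    if form in ("NFD", "NFC"):
        return unicodedata.normalize(form, application_context)
    return application_context

print(normalize_context("cafe\u0301", "NFC"))  # "caf\u00e9" ("café", composed)
```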
In km_core_state_get_actions and actions_object_to_action_item_list, Core will need to track the context back to the last base codepoint (per the Unicode spec) to give a starting anchor for normalization of output.
Core will need to maintain a copy of the exact text from the app, as well as the NFD cached context for LDML, and will need to sync those up to determine the number of codepoints to delete from the app. Remember that the input context is in NFU ("Normalization Form Unknown") and may well have mixed content, and we need to know exactly how many codepoints to delete from that mixed content.
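To illustrate why the two copies must be synced (a Python sketch; the variable names are illustrative): the same visible text has different codepoint counts in its app form and in NFD, so a delete count computed against the NFD cached context cannot be applied to the app context directly.

```python
import unicodedata

# The app may hold "café" in NFC (in general the form is unknown: NFU).
app_context = "caf\u00e9"        # 4 codepoints: c a f é
# Core's cached context for LDML keyboards is always NFD.
cached_context = unicodedata.normalize("NFD", app_context)

print(len(app_context))          # 4 codepoints
print(len(cached_context))       # 5 codepoints: c a f e U+0301

# Deleting 2 NFD codepoints ("e" + combining acute) corresponds to
# deleting only 1 codepoint ("é") from this app context.
```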
Application context synchronization
Note: this discussion works in UTF-32; if working in UTF-16, consideration needs to be given to whether counts are in codepoints or code units.
Given an application context in NFU and a cached context in NFD (for the purposes of this discussion, stripped of markers), we need to be able to calculate the number of characters to delete.
- The transform function has two parameters: del (number of characters) and ins (string to insert).
- cached_context is always NFD.
- app_context is always NFU.
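For concreteness, applying a transform with these two parameters to a context string can be sketched as (illustrative Python, not a Core API):

```python
def apply_transform(context: str, del_count: int, ins: str) -> str:
    # Delete del_count codepoints from the end of context, then append ins.
    return context[:len(context) - del_count] + ins

print(apply_transform("abc", 1, "x"))  # "abx": the final "c" is replaced
```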
The LDML keyboard processor will return a number of NFD codepoints to delete. However, this number cannot be passed straight to the app, because the normalization of the string may not match up. We also need to consider normalization under string concatenation. This requires a two-step process:
- Normalize the insertion side of the concatenation boundary (https://unicode.org/reports/tr15/#Concatenation):
  - remove del characters from cached_context
  - while cached_context is not empty, and the last character of cached_context is not a stable character, prepend it to ins and increment del:

  ```
  let chopped_cached_context = cached_context.substr(0, cached_context.length - del)
  while(chopped_cached_context != "" && !is_stable(chopped_cached_context.last())) {
    let ch = chopped_cached_context.pop()
    ins.prepend(ch)
    del++
  }
  ```
- Calculate the number of characters to delete from app_context, and additional fixups that need to be prepended to the inserted text ins because the change happened over a normalization boundary:

  ```
  let app_del = 0
  let new_app_context = app_context
  let chopped_cached_context = cached_context.substr(0, cached_context.length - del)
  while(nfd(new_app_context).length > chopped_cached_context.length) {
    new_app_context.pop()
    app_del++
  }
  ins.prepend(chopped_cached_context.substr(nfd(new_app_context).length))
  ```
- app_del will then be the number of characters to delete from the document
- ins will be the text to insert
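The two steps above can be sketched together in runnable Python (illustrative only; is_stable is approximated here with a zero canonical combining class check, which is a simplification of the full UAX #15 stability test):

```python
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

def is_stable(ch: str) -> bool:
    # Approximation: combining class 0 is treated as a safe concatenation
    # point (UAX #15's stability condition is stricter).
    return unicodedata.combining(ch) == 0

def normalize_boundary(cached_context: str, del_count: int, ins: str):
    # Step 1: remove del_count characters, then keep chopping while the
    # tail of the remaining NFD context is not stable, moving each
    # chopped character onto the front of ins.
    chopped = cached_context[:len(cached_context) - del_count]
    while chopped and not is_stable(chopped[-1]):
        ins = chopped[-1] + ins
        chopped = chopped[:-1]
        del_count += 1
    return del_count, ins

def app_delete_count(app_context: str, cached_context: str,
                     del_count: int, ins: str):
    # Step 2: pop codepoints off the NFU app context until its NFD form
    # fits inside the chopped cached context; each pop is one app-side
    # deletion. Any cached tail not covered by the remaining app context
    # is prepended to ins as a fixup.
    chopped = cached_context[:len(cached_context) - del_count]
    app_del = 0
    new_app_context = app_context
    while len(nfd(new_app_context)) > len(chopped):
        new_app_context = new_app_context[:-1]
        app_del += 1
    ins = chopped[len(nfd(new_app_context)):] + ins
    return app_del, ins

# App holds NFC "café"; the cached NFD context is "cafe" + U+0301.
# The keyboard deletes one NFD codepoint (the combining acute):
del_count, ins = normalize_boundary("cafe\u0301", 1, "")
print(app_delete_count("caf\u00e9", "cafe\u0301", del_count, ins))  # (1, 'e')
```

In the example, the app must delete one character (the composed "é") and re-insert a bare "e", even though the keyboard only asked to delete one NFD codepoint.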
UTF-16 Considerations
Core has enough data to provide both the number of codepoints to delete and the number of UTF-16 code units to delete. We recommend providing both in the actions object to consumers, as that covers both of our deletion use cases (injecting backspaces vs. direct string manipulation).
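For example (a Python sketch): the codepoint count and the UTF-16 code-unit count for the same deleted text can differ, which is why providing both counts is useful.

```python
def utf16_units(s: str) -> int:
    # Number of UTF-16 code units: supplementary-plane characters
    # (e.g. many emoji) take two code units each.
    return len(s.encode("utf-16-le")) // 2

deleted = "e\u0301\U0001F600"   # e + combining acute + 😀
print(len(deleted))             # 3 codepoints
print(utf16_units(deleted))     # 4 UTF-16 code units
```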