Skip to content

Improve keybinding parsing for Unicode support#14020

Merged
sholderbach merged 1 commit intonushell:mainfrom
JustForFun88:nushell-keybinding-unicode
Oct 8, 2024
Merged

Improve keybinding parsing for Unicode support#14020
sholderbach merged 1 commit intonushell:mainfrom
JustForFun88:nushell-keybinding-unicode

Conversation

@JustForFun88
Copy link
Copy Markdown
Contributor

@JustForFun88 JustForFun88 commented Oct 7, 2024

Description

This pull request enhances the add_parsed_keybinding function to provide greater flexibility in specifying keycodes for keybindings in Nushell. Previously, the function only supported specifying keycodes directly through character notation (e.g., char_e for the character e). This limited users to a small set of keybindings, especially in scenarios where specific non-English characters were needed.

With this new version, users can also specify characters using their Unicode codes, such as char_u003B for the semicolon (;), providing a more flexible approach to customization, for example like this:

{
    name: move_to_line_end_or_take_history_hint
    modifier: shift
    keycode: char_u003B # char_;
    mode: vi_normal
    event: {
        until: [
            { send: historyhintcomplete }
            { edit: movetolineend }
        ]
    }
}

Motivation:

This enhancement allows better customization of vi mode in Nushell. For example, in the Russian layout, the $ symbol (used for 'Move to end of line') coincides with the semicolon (;). Previously, it was not possible to assign char_; as the keycode, as the parser did not recognize it as a valid Unicode character.

This enhancement extends support for specifying keycodes using Unicode character codes (e.g., char_u003B for ;), allowing users to more easily configure Nushell according to their preferred keyboard layouts and enhancing the overall flexibility of keybinding customization.

The new implementation preserves the performance of the old one and avoids redundant checks (c.starts_with("char_") and c.chars().skip(5)).

User-Facing Changes

Added support for specifying keycodes using Unicode codes, e.g., char_u002C (comma - ,):

{
    name: <command_name>, # name of the command
    modifier: none,       # key modifier
    keycode: char_u002C,  # Unicode code for the comma (',')
    mode: vi_normal,      # mode in which this binding should work
    event: {
        send: <action>    # action to be performed
    }
}

Enhance keybinding parsing to support Unicode
character codes:

- Added support for specifying keycodes using
  Unicode codes (e.g., char_u003B for `;`).
- Improved flexibility for customizing vi mode
  keybindings, especially for non-English keyboard
  layouts.
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Oct 7, 2024

Thanks for trying to help out. I haven't studied your code yet, but I like the idea of expanding this to support uXXXX. I always thought char_X was a bit limiting. Just one question below.

@sholderbach
Copy link
Copy Markdown
Member

I don't quite follow the example in your motivation.

You can define a binding to ; with the existing config logic as well, the limitation is only imposed by the unquoted string, a ; semicolon will act as a statement separator so it only sees the char_.

(we may want to check our general parser/grammar if the behavior of ; in the record context is proper)

       {
            name: history_menu
            modifier: none
            keycode: "char_;"
            mode: [emacs, vi_insert, vi_normal]
            event: { send: menu name: history_menu }
        }

One general limitation with some of those special characters is that the ANSI default keyboard protocol will not always carry the modifier info with some of those. What gets transmitted from the terminal is generally really the text. While on a QWERTY keyboard it would correctly show without modifier as found on the layout, with the Russian keyboard it probably won't show the SHIFT-4 as it doesn't show a SHIFT for German QWERTZ (here ; is in the shifted position of ,)

@JustForFun88
Copy link
Copy Markdown
Contributor Author

I don't quite follow the example in your motivation.

You can define a binding to ; with the existing config logic as well, the limitation is only imposed by the unquoted string, a ; semicolon will act as a statement separator so it only sees the char_.
One general limitation with some of those special characters is that the ANSI default keyboard protocol will not always carry the modifier info with some of those. What gets transmitted from the terminal is generally really the text. While on a QWERTY keyboard it would correctly show without modifier as found on the layout, with the Russian keyboard it probably won't show the SHIFT-4 as it doesn't show a SHIFT for German QWERTZ (here ; is in the shifted position of ,)

Oh, I didn't know you could assign keys like "char_;". That's definitely cool 😊. On the other hand, I think adding uXXXX won't be much of an inconvenience, and it allows, for example, separate categorization of uppercase and lowercase Latin letters. Although the latter is probably not very useful since switching is done via SHIFT.

And you are right that modifier: none + keycode: "char_;" or modifier: none + keycode: char_u003B doesn't work. I found experimentally that you need to set modifier: shift for this to work 😊

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Oct 7, 2024

IIRC, I think the issue was less about char_; and more about some special unicode character that aren't on english keyboards, although it is also about char_;. Seems like the discussion for this was on discord somewhere but it was like a double u character now W but char_uu but one character. Maybe it would be better to have that as an example.

Found it: https://discord.com/channels/601130461678272522/614593951969574961/1292426188765265951

{
    name: enter_vi_insert_mode
    modifier: none
    keycode: char_ш
    mode: vi_normal
    event: { *Some magic to change to vi_insert ??* }
}

@JustForFun88
Copy link
Copy Markdown
Contributor Author

IIRC, I think the issue was less about char_; and more about some special unicode character that aren't on english keyboards, although it is also about char_;. Seems like the discussion for this was on discord somewhere but it was like a double u character now W but char_uu but one character. Maybe it would be better to have that as an example.

Found it: https://discord.com/channels/601130461678272522/614593951969574961/1292426188765265951

{
    name: enter_vi_insert_mode
    modifier: none
    keycode: char_ш
    mode: vi_normal
    event: { *Some magic to change to vi_insert ??* }
}

Oh, that's a different issue. I have some thoughts about adding a separate variant in EventType, like EventType::Command, and putting something like Command::EnterViInsert there. But I'm not sure yet. Another option would be to have KeybindingsMode store all three: emacs_keybindings, insert_keybindings, normal_keybindings, so that the Nushell mode can be changed on the fly.
If, of course, the Nushell team considers any of these changes acceptable.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Oct 7, 2024

I know getting insert mode to work is a different issue but i thought char_ш wouldn't work either, just as a key binding?

@sholderbach
Copy link
Copy Markdown
Member

Found it: discord.com/channels/601130461678272522/614593951969574961/1292426188765265951

{
    name: enter_vi_insert_mode
    modifier: none
    keycode: char_ш
    mode: vi_normal
    event: { *Some magic to change to vi_insert ??* }
}

Here the problem seems more towards what can be mapped and not the char_ dispatch. We permit a single char (so anything that is one Unicode codepoint, not joined) as this the unit that we can decode via crossterm.
From a readability and usability perspective I personally prefer that this binding is WYSIWYG. If you can generate character in the config with your keyboard you most likely can trigger the key via crossterm. One argument I can see is one might disambiguate via the Unicode id if you meant the U+003B ; semicolon or the confusable Greek question mark U+037E ; (but not sure what the keyboards/terminals do there)
The cyrillic alphabet as found on the keyboard also produces single codepoint chars e.g. й is its own char (2-bytes in UTF-8) and not some joined grapheme by combining и and some grave diacritic mark.

Our vi mode parser itself is not very advanced at the moment so doesn't permit remapping of anything in the state machine for the normal mode key sequences. So if you :noremap your way out of the incompatibilities of vim with non-QWERTY keyboards, we can't provide an easy way out at the moment.

@JustForFun88
Copy link
Copy Markdown
Contributor Author

JustForFun88 commented Oct 7, 2024

I know getting insert mode to work is a different issue but i thought char_ш wouldn't work either, just as a key binding?

No, just like the key binding char_ш works, there are no problems with that.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Oct 7, 2024

No, just like the key binding char_ш works, there are no problems with that.

ok, my mistake. so, this PR is really about char_; alone then. still seems reasonable to me.

@JustForFun88
Copy link
Copy Markdown
Contributor Author

Here the problem seems more towards what can be mapped and not the char_ dispatch. We permit a single char (so anything that is one Unicode codepoint, not joined) as this the unit that we can decode via crossterm. From a readability and usability perspective I personally prefer that this binding is WYSIWYG. If you can generate character in the config with your keyboard you most likely can trigger the key via crossterm.

I have to disagree on one point: the current implementation doesn't distinguish between uppercase and lowercase Latin characters, meaning you can't assign different actions to A and a, though this is possible for non-Latin characters. Therefore, WYSIWYG isn't fully achieved for Latin characters 🙂.

@sholderbach
Copy link
Copy Markdown
Member

Therefore, WYSIWYG isn't fully achieved for Latin characters 🙂.

Yeah that lazy to_ascii_lowercase makes it a bit lax. In the history of crossterm there was a time where the capitalization of the shifted characters was less clearly defined so we probably went with that. But for all the ASCII alphabetical characters at least shift is properly detected (in contrast to the Ctrl versions)

Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made clear that I am not burning for the feature, but the implementation is sound and it can possibly disambiguate characters on international keyboards. I would not recommend that we use any of those char_u... expressions in the default config and it creeps up the number of things a user may get wrong in their config.

But let's give it a try and make sure that we have some documentation on it.

@sholderbach sholderbach merged commit 55c3fc9 into nushell:main Oct 8, 2024
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Oct 8, 2024

@JustForFun88 it would be good to have some documentation on this change in the upcoming release blog if you have time? here's the current PR that people contribute things like this to nushell/nushell.github.io#1566

@hustcer hustcer added this to the v0.99.0 milestone Oct 8, 2024
@JustForFun88
Copy link
Copy Markdown
Contributor Author

@JustForFun88 it would be good to have some documentation on this change in the upcoming release blog if you have time? here's the current PR that people contribute things like this to nushell/nushell.github.io#1566

I think that I can prepare PR tomorrow

@JustForFun88 JustForFun88 deleted the nushell-keybinding-unicode branch October 8, 2024 14:14
@fdncred fdncred added the notes:mention Include the release notes summary in the "Hall of Fame" section label Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notes:mention Include the release notes summary in the "Hall of Fame" section

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants