Skip to content

shellIntegrationAddon: fix broken deserializeMessage() implementation + add tests#165635

Merged
Tyriar merged 3 commits intomicrosoft:mainfrom
rwe:fix-deserialize-message
Nov 28, 2022
Merged

shellIntegrationAddon: fix broken deserializeMessage() implementation + add tests#165635
Tyriar merged 3 commits intomicrosoft:mainfrom
rwe:fix-deserialize-message

Conversation

@rwe
Copy link
Contributor

@rwe rwe commented Nov 6, 2022

The deserializeMessage() implementation was broken and did not contain useful tests.

This adds tests (initially failing) and fixes the main correctness issues:

  • deserializeMessage() would not recognize escape sequences at the start of a string, due to a mistaken falsey check on match?.index which could be validly 0 instead of null.
    In other words: "Foo\x3bBar" would decode to "Foo;Bar", but the string "\x3bBar" would not be recognized as containing an escape sequence.
  • deserializeMessage() confused escaped and un-escaped sequences due to squashing each adjacent pair of backslashes prior to parsing.

Unlike encoding, decoding cannot be performed in passes: it must be sequential across the string, because of the potential for overlapping patterns (e.g. the third backslash in \\\x3b). The original implementation wrongly threw away the escaping information by replacing each adjacent pairs of backslashes with a single backslash, regardless of whether escaping was in effect on those backslashes or not.

Original literal value Escaped representation Current (buggy) decoding Correct decoding
Packing\Stuff\x3BEarmuffs
Packing\\Stuff\\x3BEarmuffs
Packing\Stuff\x3BEarmuffs
Packing\Stuff\x3BEarmuffs
Packing\Stuff;Earmuffs
Packing\\Stuff\x3BEarmuffs
Packing\Stuff\x3BEarmuffs
Packing\Stuff;Earmuffs

This replaces the implementation with a single sequential regex, which matches the escape sequences and replaces each match.

This PR leaves unresolved the incorrect handling of multibyte characters, however, which is the only remaining deficiency I'm aware of.

This relates to #155639. Certainly there must be other bugs filed caused by this, too, but since the shell integrations themselves don't perform escaping (see following PRs), it's hard to tell which issues are caused by the interpretation here vs. the lack of escaping there.

PRs implementing the escaping scheme for various shells:

@rwe
Copy link
Contributor Author

rwe commented Nov 15, 2022

The force-pushes here (and linked PRs) are just no-op rebases to clear the "branch out of date" gate.

rwe added 3 commits November 23, 2022 11:20
…e implementation

The test suite includes several skipped test cases for bugs which are
fixed in subsequent commits.
…t of string'

deserializeMessage() would not recognize escape sequences at the start
of a string, due to a simple falsey check on `match?.index` which could
be validly 0 instead of null.

In other words: "Foo\x3bBar" would decode to "Foo;Bar", but "\x3b" would
not be recognized as an escape sequence.

This fixes that bug, enabling those suppressed test cases, and disables
the newly-failing tests which had only passed due to the (still-broken)
escaping logic being skipped in those situations.
The original implementation confused escaped and un-escaped sequences
due to squashing each adjacent pair of backslashes prior to parsing.

Unlike encoding, decoding cannot be performed in passes this way: it
must be sequential, because of the potential for overlapping patterns
(e.g. the third backslash in "\\\x3b").

This replaces the implementation with a single sequential regex, which
matches the escape sequences and replaces each match. This enables the
tests that were broken in the previous implementation.

To illustrate, given the following two different original values (here
as literals between «», without any escaping):
  a. «Packing\Stuff\x3BEarmuffs»
  b. «Packing\Stuff;Earmuffs»

Those should be distinctly encoded as:
  a. «Packing\\Stuff\\x3BEarmuffs»
  b. «Packing\\Stuff\x3BEarmuffs»

The original implementation wrongly threw away the escaping information
by replacing each adjacent pairs of backslashes with a single backslash,
regardless of whether escaping was in effect or not:
  a. «Packing\Stuff\x3BEarmuffs»
  b. «Packing\Stuff\x3BEarmuffs»

The new implementation matches, in (a), each "\\"; and there are no
non-overlapping "\x…" sequences. In (b), the first "\\" is matches, and
the "\x3B" is matched.

This works for any correct combination of adjacent escape squences.
@rwe rwe force-pushed the fix-deserialize-message branch from 9d45c75 to 7d99990 Compare November 23, 2022 19:37
Copy link
Contributor

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, thanks for noticing my broken code 😉

@Tyriar Tyriar added this to the November 2022 milestone Nov 23, 2022
@Tyriar Tyriar merged commit 4bc59a5 into microsoft:main Nov 28, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Jan 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants