Skip to content

Conversation

@monojenkins
Copy link
Contributor

Fixes: #14170
Fixes: dotnet/android#3112

The call_filter() function generated by mono_arch_get_call_filter()
was overwriting a part of the previous stack frame because it was not
creating a large enough new stack frame for itself. This had been
working by luck before the switch to building the runtime with Android
NDK r19.

I used a local build of xamarin/xamarin-android/d16-1@87a80b to
verify that this change resolved both of the issues linked above. I
also confirmed that I was able to reintroduce the issues in my local
environment by removing the change and rebuilding.

After this change, the generated call_filter() function now starts by
reserving sufficient space on the stack to hold a 64-bit value at the
ctx_offset location. The 64-bit ctx value is saved to the
ctx_offset location (offset 344 in the new stack frame) shortly
afterwards:

stp	x29, x30, [sp,#-352]!
mov	x29, sp
str	x0, [x29,#344]

As expected, the invocation of call_filter() now no longer modifies
the top of the previous stack frame.

Top of the stack before call_filter():

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

Top of the stack after call_filter() (unchanged):

(gdb) x/8x $sp
0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071
Additional background information

The original lucky, "good" behavior for mono/mono/2018-08@725ba2a built with Android NDK r14

  1. The resume parameter for mono_handle_exception_internal() is
    held in register w23.

  2. That register is saved into the current stack frame at offset 20 by
    a str instruction:

       0x00000000000bc1bc <+3012>:	str	w23, [sp,#20]
    
  3. handle_exception_first_pass() invokes call_filter() via a blr
    instruction:

    2279							filtered = call_filter (ctx, ei->data.filter);
       0x00000000000bc60c <+4116>:	add	x9, x22, x24, lsl #6
       0x00000000000bc610 <+4120>:	ldr	x8, [x8,#3120]
       0x00000000000bc614 <+4124>:	ldr	x1, [x9,#112]
       0x00000000000bc618 <+4128>:	add	x0, sp, #0x110
       0x00000000000bc61c <+4132>:	blr	x8
    

    Before the blr instruction, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x0a8b31d8	0x00000071
    0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071
    
  4. call_filter() runs. This function is generated by
    mono_arch_get_call_filter() and starts with:

       stp	x29, x30, [sp,#-336]!
       mov	x29, sp
       str	x0, [x29,#344]
    

    Note in particular how the first line subtracts 336 from sp to
    start a new stack frame, but the third line writes to a position
    that is 344 bytes beyond that, back within the previous stack
    frame.

  5. After the invocations of call_filter() and
    handle_exception_first_pass(), w23 is restored from the stack:

       0x00000000000bc820 <+4648>:	ldr	w23, [sp,#20]
    

    At this step, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
    0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071
    

    Notice how call_filter() has overwritten bytes 8 through 15 of the
    stack frame. In this original lucky scenario, this does not affect
    the value restored into register w23 because that value starts at
    byte 20.

  6. mono_handle_exception_internal() tests the value of w23 to
    decide how to set the ji local variable:

    2574			if (resume) {
       0x00000000000bb960 <+872>:	cbz	w23, 0xbb9c4 <mono_handle_exception_internal+972>
    

    Since w23 is 0, mono_handle_exception_internal() correctly
    continues into the else branch.

The bad behavior for mono/mono/2018-08@725ba2a built with Android NDK r19 works slightly differently

  1. As before, the local resume parameter starts in register w23.

  2. This time, the register is saved into the stack frame at offset 12
    (instead of 20):

       0x00000000000bed7c <+3200>:	str	w23, [sp,#12]
    
  3. As before, handle_exception_first_pass() invokes call_filter()
    via a blr instruction.

    At this step, the top of the stack looks like this:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071
    
  4. call_filter() runs as before. And the first few instructions of
    that function are the same as before:

    stp	x29, x30, [sp,#-336]!
    mov	x29, sp
    str	x0, [x29,#344]
    
  5. w23 is again restored from the stack, but this time from offset 12:

       0x00000000000bf2c0 <+4548>:	ldr	w23, [sp,#12]
    

    The problem is that call_filter() has again overwritten bytes 8
    through 15 of the stack frame:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071
    

    So after the ldr instruction, w23 has an incorrect value of
    0x7f rather than the correct value 0.

  6. As before, mono_handle_exception_internal() tests the value of w23 to
    decide how to set the ji local variable:

    2574			if (resume) {
       0x00000000000be430 <+820>:	cbz	w23, 0xbe834 <mono_handle_exception_internal+1848>
    

    But this time because w23 is not zero, execution continues
    incorrectly into the if branch rather than the else branch.
    The result is that ji is set to 0 rather than a valid
    (MonoJitInfo *) value. This incorrect value for ji leads to a
    null pointer dereference, as seen in the bug reports.

Backport of #14757.

/cc @lambdageek @brendanzagaeski

brendanzagaeski and others added 2 commits June 3, 2019 14:51
Fixes: mono#14170
Fixes: dotnet/android#3112

The `call_filter()` function generated by `mono_arch_get_call_filter()`
was overwriting a part of the previous stack frame because it was not
creating a large enough new stack frame for itself.  This had been
working by luck before the switch to building the runtime with Android
NDK r19.

I used a local build of [xamarin/xamarin-android/d16-1@87a80b][0] to
verify that this change resolved both of the issues linked above.  I
also confirmed that I was able to reintroduce the issues in my local
environment by removing the change and rebuilding.

After this change, the generated `call_filter()` function now starts by
reserving sufficient space on the stack to hold a 64-bit value at the
`ctx_offset` location.  The 64-bit `ctx` value is saved to the
`ctx_offset` location (offset 344 in the new stack frame) shortly
afterwards:

    stp	x29, x30, [sp,#-352]!
    mov	x29, sp
    str	x0, [x29,mono#344]

As expected, the invocation of `call_filter()` now no longer modifies
the top of the previous stack frame.

Top of the stack before `call_filter()`:

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

Top of the stack after `call_filter()` (unchanged):

    (gdb) x/8x $sp
    0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
    0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

Additional background information
=================================

The original lucky, "good" behavior worked as follows for
[mono/mono@725ba2a built with Android NDK r14][1]:

 1. The `resume` parameter for `mono_handle_exception_internal()` is
    held in register `w23`.

 2. That register is saved into the current stack frame at offset 20 by
    a `str` instruction:

           0x00000000000bc1bc <+3012>:	str	w23, [sp,mono#20]

 3. `handle_exception_first_pass()` invokes `call_filter()` via a `blr`
    instruction:

        2279							filtered = call_filter (ctx, ei->data.filter);
           0x00000000000bc60c <+4116>:	add	x9, x22, x24, lsl mono#6
           0x00000000000bc610 <+4120>:	ldr	x8, [x8,mono#3120]
           0x00000000000bc614 <+4124>:	ldr	x1, [x9,mono#112]
           0x00000000000bc618 <+4128>:	add	x0, sp, #0x110
           0x00000000000bc61c <+4132>:	blr	x8

    Before the `blr` instruction, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0x0a8b31d8	0x00000071
        0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071

 4. `call_filter()` runs.  This function is generated by
    `mono_arch_get_call_filter()` and starts with:

           stp	x29, x30, [sp,#-336]!
           mov	x29, sp
           str	x0, [x29,mono#344]

    Note in particular how the first line subtracts 336 from `sp` to
    start a new stack frame, but the third line writes to a position
    that is 344 bytes beyond that, back within the *previous* stack
    frame.

 5. After the invocations of `call_filter()` and
    `handle_exception_first_pass()`, `w23` is restored from the stack:

           0x00000000000bc820 <+4648>:	ldr	w23, [sp,mono#20]

    At this step, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
        0x7fcd2774b0:	0x00000000	0x00000000	0x1ef51980	0x00000071

    Notice how `call_filter()` has overwritten bytes 8 through 15 of the
    stack frame.  In this original lucky scenario, this does not affect
    the value restored into register `w23` because that value starts at
    byte 20.

 6. `mono_handle_exception_internal()` tests the value of `w23` to
    decide how to set the `ji` local variable:

        2574			if (resume) {
           0x00000000000bb960 <+872>:	cbz	w23, 0xbb9c4 <mono_handle_exception_internal+972>

    Since `w23` is `0`, `mono_handle_exception_internal()` correctly
    continues into the `else` branch.

The bad behavior for
[mono/mono@725ba2a built with Android NDK r19][2]
works just slightly differently:

 1. As before, the local `resume` parameter starts in register `w23`.
 2. This time, the register is saved into the stack frame at offset 12
    (instead of 20):

           0x00000000000bed7c <+3200>:	str	w23, [sp,mono#12]

 3. As before, `handle_exception_first_pass()` invokes `call_filter()`
    via a `blr` instruction.

    At this step, the top of the stack looks like this:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0x2724b700	0x00000000
        0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

 4. `call_filter()` runs as before.  And the first few instructions of
    that function are the same as before:

        stp	x29, x30, [sp,#-336]!
        mov	x29, sp
        str	x0, [x29,mono#344]

 5. `w23` is again restored from the stack, but this time from offset 12:

           0x00000000000bf2c0 <+4548>:	ldr	w23, [sp,mono#12]

    The problem is that `call_filter()` has again overwritten bytes 8
    through 15 of the stack frame:

        (gdb) x/8x $sp
        0x7fcd2774a0:	0x7144d250	0x00000000	0xcd2775b0	0x0000007f
        0x7fcd2774b0:	0x1ee19ff0	0x00000071	0x0cc00300	0x00000071

    So after the `ldr` instruction, `w23` has an incorrect value of
    `0x7f` rather than the correct value `0`.

 6. As before, `mono_handle_exception_internal()` tests the value of `w23` to
    decide how to set the `ji` local variable:

        2574			if (resume) {
           0x00000000000be430 <+820>:	cbz	w23, 0xbe834 <mono_handle_exception_internal+1848>

    But this time because `w23` is *not* zero, execution continues
    *incorrectly* into the `if` branch rather than the `else` branch.
    The result is that `ji` is set to `0` rather than a valid
    `(MonoJitInfo *)` value.  This incorrect value for `ji` leads to a
    null pointer dereference, as seen in the bug reports.

[0]: https://github.com/xamarin/xamarin-android/tree/xamarin-android-9.3.0.22
[1]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1432/Azure/
[2]: https://jenkins.xamarin.com/view/Xamarin.Android/job/xamarin-android-freestyle/1436/Azure/
@monojenkins monojenkins requested a review from lewurm as a code owner June 3, 2019 14:51
@monojenkins monojenkins added this to the 2018-10 (5.20.xx) milestone Jun 3, 2019
@lewurm
Copy link
Contributor

lewurm commented Jun 3, 2019

@monojenkins squash

@monojenkins
Copy link
Contributor Author

Cannot squash because the following required status checks are not successful:

  • "Windows x64 C++" state is "failure"

@lewurm
Copy link
Contributor

lewurm commented Jun 3, 2019

@monojenkins build failed

@monojenkins monojenkins merged commit 06cf039 into mono:2018-10 Jun 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants