Skip to content

Fix card mark stealing issue#51104

Merged
PeterSolMS merged 35 commits intodotnet:mainfrom
PeterSolMS:Fix_card_mark_stealing_issue
Apr 13, 2021
Merged

Fix card mark stealing issue#51104
PeterSolMS merged 35 commits intodotnet:mainfrom
PeterSolMS:Fix_card_mark_stealing_issue

Conversation

@PeterSolMS
Copy link
Contributor

Fix issue with card marking stealing where getting a new chunk caused the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption.

Here are the conditions that cause the bug to surface:

  • an object containing pointers that straddles a 2 MB boundary
  • the object contains just value types (i.e. no pointers) for at least 512 bytes after the 2 MB boundary
  • there is a generation-crossing pointer afterwards in the same object
  • but that is the only generation-crossing pointer in the second 512 byte range

The bug comes about because of the following sequence of events:

  • in mark_through_cards_for_segments, we scan an object at the end of a 2 MB chunk
  • we encounter a pointer location that is already outside of that chunk (it belongs to card 1 in the next chunk)
  • we call card_transition, which advances the card to the card of the pointer location (i.e., to card 1 in the next chunk)
  • we realize that we need to get a new chunk
  • when we get the chunk, we set the card to card 0 in the chunk
  • we return to mark_through_cards_for_segments
  • as the next chunk is adjacent to the current one, we continue processing the current object
  • when we encounter the next pointer location in the object, we trigger a card_transition again
  • this will erroneously set card 0 because we think it has a cross-gen pointer
  • it will also reset the cross-gen generation pointer counter
  • thus, if there are no other cross-generation pointers in the 512 bytes described by card 1, card 1 will be erroneously cleared
  • having cards cleared erroneously ultimately leads to heap corruption

The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk.

It fixes the issue because it avoids the double card_transition.

It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.

… the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption.

Here are the conditions that cause the bug to surface:
- an object containing pointers that straddles a 2 MB boundary
- the objects contains just value types (i.e. no pointers) for at least 512 bytes after the 2 MB boundary
- there is a generation-crossing pointer afterwards in the same object
- but that is the only generation-crossing pointer in the second 512 byte range

The bug comes about because of the following sequence of events:
- in mark_through_cards_for_segments, we scan an object at the end of a 2 MB chunk
- we encounter a pointer location that is already outside of that chunk (it belongs to card 1 in the next chunk)
- we call card_transition, which advances the card to the card of the pointer location (i.e., to card 1 in the next chunk)
- we realize that we need to get a new chunk
- when we get the chunk, we set the card to card 0 in the chunk
- we return to mark_through_cards_for_segments
- as the next chunk is adjacent to the current one, we continue processing the current object
- when we encounter the next pointer location in the object, we trigger a card_transition again
- this will erroneously set card 0 because we think it has a cross-gen pointer
- it will also reset the cross-gen generation pointer counter
- thus, if there are no other cross-generation pointers in the 512 bytes described by card 1, card 1 will be erroneously cleared
- having cards cleared erroneously ultimately leads to heap corruption

The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk.

It fixes the issue because it avoids the double card_transition.

It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.
@ghost ghost added the area-GC-coreclr label Apr 12, 2021
@ghost
Copy link

ghost commented Apr 12, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix issue with card marking stealing where getting a new chunk caused the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption.

Here are the conditions that cause the bug to surface:

  • an object containing pointers that straddles a 2 MB boundary
  • the object contains just value types (i.e. no pointers) for at least 512 bytes after the 2 MB boundary
  • there is a generation-crossing pointer afterwards in the same object
  • but that is the only generation-crossing pointer in the second 512 byte range

The bug comes about because of the following sequence of events:

  • in mark_through_cards_for_segments, we scan an object at the end of a 2 MB chunk
  • we encounter a pointer location that is already outside of that chunk (it belongs to card 1 in the next chunk)
  • we call card_transition, which advances the card to the card of the pointer location (i.e., to card 1 in the next chunk)
  • we realize that we need to get a new chunk
  • when we get the chunk, we set the card to card 0 in the chunk
  • we return to mark_through_cards_for_segments
  • as the next chunk is adjacent to the current one, we continue processing the current object
  • when we encounter the next pointer location in the object, we trigger a card_transition again
  • this will erroneously set card 0 because we think it has a cross-gen pointer
  • it will also reset the cross-gen generation pointer counter
  • thus, if there are no other cross-generation pointers in the 512 bytes described by card 1, card 1 will be erroneously cleared
  • having cards cleared erroneously ultimately leads to heap corruption

The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk.

It fixes the issue because it avoids the double card_transition.

It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.

Author: PeterSolMS
Assignees: -
Labels:

area-GC-coreclr

Milestone: -

@Treit
Copy link

Treit commented Apr 12, 2021

Should this change include a regression test for this scenario?

Copy link
Member

@ChrisAhna ChrisAhna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@PeterSolMS
Copy link
Contributor Author

Maoni noticed I had some errors in my description - one card bit describes 256 bytes, not 512 as I had assumed by mistake.

@PeterSolMS PeterSolMS merged commit b1f7ca4 into dotnet:main Apr 13, 2021
@PeterSolMS
Copy link
Contributor Author

@Treit: We will likely include Chris's repro case in our validation tests.

@ghost ghost locked as resolved and limited conversation to collaborators May 13, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants