Fix card mark stealing issue#51104
Conversation
… the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption. Here are the conditions that cause the bug to surface: - an object containing pointers that straddles a 2 MB boundary - the objects contains just value types (i.e. no pointers) for at least 512 bytes after the 2 MB boundary - there is a generation-crossing pointer afterwards in the same object - but that is the only generation-crossing pointer in the second 512 byte range The bug comes about because of the following sequence of events: - in mark_through_cards_for_segments, we scan an object at the end of a 2 MB chunk - we encounter a pointer location that is already outside of that chunk (it belongs to card 1 in the next chunk) - we call card_transition, which advances the card to the card of the pointer location (i.e., to card 1 in the next chunk) - we realize that we need to get a new chunk - when we get the chunk, we set the card to card 0 in the chunk - we return to mark_through_cards_for_segments - as the next chunk is adjacent to the current one, we continue processing the current object - when we encounter the next pointer location in the object, we trigger a card_transition again - this will erroneously set card 0 because we think it has a cross-gen pointer - it will also reset the cross-gen generation pointer counter - thus, if there are no other cross-generation pointers in the 512 bytes described by card 1, card 1 will be erroneously cleared - having cards cleared erroneously ultimately leads to heap corruption The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk. It fixes the issue because it avoids the double card_transition. It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.
|
Tagging subscribers to this area: @dotnet/gc Issue DetailsFix issue with card marking stealing where getting a new chunk caused the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption. Here are the conditions that cause the bug to surface:
The bug comes about because of the following sequence of events:
The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk. It fixes the issue because it avoids the double card_transition. It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.
|
|
Should this change include a regression test for this scenario? |
|
Maoni noticed I had some errors in my description - one card bit describes 256 bytes, not 512 as I had assumed by mistake. |
|
@Treit: We will likely include Chris's repro case in our validation tests. |
Fix issue with card marking stealing where getting a new chunk caused the "card" variable to go backwards. This caused an extra call to card_transition, which in turn caused cards to be cleared that should be set. The ultimate result is memory corruption.
Here are the conditions that cause the bug to surface:
The bug comes about because of the following sequence of events:
The fix simply makes sure the "card" variable doesn't go backwards in find_next_chunk.
It fixes the issue because it avoids the double card_transition.
It is safe because during iteration of a segment, the "card" variable must always increase. When we switch to another segment, it may decrease, but that is fine because in this case it will be re-initialized by the logic in mark_through_cards_segments.