Skip to content

A64: Only clear icache when writing out new code#442

Merged
k0kubun merged 1 commit intoyjit_backend_irfrom
aw/icache
Aug 26, 2022
Merged

A64: Only clear icache when writing out new code#442
k0kubun merged 1 commit intoyjit_backend_irfrom
aw/icache

Conversation

@XrXr
Copy link

@XrXr XrXr commented Aug 26, 2022

Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.

Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
@k0kubun
Copy link
Member

k0kubun commented Aug 26, 2022

The time spent on Arm CI seems to be shortened a lot too. Great work 👏👏👏

Copy link
Member

@k0kubun k0kubun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@k0kubun k0kubun merged commit d7acd22 into yjit_backend_ir Aug 26, 2022
@k0kubun k0kubun deleted the aw/icache branch August 26, 2022 18:02
k0kubun pushed a commit that referenced this pull request Aug 26, 2022
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
Comment on lines +964 to +972
// Invalidate icache for newly written out region so we don't run
// stale code.
{
let start = cb.get_ptr(start_write_pos).raw_ptr();
let write_ptr = cb.get_write_ptr().raw_ptr();
let codeblock_end = cb.get_ptr(cb.get_mem_size()).raw_ptr();
let end = std::cmp::min(write_ptr, codeblock_end);
unsafe { rb_yjit_icache_invalidate(start as _, end as _) };
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together Alan!

Do you know if __builtin___clear_cache(start, end); uses a syscall underneath, or just a machine loop? Asking because if it uses a syscall, we might be able to batch multiple calls together in a future PR to improve speed further.

k0kubun pushed a commit that referenced this pull request Aug 29, 2022
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
k0kubun pushed a commit that referenced this pull request Aug 29, 2022
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
k0kubun pushed a commit to ruby/ruby that referenced this pull request Aug 29, 2022
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants