Hi, I'm using the example tool nvbit/tools/instr_count to test a Triton kernel. However, the SASS printed by NVBit does not match the SASS dumped by nvdisasm. For example, the nvdisasm dump contains only 12 SYNCS.ARRIVE instructions, while the NVBit dump contains 14. I'm confused by this result. Any idea why this happens?
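
In case it helps to reproduce the comparison, here is a minimal sketch of how the two listings can be counted. It assumes both SASS listings have been saved as plain text with one instruction per line; the helper names `count_opcode` and `compare_counts` and the file names in the usage comment are hypothetical, not part of NVBit or nvdisasm.

```python
import re


def count_opcode(sass_text: str, opcode: str) -> int:
    """Count the lines of a SASS listing that contain the given opcode.

    The opcode is matched as a whole mnemonic prefix, so modifier suffixes
    (for example ".TRANS64" after "SYNCS.ARRIVE") are still counted, while
    longer, unrelated mnemonics are not.
    """
    pattern = re.compile(r"(?<![\w.])" + re.escape(opcode) + r"(?!\w)")
    return sum(1 for line in sass_text.splitlines() if pattern.search(line))


def compare_counts(dump_a: str, dump_b: str, opcode: str) -> tuple[int, int, int]:
    """Return (count in dump_a, count in dump_b, difference a - b)."""
    count_a = count_opcode(dump_a, opcode)
    count_b = count_opcode(dump_b, opcode)
    return count_a, count_b, count_a - count_b


# Usage (file names are assumptions for illustration):
# with open("nvbit_dump.txt") as f_a, open("nvdisasm_dump.txt") as f_b:
#     print(compare_counts(f_a.read(), f_b.read(), "SYNCS.ARRIVE"))
```

Counting per line rather than per match keeps an instruction from being counted twice if the mnemonic also appears in a trailing comment on the same line.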