Skip to content

Runtime events consumer: don't parse dropped events#12062

Merged
kayceesrk merged 4 commits intoocaml:trunkfrom
TheLortex:runtime-events-consumer-fix-lost-events
Mar 18, 2023
Merged

Runtime events consumer: don't parse dropped events#12062
kayceesrk merged 4 commits intoocaml:trunkfrom
TheLortex:runtime-events-consumer-fix-lost-events

Conversation

@TheLortex
Copy link
Copy Markdown
Contributor

In the runtime events consumer library, there's a code path that handles the case when an event content is read, but in the meantime was overwrote by the producer. In that case, it calls the the lost_events callback if it exists, but then continues parsing the event with whatever data overwritten in the ring buffer, yielding incorrect values.

This problem appeared when I used custom runtime events at a very high rate (700k/s). Indeed, at a low rate this code path is rarely taken.

The PR includes a reproduction test that spawns two domains, one emitting custom events at a high rate and one consuming them and checking that the associated value is expected (0). If it's different, it means that the event payload was overwrote with a new event block (the expected zero value would then contain a header or timestamp value parsed as an integer).

The proposed fix is the following: continue (jump over event parsing) for all code paths as long as the event dropped condition is met:

 /* Check the message we've read hasn't been overwritten by the writer */
 if (ring_head > cursor->current_positions[domain_num]) {
     ...
+    continue;
 }

@Engil @sadiqj

Copy link
Copy Markdown
Contributor

@sadiqj sadiqj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @TheLortex , thanks for identifying and a fix.

Only one issue is I can't get the test to fail in the case where the fix isn't applied. Does it fail reliably for you?

@TheLortex
Copy link
Copy Markdown
Contributor Author

Does it fail reliably for you?

Yes it does. Maybe try increasing the numbers in the two for loops ?

@TheLortex TheLortex force-pushed the runtime-events-consumer-fix-lost-events branch from eccb7f9 to 99abe06 Compare March 8, 2023 10:32
Copy link
Copy Markdown
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at the fix and I believe it is correct.

@sadiqj I can reliably reproduce it on my side (you can check on bremusa as well, I've seen it fail there.), so I think this may not be a blocker?

@sadiqj
Copy link
Copy Markdown
Contributor

sadiqj commented Mar 16, 2023

so I think this may not be a blocker?

Agreed, thanks for taking a look @Engil . Maybe it's just a quirk of my local setup. I think we're good to go.

@kayceesrk kayceesrk merged commit e72c139 into ocaml:trunk Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants