Conversation
|
@danlapid I think you meant something like this in our discussion... the event is reported earlier here, but I'm not sure if we'll always have the IoContext available (needed for an accurate timestamp)? |
|
The generated output of |
|
While this is nice, we need to report the entire tracing::return event over there and on all other event types as well! |
Please note that tracing::Return is exclusively used to return tracing::FetchResponseInfo so far, which only exists for fetch events. This is the only code path where fetchStatus is set. |
74352a1 to
b684d8c
Compare
The return event should be available on all handlers! Even those where FetchResponseInfo has no info. |
b684d8c to
3613f6f
Compare
Updated the PR (mostly to capture the intent here, there's some cleanup possible and maybe return reporting isn't always in the right spot). Is this what you had in mind? |
2b1a262 to
7269f57
Compare
src/workerd/api/tail-worker-test.js
Outdated
| '{"type":"onset","executionModel":"stateless","scriptTags":[],"info":{"type":"fetch","method":"POST","url":"http://placeholder/body-length","headers":[]}}{"type":"return","info":{"type":"fetch","statusCode":200}}{"type":"outcome","outcome":"ok","cpuTime":0,"wallTime":0}', | ||
| '{"type":"onset","executionModel":"stateless","scriptTags":[],"info":{"type":"scheduled","scheduledTime":"1970-01-01T00:00:00.000Z","cron":""}}{"type":"outcome","outcome":"ok","cpuTime":0,"wallTime":0}', | ||
| '{"type":"onset","executionModel":"stateless","scriptTags":[],"info":{"type":"scheduled","scheduledTime":"1970-01-01T00:00:00.000Z","cron":"* * * * 30"}}{"type":"outcome","outcome":"ok","cpuTime":0,"wallTime":0}', | ||
| '{"type":"onset","executionModel":"stateless","scriptTags":[],"info":{"type":"scheduled","scheduledTime":"1970-01-01T00:00:00.000Z","cron":""}}{"type":"return"}{"type":"outcome","outcome":"ok","cpuTime":0,"wallTime":0}', |
There was a problem hiding this comment.
Just a thought while I'm thinking about (not blocking for this PR)... one of the reasons we have a difference between "return" events and "outcome" events is because of waitUntil tasks... however, we don't currently instrument the fact that waitUntil has tasks at all. I wonder if we should wrap waitUntil draining in a span?
c678744 to
9d5ce2c
Compare
|
@danlapid PR has been updated after moving return event reporting into WorkerEntrypoint – tests are passing, some caveats remain:
|
prewarm events should never be traced. |
src/workerd/io/worker-entrypoint.c++
Outdated
| // but I'd like to avoid that in the non-test case. | ||
| return maybeAddGcPassForTest(context, kj::mv(promise)); | ||
| auto result = co_await maybeAddGcPassForTest(context, kj::mv(promise)); | ||
| KJ_IF_SOME(t, context.getWorkerTracer()) { |
There was a problem hiding this comment.
Somehow, downstream tests segfault in the context.getWorkerTracer() call when run on CI, but pass when running locally
There was a problem hiding this comment.
You'll have to figure it out and get this landed
There was a problem hiding this comment.
Segfaults have been resolved by using KJ_DEFER here so that the IoContext has not been destructed yet by the time we try to report – I'm not sure if that approach is sound (could it report the return event too early/when there is an exception when we don't want it/still segfault based on a race condition), but that's the only remaining open question for this PR.
There was a problem hiding this comment.
The defer works because this function is returning a promise so event->run didn't even start running when you are doing the setReturn.
This does not actually count as working, the return event will fire permaturely.
There was a problem hiding this comment.
Yeah, that's one of the concerns I had, but I kinda doubt that's actually the case since it would break our tests – if the return event was sent before we start running the customEvent, it would get sent before the Onset event and just cause tests to fail (or if not that, be at the wrong position in the trace), so the tests suggest that this is being sent at around the right time (not conclusive evidence of course – happy to consider alternatives)
There was a problem hiding this comment.
Well it would run right after the first async operation in event->run which is after the onset event was submitted.
There was a problem hiding this comment.
@mar-cf the other issue I mentioned.
I can see two workarounds here:
- Report the return event after co_awaiting the customEvent, for all events except for queue (which is the one event that caused UaF based on the IoContext no longer being available, for all other event types it worked fine in tests based on how draining works for them). Adding reporting within the queue event should work, but this is still kinda ugly, we'll need a way to tell that we're not in a queue event.
- Report the return event separately for each customEvent, within that event. Will require much more maintenance effort, easy to forget when we add a new event type.
There was a problem hiding this comment.
Updated to report return event for each customEvent type individually. trace() is not supported properly, but we return there immediately anyway and fixing that is out of scope for this PR.
9d5ce2c to
a065955
Compare
a065955 to
726171b
Compare
src/workerd/io/worker-entrypoint.c++
Outdated
| TRACE_EVENT("workerd", "WorkerEntrypoint::request() deferred proxy step", | ||
| PERFETTO_FLOW_FROM_POINTER(this)); | ||
| proxyTask = kj::mv(deferredProxy.proxyTask); | ||
| KJ_IF_SOME(t, context.getWorkerTracer()) { |
There was a problem hiding this comment.
One more wrinkle: On a downstream test, this ends up failing the KJ_REQUIRE(KJ_REQUIRE_NONNULL(trace->eventInfo).is<tracing::FetchEventInfo>()); check above, we somehow end up reporting this return event to an alarm event... I guess context ends up pointing to a different worker tracer?
There was a problem hiding this comment.
Tried many things to solve this, but can't quite figure out what the right place/time is to report this
There was a problem hiding this comment.
Saving a reference to the tracer beforehand seems to work, but there's gotta be a better way.
There was a problem hiding this comment.
This approach seems to be acceptable based on internal discussion.
4e22916 to
b4d462a
Compare
…ent types This should be reported once the return code is available, not in the destructor.
b4d462a to
f29c101
Compare
This should be reported once the return code is available, not in the destructor.