-
Notifications
You must be signed in to change notification settings - Fork 6k
[pipeline] Add trace event for lag between target and display times #17384
Conversation
9ff34e7 to
cd352e5
Compare
|
Note, I'm working on adding unittests for this, wanted to give you a chance to leave any early comments. |
chinmaygarde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the need for a specific utility method in fml/time that has nothing to do with working on timing related functionality. I think you could just move the thing to two ivars in Shell or at least make it more generic. If you are going to add more stuff to it though, that changes things. Other stuff is just nits. Functionality seems fine to me and useful.
fml/time/time_recorder.cc
Outdated
| // Use of this source code is governed by a BSD-style license that can be | ||
| // found in the LICENSE file. | ||
|
|
||
| #include "flutter/fml/time/time_recorder.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor Nit: Space after the related header. clang-format likes to reorder these headers and this tends to break the ordering rules in https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes.
fml/time/time_recorder.cc
Outdated
|
|
||
| namespace fml { | ||
|
|
||
| TimeRecorder::TimeRecorder() : initialized_(false) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider an std::optional<fml::TimePoint> instead. That neatly handles the same concerns handled by the initialized_ flag.
fml/time/time_recorder.cc
Outdated
| } | ||
|
|
||
| fml::TimePoint TimeRecorder::Get() { | ||
| std::scoped_lock lock(mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a reader/writer lock instead (there in one in //flutter/fml/synchronization/shared_mutex.h). We don't know what the contention is going to be for users of this utility moving forward. This is fine for now too though.
fml/time/time_recorder.h
Outdated
| void Update(fml::TimePoint time); | ||
|
|
||
| /// Must not be called before Update gets called at lease once. | ||
| fml::TimePoint Get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fml::TimePoint Get() const;
fml/time/time_recorder.h
Outdated
|
|
||
| namespace fml { | ||
|
|
||
| /// Utility class to record time in a thread-safe manner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we needed such a specific utility for something that is relatively simple (lock around an std::optional). Are you going to add to this in the future? Maybe this can be called:
template<class T>
class ThreadSafeRecorder ...
There isn't really anything time/chrono related here. You just happen to use it for a fml::TimePoint.
shell/common/shell.h
Outdated
| fml::WeakPtr<PlatformView> | ||
| weak_platform_view_; // to be shared across threads | ||
| weak_platform_view_; // to be shared across threads | ||
| fml::TimeRecorder time_recorder_; // to be shared across threads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Careful with the destruction order. I'd move this above the ivars for any component that may end up using this (or just give them their own shared pointer to this object). So at least before the unique pointers to the primary thread specific components. In this specific case, it does not matter because we reset the components in the destructor but its just good to follow a safe pattern.
shell/common/rasterizer.cc
Outdated
| delegate_.GetLatestFrameTargetTime(); | ||
| const fml::TimePoint frame_target_time = layer_tree->target_time(); | ||
| if (latest_frame_target_time > frame_target_time) { | ||
| fml::tracing::TraceEventAsyncComplete("flutter", "SceneDisplayLag", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had macros for these. Should we even bother with the macros elsewhere? Those seem like a relic from a time where they were defined away. We don't do that anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be one variant of macros that exist but are slightly orthogonal to this which are: TRACE_EVENT_ASYNC_BEGIN{0, 1}, they take in id as opposed to begin_time and end_time. I was able to find only one use of these in animator.cc for Frame Request Pending event. I think it will be nice to unify on macros or function wrapper but as it stands both these variants are used.
I think we should file an issue to track this unification (should be fairly small but i'd like to keep that out of this PR). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should file an issue to track this unification (should be fairly small but i'd like to keep that out of this PR). WDYT?
Sounds good.
|
|
@chinmaygarde , i've addressed the comments and updated a test to account for this change. PTAL |
13d5b8c to
3168dac
Compare
|
The value being reported here will be an important user-experience metric in that it reports how "late" the user saw the frame. This metric can only be measured in full frames. A related app-performance metric would be "by how much was the processing late". Such a lag would be measured by "time we made the call to schedule the completed frame for display compared to our deadline". This doesn't translate into the user experience, but it lets us measure how close we were in performing the work. There may be regressions in this metric before they appear as regressions in the UX metric that we might want to track because if we let enough of these "sub-frame" lags add up, they result in another whole frame lag. Which of these metrics do we need, or both, or can we define a reporting metric that can serve both purposes. Note that the UX metric is actually the processing metric rounded up to a whole frame, essentially. For example, we can report both and let the tools focus on what they need to know. (A UX tool might care about #frames lag, but a regression tracking tool might focus on processing time over budget, for instance.) We could report the times of "target vs now" but include an optional piece of data that is the total number of user-visible lag frames...? Thoughts and suggestions? |
Thanks @flar , I like this suggestion. So the revised trace event would look like this (pseudo code): I'm definitely open to other name suggestions for the fields mentioned here. Thoughts on this? cc: @chinmaygarde , @liyuqian |
|
For naming, it seems to me
|
chinmaygarde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related app-performance metric would be "by how much was the processing late". Such a lag would be measured by "time we made the call to schedule the completed frame for display compared to our deadline". This doesn't translate into the user experience, but it lets us measure how close we were in performing the work. There may be regressions in this metric before they appear as regressions in the UX metric that we might want to track because if we let enough of these "sub-frame" lags add up, they result in another whole frame lag.
So something like frame interval lag (just an integer)? To me, that seems like it would be the most clear to interpret value. As you said, to the user, there is a massive difference between a frame that takes 1.1 frame intervals to process vs one that takes 0.9 frame intervals. In @iskakaushik's example, just one additional field for simplicity with this integer. This may also be easier to plot and visualize.
shell/common/rasterizer.cc
Outdated
| delegate_.GetLatestFrameTargetTime(); | ||
| const fml::TimePoint frame_target_time = layer_tree->target_time(); | ||
| if (latest_frame_target_time > frame_target_time) { | ||
| fml::tracing::TraceEventAsyncComplete("flutter", "SceneDisplayLag", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should file an issue to track this unification (should be fairly small but i'd like to keep that out of this PR). WDYT?
Sounds good.
|
I should've been clearer. There weren't going to be separate fields for |
Keep in mind that this is the structure of an observatory trace timeline event. The first two fields are the begin and end time that all events have, their names are simply the timestamps for the range of time the event shows on the timeline, not "names" for the things that happened at those times.
The proposal is that this events duration on the timeline plot will go from when it was supposed to be seen until when it was submitted to the platform. In truth, it won't really be seen until the next vsync transition makes it visible on the hardware screen, but we need something more precise than that to track how much we overshot. For example if we overshot by 1ms then the developer might be able to do something to simplify their scene and get it back under time budget. If we overshot by 10ms there is likely not much to do. Both will appear exactly one vsync interval later than we planned, though.
I feel that there is an ambiguity with the name "frame". It can be used to refer to the interval between vsync transitions, as in "this layer_tree will take too long to render so it will not show on the screen until the next frame". It also is used sometimes to refer to "the Dart code computed the current frame and represented it with this layer_tree". It can be either an interval between vsyncs or a thing to be rendered depending on context. And so "number of frames" can either be the number of intervals or the number of renderable things. This property wants to be the count of "extra vsync intervals before this layer_tree became visible". What is a less ambiguous name?
They are the same. The first mention of it was not a property, though, it was just the starting timestamp of the event. How much value does it have as both the start time of the event and also a named property on the event? Will the tools make it easier to extract one or the other for tracking purposes?
Where else do they see these terms? Is this a suggestion to name the "num_frames_lag" to "something_something_latency"? |
I actually like this. Though it is more technical, I think it captures the metric quite well.
The reason I listed |
|
Thanks @iskakaushik and @flar for correcting my misunderstandings! In addition to |
A note about this value being "1 for a fast frame"... I don't believe the event will be posted for events that don't lag or have latency. One issue is that there is no such thing as a "negative time event" and that is even enforced by the TRACE macros which will fix the ordering of their timestamps assuming you meant the event to have a positive duration. So, we can't post "on time" renderings as this event in the ordinary way (because "now" is less than "target_time"), and according to the way the code is written here, they would just be omitted for those frames (i.e. you don't want these events in your timeline as they appear when things go wrong). But, we could potentially post a "zero lag" event with both timestamps being the target time? Or do we just let it reorder the timestamps and have the event interval go from start=now to end=target? The problem with that is if we want a tool to find the average "latency" then it would have to interpret the events for fast frames differently, otherwise the accounting may add their timeliness to the tardiness statistic. Another option is to rename the event to something indicating "how early the rendering completed" for the case where we got it in on time? With respect to the name of the field, I'm not a big fan of "interval" by itself since that is usually the duration of a single vsync window, isn't it? But perhaps "interval_count" instead? So, "frame_vsync_interval_count" as in the count of the number of intervals that its processing lasted? That gets long, but is it less ambiguous? |
That makes sense. |
Ah, yes. This makes me think that either |
Or "vsync_interval_count"? I don't think we don't need to prepend "frame" there. "missed" would be 0 based and "interval" would be 1 based. |
|
I was thinking to restrict these events to only when we encounter a lag as a first step. I will add some benchmarks and evaluate the proposal in go/flutter-pipeline-improvements. We can add additional traces to capture how well we are doing as a next step. Given this, I decided to call it PTAL. |
shell/common/rasterizer.cc
Outdated
|
|
||
| const fml::TimePoint latest_frame_target_time = | ||
| delegate_.GetLatestFrameTargetTime(); | ||
| if (latest_frame_target_time > frame_target_time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest_frame_target_time may not be updated as soon as the vsync happens which means there may be a millisecond or two before we consider this frame to have "lagged".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other place we could update it is in vsync_waiter. For example in iOS as soon as we get onDisplayLink (or right afterwards in FireCallback which will work across all platforms), we can update it this would avoid the VsyncSchedulingOverhead. I wasn't able to come up with a nice abstraction that would let us store this value there are consume it on the rasterizer.
Maybe we pass in a shared reference to a time recorder to both rasterizer and vsync waiter to record and consume these events. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the time between when the vsync happens and when the first line of our code is called is the most of the delay. By the time we field the notification, it is a very tiny delta to when we update the delegate. So moving the update earlier only improves the issue a small amount.
I guess my point was to use the comparison now > frame_target_time instead. Note that a few lines before that, we already compute "now" anyway I believe.
If the new "next target time" hasn't been delivered in a callback yet then the value we record in the event won't be very useful as it will be an out of date value. We could get fancy and try to project forward when the next target time will be based on the prior target time. Something like
if (now > frame_target_time) {
...
TimePoint next_frame_target = latest_frame_target_time;
if (now > next_frame_target) {
next_frame_target += frame_budget_millis;
}
TraceEvent(frame_target_time, now, "latest target time", next_frame_target, ...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
shell/common/rasterizer.cc
Outdated
| const auto frame_budget_millis = delegate_.GetFrameBudget().count(); | ||
| const int vsync_transitions_missed = | ||
| (latest_frame_target_time - latest_frame_target_time).ToMilliseconds() / | ||
| frame_budget_millis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An integer division may round down. Note that the number of millis is already rounded in both cases so you may be 1 under the right number of frames.
Would it make sense to count the "current frame number" in the delegate_? If not, then using a rounding calculating - add half of the frame_budget_millis to the difference before dividing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will round them.
b863071 to
6e2029f
Compare
6e2029f to
119da3d
Compare
| return 0; | ||
| } | ||
|
|
||
| void TraceTimelineEvent(TraceArg category_group, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to the reviewer: This is for release builds.
shell/common/rasterizer.cc
Outdated
| const auto frame_lag = | ||
| (latest_frame_target_time - frame_target_time).ToMilliseconds(); | ||
| const int vsync_transitions_missed = | ||
| round(frame_lag) / round(frame_budget_millis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the math here. Is this round() the standard math round? I don't think it helps this calculation.
I see a round() function in the code that tries to round an interval duration to the nearest duration. I need to think about whether that is the right way to go here, but I don't think so.
The formula for a "rounded integer divide" is ((a + b/2) / b). In this case:
Standard rounded integer division may not be the right answer either. I'll follow up with a better analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that is wrong. Not sure what I was thinking there. Let's say the frame budget is 16.666667. I'd like us to list out vsync_transitions_missed for various values of frame_lag.
- frame_lag = 10, missed = 1
- frame_lag = 17, missed = 2
- frame_lag = 20, missed = 2
- frame_lag = 30, missed = 2
- frame_lag = 34, missed = 3
Wouldn't regular integer integer division, without any rounding work there?
Edit: Ignore this, I forgot about the potential vsync delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into the types used here a bit, it looks like some of them are millis expressed as doubles, which makes it easier. If that is the case, then round(frame_lag / frame_budget_millis) should be fine.
I think frame_lag is an int64_t, though which is not very accurate. Perhaps ToMillisecondsF() producing a double would be better? Then I think that both values are doubles and simply rounding (the math.h version) the result of their division would be sufficient.
Do my investigations check out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Amended my comment too :-)
shell/common/rasterizer.cc
Outdated
| delegate_.OnFrameRasterized(timing); | ||
|
|
||
| const fml::TimePoint latest_frame_target_time = | ||
| delegate_.GetLatestFrameTargetTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved inside the if?
Also, the following code might help it be more accurate for the case where we just haven't gotten the vsync notification yet:
if (latest_frame_target_time < raster_finish_time) {
latest_frame_target_time += frame_budget_millis;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
cd0657f to
02a7334
Compare
shell/common/rasterizer.cc
Outdated
| if (latest_frame_target_time < raster_finish_time) { | ||
| latest_frame_target_time = | ||
| latest_frame_target_time + | ||
| fml::TimeDelta::FromMilliseconds(frame_budget_millis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work? frame_budget_millis is, I think, a double representing the number of millis. The FromMilliseconds() method takes an int64_t, doesn't it? I'm surprised this doesn't generate an error, but I can't find a FromMilliseconds(double) method. There is a FromSecondsF(double) method that could be used to maintain precision. I'd add a FromMillisecondsF(double) factory or use the "seconds" version here to maintain accuracy.
02a7334 to
7196597
Compare
flar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, the event generation and frame tracking look good. I'll defer to @chinmaygarde for the rest as he also had a number of comments above. I'm approving, but suggest that @chinmaygarde chimes in as well to be sure.
7196597 to
e5e02e5
Compare
|
I have a curious question that shouldn't block the landing of this PR: do we have any trace event unit tests? Fuchsia sometimes depends on our tracing events, so it might be nice to have some unit tests to guard the changes. |
|
This pull request is not suitable for automatic merging in its current state.
|
e5e02e5 to
4119350
Compare
This change also adds TimeRecorder which records time at the start of each frame to capture the latest vsync target display time and wires it in to the rasterizer to add trace events when there is a lag.
4119350 to
dcb9d78
Compare
* 0b3f2d3 Always cache SkSL when using the Metal backend. (flutter/engine#17468) * ff62dec Roll to clang11, mark 4 (flutter/engine#17483) * 08ae3bb Remove JSON codec from C++ client wrapper (flutter/engine#17312) * 2e90965 Fix bad texture view config (flutter/engine#17486) * abc7293 [pipeline] Add trace event for lag between target and display times (flutter/engine#17384) * Updated bin/internal/fuchsia-linux.version
|
x-ref: flutter/flutter#54117 |
…lutter#17384) This change also adds TimeRecorder which records time at the start of each frame to capture the latest vsync target display time and wires it in to the rasterizer to add trace events when there is a lag.
This change also adds TimeRecorder which records time at the start
of each frame to capture the latest vsync target display time and
wires it in to the rasterizer to add trace events when there is a lag.