Skip to content

Add support for new --show-percentiles arg to get more granular percentiles instead of just median to all commands#43

Merged
mukeshpanchal27 merged 7 commits into
mainfrom
enhancement/include-percentiles
Mar 7, 2023
Merged

Add support for new --show-percentiles arg to get more granular percentiles instead of just median to all commands#43
mukeshpanchal27 merged 7 commits into
mainfrom
enhancement/include-percentiles

Conversation

@felixarntz

@felixarntz felixarntz commented Feb 28, 2023

Copy link
Copy Markdown
Collaborator

Follow up to #40.

This PR chooses a hard-coded set of percentiles 10, 25, 50 (median), 75, 90, previously discussed with @oandregal. This is also what is commonly used for HTTP Archive queries by that team, and we have also used that set of percentiles in our own HTTP Archive queries (e.g. this one).

In the future we can maybe make the concrete percentiles controllable by the argument, but that doesn't seem necessary at this point. The above percentiles are a good baseline to get a more representative impression of the overall accuracy of the values.

Testing this PR

  1. Run each of the 4 commands prior to this change (for any URL / any WebPageTest result).
  2. Switch to this PR branch.
  3. Run the same commands as above again.
    • For the wpt-* commands, make sure the median results are the same.
    • For the benchmark-* commands, make sure the median results are roughly similar (they won't be the same because it's different runs).
  4. Run the same commands again, but now include the --show-percentiles flag in each one.
    • For the wpt-* commands, make sure the p50 results are the same as the previous median results.
    • For the benchmark-* commands, make sure the p50 results are roughly similar to the previous median results (they won't be the same because it's different runs).
    • For any command, do a quick sanity check on whether percentile values look right, e.g. if you use 8 runs for your command, the p25 value should be the value of the 2nd lowest run result, or if you use 10 runs, the p90 value should be the 9th lowest / 2nd highest run result. This sanity check can be done for any of the commands, since the underlying percentile calculation logic is the same in all of them.

Comment thread cli/README.md
By default, only the median values are returned. You can optionally request all the individual run values as well.

#### Required arguments
#### Arguments

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusting this simply because not all arguments are required.

@felixarntz felixarntz changed the base branch from add/benchmark-web-vitals to main March 1, 2023 16:04
@felixarntz felixarntz marked this pull request as ready for review March 2, 2023 20:57
Comment thread cli/README.md
* `--number` (`-n`): Total number of requests to send.
* `--file` (`-f`): File with URLs (one URL per line) to run benchmark tests for.
* `--output` (`-o`): The output format.
* `--output` (`-o`): The output format: Either "table" or "csv".

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixarntz under all other "Arguments" headings I've noticed the usage of --format (-f) for the output format, but here it's defined as --output (-o).

Is this correct? It may be correct in code (and function as --output) but should we be interchanging flags?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@10upsimon I'm aware of this, this happened due to different people working on the commands and names diverging as such :)

I agree we should address this, but thinking better in a separate PR. Please feel free to open one!

Comment thread cli/README.md
* `--number` (`-n`): Total number of requests to send.
* `--file` (`-f`): File with URLs (one URL per line) to run benchmark tests for.
* `--output` (`-o`): The output format.
* `--output` (`-o`): The output format: Either "table" or "csv".

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixarntz same comment here WRT --output as opposed to --format

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@10upsimon See my reply above.

@10upsimon

Copy link
Copy Markdown
Collaborator

@felixarntz code is good for me, and I've functionally tested and all of the commands and percentile/non percentile applications work as expected.

So really just looking for feedback on whether we need to alter --output to be inline with the --format usage as it is for the wpt-metrics and wpt-server-timing commands.

This applies to the benchmark-server-timing and benchmark-web-vitals commands.

Aside from that, good to go from my end thank you!

@felixarntz

Copy link
Copy Markdown
Collaborator Author

@mukeshpanchal27 Could you give this a review some time this week?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants