Populate the benchmark metadata #5918

Merged
huydhn merged 15 commits into main from upload-benchmark-results-with-additional-information
Nov 15, 2024

@huydhn huydhn commented Nov 14, 2024

To ease the process of gathering the benchmark metadata before uploading to the database, I'm adding a script, .github/scripts/benchmarks/gather_metadata.py, that gathers this information and passes it to the upload script. From #5839, the benchmark metadata includes the following required fields:

-- Metadata
`timestamp` UInt64,
`schema_version` String DEFAULT 'v3',
`name` String,
-- About the change
`repo` String DEFAULT 'pytorch/pytorch',
`head_branch` String,
`head_sha` String,
`workflow_id` UInt64,
`run_attempt` UInt32,
`job_id` UInt64,
-- The raw records on S3
`s3_path` String,
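
A minimal sketch of how these required fields might be assembled into one record. This is not the actual gather_metadata.py: the function name `gather_metadata`, its parameters, and the mapping from GitHub Actions default environment variables (for example `GITHUB_RUN_ID` to `workflow_id`) are assumptions made for illustration. The job id and S3 path are taken as arguments because this sketch does not try to look them up.

```python
import time
from typing import Any, Dict, Mapping


def gather_metadata(
    env: Mapping[str, str],
    name: str,
    job_id: int,
    s3_path: str,
    schema_version: str = "v3",
) -> Dict[str, Any]:
    """Build one metadata record (hypothetical helper, not the PR's code)."""
    return {
        # Metadata
        "timestamp": int(time.time()),
        "schema_version": schema_version,
        "name": name,
        # About the change; the env-variable mapping is an assumption
        "repo": env.get("GITHUB_REPOSITORY", "pytorch/pytorch"),
        "head_branch": env.get("GITHUB_HEAD_REF") or env.get("GITHUB_REF_NAME", ""),
        "head_sha": env.get("GITHUB_SHA", ""),
        "workflow_id": int(env.get("GITHUB_RUN_ID", 0)),
        "run_attempt": int(env.get("GITHUB_RUN_ATTEMPT", 1)),
        "job_id": job_id,
        # The raw records on S3
        "s3_path": s3_path,
    }
```

In a workflow step this could be called as `gather_metadata(os.environ, "my benchmark", job_id, s3_path)`, with the benchmark name, `job_id`, and `s3_path` supplied by the caller.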

I'm going to test this out with the PT2 compiler instruction count benchmark at pytorch/pytorch#140493.

Testing

https://github.com/pytorch/test-infra/actions/runs/11831746632/job/32967412160?pr=5918#step:5:105 gathers the metadata and uploads the benchmark results correctly.

Also, an actual upload at https://github.com/pytorch/pytorch/actions/runs/11831781500/job/33006545698#step:24:138

@huydhn huydhn requested review from a team, clee2000 and kit1980 November 14, 2024 05:30
@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 14, 2024

    for result in benchmark_results:
        # This is a required field
        if "metric" not in result:
            continue
Contributor

Would it be better to error here?

Contributor Author

I think I could print a warning and dump the record. Although we have one metric per record in the database, there is nothing wrong with having a list of them in the same JSON file. So I'm thinking the code should just skip invalid records in the list.
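
The warn-and-skip behavior described here could look roughly like the sketch below. The helper name `filter_valid_records` and the use of the standard `logging` module are assumptions for illustration; the merged code may differ.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def filter_valid_records(
    benchmark_results: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Keep records that carry the required "metric" field; warn on the rest."""
    valid = []
    for result in benchmark_results:
        # "metric" is a required field, so dump and skip records without it
        if "metric" not in result:
            logger.warning(
                "Skipping record without a metric: %s",
                json.dumps(result, default=str),
            )
            continue
        valid.append(result)
    return valid
```

Skipping rather than raising keeps one malformed entry from blocking the upload of the other records in the same JSON file, at the cost of relying on someone reading the warning.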

    info(
        "The result is without any information about the repo, workflow, or job id"
    )
    return ""
Contributor

nit: if you're going to return Optional[str], you might as well make this None.

Is there a chance of nothing being in the benchmark results? If so, maybe declare repo, workflow_id, job_id, etc. outside of the loop.
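
Both suggestions can be sketched together as follows. The function name `generate_s3_path`, the record keys, and the path layout are hypothetical and only illustrate the pattern: the variables are initialized before the loop so they exist even when the list is empty, and the function returns None instead of an empty string.

```python
from typing import Any, Dict, List, Optional


def generate_s3_path(
    benchmark_results: List[Dict[str, Any]], filename: str
) -> Optional[str]:
    """Return a path built from the first complete record, or None."""
    # Declared outside the loop so they are defined for an empty list
    repo: Optional[str] = None
    workflow_id: Optional[int] = None
    job_id: Optional[int] = None

    for result in benchmark_results:
        repo = result.get("repo")
        workflow_id = result.get("workflow_id")
        job_id = result.get("job_id")
        if repo and workflow_id and job_id:
            break

    if not (repo and workflow_id and job_id):
        # Nothing usable found; None matches the Optional[str] return type
        return None

    # Hypothetical layout, for illustration only
    return f"{repo}/{workflow_id}/{job_id}/{filename}"
```

Callers then test `if path is None` explicitly, which is harder to overlook than an empty string.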

    schema-version:
      default: 'v2'
    github-token:
      default: ''
Contributor

This is needed for v3, right? Maybe we can have a check that this is given if v3 is set?

Contributor Author

Sounds good. I'm wondering if I could leave the job id optional even for v3, but then it would complicate things like writing queries that join with workflow_job. It seems easier to make this mandatory for v3.
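
One way the requested check might be written is sketched below. The function name `check_inputs` is hypothetical, and the sketch assumes the two action inputs arrive as plain strings, with an empty string meaning the token was not provided (matching the default shown above).

```python
def check_inputs(schema_version: str, github_token: str) -> None:
    """Fail early if v3 is requested without a GitHub token."""
    # Assumption: the empty default '' means "no token was passed"
    if schema_version == "v3" and not github_token.strip():
        raise ValueError("github-token is required when schema-version is v3")
```

With this, `check_inputs("v2", "")` passes silently while `check_inputs("v3", "")` raises, so a v3 upload without a token fails before any records are written.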

@huydhn huydhn merged commit 5397347 into main Nov 15, 2024
@huydhn huydhn deleted the upload-benchmark-results-with-additional-information branch November 15, 2024 19:30