Store perf numbers in AWS database #5844
Conversation
@pytorchbot retest this please

What is the plan for schema migrations?
```python
test_name = Column(String)
test_type = Column(String)
mean = Column(Float)
sigma = Column(Float)
```
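For context, the columns under review can be fleshed out into a complete declarative model. This is only a minimal sketch: the table name, the surrogate `id` key, and the `commit_hash` column are assumptions, since only the four measurement columns appear in the diff.

```python
from sqlalchemy import Column, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class PerfResult(Base):
    # Table/class names and the id/commit_hash columns are assumptions;
    # only the four measurement columns appear in this PR's diff.
    __tablename__ = 'perf_results'
    id = Column(Integer, primary_key=True)
    commit_hash = Column(String)
    test_name = Column(String)
    test_type = Column(String)
    mean = Column(Float)
    sigma = Column(Float)
```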
```python
from sqlalchemy import create_engine

if args.local:
    engine = create_engine('sqlite:///test.db')
else:
    engine = create_engine('postgresql://{}:{}@{}/{}'.format(
        args.username, args.password, args.hostname, args.dbname))
```
Just a few questions about how this is going to work infra-wise overall. If you need this merged to help with iterating, happy to.
I thought about it a bit more and felt that a database is probably not the right way to go. In particular, a database schema migration would cause backward incompatibility if we are testing old master commits on the new schema, and we actually don't need the relational aspect of a SQL-based database at all. I think storing the perf numbers as commit-hash-named JSON files in S3 might be better: that way both the data and the file format are pinned to a specific master commit, and are guaranteed to work for that commit. We would also have a LATEST_COMMIT file storing the hash of the latest tested master commit, so that we can test PRs against it.
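The proposed layout can be sketched as follows. This is only an illustration using the local filesystem as a stand-in for the S3 bucket; the function names are hypothetical, and a real implementation would issue the equivalent S3 PUT/GET calls instead of local file I/O.

```python
import json
import os

def store_perf_numbers(root, commit_hash, numbers):
    # One JSON file per master commit, named after the commit hash,
    # so the data and its format stay pinned to that commit.
    with open(os.path.join(root, commit_hash + '.json'), 'w') as f:
        json.dump(numbers, f)
    # LATEST_COMMIT records the most recently tested master commit,
    # which PR builds use as their comparison baseline.
    with open(os.path.join(root, 'LATEST_COMMIT'), 'w') as f:
        f.write(commit_hash)

def load_baseline(root):
    # Resolve LATEST_COMMIT, then load that commit's perf numbers.
    with open(os.path.join(root, 'LATEST_COMMIT')) as f:
        commit_hash = f.read().strip()
    with open(os.path.join(root, commit_hash + '.json')) as f:
        return commit_hash, json.load(f)
```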
Well, the "perf numbers changed format" BC problem is always going to apply no matter what you do. SQL requires you to decide how to reformat immediately when you update the schema; NoSQL lets you defer that to a consumer of the data. You have a more generic problem where, if you're retesting commits, it might be because you have a new, updated version of the tool that actually measures the thing you care about (ostensibly you weren't getting signal from the older one). My main worry with files on S3 is ending up with infra that is too dependent on AWS. (Yes, pot calling kettle black.) You also have to make sure that file updates are atomic. (Maybe all this means is that you want a NoSQL database... Not sure what the right answer is, but hoping to help.)
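On the atomicity point: for plain files the standard pattern is write-to-temp-then-rename, sketched below with a hypothetical helper. (On S3 itself a PUT of a whole object is atomic, so readers never observe a partial object, but read-modify-write sequences still are not.)

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    # Write to a temporary file in the same directory, then rename it
    # over the target. os.replace is atomic on POSIX, so a concurrent
    # reader sees either the old contents or the new, never a partial write.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(data, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```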
Closing in favor of #5951. |
* Store perf numbers in S3 Previously the perf numbers were stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf-number storage to S3, which gives us finer control over when to push new numbers. This replaces #5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be overkill at this point. * Fixed issues
Previously the perf numbers were stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf-number storage to AWS RDS, which gives us finer control over when to push new numbers.