Store perf numbers in AWS database by yf225 · Pull Request #5844 · pytorch/pytorch

yf225 · 2018-03-17T00:04:49Z

Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to AWS RDS, which allows us to have finer control over when to push the new numbers.

yf225 · 2018-03-17T00:43:08Z

@pytorchbot retest this please

ezyang · 2018-03-17T18:21:29Z

What is the plan for schema migrations?

Sign in to view

+    test_name = Column(String)
+    test_type = Column(String)
+    mean = Column(Float)
+    sigma = Column(Float)


Sign in to view

+if args.local:
+    engine = create_engine('sqlite:///test.db')
+else:
+    engine = create_engine('postgresql://{}:{}@{}/{}'.format(args.username, args.password, args.hostname, args.dbname))


ezyang · 2018-03-19T19:55:59Z

Just a few questions about overall how this is going to work infra-wise. If you need this to merge to help with iterating, happy to.

yf225 · 2018-03-19T20:12:25Z

I thought about it a bit more and felt that database is probably not the right way to go. In particular, database schema migration would cause backward incompatibility if we are testing old master commits on the new schema, and we actually don't need the relational aspect of the SQL-based database at all.

I think storing the perf numbers as commit hash named JSON files in S3 might be better - that way both the data and the file format is pinned to a specific master commit, and is guaranteed to work for that commit. We would also have a LATEST_COMMIT file to store the hash of the latest tested master commit, so that we can test PRs against it.

ezyang · 2018-03-19T20:34:07Z

Well, the "perf numbers changed format BC" problem is always going to apply no matter what you do. SQL requires you to decide how to reformat immediately when you update the schema; NoSQL lets you defer it to a consumer of the data. You have a more generic problem where if you're retesting commits, this might because you have a new updated version of the tool that actually measures the thing you care about (ostensibly you weren't getting signal from the older thing).

My main worry with files on S3 is ending up with infra that is too dependent on AWS. (Yes, pot calling kettle black.) You also have to make sure that file updates are atomic.

(Maybe all this means is that you want a NoSQL database... Not sure what the right answer is, but hoping to help.)

yf225 · 2018-03-22T23:27:43Z

Closing in favor of #5951.

* Store perf numbers in S3 Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers. This is in replacement of #5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point. * Fixed issues

* Store perf numbers in S3 Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers. This is in replacement of pytorch#5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point. * Fixed issues

yf225 force-pushed the rds branch 2 times, most recently from e2ecede to 97c6745 Compare March 17, 2018 00:29

Save perf number in AWS database

97c6745

ezyang reviewed Mar 19, 2018

View reviewed changes

Comment thread .jenkins/perf_test/db_schema.py

test_name = Column(String)

test_type = Column(String)

mean = Column(Float)

sigma = Column(Float)

This comment was marked as off-topic.

Sign in to view

ezyang reviewed Mar 19, 2018

View reviewed changes

Comment thread .jenkins/perf_test/update_baseline.py

if args.local:

engine = create_engine('sqlite:///test.db')

else:

engine = create_engine('postgresql://{}:{}@{}/{}'.format(args.username, args.password, args.hostname, args.dbname))

This comment was marked as off-topic.

Sign in to view

yf225 mentioned this pull request Mar 22, 2018

Store perf numbers in S3 #5951

Merged

yf225 closed this Mar 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store perf numbers in AWS database#5844

Store perf numbers in AWS database#5844
yf225 wants to merge 1 commit intopytorch:masterfrom
yf225:rds

yf225 commented Mar 17, 2018

Uh oh!

yf225 commented Mar 17, 2018

Uh oh!

ezyang commented Mar 17, 2018

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

ezyang commented Mar 19, 2018

Uh oh!

yf225 commented Mar 19, 2018

Uh oh!

ezyang commented Mar 19, 2018

Uh oh!

yf225 commented Mar 22, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yf225 commented Mar 17, 2018

Uh oh!

yf225 commented Mar 17, 2018

Uh oh!

ezyang commented Mar 17, 2018

Uh oh!

This comment was marked as off-topic.

Uh oh!

This comment was marked as off-topic.

Uh oh!

ezyang commented Mar 19, 2018

Uh oh!

yf225 commented Mar 19, 2018

Uh oh!

ezyang commented Mar 19, 2018

Uh oh!

yf225 commented Mar 22, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants