Skip to content

Store perf numbers in AWS database#5844

Closed
yf225 wants to merge 1 commit intopytorch:masterfrom
yf225:rds
Closed

Store perf numbers in AWS database#5844
yf225 wants to merge 1 commit intopytorch:masterfrom
yf225:rds

Conversation

@yf225
Copy link
Copy Markdown
Contributor

@yf225 yf225 commented Mar 17, 2018

Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to AWS RDS, which allows us to have finer control over when to push the new numbers.

@yf225 yf225 force-pushed the rds branch 2 times, most recently from e2ecede to 97c6745 Compare March 17, 2018 00:29
@yf225
Copy link
Copy Markdown
Contributor Author

yf225 commented Mar 17, 2018

@pytorchbot retest this please

@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Mar 17, 2018

What is the plan for schema migrations?

test_name = Column(String)
test_type = Column(String)
mean = Column(Float)
sigma = Column(Float)

This comment was marked as off-topic.

if args.local:
engine = create_engine('sqlite:///test.db')
else:
engine = create_engine('postgresql://{}:{}@{}/{}'.format(args.username, args.password, args.hostname, args.dbname))

This comment was marked as off-topic.

@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Mar 19, 2018

Just a few questions about overall how this is going to work infra-wise. If you need this to merge to help with iterating, happy to.

@yf225
Copy link
Copy Markdown
Contributor Author

yf225 commented Mar 19, 2018

I thought about it a bit more and felt that database is probably not the right way to go. In particular, database schema migration would cause backward incompatibility if we are testing old master commits on the new schema, and we actually don't need the relational aspect of the SQL-based database at all.

I think storing the perf numbers as commit hash named JSON files in S3 might be better - that way both the data and the file format is pinned to a specific master commit, and is guaranteed to work for that commit. We would also have a LATEST_COMMIT file to store the hash of the latest tested master commit, so that we can test PRs against it.

@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Mar 19, 2018

Well, the "perf numbers changed format BC" problem is always going to apply no matter what you do. SQL requires you to decide how to reformat immediately when you update the schema; NoSQL lets you defer it to a consumer of the data. You have a more generic problem where if you're retesting commits, this might because you have a new updated version of the tool that actually measures the thing you care about (ostensibly you weren't getting signal from the older thing).

My main worry with files on S3 is ending up with infra that is too dependent on AWS. (Yes, pot calling kettle black.) You also have to make sure that file updates are atomic.

(Maybe all this means is that you want a NoSQL database... Not sure what the right answer is, but hoping to help.)

@yf225 yf225 mentioned this pull request Mar 22, 2018
@yf225
Copy link
Copy Markdown
Contributor Author

yf225 commented Mar 22, 2018

Closing in favor of #5951.

@yf225 yf225 closed this Mar 22, 2018
ezyang pushed a commit that referenced this pull request Mar 24, 2018
* Store perf numbers in S3

Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers.

This is in replacement of #5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point.

* Fixed issues
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
* Store perf numbers in S3

Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers.

This is in replacement of pytorch#5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point.

* Fixed issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants