Upload and distribute database updates via S3 #5334

Conversation
I believe this is nearly ready. It's deployed and working in a demo. Info about the database currently in use is also available in a JSON file served by the server. More testing is needed to make sure that swapping the DB file does not result in errors, but so far it seems solid.
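For illustration only, a sketch of how such an info file might be generated; the field names are hypothetical, not taken from this PR:

#!/usr/bin/env python
# Hypothetical sketch of writing metadata about the active DB file to a
# JSON file the web server can serve. Field names are illustrative only.
import hashlib
import json
from datetime import datetime

def write_db_info(db_path='bedrock.db', info_path='bedrock_db_info.json'):
    with open(db_path, 'rb') as fp:
        digest = hashlib.sha256(fp.read()).hexdigest()
    info = {
        'file': db_path,
        'sha256': digest,
        'updated': datetime.utcnow().isoformat() + 'Z',
    }
    with open(info_path, 'w') as fp:
        json.dump(info, fp, indent=2)

if __name__ == '__main__':
    write_db_info()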
Force-pushed from 89bfc64 to ffbf6ed
I set up a test of this locally. I wrote a script that swaps out the database file once per second and substituted it for the normal cron script:

#!/usr/bin/env python
import os
from shutil import copy
from itertools import cycle
from time import sleep

DB_FILES = [
    'bedrock1.db',
    'bedrock2.db',
]

# start with a copy of the real DB in the first slot
PREV_FILE = DB_FILES[1]
copy('bedrock.db', DB_FILES[0])

for db_file in cycle(DB_FILES):
    # preserve the current DB so it can be swapped back in later
    copy('bedrock.db', PREV_FILE)
    print('copied main to %s' % PREV_FILE)
    # os.rename is atomic, so readers never see a partial file
    os.rename(db_file, 'bedrock.db')
    print('renamed %s to main' % db_file)
    PREV_FILE = db_file
    sleep(1)

Then I built and ran the container and pointed ab at it:

$ ab -t 60 -c 4 http://localhost:8000/en-US/security/advisories/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Finished 791 requests
Server Software: meinheld/0.6.1
Server Hostname: localhost
Server Port: 8000
Document Path: /en-US/security/advisories/
Document Length: 304397 bytes
Concurrency Level: 4
Time taken for tests: 60.179 seconds
Complete requests: 791
Failed requests: 0
Total transferred: 242385182 bytes
HTML transferred: 241144137 bytes
Requests per second: 13.14 [#/sec] (mean)
Time per request: 304.317 [ms] (mean)
Time per request: 76.079 [ms] (mean, across all concurrent requests)
Transfer rate: 3933.35 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 125 303 149.3 278 1378
Waiting: 124 220 100.6 204 1374
Total: 126 303 149.3 278 1378
Percentage of the requests served within a certain time (ms)
50% 278
66% 332
75% 410
80% 425
90% 479
95% 551
98% 628
99% 720
100% 1378 (longest request)
Force-pushed from 3c2ac0c to c23ff0a
bin/run-db-upload.py (outdated)

def s3_client():
    access_key_id = getenv('AWS_DB_ACCESS_KEY_ID')
    secret_access_key = getenv('AWS_DB_SECRET_ACCESS_KEY')
How about we set the environment variables that boto uses by default and let it handle reading the credentials from the environment instead?
Could do that. Just figuring that we'll likely end up with multiple AWS creds for different things eventually. Can rely on the implicit config until then though I guess.
I've updated this with your suggestion.
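For reference, a minimal sketch of the suggested approach, using boto3 as an illustration: with the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables set in the environment, the client resolves credentials itself and no explicit getenv calls are needed (the bucket name is a placeholder, not the PR's actual configuration):

# Sketch: let boto read credentials from the standard environment
# variables instead of passing them in explicitly.
import boto3

def s3_client():
    # with no explicit credentials, boto3 falls back to the environment
    # (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) and other standard sources
    return boto3.client('s3')

def upload_db(filename, bucket='example-bucket'):  # placeholder bucket name
    s3_client().upload_file(filename, bucket, filename)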
docker/bin/docker_build.sh (outdated)

  # generate the dockerfile
  rm -f "$FINAL_DOCKERFILE"
- cat "docker/dockerfiles/bedrock_$DOCKERFILE" | envsubst '$GIT_COMMIT' > "$FINAL_DOCKERFILE"
+ cat "docker/dockerfiles/bedrock_$DOCKERFILE" | envsubst '$GIT_COMMIT,$BRANCH_NAME' > "$FINAL_DOCKERFILE"
I don't believe we need envsubst any more: https://docs.docker.com/engine/reference/builder/#environment-replacement
Our Docker on Jenkins is too old for that to work. We'll be able to fix many things once we can use a newer Docker.
More accurately, it doesn't work on the FROM line in our version.
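For context, a minimal sketch of the Dockerfile-native substitution being discussed. Using a build ARG in the FROM line requires Docker 17.05+, which is presumably what the Jenkins Docker predates; the image name and default values here are illustrative, not bedrock's actual configuration:

# Illustrative Dockerfile using built-in environment replacement instead
# of envsubst. ARG before FROM requires Docker 17.05+; the image name is
# a placeholder.
ARG GIT_COMMIT=latest
FROM example/bedrock-base:${GIT_COMMIT}

# args declared before FROM must be re-declared to be used after it
ARG BRANCH_NAME=master
ENV BRANCH_NAME=${BRANCH_NAME}

The values would then come from the build command, e.g. docker build --build-arg GIT_COMMIT=abc123 --build-arg BRANCH_NAME=master .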
glogiotatidis left a comment
Impressive how you can switch db files and nothing breaks. This is good engineering @pmac ❗️
*.db
*.mmdb
bedrock_db_info.json
!root_files/bedrock_db_info.json
👍
Nothing so far :) Need to test more.
This still seems to be working well on the demo, as the healthz-cron info shows.
I'm hoping next week we can get a prod-like deployment and point some real traffic at it to see how it performs.
Force-pushed from 966a175 to 72575a2
Force-pushed from 72575a2 to 8fc7458
Force-pushed from 84b9417 to 86b5fdb
A.K.A. The Sqlitening! This removes bedrock's dependence on a database server and moves to downloading pre-built SQLite database files from S3 on a schedule. There is also a clock process that will update and upload such a database on a schedule as well. This should mean more stability, speed, and reliability for bedrock, as well as quicker development thanks to easy-to-download pre-populated databases.
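The reason the file switching is safe is the atomic rename: a minimal sketch of the download-and-swap pattern, with boto3 standing in for the download step (all names here are illustrative, not the PR's actual code):

# Sketch of the download-and-swap pattern; names are illustrative.
import os
import boto3

def swap_in_new_db(bucket, key, db_path='bedrock.db'):
    tmp_path = db_path + '.new'
    # download to a temporary file so a partial download can never
    # replace the live database
    boto3.client('s3').download_file(bucket, key, tmp_path)
    # os.rename is atomic on POSIX when both paths are on the same
    # filesystem, so readers see either the old file or the new one,
    # never a partially written file
    os.rename(tmp_path, db_path)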
jgmize left a comment
thanks @pmac
Replaces #5236