feat: Use S3 node store with seaweedfs #3498
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #3498   +/-   ##
=======================================
  Coverage   99.49%   99.49%
=======================================
  Files           3        3
  Lines         197      197
=======================================
  Hits          196      196
  Misses          1        1

☔ View full report in Codecov by Sentry.
Any reason why you didn't use SeaweedFS, per what you said yesterday?
Co-authored-by: Reinaldy Rafli <aldy505@proton.me>
Well, I started with that and realized 3 things:

- Garage fits the bill much better, as it is explicitly created for smaller setups like this and is easy to expand without specialized roles.
- It doesn't have any paid features in it.
- It has much more decent and familiar S3 interface support.
When I tried seaweedfs last time (and I still use it for sourcemap/profile storage, tbh) it had single-node ability via several options, some of them enabled by default.
I think Garage/MinIO are simpler for small setups; seaweedfs looks necessary for mid-to-large setups, because all the other services I know of keep files as-is. And thousands upon thousands of small files, like profiles, are not ideal to store on most popular filesystems, I guess.
@doc-sheet Hey, I'm going to work on this PR. I'd think seaweed is better for self-hosted Sentry. One thing I don't like about Garage is that we need to specify the storage allocation beforehand; if we set it to 100GB, there might be some people that have more data than 100GB, and I don't want that to cause any issues. That said, since you said you've used seaweed before: how was your experience? How does it compare to MinIO or Ceph?
Yeah, if we set up an object storage, we might as well move filestore & profiles there too. But let's focus on nodestore first.
It is a bit strange sometimes, but it is fine. It has multiple options for the filer store. At first I tried Redis; it worked for several months and then... I just lost all data. I don't know if the issue was in Redis or weed itself; I suspect a bug with TTL could be the reason too. But after that incident I wiped the cluster and started a new one with Scylla as the filer backend, and it has worked fine for almost a year already, despite that TTL bug. SeaweedFS has multiple build flavors.
I suggest always using large_disk. The documentation is not clear, but it is easy to reach that limit. I don't know the difference between full and normal, and just use one of them. Also, I don't use S3 auth - I was too lazy to set it up. Other than all that, I have no problems and have barely touched it after the initial setup. It just works. As for MinIO and Ceph - MinIO was the reason to look for alternatives. Tons of profiles from the JS SDK, stored as separate files, started to affect my monitoring script, and soon it might start to affect MinIO performance too. And it is not that easy to scale MinIO, and probably impossible to optimize it for small-file storage, at least in my low-cost setup.
If seaweedfs were to control the TTL, there is another catch: weed has its own settings for collections, and it creates a collection for each S3 bucket. But if Sentry itself cleans up the old data, I guess there is no difference.
Good to know about Seaweed
Ah so everyone has the same experience with minio.
The Sentry cleanup job only cleans up the one on the filesystem. If we're using S3, it won't clean up anything; we need to configure S3 data cleanup on our own.
Looks like I missed that seaweedfs now has the ability to control TTL with the S3 API. And I even linked to the correct section of the FAQ. :) I'd like to look into a new integration with seaweedfs. And by the way, I like the idea of expanding the Sentry images. I myself install some extra packages and modules. Maybe an extra step in the install to build user-provided Dockerfiles?
Config migration is done and is behind a prompt/flag. Next up is to think about how to manage the retention. Should we do it with a cron/scheduled job (cons: probably a heavy process; pros: can be configured dynamically), or an S3 lifecycle (cons: can't be configured dynamically; pros: won't be a heavy process)?
I'm just gonna go forward with this: https://github.com/seaweedfs/seaweedfs/wiki/S3-API-FAQ#setting-ttl
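For the lifecycle route, `s3cmd` also ships an `expire` subcommand that avoids hand-writing the lifecycle XML; a minimal sketch, reusing the endpoint and credentials that appear elsewhere in this PR (the `s3_expire_cmd` helper is hypothetical and only prints the command as a dry run):

```shell
# Dry-run sketch: print the `s3cmd expire` invocation instead of running it
# against a live endpoint. Endpoint/credentials mirror this PR's setup;
# the helper name is made up for illustration.
s3_expire_cmd() {
  bucket=$1
  days=$2
  printf '%s ' s3cmd --access_key=sentry --secret_key=sentry --no-ssl \
    --host=localhost:8333 "--host-bucket=localhost:8333/%(bucket)" \
    expire "s3://$bucket" "--expiry-days=$days"
  printf '\n'
}

s3_expire_cmd nodestore "${SENTRY_EVENT_RETENTION_DAYS:-90}"
```

This sets the same expiration rule as a full `setlifecycle` policy, just with less ceremony.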
This reverts commit 2f1575d.
Great, I believe that's all.
@aminvakil @doc-sheet Hi, would you mind reviewing this PR?
@@ -0,0 +1,90 @@

echo "${_group}Bootstrapping seaweedfs (node store)..."

$dc up --wait seaweedfs postgres
Does the --wait thing work for podman?
Good point. Will check later.
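If podman's compose implementation turns out not to support `--wait`, one portable fallback is to poll readiness from the script itself; a hedged sketch (the probe command, retry count, and `wait_for` name are placeholders, not part of this PR):

```shell
# Portable stand-in for `docker compose up --wait`: retry an arbitrary
# readiness probe until it succeeds or we run out of attempts.
wait_for() {
  tries=$1
  shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    sleep 1
  done
}

# e.g. wait_for 30 curl -sf http://localhost:8333
```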
bucket_list=$($s3cmd --access_key=sentry --secret_key=sentry --no-ssl --region=us-east-1 --host=localhost:8333 --host-bucket='localhost:8333/%(bucket)' ls)

if [[ $(echo "$bucket_list" | tail -1 | awk '{print $3}') != 's3://nodestore' ]]; then
This if condition really needs some explanation about what it is checking and how -- like what are the assumptions here?
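One way to answer that review comment is to factor the check into a helper with the assumption spelled out; a sketch, assuming `s3cmd ls` prints one bucket per line with the `s3://...` URI as the last whitespace-separated field (the `bucket_exists` name is made up):

```shell
# Hypothetical refactor of the bucket-existence check. Assumption:
# `s3cmd ls` output looks like "2024-01-01 00:00  s3://nodestore",
# i.e. the bucket URI is the last field on each line.
bucket_exists() {
  # $1: bucket URI to look for; stdin: `s3cmd ls` output
  awk -v b="$1" '$NF == b { found = 1 } END { exit !found }'
}

# usage: echo "$bucket_list" | bucket_exists s3://nodestore || create the bucket
```

Unlike `tail -1`, this does not assume the target bucket sorts last in the listing.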
# Other backend implementations for node storage developed by the community
# are available in public GitHub repositories.

SENTRY_NODESTORE = "sentry_nodestore_s3.S3PassthroughDjangoNodeStorage"
Do we need Passthrough for fresh installs?
This is your original code, so I assume yes.
I would not assume anything about my original code 😅
install/bootstrap-s3-nodestore.sh
read -p "y or n? " yn
case $yn in
y | yes | 1)
export APPLY_AUTOMATIC_CONFIG_UPDATES=1
This feels dangerous: once you export it like this, wouldn't it also enable auto-update for pgbouncer? I think that should be a separate question and decision?
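One way to address this concern is to map the prompt answer to a nodestore-specific variable rather than exporting the global flag; a sketch (the `parse_config_update_answer` helper and the `APPLY_NODESTORE_CONFIG` name are hypothetical, not part of this PR):

```shell
# Hypothetical: translate the prompt answer into a nodestore-only flag so
# the global APPLY_AUTOMATIC_CONFIG_UPDATES (which other steps, such as
# pgbouncer config updates, may read) is left untouched.
parse_config_update_answer() {
  case $1 in
    y | yes | 1) echo 1 ;;
    n | no | 0) echo 0 ;;
    *) return 1 ;;
  esac
}

APPLY_NODESTORE_CONFIG=$(parse_config_update_answer "y")
```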
n | no | 0)
export APPLY_AUTOMATIC_CONFIG_UPDATES=0
echo
echo -n "Alright, you will need to update your sentry.conf.py file manually before running 'docker compose up'."
Again, provide details and a link to this PR (or some doc). Also, skip setting up seaweed at this point, or at least mention that the service will be running for no good reason?
if [[ "$APPLY_AUTOMATIC_CONFIG_UPDATES" == 1 ]]; then
nodestore_config=$(sed -n '/SENTRY_NODESTORE/,/[}]/{p}' sentry/sentry.conf.example.py)
if [[ $($dc exec postgres psql -qAt -U postgres -c "select exists (select * from nodestore_node limit 1)") = "f" ]]; then
Up until this point you don't need the postgres service so maybe instead of bringing it up at the beginning, just use $dcr postgres psql ... here?
$dc exec seaweedfs mkdir -p /data/idx/
$s3cmd --access_key=sentry --secret_key=sentry --no-ssl --region=us-east-1 --host=localhost:8333 --host-bucket='localhost:8333/%(bucket)' mb s3://nodestore
Why are these commands repeated (from lines 6 and 7 above)?
# XXX(aldy505): Should we refactor this?
lifecycle_policy=$(
cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<LifecycleConfiguration>
<Rule>
<ID>Sentry-Nodestore-Rule</ID>
<Status>Enabled</Status>
<Filter></Filter>
<Expiration>
<Days>$SENTRY_EVENT_RETENTION_DAYS</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
EOF
)
$dc exec seaweedfs sh -c "printf '%s' '$lifecycle_policy' > /tmp/nodestore-lifecycle-policy.xml"
Yes, make it a config file and mount it? Is this because of the $SENTRY_EVENT_RETENTION_DAYS variable? If yes, we may wanna construct a custom entrypoint script for seaweed, so that when people change this value and restart, it is updated without running ./install.sh again?
Restarting seaweed wouldn't take anything into effect. You will need to execute `s3cmd setlifecycle` for that.
And I'm saying we should put whatever is necessary into a custom entrypoint script for seaweed so it picks the changes up?
I don't like that approach, it complicates stuff.
Okay, until someone complains let's go with the simpler approach :)
echo "Making sure the bucket lifecycle policy is all set up correctly..."
$s3cmd --access_key=sentry --secret_key=sentry --no-ssl --region=us-east-1 --host=localhost:8333 --host-bucket='localhost:8333/%(bucket)' getlifecycle s3://nodestore
This seems like it belongs in tests instead of here? Is this because of the unknown env variable value?
aminvakil left a comment:
Can we make it opt-in for a release? And get some feedback regarding this change?
@aminvakil You're saying this shouldn't be the default? And probably switch it to the default at a later stage? That works for me.
I'd say default for fresh installs. And since we prompt people for existing installs, it should be fine?
@aldy505 Yes. @BYK Yes. I have zero knowledge about this. But again, that's because of my limited knowledge of S3 storage, and I do not know how big this change is.
EOF
)
$dc exec seaweedfs sh -c "printf '%s' '$lifecycle_policy' > /tmp/nodestore-lifecycle-policy.xml"
$s3cmd --access_key=sentry --secret_key=sentry --no-ssl --region=us-east-1 --host=localhost:8333 --host-bucket='localhost:8333/%(bucket)' setlifecycle /tmp/nodestore-lifecycle-policy.xml s3://nodestore
What does --region=us-east-1 mean here?
It's the default region name for AWS S3. It doesn't mean anything for self-hosted, but we can only put known region values.
"endpoint_url": "http://seaweedfs:8333",
"bucket_path": "nodestore",
"bucket_name": "nodestore",
"region_name": "us-east-1",
Is region_name mandatory?
Yep. Otherwise it'll complain of missing region name.
Great! I'll change it later tonight.
Uh, I mean, let's do what @BYK said. I still prefer to merge all this before the 15th.
This reverts commit 84f904f.
Enables S3 node store using SeaweedFS and sentry-nodestore-s3 by @stayallive
This should alleviate all the issues stemming from (ab)using PostgreSQL as the node store.