I love the ES APM, but I believe we are going to have to remove it because there is no way to limit how many requests it reports.
Currently, 100% of requests are sent no matter what, each with a fairly significant amount of data.
Lowering the transaction sample rate helps a little, but it only trims a small percentage of the data that has to be indexed.
We get millions of requests a minute (mostly rate-limited spam) and are constantly getting socket hang up and queue full errors, so many requests are being lost anyway.
My estimate is that handling our requests properly would cost us tens of thousands of dollars a month, given that the current setup can't keep up and our Elastic Cloud bill is already upwards of $2,000 a month. That would be as much as, or more than, it costs to run our API.
Not to mention I have to clear out the entire system every 48 hours because it fills up with over a TB every couple of days. We have a hot/warm architecture, but that fills up completely.
We need to be able to sample a percentage of actual requests on the agent, then potentially apply a multiplier on the other end to provide estimated metrics.
For example, I want APM to do anything at all for only about 10% of requests. This would let us reduce capacity requirements by roughly 90% while still getting a general picture of what's going on.
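To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is plain Python and not the APM agent's API: `RequestSampler`, `should_record`, and `estimate_total` are made-up names for illustration only. The idea is that the agent decides once per request whether to do any work at all, and the backend scales the recorded counts back up by `1 / rate`.

```python
import random
from typing import Optional


class RequestSampler:
    """Hypothetical head-based sampler: decide once per request whether to record it."""

    def __init__(self, rate: float, rng: Optional[random.Random] = None) -> None:
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must be in the range (0, 1]")
        self.rate = rate
        # Each recorded request stands in for this many real requests.
        self.multiplier = 1.0 / rate
        self._rng = rng or random.Random()

    def should_record(self) -> bool:
        """Return True for roughly `rate` of all calls; skipped requests cost nothing."""
        return self._rng.random() < self.rate

    def estimate_total(self, recorded_count: int) -> float:
        """Scale a recorded count back up to an estimate of the true request count."""
        return recorded_count * self.multiplier


if __name__ == "__main__":
    sampler = RequestSampler(rate=0.10, rng=random.Random(42))
    total_requests = 100_000
    # Only the requests that pass this check would be instrumented and sent.
    recorded = sum(sampler.should_record() for _ in range(total_requests))
    print(f"recorded {recorded} of {total_requests} requests")
    print(f"estimated total: {sampler.estimate_total(recorded):.0f}")
```

With a rate of 0.10, about one request in ten would be instrumented and shipped, and the estimated totals should land close to the real numbers at our traffic volume. Rare events would be less accurate, which is a trade-off I am fine with.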
Note that I have already set transaction sampling to about 0.01 and turned off stack trace capture, etc., but it is still far too much for APM to handle.