Conversation
0de5ad6 to
00af2db
Compare
Codecov Report
@@ Coverage Diff @@
## master #2747 +/- ##
=========================================
- Coverage 80.16% 78.8% -1.37%
=========================================
Files 83 82 -1
Lines 4589 4288 -301
=========================================
- Hits 3679 3379 -300
+ Misses 910 909 -1
|
|
failing test can't be fixed until Kibana side is merged, as Kibana returns |
|
depends on elastic/kibana#46995 |
simitt
left a comment
There was a problem hiding this comment.
You explained that GA changes will be breaking. Does that mean that the agents using the beta implementation would break when talking to APM Server or would they just stop applying ACM?
agentcfg/model.go
Outdated
| func dotKey(k1, k2 string) string { | ||
| if k1 == "" { | ||
| return k2 | ||
| // ZeroResult creates a Result struct |
beater/api/config/agent/handler.go
Outdated
| return c.Request.URL.Query().Get(agentcfg.Etag) | ||
| } | ||
|
|
||
| func sanitize(isRum bool, settings agentcfg.Settings) agentcfg.Settings { |
There was a problem hiding this comment.
The agentcfg package has knowledge about RUM, and the Query has an IsRum flag. IMO this sanitize method should be part of fetching the result from the agentcfg package.
agentcfg/cache.go
Outdated
| if !c.logger.IsDebug() { | ||
| return | ||
| if !authorized(q.IsRum, result.Source.Agent) { | ||
| return ZeroResult(), nil |
There was a problem hiding this comment.
With this implementation non authorized RUM results are not cached. IMO this should be changed.
There was a problem hiding this comment.
I don't think so. If then the same request comes trough the right endpoint, the result would be a wrong empty result.
There was a problem hiding this comment.
Kibana is responding the same way to this query, no matter who requests it, right? So I don't see why caching the result here would lead to wrong results.
Ofc I did not mean to cache the ZeroResult but the result.
There was a problem hiding this comment.
in that case i don't know why i would want to cache something that i am not even returning, but im fine either way
There was a problem hiding this comment.
My reasoning is the following:
What is the purpose of caching here?
To avoid unnecessary calls to Kibana based on legitimate requests from agents. But as soon as those requests can be triggered from requests to unsecured RUM endpoint, also to prevent overloading Kibana on any kind of attacks.
The authorized check would only return false if the agents had a bug or for malicious requests. In the case of malicious requests it makes sense to also avoid sending the same requests over and over to Kibana.
However, if you add it to the cache, you would probably also need to size limit the cache, as otherwise you open up another attack vector.
I don't see why the decision in which cases something should be added to the cache is in direct relation on whether or not you are returning this information.
There was a problem hiding this comment.
Not sure you find this useful, but I gave this some thought when introducing the caching #2220 (comment).
There was a problem hiding this comment.
Thanks.
I am changing the caching key to be service name + env, instead of Etag.
When there are many RUM agents, they will not send an ifnonematch arg until they get one from a previous call, which means that all of them would have to hit Kibana at least one.
Caching the service name solves this problem.
On the other side, I don't think is necessary to have a cache limit.
| middleware.SetRumFlagMiddleware(), | ||
| middleware.SetRateLimitMiddleware(cfg.RumConfig.EventRate), | ||
| middleware.CORSMiddleware(cfg.RumConfig.AllowOrigins), | ||
| middleware.KillSwitchMiddleware(cfg.RumConfig.IsEnabled())) |
There was a problem hiding this comment.
It still bothers me that all of these middleware functions are run if someone has RUM disabled; as a user I'd expect the KillswitchMiddleware to be run before the CORS and rate limiter middleware at least.
There was a problem hiding this comment.
i actually think we should do the CORS dance properly and return to the browsers the response they expect, otherwise the browser might show an error in the lines of "domain not allowed" when the root cause is other.
There was a problem hiding this comment.
I lack knowledge about what is the standard behavior related to CORS and browsers. If the advantage is worth the potential additional costs I'm ok with it.
| return func(h request.Handler) (request.Handler, error) { | ||
| return func(c *request.Context) { | ||
| c.RateLimiter = store | ||
| c.RateLimiter = store.ForIP(c.Request) |
There was a problem hiding this comment.
Can you change the name to include IP now.
There was a problem hiding this comment.
ha! i was about to do that, but then i thought you will reply with something like this: #2706 (comment)
not sure what is the criteria now, but will change as I agree
There was a problem hiding this comment.
I don't follow. The logic that the rate limiter limits by IP is now in the middleware. It was not before. So now it makes sense to reflect it in the middleware name.
There was a problem hiding this comment.
As clarified offline, this is a comment on L29, not L34.
Changed in 0788847
they will just not get any config updates, so they won't apply it |
simitt
left a comment
There was a problem hiding this comment.
The sanitize method is based on the querie's isRum, the cache key is based on Service attributes. You call sanitize before adding the value to the cache. In case a request is made for a backend service via the RUM endpoint and then again via the backend endpoint this will lead to wrong results.
The authorized check occurs only for non-cached information. As soon as information is cached, which is based on service attributes and ignoring RUM/backend, the information is just returned. If an unauthorized RUM request follows an authorized backend request, this will result in non-authorized response.
On the other side, I don't think is necessary to have a cache limit.
From what I see the cache can potentially grow infinite with requests for non-existing services. Since we had a bit of an offline discussion regarding caching, please describe why you decided a cache limit is not necessary.
Not true, as far I can see.
Right, will update.
Lets talk about that offline |
Within the |
|
This test 8bbae34#diff-6756162984ecc44fbead19b809440eaeR305 covers that scenario and passes locally, I have to see what's up |
|
it always worked because each handler has its own cache, added another test completeness. |
simitt
left a comment
There was a problem hiding this comment.
Except for one comment related to removing sourrinding " of etags I only left nitpick comments.
agentcfg/cache.go
Outdated
| func (c *cache) fetchAndAdd(q Query, fn func(Query) (*Doc, error)) (doc *Doc, err error) { | ||
| id := q.ID() | ||
|
|
||
| func (c *cache) fetch(q Query, fetch func() (Result, error)) (Result, error) { |
There was a problem hiding this comment.
As I understand from your arguments in an offline discussion, you are in favor of not using abbreviations, so please use query here instead of q.
agentcfg/cache.go
Outdated
| if !c.logger.IsDebug() { | ||
| return | ||
| if c.logger.IsDebug() { | ||
| c.logger.Debugf("Cache size %v. Added ID %v.", c.gocache.ItemCount(), result.Source.Etag) |
There was a problem hiding this comment.
logging result.Source.Etag is not correct anymore.
agentcfg/cache.go
Outdated
| return nil, found | ||
| } | ||
| return val.(*Doc), found | ||
| func authorized(isRum bool, agent string) bool { |
There was a problem hiding this comment.
That's only used in tests now
| if err != nil { | ||
| return nil, err | ||
| } | ||
| func (f *Fetcher) request(r io.Reader) ([]byte, error) { |
There was a problem hiding this comment.
please remove empty line 72
agentcfg/fetch.go
Outdated
| func (f *Fetcher) Fetch(query Query) (Result, error) { | ||
| req := func() (Result, error) { | ||
| result, error := newResult(f.request(convert.ToReader(query))) | ||
| return result, error |
There was a problem hiding this comment.
you can directly say return newResult... now
agentcfg/model.go
Outdated
| if err := parse(settings, out, "", h); err != nil { | ||
| return nil, err | ||
| } | ||
| func (q Query) toString() string { |
There was a problem hiding this comment.
I'd prefer calling this ID as that's better describing what it does, and toString is usually used for a nicer presentation of an instance.
tests/system/test_integration_acm.py
Outdated
|
|
||
| @classmethod | ||
| def setUpClass(cls): | ||
| super(AgentConfigurationTest, cls).setUpClass() |
tests/system/test_integration_acm.py
Outdated
|
|
||
| def update_service_config(self, settings, _id, name, env=None): | ||
| return self.create_service_config(settings, name, env, _id=_id) | ||
| return self.create_service_config(settings, name, agent="python", env=env, _id=_id) |
There was a problem hiding this comment.
nit: No need for setting agent="python", it reflects the default.
tests/system/test_integration_acm.py
Outdated
| etag = r1.headers["Etag"] | ||
|
|
||
| r2 = requests.get(self.rum_agent_config_url, | ||
| params={"service.name": service_name, "ifnonematch": etag.replace('"', '')}, |
There was a problem hiding this comment.
why is it necessary to remove surrounding " of etag? It should be possible to just use the etag that is returned by the server.
There was a problem hiding this comment.
because the rum agent will actually send it without double quotes
There was a problem hiding this comment.
But shouldn't the server be able to process with or without quotes? And why is the RUM agent sending without quotes (I probably missed that discussion somewhere).
There was a problem hiding this comment.
I approved the PR, as I haven't followed discussions about this. Please ensure then RUM agent properly removes the quotes from the response they get if you don't change it in the server.
07e4e65 to
520fc45
Compare
|
Revisiting this question:
What do you actually meant with "break"? I bumped the min version from 7.3 to 7.5, so the agents would get an error response code, and the should handle it, therefore, not applying any config |
8f94157 to
3ced356
Compare
3ced356 to
5e0a3b2
Compare
5e0a3b2 to
e62788f
Compare
fixes #2694, fixes #2630 , fixes #2646
depends on elastic/kibana#46995
required by elastic/apm-agent-rum-js#439