libbeat: Refactor error handling in schema.Apply()#7335
libbeat: Refactor error handling in schema.Apply()#7335ruflin merged 1 commit intoelastic:masterfrom
Conversation
libbeat/common/schema/schema.go
Outdated
There was a problem hiding this comment.
exported method Object.Map should have comment or be unexported
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported type KeyNotFoundError should have comment or be unexported
|
+100 on that change. Now that we have a result struct we can also add potentially more / other data in the future if we need it. |
|
Two other options I was thinking of on what we could return: It would make error checking more go like and often people would probably skip the result. An other option is to have event as part of the result. When I read the code I asked myself why is event not part of the result as it's kind of the result. I'm good with all 3 option as all 3 are much better then we have now. Thanks for coming up with this. |
|
The main reason why I chose to keep the event as first returned value and the "result" as second was to keep it more similar to the current interface so it continue working the same as before for the majority of cases where the error is being ignored. But it is true that in any case the signature is changing and maybe it is a good idea to force the review of uses of this function, specially where the error is being ignored. I like the |
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported function NewWrongFormatError should have comment or be unexported
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported type WrongFormatError should have comment or be unexported
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported function NewKeyNotFoundError should have comment or be unexported
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported type KeyNotFoundError should have comment or be unexported
libbeat/common/schema/error.go
Outdated
There was a problem hiding this comment.
exported type KeyError should have comment or be unexported
There was a problem hiding this comment.
This is to have more meaningful errors on nested objects, previously a missing key would report something like Key "segments" not found, now it reports key "total.segments" not found.
There was a problem hiding this comment.
nice, that was not very nice in the past.
There was a problem hiding this comment.
Added these optional modifiers because there are test files that don't contain these fields.
There was a problem hiding this comment.
Instead of that, added option to Apply to fail only on required fields.
There was a problem hiding this comment.
@ruflin this is the only place where I have seen that we'd make use of ApplyResult.NotFound, to check if source is in the list of missing fields, but to take advantage of this we'd have to complicate the eventsMapping function to return the result and not only the event and the errors. Also, I think that here it'd be enough with testing that it returns an error for the case of having a missing field, going further looks like if we were testing here the behaviour of schema.Apply, that is already being tested in the schema package.
Given that, I wonder again if it worths it to have the ApplyResult object. From the perspective of a consumer of the API, it is cleaner to just have event, err := schema.Apply(data), with an event and an error that can be used directly. I think it is also more valuable to be able to give directly the semantics of the use case, as in event.MetricSetFields, err = schema.Apply(index) or nodeData, _ := schemaXpack.Apply(node) instead of obtaining the result and then assigning result.Event to a meaningful variable. From our perspective, if we want to test more particular cases of missing fields, I think that a better place would be the schema package.
I'd go back to the approach in #7167, but with the additional improvements in error handling added here too, and with ApplyTo still returning a list of errors.
There was a problem hiding this comment.
I can see that from a user perspective it is much more convenient if the event can be assigned directly. It's also the reason why I started to think about 3 arguments.
For testing for the missing fields which are optional I think this is something specific to metricsets and not necessarly internal testing of Apply. The reason it only exists in one place so far I think is more that it was too complicated to do it. When we start to add more versions and have very subtle differences between each version I think tests for missing fields become more common.
I assume moving forward we will make optional fields the default as we discussed in the past and not required. In this scenario most fields will be optional. So I our testing will potentially more look like: max 3 missing fields and we get an error if there are more.
There was a problem hiding this comment.
I have changed the approach again, to event, err := schema.Apply(data, options...), options is optional, and can be used to select the mode of operation of Apply (so for example we can make all fields optional by default). Or to access unfiltered errors, so additional tests can be done on the missing fields for example. The test that was checking the missing fields has been refactored to this approach.
I have updated the description of the PR accordingly.
There was a problem hiding this comment.
This test was passing because schema.Apply() returned the empty schema.Errors, it was not detecting the unknown field in the (valid) JSON file. This test can be probably removed if the missing field is optional, if is not optional, an explicit check should be added in eventsMapping to check that it exists.
There was a problem hiding this comment.
Assuming this is not optional, explicit check added and skip removed.
There was a problem hiding this comment.
These fields don't exist in the test input:
Error: Received unexpected error:
2 errors: key `requests.total` not found; key `requests.disconnects` not found
Test: TestStats
Messages: _meta/test/stats.700.json
There was a problem hiding this comment.
ApplyTo can be used as a more advanced method for special cases, it still returns the full list of errors, that can be further analyzed if needed.
|
jenkins, test this again, please |
libbeat/common/schema/options.go
Outdated
There was a problem hiding this comment.
exported var DefaultApplyOptions should have comment or be unexported
libbeat/common/schema/schema.go
Outdated
There was a problem hiding this comment.
comment on exported function Required should be of the form "Required ..."
There was a problem hiding this comment.
comment on exported function DictRequired should be of the form "DictRequired ..."
ruflin
left a comment
There was a problem hiding this comment.
I really like this approach and now that I see it I'm quite happy that we kept iterating on it.
This will give us a lot of flexibility on how we can test things and making the switch to optional by default really easy.
There was a problem hiding this comment.
nice, that was not very nice in the past.
There was a problem hiding this comment.
One nice thing that we can do now is that we can check for the error from eventMapping and report it in only 1 place. The current code with reporting and returning the error now is only returning the error as there is only 1 type of errors left.
There was a problem hiding this comment.
How did you pick this field as required?
There was a problem hiding this comment.
There is a test failing if this field is not present, but I'm thinking now that all fields here seem to be required, I'm going to revert this part.
4f3231c to
b63168f
Compare
`schema.Apply()` returned a `schema.Errors` object as error, with this object is not easy to check if there was really an error, specially when optional fields are involved. This leads to lots of cases where these errors were being ignored. This change introduces a series of changes to have better control on the errors happening during the application of an schema. The interface is kept as `event, err := schema.Apply(data)`, but now the error returned is always a plain error in case of one or more errors happened, or nil if no error happened. `schema.ApplyTo()` returns the list of errors, and can be used where multiple schemas are applied for the same event. `Apply()` and `ApplyTo()` can now receive a list of options, these options are functions that can be used to filter the errors returned or to access the unfiltered list of errors. This allows different combinations of resulting errors in case of optional or required fields. It also allows to add additional checks on all the errors happened, even on the ones ignored for the final result. Three options are added by now: * `AllRequired` to fail if any non-optional field is missing, this mimics the current behaviour, and is set by default if no other option is set. * `FailOnRequired` to fail if any required field is missing. A new `Required` option is also added to be able to declare required fields. * `NotFoundKeys(cb func([]string))` is an option builder that receives a function, this function is called with the list of missing keys, so additional custom checks or tests can be done. For lists of errors, the `multierror` library is used now, some custom errors have been added to help on previous modifications and to improve some error messages. Errors on structured data contain now the full paths of the fields.
|
@exekias @andrewkroh if you have time, could you also take a look to this PR? Thanks! |
exekias
left a comment
There was a problem hiding this comment.
Great work here! I think error handling has improved a lot with this, thank you for taking it!
schema.Apply()returned aschema.Errorsobject as error, with thisobject is not easy to check if there was really an error, specially when
optional fields are involved. This leads to lots of cases where these
errors were being ignored.
This change introduces a series of changes to have better control on the
errors happening during the application of an schema.
The interface is kept as
event, err := schema.Apply(data), but now theerror returned is always a plain error in case of one or more errors
happened, or nil if no error happened.
schema.ApplyTo()returns the list of errors, and can be used wheremultiple schemas are applied for the same event.
Apply()andApplyTo()can now receive a list of options, theseoptions are functions that can be used to filter the errors returned or
to access the unfiltered list of errors. This allows different
combinations of resulting errors in case of optional or required fields.
It also allows to add additional checks on all the errors happened, even
on the ones ignored for the final result. Three options are added by now:
AllRequiredto fail if any non-optional field is missing, thismimics the current behaviour, and is set by default if no other option
is set.
FailOnRequiredto fail if any required field is missing. A newRequiredoption is also added to be able to declare required fields.NotFoundKeys(cb func([]string))is an option builder that receivesa function, this function is called with the list of missing keys, so
additional custom checks or tests can be done.
For lists of errors, the
multierrorlibrary is used now, some customerrors have been added to help on previous modifications and to improve
some error messages. Errors on structured data contain now the full
paths of the fields.