[Security Solution][Endpoint] Policy creation callback fixes + Improved error handling in user manifest loop#71269
Conversation
|
Pinging @elastic/endpoint-app-team (Feature:Endpoint) |
|
Pinging @elastic/endpoint-response (Team:Endpoint Response) |
| try { | ||
| if (snapshot && snapshot.diffs.length > 0) { | ||
| // create new artifacts | ||
| const errors = await manifestManager.syncArtifacts(snapshot, 'add'); |
There was a problem hiding this comment.
how does syncArtifacts relates getPackageConfigCreateCallback. That appears to update something, perhaps a side effect?
Sorry I see the method is handlePackageConfigCreate. There is a few condition blocks that may need to be tested to make sure all branches functions as expected. Is it possible to break this up so it may be more testable and probably maintainable.
There was a problem hiding this comment.
@nnamdifrankie If new exception list entries were created, we may need to create new artifacts before sending the updated manifest. That's what syncArtifacts does.
There was a problem hiding this comment.
@XavierM and I have been talking about how to improve the way this callback happens... I think we will need some changes to make this more robust. Happy to discuss it @nnamdifrankie
There was a problem hiding this comment.
I think a test around what branches are supposed to exist together will suffice for now. Not sure what is the relationship between the snapshot and new package config.
There was a problem hiding this comment.
@nnamdifrankie Just pushed some of the tests, working on more.
There was a problem hiding this comment.
@nnamdifrankie @paul-tavares I've added several unit tests and have run through some manual tests. I think this fix will ensure that policy creation is not interrupted by any of the manifest processing code.
|
Pinging @elastic/ingest-management (Team:Ingest Management) |
|
@elasticmachine merge upstream |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
paul-tavares
left a comment
There was a problem hiding this comment.
Looks good 👍
All of my comments are minor in nature and optional. Also, I assume you (during Dev.) added a few throw's in the code for testing purposes and that he callback always returned successfully 😁
My only concern overall is that we seem to be doing lots of sync call (await) and that can extend overall time it takes for a Package config to be created in Ingest (as it is, it already take a little bit of time due to the work it does with Packages).
FYI:
One of the things I thought about implementing in the Ingest code around these callbacks for Create Package Config was a way to "timeout" the returned Promise after x amount of time. so that no one callback can hold up the entire operation. (cc/ @jfsiii , @jen-huang , @nchaulet )
| version: '0.9.0', | ||
| }, | ||
| inputs: [], | ||
| } as NewPackageConfig; |
There was a problem hiding this comment.
Suggestion: add the return type to the function definition instead of casting. Same below
| return updatedPackageConfig; | ||
| // Until we get the Default Policy Configuration in the Endpoint package, | ||
| // we will add it here manually at creation time. | ||
| if (newPackageConfig.inputs.length === 0) { |
There was a problem hiding this comment.
Personally - I would remove this if() and always insert our input into the Package Config.
History:
The if() is only here because at one point we thought that we would get data included in the Package Config at creation time by defining it in the Endpoint Package, thus why we were doing a conditional check.
There was a problem hiding this comment.
@paul-tavares Will inputs ever have length > 0 here?
There was a problem hiding this comment.
@madirey - For endpoint Package Configs, no. At the moment we only expect 1 input at most that includes all of our data.
| let snapshot: ManifestSnapshot | null = null; | ||
| let success = true; | ||
| try { | ||
| // Try to get most up-to-date manifest data. |
There was a problem hiding this comment.
Is what's returned here different from what you retrieved above in manifestManager.getLastDispatchedManifest?
There was a problem hiding this comment.
@paul-tavares Possibly. Perhaps could look at renaming this function. The above builds a Manifest from the SO record of the last manifest dispatched. This function builds a manifest dynamically from the latest exception lists... there could be new items, so we try to get the most up-to-date data possible to avoid having to immediately re-dispatch.
There was a problem hiding this comment.
@madirey thanks for explaining that. I'm not too familiar with the design around Lists, thus the questions 😃 .
I think there is a process in place that periodically (on a timer) goes through and picks up new versions of lists and re-generates the Manifest (which pushes updates to the Package configs), right? If thats the case, I would suggest that we avoid the additional delays here in explicitly triggering that process, so that the overall Package Config creation is not incur a longer delay.
| } | ||
|
|
||
| try { | ||
| return updatedPackageConfig; |
There was a problem hiding this comment.
this may be just be a personal choice 😄
I find this return here as well as the one inside of catch() confusing because one (a future "us" ) add another return to finally {} and that would be the value that is returned out of the function.
I would propose (for clarify) that the return be consolidated to one under the finally {} block.
There was a problem hiding this comment.
@paul-tavares It was an attempt to be sure that we return the data before doing other things... but we still don't have a good way to verify that the policy was actually created. We probably need to work together to figure out a better way to do this. Can we address in a follow-up to this PR? There are some important fixes here that will help us in the meantime.
There was a problem hiding this comment.
Agreed @madirey
My point really was that the finally {} block could actually change what the function returns (ex. it could overwrite the return from either the catch() or the regular try {}.
Also - I'm tempted to suggest that if we refactor this, that we split up the concerns into 2 callbacks: one for handling only adding the policy data and a second to handle only adding the manifest information, an we register both callbacks with Ingest. We can work on this post 7.9
There was a problem hiding this comment.
@paul-tavares As written, that should not happen, but I agree with the concerns. It's ugly.
I can remove the section that pulls the snapshot and tries to get the most up-to-date information. That comes with a few caveats though:
- There's no guarantee that we will have already created the manifest when the callback runs, so we may have to send an empty artifact list... the policy returned will not be deterministic on the first policy creation (artifacts may or may not exist), so that will have unit testing implications as well,
- If there have been new exception items added within the last minute, we won't pick it up.
In both of the cases above, we'll have to re-send the policy a minute later when the updates are detected. This is probably fine for now, but it may not be ideal when we have large deployments (could result in 2 back-to-back large rollouts if I'm not mistaken).
|
|
||
| import { ILegacyScopedClusterClient, SavedObjectsClientContract } from 'kibana/server'; | ||
| import { loggingSystemMock, savedObjectsServiceMock } from 'src/core/server/mocks'; | ||
| // eslint-disable-next-line @kbn/eslint/no-restricted-paths |
There was a problem hiding this comment.
We may want to ask the Kibana Platform team to include this in src/core/server/mocks
There was a problem hiding this comment.
@paul-tavares I see you approved, thank you! But let's follow up and figure out a better way to do this. +++
…ed error handling in user manifest loop (#71269) (#71376) * Clean up matcher types * Rework promise and error-handling in ManifestManager * Write tests for ingest callback and ensure policy is returned when errors occur * More tests for ingest callback * Update tests * Fix tests Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* master: (78 commits) Bump lodash package version (elastic#71392) refactor: 💡 use allow-list in AppArch codebase (elastic#71400) improve bugfix 7198 test stability (elastic#71250) [Security Solution][Ingest Manager][Endpoint] Optional ingest manager (elastic#71198) [Metrics UI] Round metric threshold time buckets to nearest unit (elastic#71172) [Security Solution][Endpoint] Policy creation callback fixes + Improved error handling in user manifest loop (elastic#71269) [Security Solution] Allow to configure Event Renderers settings (elastic#69693) Fix a11y keyboard overlay (elastic#71214) [APM] UI text updates (elastic#71333) [Logs UI] Limit `extendDatemath` to valid ranges (elastic#71113) [SIEM] fix tooltip of notes (elastic#71342) address index templates feedback (elastic#71353) Upgrade EUI to v26.3.1 (elastic#70243) [build] Creates Linux aarch64 archive (elastic#69165) [SIEM][Detection Engine] Fixes skipped tests (elastic#71347) [SIEM][Detection Engine][Lists] Adds read_privileges route for lists and list items [kbn/optimizer] implement "requiredBundles" property of KP plugins (elastic#70911) [Security Solution][Exceptions] - Exceptions modal pt 2 (elastic#70886) [ML] DF Analytics: stop status polling when job stopped (elastic#71159) [SIEM][CASE] IBM Resilient Connector (elastic#66385) ...
Summary
Checklist
For maintainers