Skip to content

Adds event log for actions and alerting#45081

Merged
pmuellr merged 11 commits intoelastic:masterfrom
pmuellr:actions/event-log
Jan 21, 2020
Merged

Adds event log for actions and alerting#45081
pmuellr merged 11 commits intoelastic:masterfrom
pmuellr:actions/event-log

Conversation

@pmuellr
Copy link
Copy Markdown
Contributor

@pmuellr pmuellr commented Sep 6, 2019

Resolves #45083.

Adds a persistent event log (writes to a new ES index) for use in actions and alerting (and really anyone) to be able to query over events that have occurred in the past. Eg, action created, action executed, alert fired, etc.

@pmuellr pmuellr force-pushed the actions/event-log branch 3 times, most recently from d33fc17 to e19f2e7 Compare September 16, 2019 18:47
@mikecote mikecote assigned mikecote and pmuellr and unassigned mikecote Oct 8, 2019
@pmuellr pmuellr force-pushed the actions/event-log branch 4 times, most recently from 9b01b04 to b93d6ab Compare October 9, 2019 17:38
@pmuellr pmuellr force-pushed the actions/event-log branch 2 times, most recently from 759e3a8 to 001eca9 Compare October 18, 2019 16:08
@pmuellr pmuellr force-pushed the actions/event-log branch 2 times, most recently from de05b5d to e442f63 Compare November 18, 2019 16:21
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elastic elastic deleted a comment from elasticmachine Dec 11, 2019
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/kibana-stack-services (Team:Stack Services)

Copy link
Copy Markdown
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the structure of this 👍 Just added a few minor questions / comments but LGTM!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably see it in the merge conflicts but this should be @elastic/kibana-alerting-services now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a need for these variables to be configurable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esNames is needed - someone else calculates the names of all the things, based on the kibana index prefix. ilmExists won't be needed longterm, is today since you usually can't create an ILM policy, but in the future you always should be able to.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per conversation with mike, he was wondering specifically about whether

      number_of_shards: 1,
      number_of_replicas: 1,

Should somehow be configurable. I think we actually need to set replicas to 0 to start with. But beyond that, this seems like it would only be needed for "large" deployments that ended up with a lot of logging.

Probably worth creating a section in the README about "customizations that can be made by customers", including this, and updating the ILM policy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question with these variables.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is somewhat interesting. I we have to have SOME KIND of ILM policy to get everything set up, but it's a one-time thing. Once it's been created, we'll never touch it again. The customer is expected to customize if they want. So hardly seems worthy of configurability.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, users would have to go elsewhere to change it after initial setup.

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

History

  • 💚 Build #14761 succeeded 0753e466a7856ba5da615ce91c2290d116806dd5
  • 💚 Build #14690 succeeded cb8b6153e9aa8d250132d273cb3eec792fd82f5d
  • 💚 Build #14363 succeeded d03183cadbba9e0e0b11cd5cab7539223aab9f00

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr added Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// and removed DONOTUSETeam:Stack Services labels Dec 16, 2019
Copy link
Copy Markdown
Contributor

@gmmorris gmmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a first pass, trying to understand how this works, but I lack alot of ES specific context, so ended up focusing on language things and things that impact my ability to understand a new code base. :)

I'll make another more detailed pass, once I understand the requirements a bit more, but for now here are a few thoughts- nothing blocking so far, just tweaks and "code gardening"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to me to use != null on a value that can never be null, but rather Error | undefined.

I understand why you extract the error out of the catch, but I can't help feel I'd rather have the eventLogger methods called in two places, then the intermediary variable- but I accept this is subjective. So leave that to you...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only real-life example of logging events right now, there will be more :-)

So, we should figure out a pattern that everyone's happy with, since we'll see more of this, and this usage is likely to be copied.

I'm not really happy with the current flow, nor Gidi's suggestion (duplicate the calls), so will noodle on this a bit more. Perhaps there's some kind of helper we can build to make this easier.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a logger method that will wrap a function execution to handle the timing/logging. That way we have this yucky code (but went with your style of "two places" for the event logging - error or not). No doubt we'll tweak this a bit more over time, but should make it easier to apply event logging at existing call sites.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out the code I had in at the time of the review was not quite right anyway. An action could throw an exception, but it can also return a result which indicates an error occurred, with more data provided. So my "wrapper" for a function execution wasn't useful here. But ... I think I cleaned this up quite a bit in the current set of commits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh woops, this actually needed additional work, as the reviewed code didn't handle the case for errors returned from the action outside of an exception (via status = 'error'). So the wrapper wasn't useful (yet), so I deleted it, maybe we'll find a use later, but ... not sure - it was easy to write.

So that code needed a re-write, hopefully it's a lot clearer, and has no null checks :-)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script seems pretty useful as a generic mapping->typescript tool, should we extract it from here?

Also, should we add tests around it? My fear is that if it has a mistake in it, and it is used again in the future for a mapping change, we might not catch it in time and cause a faulty migration in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's quite ready for prime time, but of course am interested in seeing tools like these. It's not clear the best way to indicate the cardinality (one vs array) of a field, as an instance of "it ain't ready yet".

Tests would be a good idea though, if I can figure out some useful ones.

I also have to figure out a migration story for this stuff - somehow the plugin will have to update the mappings in the template when the shape changes, and also presumably roll over the current index to get a new index created with the new mappings.

Once that work is done, we'll probably be in a better place to figure out how to test it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a task item to the umbrella issue to add some tests for the generators. To some extent, these should be tested implicitly via other tests, but wouldn't hurt to add some more tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm following this correctly, there's an implicit link between the number 2 here, and the number of spaces on line 22.
This seems simple enough, but, another developer might miss that, remove one space (to cahnge the formatting), and suddenly non-indentation characters will be dropped when dedent is called.

An easy way to sidestep this would be to have an INDENTATION constant from which lines 22 and 26 infer their values, and so a future developer will be forced to see that links because it is now explicit.
Just a good way to avoid a future mistake 😬

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like the test case should specify what the expected behaviour is, no?

I generally think of tests not just as validations of the code, but also as documentation for developers who have to maintain the code. The trouble I have with this test case is that I don't know, coming in from outside, what is expected generally...

Same for the other two cases - I feel like it should say what is expected when, for example, ilm is on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a type around the IndexTemplate rather than any?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could type this for what we're currently generating, could be useful in the future for typo checks, thx!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are only used to supply the bodies of calls to ES, and for nothing else - eg

async createIndexTemplate(opts: AddTemplateOpts): Promise<void> {
const templateBody = getIndexTemplate(this.esContext.esNames, opts.ilmExists);
const addTemplateParams = {
create: true,
name: this.esContext.esNames.indexTemplate,
body: templateBody,
};
try {
await this.esContext.callEs('indices.putTemplate', addTemplateParams);
} catch (err) {
throw new Error(`error creating index template: ${err.message}`);
}
}

As such, unless there are full typings for these already, I don't think it's worth doing any typing on them. We could use a Record<string, any>, but I'm not sure I see the value in even that.

Worth a comment though!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this test should be broken down to specific tests per each action, specifying what the expected behaviour is, as at the moment, I'm finding it hard as someone dropping in without much context, to understand what they are meant to do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I think some comments would help, and it seems likely these tests could be table-ized to make the file significantly smaller

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Can we avoid == where possible? As devs can easily miss that it's not string equality and think it means null, rather than null | undefined | false | 0 | etc....

I think this would achieve the same?

event.ecs = event.ecs || {};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have optional chaining now which could make this a little cleaner :)

function getEventStart(event?: any): number | null {
    return event?.event?.start
        ? Date.parse(event.event.start)
        : null
    ;
}
console.log(getEventStart());
console.log(getEventStart({}));
console.log(getEventStart({ event: {}}));
console.log(getEventStart({ event: { start: '04 Dec 1995 00:12:00 GMT'}}));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we? :-) (in reference to recent CI failure complaining about optional chaining). I'm anxious to start using this, so will change if we can!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we have it in TS, but seems JS code breaks on this even though the Node version we use now supports it (I was updating a reference in another plugin which is still JS).
I think it's because we're using a transpiler that targets lower than the version of Node we use....I need to look into it, but in anycase, I've been using it in TS code without a hitch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, I've never seen this usage of argument assignment as a property before, I like it :).

For uniformity, we should adopt it throughout, instead of multiple this.X = X throughout our constructors... otherwise it'll be confusing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda surprised I used it here, I think perhaps I copied it out of another plugin? I do like it myself, but I think there are limited places you can use it, will see if there's some more places tho

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found one other constructor in this code that could be changed to use this, in x-pack/plugins/event_log/server/es/init.ts. All the other constructors in this code pass args via an object, and assign instance variables from the object - sometimes the args are destructured in the parameter list, sometimes not, but I don't think you can use this shortcut in those cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can avoid the !?
We won't know if a logical error causes us to call start before setup until run time... just wondering if we can restructure things to make it impossible to accidentally do that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to find a way to do this. It's tough since it's an instance variable which by definition cannot be initialized in the constructor. I think there's probably a pattern of creating yet another object for some indirection so we don't have to deal with this, but then you're adding code just to make the type-y safe-y bits more pure, which seems not right. I'll also check the existing NP plugins again as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've encountered the same issue with moving TM to NP....
Seems a general issue with using Classes as the base for things that can't be provided in the constructor.

Quite annoying and means we're very likely to encounter null pointer exceptions... worth considering a adopting Optionals throughout this code to avoid these nulls 😬

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these so there is an explicit check for the references not having been set, TS is then smart enough to know you don't need the ! afterwards.

eg, avoids having to use this.esContext!.initialize()

    if (!this.esContext) throw new Error('esContext not initialized');
    ...
    this.esContext.initialize()

@pmuellr pmuellr force-pushed the actions/event-log branch from 9abc912 to dad3ec7 Compare January 3, 2020 22:43
@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 3, 2020

For anyone that pulled the branch down from my repo: I just force-pushed to get up to latest master - didn't want to, but a merge master left me w/8500 changes (wha?), tried twice, same thing. I was actually going to push that merge, but the commit script (ts linting) was taking forever, decided to live with a rebase.

@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 17, 2020

@elasticmachine merge upstream

@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 20, 2020

@elasticmachine merge upstream

Copy link
Copy Markdown
Contributor

@gmmorris gmmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Lets get this meeeeerged!

@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 21, 2020

@elasticmachine merge upstream

@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 21, 2020

Gonna do one more merge from master (just started) since it's a day old, will merge upon success!

@pmuellr
Copy link
Copy Markdown
Contributor Author

pmuellr commented Jan 21, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backported Feature:Alerting release_note:enhancement Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// v7.7.0 v8.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add persistent event log for actions and alerting

5 participants