Skip to content

Telemetry for manual rule run#186364

Merged
nkhristinin merged 18 commits intoelastic:mainfrom
nkhristinin:telemetry-manual-rule-run
Jul 2, 2024
Merged

Telemetry for manual rule run#186364
nkhristinin merged 18 commits intoelastic:mainfrom
nkhristinin:telemetry-manual-rule-run

Conversation

@nkhristinin
Copy link
Copy Markdown
Contributor

@nkhristinin nkhristinin commented Jun 18, 2024

Summary

report following events:

  • open modal window for manual rule run
  • execute manual rule run + save time range in ms
  • cancel backfill job
  • filter in event log by run type
  • show source event date range

Epic - https://github.com/elastic/security-team/issues/2840

How to test

enable feature flag - manualRuleRunEnabled

You can see feature demo here - #184500

Check that events appears here after some time - https://telemetry-v2-staging.elastic.dev/s/securitysolution/app/r/s/7YYlg

@nkhristinin nkhristinin requested review from a team as code owners June 18, 2024 11:41
@nkhristinin nkhristinin requested review from rylnd and xcrzx June 18, 2024 11:41
@nkhristinin nkhristinin added the release_note:skip Skip the PR/issue when compiling release notes label Jun 18, 2024
@yctercero
Copy link
Copy Markdown
Contributor

@nkhristinin Great to see telemetry added! Could you include steps on how to test the PR and link to the manual rule runs epic? Since this is a new feature, reviewers may not know what it's referencing.

@nkhristinin
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

@nkhristinin
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

@nkhristinin
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

Copy link
Copy Markdown
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nkhristinin, thanks for the PR! Leaving a couple of minor suggestions and a few questions, please take a look.

});

const handleShowSourceEventTimeRange = useCallback(
(e) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide the type for the event argument: e: EuiSwitchEvent, otherwise it will be any.

const handleSelectionChange = useCallback(
(types: RuleRunType[]) => {
onChange(types);
telemetry.reportEventLogFilterByRunType({ runType: types?.join(',') });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the runType argument be an array of strings instead of a string of comma-separated values? How will that look in a dashboard if someone wants to display a chart of the most used run types?

Comment on lines 24 to +27
const results = await mutateAsync(options);
telemetry.reportManualRuleRunExecute({
rangeInMs: options.timeRange.endDate.diff(options.timeRange.startDate),
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only capture successful manual executions, do we need to capture failures as well?

@nkhristinin nkhristinin requested a review from xcrzx June 21, 2024 13:24
@nkhristinin
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

OnboardingHubStepLinkClicked = 'Onboarding Hub Step Link Clicked',
ManualRuleRunOpenModal = 'Manual Rule Run Open Modal',
ManualRuleRunExecute = 'Manual Rule Run Execute',
ManualRuleRunCancelJob = 'Manual Rule Cancel Job',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without "Run" this description is sorta ambiguous (or less precise than it could be):

Suggested change
ManualRuleRunCancelJob = 'Manual Rule Cancel Job',
ManualRuleRunCancelJob = 'Manual Rule Run Cancel Job',

Copy link
Copy Markdown
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several places where telemetry is being added here, but I only see a single unit test asserting that any of this telemetry code is being executed. An integration test would be great to see (interact with app, observe telemetry being collected), but even a few more unit tests would help ensure this behavior is both documented and verified.

ManualRuleRunExecute = 'Manual Rule Run Execute',
ManualRuleRunCancelJob = 'Manual Rule Cancel Job',
EventLogFilterByRunType = 'Event Log Filter By Run Type',
EventLogShowSourceEventDateRange = 'Event Log Show Source Event Date Range',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just strings, right? Is it possible/preferable to add some more structure, here? There are a lot of descriptive words here and it's hard to untangle exactly what's being expressed.

Suggested change
EventLogShowSourceEventDateRange = 'Event Log Show Source Event Date Range',
EventLogShowSourceEventDateRange = 'Event Log -> Show Source Event -> Date Range',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting that we add some kind of hierarchical delimiter to all of these descriptions, not just the one 😅 . I'd value consistency over improving a single item, here, so IMO I'd revert this change.

Copy link
Copy Markdown
Contributor

@jpdjere jpdjere Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping in here cause it took me some time to understand what 'Event Log -> Show Source Event -> Date Range' meant. But if I understand correctly, it's the action of clicking on the Show Source Event Date Range on the Execution Results tab:

image

So yes, I would either:

  • revert to the version without the arrows: 'Event Log Show Source Event Date Range'
  • delimit only Event log from the rest. Something like: 'Event Log -> Show Source Event Date Range' or 'Event Log | Show Source Event Date Range'

Maybe even adding the suffix Toggle Switch at the end: 'Event Log | Show Source Event Date Range Toggle Switch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the same for EventLogFilterByRunType

totalTasks: {
type: 'integer',
_meta: {
description: 'Total number of tasks at the moment of job cancellation',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "task" in this context? What is a "job," for that matter?

rangeInMs: {
type: 'integer',
_meta: {
description: 'Execute manual rule run with range in milliseconds',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be describing rangeInMs exclusively, or the full event?

Suggested change
description: 'Execute manual rule run with range in milliseconds',
description: 'The time range (expressed in milliseconds) against which the manual rule run was executed',

status: {
type: 'keyword',
_meta: {
description: 'Execute manual rule run status (success|error)',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly:

Suggested change
description: 'Execute manual rule run status (success|error)',
description: 'Outcome state of the manual rule run. Possible values are "success" and "error"',

@nkhristinin nkhristinin requested a review from e40pud June 28, 2024 09:21
@nkhristinin nkhristinin requested a review from rylnd June 28, 2024 14:45
Copy link
Copy Markdown
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great! The additional test coverage is much appreciated.

@nkhristinin
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Copy Markdown

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 5583 5587 +4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 15.5MB 15.6MB +999.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 83.7KB 85.9KB +2.2KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@banderror banderror requested review from jpdjere and removed request for xcrzx July 2, 2024 09:35
Copy link
Copy Markdown
Contributor

@jpdjere jpdjere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all comments left by @xcrzx

I left one more comment regarding the TelemetryEventTypes enum, but overall LGTM.

@nkhristinin nkhristinin merged commit 32e7bf9 into elastic:main Jul 2, 2024
@kibanamachine kibanamachine added v8.15.0 backport:skip This PR does not require backporting labels Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants