[ML] Add audit warning for 1000 categories found early in job#51146
[ML] Add audit warning for 1000 categories found early in job#51146droberts195 merged 3 commits intoelastic:masterfrom
Conversation
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
|
Pinging @elastic/ml-core (:ml) |
dimitris-athanasiou
left a comment
There was a problem hiding this comment.
Looks good. Left a few minor suggestions.
| "Adjust the analysis_limits.model_memory_limit setting to ensure all data is analyzed"; | ||
| public static final String JOB_AUDIT_MEMORY_STATUS_HARD_LIMIT_PRE_7_2 = "Job memory status changed to hard_limit at {0}; adjust the " + | ||
| "analysis_limits.model_memory_limit setting to ensure all data is analyzed"; | ||
| public static final String JOB_AUDIT_EXCESSIVE_EARLY_CATEGORIES = "{0} categories observed in the first [{1}] buckets." + |
There was a problem hiding this comment.
We have square brackets about the second number in this message but not the first. Should we make it consistent?
There was a problem hiding this comment.
The reason I didn't put the first one in square brackets is that it won't vary (at least for a particular version of the product). The first number will always be 1000 unless somebody changes the code, whereas the second number can vary between different occurrences of the audit message.
| private volatile boolean failed; | ||
| private int bucketCount; // only used from the process() thread, so doesn't need to be volatile | ||
| private long priorRunsBucketCount; | ||
| private int currentRunBucketCount; // only used from the process() thread, so doesn't need to be volatile |
There was a problem hiding this comment.
should we take this chance and make this a long as well? It seems like something that could hit overflow problems.
There was a problem hiding this comment.
With a 1 second bucket span it will take 68 years to overflow, so pretty unlikely. But maybe I can avoid some casting by making it long, which will make the code cleaner. If so I'll change it.
There was a problem hiding this comment.
There wasn't much casting but I changed it anyway just so both variables have the same type.
| this.bulkResultsPersister = persister.bulkPersisterBuilder(jobId, this::isAlive); | ||
| this.timingStatsReporter = new TimingStatsReporter(timingStats, bulkResultsPersister); | ||
| this.deleteInterimRequired = true; | ||
| this.priorRunsBucketCount = timingStats.getBucketCount(); |
There was a problem hiding this comment.
I was wondering where we'd get this from but it's cool we already pass it in!
| @@ -225,6 +231,18 @@ void processResult(AutodetectResult result) { | |||
| CategoryDefinition categoryDefinition = result.getCategoryDefinition(); | |||
| if (categoryDefinition != null) { | |||
| persister.persistCategoryDefinition(categoryDefinition, this::isAlive); | |||
There was a problem hiding this comment.
Now that all this has become more complex than a single call to the persister, I'd be tempted to extract this in a handleCategoryDefinition method.
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
…c#51146) If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.
In elastic#51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by elastic#51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes elastic#50749
…2195) In #51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by #51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes #50749
…2195) In #51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by #51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes #50749
If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created. (This will cause a yellow warning triangle in the
ML UI's jobs list.)
Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
Relates #50749