If a job's memory status goes to `hard_limit` then we stop modelling new entities in anomaly detection. However, categorization can still create new categories. If there are many new categories, this can cause a very significant overrun of the configured memory limit.
Some possibilities:
- When a job is in `hard_limit` status, no new categories should be created. The input document that could not be categorized should be discarded, as it cannot take part in anomaly detection without a category. A new statistic in the model size stats should be incremented to record the number of documents discarded for this reason.
- When a job is in `soft_limit` status, we stop recording examples for the category.
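
To make the two possibilities concrete, here is a minimal sketch of how the gating could behave. It is not the actual categorizer: `Categorizer`, `MemoryStatus`, `discarded_docs` and `max_examples` are hypothetical names, and a precomputed `key` stands in for the real token-based category matching. It also assumes that `hard_limit` implies the `soft_limit` restriction on examples, which the proposal above does not state explicitly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MemoryStatus(Enum):
    OK = "ok"
    SOFT_LIMIT = "soft_limit"
    HARD_LIMIT = "hard_limit"


@dataclass
class Category:
    key: str
    examples: list = field(default_factory=list)


@dataclass
class Categorizer:
    """Hypothetical stand-in for the categorizer; all names are assumptions."""

    status: MemoryStatus = MemoryStatus.OK
    max_examples: int = 4
    categories: dict = field(default_factory=dict)
    # Stands in for the proposed new counter in the model size stats.
    discarded_docs: int = 0

    def categorize(self, key: str, message: str) -> Optional[Category]:
        category = self.categories.get(key)
        if category is None:
            if self.status is MemoryStatus.HARD_LIMIT:
                # No new categories in hard_limit: count and discard the document.
                self.discarded_docs += 1
                return None
            category = Category(key)
            self.categories[key] = category
        # Examples are only recorded while memory status is ok
        # (assumption: hard_limit also stops example recording).
        if self.status is MemoryStatus.OK and len(category.examples) < self.max_examples:
            category.examples.append(message)
        return category
```

In this sketch, documents that match an existing category are still categorized in `hard_limit`, so they can continue to take part in anomaly detection; only documents that would need a new category are dropped and counted.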