[ML] Improve the accuracy of model memory control#122
tveasey merged 7 commits into elastic:master from
Conversation
a function of elapsed time not buckets and assert on memory used vs target in limit test.
764246f to f927102
dimitris-athanasiou left a comment
Looks good! Left a few minor comments.
    CDataGatherer(bool isForPersistence, const CDataGatherer& other);

    ~CDataGatherer();
    CDataGatherer(const CDataGatherer&) = delete;
That's a cool language feature!
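For readers unfamiliar with the feature being praised here: `= delete` (C++11) makes the copy constructor ill-formed, so accidental copies fail at compile time rather than silently sharing or duplicating state. A minimal sketch, using a hypothetical `Widget` class rather than the real `CDataGatherer`:

```cpp
#include <type_traits>

// Hypothetical class illustrating the "= delete" idiom used above:
// ordinary copying is forbidden, but a deliberate, explicitly named
// copy-for-persistence constructor can still be offered.
class Widget {
public:
    Widget() = default;
    // Intentional copy path; the bool forces callers to be explicit.
    explicit Widget(bool isForPersistence, const Widget& /*other*/) {
        (void)isForPersistence;
    }
    Widget(const Widget&) = delete;            // accidental copies won't compile
    Widget& operator=(const Widget&) = delete; // accidental assignment won't either
};

static_assert(!std::is_copy_constructible<Widget>::value,
              "copying is disabled at compile time");
```

The compiler, not a runtime check, enforces the rule, which is why `Widget w2 = w1;` simply does not build.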
    result += core::CMemory::dynamicSize(window);
    // The 0.3 is a rule-of-thumb estimate of the worst case
    // compression ratio we achieve on the test state.
    result += 0.3 * core::CMemory::dynamicSize(window);
Don't we only compress on persist?
In fact, no. As of #100, we compress the raw bytes of some of this object's state that we actually hold in memory.
Ah, I recall you saying we'd do that but I missed the fact it's already in. Cool.
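The accounting in the diff above can be paraphrased as a small free function. This is a sketch, not the ml-cpp implementation: `estimateWindowMemory` stands in for the surrounding method and takes the window's dynamic size directly instead of calling `core::CMemory::dynamicSize(window)`; the 0.3 worst-case compression ratio is the rule-of-thumb from the comment.

```cpp
#include <cstddef>

// Sketch of the estimate above: the window's raw bytes are held in memory,
// and a compressed copy of the test state is held too, so we budget for the
// raw size plus an assumed worst-case 0.3 compression ratio on that state.
std::size_t estimateWindowMemory(std::size_t dynamicSizeOfWindow) {
    std::size_t result = dynamicSizeOfWindow;
    // Rule-of-thumb worst-case compression ratio for the compressed state.
    result += static_cast<std::size_t>(0.3 * static_cast<double>(dynamicSizeOfWindow));
    return result;
}
```

So a window whose dynamic size is 1000 bytes would be budgeted at 1300 bytes.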
lib/model/CAnomalyDetectorModel.cc
Outdated
    // if we were going to persist the data gatherer from within this class.
    // We don't, so that's OK, but the next issue is that another thread will be
-   // modifying the data gatherer m_DataGatherer points to whilst this object
+   // modifying the data gatherer m_DataGatherer points too whilst this object
This seems like a typo
This was a typo before: the to[o] in this context means "as well", which is "too" rather than "to".
It's still kind of a weird sentence but fair enough!
Wait a sec, I'm sorry you're actually completely right. I'd somehow (repeatedly) misread!
    std::size_t CEventRateModel::memoryUsage() const {
        return this->CIndividualModel::memoryUsage() +
               core::CMemory::dynamicSize(m_InterimBucketCorrector);
Where is the memory for the interim bucket corrector accounted for now?
It is getting accounted for in this->CEventRateModel::computeMemoryUsage(). This is all tied up with the model memory estimation process we use, i.e. measuring the computed memory usage periodically and then fitting a regression to those measurements. The extra memory used is effectively accounted for in that regression's parameters.
    // will be the overwhelmingly common source of additional memory
    // so the model memory should be accurate (on average) in this
    // time frame.
    double scale{1.0 - static_cast<double>(elapsedTime) /
It seems the scale is fixed for a given bucket span. Should we consider setting it on construction?
We could, given current usage. Although this is currently called at the bucket frequency, that doesn't have to be the case, and we'd lose this flexibility if it changed. Also, I think moving this away from the API that adjusts the margin would hide an important part of the implementation from the caller: if I changed how this function is called, it would be easy to overlook that the value computed in the constructor also needs to change. On balance I prefer to keep it as is for that reason. What do you think?
I agree. This gives us flexibility to change the way it works if needed in the future.
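The behaviour under discussion, a memory margin that decays linearly as a function of elapsed time rather than bucket count, can be sketched as follows. Names here are illustrative (`marginScale`, `decayPeriod`), not the PR's; only the `1.0 - elapsedTime / ...` expression mirrors the diff above.

```cpp
#include <algorithm>
#include <cstdint>

// Sketch: the margin scale starts at 1 and decays linearly to 0 over a fixed
// decay period. Because it depends on elapsed *time*, the result is the same
// whether the function is called once per bucket or at any other frequency.
double marginScale(std::uint64_t elapsedTime, std::uint64_t decayPeriod) {
    double scale = 1.0 - static_cast<double>(elapsedTime) /
                             static_cast<double>(decayPeriod);
    return std::max(scale, 0.0); // fully decayed once the period has passed
}
```

This is why computing the scale once at construction would bake in an assumption about call frequency only if the formula were per-call rather than per-elapsed-time; as written, the caller can invoke it whenever it likes.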
…31289) To avoid temporary failures, this also disables these tests until elastic/ml-cpp#122 is committed.
This makes a number of changes targeting our current memory control functionality. Specifically, these are:
- There are two references to the data gatherer: one held by the CAnomalyDetectorModel object and one held by a CAnomalyDetector object. However, the memory was only accounted for by CAnomalyDetectorModel. Since the reference count is two, we were effectively halving its accounted memory. I've changed the CResourceMonitor to work in terms of CAnomalyDetector objects and now account for both references to the data gatherer. This incidentally also means we account for the static size of CAnomalyDetector, which was also being lost by the resource monitor. The impact can be large, especially for population models; for example, in CAnomalyJobLimitTest::testModelledEntityCountForFixedMemoryLimit we model 45% fewer over field values as a result.
- Removed redundant state from CDataGatherer. For example, we have access to the partition field name via the search key, so we don't need a copy in this class as well. (Note that this also reduces the number of parameters to the constructor, which had quite a lot of fallout on the unit tests.)

As a result we now have accurate control of the true memory (I measured consistently within 5% in a unit test with a variety of realistic data characteristics).
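The halving described above is what you get under one plausible per-reference accounting scheme for shared objects: each owner reports the object's size divided by the use count, so the total is only correct if every owner is visited. A sketch with hypothetical names (`Gatherer`, `accountedSize`), not the ml-cpp code:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical shared object with a fixed dynamic size, standing in for the
// data gatherer shared by the model and the detector.
struct Gatherer {
    std::size_t dynamicSize() const { return 1000; }
};

// Per-reference accounting: each owner charges size / use_count. If only one
// of two owners is visited, the object's accounted memory is halved.
std::size_t accountedSize(const std::shared_ptr<Gatherer>& ref) {
    return ref->dynamicSize() / static_cast<std::size_t>(ref.use_count());
}
```

With two live owners, a single call to `accountedSize` reports only half the gatherer's 1000 bytes; summing over both owners recovers the full size, which is the shape of the fix described above.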
Note I factored out the test changes, which are mainly fallout from the CDataGatherer constructor signature change and some tidy-ups, from the functional code changes. This affects results for memory limited jobs only.