[ML] Use off-heap storage for forecasting large models#22
[ML] Use off-heap storage for forecasting large models#22hendrikmuhs wants to merge 7 commits intoelastic:masterfrom hendrikmuhs:forecast-scale
Conversation
Re-factor minimumSeasonalVarianceScale to make it available as method. This is required in order to restore models from a file stream, more precisely it is required as a parameter for maths::CModelParams which is required for maths::SModelRestoreParams
lib/api/CForecastRunner.cc
Outdated
|
|
||
|
|
||
| boost::filesystem::create_directories(temporaryFolder); | ||
| LOG_INFO("Persisting to: " << temporaryFolder.string()); |
tveasey
left a comment
There was a problem hiding this comment.
This looks pretty good on a quick first pass. I'll have another slightly more careful read through on Monday. I left some minor suggestions.
include/api/CForecastRunner.h
Outdated
| //! max memory allowed to use for forecast models persisting to disk | ||
| static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB | ||
|
|
||
| // Note: This value us lower than on X-pack side to prevent side-effects, |
include/api/CForecastRunner.h
Outdated
| static const size_t MAX_FORECAST_MODEL_MEMORY = 20971520; // 20MB | ||
|
|
||
| //! max memory allowed to use for forecast models persisting to disk | ||
| static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB |
There was a problem hiding this comment.
What's the rationale for this being so much less than MIN_FORECAST_AVAILABLE_DISK_SPACE?
There was a problem hiding this comment.
I do not want to fill the disk no matter how large the models are. So I keep some room.
Note that MAX_FORECAST_MODEL_PERSISTANCE_MEMORY is actually the size in memory, not persistence. I haven't made tests yet, but I assume that writing it as json has some expansion factor. So the size on disk will be higher than the size in memory. I do not think we need to exactly calculate that but only make a worst case assumption.
500MB is totally educated guess, I wonder if we ever have forecast models of that size.
I will add some more code docs.
There was a problem hiding this comment.
Are I see, that makes sense, ok cool plus we can always revisit if we think this is too conservative. +1 to a little more documentation on this.
include/core/RestoreMacros.h
Outdated
| return false; \ | ||
| } \ | ||
| target = enumtype(value); \ | ||
| restoreSuccess = true; \ |
There was a problem hiding this comment.
Also, I wonder whether we should check to see if we actually restored the enum in this macro. I'm inclined to say we shouldn't and ditch restoreSuccess.
We generally don't bother to check if the field was missing altogether from the state document. Sometimes we can cope with default initialising a new field and it doesn't have to be present in the original state document, so this cuts down usability.
If you want to check what fields are present in the state document you can always fill in an unordered string set in the loop which does the restoring, i.e.
boost::unordered_set<std::string> restored;
do
{
restored.insert(traverser.name());
...
}
while (traverser.next());
if (restored.count(MY_FIELD_I_WANT_TO_BE_PRESENT_TAG) > 0 && ...)
{
return false;
}
There was a problem hiding this comment.
FWIW: I first used a E_UNKNOWN state in the enum but that had to many side-effects on the existing code base. Other parts art not affected as I can initialize them to something invalid and check whether it has been set. So this only applies to enums, while enums are only used here.
For me the non-existence of the field is a bug.
I can use the suggested trick, but this looks more complicated to me than this 1 bool.
There was a problem hiding this comment.
It definitely is more complicated for this case, but I want this macro to be as widely useable as possible.
Suppose we introduce a new enum member to an existing class, but feel we can happily get away with default initialising it to something. Then we don't want this extra "check a value exists". Now we could always go ahead and just ignore the bool, but that feels wrong.
That is the rationale for this suggestion, make the macro as general and simple as possible and if we need special handling deal with that elsewhere. As it is this feels like odd functionality out to have just this one case check for existence of a field.
Another option would be to have some general functionality which allows us to force that a list field values is actually read in a given loop. Something like registerManditoryFields on the traverser?
| static const std::string FEATURE_TAG; | ||
| static const std::string DATA_TYPE_TAG; | ||
| static const std::string MODEL_TAG; | ||
| static const std::string BY_FIELD_VALUE_TAG; |
There was a problem hiding this comment.
As per my usual comment, can we move to cc? Looks like these (at least FORECAST_MODEL_PERSIST_TAG) are only referenced in cc, so just defining them in unnamed namespace cuts down code and cleans up class definition.
lib/api/CForecastRunner.cc
Outdated
| series.s_MinimumSeasonalVarianceScale, | ||
| series.s_ToForecastPersisted)); | ||
| // if in memory models are empty check if we can load persisted ones | ||
| if (series.s_ToForecast.empty() && modelRestore) |
There was a problem hiding this comment.
Surely modelRestore can't be null if we get here (although see below).
lib/api/CForecastRunner.cc
Outdated
| } | ||
|
|
||
| // if in memory models are empty check if we can load persisted ones | ||
| if (series.s_ToForecast.empty() && modelRestore) |
There was a problem hiding this comment.
Could we move into lambda to avoid code duplication also maybe modelRestore != nullptr.
There was a problem hiding this comment.
that would be modelRestore.get() != nullptr, hmm operator bool is a valid C++ construct on auto pointers. I do not mind, but we should make a note of it and be consistent everywhere.
There was a problem hiding this comment.
You don't need the .get(). There's an overload of the equality operator where rhs is nullptr_t: http://www.cplusplus.com/reference/memory/unique_ptr/operators/
There was a problem hiding this comment.
We seem to generally be moving towards making the condition as explicit as possible, even using == false verses !, on grounds that it is easier to see the intention at a glance. Using != nullptr seems more consistent with that philosophy.
There was a problem hiding this comment.
I am fine with that, lucene enforces this style as well.
lib/api/CForecastRunner.cc
Outdated
| totalMemoryUsage += prerequisites.s_MemoryUsageForDetector; | ||
|
|
||
| if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY) | ||
| if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY && forecastJob.s_TemporaryFolder.empty ()) |
There was a problem hiding this comment.
There are some non-standard spaces between function name and argument lists in a few places, such as here. I guess once we've agreed on auto-formatting tooling this goes away, but would be nice not to introduce anything we won't accept in the meantime.
lib/api/CForecastRunner.cc
Outdated
| { | ||
| boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder); | ||
|
|
||
| if (!sufficientAvailableDiskSpace(temporaryFolder)) |
| modelParams.s_DecayRate, | ||
| modelParams.s_BucketLength, | ||
| modelParams.s_ComponentSize}, | ||
| modelParams.distributionRestoreParams(dataType)}; |
There was a problem hiding this comment.
Not something you introduced, but it feels like it would be nice to have a method on SModelParams to get this object. Also you've mixed brackets and braces and indentation levels, which makes this hard to read.
droberts195
left a comment
There was a problem hiding this comment.
I left a few minor comments.
There are also some places where you have { on the same line as a class declaration or if statement. I guess if we're definitely going to do an automated reformat soon it doesn't matter though...
include/core/RestoreMacros.h
Outdated
| return false; \ | ||
| } \ | ||
| target = enumtype(value); \ | ||
| restoreSuccess = true; \ |
include/api/CForecastRunner.h
Outdated
| // Note: This value us lower than on X-pack side to prevent side-effects, | ||
| // if you change this value also change the limit on X-pack side. | ||
| //! minimum disk space required for disk persistence | ||
| static const size_t MIN_FORECAST_AVAILABLE_DISK_SPACE = 4294967296; // 4GB |
There was a problem hiding this comment.
I think in C++11 a number with no size suffix can be int, long or long long, whichever is the smallest it will fit into. But in C++98 it was only int or long. Visual Studio 2013 doesn't implement the complete C++11 standard, so this might cause an error when backporting, because long is still 32 bits on Windows. You could add ull to the number now or just watch out for a compilation problem on Windows when backporting to 6.x - Visual Studio 2013 might be OK with it as-is.
lib/api/CForecastRunner.cc
Outdated
| << " MB), using disk."); | ||
|
|
||
|
|
||
| boost::filesystem::create_directories(temporaryFolder); |
There was a problem hiding this comment.
I think this should be wrapped in a try/catch in case it throws an unexpected exception.
lib/api/CForecastRunner.cc
Outdated
| if (!forecastJob.s_TemporaryFolder.empty()) | ||
| { | ||
| boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder); | ||
| boost::filesystem::remove_all(temporaryFolder); |
There was a problem hiding this comment.
Should be wrapped in try / catch. If it's a best effort thing then we can ignore the exception.
lib/api/CForecastRunner.cc
Outdated
|
|
||
| if (errorCode) | ||
| { | ||
| LOG_ERROR("Failed to retrieve disk information for " << path << " error " << errorCode.value()); |
There was a problem hiding this comment.
It would be good to include errorCode.message() in the output as well as the number. Otherwise it will be like the moon landings where users faced with "error 1202" have to contact mission control to find out what's actually gone wrong.
| LOG_DEBUG("| CForecastModelPersistTest::testPersistAndRestore |"); | ||
| LOG_DEBUG("+----------------------------------------------------+"); | ||
|
|
||
| core_t::TTime bucketLength{1800}; |
| restoredModel->checksum(42)); | ||
|
|
||
| CPPUNIT_ASSERT(!restorer.nextModel(restoredModel, restoredFeature, restoredByFieldValue)); | ||
| std::remove(persistedModels.c_str()); |
There was a problem hiding this comment.
Does CRestore only close its file in its destructor? If so, you may find this remove() fails on Windows because the file is still open. (Linux will let you delete an open file but Windows won't unless it was explicitly opened with share mode delete.) In other tests that have this problem we've put most of the code in an inner block so the appropriate destructor gets called before removing files.
tveasey
left a comment
There was a problem hiding this comment.
I'm happy with this now although still marked as WIP, but I think this is fine to merge.
|
superseded by #36 |
This implements the C++ side of forecast persistence. An additional parameter allows the forecast runner to persist models on disk for temporary purposes. Models are loaded back into memory one by one.
For models smaller than the current limit of 20MB nothing changes.
X-Pack part: https://github.com/elastic/x-pack-elasticsearch/pull/4134