Conversation
| @@ -0,0 +1,314 @@ | |||
| package mil.nga.giat.geowave.analytic.javaspark; | |||
There was a problem hiding this comment.
any chance to make unit tests here?
| } | ||
|
|
||
| stopwatch.stop(); | ||
| LOGGER.warn("KMeans runner took " + stopwatch.getTimeString()); |
There was a problem hiding this comment.
warning messages aren't particularly appropriate or user friendly (can we do debug or info instead)
| "-i", | ||
| "--numIterations" | ||
| }, description = "The number of iterations to run") | ||
| private Integer numIterations = 20; |
There was a problem hiding this comment.
was 20 chosen as a default based on anything (ie. some trial and error, experimentation, or mllib recommendations) or is it more or less random?
|
|
||
| // Init the algorithm | ||
| KMeans kmeans = new KMeans(); | ||
| kmeans.setInitializationMode("kmeans||"); |
There was a problem hiding this comment.
there is a "random" initialization mode, and for kmeans|| there is initiatialization steps which defaults to 2 - both of these is probably the 99% case but I'm wondering if there's justification to have additional optional parameters for these or to keep it simple. Really, the only justification in my mind is likely if scaling to billions/trillions is a challenge re: performance with kmeans|| ... we should test this at some point on billions of geometries
There was a problem hiding this comment.
reminder - create a new issue for this
| "kmeans-hulls"); | ||
|
|
||
| stopwatch.stop(); | ||
| LOGGER.warn("KMeans hull generation took " + stopwatch.getTimeString()); |
There was a problem hiding this comment.
again, change from warning
|
|
||
| if (adapterId != null) { | ||
| ScaledTemporalRange scaledRange = new ScaledTemporalRange(); | ||
| String timeField = FeatureDataUtils.getFirstTimeField( |
There was a problem hiding this comment.
it seems there should be a couple possible ways to set this outside of using the first time field, the feature data adapter should always have a configured time field(s) that it uses for temporal indexing, and perhaps there's an option to override the indexed time
There was a problem hiding this comment.
actually it seems there's a disconnect between what is queried and this ScaledTemporalRange...the adapter ID doesn't have to be set (which makes sense) and it'll query all the objects in the table (speaking of which, there is potential for class cast exceptions if those objects aren't features, but we can just ignore that) but the features are then ignored maybe, perhaps unclear what the behavior would be...not sure if this is a corner case we should be thinking about, but it does seem like a disconnect
| }, description = "Bounding box for spatial query (LL-Lat LL-Lon UR-Lat UR-Lon)") | ||
| private String bbox = null; | ||
|
|
||
| @Parameter(names = { |
There was a problem hiding this comment.
this onle works on SimpleFeatures so this is probably more intuitively called something like "featureTypeName"
There was a problem hiding this comment.
and maybe in the description we could say that this is also the GeoWave adapter ID
| String bboxStr ) { | ||
| try { | ||
| // Expecting bbox in "LL-Lat LL-Lon UR-Lat UR-Lon" format | ||
| StringTokenizer tok = new StringTokenizer( |
There was a problem hiding this comment.
a bit of a nit, but really everywhere else we use longitude-first, and our bbox expressions through geotools CQL would be longitude-first. To be more consistent with everything else I suggest longitude first. Otherwise I also kind of like the idea of just 4 individual parameters for east, west, north, and south so no one needs to second guess the order (for me I'd have to print help every time I wanted to use this, to verify order)...anyways no big deal one way or the other, just make sure we stay with longitude first
There was a problem hiding this comment.
agree. we'll change it to x1 y1 x2 y2
|
|
||
| while (statsIt.hasNext()) { | ||
| final DataStatistics stats = statsIt.next(); | ||
| if (stats instanceof FeatureTimeRangeStatistics) { |
There was a problem hiding this comment.
there's a method public DataStatistics<?> getDataStatistics( ByteArrayId adapterId, ByteArrayId statisticsId, String... authorizations ) on statistics store that is intended to get you exactly the stat you want rather than this iteration...in this case there can be more than one time attribute so this can be incorrect (order of the statistics coming back is nondeterministic), but the statisticsId uses composeId with the attribute name (field ID)
There was a problem hiding this comment.
FeatureTimeRangeStatistics.composeId() gets you the statistics ID that can then be used within that call to DataStatisticsStore
| final DataStorePluginOptions dataStorePlugin, | ||
| final ByteArrayId adapterId ) { | ||
| final DataStatisticsStore statisticsStore = dataStorePlugin.createDataStatisticsStore(); | ||
| final CloseableIterator<DataStatistics<?>> statsIt = statisticsStore.getDataStatistics(adapterId); |
There was a problem hiding this comment.
there's a method public DataStatistics<?> getDataStatistics( ByteArrayId adapterId, ByteArrayId statisticsId, String... authorizations ) on statistics store that is intended to get you exactly the stat you want rather than this iteration...in this case there can be more than one geometry attribute so this can be incorrect (order of the statistics coming back is nondeterministic), but the statisticsId uses composeId with the attribute name (field ID)
| QueryOptions queryOptions = null; | ||
| if (adapterId != null) { | ||
| // Retrieve the adapters | ||
| CloseableIterator<DataAdapter<?>> adapterIt = inputDataStore.createAdapterStore().getAdapters(); |
There was a problem hiding this comment.
you can just setAdapterId() on queryoptions and not need to iterate through adapters to find a match
| DataAdapter adapter = adapterStore.getAdapter(adapterId); | ||
|
|
||
| if (adapter != null && adapter instanceof GeotoolsFeatureDataAdapter) { | ||
| GeotoolsFeatureDataAdapter gtAdapter = (GeotoolsFeatureDataAdapter) adapter; |
There was a problem hiding this comment.
adapter.getTimeDescriptors()
|
|
||
| while (statsIt.hasNext()) { | ||
| final DataStatistics stats = statsIt.next(); | ||
| if (stats instanceof FeatureTimeRangeStatistics) { |
There was a problem hiding this comment.
FeatureTimeRangeStatistics.composeId() gets you the statistics ID that can then be used within that call to DataStatisticsStore
No description provided.