store individual histogram statistic per partition #1339
rfecher merged 3 commits into locationtech:master from
Conversation
b010555 to 5191962
5191962 to 156d01e
jdgarrett left a comment
A few minor things, overall it looks good to me.
				dataAdapter);
		}
		// this is done primarily to ensure stats merging is enabled before the distributed ingest
		if (dataStoreOptions.getFactoryOptions().getStoreOptions().isPersistDataStatistics()) {
This can be moved outside of the if/else block to avoid code duplication.
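The suggested refactoring can be sketched as follows. This is a minimal, self-contained illustration of hoisting a duplicated check out of an if/else block; the names (`configure`, `StoreOptions`, the action strings) are hypothetical stand-ins, not the actual GeoWave code.

```java
// Sketch of moving a check that appears in both branches of an if/else
// to after the block, so it runs exactly once either way.
// All names here are hypothetical stand-ins for the GeoWave options classes.
public class HoistCheck {
  interface StoreOptions {
    boolean isPersistDataStatistics();
  }

  static String configure(boolean vector, StoreOptions opts) {
    StringBuilder actions = new StringBuilder();
    if (vector) {
      actions.append("initVectorAdapter;");
    } else {
      actions.append("initRasterAdapter;");
    }
    // hoisted out of both branches: no duplication, identical behavior
    if (opts.isPersistDataStatistics()) {
      actions.append("enableStatsMerging;");
    }
    return actions.toString();
  }

  public static void main(String[] args) {
    System.out.println(configure(true, () -> true));
  }
}
```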
@@ -0,0 +1,7 @@
package mil.nga.giat.geowave.core.store.adapter.statistics;

public interface DataStatisticsSet<T> extends
Add a Javadoc comment with a brief explanation of the need for this class.
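The requested documentation might look something like the sketch below. The wording is an assumption inferred from the PR title ("store individual histogram statistic per partition"), not taken from the PR; the real interface extends GeoWave's statistics types, which are elided here.

```java
// Sketch (assumed wording) of the Javadoc the reviewer is asking for.
// The real interface extends GeoWave's statistics types; elided here.

/**
 * A statistic that is internally maintained as a set of individual
 * statistics, one per partition (e.g. one histogram per partition key),
 * rather than as a single aggregate across all partitions.
 *
 * @param <T> the entry type the statistics are computed over
 */
interface DataStatisticsSet<T> {
}
```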
import net.sf.json.JSONException;
import net.sf.json.JSONObject;

public class PartitionStatistics<T> extends
Add a Javadoc comment with a brief explanation of this class.
		else {
			// otherwise we keep a map per partition to call ingested once
			// per partition on each partition's individual statistic
			Map<ByteArrayId, List<GeoWaveRow>> rowsPerPartition = new HashMap<>();
Wouldn't it be more efficient to just call entryIngested once for every row, rather than building up this intermediate Map of Lists? It would also eliminate the need to have separate logic for 1 row vs many. I'm not sure what work the map is trying to avoid, since the histograms are cached in histogramPerPartition anyways.
It was collecting rows with the same partition key so it could delegate the array of matching rows to the histogram statistic's callback, rather than one row at a time (in practice, they're often going to have the same partition key). I don't really know if there's an efficiency gain from that, and I agree it minimizes complexity to simply delegate one row at a time.

I think I was thinking more in terms of maintaining the typical contract of calling the callback once per entry rather than once per row. Some stats are more concerned with entries (for example, the count statistic counts entries, not rows), some work on the rows themselves, and some may use both in context. For example, if the individual histogram statistic decided to normalize weight contribution by the number of rows to keep each entry's contribution constant, it would be able to (so if a single entry took up 5 rows, it could add 1/5 of the weight per row). That's not the way it works now, and I can buy that we can just minimize some complexity here, but I was trying to stick to the contract in case something changes with how we keep histograms.
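The grouping strategy discussed above can be sketched as follows. `Row` is a hypothetical stand-in for `GeoWaveRow`, and `String` keys stand in for `ByteArrayId` partition keys; none of these names come from GeoWave itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the PR's per-partition grouping: collect the rows of
// an entry that share a partition key, so the statistic's callback can be
// invoked once per partition with all of that partition's rows.
class PartitionGrouping {
  static class Row {
    final String partitionKey;
    final String value;

    Row(String partitionKey, String value) {
      this.partitionKey = partitionKey;
      this.value = value;
    }
  }

  static Map<String, List<Row>> groupByPartition(List<Row> rows) {
    Map<String, List<Row>> rowsPerPartition = new HashMap<>();
    for (Row r : rows) {
      rowsPerPartition
          .computeIfAbsent(r.partitionKey, k -> new ArrayList<>())
          .add(r);
    }
    return rowsPerPartition;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("p1", "a"), new Row("p1", "b"), new Row("p2", "c"));
    Map<String, List<Row>> grouped = groupByPartition(rows);
    // The simpler alternative suggested in the review is to skip this
    // grouping and invoke the callback once per row.
    System.out.println(grouped.get("p1").size()); // 2 rows share partition p1
  }
}
```

The trade-off mirrors the thread: grouping preserves a "once per entry (per partition)" callback contract, while per-row delegation removes the intermediate map and the separate single-row/many-row code paths.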
No description provided.