GEOWAVE-143 by dcy2003 · Pull Request #397 · locationtech/geowave

dcy2003 · 2015-05-20T12:57:43Z

Initial stab at #143 including Integration Test to demonstrate approach and results. Feedback welcome!

Brief summary:

I added a new query method to the DataStore that allows one to provide a desired subset of attributes to be included in query results. Research into available iterators to accomplish this goal within the tablets led me to ColumnQualifierFilter and this discussion, which recommends leveraging scanner.fetchColumn(Text colFam, Text colQual) for this purpose.

The integration test ingests sample data for the 10 largest cities in the US (city, state, population, land area, coordinates) and demonstrates 3 queries against this data set. Each query constructs a bounding box designed to match 3 cities from Texas. One query leverages an existing query method to return all attributes. The other two queries leverage the new query method to request different subsets of attributes asserting for the correct presence (null vs. not-null) of each attribute in the results.

Please let me know if I missed or misinterpreted anything...

coveralls · 2015-05-20T15:12:19Z

Coverage decreased (-0.19%) to 62.43% when pulling 0ade84d on GEOWAVE-143 into 440257a on master.

rfecher · 2015-05-20T15:41:18Z

arguably, we should start adding things like this subsetting, as well as possibly the limit and maybe other things to be contained in the query object so we limit the number of methods in DataStore.

also, "attributes" are specific to simplefeature data, I think more generically you are filtering the "fieldIds"

I agree with the sentiment that DataStore has grown to a large number of query methods and we could look into encapsulating some items in the query object itself to limit the number of methods. Would you like me to (1) refactor "attributes" to "fieldIDs" and open a separate issue to limit the number of methods in the DataStore by moving some of this stuff to the query object or (2) address those concerns as part of this ticket as well?

My vote would be separate ticket because I could see that refactor touching a lot, but it's your call...

What are the thoughts on just adding a "QueryOptions" object - that way we don't break the API as new bits get added.
If that answers the mail, then I would say move fieldIds to that (since it's a pretty minor change).

If we need to do a bigger re-factoring, then i'm fine deferring it to a new ticket.

hmm, at first, the query methods were intended to be intuitive and expose things that are sort of expected in general with query (such as the common SQL syntax). So 'limit' was a parameter rather than hidden in an object...

Now we're getting to the point of too many options and therefore too many query methods...I agree with @chrisbennight that it may make sense to have a set of concrete query options that could apply to any query and then have a separate interface/abstraction that handles the query body which would change based on the type of index (spatial, spatial-temporal, or whatever else). The question is would we rather have QueryOptions as a parameter or have a query object that contains the concrete set of options and the query body? The only argument that would make me lean to the latter is that often no limit or field subsetting are used, so you could instantiate the query object with the query body and the default options would not need to be specified, but if it was part of the method signature you'd have to instantiate some sort of query options or pass null. Not a big deal, but the datastore is pretty much at the core of everything. Thoughts?

coveralls · 2015-05-20T17:12:37Z

Coverage decreased (-0.06%) to 62.56% when pulling c77484a on GEOWAVE-143 into 440257a on master.

coveralls · 2015-05-20T18:41:15Z

Coverage decreased (-0.06%) to 62.56% when pulling c77484a on GEOWAVE-143 into 440257a on master.

coveralls · 2015-05-20T20:14:01Z

Coverage decreased (-0.06%) to 62.58% when pulling c036bfb on GEOWAVE-143 into 08b66af on master.

chrisbennight · 2015-05-21T01:15:21Z

@rfecher
Thoughts on going this way vs. inheriting from one of the GeoWaveServices classes?

@dcy2003
My initial inclination would be to move to inherit from one of the existing service environments (maybe GeoWaveTestEnvironment ) - that's the pattern the other integration tests use. The though behind it would be to amortize the cost of setting up/configuration a miniaccumulocluster.
Caveated - this is just a quick off the cuff comment, so there may be a good reason not to do this - but absent that it's probably the pattern we would stick to. Basically you should just mostly be able to get rid of declaring your own accumulo member, tempdir, user, pass, and use instance variables. otherwise the rest of the code should be about the same. ( see: https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveBasicIT.java )

Also you might need to register this test here:
https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveITSuite.java

agreed on inheriting from GeoWaveTestEnvironment, as @chrisbennight mentioned when you run the test suite it will run a single mini accumulo cluster rather than starting one per test, just try to be good about clearing out any data ingested after the individual test completes so your test has no impact on subsequent tests

dcy2003 · 2015-05-21T14:29:47Z

@chrisbennight @rfecher thanks for the feedback and recommendations!

I modified the integration test to extend GeoWaveTestEnvironment as suggested, and this did in fact simplify things a decent bit. Thanks for pointing the cost savings of extending that class to avoid spinning up an unnecessary MiniAccumuloCluster.

With regards to the fieldIds, I took @chrisbennight suggestion (for now) and simply encapsulated them in a QueryOptions object. The end result is a single new public-facing method in the DataStore that accepts QueryOptions. If we want to move in a different direction per the alternative that @rfecher mentioned, I'm happy to tackle that but that would seem to warrant a separate issue due to the potential impact to existing code. Let me know if you guys have a preference on which way to go there.

Lastly, I squashed my commits down to a single commit and improved the wording of the commit message since we are now auto-generating a changelog. Waiting on Travis...

coveralls · 2015-05-21T15:34:19Z

Coverage increased (+0.03%) to 62.61% when pulling 0e648e8 on GEOWAVE-143 into e197482 on master.

GEOWAVE-143

rfecher reviewed May 20, 2015
View reviewed changes

dcy2003 force-pushed the GEOWAVE-143 branch from c77484a to c036bfb Compare May 20, 2015 18:42

chrisbennight reviewed May 21, 2015
View reviewed changes

Added support for tablets to return a subset of fieldIds

0e648e8

dcy2003 force-pushed the GEOWAVE-143 branch from c036bfb to 0e648e8 Compare May 21, 2015 14:14

chrisbennight added a commit that referenced this pull request May 21, 2015

Merge pull request #397 from ngageoint/GEOWAVE-143

4bc19ad

GEOWAVE-143

chrisbennight merged commit 4bc19ad into master May 21, 2015

chrisbennight deleted the GEOWAVE-143 branch May 21, 2015 18:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GEOWAVE-143#397

GEOWAVE-143#397
chrisbennight merged 1 commit intomasterfrom
GEOWAVE-143

dcy2003 commented May 20, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

rfecher May 20, 2015

Uh oh!

dcy2003 May 20, 2015

Uh oh!

chrisbennight May 21, 2015

Uh oh!

rfecher May 21, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

chrisbennight May 21, 2015

Uh oh!

rfecher May 21, 2015

Uh oh!

dcy2003 commented May 21, 2015

Uh oh!

coveralls commented May 21, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

dcy2003 commented May 20, 2015

Brief summary:

Uh oh!

coveralls commented May 20, 2015

Uh oh!

rfecher May 20, 2015

Choose a reason for hiding this comment

Uh oh!

dcy2003 May 20, 2015

Choose a reason for hiding this comment

Uh oh!

chrisbennight May 21, 2015

Choose a reason for hiding this comment

Uh oh!

rfecher May 21, 2015

Choose a reason for hiding this comment

Uh oh!

coveralls commented May 20, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

coveralls commented May 20, 2015

Uh oh!

chrisbennight May 21, 2015

Choose a reason for hiding this comment

Uh oh!

rfecher May 21, 2015

Choose a reason for hiding this comment

Uh oh!

dcy2003 commented May 21, 2015

Uh oh!

coveralls commented May 21, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants