Skip to content

GEOWAVE-143#397

Merged
chrisbennight merged 1 commit intomasterfrom
GEOWAVE-143
May 21, 2015
Merged

GEOWAVE-143#397
chrisbennight merged 1 commit intomasterfrom
GEOWAVE-143

Conversation

@dcy2003
Copy link
Copy Markdown
Contributor

@dcy2003 dcy2003 commented May 20, 2015

Initial stab at #143 including Integration Test to demonstrate approach and results. Feedback welcome!

Brief summary:

I added a new query method to the DataStore that allows one to provide a desired subset of attributes to be included in query results. Research into available iterators to accomplish this goal within the tablets led me to ColumnQualifierFilter and this discussion, which recommends leveraging scanner.fetchColumn(Text colFam, Text colQual) for this purpose.

The integration test ingests sample data for the 10 largest cities in the US (city, state, population, land area, coordinates) and demonstrates 3 queries against this data set. Each query constructs a bounding box designed to match 3 cities from Texas. One query leverages an existing query method to return all attributes. The other two queries leverage the new query method to request different subsets of attributes asserting for the correct presence (null vs. not-null) of each attribute in the results.

Please let me know if I missed or misinterpreted anything...

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.19%) to 62.43% when pulling 0ade84d on GEOWAVE-143 into 440257a on master.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguably, we should start adding things like this subsetting, as well as possibly the limit and maybe other things to be contained in the query object so we limit the number of methods in DataStore.

also, "attributes" are specific to simplefeature data, I think more generically you are filtering the "fieldIds"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the sentiment that DataStore has grown to a large number of query methods and we could look into encapsulating some items in the query object itself to limit the number of methods. Would you like me to (1) refactor "attributes" to "fieldIDs" and open a separate issue to limit the number of methods in the DataStore by moving some of this stuff to the query object or (2) address those concerns as part of this ticket as well?

My vote would be separate ticket because I could see that refactor touching a lot, but it's your call...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the thoughts on just adding a "QueryOptions" object - that way we don't break the API as new bits get added.
If that answers the mail, then I would say move fieldIds to that (since it's a pretty minor change).

If we need to do a bigger re-factoring, then i'm fine deferring it to a new ticket.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, at first, the query methods were intended to be intuitive and expose things that are sort of expected in general with query (such as the common SQL syntax). So 'limit' was a parameter rather than hidden in an object...

Now we're getting to the point of too many options and therefore too many query methods...I agree with @chrisbennight that it may make sense to have a set of concrete query options that could apply to any query and then have a separate interface/abstraction that handles the query body which would change based on the type of index (spatial, spatial-temporal, or whatever else). The question is would we rather have QueryOptions as a parameter or have a query object that contains the concrete set of options and the query body? The only argument that would make me lean to the latter is that often no limit or field subsetting are used, so you could instantiate the query object with the query body and the default options would not need to be specified, but if it was part of the method signature you'd have to instantiate some sort of query options or pass null. Not a big deal, but the datastore is pretty much at the core of everything. Thoughts?

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.06%) to 62.56% when pulling c77484a on GEOWAVE-143 into 440257a on master.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.06%) to 62.56% when pulling c77484a on GEOWAVE-143 into 440257a on master.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.06%) to 62.58% when pulling c036bfb on GEOWAVE-143 into 08b66af on master.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfecher
Thoughts on going this way vs. inheriting from one of the GeoWaveServices classes?

@dcy2003
My initial inclination would be to move to inherit from one of the existing service environments (maybe GeoWaveTestEnvironment ) - that's the pattern the other integration tests use. The though behind it would be to amortize the cost of setting up/configuration a miniaccumulocluster.
Caveated - this is just a quick off the cuff comment, so there may be a good reason not to do this - but absent that it's probably the pattern we would stick to. Basically you should just mostly be able to get rid of declaring your own accumulo member, tempdir, user, pass, and use instance variables. otherwise the rest of the code should be about the same. ( see: https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveBasicIT.java )

Also you might need to register this test here:
https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveITSuite.java

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on inheriting from GeoWaveTestEnvironment, as @chrisbennight mentioned when you run the test suite it will run a single mini accumulo cluster rather than starting one per test, just try to be good about clearing out any data ingested after the individual test completes so your test has no impact on subsequent tests

@dcy2003
Copy link
Copy Markdown
Contributor Author

dcy2003 commented May 21, 2015

@chrisbennight @rfecher thanks for the feedback and recommendations!

I modified the integration test to extend GeoWaveTestEnvironment as suggested, and this did in fact simplify things a decent bit. Thanks for pointing the cost savings of extending that class to avoid spinning up an unnecessary MiniAccumuloCluster.

With regards to the fieldIds, I took @chrisbennight suggestion (for now) and simply encapsulated them in a QueryOptions object. The end result is a single new public-facing method in the DataStore that accepts QueryOptions. If we want to move in a different direction per the alternative that @rfecher mentioned, I'm happy to tackle that but that would seem to warrant a separate issue due to the potential impact to existing code. Let me know if you guys have a preference on which way to go there.

Lastly, I squashed my commits down to a single commit and improved the wording of the commit message since we are now auto-generating a changelog. Waiting on Travis...

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.03%) to 62.61% when pulling 0e648e8 on GEOWAVE-143 into e197482 on master.

chrisbennight added a commit that referenced this pull request May 21, 2015
@chrisbennight chrisbennight merged commit 4bc19ad into master May 21, 2015
@chrisbennight chrisbennight deleted the GEOWAVE-143 branch May 21, 2015 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants