Conversation
There was a problem hiding this comment.
arguably, we should start adding things like this subsetting, as well as possibly the limit and maybe other things to be contained in the query object so we limit the number of methods in DataStore.
also, "attributes" are specific to simplefeature data, I think more generically you are filtering the "fieldIds"
There was a problem hiding this comment.
I agree with the sentiment that DataStore has grown to a large number of query methods and we could look into encapsulating some items in the query object itself to limit the number of methods. Would you like me to (1) refactor "attributes" to "fieldIDs" and open a separate issue to limit the number of methods in the DataStore by moving some of this stuff to the query object or (2) address those concerns as part of this ticket as well?
My vote would be separate ticket because I could see that refactor touching a lot, but it's your call...
There was a problem hiding this comment.
What are the thoughts on just adding a "QueryOptions" object - that way we don't break the API as new bits get added.
If that answers the mail, then I would say move fieldIds to that (since it's a pretty minor change).
If we need to do a bigger re-factoring, then i'm fine deferring it to a new ticket.
There was a problem hiding this comment.
hmm, at first, the query methods were intended to be intuitive and expose things that are sort of expected in general with query (such as the common SQL syntax). So 'limit' was a parameter rather than hidden in an object...
Now we're getting to the point of too many options and therefore too many query methods...I agree with @chrisbennight that it may make sense to have a set of concrete query options that could apply to any query and then have a separate interface/abstraction that handles the query body which would change based on the type of index (spatial, spatial-temporal, or whatever else). The question is would we rather have QueryOptions as a parameter or have a query object that contains the concrete set of options and the query body? The only argument that would make me lean to the latter is that often no limit or field subsetting are used, so you could instantiate the query object with the query body and the default options would not need to be specified, but if it was part of the method signature you'd have to instantiate some sort of query options or pass null. Not a big deal, but the datastore is pretty much at the core of everything. Thoughts?
There was a problem hiding this comment.
@rfecher
Thoughts on going this way vs. inheriting from one of the GeoWaveServices classes?
@dcy2003
My initial inclination would be to move to inherit from one of the existing service environments (maybe GeoWaveTestEnvironment ) - that's the pattern the other integration tests use. The though behind it would be to amortize the cost of setting up/configuration a miniaccumulocluster.
Caveated - this is just a quick off the cuff comment, so there may be a good reason not to do this - but absent that it's probably the pattern we would stick to. Basically you should just mostly be able to get rid of declaring your own accumulo member, tempdir, user, pass, and use instance variables. otherwise the rest of the code should be about the same. ( see: https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveBasicIT.java )
Also you might need to register this test here:
https://github.com/ngageoint/geowave/blob/master/test/src/test/java/mil/nga/giat/geowave/test/GeoWaveITSuite.java
There was a problem hiding this comment.
agreed on inheriting from GeoWaveTestEnvironment, as @chrisbennight mentioned when you run the test suite it will run a single mini accumulo cluster rather than starting one per test, just try to be good about clearing out any data ingested after the individual test completes so your test has no impact on subsequent tests
|
@chrisbennight @rfecher thanks for the feedback and recommendations! I modified the integration test to extend GeoWaveTestEnvironment as suggested, and this did in fact simplify things a decent bit. Thanks for pointing the cost savings of extending that class to avoid spinning up an unnecessary MiniAccumuloCluster. With regards to the fieldIds, I took @chrisbennight suggestion (for now) and simply encapsulated them in a QueryOptions object. The end result is a single new public-facing method in the DataStore that accepts QueryOptions. If we want to move in a different direction per the alternative that @rfecher mentioned, I'm happy to tackle that but that would seem to warrant a separate issue due to the potential impact to existing code. Let me know if you guys have a preference on which way to go there. Lastly, I squashed my commits down to a single commit and improved the wording of the commit message since we are now auto-generating a changelog. Waiting on Travis... |
Initial stab at #143 including Integration Test to demonstrate approach and results. Feedback welcome!
Brief summary:
I added a new query method to the DataStore that allows one to provide a desired subset of attributes to be included in query results. Research into available iterators to accomplish this goal within the tablets led me to
ColumnQualifierFilterand this discussion, which recommends leveragingscanner.fetchColumn(Text colFam, Text colQual)for this purpose.The integration test ingests sample data for the 10 largest cities in the US (city, state, population, land area, coordinates) and demonstrates 3 queries against this data set. Each query constructs a bounding box designed to match 3 cities from Texas. One query leverages an existing query method to return all attributes. The other two queries leverage the new query method to request different subsets of attributes asserting for the correct presence (null vs. not-null) of each attribute in the results.
Please let me know if I missed or misinterpreted anything...