Skip to content

Implemented a Compound Index Strategy#445

Merged
rfecher merged 3 commits intomasterfrom
compound-idx
Jul 8, 2015
Merged

Implemented a Compound Index Strategy#445
rfecher merged 3 commits intomasterfrom
compound-idx

Conversation

@meislerj
Copy link
Copy Markdown
Contributor

Implemented CompoundIndexStrategy. For now it allows the
composition of two NumericIndexStrategy objects into a
single NumericIndexStrategy. This can easily be modified
to support N sub-indeces. The dimensions of the
CompoundIndexStrategy are the sum of the dimensions of
the sub-indeces. The compound index prefixes the bytes
from the first sub-index before any of the bytes from the
second. Externally, the compound index can be treated like
any other multi-dimensional index.

@chrisbennight
Copy link
Copy Markdown
Contributor

Can you add a comment to the top of the class that explains what it does? What you have in the description of the pull request would probably suffice :)

@meislerj
Copy link
Copy Markdown
Contributor Author

meislerj commented Jul 1, 2015

No problem

On Tue, Jun 30, 2015 at 9:09 PM, Chris Bennight notifications@github.com
wrote:

Can you add a comment to the top of the class that explains what it does?
What you have in the description of the pull request would probably suffice
:)


Reply to this email directly or view it on GitHub
#445 (comment).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 is a special value that is used (anything less than 0) to mean "no max" ...this is something we should have documented in our javadoc of the interface but I see it being used here:https://github.com/ngageoint/geowave/blob/master/core/index/src/main/java/mil/nga/giat/geowave/core/index/sfc/tiered/TieredSFCIndexStrategy.java#L43

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I'd argue if the argument is 'maxEstimatedRangeDecomposition' that you'd take the floor instead of the ceiling to try to keep the result less than max rather than more than max, but I could see arguments either way I suppose

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding floor vs ceiling: I thought about that also. It's a tradeoff. If you have a compound index strategy with 3 dimensions, the default maxEstimatedRangeDecomposition is 8 (2^d). The way this is currently done, getQueryRanges(...) for each sub-index gets called with the square root of the maxEstimatedRangeDecomposition. Using the floor would yield ~4 ranges (depending on how closely each sub-index respects the parameter) instead of ~9 ranges. It's a tradeoff between giving too coarse coverage of the indexedRange or returning too many ranges. Since the javadoc said that the parameter is more of a suggestion than a strict limit, I chose ceiling. I'm happy to switch it if you feel it makes more sense.

@rfecher
Copy link
Copy Markdown
Contributor

rfecher commented Jul 2, 2015

I made a few comments but really I think the only significant one is watching out for negatives on the max query range. We didn't really well document our use of -1 as a pass-through for no max query ranges, but its used elsewhere particularly to re-use code between the 2 getQueryRange methods to not use a capped maximum query range. So although its not directly in the call chain I think that would affect this CompoundIndexStrategy (its definitely in the tierednumericindexstrategy at least), I'd try to be consistent with its handling of negative values (and we really should have had it in the javadoc of the interface).

meislerj added 3 commits July 7, 2015 19:10
Implemented CompoundIndexStrategy. For now it allows the
composition of two NumericIndexStrategy objects into a
single NumericIndexStrategy. This can easily be modified
to support N sub-indeces. The dimensions of the
CompoundIndexStrategy are the sum of the dimensions of
the sub-indeces. The compound index prefixes the bytes
from the first sub-index before any of the bytes from the
second. Externally, the compound index can be treated like
any other multi-dimensional index.
getQueryRanges(final MultiDimensionalNumericData indexedRange
,final int maxEstimatedRangeDecomposition) now handles negative
values for maxEstimatedRangeDecomposition by performing a full
decomposition.

Both getMaxQueryRanges(...) and getInsertionIds(...) take the
actual number of results from the call to the first sub-strategy
into account when calling the second strategy. For example, if the
first sub-strategy returned a single range, the second sub-strategy
can use the full maxRangeDecomposition (since the product of the two
will be maxRangeDecomposition).

getQueryRanges(...) uses the ceiling for computing the maximum estimated
range decomposition per strategy since the parameter is not strictly
enforced. getInsertionIds(...) uses the floor for each strategy
to reduce replication.
rfecher added a commit that referenced this pull request Jul 8, 2015
Implemented a Compound Index Strategy
@rfecher rfecher merged commit 5b106fd into master Jul 8, 2015
@rfecher rfecher deleted the compound-idx branch July 8, 2015 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants