Skip to content

Aggregation API and Geohash Aggregation implementation#1733

Merged
rfecher merged 1 commit intolocationtech:masterfrom
sinistersnare:feature/geohash-aggregation
Jul 2, 2020
Merged

Aggregation API and Geohash Aggregation implementation#1733
rfecher merged 1 commit intolocationtech:masterfrom
sinistersnare:feature/geohash-aggregation

Conversation

@sinistersnare
Copy link
Copy Markdown
Contributor

Hi!

This partly goes towards #1703. After this is in, creating hex-bin aggregations should be easy enough, and that can be closed.

There are a few TODOs left in here, but mostly its done! Just wanted to see what you guys need from me to get this merged in. Also a code-review would be great.

Thanks!

@sinistersnare
Copy link
Copy Markdown
Contributor Author

I signed the ECA a while ago, how can I get confirmation of that?

@@ -0,0 +1,136 @@
package org.locationtech.geowave.core.geotime.store.query.aggregate;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright header.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved.

}
}
// TODO: should this be 0, AKA 1 bin for all data? (if so make sure to fix tests!)
return 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to always have some degree of binning, otherwise there would be no reason to use a binning strategy at all. However, if the max bins allowed is less than 32, I would suggest throwing an IlleagalArgumentException that states that at least 32 bins are required for this strategy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved, I implemented this.

/**
* Create a Binning Strategy based on GeoHashes.
*
* @param maxBinsAllowed TODO: explain how this invariant works, and how itll break for < 32.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my next comment about throwing an exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved.

@@ -0,0 +1,149 @@
package org.locationtech.geowave.core.geotime.store.query.aggregate;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright header.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved.

* @param num The number to serialize as a {@link ByteBuffer}
* @return a byte array that represents the given BigDecimal.
*/
public static byte[] writeBigDecimal(final BigDecimal num) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure these functions really fit in VarintUtils, This class is mostly used for encoding integers using as few bytes possible, but it doesn't look like there is any variable length encoding being used. There is another class that handles BigDecimal serialization at org.locationtech.geowave.core.store.data.field.base.BigDecimalSerializationProvider.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I just lifted the code I removed from the math-agg and used that instead of this stuff. However, I changed it slightly so it was a better ByteBuffer citizen (storing the array length, instead of using all of .remaining()

import java.util.List;
import java.util.Map;

public class GeohashBinningExample {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do a better job of explaining our examples, there should be a comment at the top of this class that explains what the example does.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bunch of comments, let me know if its better, or it should be explained more thoroughly.

allResults.entrySet().stream().filter((e) -> e.getKey().length() == 6).toArray()));
}

private static Map<String, BigDecimal> executeBinningAggregation(int precision) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should have fairly detailed comments, this will be used as a learning tool.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

if (result == null) {
return new byte[0];
}
final byte[] unscaled = result.unscaledValue().toByteArray();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this logic to the VarintUtils class since it is using Variable length encoding for each part of the BigDecimal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked about above, I moved this code over with some slight changes.


@RunWith(GeoWaveITSuiteRunner.class)
@SuiteClasses({
GeoWaveVisibilityIT.class,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment these back in

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already done, forgot to uncomment it.

@@ -0,0 +1,133 @@
package org.locationtech.geowave.test.basic;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright header.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved.

@jdgarrett
Copy link
Copy Markdown
Contributor

@rfecher I noticed when reviewing this that most aggregations simply return new byte[0] in their serialization functions. It the expectation that an aggregation that is serialized and then deserialized is equivalent to the original, including the value?

import com.google.common.collect.Lists;
import org.opengis.feature.simple.SimpleFeatureType;
import java.util.Date;
import java.util.List;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to go thru the import reorganizations and change them back, they were fallout from a different formatter.

@sinistersnare sinistersnare force-pushed the feature/geohash-aggregation branch from 3ede8ae to 33c7262 Compare June 5, 2020 06:00
@rfecher
Copy link
Copy Markdown
Contributor

rfecher commented Jun 5, 2020

I signed the ECA a while ago, how can I get confirmation of that?

@sinistersnare I wouldn't worry too much about it at this stage, when you complete the PR make sure you sign all your commits with the same email as the ECA and it should pass the ECA validation.

@rfecher
Copy link
Copy Markdown
Contributor

rfecher commented Jun 5, 2020

@rfecher I noticed when reviewing this that most aggregations simply return new byte[0] in their serialization functions. It the expectation that an aggregation that is serialized and then deserialized is equivalent to the original, including the value?

@jdgarrett the "Result" and the "Params" are separate from the function itself and serialized separately. I actually would expect all Aggregations to be empty byte serializations (byte[0]), because the input to the function goes in params anyways. I think I had good reason to separate serialization of params and the actual function (there's definitely good reason to separate the result). It seems fairly clean and flexible considering the aggregation function is also responsible for serializing/deserializing result and computing the aggregation, there's no need to inherently tie a potentially heavy input parameter to its serialization. By far the most complicated aggregation at this point in the baseline is distributed rendering so making that make reasonable sense drove some of the design decisions (org.locationtech.geowave.adapter.vector.render.DistributedRenderAggregation).

@sinistersnare
Copy link
Copy Markdown
Contributor Author

I have addressed most of the comments, still some work to be done.

Me and JD had a talk on some different constructors for the GeohashBinningStrategy. Im not quite sure how I can implement them, and I made comments on that front.

Aside from that, testing, and docs, this is looking better and better.

@sinistersnare sinistersnare force-pushed the feature/geohash-aggregation branch from 24edbac to e4ccf18 Compare July 2, 2020 04:15
…ashes.

Signed-off-by: Davis Silverman <davis.silverman@maxar.com>
@sinistersnare sinistersnare force-pushed the feature/geohash-aggregation branch from e4ccf18 to dc4e469 Compare July 2, 2020 04:19
@sinistersnare
Copy link
Copy Markdown
Contributor Author

Signed my commit, rebased onto master (fixing that last TODO!), now we just wait for CI!

Woo!

@rfecher rfecher merged commit 42cd1ad into locationtech:master Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants