|
| 1 | +[[statistics]] |
| 2 | +=== Statistics |
| 3 | + |
| 4 | +Adapters provide a set of statistics stored within a statistics store. The set of available statistics is specific to each adapter and |
| 5 | +to the attributes of the data items managed by that adapter. Statistics include: |
| 6 | + |
| 7 | +* Ranges over an attribute, including time. |
| 8 | +* Enveloping bounding box over all geometries. |
| 9 | +* Count (cardinality) of stored items. |
| 10 | +* Histograms over the range of values for an attribute. |
| 11 | +* Cardinality of discrete values of an attribute. |
| 12 | + |
| 13 | +Statistics are maintained during data ingest and deletion. Range and bounding box statistics reflect the largest extent observed over time; |
| 14 | +they are not narrowed when items are deleted. Cardinality-based statistics are updated upon deletion. |
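The update behavior described above can be illustrated with a minimal sketch. It is not GeoWave code: the class `AttributeStats` and its methods are hypothetical names, and the sketch assumes a single numeric attribute.

```python
class AttributeStats:
    """Count and numeric range for one attribute (hypothetical, illustrative only)."""

    def __init__(self):
        self.count = 0
        self.minimum = None
        self.maximum = None

    def ingest(self, value):
        # Ingest updates both the count and the range.
        self.count += 1
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def delete(self, value):
        # Deletion adjusts the count only; the range keeps its largest extent.
        self.count -= 1


stats = AttributeStats()
for v in (3, 7, 12):
    stats.ingest(v)
stats.delete(12)
print(stats.count, stats.minimum, stats.maximum)  # 2 3 12
```

After the item holding the maximum is deleted, the count is exact but the range still reports 12, which is why range statistics may later need re-computation.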
| 15 | + |
| 16 | +Statistics retain the same visibility constraints as the associated attributes. Thus, there is a set of statistics for each unique constraint. |
| 17 | +The statistics store answers each statistics inquiry for a given adapter with only those statistics matching the authorizations of the requester. |
| 18 | +The statistics store merges authorized statistics covering the same attribute. |
| 19 | + |
| 20 | +image::stats_merge.png[scaledwidth="100%",alt="Statistics Merge"] |
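The authorization-filtered merge can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the statistics store's implementation: `RangeStat` and `merged_range` are hypothetical names, and each visibility constraint is modeled as a plain set of tokens that must all be held, whereas real Accumulo visibility expressions also support alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeStat:
    attribute: str
    visibility: frozenset  # tokens that must ALL be held (simplifying assumption)
    minimum: float
    maximum: float


def merged_range(stats, attribute, authorizations):
    """Merge the range statistics for an attribute that the requester may see."""
    auths = frozenset(authorizations)
    visible = [s for s in stats
               if s.attribute == attribute and s.visibility <= auths]
    if not visible:
        return None
    return (min(s.minimum for s in visible), max(s.maximum for s in visible))


stats = [
    RangeStat("elevation", frozenset(), 10.0, 50.0),            # unrestricted
    RangeStat("elevation", frozenset({"A"}), 5.0, 40.0),        # requires A
    RangeStat("elevation", frozenset({"B", "C"}), 45.0, 90.0),  # requires B and C
]
print(merged_range(stats, "elevation", ["A"]))  # (5.0, 50.0)
```

A requester holding only `A` sees the merge of the unrestricted and `A` statistics; the statistic that requires both `B` and `C` is excluded from the result.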
| 21 | + |
| 22 | +==== Statistics Table Structure in Accumulo |
| 23 | + |
| 24 | +image::stats.png[scaledwidth="100%",alt="Statistics Structure"] |
| 25 | + |
| 26 | +===== Re-Computation |
| 27 | + |
| 28 | +Re-computation of statistics is required in three circumstances: |
| 29 | + |
| 30 | +["arabic"] |
| 30 | +. As indexed items are removed from the adapter store, the range and envelope statistics may lose their accuracy if a removed item |
| 31 | +holds the attribute value that is the minimum or maximum for the population. |
| 32 | +. New statistics are added to the statistics store after data items are ingested. These new statistics do not reflect the entire population. |
| 33 | +. Software changes invalidate previously stored images of statistics. |
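The first circumstance can be made concrete with a small sketch. It only illustrates why a full scan restores accuracy; `recompute_range` is a hypothetical helper, not part of GeoWave.

```python
def recompute_range(remaining_values):
    """Rebuild a (min, max) range by scanning every remaining value."""
    values = list(remaining_values)
    if not values:
        return None
    return (min(values), max(values))


stored_values = [3, 7, 12]
# Range as it was maintained during ingest.
stale_range = (min(stored_values), max(stored_values))
# Delete the item holding the maximum; the stored range is not narrowed.
stored_values.remove(12)
# A full scan of the remaining data yields the accurate range.
fresh_range = recompute_range(stored_values)
print(stale_range, fresh_range)  # (3, 12) (3, 7)
```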
| 35 | + |
| 36 | +The statistics tool is a simple command-line tool that recomputes all statistics for a given adapter. The tool is soon to be replaced by a more comprehensive and efficient tool. |
| 37 | +The tool removes all statistics for the adapter, scans the entire data set, and reconstructs the statistics. The tool is executed within a JVM using any of the assembled JAR files. |
| 38 | +The arguments to the tool are as follows, presented in the exact order required. |
| 39 | + |
| 40 | +* Zookeepers - Formatted as a comma-separated string: zookeeper1:port,zookeeper2:port |
| 41 | +* Accumulo Instance ID - The name of the Accumulo "instance" to connect to. |
| 42 | +* Accumulo Username - The name of the user to connect as. This is a user account managed by Accumulo, not a system account. |
| 43 | +* Accumulo Password - The password for that user. This is an Accumulo-controlled secret. |
| 44 | +* GeoWave Namespace - This is _not_ an Accumulo namespace; rather, think of it as a prefix GeoWave uses when creating index tables. |
| 45 | +* GeoWave Adapter ID - The name of the adapter. This is the local name of the feature type managed by the Feature Data Adapter. |
| 46 | +This name matches the layer name in GeoServer. |
| 47 | +* Authorizations - Ideally, the requesting authorizations should encompass ALL authorizations of the system. The authorizations may be provided in a comma-separated list. |
| 48 | + |
| 49 | +Make sure JAVA_HOME is set prior to invoking the following command. |
| 50 | + |
| 51 | + java -cp /usr/local/geowave/ingest/geowave-ingest-tool.jar mil.nga.giat.geowave.accumulo.util.StatsTool "localhost:12342" "GeoWave" "root" "pAssWord" "test" "GpxTrack" "A,B&C" |
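When the tool is driven from a script, the argument order listed above is easy to get wrong. The following sketch assembles the same command as an argument list. The function `build_stats_tool_command` is a hypothetical helper; the JAR path and class name are taken from the command above, and the sketch only builds the list without running it.

```python
def build_stats_tool_command(zookeepers, instance_id, username, password,
                             namespace, adapter_id, authorizations,
                             jar_path="/usr/local/geowave/ingest/geowave-ingest-tool.jar"):
    """Return the argument list for the statistics tool, in the required order."""
    return [
        "java", "-cp", jar_path,
        "mil.nga.giat.geowave.accumulo.util.StatsTool",
        ",".join(zookeepers),        # zookeeper1:port,zookeeper2:port
        instance_id,
        username,
        password,
        namespace,
        adapter_id,
        ",".join(authorizations),    # comma-separated authorizations
    ]


command = build_stats_tool_command(
    zookeepers=["localhost:12342"],
    instance_id="GeoWave",
    username="root",
    password="pAssWord",
    namespace="test",
    adapter_id="GpxTrack",
    authorizations=["A", "B&C"],
)
print(command)
# To execute: subprocess.run(command, check=True), with java available on the PATH.
```

Passing the arguments as a list to `subprocess.run` bypasses the shell, so a value such as `A,B&C` needs no extra quoting. On a shell command line the quotes are required because `&` is a shell operator.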
| 52 | + |
| 53 | + |
| 54 | + |