- replace the whole-world spatial query with an empty or unset query
- use count statistics instead of a full table scan
- re-use GeoWave within intermediate steps where appropriate (when using geometry in Accumulo)
- output polygons directly to GeoWave using GeoWaveOutputFormat
- optionally, write back to the original table to enrich it with the cluster ID
- optionally, run the clustering without the polygon generation