
adding spark ingest and custom crs to docs#1296

Merged
rfecher merged 1 commit into master from documentation-update
Mar 27, 2018

Conversation

@mawhitby
Contributor

No description provided.

Contributor

@rfecher rfecher left a comment


Aren't there other new commands, and other options on existing commands, @srinivasreddyv2?

Such as `--crs` or `config hdfs`.


[NOTE]
====
We are setting a custom CRS here as an example. If you don't set this, GeoWave defaults to EPSG:4326.
====
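As a hedged illustration of the custom-CRS option discussed in this thread: the sketch below assumes an `addindex`-style configuration command; only the `--crs` flag and `config` subcommand names come from the review comments, and the exact subcommand spelling, flag order, and index name are assumptions, so check `geowave help` for the real syntax.

```shell
# Hypothetical sketch -- subcommand and argument layout are assumed, not verified.
# Configure a spatial index with a custom Coordinate Reference System:
geowave config addindex -t spatial --crs EPSG:3857 my-spatial-index

# If --crs is omitted, GeoWave defaults to EPSG:4326 (as the NOTE above says).
```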


Can we use the term Coordinate Reference System here? (It is the quickstart guide, after all.)


In addition to the raw data to ingest, the ingest process requires an adapter to translate the native data into a format that can be persisted into the data store. The ingest process also requires an Index, which is a definition of all the configured parameters that determine how data is translated to Row IDs (i.e., how it is indexed). The Index also specifies which common fields need to be maintained within the table for use by fine-grained and secondary filters.
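The "data translated to Row IDs" idea above can be pictured with a toy space-filling-curve example. This is only an illustrative Python sketch, not GeoWave's actual indexing code: the function names and the fixed Z-order scheme are assumptions chosen for clarity, while real GeoWave indices use configurable strategies.

```python
# Toy illustration of an index mapping coordinates to Row IDs.
# NOT GeoWave's implementation -- a minimal Z-order (bit-interleaving) sketch.

def to_cell(value, lo, hi, bits):
    """Scale a coordinate in [lo, hi] onto an integer cell in [0, 2**bits)."""
    frac = (value - lo) / (hi - lo)
    return min(int(frac * (1 << bits)), (1 << bits) - 1)

def z_order_row_id(lon, lat, bits=16):
    """Interleave the bits of the lon and lat cells into a single row ID,
    so points that are close in space tend to get nearby row IDs."""
    x = to_cell(lon, -180.0, 180.0, bits)
    y = to_cell(lat, -90.0, 90.0, bits)
    row_id = 0
    for i in range(bits):
        row_id |= ((x >> i) & 1) << (2 * i)      # even bit positions: longitude
        row_id |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions: latitude
    return row_id
```

Range scans over such row IDs translate a spatial bounding box into a small number of contiguous key ranges, which is what makes fine-grained filtering on top of the scan practical.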

There are various ways to ingest data into a GeoWave store. The standard localToGW command is used to ingest files from locally or from an AWS s3 bucket into GeoWave in a single threaded fashion. For a distributed ingest (recommended for larger datasets) the sparkToGW and mrToGW commands can be used. Ingests can also be performed from directly from HDFS or utilizing Kafka.

"ingest files from locally" should change to "ingest files from a local file system", and we can capitalize "S3".
In "performed from directly from HDFS", remove the extra "from".
Change "in a single threaded fashion" to "in a single process." This is clearer and more correct, because localToGW has a --numThreads option that runs it multi-threaded.
Also, localToGW can ingest from HDFS the same as S3 and sparkToGW can. @srinivasreddyv2 can clarify if needed.
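A hedged sketch of the ingest paths discussed in this thread: the plugin names (localToGW, sparkToGW) come from the excerpt under review, but the store/index arguments and file paths are placeholders and the argument order is an assumption, so see `geowave help ingest` for the real usage.

```shell
# Hypothetical sketch -- argument order and placeholder names are assumed, not verified.

# Single-process ingest from a local file system
# (per the review above, the same command can also read s3:// or hdfs:// sources):
geowave ingest localToGW ./data mystore myindex

# Distributed Spark ingest, recommended for larger datasets:
geowave ingest sparkToGW s3://mybucket/data mystore myindex
```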

@mawhitby force-pushed the documentation-update branch from e2d9037 to f1c315f on March 26, 2018 20:47
@mawhitby force-pushed the documentation-update branch from f1c315f to 1d67af7 on March 27, 2018 18:54
@rfecher rfecher merged commit 113c340 into master Mar 27, 2018
@rfecher rfecher deleted the documentation-update branch March 27, 2018 19:20
