The ingest tool's mrToGW mode currently reads input data from a single directory specified on the command line. It should be enhanced to:
- support reading from S3
- read directories recursively when the "-r, --recursive" option is given
- honor the "--extension" flag, so that only files with the given extension(s) are ingested
- when not in recursive mode, skip subdirectories instead of generating errors when encountering them
- accept a comma-separated list of directories in the "base path" parameter, so that multiple directories can be read with a single yarn command
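The path-handling requirements above (comma-separated base paths, optional recursion, skipping directories in non-recursive mode, extension filtering) can be sketched as one resolution step that runs before ingest. This is a minimal, hypothetical illustration, not the ingest tool's actual code: the class `InputPathResolver` and its `resolve` method are invented names, and it uses local `java.nio.file` paths only. Reading from S3 would need a different filesystem layer (for example Hadoop's `FileSystem` API with an `s3a://` URI), which this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of the existing ingest tool) that expands a
 * comma-separated "base path" into the list of input files to ingest.
 * Local filesystem only; S3 support is out of scope for this sketch.
 */
public class InputPathResolver {

    public static List<Path> resolve(String basePaths, boolean recursive, List<String> extensions)
            throws IOException {
        List<Path> files = new ArrayList<>();
        for (String raw : basePaths.split(",")) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate "a,,b" or a trailing comma
            }
            Path base = Paths.get(trimmed);
            if (Files.isRegularFile(base)) {
                if (matches(base, extensions)) {
                    files.add(base);
                }
                continue;
            }
            // Recursive mode walks the whole tree; otherwise only immediate children are visited.
            int depth = recursive ? Integer.MAX_VALUE : 1;
            try (Stream<Path> walk = Files.walk(base, depth)) {
                files.addAll(walk
                        .filter(Files::isRegularFile) // skip directories instead of failing on them
                        .filter(p -> matches(p, extensions))
                        .sorted()
                        .collect(Collectors.toList()));
            }
        }
        return files;
    }

    // An empty or null extension list accepts every file; matching is case-insensitive.
    private static boolean matches(Path file, List<String> extensions) {
        if (extensions == null || extensions.isEmpty()) {
            return true;
        }
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        for (String ext : extensions) {
            String e = ext.startsWith(".") ? ext.substring(1) : ext;
            if (name.endsWith("." + e.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

With a base path such as `/data/a,/data/b` and `recursive = false`, files directly inside both directories are returned and any subdirectories are silently skipped; with `recursive = true`, files in those subdirectories are included as well.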