[Feature] Sqoop component optimization #2917
Description
Is your feature request related to a problem? Please describe.
The Sqoop task on the dev branch needs enhancement.
Optimization points:
• Sqoop data import and export do not support Hadoop-level custom parameters, that is, -D parameters, such as:
– MapReduce job name
– MapReduce map and reduce memory, number of map and reduce tasks, etc.
• The split-by field is not supported. When -m is greater than 1 and the primary key of the relational database table is not auto-incrementing, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a --split-by column, so split-by needs to be supported.
• Custom Sqoop parameters are not supported. For example, when importing from MySQL, --direct can be added for some tables to speed up the import.
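The three gaps above can be illustrated with a minimal sketch of how a task might assemble a `sqoop import` command line. The helper `build_sqoop_import_args`, its parameters, and the rule that `--split-by` is mandatory whenever more than one mapper is used are assumptions for illustration, not DolphinScheduler code. The one fixed constraint comes from Sqoop itself: generic Hadoop options such as `-D` must appear after the tool name and before any tool-specific arguments.

```python
from typing import Dict, List, Optional


def build_sqoop_import_args(
    connect: str,
    table: str,
    target_dir: str,
    num_mappers: int = 1,
    split_by: Optional[str] = None,
    hadoop_params: Optional[Dict[str, str]] = None,
    custom_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    args = ["sqoop", "import"]
    # Generic Hadoop options (-D key=value) go right after the tool name,
    # before any tool-specific arguments, as the Sqoop docs require.
    for key, value in (hadoop_params or {}).items():
        args += ["-D", f"{key}={value}"]
    args += ["--connect", connect, "--table", table, "--target-dir", target_dir]
    args += ["-m", str(num_mappers)]
    if num_mappers > 1:
        # Assumed policy: with several mappers, insist on an explicit split
        # column instead of relying on the table's primary key.
        if not split_by:
            raise ValueError("split_by is required when num_mappers > 1")
        args += ["--split-by", split_by]
    # Free-form, table-specific flags such as --direct or --fetch-size.
    args += list(custom_args or [])
    return args


if __name__ == "__main__":
    # Hypothetical connection string, table, and HDFS path.
    argv = build_sqoop_import_args(
        "jdbc:mysql://db.example.com/shop",
        "orders",
        "/data/orders",
        num_mappers=4,
        split_by="order_id",
        hadoop_params={
            "mapreduce.job.name": "sqoop_orders",
            "mapreduce.map.memory.mb": "2048",
        },
        custom_args=["--direct", "--fetch-size", "1000"],
    )
    print(" ".join(argv))
```

Keeping the arguments as a list rather than one string avoids shell-quoting problems if the task hands them to a process launcher.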
Describe the solution you'd like
Ideas:
• The Sqoop job name is currently generic; it should become a required parameter on the Sqoop task page
• Add a Hadoop custom parameter input box for setting MapReduce parameters such as memory
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and other parameters used only in specific situations
• Add an option to choose between a custom script and a template script, following the design of the DataX node
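The proposed form could map onto a parameter object along the lines of the sketch below. It is hypothetical: the class `SqoopTaskParams`, its field names, and the `"template"`/`"custom"` mode values are illustrative and do not come from the DolphinScheduler code base. It shows the required job name being injected as the Hadoop property `mapreduce.job.name` in template mode, and a custom script being passed through unchanged, in the spirit of the DataX node design referenced above.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SqoopTaskParams:
    """Hypothetical shape of the proposed Sqoop task form."""

    job_name: str                    # required on the task page
    mode: str = "template"           # "template" or "custom"
    custom_script: str = ""          # used only when mode == "custom"
    connect: str = ""
    source_table: str = ""
    target_dir: str = ""
    hadoop_params: Dict[str, str] = field(default_factory=dict)  # -D key=value
    sqoop_params: List[str] = field(default_factory=list)        # e.g. --direct

    def render(self) -> str:
        """Return the shell command this task would run."""
        if not self.job_name.strip():
            raise ValueError("job_name is required")
        if self.mode == "custom":
            # Custom mode: the user's script is used verbatim.
            if not self.custom_script.strip():
                raise ValueError("custom_script is empty")
            return self.custom_script
        if self.mode != "template":
            raise ValueError(f"unknown mode: {self.mode}")
        # Template mode: job name first, then user Hadoop params; a user-supplied
        # mapreduce.job.name would override the task name because it comes later.
        props = {"mapreduce.job.name": self.job_name, **self.hadoop_params}
        args = ["sqoop", "import"]
        for key, value in props.items():
            args += ["-D", f"{key}={value}"]
        args += ["--connect", self.connect, "--table", self.source_table,
                 "--target-dir", self.target_dir]
        args += self.sqoop_params
        # shlex.join (Python 3.8+) quotes any argument containing shell metacharacters.
        return shlex.join(args)


if __name__ == "__main__":
    params = SqoopTaskParams(
        job_name="orders_daily",
        connect="jdbc:mysql://db.example.com/shop",
        source_table="orders",
        target_dir="/data/orders",
        hadoop_params={"mapreduce.map.memory.mb": "2048"},
        sqoop_params=["--direct"],
    )
    print(params.render())
```

Whether an explicit `mapreduce.job.name` in the custom parameters should win over the task name is a design choice; the sketch lets the user value win, and a real implementation might prefer the opposite.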