[Feature] Sqoop component optimization #2917

@Eights-Li

Is your feature request related to a problem? Please describe.
The Sqoop task on the dev branch needs enhancement.
Optimization points:
• Sqoop data import and export do not support Hadoop-level custom parameters, that is, -D parameters, for example:
– MR task name
– MR map and reduce memory, number of tasks, etc.
• The split-by field is not supported. If -m is greater than 1 and the primary key of the relational database table is not auto-incrementing, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a split-by field, so split-by needs to be supported.
• Sqoop-level parameters cannot be customized. For example, when importing from MySQL, some tables can use --direct to speed up the import.
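
To make the three gaps above concrete, here is a minimal sketch of the command line the task would need to be able to produce. The helper name `build_sqoop_import`, its parameters, and the connection string, table, and paths in the usage are hypothetical and only for illustration; they are not part of the existing task code. The flags themselves (`-D`, `--split-by`, `-m`, `--direct`) are standard Sqoop 1.x options, and the `mapreduce.*` keys are standard Hadoop properties.

```python
import shlex


def build_sqoop_import(connect, table, target_dir, job_name,
                       hadoop_props=None, split_by=None,
                       num_mappers=1, extra_args=None):
    """Assemble a `sqoop import` argument list (hypothetical helper)."""
    args = ["sqoop", "import"]

    # Generic Hadoop options (-D key=value) have to come right after the
    # tool name, before any Sqoop-specific arguments.
    props = {"mapreduce.job.name": job_name}
    props.update(hadoop_props or {})
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]

    args += ["--connect", connect,
             "--table", table,
             "--target-dir", target_dir,
             "-m", str(num_mappers)]

    # An explicit split column for parallel imports.
    if split_by:
        args += ["--split-by", split_by]

    # Task-level extras such as --direct are passed through unchanged.
    args += list(extra_args or [])
    return args


# Example values below are made up for illustration.
cmd = build_sqoop_import(
    connect="jdbc:mysql://db.example.com:3306/shop",
    table="orders",
    target_dir="/user/hive/warehouse/orders",
    job_name="orders_import",
    hadoop_props={"mapreduce.map.memory.mb": "2048"},
    split_by="order_id",
    num_mappers=4,
    extra_args=["--direct"],
)
print(" ".join(shlex.quote(a) for a in cmd))
```

The ordering matters here: the `-D` pairs are emitted before `--connect` because Sqoop expects generic Hadoop arguments ahead of tool-specific ones.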

Describe the solution you'd like
Ideas:
• The Sqoop task name is currently generic; make it a required parameter on the Sqoop page.
• Add an input box for Hadoop custom parameters, for setting MR memory and similar options.
• Add Sqoop task-level custom parameters, such as --direct, --fetch-size, and other parameters used in specific situations.
• Add an option button to choose between a custom script and the template script, following the design of the DataX node.
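
One way the ideas above could fit together is sketched below. This is only an illustration under assumptions: the class name `SqoopTaskParams` and all of its field names are hypothetical and do not reflect the actual DolphinScheduler parameter model. It shows a required job name, Hadoop and Sqoop custom parameters, and a switch between a custom script and a template-generated command.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SqoopTaskParams:
    """Hypothetical parameter model for the Sqoop task page."""
    job_name: str                       # required on the page
    custom_script: bool = False         # True: run `script` as written
    script: str = ""
    template_args: List[str] = field(default_factory=list)
    hadoop_props: Dict[str, str] = field(default_factory=dict)
    sqoop_extra_args: List[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.job_name.strip():
            raise ValueError("job name is required")
        if self.custom_script and not self.script.strip():
            raise ValueError("custom script mode needs a script")

    def to_command(self) -> str:
        self.validate()
        if self.custom_script:
            # Custom mode: the user-supplied script is used unchanged.
            return self.script
        # Template mode: generic -D options first, then the generated
        # template arguments, then the task-level extras.
        parts = ["sqoop", "import"]
        props = {"mapreduce.job.name": self.job_name, **self.hadoop_props}
        for key, value in props.items():
            parts += ["-D", f"{key}={value}"]
        parts += self.template_args + self.sqoop_extra_args
        return " ".join(shlex.quote(p) for p in parts)


# Example values are made up for illustration.
params = SqoopTaskParams(
    job_name="orders_import",
    template_args=["--table", "orders"],
    hadoop_props={"mapreduce.map.memory.mb": "2048"},
    sqoop_extra_args=["--fetch-size", "1000"],
)
print(params.to_command())
```

Keeping the custom script and the template fields in one model mirrors the toggle described for the DataX node: the page can switch modes without losing what the user already entered in the other mode.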
