Skip to content

Conversation

@xliuqq
Copy link
Collaborator

@xliuqq xliuqq commented Dec 16, 2023

Ⅰ. Describe what this PR does

Support for password-less SSH login between Pods of Data Operations(Currently for Data Migration).

Juicefs distributed-sync uses a manager to start multiple workers for concurrent data synchronization.

Like mpi-operator, use a job (as Manager) to launch a statefulset (as Workers).

The ssh private/public key is auto generated and stored in a configmap, and the manager and workers will mount the configmap for password-less SSH.

The added field DataMigrateSpec.Parallelism represents the worker numbers, default value is 1 (only deploy the job without statefulset, consistent with previous version).

Ⅱ. Does this pull request fix one issue?

fixes #3585

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

using image juicedata/juicefs-fuse:ce-v1.1.0-rc1 with openssh client and server

Ⅴ. Special notes for reviews

Currently, only support Once job, another pr will support Cron job.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@fluid-e2e-bot
Copy link

fluid-e2e-bot bot commented Dec 16, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@codecov
Copy link

codecov bot commented Dec 16, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (c7c5b72) 64.26% compared to head (1e68750) 64.46%.
Report is 76 commits behind head on master.

Files Patch % Lines
pkg/ddc/base/operation.go 0.00% 10 Missing ⚠️
pkg/ddc/juicefs/data_migrate.go 79.16% 3 Missing and 2 partials ⚠️
pkg/utils/kubeclient/statefulset.go 84.61% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3645      +/-   ##
==========================================
+ Coverage   64.26%   64.46%   +0.19%     
==========================================
  Files         443      473      +30     
  Lines       26739    28287    +1548     
==========================================
+ Hits        17185    18236    +1051     
- Misses       7541     7893     +352     
- Partials     2013     2158     +145     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cheyang cheyang requested a review from zwwhdls December 16, 2023 07:04
Signed-off-by: xliuqq <xlzq1992@gmail.com>
@xliuqq xliuqq marked this pull request as ready for review December 16, 2023 07:44
Signed-off-by: xliuqq <xlzq1992@gmail.com>
if [ $PARALLELISM -gt 1 ]
then
timeout $TIMEOUT /usr/local/bin/juicefs sync {{ .Values.datamigrate.migrateFrom }} {{ .Values.datamigrate.migrateTo }} $OPTION
workers=$(grep "Host " /root/.ssh/config | cut -d " " -f 2 | tr '\n' ',')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only the enterprise version support multiple workers sync. please help confirm @zwwhdls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The community release also supports it.

@Hexilee
Copy link
Contributor

Hexilee commented Dec 22, 2023

The image juicedata/juicefs-fuse:ce-nightly supported openssh.

…ck, add pod ip

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@cheyang
Copy link
Collaborator

cheyang commented Dec 25, 2023

/test fluid-e2e

Copy link
Member

@TrafalgarZZZ TrafalgarZZZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's a great idea for dataset controller to auto-generate a RSA key pair and store them into configmaps. Once a user gets the permission to retrive content from configmaps, he/she can ssh into the migrate workers and check ak/sk info from env variables

@xliuqq
Copy link
Collaborator Author

xliuqq commented Jan 2, 2024

I'm not sure it's a great idea for dataset controller to auto-generate a RSA key pair and store them into configmaps. Once a user gets the permission to retrive content from configmaps, he/she can ssh into the migrate workers and check ak/sk info from env variables

The volcano project uses secrets volume mount for ssh. However, the fluid dataset controller does not have the secrets permissions. @TrafalgarZZZ have any ideas?

xliuqq added 3 commits January 3, 2024 17:49
… finished.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
xliuqq added 2 commits January 9, 2024 09:29
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
@xliuqq
Copy link
Collaborator Author

xliuqq commented Jan 9, 2024

@cheyang @TrafalgarZZZ Please review the newest code. Currently only support once job.

@cheyang
Copy link
Collaborator

cheyang commented Jan 20, 2024

I suggest the user to create SSH keys and Kubernetes secrets in the temporary directory.

# Switch to the temporary directory
cd /tmp

# Generate SSH key pair
ssh-keygen -t rsa -b 4096 -f id_rsa -N ''

# Create Kubernetes secrets
kubectl create secret generic ssh-keys \
  --from-file=id_rsa=/tmp/id_rsa \
  --from-file=id_rsa.pub=/tmp/id_rsa.pub

# Remove the private key file in the temporary directory
rm /tmp/id_rsa

Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
if exist {
dataMigrateInfo.ParallelOptions.SSHPort, err = strconv.Atoi(sshPort)
if err != nil {
j.Log.Info("sshReadyTimeoutSeconds in the parallelOptions is not a int, use default value.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about return err and fail fast? Ignoring the error may let users have incorrect expectation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed.

if exist {
dataMigrateInfo.ParallelOptions.SSHReadyTimeoutSeconds, err = strconv.Atoi(sshReadyTimeoutSeconds)
if err != nil {
j.Log.Info("sshReadyTimeoutSeconds in the parallelOptions is not a int, use default value.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed.

JuiceFSEEImageEnv = "JUICEFS_EE_IMAGE_ENV"

DefaultCEImage = "juicedata/juicefs-fuse:ce-v1.1.0-beta2"
DefaultCEImage = "juicedata/juicefs-fuse:ce-nightly"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the image tag to ce-nightly. Because it's not production ready.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the doc for parallel migrate usage should be written in the pr ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can create another PR to support. Thanks.

Signed-off-by: xliuqq <xlzq1992@gmail.com>
@sonarqubecloud
Copy link

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

3 New issues
0 Security Hotspots
No data about Coverage
0.6% Duplication on New Code

See analysis details on SonarCloud

Copy link
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@fluid-e2e-bot
Copy link

fluid-e2e-bot bot commented Jan 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang, TrafalgarZZZ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [TrafalgarZZZ,cheyang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURES] Support SSH-based Parallel Tasks for Data Operation(DataMigrate)

4 participants