-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: support ssh password free pod for parallel data migrate #3645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
Skipping CI for Draft Pull Request. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #3645 +/- ##
==========================================
+ Coverage 64.26% 64.46% +0.19%
==========================================
Files 443 473 +30
Lines 26739 28287 +1548
==========================================
+ Hits 17185 18236 +1051
- Misses 7541 7893 +352
- Partials 2013 2158 +145 ☔ View full report in Codecov by Sentry. |
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
| if [ $PARALLELISM -gt 1 ] | ||
| then | ||
| timeout $TIMEOUT /usr/local/bin/juicefs sync {{ .Values.datamigrate.migrateFrom }} {{ .Values.datamigrate.migrateTo }} $OPTION | ||
| workers=$(grep "Host " /root/.ssh/config | cut -d " " -f 2 | tr '\n' ',') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only the enterprise version support multiple workers sync. please help confirm @zwwhdls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The community release also supports it.
|
The image |
…ck, add pod ip Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
/test fluid-e2e |
TrafalgarZZZ
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's a great idea for dataset controller to auto-generate a RSA key pair and store them into configmaps. Once a user gets the permission to retrive content from configmaps, he/she can ssh into the migrate workers and check ak/sk info from env variables
The volcano project uses secrets volume mount for ssh. However, the fluid dataset controller does not have the secrets permissions. @TrafalgarZZZ have any ideas? |
… finished. Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
@cheyang @TrafalgarZZZ Please review the newest code. Currently only support once job. |
|
I suggest the user to create SSH keys and Kubernetes secrets in the temporary directory. # Switch to the temporary directory
cd /tmp
# Generate SSH key pair
ssh-keygen -t rsa -b 4096 -f id_rsa -N ''
# Create Kubernetes secrets
kubectl create secret generic ssh-keys \
--from-file=id_rsa=/tmp/id_rsa \
--from-file=id_rsa.pub=/tmp/id_rsa.pub
# Remove the private key file in the temporary directory
rm /tmp/id_rsa |
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>
pkg/ddc/juicefs/data_migrate.go
Outdated
| if exist { | ||
| dataMigrateInfo.ParallelOptions.SSHPort, err = strconv.Atoi(sshPort) | ||
| if err != nil { | ||
| j.Log.Info("sshReadyTimeoutSeconds in the parallelOptions is not a int, use default value.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about return err and fail fast? Ignoring the error may let users have incorrect expectation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed.
pkg/ddc/juicefs/data_migrate.go
Outdated
| if exist { | ||
| dataMigrateInfo.ParallelOptions.SSHReadyTimeoutSeconds, err = strconv.Atoi(sshReadyTimeoutSeconds) | ||
| if err != nil { | ||
| j.Log.Info("sshReadyTimeoutSeconds in the parallelOptions is not a int, use default value.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed.
pkg/common/juicefs.go
Outdated
| JuiceFSEEImageEnv = "JUICEFS_EE_IMAGE_ENV" | ||
|
|
||
| DefaultCEImage = "juicedata/juicefs-fuse:ce-v1.1.0-beta2" | ||
| DefaultCEImage = "juicedata/juicefs-fuse:ce-nightly" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't change the image tag to ce-nightly. Because it's not production ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, the doc for parallel migrate usage should be written in the pr ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can create another PR to support. Thanks.
Signed-off-by: xliuqq <xlzq1992@gmail.com>
|
cheyang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
cheyang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheyang, TrafalgarZZZ The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |

Ⅰ. Describe what this PR does
Support for password-less SSH login between Pods of Data Operations(Currently for Data Migration).
Juicefs distributed-sync uses a manager to start multiple workers for concurrent data synchronization.
Like mpi-operator, use a job (as Manager) to launch a statefulset (as Workers).
The ssh private/public key is auto generated and stored in a configmap, and the manager and workers will mount the configmap for password-less SSH.
The added field
DataMigrateSpec.Parallelismrepresents the worker numbers, default value is 1 (only deploy the job without statefulset, consistent with previous version).Ⅱ. Does this pull request fix one issue?
fixes #3585
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
using image
juicedata/juicefs-fuse:ce-v1.1.0-rc1withopenssh client and serverⅤ. Special notes for reviews
Currently, only support Once job, another pr will support Cron job.