Making CloudSQL to use internal IP address instead of external for Slurm Accounting DB. #1795

Merged
cboneti merged 57 commits into GoogleCloudPlatform:develop from ek-nag:ek-cloudsql on Dec 8, 2023
ek-nag commented Sep 26, 2023

Collaborator

This PR modifies the community/modules/database/slurm-cloudsql-federation module. It routes communication between the Slurm cluster controller node and the SQL database over the private network instead of the external one. As a result, the SQL instance no longer needs an external IP address, which allows both the Slurm cluster and the SQL instance to be completely isolated from external networks.

The resulting slurmdbd configuration produced by this updated module on the test cluster:

[root@hpcsmall-controller scripts]# cat /etc/slurm/slurmdbd.conf 
# slurmdbd.conf
# https://slurm.schedmd.com/slurmdbd.conf.html

DebugLevel=info
PidFile=/var/run/slurm/slurmdbd.pid

################################################################################
#              vvvvv  WARNING: DO NOT MODIFY SECTION BELOW  vvvvv              #
################################################################################

AuthType=auth/munge
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/var/spool/slurm/jwt_hs256.key

DbdHost=hpcsmall-controller

LogFile=/var/log/slurm/slurmdbd.log

SlurmUser=slurm

StorageLoc=slurm_accounting

StorageType=accounting_storage/mysql
StorageHost=10.212.0.3
StoragePort=3306
StorageUser=slurm
StoragePass=mPgWEqy31A8f

################################################################################
#              ^^^^^  WARNING: DO NOT MODIFY SECTION ABOVE  ^^^^^              #
################################################################################
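
As a rough illustration of how the result above could be checked, the following Python sketch parses slurmdbd.conf-style `Key=Value` text and reports whether `StorageHost` is a private IP literal. The function names and the inline sample are hypothetical and not part of this PR; note that `ipaddress`'s `is_private` covers the RFC 1918 ranges and also other reserved ranges such as loopback.

```python
import ipaddress


def parse_slurmdbd_conf(text):
    """Parse Key=Value lines, skipping blank lines and '#' comments.

    Hypothetical helper for illustration; only the first '=' splits the
    line, so values such as 'jwt_key=/path' are kept intact.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def storage_host_is_private(settings):
    """Return True if StorageHost is an IP literal in a private range."""
    host = settings.get("StorageHost", "")
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # StorageHost is missing or is a hostname rather than an IP literal.
        return False


# Sample taken from the configuration shown above (password omitted).
SAMPLE = """\
StorageType=accounting_storage/mysql
StorageHost=10.212.0.3
StoragePort=3306
"""

if __name__ == "__main__":
    conf = parse_slurmdbd_conf(SAMPLE)
    print(conf["StorageHost"], storage_host_is_private(conf))
```

On a real controller the text would come from /etc/slurm/slurmdbd.conf; a hostname in `StorageHost` is reported as not private because the sketch does not perform DNS resolution.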

Snippet from the test blueprint demonstrating the use case:

  - id: slurm-sql
    source: community/modules/database/slurm-cloudsql-federation
    kind: terraform
    use: [network1]
    settings:
      sql_instance_name: slurm-sql8-ofe
      tier: "db-g1-small"

  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v5-controller
    use:
    - network1
    - compute_partition
    - homefs
    - slurm-sql
    settings:
      instance_image:
        family: slurm-gcp-5-8-hpc-rocky-linux-8
        project: schedmd-slurm-public
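
In the snippet above, both modules list network1 under `use`, and the controller additionally lists slurm-sql. A minimal Python sketch of checking that wiring is shown below, assuming the blueprint modules have already been loaded into plain dictionaries (for example by a YAML parser). The helper names are hypothetical, and the dictionaries only mirror the `id` and `use` fields from the snippet.

```python
def uses_module(modules, consumer_id, dependency_id):
    """Return True if the module `consumer_id` lists `dependency_id` under 'use'."""
    for module in modules:
        if module.get("id") == consumer_id:
            return dependency_id in module.get("use", [])
    return False


def shared_uses(modules, first_id, second_id):
    """Return the set of ids that both modules list under 'use'."""
    by_id = {m.get("id"): set(m.get("use", [])) for m in modules}
    return by_id.get(first_id, set()) & by_id.get(second_id, set())


# Mirrors the 'id' and 'use' fields of the blueprint snippet above.
modules = [
    {"id": "slurm-sql", "use": ["network1"]},
    {
        "id": "slurm_controller",
        "use": ["network1", "compute_partition", "homefs", "slurm-sql"],
    },
]

if __name__ == "__main__":
    print(uses_module(modules, "slurm_controller", "slurm-sql"))
    print(shared_uses(modules, "slurm_controller", "slurm-sql"))
```

This only checks the references between modules; it does not validate that the deployed SQL instance is actually reachable from the controller.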

@cdunbar13 cdunbar13 requested a review from cboneti September 26, 2023 13:01
@cdunbar13 cdunbar13 added the release-module-improvements label (Added to release notes under the "Module Improvements" heading.) Sep 26, 2023
@cboneti cboneti assigned ek-nag and unassigned cboneti Sep 28, 2023

cboneti commented Sep 28, 2023

Member

Eimantas, the PR-validation test is failing.

Comment thread community/modules/database/slurm-cloudsql-federation/main.tf

ek-nag commented Oct 17, 2023

Collaborator Author

@cboneti I think I fixed the initial issue in pre-build; however, it's still failing with ERROR: build step 2 "us-central1-docker.pkg.dev/hpc-toolkit-dev/hpc-toolkit-repo/hpc-toolkit-builder" failed: step exited with non-zero status: 1, and I don't have permission to see the error details. Would you mind checking what's wrong?

cboneti commented Oct 30, 2023

Member

/gcbrun

tpdownes and others added 2 commits November 28, 2023 12:33
The /etc/profile.d login prompt informational message makes the
assumption that the VM is running a startup-script that uses our
startup-script module. This assumption is broken when an image is built
using our startup-script module and then a VM is booted with that image
that does not execute our startup-script module. This assumption is also
broken upon reboots of Slurm VMs because our script is wrapped inside
a startup script solution developed by SchedMD that exits early when
Slurm has previously started successfully. We can reconsider enabling
this message more robustly as part of future work.
…e_not_yet_started_message

Eliminate startup-script hasn't started message
@cboneti cboneti enabled auto-merge November 29, 2023 23:33

cboneti commented Nov 29, 2023

Member

/gcbrun

auto-merge was automatically disabled December 8, 2023 12:21

Head branch was pushed to by a user without write access

yarikoptic and others added 27 commits December 8, 2023 12:23
* Add rudimentary codespell config

* Add pre-commit definition for codespell

* ot -> it typo fix

* Some more skips for codespell

* [DATALAD RUNCMD] Do interactive fixing of some ambiguous typos

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2 ./community/front-end/ofe/script/service_account.sh ./community/front-end/ofe/website/ghpcfe/models.py ./community/front-end/ofe/website/ghpcfe/models.py ./community/modules/scripts/htcondor-install/files/autoscaler.py ./community/modules/scripts/ramble-setup/README.md ./docs/videos/healthcare-and-life-sciences/README.md ./examples/README.md ./tools/validate_configs/test_configs/README.md",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

* 1 more typo fixed manually

* Skip (S)hortcuts, more words and files

* [DATALAD RUNCMD] run codespell throughout fixing typo automagically

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

* Duplicate ignore of requirements.txt and js in pre-commit config

until codespell-project/codespell#3196 is addressed
**NO CODE CHANGES** just moving around.
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.14.0 to 0.15.0.
- [Commits](golang/sys@v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.151.0 to 0.152.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.151.0...v0.152.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/spf13/afero](https://github.com/spf13/afero) from 1.10.0 to 1.11.0.
- [Release notes](https://github.com/spf13/afero/releases)
- [Commits](spf13/afero@v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/afero
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/go-git/go-git/v5](https://github.com/go-git/go-git) from 5.10.0 to 5.10.1.
- [Release notes](https://github.com/go-git/go-git/releases)
- [Commits](go-git/go-git@v5.10.0...v5.10.1)

---
updated-dependencies:
- dependency-name: github.com/go-git/go-git/v5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Legacy metadata to be removed, added comment.
Use older release until we address the code changes required by
terraform-linters/tflint-ruleset-terraform#133
@cboneti cboneti merged commit fd98e27 into GoogleCloudPlatform:develop Dec 8, 2023
@nick-stroud nick-stroud mentioned this pull request Jan 9, 2024
@ek-nag ek-nag deleted the ek-cloudsql branch January 15, 2024 09:50