[KubejobRuntime] Secret token mounting for KubejobRuntime #9013

elbamit · 2025-12-04T12:49:16Z

📝 Description

This PR updates the MLRun API to mount the Kubernetes secret that holds the running user's default IG4 offline token as a file inside the run pod, in a predefined folder owned by MLRun. The folder path is exposed to the pod through an environment variable so that the SDK running inside the pod can access the token file.

The ServerSideLauncher._enrich_runtime method saves the token name in run.spec.auth. The KubejobRuntimeHandler then uses this token name to determine the Kubernetes secret name and mounts it on the runtime before pod creation.

As a result, volumes and volume_mounts are added to both the runtime and pod specifications.
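
The mounting step described above can be sketched roughly as follows. This is a minimal illustration using plain dictionaries, not the code from this PR: the helper name `mount_secret_token` and the default volume name `auth-secret` are hypothetical, and only the mount path is taken from the constant discussed in the review.

```python
from typing import Any

# Mount path taken from the constant discussed in this PR's review.
MLRUN_JOB_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"


def mount_secret_token(
    volumes: list[dict[str, Any]],
    volume_mounts: list[dict[str, Any]],
    secret_name: str,
    volume_name: str = "auth-secret",  # hypothetical volume name
    mount_path: str = MLRUN_JOB_AUTH_SECRET_PATH,
) -> None:
    """Append a secret-backed volume and its mount, skipping duplicates."""
    # Add the volume only if no volume with this name is already defined.
    if not any(v.get("name") == volume_name for v in volumes):
        volumes.append({"name": volume_name, "secret": {"secretName": secret_name}})
    # Add the mount only if this volume is not mounted yet.
    if not any(m.get("name") == volume_name for m in volume_mounts):
        volume_mounts.append(
            {"name": volume_name, "mountPath": mount_path, "readOnly": True}
        )


# Usage: calling the helper twice leaves a single volume and a single mount.
volumes: list[dict[str, Any]] = []
volume_mounts: list[dict[str, Any]] = []
mount_secret_token(volumes, volume_mounts, secret_name="my-token-secret")
mount_secret_token(volumes, volume_mounts, secret_name="my-token-secret")
```

Checking for an existing volume name before appending keeps the operation idempotent, which matters if enrichment can run more than once on the same runtime.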

The MLRUN_AUTH_OFFLINE_TOKEN env var that was introduced in #8840 is removed, as it is no longer needed.

🛠️ Changes Made

Removed MLRUN_AUTH_OFFLINE_TOKEN and set MLRUN_AUTH_WITH_OAUTH_TOKEN__TOKEN_FILE during auth env enrichment for pods
Added an auth field to RunSpec and to MLClientCtx (needed so that it is saved in the DB)
Resolved the secret's name and mounted the secret as a file on the runtime during KubejobRuntimeHandler.run()
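
On the pod side, reading the mounted token could look roughly like the sketch below. It assumes that MLRUN_AUTH_WITH_OAUTH_TOKEN__TOKEN_FILE holds the full path of the token file and that the file contains the raw token; the PR text does not confirm either detail, and the helper name `read_offline_token` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Env var name taken from this PR; its exact semantics are an assumption here.
TOKEN_FILE_ENV = "MLRUN_AUTH_WITH_OAUTH_TOKEN__TOKEN_FILE"


def read_offline_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the token stored in the mounted file, or None if unavailable."""
    env = os.environ if env is None else env
    token_path = env.get(TOKEN_FILE_ENV)
    if not token_path:
        # The env var is not set, so no secret was mounted for this pod.
        return None
    path = Path(token_path)
    if not path.is_file():
        # The env var is set but the file is missing (e.g. secret not mounted).
        return None
    # Strip surrounding whitespace such as a trailing newline.
    return path.read_text(encoding="utf-8").strip()
```

Reading the file on each call, rather than caching the value at import time, would let the pod pick up a refreshed secret if Kubernetes updates the mounted file.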

✅ Checklist

I updated the documentation (if applicable)
I have tested the changes in this PR
I confirmed whether my changes are covered by system tests
- If yes, I ran all relevant system tests and ensured they passed before submitting this PR
- I updated existing system tests and/or added new ones if needed to cover my changes
If I introduced a deprecation:
- I followed the Deprecation Guidelines
- I updated the relevant Jira ticket for documentation

🧪 Testing

Unit tests for mounting the secret token on the runtime and for ensuring the IG4 auth envs are set on the runtime
Manual tests verifying that the job pod is deployed and authenticates with MLRun using the token file

🔗 References

Ticket link: https://iguazio.atlassian.net/browse/ML-11583
Design docs links: https://iguazio.atlassian.net/wiki/spaces/MLRUN/pages/416121541
External links:

🚨 Breaking Changes?

Yes (explain below)
No

🔍️ Additional Notes

In future PRs, tokens other than the default will be allowed and resolved.

liranbg · 2025-12-04T14:30:10Z

server/py/services/api/launcher.py

+    # TODO implement tests during ML-11600
+    def _resolve_and_validate_token_name_from_secret(self, run: mlrun.run.RunObject):
+        if run.spec.auth.get("token_name"):
+            # TODO perform token name validation ML-11600
+            pass
+        else:
+            # TODO perform token name resolution and validation ML-11600
+            run.spec.auth["token_name"] = "default"


Duplicate TODO; simplify it. You can also write a function that validates the token name and put the TODO there.

liranbg · 2025-12-04T14:31:52Z

mlrun/common/constants.py

 JOB_TYPE_RERUN_WORKFLOW_RUNNER = "rerun-workflow-runner"
 MLRUN_ACTIVE_PROJECT = "MLRUN_ACTIVE_PROJECT"

+MLRUN_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"


Suggested change

-MLRUN_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"
+MLRUN_JOB_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"

This covers jobs, functions, and everything else that MLRun spins up, of course, but it is mainly for MLRun run pods, as opposed to the MLRun IDE and the like.

rokatyy

Looking good!

Just a couple of nitpicks and questions.

mlrun/auth/utils.py

server/py/services/api/launcher.py

mlrun/model.py

Yacouby

Suggestions

Yacouby · 2025-12-07T09:47:58Z

server/py/services/api/launcher.py

+            )
+            return
+
+        run.spec.auth["token_name"] = "default"


Could this break runs if we fall back to "default" even when that token is missing or invalid?

You are using a hardcoded value; what about creating a constant for it?

Currently we always inject default.
In ML-11600 we'll implement the token resolution logic as described in the HLD: https://iguazio.atlassian.net/wiki/spaces/MLRUN/pages/416121541/Secret+Token+Job+Function+Mounts+HLD#Token-name-resolution
So this is just temporary and will change.

Yacouby · 2025-12-07T09:48:58Z

mlrun/auth/utils.py


    :param env: The environment dictionary to enrich.
    :param db: The RunDBInterface instance to retrieve secret tokens.
    :param auth_info: The AuthInfo object containing authentication details.


You have to update the docstring after removing those 2 params.

Yacouby · 2025-12-07T09:58:43Z

mlrun/auth/utils.py

-        env["MLRUN_AUTH_TOKEN_ENDPOINT"] = (
-            mlrun.mlconf.iguazio_api_url + "/api/v1/refresh-access-token"
+        env["MLRUN_AUTH_TOKEN_ENDPOINT"] = os.path.join(
+            mlrun.mlconf.iguazio_api_url, "/api/v1/refresh-access-token"


I think this is wrong: because the second argument starts with a /, os.path.join ignores the first part and always returns "/api/v1/refresh-access-token", which is not intended.

Will remove the leading /
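
The behavior discussed in this thread can be reproduced with a short sketch. `posixpath` is used instead of `os.path` so that the result does not depend on the platform, the base URL is a hypothetical placeholder, and `join_url` is an illustrative helper that is not part of MLRun.

```python
import posixpath

base_url = "https://igz.example.com"  # hypothetical API URL

# An absolute second component makes join() discard everything before it.
broken = posixpath.join(base_url, "/api/v1/refresh-access-token")
assert broken == "/api/v1/refresh-access-token"

# Without the leading slash the two parts are concatenated as intended.
fixed = posixpath.join(base_url, "api/v1/refresh-access-token")
assert fixed == "https://igz.example.com/api/v1/refresh-access-token"


def join_url(base: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


# The helper gives the same result whether or not the slashes are present.
assert join_url(base_url + "/", "/api/v1/refresh-access-token") == fixed
```

Because `os.path.join` follows the path rules of the host operating system, a small explicit helper like `join_url` may be a safer choice for URLs than any filesystem path function.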

server/py/services/api/runtime_handlers/kubejob.py

Yacouby · 2025-12-07T10:11:17Z

server/py/services/api/tests/unit/runtime_handlers/test_kubejob.py

        mocked_responses = self._mock_list_namespaced_pods([[pod]])
        return mocked_responses[0].items
+
+    def test_mount_secret_token_to_runtime(self):


Need to add more tests here, as you only check the happy path where the secret exists. You should also cover cases like a missing secret and a volume with the same name already existing, and verify that the generated env variable is set in the runtime env.

liranbg

minor comments

liranbg · 2025-12-07T20:50:46Z

server/py/services/api/launcher.py


        self._handle_retry(run)
        run = self._pre_run_image_pull_secret_enrichment(run)
+        self._resolve_and_validate_token_name_from_secret(run)


Suggested change

-self._resolve_and_validate_token_name_from_secret(run)
+self._enrich_and_validate_auth_token_name(run)

liranbg · 2025-12-07T20:51:32Z

server/py/services/api/launcher.py

+        # for token_name in services.api.crud.Secrets().list_secret_tokens(self._auth_info.username):
+        #     if self._validate_token_name(token_name):
+        #         run.spec.auth["token_name"] = token_name
+        #         return
+
+        # raise ValueError(
+        #     "No valid authentication token found. "
+        #     "Please create a valid offline token or provide one explicitly."
+        # )


why is this commented?

liranbg

very well. minor suggestion

liranbg · 2025-12-08T12:17:18Z

server/py/services/api/launcher.py

+        run.spec.auth["token_name"] = "default"
+
+    def _validate_token_name(self, token_name: str, explicit: bool = False):
+        pass


leave the TODO here, above this function

Yacouby · 2025-12-09T09:24:44Z

server/py/services/api/tests/unit/crud/test_secrets.py

+            ],
+            1,
+        ),
+        # Volume with a different name already exists (should add new one)


Both volumes have the same name "other-volume", so this comment is wrong.

One is in the volumes list, the other in the volume mounts.
The volume that the function adds is called secret.

Yacouby · 2025-12-09T09:30:25Z

server/py/services/api/tests/unit/crud/test_secrets.py

+        # Volume with a different name already exists (should add new one)
+        (
+            [{"mountPath": "/some/other/path", "name": "other-volume"}],
+            [{"name": "other-volume", "other-volume": {"items": []}}],


Is this valid Kubernetes schema?
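
As background for this question: in the Kubernetes Volume schema, a volume entry has a `name` plus a key naming its source type, such as `secret` or `configMap`. The sketch below compares the test fixture with a secret-backed volume. The `KNOWN_SOURCES` set is a small illustrative subset of the real source types, and `volume_source_keys` and the secret name are hypothetical.

```python
from typing import Any

# Illustrative subset of Kubernetes volume source types (not exhaustive).
KNOWN_SOURCES = {"secret", "configMap", "emptyDir", "hostPath", "persistentVolumeClaim"}


def volume_source_keys(volume: dict[str, Any]) -> list[str]:
    """Return the recognized volume source keys present in a volume dict."""
    return sorted(set(volume) & KNOWN_SOURCES)


# The fixture from the test: "other-volume" is not a volume source type.
fixture_volume = {"name": "other-volume", "other-volume": {"items": []}}

# A secret-backed volume with the same name, using the "secret" source key.
secret_volume = {"name": "other-volume", "secret": {"secretName": "some-secret"}}

print(volume_source_keys(fixture_volume))  # []
print(volume_source_keys(secret_volume))   # ['secret']
```

A unit test that only inspects volume names may still pass with the fixture as written, but a schema-shaped fixture keeps the test closer to what the Kubernetes API would accept.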

Yacouby · 2025-12-09T09:44:56Z

server/py/services/api/tests/unit/crud/test_secrets.py

+    runtime = mlrun.runtimes.kubejob.KubejobRuntime()
+    # # Runtime that inherit from BaseRuntime already have existing volumes/mounts
+    # runtime.spec.volume_mounts = []
+    # runtime.spec.volumes = []


This comment is a bit misleading: you say that the runtime already has existing volumes, but then expect them to be empty?

server/py/services/api/tests/unit/crud/test_secrets.py

Secret token mounting for KubejobRuntime

cbb0040

github-actions bot added area/sdk area/server area/api labels Dec 4, 2025

elbamit added 3 commits December 4, 2025 15:00

lint

7335cf2

Fix path concatenation

56a6e4d

Make sure to perform secret mounting only when secret exists

cbbe176

elbamit marked this pull request as ready for review December 4, 2025 13:38

elbamit requested review from a team, liranbg and moranbental as code owners December 4, 2025 13:38

liranbg reviewed Dec 4, 2025

rokatyy reviewed Dec 4, 2025

elbamit added 2 commits December 7, 2025 10:02

Fix review comments

0febea0

lint

dd92f02

elbamit marked this pull request as draft December 7, 2025 08:05

Yacouby requested changes Dec 7, 2025

elbamit added 3 commits December 7, 2025 15:22

Save auth inside the run in the DB

5c83b72

Remove rebundant comment and fix token endpoint path

a9d5bab

Add more tests

add2740

elbamit marked this pull request as ready for review December 7, 2025 15:45

elbamit added 3 commits December 7, 2025 18:48

Test for IG4 env vars that are injected

238a5b6

Fix failing test

41f12c8

lint

4c4ac95

liranbg reviewed Dec 7, 2025

elbamit added 3 commits December 8, 2025 09:49

small fixes

ba86d89

Merge remote-tracking branch 'upstream/development' into ML-11583

605c05a

Move mount_secret_token_to_runtime from outside of the runtime handler to the secret crud

6519652

elbamit requested review from Yacouby and liranbg December 8, 2025 12:05

lint

ec7eacd

liranbg approved these changes Dec 8, 2025

remove duplicate tests

9522ddc

liranbg approved these changes Dec 8, 2025

add todo

7e8ae85

Yacouby requested changes Dec 9, 2025

Minor comments

a92a9f0

Yacouby approved these changes Dec 10, 2025

Merge remote-tracking branch 'upstream/development' into ML-11583

efaa3ca

rokatyy approved these changes Dec 10, 2025

liranbg merged commit 0f548c6 into mlrun:development Dec 10, 2025
13 checks passed

elbamit deleted the ML-11583 branch December 18, 2025 09:32
