Skip to content

Conversation

@elbamit
Copy link
Contributor

@elbamit elbamit commented Dec 4, 2025

📝 Description

This PR updates the MLRun API to mount the Kubernetes secret corresponding to IG4’s offline token default (for the running user) as a file inside the run pod at a predefined folder owned by MLRun. The folder path is exposed to the pod via an environment variable so that the SDK running inside the pod can access the token file.

The ServerSideLauncher._enrich_runtime method saves the token name in run.spec.auth. The KubejobRuntimeHandler then uses this token name to determine the Kubernetes secret name and mounts it on the runtime before pod creation.

As a result, volumes and volume_mounts are added to both the runtime and pod specifications.

The MLRUN_AUTH_OFFLINE_TOKEN env var that was used in #8840 is removed and not needed anymore.


🛠️ Changes Made

  • Removed MLRUN_AUTH_OFFLINE_TOKEN and set MLRUN_AUTH_WITH_OAUTH_TOKEN__TOKEN_FILE during auth env enrichment for pods
  • Added an auth field to RunSpec and to MLClientCtx(needed for it to be saved in DB)
  • Find secret's name and mount it as a file to the runtime during KubejobRuntimeHandler.run()

✅ Checklist

  • I updated the documentation (if applicable)
  • I have tested the changes in this PR
  • I confirmed whether my changes are covered by system tests
    • If yes, I ran all relevant system tests and ensured they passed before submitting this PR
    • I updated existing system tests and/or added new ones if needed to cover my changes
  • If I introduced a deprecation:

🧪 Testing

  • Unit test for mount secret token to runtime and for ensuring IG4 auth envs are set to runtime
  • Manual tests that job pod is deployed and authenticates with mlrun using the file

🔗 References


🚨 Breaking Changes?

  • Yes (explain below)
  • No

🔍️ Additional Notes

In future PR's, token other than default will be allowed and resolved

@elbamit elbamit marked this pull request as ready for review December 4, 2025 13:38
@elbamit elbamit requested review from a team, liranbg and moranbental as code owners December 4, 2025 13:38
Comment on lines 699 to 706
# TODO implement tests during ML-11600
def _resolve_and_validate_token_name_from_secret(self, run: mlrun.run.RunObject):
if run.spec.auth.get("token_name"):
# TODO perform token name validation ML-11600
pass
else:
# TODO perform token name resolution and validation ML-11600
run.spec.auth["token_name"] = "default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate TODO, simplify it. you can also write a function that "validate" the token name and put the todo there.

JOB_TYPE_RERUN_WORKFLOW_RUNNER = "rerun-workflow-runner"
MLRUN_ACTIVE_PROJECT = "MLRUN_ACTIVE_PROJECT"

MLRUN_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MLRUN_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"
MLRUN_JOB_AUTH_SECRET_PATH = "/var/mlrun-secrets/auth"

job/functions/everything that mlrun spins of curse but mainly for mlrun run pods as opposed to mlrun IDE or so.

Copy link
Member

@rokatyy rokatyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Just couple of nitpickings and questions

@elbamit elbamit marked this pull request as draft December 7, 2025 08:05
Copy link
Member

@Yacouby Yacouby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions

)
return

run.spec.auth["token_name"] = "default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this break runs if we fall back to "default" even when that token is missing or invalid?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you are using hardcoded value , what about creating a constant for it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we're always injecting default.
During ML-11600 we'll make the token resolution logic as described in the HLD https://iguazio.atlassian.net/wiki/spaces/MLRUN/pages/416121541/Secret+Token+Job+Function+Mounts+HLD#Token-name-resolution
So this is just temporary and will change.

:param env: The environment dictionary to enrich.
:param db: The RunDBInterface instance to retrieve secret tokens.
:param auth_info: The AuthInfo object containing authentication details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to update the docstring after removing those 2 params

env["MLRUN_AUTH_TOKEN_ENDPOINT"] = (
mlrun.mlconf.iguazio_api_url + "/api/v1/refresh-access-token"
env["MLRUN_AUTH_TOKEN_ENDPOINT"] = os.path.join(
mlrun.mlconf.iguazio_api_url, "/api/v1/refresh-access-token"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong since the second argument starts with a /, it ignores the first part and will always return "/api/v1/refresh-access-token", which is not intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove the leading /

mocked_responses = self._mock_list_namespaced_pods([[pod]])
return mocked_responses[0].items

def test_mount_secret_token_to_runtime(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add more tests here as you check only the happy path where the secret exists. you should also cover cases like missing secrets, volume with the same name already exists
+
verifying the generate env variable is set in the runtime env

@elbamit elbamit marked this pull request as ready for review December 7, 2025 15:45
Copy link
Member

@liranbg liranbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments


self._handle_retry(run)
run = self._pre_run_image_pull_secret_enrichment(run)
self._resolve_and_validate_token_name_from_secret(run)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._resolve_and_validate_token_name_from_secret(run)
self._enrich_and_validate_auth_token_name(run)

Comment on lines 711 to 719
# for token_name in services.api.crud.Secrets().list_secret_tokens(self._auth_info.username):
# if self._validate_token_name(token_name):
# run.spec.auth["token_name"] = token_name
# return

# raise ValueError(
# "No valid authentication token found. "
# "Please create a valid offline token or provide one explicitly."
# )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

@elbamit elbamit requested review from Yacouby and liranbg December 8, 2025 12:05
Copy link
Member

@liranbg liranbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very well. minor suggestion

run.spec.auth["token_name"] = "default"

def _validate_token_name(self, token_name: str, explicit: bool = False):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave the TODO here, above this function

],
1,
),
# Volume with a different name already exists (should add new one)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both volumes has the the same name "other-volume", so this comment is wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is the volumes, the other is the volume mounts.
The volume that the function add is called secret

# Volume with a different name already exists (should add new one)
(
[{"mountPath": "/some/other/path", "name": "other-volume"}],
[{"name": "other-volume", "other-volume": {"items": []}}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this valid Kubernetes schema ?

runtime = mlrun.runtimes.kubejob.KubejobRuntime()
# # Runtime that inherit from BaseRuntime already have existing volumes/mounts
# runtime.spec.volume_mounts = []
# runtime.spec.volumes = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is a bit misleading, you say that runtime already have existing volumes, but then expect them to be empty ?

@liranbg liranbg merged commit 0f548c6 into mlrun:development Dec 10, 2025
13 checks passed
@elbamit elbamit deleted the ML-11583 branch December 18, 2025 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants