Only generate model uuid when logging model#5167
Conversation
|
|
||
| reloaded_model_config = Model.load(os.path.join(model_path, "MLmodel")) | ||
| assert model_config.__dict__ == reloaded_model_config.__dict__ | ||
| assert model_config.model_uuid is not None and _is_valid_uuid(model_config) |
There was a problem hiding this comment.
| assert model_config.model_uuid is not None and _is_valid_uuid(model_config) | |
| assert model_config.model_uuid is not None | |
| assert _is_valid_uuid(model_config.model_uuid) |
|
Can you add a test that |
| model_uuid = uuid.uuid4().hex | ||
| mlflow_model = cls(artifact_path=artifact_path, run_id=run_id, model_uuid=model_uuid) |
There was a problem hiding this comment.
We should not set uuid here. Otherwise there're many other place we need also set uuid.
e.g.
mlflow/mlflow/pyfunc/__init__.py
Line 1072 in f9b5745
Line 122 in f9b5745
etc.
I move the set uuid code into Model constructor again.
| flavors=None, | ||
| signature=None, # ModelSignature | ||
| saved_input_example_info: Dict[str, Any] = None, | ||
| model_uuid=None, |
There was a problem hiding this comment.
Instead of removing model_uuid from the constructor and modifying the attribute during load(), can we define a third type of value for model_uuid?
- By default,
model_uuidshould be a function that generates a UUID, e.g.lambda: uuid.uuid4().hex - Model uuid could be
None, indicating that the model has no ID - Model uuid could be a string, indicating that the model already has a UUID
In the constructor, we can check if the input is a function and, if it is, call it to generate an ID. Otherwise, set self.model_uuid = model_uuid.
| self.signature = signature | ||
| self.saved_input_example_info = saved_input_example_info | ||
| self.model_uuid = uuid.uuid4().hex if model_uuid is None else model_uuid | ||
| self.model_uuid = uuid.uuid4().hex |
There was a problem hiding this comment.
This basically means every Model instance has a different model_uuid. Is this really the desired behavior?
There was a problem hiding this comment.
Every new constructed model has a different ID, this is desired.
We only need keep the rule that the model load back from saved model should load old ID back or set it None if saved model don't have ID
| if callable(model_uuid): | ||
| self.model_uuid = model_uuid() | ||
| else: | ||
| self.model_uuid = model_uuid |
There was a problem hiding this comment.
| if callable(model_uuid): | |
| self.model_uuid = model_uuid() | |
| else: | |
| self.model_uuid = model_uuid | |
| self.model_uuid = model_uuid() if callable(model_uuid) else model_uuid |
dbczumar
left a comment
There was a problem hiding this comment.
LGTM! Thanks @WeichenXu123 !
Signed-off-by: Weichen Xu weichen.xu@databricks.com
What changes are proposed in this pull request?
Only generate model uuid when logging model.
Before:
When loading model from a old version mlflow model, a new model uuid will be generated. Loading multiple times will generate multiple different model uuid.
After:
When loading model from a old version mlflow model, the model uuid attribute will be None.
How is this patch tested?
Unit tests.
Does this PR change the documentation?
ci/circleci: build_doccheck. If it's successful, proceed to thenext step, otherwise fix it.
Detailson the right to open the job page of CircleCI.Artifactstab.docs/build/html/index.html.Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact loggingarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesarea/examples: Example codearea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models: MLmodel format, model serialization/deserialization, flavorsarea/projects: MLproject format, project running backendsarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra: MLflow Tracking server backendarea/tracking: Tracking Service, tracking client APIs, autologgingInterface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows: Windows supportLanguage
language/r: R APIs and clientslanguage/java: Java APIs and clientslanguage/new: Proposals for new client languagesIntegrations
integrations/azure: Azure and Azure ML integrationsintegrations/sagemaker: SageMaker integrationsintegrations/databricks: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notes