Enable mlflow-artifacts scheme as wrapper around http artifact scheme#5070
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
|
||
| def __init__(self, artifact_uri): | ||
| tracking_uri = get_tracking_uri() | ||
| resolved_artifacts_uri = artifact_uri.replace("mlflow-artifacts:", f"{tracking_uri}") |
There was a problem hiding this comment.
We should only perform this replacement if the URI doesn't specify a host. We also need to handle the case where the URI has one or three slashes after the scheme.
Right now, if the URI specifies a host (e.g. mlflow-artifacts://myhostname:5000), the following behavior occurs:
In [11]: MlflowArtifactsRepository("mlflow-artifacts://foo:5050/my/artifact/path").artifact_uri
Out[11]: 'http://localhost:5000//foo:5050/my/artifact/path'
Similarly, if a hostname-less mlflow-artifacts URI is specified with one or three slashes after the scheme, the following behaviors occur:
In [14]: mlflow.set_tracking_uri("http://localhost:5000/")
In [15]: MlflowArtifactsRepository("mlflow-artifacts:/my/artifact/path").artifact_uri
Out[15]: 'http://localhost:5000//my/artifact/path'
In [16]: MlflowArtifactsRepository("mlflow-artifacts:///my/artifact/path").artifact_uri
Out[16]: 'http://localhost:5000////my/artifact/path'
While the extra slashes in the path shouldn't be harmful, it would be nice to remove them.
To clarify re:hostnames - when a user supplies an artifact URI of the form mlflow-artifacts://<hostname>/<path> or mlflow-artifacts://<hostname>:<port>/<path>, we should resolve the URI to http://<hostname>/api/2.0/mlflow-artifacts/artifacts/<path> or http://<hostname>:<port>/api/2.0/mlflow-artifacts/artifacts/<path> respectively.
Can we add test cases for each of the above scenarios as well?
There was a problem hiding this comment.
Some... (not so minor) changes to handle resolving all of the different ways that this uri can be configured. I have a validation check in here for raising an Exception if only a port is passed; I'd really like an opinion on whether this is needed or if it's a silly thing to check for.
Also, should I take those private functions and make them class methods? I don't see those functions being particularly useful outside of this sub class of HttpArtifactRepository.
@dbczumar @harupy any thoughts / advice on these?
There was a problem hiding this comment.
I have a validation check in here for raising an Exception if only a port is passed.
This sounds good to me!
should I take those private functions and make them class methods?
I don't have a strong preference between private functions and class methods. The current code looks ok to me.
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
| uri = MlflowArtifactsRepository( # pylint: disable=unused-variable | ||
| failing_condition | ||
| ).artifact_uri |
There was a problem hiding this comment.
| uri = MlflowArtifactsRepository( # pylint: disable=unused-variable | |
| failing_condition | |
| ).artifact_uri | |
| MlflowArtifactsRepository(failing_condition) |
nit
There was a problem hiding this comment.
changed to just the class instantiation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
|
||
| @classmethod | ||
| def resolve_uri(cls, artifact_uri): | ||
| tracking_uri = get_tracking_uri() |
There was a problem hiding this comment.
Can we raise an exception if the scheme of tracking_uri is not http or https (e.g. sqlite:///foo/bar)?
There was a problem hiding this comment.
great idea :) added! I also simplified the parsing logic and temporarily set the tracking uri in the test function to validate against a valid server address (then reverted the global variable to not affect other tests with an inadvertent side-effect)
There was a problem hiding this comment.
Thanks for the update! A tracking URI looks like http://localhost:5000 and doesn't contain /api/2.0/mlflow-artifacts/artifacts.
There was a problem hiding this comment.
updated the tests and cleaned all of that up
There was a problem hiding this comment.
@BenWilson2 Sorry I wasn't clear. The resolved artifact URI needs to have /api/2.0/mlflow-artifacts/artifacts (e.g. http://localhost:5000/api/2.0/mlflow-artifacts/artifacts/path/to/dir).
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…i parsing Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…-artifact-scheme-ml-18423 Updating to new version of black Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
I installed the new version of black after merging master into this branch. Black on local (21.10b0) suggests no changes, but the lint test is failing in CI. @harupy have you seen this since the merge of the black update? |
|
Does |
| f"mlflow-artifacts://{base_path}/redundant", | ||
| f"{tracking_uri.scheme}://{tracking_uri.netloc}/{base_api_uri}{base_path}/redundant", | ||
| ), | ||
| ("mlflow-artifacts:/", uri_temp,), |
There was a problem hiding this comment.
| ("mlflow-artifacts:/", uri_temp,), | |
| ("mlflow-artifacts:/", uri_temp,), |
@BenWilson2 Removing the command here should fix the lint issue.
| print(f"\nuri_parse: {uri_parse}") | ||
| print(f"track_parse: {track_parse}") |
There was a problem hiding this comment.
| print(f"\nuri_parse: {uri_parse}") | |
| print(f"track_parse: {track_parse}") |
There was a problem hiding this comment.
deleted my debug ;)
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson benjamin.wilson@databricks.com
What changes are proposed in this pull request?
Support mlflow-artifacts: scheme for proxied mlflow server artifact handling functionality.
The implementation is a wrapper around the HTTP artifact repository scheme.
This supports both a 'basic scheme' (user leaving the resolving server address out of the uri) and approaches that retain the host (and/or port) for replacing the scheme with the originating registered uri for the artifact server.
How is this patch tested?
Unit tests
Does this PR change the documentation?
ci/circleci: build_doccheck. If it's successful, proceed to thenext step, otherwise fix it.
Detailson the right to open the job page of CircleCI.Artifactstab.docs/build/html/index.html.Release Notes
Is this a user-facing change?
Support for mlflow-artifacts scheme added (i.e., mlflow.create_experiment(name="my-experiment", artifact_location="mlflow-artifacts:/root/models/my-experiment-models/model-1"). Defining host, port, or both is also supported.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact loggingarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesarea/examples: Example codearea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models: MLmodel format, model serialization/deserialization, flavorsarea/projects: MLproject format, project running backendsarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra: MLflow Tracking server backendarea/tracking: Tracking Service, tracking client APIs, autologgingInterface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows: Windows supportLanguage
language/r: R APIs and clientslanguage/java: Java APIs and clientslanguage/new: Proposals for new client languagesIntegrations
integrations/azure: Azure and Azure ML integrationsintegrations/sagemaker: SageMaker integrationsintegrations/databricks: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notes