Skip to content

Enable mlflow-artifacts scheme as wrapper around http artifact scheme#5070

Merged
BenWilson2 merged 13 commits into
masterfrom
mlflow-artifact-scheme-ml-18423
Nov 24, 2021
Merged

Enable mlflow-artifacts scheme as wrapper around http artifact scheme#5070
BenWilson2 merged 13 commits into
masterfrom
mlflow-artifact-scheme-ml-18423

Conversation

@BenWilson2

Copy link
Copy Markdown
Contributor

Signed-off-by: Ben Wilson benjamin.wilson@databricks.com

What changes are proposed in this pull request?

Support mlflow-artifacts: scheme for proxied mlflow server artifact handling functionality.
The implementation is a wrapper around the HTTP artifact repository scheme.
This supports both a 'basic scheme' (user leaving the resolving server address out of the uri) and approaches that retain the host (and/or port) for replacing the scheme with the originating registered uri for the artifact server.

How is this patch tested?

Unit tests

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Support for mlflow-artifacts scheme added (i.e., mlflow.create_experiment(name="my-experiment", artifact_location="mlflow-artifacts:/root/models/my-experiment-models/model-1"). Defining host, port, or both is also supported.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@github-actions github-actions Bot added area/artifacts Artifact stores and artifact logging rn/feature Mention under Features in Changelogs. labels Nov 15, 2021
Comment thread mlflow/store/artifact/mlflow_artifacts_repo.py Outdated
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 requested a review from dbczumar November 17, 2021 15:08

def __init__(self, artifact_uri):
tracking_uri = get_tracking_uri()
resolved_artifacts_uri = artifact_uri.replace("mlflow-artifacts:", f"{tracking_uri}")

@dbczumar dbczumar Nov 17, 2021

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only perform this replacement if the URI doesn't specify a host. We also need to handle the case where the URI has one or three slashes after the scheme.

Right now, if the URI specifies a host (e.g. mlflow-artifacts://myhostname:5000), the following behavior occurs:

In [11]: MlflowArtifactsRepository("mlflow-artifacts://foo:5050/my/artifact/path").artifact_uri
Out[11]: 'http://localhost:5000//foo:5050/my/artifact/path'

Similarly, if a hostname-less mlflow-artifacts URI is specified with one or three slashes after the scheme, the following behaviors occur:

In [14]: mlflow.set_tracking_uri("http://localhost:5000/")

In [15]: MlflowArtifactsRepository("mlflow-artifacts:/my/artifact/path").artifact_uri
Out[15]: 'http://localhost:5000//my/artifact/path'

In [16]: MlflowArtifactsRepository("mlflow-artifacts:///my/artifact/path").artifact_uri
Out[16]: 'http://localhost:5000////my/artifact/path'

While the extra slashes in the path shouldn't be harmful, it would be nice to remove them.

To clarify re:hostnames - when a user supplies an artifact URI of the form mlflow-artifacts://<hostname>/<path> or mlflow-artifacts://<hostname>:<port>/<path>, we should resolve the URI to http://<hostname>/api/2.0/mlflow-artifacts/artifacts/<path> or http://<hostname>:<port>/api/2.0/mlflow-artifacts/artifacts/<path> respectively.

Can we add test cases for each of the above scenarios as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some... (not so minor) changes to handle resolving all of the different ways that this uri can be configured. I have a validation check in here for raising an Exception if only a port is passed; I'd really like an opinion on whether this is needed or if it's a silly thing to check for.

Also, should I take those private functions and make them class methods? I don't see those functions being particularly useful outside of this sub class of HttpArtifactRepository.
@dbczumar @harupy any thoughts / advice on these?

@harupy harupy Nov 18, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a validation check in here for raising an Exception if only a port is passed.

This sounds good to me!

should I take those private functions and make them class methods?

I don't have a strong preference between private functions and class methods. The current code looks ok to me.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread mlflow/store/artifact/mlflow_artifacts_repo.py Outdated
@BenWilson2 BenWilson2 requested a review from dbczumar November 18, 2021 03:14
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread mlflow/store/artifact/mlflow_artifacts_repo.py Outdated
Comment thread tests/store/artifact/test_mlflow_artifact_repo.py Outdated
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread mlflow/store/artifact/mlflow_artifacts_repo.py Outdated
Comment on lines +52 to +54
uri = MlflowArtifactsRepository( # pylint: disable=unused-variable
failing_condition
).artifact_uri

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uri = MlflowArtifactsRepository( # pylint: disable=unused-variable
failing_condition
).artifact_uri
MlflowArtifactsRepository(failing_condition)

nit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to just the class instantiation

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

@classmethod
def resolve_uri(cls, artifact_uri):
tracking_uri = get_tracking_uri()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we raise an exception if the scheme of tracking_uri is not http or https (e.g. sqlite:///foo/bar)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea :) added! I also simplified the parsing logic and temporarily set the tracking uri in the test function to validate against a valid server address (then reverted the global variable to not affect other tests with an inadvertent side-effect)

@harupy harupy Nov 22, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! A tracking URI looks like http://localhost:5000 and doesn't contain /api/2.0/mlflow-artifacts/artifacts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the tests and cleaned all of that up

@harupy harupy Nov 22, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 Sorry I wasn't clear. The resolved artifact URI needs to have /api/2.0/mlflow-artifacts/artifacts (e.g. http://localhost:5000/api/2.0/mlflow-artifacts/artifacts/path/to/dir).

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…i parsing

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…-artifact-scheme-ml-18423

Updating to new version of black

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2

Copy link
Copy Markdown
Contributor Author

I installed the new version of black after merging master into this branch. Black on local (21.10b0) suggests no changes, but the lint test is failing in CI. @harupy have you seen this since the merge of the black update?

@harupy

harupy commented Nov 20, 2021

Copy link
Copy Markdown
Member

@BenWilson2

Does black --version print out 21.10b0?

black --version

f"mlflow-artifacts://{base_path}/redundant",
f"{tracking_uri.scheme}://{tracking_uri.netloc}/{base_api_uri}{base_path}/redundant",
),
("mlflow-artifacts:/", uri_temp,),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("mlflow-artifacts:/", uri_temp,),
("mlflow-artifacts:/", uri_temp,),

@BenWilson2 Removing the command here should fix the lint issue.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread tests/store/artifact/test_mlflow_artifact_repo.py Outdated
Comment on lines +62 to +63
print(f"\nuri_parse: {uri_parse}")
print(f"track_parse: {track_parse}")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(f"\nuri_parse: {uri_parse}")
print(f"track_parse: {track_parse}")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted my debug ;)

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

@harupy harupy left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@BenWilson2 BenWilson2 merged commit 776831a into master Nov 24, 2021
@BenWilson2 BenWilson2 deleted the mlflow-artifact-scheme-ml-18423 branch November 24, 2021 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/artifacts Artifact stores and artifact logging rn/feature Mention under Features in Changelogs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants