Skip to content

Add server option for serving only artifacts and proxied serving mode#5045

Merged
BenWilson2 merged 22 commits into
masterfrom
support-serve-artifacts
Dec 1, 2021
Merged

Add server option for serving only artifacts and proxied serving mode#5045
BenWilson2 merged 22 commits into
masterfrom
support-serve-artifacts

Conversation

@BenWilson2

Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

Add the flags:
--serve-artifact-opt to enable starting an MLflow server for artifact serving purposes
--artifacts-only to enable artifact serving as an exclusive option to an MLflow server instance, disabling all other endpoints in the tracking service (/api/2.0/mlflow/* disabled, /api/2.0/mlflow-artifacts/* enabled only).

This is a follow-on to 4946.

How is this patch tested?

unit tests

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
Add capability to enable or disable mlflow-artifact functionality for the mlflow server, as well as the ability to enforce exclusively functionality of artifact handling for an mlflow server instance.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…t-serve-artifacts

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@github-actions github-actions Bot added area/examples Example code area/server-infra MLflow Tracking server backend area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Nov 10, 2021
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread mlflow/cli.py Outdated
Comment thread mlflow/server/handlers.py Outdated
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…t-serve-artifacts

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment thread mlflow/server/handlers.py Outdated
return wrapper


def _disable_mlflow_artifacts_only(func):

@harupy harupy Nov 11, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _disable_mlflow_artifacts_only(func):
def _disable_if_artifacts_only(func):

Can we also rename this function to indicate that the decorated endpoint is disabled if --artifacts-only is specified?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much more clear decorator name. changed

@harupy harupy Nov 12, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 It seems it's not changed yet :)

Comment thread examples/mlflow_artifacts/README.md Outdated
Comment thread examples/mlflow_artifacts/docker-compose.yml
Comment thread mlflow/server/handlers.py
…t-serve-artifacts

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 requested a review from harupy November 11, 2021 19:19
Comment thread mlflow/cli.py Outdated
Comment on lines +284 to +285
"false",
"false",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"false",
"false",
False,
False,

Should these be False?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely should. I forgot about how Python will return a bool validation on a string of "true" or "false" the same as the bool operator.

Comment thread mlflow/cli.py Outdated
"by routing these requests to the storage location that is specified by "
"'--artifact-destination' directly through a proxy. The default location that "
"these requests are served from is a local './mlartifacts' directory which can be "
"overridden via '--artifact-destination' arguments. "

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"overridden via '--artifact-destination' arguments. "
"overridden via '--artifacts-destination' argument. "

nit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread tests/test_cli.py Outdated
Comment thread examples/mlflow_artifacts/docker-compose.yml Outdated
Comment thread examples/mlflow_artifacts/example.py
Comment thread tests/test_cli.py Outdated
Comment thread mlflow/utils/rest_utils.py Outdated
Comment thread mlflow/server/handlers.py Outdated
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

@dbczumar dbczumar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 This is awesome! One point of feedback I noticed while manual QAing is that, when --serve-artifacts is specified, --default-artifact-root still defaults to ./mlruns. For artifact serving, it should default to an mlflow-artifacts:/ URI. Is support for mlflow-artifacts URIs coming in a separate PR? If so, let's merge that one first.

Once we change the default behavior, can we update the documentation for the --default-artifact-root option? Even for the existing behavior with file stores, the mlflow server --help output is pretty confusing and has some English syntax errors:

  --default-artifact-root URI  Local or S3 URI to store
                               artifacts, for new experiments.
                               Note that this flag does not
                               impact already-created
                               experiments. Default: Within
                               file store, if a file:/ URI is
                               provided. If a sql backend is
                               used, then this option is
                               required.

@BenWilson2 BenWilson2 requested a review from dbczumar November 24, 2021 17:41
@dbczumar dbczumar changed the title Add server option for serving only artifacts and proxied serving mode [ML-18422] Add server option for serving only artifacts and proxied serving mode Nov 24, 2021
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 force-pushed the support-serve-artifacts branch from 09b84eb to d83c7a3 Compare November 25, 2021 00:37
Comment thread mlflow/cli.py Outdated
else:
default_artifact_root = DEFAULT_LOCAL_FILE_AND_ARTIFACT_PATH
default_artifact_root = resolve_default_artifact_root(
serve_artifacts, default_artifact_root, backend_store_uri, True

@harupy harupy Nov 29, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serve_artifacts, default_artifact_root, backend_store_uri, True
serve_artifacts, default_artifact_root, backend_store_uri, resolve_to_local=True

@BenWilson2 Can we use a keyword argument here?

Comment thread mlflow/server/handlers.py Outdated
(
f"Endpoint: {request.url_rule} disabled due to the mlflow server running "
"without `--serve-artifacts`. To enable artifacts server functionaltiy, "
"run `mlflow server` with `--serve-artfiacts`"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"run `mlflow server` with `--serve-artfiacts`"
"run `mlflow server` with `--serve-artifacts`"

typo

Comment thread mlflow/server/handlers.py Outdated
return Response(
(
f"Endpoint: {request.url_rule} disabled due to the mlflow server running "
"without `--serve-artifacts`. To enable artifacts server functionaltiy, "

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"without `--serve-artifacts`. To enable artifacts server functionaltiy, "
"without `--serve-artifacts`. To enable artifacts server functionality, "

typo

@harupy

harupy commented Nov 29, 2021

Copy link
Copy Markdown
Member

@BenWilson2 Can we merge the master branch to include changes made by #5070?

@harupy

harupy commented Nov 29, 2021

Copy link
Copy Markdown
Member

resolved = f"{base_url}{track_parse.path}{uri_parse.path.lstrip('/')}"

Is this line missing a slash after {track_parse.path}?

resolved = f"{base_url}{track_parse.path}/{uri_parse.path.lstrip('/')}" 

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…t-serve-artifacts

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2

Copy link
Copy Markdown
Contributor Author

resolved = f"{base_url}{track_parse.path}{uri_parse.path.lstrip('/')}"

Is this line missing a slash after {track_parse.path}?

resolved = f"{base_url}{track_parse.path}/{uri_parse.path.lstrip('/')}" 

the parsed path element will have a trailing '/' when returned from urllib.parse.urlparse. If we include another '/', we'll have paths like this:
Expected :http://myhostname/api/2.0/mlflow-artifacts/artifacts/my/artifact/path/host
Actual :http://myhostname/api/2.0/mlflow-artifacts/artifacts//my/artifact/path/host

@harupy

harupy commented Nov 29, 2021

Copy link
Copy Markdown
Member

@BenWilson2 My suggested code was incorrect. I think we need a slash before track_parse.path in a case where tracking_uri looks like http://localhost:5000.

resolved = f"{base_url}/{track_parse.path}{uri_parse.path.lstrip('/')}" 
Expected: http://myhostname/api/2.0/mlflow-artifacts/artifacts/my/artifact/path/host
Actual  : http://myhostname/api/2.0/mlflow-artifacts/artifactsmy/artifact/path/host
                                                     ^^^^^^^^^^^

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2

Copy link
Copy Markdown
Contributor Author

@BenWilson2 My suggested code was incorrect. I think we need a slash before track_parse.path in a case where tracking_uri looks like http://localhost:5000.

resolved = f"{base_url}/{track_parse.path}{uri_parse.path.lstrip('/')}" 
Expected: http://myhostname/api/2.0/mlflow-artifacts/artifacts/my/artifact/path/host
Actual  : http://myhostname/api/2.0/mlflow-artifacts/artifactsmy/artifact/path/host
                                                     ^^^^^^^^^^^

Ah, I get it! Added fixes for this and added fixture params to validate in the test suite.

Comment on lines +11 to +13
@pytest.fixture(
scope="module", autouse=True, params=["http://localhost:5000", "http://localhost:5000/"]
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doubles the number of tests. Can we just test that both http://localhost:5000 and http://localhost:5000/ can be resolved correctly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created another fixture and used an explicit single test for validating alternate uri naming convention

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment on lines +30 to +32
f"mlflow-artifacts://myhostname:4242{base_path}/hostport",
f"http://myhostname:4242{base_url}{base_path}/hostport",
"http://myhostname:4242",

@harupy harupy Nov 30, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we parametrize this test?

@pytest.parametrize("tracking_uri", ["http://localhost:5000", "http://localhost:5000/"])
@pytest.parametrize("artifact_uri, resolved_uri", [(...), (...)])
def test_xxx_yyy(tracking_uri, artifact_uri, resolved_uri):
    assert MlflowArtifactsRepository.resolve_uri(tracking_uri, artifact_uri, resolved_uri) == ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely. Added.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Comment on lines +29 to +31
f"mlflow-artifacts://myhostname:4242{base_path}/hostport",
f"http://myhostname:4242{base_url}{base_path}/hostport",
"http://myhostname:4242",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"mlflow-artifacts://myhostname:4242{base_path}/hostport",
f"http://myhostname:4242{base_url}{base_path}/hostport",
"http://myhostname:4242",
f"mlflow-artifacts://myhostname:4242{base_path}/hostport",
"http://myhostname:4242",
f"http://myhostname:4242{base_url}{base_path}/hostport",

Can we fix the order?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your edited way in the last comment better. Changed to that easier to read and cleaner implementation.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@harupy

harupy commented Nov 30, 2021

Copy link
Copy Markdown
Member

LGTM!

# Also used as default location for artifacts, when not provided, in non local file based backends
# (eg MySQL)
DEFAULT_LOCAL_FILE_AND_ARTIFACT_PATH = "./mlruns"
DEFAULT_ARTIFACTS_URI = "mlflow-artifacts:/"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an inline comment explaining what this is used for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly!

Comment thread mlflow/cli.py Outdated
Comment on lines +246 to +248
f"By default, data will be logged to the {DEFAULT_ARTIFACTS_URI} uri proxy if "
"the --serve-artifacts option is enabled. Otherwise, the default location will "
f"be {DEFAULT_LOCAL_FILE_AND_ARTIFACT_PATH}.",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"By default, data will be logged to the {DEFAULT_ARTIFACTS_URI} uri proxy if "
"the --serve-artifacts option is enabled. Otherwise, the default location will "
f"be {DEFAULT_LOCAL_FILE_AND_ARTIFACT_PATH}.",
f"If the --serve-artifacts option is specified, the default artifact root is {DEFAULT_ARTIFACTS_URI}. "
f "otherwise, the default artifact root is {DEFAULT_LOCAL_FILE_AND_ARTIFACT_PATH}".

@dbczumar dbczumar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with 2 small docs comments. Thanks so much, @BenWilson2 !

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@BenWilson2 BenWilson2 merged commit fb2972f into master Dec 1, 2021
@BenWilson2 BenWilson2 deleted the support-serve-artifacts branch December 1, 2021 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/examples Example code area/server-infra MLflow Tracking server backend area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants