Make example input and PyFuncInput support csc csr matrix #5016
Merged
WeichenXu123 merged 16 commits on Dec 7, 2021.
harupy reviewed and requested changes on Nov 8, 2021.
Branch force-pushed from 1284889 to 61c84dd.
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Branch force-pushed from d9724cb to a619df0.
tomasatdatabricks (Contributor) left a comment:
Looks good to me in general. One thing I'm not sure I understand: do we store the example as a dense matrix? I think we should have a way to store it as a sparse matrix and load it back as a sparse matrix.
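To make the dense-versus-sparse concern concrete, here is a small sketch (illustrative data, not taken from the PR) that counts how many values a dense dump such as `tolist()` would write compared with the CSR component arrays of the same matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative data: a mostly-zero 1000 x 1000 matrix with 3 non-zero entries.
dense = np.zeros((1000, 1000))
dense[0, 5] = 1.0
dense[10, 20] = 2.0
dense[999, 999] = 3.0

sparse = csr_matrix(dense)

# A dense dump stores every cell, zeros included.
dense_values = dense.size
# CSR stores only the non-zeros (data), their column indices,
# and one row-pointer entry per row plus one (indptr).
sparse_values = sparse.data.size + sparse.indices.size + sparse.indptr.size

print(dense_values, sparse_values)  # 1000000 1007
```

Converting to dense also discards the sparse type, so a loader would hand back an ndarray rather than a `csr_matrix` unless the sparse layout is stored explicitly.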
Diff context for the comment below:
    else:
    elif isinstance(input_tensor, np.ndarray):
        return {"inputs": input_tensor.tolist()}
    else:
Contributor:
- Can we do `elif isinstance(input_tensor, (csr_matrix, csc_matrix))`?
- Does this mean we store sparse input as a dense vector, or is it stored as an array of indices and an array of values?
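The suggested check works because `isinstance` accepts a tuple of types, so one branch covers both CSR and CSC. A minimal sketch of that dispatch, using a hypothetical `classify_input` helper (not MLflow's actual function) that only reports which serialization branch would be taken:

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def classify_input(input_tensor):
    """Hypothetical helper: pick a serialization branch for an input example."""
    # A tuple of types matches if the object is an instance of any of them.
    if isinstance(input_tensor, (csr_matrix, csc_matrix)):
        return "sparse"
    elif isinstance(input_tensor, np.ndarray):
        return "ndarray"
    else:
        raise TypeError(f"Unsupported input type: {type(input_tensor).__name__}")


print(classify_input(csr_matrix(np.eye(2))))  # sparse
print(classify_input(np.eye(2)))  # ndarray
```

Keeping the sparse branch separate from the ndarray branch is what lets the sparse case avoid `tolist()` and the dense blow-up that comes with it.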
Collaborator (Author):
I updated the code to store sparse input as data/indices/indptr vectors.
I also split the CSC/CSR saving code out of the ndarray saving code.
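A minimal sketch of a data/indices/indptr layout that round-trips through JSON, using hypothetical helpers `sparse_to_dict` and `dict_to_sparse`. The key names, and the extra `format` and `shape` fields, are assumptions for illustration and not necessarily what the PR writes:

```python
import json

import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def sparse_to_dict(matrix):
    """Hypothetical helper: store a CSR/CSC matrix as its component arrays."""
    return {
        "format": matrix.format,  # "csr" or "csc", needed to pick the loader
        "data": matrix.data.tolist(),  # non-zero values only
        "indices": matrix.indices.tolist(),  # column (CSR) or row (CSC) indices
        "indptr": matrix.indptr.tolist(),  # offsets into data/indices
        "shape": list(matrix.shape),  # trailing empty rows/columns are otherwise lost
    }


def dict_to_sparse(d):
    """Hypothetical helper: rebuild the same sparse type from its components."""
    cls = {"csr": csr_matrix, "csc": csc_matrix}[d["format"]]
    return cls((d["data"], d["indices"], d["indptr"]), shape=tuple(d["shape"]))


matrix = csr_matrix(np.array([[0, 1.5, 0], [2.0, 0, 0]]))
payload = json.dumps(sparse_to_dict(matrix))
restored = dict_to_sparse(json.loads(payload))
print(sparse_to_dict(matrix))
# {'format': 'csr', 'data': [1.5, 2.0], 'indices': [1, 0], 'indptr': [0, 1, 2], 'shape': [2, 3]}
```

Storing `shape` explicitly matters because data/indices/indptr alone cannot tell a 2 x 3 matrix from a 2 x 2 one when the last column is all zeros.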
harupy previously approved these changes on Nov 26, 2021.
tomasatdatabricks (Contributor) approved these changes on Dec 6, 2021 and left a comment:
Thanks for the changes! This looks great.
I left a minor comment about the example type string for sparse matrices, but it's only a suggestion; I'm also OK with it as is.
What changes are proposed in this pull request?
Make input examples and `PyFuncInput` support SciPy CSC and CSR sparse matrices.
How is this patch tested?
Unit tests.
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/breaking-change: The PR will be mentioned in the "Breaking Changes" section
- rn/none: No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature: A new user-facing feature worth mentioning in the release notes
- rn/bug-fix: A user-facing bug fix worth mentioning in the release notes
- rn/documentation: A user-facing documentation change worth mentioning in the release notes