Skip to content

MLeap Flavor/Deployment #1: Java Support#307

Merged
dbczumar merged 102 commits intomlflow:masterfrom
dbczumar:java_basis
Aug 17, 2018
Merged

MLeap Flavor/Deployment #1: Java Support#307
dbczumar merged 102 commits intomlflow:masterfrom
dbczumar:java_basis

Conversation

@dbczumar
Copy link
Collaborator

This is the first PR in a multi-part series that will introduce support for the MLeap flavor in MLFlow. This PR provides the infrastructure for deserializing and invoking models as generic predictors in Java.

* Represents an MLeap flavor configuration
*/
public class MLeapFlavor implements Flavor {
public static final String FLAVOR_NAME = "mleap";
Copy link
Contributor

@aarondav aarondav Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google's Java Style Guide would have us do 2-space indent for all Java files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,50 @@
package com.databricks.mlflow.mleap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.mlflow, everywhere

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

import com.databricks.mlflow.sagemaker.PredictorLoadingException;
import com.databricks.mlflow.utils.FileUtils;

import java.util.Optional;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
* Unit tests for the {@link com.databricks.mlflow.utils.SerializationUtils} module
*/
public class SerializationUtilsTests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all tests to end with Test instead of Tests.

*
* @param modelRootPath The path to the root directory of the MLFlow model
*/
public static Model fromRootPath(String modelRootPath) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test called ModelTests#testModelIsLoadedFromYamlUsingRootPathCorrectly

Assert.assertTrue(
model.getFlavor(MLeapFlavor.FLAVOR_NAME, MLeapFlavor.class).isPresent());
Assert.assertTrue(model.getUtcTimeCreated().isPresent());
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you're allowed to add throws to a test, and it will fail the test if the exception is thrown. Maybe see if it works.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling is deliberate here, in order to provide additional information about the test failure. Added a stack trace dump to ensure that the original trace isn't lost.

* @param class The class of the Java object that should be produced
*/
public static <T> T parseJsonFromFile(String filePath, Class<T> objectClass)
throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hypernit: continuation indents are at +4 indentation (compared to default +2) to distinguish from body

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @param json A string in valid JSON format
*/
public static <T> T fromJson(String json) throws IOException {
return jsonMapper.readValue(json, new TypeReference<T>() {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I believe that this actually works for complex types. Do we need to define it, btw? Maybe we could remove the methods of SerializationUtils that are not currently used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used later by MLeap. See https://github.com/mlflow/mlflow/pull/278/files#diff-71d9fc3d16e2e46961159e9efccdf573R52.

Good catch, this didn't work for complex types. Added a Class argument.

@codecov-io
Copy link

codecov-io commented Aug 16, 2018

Codecov Report

Merging #307 into master will increase coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #307      +/-   ##
==========================================
+ Coverage   55.81%   55.94%   +0.12%     
==========================================
  Files         108      108              
  Lines        5531     5554      +23     
==========================================
+ Hits         3087     3107      +20     
- Misses       2444     2447       +3
Impacted Files Coverage Δ
mlflow/projects/__init__.py 93.92% <0%> (+0.1%) ⬆️
mlflow/tracking/__init__.py 82.96% <0%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55828ab...4a421a5. Read the comment docs.

Copy link
Collaborator Author

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarondav Addressed your comments and added a mvn clean test line to our .travis.yml

* Loads an MLFlow model as a generic predictor that can be used for
* inference
*
* @throws {@link com.databricks.mlflow.sagemaker.PredictorLoadingException} if
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've shortened links throughout the PR that correspond to imported modules.

* inference
*
* @throws {@link com.databricks.mlflow.sagemaker.PredictorLoadingException} if
* failures are encountered while attempting to load the model from the specified configuration.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the inline replacement. Done!

*/
public Predictor load(String modelRootPath) throws PredictorLoadingException {
try {
return load(Model.fromRootPath(modelRootPath));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Done!

@@ -0,0 +1,50 @@
package com.databricks.mlflow.mleap;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

*
* @param modelRootPath The path to the root directory of the MLFlow model
*/
public static Model fromRootPath(String modelRootPath) throws IOException {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test called ModelTests#testModelIsLoadedFromYamlUsingRootPathCorrectly

/**
* Unit tests for the {@link com.databricks.mlflow.utils.SerializationUtils} module
*/
public class SerializationUtilsTests {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all tests to end with Test instead of Tests.

* @param json A string in valid JSON format
*/
public static <T> T fromJson(String json) throws IOException {
return jsonMapper.readValue(json, new TypeReference<T>() {});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used later by MLeap. See https://github.com/mlflow/mlflow/pull/278/files#diff-71d9fc3d16e2e46961159e9efccdf573R52.

Good catch, this didn't work for complex types. Added a Class argument.

* @param class The class of the Java object that should be produced
*/
public static <T> T parseJsonFromFile(String filePath, Class<T> objectClass)
throws IOException {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Represents an MLeap flavor configuration
*/
public class MLeapFlavor implements Flavor {
public static final String FLAVOR_NAME = "mleap";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import com.databricks.mlflow.sagemaker.PredictorLoadingException;
import com.databricks.mlflow.utils.FileUtils;

import java.util.Optional;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,13 @@
package org.mlflow;
Copy link
Contributor

@aarondav aarondav Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the .swn files? What are those? Should we add them to a gitignore?

@aarondav
Copy link
Contributor

LGTM, just one comment

@aarondav aarondav added the LGTM label Aug 17, 2018
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.mlflow</groupId>
<artifactId>mlflow-java</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a nit but shouldn't we just call it mlflow? I guess we can rename it later too.

In the long run we'll probably want this to be a separate artifact from the Java API client anyway, so we'll need to make a more complicated pom.xml.

Copy link
Collaborator Author

@dbczumar dbczumar Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateiz Maven's package naming conventions, which are derived from Java conventions, indicate that a package name should correspond to a valid domain that the package owner controls. Since we own mlflow.org, org.mlflow seems to be the correct package name.

Here's the Java reference I found regarding package naming: https://docs.oracle.com/javase/specs/jls/se10/jls10.pdf (Page 137)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't asking about the org.mlflow line, but about the mlflow-java artifact ID. Why not just mlflow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry! I'll correct this in the followup PR that adds SageMaker support to the Java library!

@dbczumar dbczumar merged commit cfc9402 into mlflow:master Aug 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants