Skip to content

Fix regression: disable saving history step in artifacts#3495

Merged
raubitsj merged 7 commits intorelease-0.12.14from
history-step-fix
Apr 8, 2022
Merged

Fix regression: disable saving history step in artifacts#3495
raubitsj merged 7 commits intorelease-0.12.14from
history-step-fix

Conversation

@vwrj
Copy link
Copy Markdown
Contributor

@vwrj vwrj commented Apr 7, 2022

Fixes WB-NNNN
Fixes #NNNN

Description

What does the PR do?

For context, we added the latest history step to the mutation that creates an artifact. This change was put in less than a month ago in the backend. The corresponding SDK PR was merged at the same time.

A Local user reported the following error when they switched to SDK versions 0.12.12 or 0.12.13. They stated that the error did not happen with 0.12.11:

ValueError: error logging artifact "{user-specific data}": {'message': 'Argument "input" has invalid value {artifactTypeName: $artifactTypeName, artifactCollectionNames: $artifactCollectionNames, entityName: $entityName, projectName: $projectName, ru
nName: $runName, description: $description, digest: $digest, digestAlgorithm: MANIFEST_MD5, labels: $labels, aliases: $aliases, metadata: $metadata, historyStep: $historyStep}.\nIn field "historyStep": Unknown field.', 'locations': [{'line': 2, 'column': 25}]}

This PR adds max_cli_version guardrails (0.12.12) for when history step was introduced in the SDK. Basically, it grabs the max_cli_version that the backend (either cloud or local) knows about and makes sure that we only include the history step info into the gql query if this max_cli_version >= 0.12.12.

We tested the following script against a version of local that is 2 months old:

import wandb

def run_experiment():
    with wandb.init() as run:
        table = wandb.Table(columns=["boom_col"])
        table.add_data(5)
        wandb.log({"data": 5})
        wandb.log({"data": 10})
        wandb.log({"table": table})

if __name__ == "__main__":
    run_experiment()

Initially, we hard-coded can_handle_history to be True, in order to trigger the GraphQL query with the added history step and error out (since a 2-month old version of local wouldn't have history in the schema). The gql query fails (as it should) and the artifact is not logged. However, the run finishes successfully (and fails silently without alerting the user, which is a separate issue).

Screen Shot 2022-04-07 at 3 00 55 PM

With the guardrail, can_handle_history becomes false and the artifact is logged successfully.

Screen Shot 2022-04-07 at 3 00 04 PM

Testing

This is a hotfix, in response to an issue that Local users would be experiencing.
It was tested manually against an old local release, as stated above.
In the future, we need to figure out a way to test this with unit tests, and also with regression tests against the last 5 (or other number) versions of local.

This PR adds a standalone test that is meant to be tested against local versions (and verify that the guardrail does indeed work and successfully logs an artifact).

Checklist

  • Name PR "[WB-NNNN][WB-MMMM] Add support for..." similar to entries in CHANGELOG.md
  • Include reference to internal ticket "Fixes WB-NNNN" (and github issue "Fixes #NNNN" if applicable)

@vwrj vwrj requested review from raubitsj and tssweeney April 7, 2022 22:08
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 7, 2022

Codecov Report

Merging #3495 (1306e34) into release-0.12.14 (5dc1a71) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 1306e34 differs from pull request most recent head d176683. Consider uploading reports for the commit d176683 to get more accurate results

Impacted file tree graph

@@                 Coverage Diff                 @@
##           release-0.12.14    #3495      +/-   ##
===================================================
+ Coverage            81.71%   81.72%   +0.01%     
===================================================
  Files                  236      236              
  Lines                29114    29115       +1     
===================================================
+ Hits                 23791    23795       +4     
+ Misses                5323     5320       -3     
Flag Coverage Δ
functest 58.18% <100.00%> (-0.01%) ⬇️
unittest 71.86% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
wandb/sdk/internal/internal_api.py 83.98% <100.00%> (+0.02%) ⬆️
wandb/sdk/launch/docker.py 91.03% <0.00%> (+0.47%) ⬆️
wandb/sdk/lib/sock_client.py 94.16% <0.00%> (+1.66%) ⬆️

@vwrj vwrj requested a review from vanpelt April 7, 2022 22:12
Copy link
Copy Markdown
Contributor

@raubitsj raubitsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks fine.. but before this can be merged and released, the following things need to happen:

  • This PR needs to be merged: https://github.com/wandb/core/pull/9212
  • gorilla deployed to prod
  • this change tested against prod (verified to pass history step - with debug messages?)
  • this change tested against local (verified not to pass history step - with debug messages?)

Alternatively:

  • a fix which removes history step does not need any changes to core or any gorilla deploys.

@raubitsj raubitsj merged commit 45062a4 into release-0.12.14 Apr 8, 2022
@raubitsj raubitsj deleted the history-step-fix branch April 8, 2022 16:30
@raubitsj raubitsj changed the title history max_cli_version guardrail Fix regression: disable saving history step in artifacts Apr 8, 2022
@kptkin kptkin added this to the sdk-2022-04.3 milestone Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants