Create Github Action workflow for reporting DVC image diffs#1104
Create Github Action workflow for reporting DVC image diffs#1104
Conversation
To make reviewing new baseline test images (*.png) easier. this workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment.
7bf8be6 to
b4c1da5
Compare
| cat report.md | ||
|
|
||
| - name: Pull image data from cloud storage | ||
| run: dvc pull --remote upstream |
There was a problem hiding this comment.
Currently this line can only pull from the upstream dvc remote at https://dagshub.com/GenericMappingTools/pygmt. Will need to think about how to make it work for forks as well.
There was a problem hiding this comment.
Do we need to care about forks, if people follow the contributing guides (i.e., asking for permission on DAGsHub)? If people fork the DAGsHub repository and upload DVC images to their fork, our Tests workflow will fail, because it can't download the DVC images, right?
There was a problem hiding this comment.
I think we can deal with fork on the second iteration of this dvc-diff-action workflow. Technically we should be able to do dvc pull from DAGsHub forks, but that'll be quite a bit of work to code up as we need to point to another DVC remote (and also not sure if this is the best way to do things).
There was a problem hiding this comment.
I believe very few users will work with DAGsHub forks, so not a big issue for us.
There was a problem hiding this comment.
Ok, will kick the can down the road then.
| on: | ||
| pull_request: | ||
| paths: | ||
| - 'pygmt/tests/baseline/*.png.dvc' |
There was a problem hiding this comment.
Is there a way to get this workflow to run only on commits where *.png.dvc files have changed? The dvc-diff workflow still appears to run on commits where no *.png.dvc files have changed, e.g. at e14674a.
| cat report.md | ||
|
|
||
| - name: Pull image data from cloud storage | ||
| run: dvc pull --remote upstream |
There was a problem hiding this comment.
Do we need to care about forks, if people follow the contributing guides (i.e., asking for permission on DAGsHub)? If people fork the DAGsHub repository and upload DVC images to their fork, our Tests workflow will fail, because it can't download the DVC images, right?
.github/workflows/dvc-diff.yml
Outdated
| # Get just the filename of the changed image from the report | ||
| tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt | ||
|
|
||
| # Append each image to the markdown report | ||
| echo -e "## Image diff(s)\n" >> report.md | ||
| while IFS= read -r line; do | ||
| cml-publish --md "$line" >> report.md | ||
| done < diff_files.txt | ||
|
|
||
| # Send diff report as GitHub comment | ||
| cml-send-comment report.md |
There was a problem hiding this comment.
Do these commands show both the old and new images, or only the new image?
There was a problem hiding this comment.
Currently only the new image. It should be possible to get the old image as well, but we don't really have many old images on dvc/DAGsHub yet.
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
|
I don't like the new workflow because:
|
Co-Authored-By: Dongdong Tian <seisman.info@gmail.com>
8b32f3a to
1db5328
Compare
This comment has been minimized.
This comment has been minimized.
5c5e66a to
45b2a96
Compare
| cat report.md | ||
|
|
||
| - name: Pull image data from cloud storage | ||
| run: dvc pull --remote upstream |
There was a problem hiding this comment.
I believe very few users will work with DAGsHub forks, so not a big issue for us.
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
Co-Authored-By: Dongdong Tian <seisman.info@gmail.com>
f906e89 to
554ba6d
Compare
.github/workflows/dvc-diff.yml
Outdated
| # Get just the filename of the changed image from the report | ||
| tail --lines=+7 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt |
There was a problem hiding this comment.
The command doesn't work for deleted and modified files.
| Status | Path |
|----------|-------------------------------------------------------------|
| added | pygmt/tests/baseline/test_basemap.png |
| deleted | pygmt/tests/baseline/test_solar_terminators.png |
| modified | pygmt/tests/baseline/test_logo_on_a_map.png |
I ran the following command to the above "report.md" file:
tail --lines=+3 report.md | cut --delimiter=' ' --fields=7 > diff_files.txt
it gives me:
pygmt/tests/baseline/test_basemap.png
There was a problem hiding this comment.
It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.
There was a problem hiding this comment.
tail --lines=+3 report.md | awk '{print $4}'
This command gives me the expected output:
pygmt/tests/baseline/test_basemap.png
pygmt/tests/baseline/test_solar_terminators.png
pygmt/tests/baseline/test_logo_on_a_map.png
NOTE: you need to change +3 to +7.
There was a problem hiding this comment.
Thanks for the awk command suggestion, really need to pick up more bash scripting skills!
It would be better if you can delete a dvc file and update a dvc file in this PR, so that we can know the workflow works for added, deleted and modified images.
Probably won't work for deleted images since there's nothing to report. But I'm pretty sure added and modified images will work. I'd prefer to test this in a separate PR so that:
- We keep this PR small and focused. Modifying too many test images will result in a bigger diff to review.
- People can start using the dvc-diff action in their PRs. At this point in time, most of the changes will be adding new images to dvc so we don't need to worry about deleting/modifying images yet.
Co-Authored-By: Dongdong Tian <seisman.info@gmail.com>
|
|
||
| @check_figures_equal() | ||
| @pytest.mark.mpl_image_compare | ||
| def test_legend_entries(): |
There was a problem hiding this comment.
Rewrite the tests using a simpler 1d array, rather than the @Table_5_11.txt file?
There was a problem hiding this comment.
I agree, but best to not modify the test too much in this PR as mentioned in #1104 (comment)
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
|
Only the first image is shown (See #1096 (comment)). |
…appingTools#1104) To make reviewing new baseline test images (*.png) easier. the dvc-diff workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment. * Refactor test_legend_entries to use mpl_image_compare * Let actions/checkout fetch all history so that dvc diff works * Use peter-evans/create-or-update-comment to publish image diff report * Add bullet points with names for each of the images that have changed * Collapsible image diff section and use correct git SHA in the report * List dvc-diff.yml under MAINTENANCE.md * Use awk 'NR>=7' instead of tail and add some whitespace indentation Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
Description of proposed changes
To make reviewing new baseline test images (*.png) easier, this workflow checks what images have been added or modified in a Pull Request. The changes are published in a table and as a series of images by a bot-generated GitHub comment.
References:
Addresses a painpoint of #963
Reminders
make formatandmake checkto make sure the code follows the style guide.doc/api/index.rst.Slash Commands
You can write slash commands (
/command) in the first line of a comment to performspecific operations. Supported slash commands are:
/format: automatically format and lint the code/test-gmt-dev: run full tests on the latest GMT development version