Conversation
Hi, I would be happy to contribute and provide source code on that topic (in particular provenance / PROV-O).
@albangaignard your help is greatly appreciated. I have added a skeleton. Basically, one just needs to feed the skeleton with the information retrieved via the python-ro API.
…ted to the SnakeMake architecture
Thanks very much for your feedback. This would be completely in line with the PROV profile of CWLprov (https://github.com/common-workflow-language/cwlprov/blob/main/prov.md). Regarding dataprov, the approach is interesting, but it apparently does not leverage a standard for representing provenance metadata such as the one proposed by the W3C (https://www.w3.org/TR/prov-primer/).
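To make the PROV-O reference concrete, here is a minimal stdlib-only sketch of how one workflow step could be expressed with W3C PROV-O terms (`prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`) serialized as Turtle. The URIs and the `step_to_prov` helper are hypothetical illustrations, not part of Snakemake or CWLprov.

```python
# Sketch: render one rule execution as W3C PROV-O triples in Turtle.
# All names and URIs below are made up for illustration.

PROV = "http://www.w3.org/ns/prov#"

def step_to_prov(activity_uri, used_uris, generated_uris):
    """Render one workflow step as PROV-O Turtle statements."""
    lines = [f"@prefix prov: <{PROV}> ."]
    lines.append(f"<{activity_uri}> a prov:Activity .")
    for u in used_uris:
        lines.append(f"<{activity_uri}> prov:used <{u}> .")
        lines.append(f"<{u}> a prov:Entity .")
    for g in generated_uris:
        lines.append(f"<{g}> a prov:Entity ;")
        lines.append(f"    prov:wasGeneratedBy <{activity_uri}> .")
    return "\n".join(lines)

if __name__ == "__main__":
    print(step_to_prov(
        "urn:example:run/rule_align",
        ["urn:example:data/reads.fq"],
        ["urn:example:data/aligned.bam"],
    ))
```

A real implementation would use an RDF library rather than string assembly, but the vocabulary usage is the point here.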
Please format your code with black:
Kudos, SonarCloud Quality Gate passed!
@johanneskoester, I recently reviewed the code quality based on the automatic checks (SonarCloud) and code formatting best practices (the Black tool). Would you have time to review this pull request? In summary:
SonarCloud Quality Gate failed.
```python
def workdir_entry(i, f):
    location = "??inputs.input_files[{}].location??".format(i)
    if f.is_directory:
        if os.path.isdir(f):
```
This I don't understand. We use the `is_directory` property of `IOFile` here because the files may not yet be present; in that case, `isdir` would not work.
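The distinction the reviewer draws can be sketched as follows: a declared `is_directory` flag is known before the job runs, whereas `os.path.isdir()` probes the file system and returns `False` for any output that has not been produced yet. `PlannedFile` below is a hypothetical stand-in for Snakemake's `IOFile`, not the real class.

```python
# Sketch: declared metadata vs. probing the file system.
# `PlannedFile` is a hypothetical stand-in for Snakemake's IOFile.
import os

class PlannedFile:
    def __init__(self, path, is_directory=False):
        self.path = path
        # Declared in the workflow definition, not probed from disk.
        self.is_directory = is_directory

f = PlannedFile("results/qc_dir", is_directory=True)

# Before the job has run, the path does not exist on disk:
print(os.path.isdir(f.path))   # False if the job has not produced it yet
# The declared flag is reliable regardless of execution state:
print(f.is_directory)          # True
```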
```python
from snakemake.logging import logger
from snakemake.stats import Stats
from snakemake.utils import format, Unformattable, makedirs
from snakemake.provenance_tracking.provenance import provenance_manager
```
Since the provenance manager only has to be part of AbstractExecutor, I think we could avoid the singleton and just keep it in there instead.
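The suggested refactor can be sketched like this: instead of importing a module-level singleton, each executor instance owns its provenance manager. The class bodies below are illustrative stand-ins, not the actual Snakemake classes.

```python
# Sketch: provenance manager as an executor attribute instead of a
# module-level singleton. Class names are illustrative only.

class ProvenanceManager:
    def __init__(self):
        self.records = []

    def track(self, job_name):
        self.records.append(job_name)

class AbstractExecutor:
    def __init__(self, track_provenance=False):
        # Created per executor instance; nothing global to import.
        self.provenance_manager = (
            ProvenanceManager() if track_provenance else None
        )

    def run_job(self, job_name):
        if self.provenance_manager is not None:
            self.provenance_manager.track(job_name)

ex = AbstractExecutor(track_provenance=True)
ex.run_job("rule_a")
print(ex.provenance_manager.records)  # ['rule_a']
```

Keeping the manager on the executor also makes it trivial to disable provenance tracking per run and avoids hidden state shared across tests.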
```python
# print(job.params['biotools_id'])
tool_name = ""
if "biotools_id" in job.params.keys():
    tool_name = job.params["biotools_id"]
```
Mhm, is it possible to extract the biotools_ids from the conda packages instead?
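One way to read this suggestion: derive the bio.tools identifier from the job's declared conda packages, falling back to the explicit `biotools_id` param only when no mapping is known. The `CONDA_TO_BIOTOOLS` table and the plain-dict job below are hypothetical; in practice such a mapping could come from package metadata rather than being hard-coded.

```python
# Sketch: prefer bio.tools IDs inferred from conda packages, fall back to
# an explicit `biotools_id` param. Mapping and job layout are hypothetical.

CONDA_TO_BIOTOOLS = {
    "bwa": "biotools:bwa",
    "samtools": "biotools:samtools",
}

def tool_name_for(job):
    # Prefer identifiers inferred from the declared conda packages...
    for pkg in job.get("conda_packages", []):
        if pkg in CONDA_TO_BIOTOOLS:
            return CONDA_TO_BIOTOOLS[pkg]
    # ...and fall back to an explicit param, as the PR currently does.
    return job.get("params", {}).get("biotools_id", "")

print(tool_name_for({"conda_packages": ["bwa"]}))            # biotools:bwa
print(tool_name_for({"params": {"biotools_id": "mytool"}}))  # mytool
```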
```python
    input_id_list=job.input,
    tool_name=tool_name,
    job_uri=job.uri,
)
```
As you know, Snakemake has its own metadata tracking. I wonder if it would make sense to export the research object from there instead of collecting it while running in here. The advantage is that you would also get information for work that happened in a previous run. The flag would then rather be a post-hoc command, just like --report, instead of requiring the user to remember to add it while running. Also, multiple partial runs of the same workflow would not result in several separate provenance information files.
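The post-hoc export idea can be sketched as follows: read provenance from the per-output metadata that Snakemake already records, merging records from any number of past runs into one view, instead of collecting it during execution. The record layout and the `export_provenance` helper are hypothetical; the point is that a separate command (analogous to --report) naturally covers earlier partial runs too.

```python
# Sketch: build one provenance view post hoc from previously recorded
# per-output metadata. Record layout and helper name are hypothetical.

def export_provenance(metadata_records):
    """Merge per-output metadata from any number of past runs."""
    merged = {}
    for rec in metadata_records:  # e.g. loaded from .snakemake/ metadata
        merged[rec["output"]] = {
            "rule": rec["rule"],
            "inputs": rec["inputs"],
        }
    return merged

# Records from two separate partial runs end up in a single export:
runs = [
    {"output": "a.txt", "rule": "make_a", "inputs": []},
    {"output": "b.txt", "rule": "make_b", "inputs": ["a.txt"]},
]
prov = export_provenance(runs)
print(sorted(prov))  # ['a.txt', 'b.txt']
```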
I am so sorry for the late response. This completely slipped my attention (I get so many GitHub notifications that I sometimes miss one). Nice work, please see my comments above.
- Incorporates @epruesse's fix for MRE snakemake#1
- Adds a fix for MRE snakemake#2
- Properly marks group jobs as finished
- Some minor updates to tests
* add failing tests 823
* fix mistakes
* black
* Fix the first two MREs from #823.
  - Incorporates @epruesse's fix for MRE #1
  - Adds a fix for MRE #2
  - Properly marks group jobs as finished
  - Some minor updates to tests
* Fix tests on Windows
* Skip MRE 2 from 823 on Windows due to `pipe()` output

Co-authored-by: Maarten-vd-Sande <maartenvandersande@hotmail.com>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
SonarCloud Quality Gate failed.
The Pixi install GitHub action is failing with "failed to parse pypi name mapping" errors, likely due to rate limiting when 30+ jobs are kicked off nearly simultaneously.

I tested this fix on #3820, since the tests kept failing due to the `pixi` install action failing. After committing this change, [the actions ran successfully](https://github.com/snakemake/snakemake/actions/runs/20684893321). In [this failing run's](https://github.com/snakemake/snakemake/actions/runs/20682879051/job/59383597583#step:3:3261) debug logs we see:

```
pixi install -e py311
[...]
 WARN resolve_conda{group=py313 platform=win-64}: reqwest_retry::middleware: Retry attempt #1. Sleeping 1.225245051s before the next attempt
Error:   × failed to parse pypi name mapping
  ├─▶ error decoding response body
  ╰─▶ expected value at line 1 column 1
```

This warning is repeated many times until pixi finally stops retrying; this is what suggested to me that some sort of rate limit was the issue.

One downside is that this makes the CI take a bit longer to run. We could consider using the `cache` feature of the pixi action, turning up the max-parallel setting, or reducing the number of test groups.

### QC

* [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
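The retry-with-backoff behavior that pixi's middleware shows in the log above (increasing sleep between attempts, giving up after a limit) can be sketched in a few lines. The `fetch` callable and the exception type are hypothetical stand-ins for a rate-limited HTTP request.

```python
# Sketch: exponential backoff with jitter for a rate-limited request.
# `fetch` is a hypothetical callable that raises on a rate-limit response.
import random
import time

def with_backoff(fetch, attempts=5, base=1.0):
    for attempt in range(attempts):
        try:
            return fetch()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # out of retries, propagate the failure
            # Sleep base * 2**attempt seconds plus random jitter, mirroring
            # delays like "Sleeping 1.225245051s" in the log above.
            time.sleep(base * 2 ** attempt + random.random() * base)
```

Jitter spreads out retries from many concurrent jobs so they do not all hit the rate limit again at the same moment; reducing job fan-out (as this PR does) attacks the same problem from the other side.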
Going to close this since it's quite old and main has diverged so far.