
Research object support #1

Closed
johanneskoester wants to merge 37 commits into main from research-objects


@johanneskoester
Contributor

Plan

@albangaignard

Hi, I would be happy to contribute and provide source code on that topic (in particular provenance / PROV-O).

@johanneskoester
Contributor Author

@albangaignard your help is greatly appreciated. I have added a skeleton. Basically, one just needs to feed the information retrieved in the skeleton into the python-ro API.

@mdehollander
Contributor

This would be a nice addition to snakemake. How does this compare to cwlprov and dataprov?

@albangaignard

> This would be a nice addition to snakemake. How does this compare to cwlprov and dataprov?

Thanks very much for your feedback. This would be fully in line with the PROV profile of CWLprov (https://github.com/common-workflow-language/cwlprov/blob/main/prov.md). Regarding dataprov, the approach is interesting, but apparently it does not leverage a standard for representing provenance metadata such as the one proposed by the W3C (https://www.w3.org/TR/prov-primer/).
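For readers unfamiliar with PROV-O, here is a minimal sketch of the core pattern that profiles such as CWLprov build on: a job is a `prov:Activity`, its files are `prov:Entity` instances linked with `prov:used` and `prov:wasGeneratedBy`, and the engine is a `prov:SoftwareAgent`. The `ex:` namespace, the identifiers and the `job_to_turtle` helper are hypothetical; this is not the serialization produced by the PR.

```python
# Hypothetical sketch: one workflow job described with core W3C PROV-O terms.
# The ex: namespace, identifiers and function name are illustrative assumptions.

PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/run/> .\n"
)


def job_to_turtle(job_id, inputs, outputs, start, end, agent="snakemake"):
    """Return Turtle in which the job is a prov:Activity and its files are prov:Entity.

    Identifiers must be valid Turtle local names (no '/' and no trailing '.').
    """
    lines = [PREFIXES, f"ex:{agent} a prov:SoftwareAgent ."]
    lines.append(
        f"ex:{job_id} a prov:Activity ;\n"
        f'    prov:startedAtTime "{start}"^^xsd:dateTime ;\n'
        f'    prov:endedAtTime "{end}"^^xsd:dateTime ;\n'
        f"    prov:wasAssociatedWith ex:{agent} ."
    )
    for f in inputs:
        # The job consumed this file.
        lines.append(f"ex:{f} a prov:Entity .")
        lines.append(f"ex:{job_id} prov:used ex:{f} .")
    for f in outputs:
        # The file was produced by the job.
        lines.append(f"ex:{f} a prov:Entity ;\n    prov:wasGeneratedBy ex:{job_id} .")
    return "\n".join(lines) + "\n"
```

For example, `print(job_to_turtle("job1", ["reads"], ["alignment"], "2020-11-04T14:00:00Z", "2020-11-04T14:05:00Z"))` prints a small Turtle document that any PROV-aware RDF tool can load.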

@github-actions
Contributor

Please format your code with black: `black snakemake tests/*.py`.

@sonarqubecloud

sonarqubecloud bot commented Nov 4, 2020

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A), plus 8 Security Hotspots to review
  • Code Smells: 0 (rating A)
  • Coverage: no information
  • Duplication: 0.0%

@albangaignard albangaignard marked this pull request as ready for review November 4, 2020 14:20
@albangaignard

@johanneskoester, I recently reviewed the code quality based on the automatic checks (SonarCloud) and code formatting best practices (Black). Would you have time to review this pull request?

In summary:

  • --provenance option
  • provenance capture in the AbstractExecutor,
  • two provenance serializations (RDF and JSON) in the working directory
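The three pieces in the summary could fit together roughly as follows. This is a minimal sketch, not the PR's code: the `ProvenanceRecorder` class, its `job_finished` hook and the file names `provenance.json` / `provenance.nt` are hypothetical. It collects one record per finished job and writes the same facts once as PROV-JSON-style JSON and once as RDF N-Triples.

```python
import json
import os


class ProvenanceRecorder:
    """Hypothetical collector an executor could call after each finished job.

    Class name, hook name and output file names are assumptions for illustration.
    """

    EX = "http://example.org/run/"
    PROV = "http://www.w3.org/ns/prov#"
    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

    def __init__(self):
        self.jobs = []

    def job_finished(self, job_id, inputs, outputs, start, end):
        self.jobs.append(
            {"id": job_id, "inputs": list(inputs), "outputs": list(outputs),
             "start": start, "end": end}
        )

    def to_prov_json(self):
        """PROV-JSON-style dict with one bucket per PROV record type."""
        doc = {"prefix": {"ex": self.EX}, "activity": {}, "entity": {},
               "used": {}, "wasGeneratedBy": {}}
        for n, job in enumerate(self.jobs):
            act = f"ex:{job['id']}"
            doc["activity"][act] = {"prov:startTime": job["start"],
                                    "prov:endTime": job["end"]}
            for i, f in enumerate(job["inputs"]):
                doc["entity"].setdefault(f"ex:{f}", {})
                doc["used"][f"_:u{n}_{i}"] = {"prov:activity": act,
                                              "prov:entity": f"ex:{f}"}
            for i, f in enumerate(job["outputs"]):
                doc["entity"].setdefault(f"ex:{f}", {})
                doc["wasGeneratedBy"][f"_:g{n}_{i}"] = {"prov:entity": f"ex:{f}",
                                                        "prov:activity": act}
        return doc

    def to_ntriples(self):
        """The same facts as RDF N-Triples: one triple per line, full IRIs."""
        def iri(ns, name):
            return f"<{ns}{name}>"

        triples = []
        for job in self.jobs:
            act = iri(self.EX, job["id"])
            triples.append(f"{act} <{self.RDF_TYPE}> {iri(self.PROV, 'Activity')} .")
            for f in job["inputs"]:
                triples.append(f"{act} {iri(self.PROV, 'used')} {iri(self.EX, f)} .")
            for f in job["outputs"]:
                triples.append(f"{iri(self.EX, f)} {iri(self.PROV, 'wasGeneratedBy')} {act} .")
        return "\n".join(triples) + "\n"

    def write(self, workdir):
        """Write both serializations into the working directory."""
        json_path = os.path.join(workdir, "provenance.json")
        nt_path = os.path.join(workdir, "provenance.nt")
        with open(json_path, "w") as fh:
            json.dump(self.to_prov_json(), fh, indent=2)
        with open(nt_path, "w") as fh:
            fh.write(self.to_ntriples())
        return json_path, nt_path
```

Keeping both serializations as views over one in-memory list means the JSON and RDF outputs cannot drift apart.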

@sonarqubecloud

sonarqubecloud bot commented Aug 9, 2021

SonarCloud Quality Gate failed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 6 (rating E)
  • Code Smells: 0 (rating A)
  • Coverage: no information
  • Duplication: 0.0%

def workdir_entry(i, f):
location = "??inputs.input_files[{}].location??".format(i)
if f.is_directory:
if os.path.isdir(f):
Contributor Author


I don't understand this change. We use the `is_directory` property of `IOFile` here because the files may not be present yet, in which case `isdir` would not work.
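The point can be shown in isolation: `os.path.isdir` returns `False` for a path that does not exist yet, whereas a flag taken from the workflow declaration is known up front. In this sketch `PlannedFile` is a hypothetical stand-in for Snakemake's `IOFile`, and returning `"Directory"` / `"File"` assumes, as the `location` placeholder in `workdir_entry` suggests, that the entry is a CWL `File` or `Directory` object.

```python
import os
import tempfile


class PlannedFile(str):
    """Hypothetical stand-in for IOFile: a path plus an is_directory flag taken
    from the workflow declaration, known before anything exists on disk."""

    def __new__(cls, path, is_directory=False):
        obj = super().__new__(cls, path)
        obj.is_directory = is_directory
        return obj


def entry_class(f):
    # Decide from the declaration, not the filesystem, so this also works
    # for outputs that no job has created yet.
    return "Directory" if f.is_directory else "File"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        planned = PlannedFile(os.path.join(tmp, "results"), is_directory=True)
        # The directory does not exist yet, so the filesystem check says False.
        return os.path.isdir(planned), entry_class(planned)
```

`demo()` returns `(False, "Directory")`: the filesystem check misses the not-yet-created directory, while the declared flag classifies it correctly.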

from snakemake.logging import logger
from snakemake.stats import Stats
from snakemake.utils import format, Unformattable, makedirs
from snakemake.provenance_tracking.provenance import provenance_manager
Contributor Author


Since the provenance manager only has to be part of AbstractExecutor, I think we could avoid the singleton and just keep it in there instead.
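A minimal sketch of the suggested alternative, assuming the manager only needs to live as long as the executor: the executor owns an instance (optionally injected) instead of importing a module-level singleton. `ProvenanceManager` reuses the name from the import above, but both classes and the `job_finished` method are hypothetical stand-ins, not the PR's code.

```python
class ProvenanceManager:
    """Hypothetical provenance collector (name taken from the import in the diff)."""

    def __init__(self):
        self.records = []

    def record(self, job_id, outputs):
        self.records.append((job_id, tuple(outputs)))


class ExecutorSketch:
    """Hypothetical stand-in for AbstractExecutor that owns its manager
    instead of importing a shared module-level singleton."""

    def __init__(self, provenance=None):
        # Inject a manager (e.g. in tests) or create a fresh one per executor.
        self.provenance = provenance if provenance is not None else ProvenanceManager()

    def job_finished(self, job_id, outputs):
        self.provenance.record(job_id, outputs)
```

With ownership instead of a global, two executors (or two test cases) never share state by accident, and a test can still pass in its own manager to inspect what was recorded.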

# print(job.params['biotools_id'])
tool_name = ""
if "biotools_id" in job.params.keys():
tool_name = job.params["biotools_id"]
Contributor Author


Mhm, is it possible to extract the biotools_ids from the conda packages instead?
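One possible shape for that, sketched under explicit assumptions: parse the package names out of a conda environment's `dependencies` entries and look them up in a table mapping conda package names to bio.tools IDs. No such table ships with conda or Snakemake; the `CONDA_TO_BIOTOOLS` dict below is hypothetical and would need to be curated or built from recipe metadata (some Bioconda recipes list `biotools:` identifiers).

```python
import re

# Hypothetical lookup table: conda package name -> bio.tools ID.
CONDA_TO_BIOTOOLS = {"bwa": "bwa", "samtools": "samtools"}


def package_name(spec):
    """Extract the bare package name from a conda spec like 'bioconda::bwa=0.7.17'."""
    spec = spec.split("::")[-1]  # drop an optional channel prefix
    return re.split(r"[=<>!~\s]", spec, maxsplit=1)[0].strip()


def biotools_ids(dependencies):
    """Return the known bio.tools IDs for a list of conda dependency entries."""
    ids = []
    for dep in dependencies:
        if not isinstance(dep, str):  # skip nested entries such as {"pip": [...]}
            continue
        tool = CONDA_TO_BIOTOOLS.get(package_name(dep))
        if tool and tool not in ids:
            ids.append(tool)
    return ids
```

Packages without an entry (such as `python`) are simply skipped, so the result is only as complete as the mapping.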

input_id_list=job.input,
tool_name=tool_name,
job_uri=job.uri,
)
Contributor Author


As you know, Snakemake has its own metadata tracking. I wonder if it would make sense to export the research object from there instead of collecting the information here during execution. The advantage is that you would also get information about things that happened in previous runs. The flag would then become a post-hoc command, just like `--report`, instead of something the user has to remember to add when running the workflow. Also, multiple partial runs of the same workflow would not result in several separate provenance files.
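The post-hoc idea can be sketched as a merge over per-output metadata records: whichever run produced an output, its record contributes to one combined document. The record keys (`rule`, `input`, `starttime`, `endtime`) and the `export_post_hoc` function are hypothetical and only loosely modelled on the kind of per-output metadata Snakemake keeps; the real on-disk format is not reproduced here.

```python
def export_post_hoc(records):
    """Merge per-output metadata records from any number of (partial) runs
    into a single PROV-JSON-style document.

    `records` maps an output path to a dict with the hypothetical keys
    'rule', 'input', 'starttime' and 'endtime'.
    """
    doc = {"activity": {}, "entity": {}, "used": {}, "wasGeneratedBy": {}}
    seen_used = set()
    for n, (output, rec) in enumerate(sorted(records.items())):
        # Outputs of the same job share rule and start time, hence one activity.
        act = f"ex:{rec['rule']}_{rec['starttime']}"
        doc["activity"][act] = {"prov:startTime": rec["starttime"],
                                "prov:endTime": rec["endtime"]}
        doc["entity"][f"ex:{output}"] = {}
        doc["wasGeneratedBy"][f"_:g{n}"] = {"prov:entity": f"ex:{output}",
                                            "prov:activity": act}
        for inp in rec["input"]:
            doc["entity"].setdefault(f"ex:{inp}", {})
            # Record each (activity, input) pair once, even for multi-output jobs.
            if (act, inp) not in seen_used:
                seen_used.add((act, inp))
                doc["used"][f"_:u{len(seen_used)}"] = {"prov:activity": act,
                                                       "prov:entity": f"ex:{inp}"}
    return doc
```

Records from an earlier run (say, an `align` job) and a later partial run (a `sort` job consuming its output) end up in the same document, so the lineage spans both runs.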

@johanneskoester
Contributor Author

I am so sorry for the late response. This completely slipped my attention (I get so many Github notifications that I sometimes miss one). Nice work, please see my comments above.

DonFreed added a commit to DonFreed/snakemake that referenced this pull request Oct 12, 2021
- Incorporates @epruesse's fix for MRE snakemake#1
- Adds a fix for MRE snakemake#2 - properly marks group jobs as finished
- Some minor updates to tests
johanneskoester added a commit that referenced this pull request Oct 25, 2021
* add failing tests 823

* fix mistakes

* black

* Fix the first two MREs from #823.

- Incorporates @epruesse's fix for MRE #1
- Adds a fix for MRE #2 - properly marks group jobs as finished
- Some minor updates to tests

* Fix tests on Windows

* Skip MRE 2 from 823 on Windows due to `pipe()` output

Co-authored-by: Maarten-vd-Sande <maartenvandersande@hotmail.com>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
pvandyken referenced this pull request in pvandyken/snakemake Nov 15, 2021
@sonarqubecloud

SonarCloud Quality Gate failed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 6 (rating E)
  • Code Smells: 1 (rating A)
  • Coverage: no information
  • Duplication: 0.0%


cademirch added a commit that referenced this pull request Jan 5, 2026
The pixi install GitHub Action is failing with "failed to parse pypi name
mapping" errors, likely due to rate limiting when 30+ jobs are kicked off
nearly simultaneously.

I tested this fix on #3820 since the tests kept failing due to the
`pixi` install action failing. After committing this change, [the
actions ran
successfully](https://github.com/snakemake/snakemake/actions/runs/20684893321).

In [this failing
run's](https://github.com/snakemake/snakemake/actions/runs/20682879051/job/59383597583#step:3:3261)
debug logs we see:
```
pixi install -e py311
[...]
   WARN resolve_conda{group=py313 platform=win-64}: reqwest_retry::middleware: Retry attempt #1. Sleeping 1.225245051s before the next attempt
  Error:   × failed to parse pypi name mapping
    ├─▶ error decoding response body
    ╰─▶ expected value at line 1 column 1
```
This warning is repeated many times until pixi finally stops retrying,
which is what suggested to me that some sort of rate limit was the issue.


One downside is that this makes the CI take a bit longer to run. We
could consider using the `cache` feature of the pixi action and turning
up max-parallel, or reducing the number of test groups.
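Why fewer simultaneous jobs should help can be illustrated with a small simulation. The change appears to limit how many matrix jobs start at once (in GitHub Actions the relevant knob is `strategy.max-parallel`); the sketch below is only an analogy, with a thread pool standing in for CI runners and a short sleep standing in for the request to the name-mapping endpoint. It measures the peak number of requests in flight, which is what a rate-limited server sees.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(n_jobs, max_parallel):
    """Run n_jobs fake install tasks with at most max_parallel at once and
    return the highest number that were ever in flight simultaneously."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_install(_):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stands in for the request to the mapping endpoint
        with lock:
            state["active"] -= 1

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(fake_install, range(n_jobs)))
    return state["peak"]
```

With 30 jobs and a cap of 5, the peak can never exceed 5; the trade-off is the longer wall-clock time mentioned above, which caching would help offset.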


### QC

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).




## Summary by CodeRabbit

* **Chores**
* Updated development toolchain dependencies for improved build and test
infrastructure.

@cademirch
Contributor

Going to close this since it's quite old and main has diverged so far.

@cademirch cademirch closed this Mar 15, 2026

4 participants