samtools view update to v1.9#1963
Conversation
| #end if | ||
|
|
||
| ## if data is converted from an unsorted file (SAM, CRAM, or unsorted BAM) to BAM | ||
| ## then sort the output by coordinate, |
There was a problem hiding this comment.
not so sure about the auto sorting... maybe make it more explicit by asking the user to choose from SAM, CRAM (maybe sort versions of cram - I don't use them?), unsorted.bam, qname_sorted.bam, indexed bam (coord sorted). With the default being indexed bam?
There was a problem hiding this comment.
So you mean that we need to remove: or $input.metadata.sort_order == 'queryname'
Otherwise I hope that its correct. The main point is that we need to coordinate sort if the output is bam (by assumptions of Galaxys bam data type). I just check if the input is sorted .. because then we can skip this. Just not sure if the metadata of the bam data type is set correctly.
There was a problem hiding this comment.
I think user should specify what they want for an output format (or choose the default -which i think shoudl be bam). Since the output format implies the sort order, think that alone shoud determine whether or not to sort (e.g. sort for at least bam, maybe cram too? )
|
Commendable work on these important tools! I'll respond to your discussion points in order:
|
Agreed. What would be the IUC way to remove slice? Delete it? Or mark it deprecated..? How?
Jep. Can someone answer if this works? Otherwise I would suggest to increase the version slowly, such that samtools version will catch up eventually.
OK.
OK. Do we need to add them explicitely? Or are they implicitely allowed anyway, because of the data types class structure?
OK
OK
I will check
|
| @@ -0,0 +1,438 @@ | |||
| <tool id="samtools_view" name="Samtools view" version="2.0.1"> | |||
This doesn't apply here, since this is a new tool. Any way >>> from packaging.version import parse
>>> parse('2.0') > parse('1.7+g')
True |
|
Somehow I have a problem with planemo on my local install (only this branch): All tests fail immediately, I just get: I also tried to reset my venv and planemo configuration directories. I'm completely stuck on this one. Any ideas? |
- v1.8 - `-h` per default - macros use CDATA - checking of CRAM index
|
got planemo running again (problem was that I linked to the test data of the converters). my todos are done. the remaining points are open questions. |
|
There is a merge conflict here @bernt-matthias. |
- added PREPIDX_MULTIPLE (needed by depth) - removed no-chrom-options (only needed by slice which is to be deprecated) - removed same option for outtype - tests w bam output now work wo sim_size
|
I think I would not include the different BAM datatypes here. My intuition is we should keep the tools simple in this regard and offer one sort tool that can create the other datatypes if needed. Also please note, that Galaxy ships by default with a few converters, so BAM to CRAM and so on should be already in Galaxy. |
|
@bgruening : I value simplicity too, but in this case I wonder whether the cost of duplicating the potentially very large data files for a format conversion is worth a bit more code complexity. I don't think having a list of 4 output types (sam, bam, cram, count) is much more complex for users than "bam (coordinate sorted), cram (coordinate sorted), sam, unsorted.bam, qname_input_sorted.bam, qname_sorted.bam, unsorted.bam,count" Or maybe you were referring to the input formats? I think there is even less user complexity if that's your concern. |
|
@bwlang has a point with the data duplication, but users can decide to delete intermediate data (in workflows you can do this automatically). |
- only bam output triggers sort (if not sorted already)
|
agreed - also, these other bam formats are proably not typically used much... |
|
@bernt-matthias do you mind if I rebase this against the new macros.xml file and address my last comment ? |
|
I merged the macros.xml file using the github frontend and commited an update using nicer names for the new macros. |
|
thanks, looks good. what do you think about #1963 (comment) ? I'm a little worried about users generating CRAM files and then being disappointed that they don't work when downloading them from Galaxy. |
You have a point with this. But this might be treated by $reffa!=None which adds |
|
It seems that only the md5 is stored in CRAM. https://www.htslib.org/workflow/#mapping_to_cram : |
|
@mvdbeek took some time to provide a somewhat useful help. |
|
We still need a .shed.yml for samtools view. |
|
@bernt-matthias could you add the .shed.yml file ? |
|
Done |
This updates samtools view to v1.9
Discussion:
change_format deprecated?