planemo lint --urls false positive from pmid:12345678 #573

@peterjc

Linting this file https://github.com/peterjc/pico_galaxy/blob/801daf8dc7932a087eb83a96d0be1e99ed0447c3/tools/chromosome_diagram/chromosome_diagram.xml is failing:

$ planemo --version
planemo, version 0.33.0.dev0
$ planemo shed_lint --tools --urls tools/chromosome_diagram/ ; echo "Return code $?"
(snip)
Applying linter tool_urls... FAIL
.. ERROR: URL Error <urlopen error unknown url type: pmid> accessing pmid:19304878
.. INFO: URL OK http://dx.doi.org/10.1093/bioinformatics/btp163
Failed linting
Return code 1

Or, with plain planemo lint:

$ planemo lint --urls tools/chromosome_diagram/ ; echo "Return code $?"
...
Applying linter tool_urls... FAIL
.. ERROR: URL Error <urlopen error unknown url type: pmid> accessing pmid:19304878
.. INFO: URL OK http://dx.doi.org/10.1093/bioinformatics/btp163
Failed linting
Return code 1
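The ERROR line has the format of Python's urllib.error.URLError. The sketch below assumes the linter passes each matched string straight to urllib.request.urlopen, which is not confirmed by the output above. It reproduces the same message without any network access, because urlopen has no handler for a "pmid" scheme. The helper name describe_url_error is hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlopen


def describe_url_error(url):
    """Hypothetical helper: return the URLError text urlopen raises, else None."""
    try:
        # Unknown schemes are rejected locally by urllib's UnknownHandler.
        with urlopen(url):
            pass
    except URLError as err:
        return str(err)
    return None


print(describe_url_error("pmid:19304878"))
# <urlopen error unknown url type: pmid>
```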

This is triggered by the RST help text in the tool XML file:

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

https://github.com/peterjc/pico_galaxy/blob/801daf8dc7932a087eb83a96d0be1e99ed0447c3/tools/chromosome_diagram/chromosome_diagram.xml#L75

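It appears pmid:19304878 is wrongly being picked up as a URL despite not having a double slash after the colon. The scheme alternative at the start of the pattern below is [a-z][\w-]+:(?:/{1,3}|[a-z0-9%]), so the colon may be followed by a single letter, digit or % instead of slashes, which the digits in pmid:19304878 satisfy: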

# http://stackoverflow.com/questions/7676255/find-and-replace-urls-in-a-block-of-te

>>> import re
>>> HTTP_REGEX_PATTERN = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>\[\]]+|\(([^\s()<>\[\]]+|(\([^\s()<>\[\]]+\)))*\))+(?:\(([^\s()<>\[\]]+|(\([^\s()<>\[\]]+\)))*\)|[^\s`!(){};:'".,<>?\[\]]))""")
>>> HTTP_REGEX_PATTERN.findall("\nSee pmid:12345678 for details.")
[('pmid:12345678', '', '', '', '')]
>>> HTTP_REGEX_PATTERN.findall("\nSee http://example.org or pmid:12345678.")
[('http://example.org', '', '', '', ''), ('pmid:12345678', '', '', '', '')]
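One possible workaround, sketched here as an assumption rather than how planemo actually fixed this: keep the same pattern, but drop any match whose scheme is not worth fetching. CHECKABLE_SCHEMES, SCHEME_PREFIX and find_checkable_urls are hypothetical names, and the allow-list contents are a guess. Scheme-less matches such as www.example.org are kept unchanged.

```python
import re

# Stack Overflow pattern from above, on a single line.
HTTP_REGEX_PATTERN = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>\[\]]+|\(([^\s()<>\[\]]+|(\([^\s()<>\[\]]+\)))*\))+(?:\(([^\s()<>\[\]]+|(\([^\s()<>\[\]]+\)))*\)|[^\s`!(){};:'".,<>?\[\]]))""")

# Hypothetical allow-list of schemes worth checking over the network.
CHECKABLE_SCHEMES = {"http", "https", "ftp"}

# Same scheme syntax as the first alternative in HTTP_REGEX_PATTERN.
SCHEME_PREFIX = re.compile(r"(?i)([a-z][\w-]+):")


def find_checkable_urls(text):
    """Hypothetical helper: URL-like matches that are worth fetching.

    Matches without a scheme (e.g. "www.example.org") are kept; matches
    whose scheme is not in CHECKABLE_SCHEMES (e.g. "pmid:19304878") are
    dropped instead of being reported as URL errors.
    """
    urls = []
    for groups in HTTP_REGEX_PATTERN.findall(text):
        candidate = groups[0]  # group 1 spans the whole match
        prefix = SCHEME_PREFIX.match(candidate)
        if prefix and prefix.group(1).lower() not in CHECKABLE_SCHEMES:
            continue
        urls.append(candidate)
    return urls


help_text = "http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878."
print(find_checkable_urls(help_text))
# ['http://dx.doi.org/10.1093/bioinformatics/btp163']
```

Reusing the pattern's own scheme syntax keeps the filter consistent with what the regex already treats as a scheme, so identifiers like pmid: or doi: are skipped rather than fetched.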
