Expand "samtools --version" with more details.#1371

Merged
daviesrob merged 1 commit into samtools:develop from jkbonfield:introspection
Feb 12, 2021

Conversation

@jkbonfield
Contributor

This uses the new htslib introspection functions to report build information for htslib and the plugins it supports.
I also added similar logic to the Samtools build, adding more variables to version.h (similar to HTSlib's new config_vars.h).

Finally, the main samtools usage now mentions the --version, --version-only and --help options, because the usage statement previously implied that no [options] could be given without specifying a command.

@jmarshall
Member

This is so voluminous that the version number has disappeared off the top of even a very tall terminal window!

Could consider leaving samtools --version as it is and putting all this in a new samtools version subcommand.

And/or making it less voluminous, e.g., I don't think there's any information in the HTS_FEATURE_FOO breakdown that's not in the feature string, hence

Samtools features: build=configure curses=yes 
HTSLib features:   build=configure plugins=yes, plugin-path=/Users/johnm/htslib:/Users/johnm/htslib-plugins: libcurl=yes S3=yes GCS=yes libdeflate=no lzma=yes bzip2=yes (0x601c03)

and similarly squeeze each plugin onto a single line, perhaps.
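As a rough check of that claim, the Python sketch below decodes the bitfield 0x601c03 from the example above back into the feature string. The bit positions are an assumption, chosen because they reproduce the example exactly; confirm them against the HTS_FEATURE_* macros in your own htslib's hts.h. Note that the plugin-path= part of the string has no counterpart in the bitfield, so the string carries slightly more information, not less.

```python
# Assumed bit positions, intended to mirror htslib's HTS_FEATURE_* macros
# (hts.h); check them against the headers you actually build with.
HTS_FEATURE_CONFIGURE = 1 << 0
YES_NO_FEATURES = [
    ("plugins", 1 << 1),
    ("libcurl", 1 << 10),
    ("S3", 1 << 11),
    ("GCS", 1 << 12),
    ("libdeflate", 1 << 20),
    ("lzma", 1 << 21),
    ("bzip2", 1 << 22),
]


def feature_string(bits):
    """Rebuild a feature string (minus plugin-path=) from a feature bitfield."""
    build = "configure" if bits & HTS_FEATURE_CONFIGURE else "Makefile"
    parts = [f"build={build}"]
    for name, mask in YES_NO_FEATURES:
        parts.append(f"{name}={'yes' if bits & mask else 'no'}")
    return " ".join(parts)


print(feature_string(0x601c03))
# build=configure plugins=yes libcurl=yes S3=yes GCS=yes libdeflate=no lzma=yes bzip2=yes
```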

@jkbonfield
Contributor Author

I did think about a samtools version command. In fact, that's what I was about to write when I discovered the semi-hidden samtools --version option. Even then, if it weren't for the existence of a samtools --version-only option, I'd still have gone down the subcommand route; now at least there is a significant difference between the two --version options, unlike before. I don't mind either way, though. Let's see what others think.

Yeah the plugin lines could probably be squashed. I'll have a play.

@jkbonfield
Contributor Author

How about this?

@ deskpro107386[samtools.../samtools]; ./samtools --version
Using htslib 1.11-86-gd9890a9-dirty
Copyright (C) 2021 Genome Research Ltd.

Samtools compilation details:
    CC:             gcc
    CPPFLAGS:       
    CFLAGS:         -g -Wall -O2
    LDFLAGS:        
    HTSDIR:         ../htslib
    LIBS:           
    CURSES_LIB:     -lcurses

Samtools feature string: build=Makefile curses=yes 

HTSlib compilation details:
    CC:             gcc
    CPPFLAGS:       -I/nfs/users/nfs_j/jkb/ftp/compression/libdeflate
    CFLAGS:         -Wall -g -O2 -fvisibility=hidden
    LDFLAGS:        -L/nfs/users/nfs_j/jkb/ftp/compression/libdeflate -Wl,-R/nfs/users/nfs_j/jkb/ftp/compression/libdeflate -fvisibility=hidden 

HTSLib feature string: build=configure plugins=no libcurl=yes S3=no GCS=yes libdeflate=yes lzma=yes bzip2=yes 

HTSlib plugins present:
    built-in:	 preload, data, file
    Google Cloud Storage:	 gs+http, gs+https, gs
    libcurl:	 imaps, pop3, http, smb, gopher, ftps, imap, smtp, smtps, rtsp, ftp, telnet, rtmp, ldap, https, ldaps, tftp, pop3s, smbs, dict
    crypt4gh-needed:	 crypt4gh
    mem:	 mem
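The one-line-per-plugin layout above amounts to grouping URL schemes by the handler that provides them. A minimal Python sketch of that grouping follows. It assumes the (scheme, handler) pairs have already been fetched from HTSlib (in C, I believe through the introspection functions hfile_list_plugins() and hfile_list_schemes() declared in hfile.h, but check your htslib version) and simply hard-codes a few pairs from the output above.

```python
def format_handlers(pairs):
    """Group (scheme, handler) pairs into one indented line per handler.

    Handlers and schemes keep their first-seen order (dicts preserve
    insertion order in Python 3.7+).
    """
    groups = {}
    for scheme, handler in pairs:
        groups.setdefault(handler, []).append(scheme)
    return "\n".join(
        f"    {handler}:\t {', '.join(schemes)}" for handler, schemes in groups.items()
    )


# A few (scheme, handler) pairs taken from the output above.
example = [
    ("preload", "built-in"), ("data", "built-in"), ("file", "built-in"),
    ("gs+http", "Google Cloud Storage"), ("gs", "Google Cloud Storage"),
    ("mem", "mem"),
]
print(format_handlers(example))
```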

Note also that samtools --version is kind of pointless currently as it reports nothing that you don't get by just typing samtools.

I do think a subcommand may be more obvious though (and I can remove the extra usage line reporting the various --options then).

@jmarshall
Member

jmarshall commented Feb 5, 2021

… I discovered the semi-hidden samtools --version option …

--help/--version[-only] are described in the man page. They were previously left off the usage on the basis that they (the first two anyway) should be obvious, but yeah probably nothing is obvious these days 🤷

Note also that samtools --version is kind of pointless currently [as plain samtools shows the same info]

The --version option is a well-known convention that outputs the version information in a well-known format. Some packagers (hello @tseemann 😄) will raise issues / send pull requests when tools do not have a --version option, and you can see on biostars etc that users are familiar with it.

That's compact enough now that IMHO folding it into the --version option wouldn't interfere with the well-known output. You could save another couple of lines by putting the features under the compilation details, e.g.:

Samtools compilation details:
    Features:       build=Makefile curses=yes 
    CC:             gcc
    [etc]

and for users I would suggest replacing “HTSlib plugins present” with text like “URL schemes understood by HTSlib” (especially as saying “plugins” may be inaccurate, e.g. with --disable-plugins, where the htslib features line has just printed plugins=no).

@jkbonfield
Contributor Author

I think I prefer samtools version as a subcommand, but I could make the --version call the same function if you like.

Regarding plugins, I'm calling it that as it's what htslib calls it itself. I don't know what the intention was, but given that we called it plugins, I always assumed it was originally intended to be a general-purpose plugin system and not limited purely to transport layers. Eg maybe a company with its own custom file type could ship this as an htslib plugin. AFAIK there's nothing to stop that. It just needs a prefix: eg "zam:foo.zam" and register zam. It's not quite transparent syntax, but close. We could trivially add some logic to the library so that a filename with an unknown ".suffix", but with a registered "suffix" plugin, automatically uses that plugin.
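The suffix fallback suggested here could look something like the toy model below. It is not HTSlib's dispatch code: the registry, its contents and the "zam" handler are all hypothetical. An explicit "scheme:" prefix wins if that scheme is registered; otherwise the proposed fallback looks up the file's extension; failing both, the name is treated as a plain local file.

```python
import os

# Hypothetical registry mapping scheme names to handler names.
REGISTRY = {"file": "built-in", "https": "libcurl", "zam": "zam-plugin"}


def choose_handler(filename, registry=REGISTRY):
    """Pick a handler by 'scheme:' prefix, falling back to the file suffix."""
    scheme, sep, _rest = filename.partition(":")
    if sep and scheme in registry:
        return registry[scheme]
    # Proposed (hypothetical) fallback: an otherwise-unknown ".suffix"
    # with a registered "suffix" handler uses that handler.
    suffix = os.path.splitext(filename)[1].lstrip(".")
    if suffix in registry:
        return registry[suffix]
    return registry["file"]  # plain local file


for name in ("zam:foo.zam", "foo.zam", "https://example.org/x.bam", "reads.bam"):
    print(name, "->", choose_handler(name))
```

The ordering is the design choice that matters: an explicit prefix always beats a guess from the extension, so existing "scheme:" URLs behave exactly as before.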

@jkbonfield
Copy link
Copy Markdown
Contributor Author

jkbonfield commented Feb 5, 2021

Ok I went with making --version the same, and squashed it together.

Edit: I left plugins as-is. If we wish to start calling them URL schemes, we need to change a lot more than this as it's documented in the various install files and also in samtools.1 description of HTS_PATH. We should be consistent.

We can follow this up with a separate PR to change the language in all places if we wish.

@jkbonfield jkbonfield force-pushed the introspection branch 3 times, most recently from 5e7c17e to 31d61a6 on February 5, 2021 12:36
@jmarshall
Member

jmarshall commented Feb 5, 2021

I thought you meant --version and version (full version subcommand) sharing a function like sam_view.c's usage() does for help and -? full help. I'm happy either way; curious if others have an opinion.

Also aargh tabs in test.pl — I feel your pain 😄


The intention was that HTSlib's plugin system would be a general-purpose system that could have several attachment points within htslib. The first attachment point implemented (and so far the only one) is the file transport one, which loads plugins matching hfile_*.{so,bundle} from HTS_PATH using the hFILE_plugin struct and enables dispatching based on filename URL schemes. (This is what the prefix parameter to hts_path_itr_setup() is for.) I anticipated the next one implemented might load compress_*.{so,bundle} from HTS_PATH using its own struct and be used to encapsulate e.g. libdeflate and maybe bz2 and lzma.
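The file-name side of that plugin loading can be sketched as a filter over each search-path directory, keeping entries that match hfile_*.so or hfile_*.bundle. The sketch works on a hypothetical in-memory directory listing so that it runs anywhere, and it deliberately ignores how HTSlib then loads each library and which entry point it calls.

```python
import fnmatch

# Name patterns taken from the description above.
PLUGIN_PATTERNS = ("hfile_*.so", "hfile_*.bundle")


def find_plugins(search_path, listings):
    """Return (directory, filename) pairs for hFILE plugin candidates.

    search_path: directories in search order.
    listings: hypothetical mapping of directory -> list of filenames.
    """
    found = []
    for directory in search_path:
        for name in listings.get(directory, []):
            if any(fnmatch.fnmatchcase(name, pat) for pat in PLUGIN_PATTERNS):
                found.append((directory, name))
    return found


listings = {"/opt/a": ["hfile_gcs.so", "libhts.so", "hfile_s3.bundle", "README"]}
print(find_plugins(["/opt/a"], listings))
```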

As for custom file formats, that's been shoe-horned into being done via a pseudo URL scheme for them in a couple of cases: see crypt4gh: and cip:. But these are easy as they are really just encrypted versions of existing htslib-supported formats. It would be tedious to shoehorn a real custom file format in via the existing hFILE plugin endpoint, because you'd have to implement a read() backend method that presented your file contents translated into a BAM or BCF (or etc) stream. The sensible way to do it would be to define a new file format attachment point that enabled a different category of plugin to affect hts_detect_format() and the high-level I/O routines like sam_read1() and suffix handling like sam_open_mode() etc.

Regarding plugins, I'm calling it that as it's what htslib calls it itself.

So, to be precise, htslib calls these “hFILE plugins”. So HTSlib hFILE plugins present would be more accurate but still a bit opaque to users.

Eg maybe a company with its own custom file type could ship this as an htslib plugin. AFAIK there's nothing to stop that. It just needs a prefix: eg "zam:foo.zam" and register zam.

Yes — a pseudo URL scheme called zam:. Hence the user-meaningful URL schemes understood by HTSlib suggestion, but YMMV.

Expand "samtools --version" with more details and added a synonym of
"samtools version" too so it's simpler to list in the usage.

Also added the missing help subcommand in the usage list.  Note this
isn't exactly the same as samtools without args, as that is an error,
while this reports to stdout.
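The routing described in this commit message can be sketched as a toy dispatcher. This is not samtools' actual main.c; the function names and messages are hypothetical, and only the routing is meant to match the description: --version and version share one function, an explicit help request goes to stdout with a success status, and running with no command is an error reported on stderr.

```python
import io
import sys


def print_usage(out):
    # Hypothetical placeholder text, not samtools' real usage message.
    out.write("Usage: samtools <command> [options]\n")


def print_version(out):
    # Hypothetical placeholder for the full version/build report.
    out.write("samtools version report\n")


def dispatch(argv, stdout=None, stderr=None):
    """Route argv[1] and return an exit status (toy model of the top level)."""
    stdout = stdout if stdout is not None else sys.stdout
    stderr = stderr if stderr is not None else sys.stderr
    if len(argv) < 2:
        print_usage(stderr)  # no command at all: an error, usage on stderr
        return 1
    command = argv[1]
    if command in ("--version", "version"):
        print_version(stdout)  # option and subcommand share one function
        return 0
    if command in ("help", "--help"):
        print_usage(stdout)  # explicitly requested help: stdout, success
        return 0
    stderr.write(f"unrecognized command '{command}'\n")
    return 1


out, err = io.StringIO(), io.StringIO()
print(dispatch(["samtools"], out, err), repr(out.getvalue()), repr(err.getvalue()))
```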
@daviesrob
Member

Rebased to get rid of the merge conflict.

@daviesrob daviesrob merged commit 32de3e4 into samtools:develop Feb 12, 2021
@jmarshall
Member

jmarshall commented Feb 12, 2021

I just saw this previous comment from James:

Edit: I left plugins as-is. If we wish to start calling them URL schemes, we need to change a lot more than this as it's documented in the various install files and also in samtools.1 description of HTS_PATH. We should be consistent.

That's a red herring IMHO. The install files and especially the HTS_PATH documentation talk about plugins because they are about plugins as physically separate object files on the filesystem. In particular, it doesn't apply to the built-in scheme handlers. (Conversely, if there were any other kinds of plugins, it would be about them too.)

But this part of the samtools capabilities report is about samtools/htslib's file access capabilities. That's a different angle, and from the users' point of view is really about what URL schemes are understood. In particular, this includes the built-in ones, which aren't plugins.

@daviesrob
Member

Hmm, possibly the text could be tweaked to say "handler" instead of "plugin"?

@jmarshall
Member

jmarshall commented Feb 12, 2021

[Sorry, I had already composed the previous comment when you hit merge, so I posted it anyway.]

So, to be precise, htslib calls these “hFILE plugins”. So HTSlib hFILE plugins present would be more accurate but still a bit opaque to users.

It's all quite minor and angels-on-a-pinhead-esque 😄, but the three technical problems with HTSlib plugins present are that it would become wrong if HTSlib gained some non-hFILE plugins, that the built-in handlers are not plugins, and that your HTSlib might have plugins disabled entirely. So HTSlib hFILE handlers present would indeed solve all those problems.

But IMHO the real (albeit technologically minor) problem is that this text is here to be read by samtools users. And plugins, handlers, hFILE, etc are all terms that are meaningless to samtools users. Hence the suggestion to use URL schemes as a shorthand for “Kinds of remote files that can be accessed” — which is what hFILE plugins/handlers is about from the user's perspective. URL scheme is still a technical term, but at least it's an accurate general term that people can look up.

@jkbonfield
Contributor Author

jkbonfield commented Feb 12, 2021

I'm not saying that plugins is the correct term, but I stand by my comment earlier.

If we feel that plugins is a confusing term to the users, then it's more than just this which needs fixing. E.g., from the man page:

HTS_PATH
              A colon-separated list of directories in which to search for
              HTSlib plugins. If $HTS_PATH starts or ends with a colon or
              contains a double colon (::), the built-in list of directories
              is searched at that point in the search.
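The search-order rule in that excerpt is easy to get subtly wrong, so here is a small Python sketch of it. It treats every empty component produced by splitting on colons (from a leading or trailing colon, or from ::) as "insert the built-in directories here". The built-in directory in the example is a made-up placeholder, and treating an unset HTS_PATH as "built-in list only" is an assumption, not something the excerpt states.

```python
def expand_hts_path(hts_path, builtin):
    """Expand an HTS_PATH-style string into an ordered list of directories.

    Empty components (leading/trailing colon, or '::') are replaced by the
    built-in directory list at that position in the search order.
    """
    if hts_path is None:  # assumption: unset means built-in list only
        return list(builtin)
    dirs = []
    for component in hts_path.split(":"):
        if component:
            dirs.append(component)
        else:
            dirs.extend(builtin)
    return dirs


BUILTIN = ["/usr/local/libexec/htslib"]  # hypothetical built-in directory
print(expand_hts_path("/home/me/plugins:", BUILTIN))
```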

I know many of the built-in URL handlers aren't plugins by default, but many can also become plugins (and indeed forcibly will if plugins are enabled, which IMO is a poor choice as I'd like the ability to support plugins without punting everything out to them).

If you wish to rename the version information, then I'd suggest a PR with man page updates too to clarify things (rather than to use "terms that are meaningless to samtools users").

@jmarshall
Member

I started writing a sentence about that, but deleted it because I hoped we wouldn't need to discuss it.

There are different categories of users. “Plugins” is confusing in some contexts for some users some of the time, but not in other contexts at other times.

If you're reading about HTS_PATH, then you're using an --enable-plugins build and you are interested in plugins and probably have some that you're trying to get HTSlib to use. (And in the possible glorious future when there are other kinds of plugins that aren't about remote file access, they'll naturally be in HTS_PATH too. This Venn diagram is not one superimposed circle.)

If you're reading this samtools --version output, you might be unaware of all this. It might be taken care of for you by your sysadmin or your distro packagers. Or you might be using a default or --disable-plugins HTSlib build. For the purposes of this particular samtools --version output, my contention is that plugins is a red herring and what most users reading it are really interested in is what remote access methods are available.

But, in the end, whatevs 😄

@jkbonfield
Contributor Author

Ok, how about a mix then like "URL schemes (plugins)"?

I just think it's confusing to use two different terms for the same thing and may lead people to assume that plugins are not the same as URL schemes when they clearly are.
