Skip to content

Removed knet.#1200

Merged
daviesrob merged 4 commits intosamtools:developfrom
jkbonfield:knet_nuke
Jan 7, 2021
Merged

Removed knet.#1200
daviesrob merged 4 commits intosamtools:developfrom
jkbonfield:knet_nuke

Conversation

@jkbonfield
Copy link
Copy Markdown
Contributor

Recent bug reports [personal communication] in knet is the straw that broke the camels back. I've been wanting to do this for years though!

There's little benefit to keeping it (basically FTP and unsecured http which has largely gone out of fashion), while curl is already required for htsget, refget (CRAM ref cache), AWS S3 and GCS protocols. Curl is available on all platforms we support.

INSTALL Outdated
Comment on lines +44 to +47
Using libcurl provides HTSlib withnetwork protocol support, for
example it enables the use of ftp and https:// URLs. It is also
required if direct access to Amazon S3 or Google Cloud Storage is
enabled.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typo (missing space), and suggested spelling out all the important URL schemes:

Suggested change
Using libcurl provides HTSlib withnetwork protocol support, for
example it enables the use of ftp and https:// URLs. It is also
required if direct access to Amazon S3 or Google Cloud Storage is
enabled.
Using libcurl provides HTSlib with network protocol support, for example it
enables the use of ftp://, http://, and https:// URLs. It is also required
if direct access to Amazon S3 or Google Cloud Storage is enabled.

Copy link
Copy Markdown
Contributor Author

@jkbonfield jkbonfield Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd. I checked the original and the missing space is actually an entire missing " better " word. Must have been some odd finger slippage. [Edit: ignore me - clearly still befuddled!]

Anyway thanks for the updates. I'll amend.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing “better“ here was clearly the right thing to do 😄 Now I'm off to get my third coffee of the morning, to fend off my own befuddlement…

Makefile Outdated
Comment on lines +536 to +537
# Filter out macros in htslib/hts_defs.h that aren't needed on non-Windows
# platforms.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suspect “that aren't needed on non-Windows platforms” pertained only to knet_win32_ and can disappear too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure of that and ummed and ahhed over it, but happy to remove. I did start to edit the associated perl script, but figured it's harmless for it to check for functions that don't exist and I'm more likely to break it in doing so. It's perhaps still something that should be changed though.

hfile_add_scheme_handler("data", &data);
hfile_add_scheme_handler("file", &file);
hfile_add_scheme_handler("preload", &preload);
init_add_plugin(NULL, hfile_plugin_init_net, "knetfile");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nuke the hfile_plugin_init_net declaration from hfile_internal.h.

@jmarshall
Copy link
Copy Markdown
Member

jmarshall commented Jan 5, 2021

Huzzah. This also deserves a NEWS entry, and there are some knetfile/knet_win32 references in test/header_syms.pl and test/maintainer/check_spaces.pl that could go.

I guess some will consider the removal of htslib/knetfile.h and the functions declared there to be a removal of vital public API functions. Pedantically correct, but… sigh… noone should be using these functions directly.

So either this perhaps warrants a soversion bump, or perhaps as a transition before removal during a later soversion bump in a future release, replace the implementations of knet_open() etc with dummy versions that just call through to hopen() (and hence newly fail if libcurl was not configured — tough!) and have them maintain knetFile::offset and maybe knetFile::fd I guess (for the sake of the knet_tell(), knet_fileno() macros)…

@jkbonfield
Copy link
Copy Markdown
Contributor Author

That's a good point - knetfile.h was a public header.

Maybe it should have stub functions (static inlines?).

@jmarshall
Copy link
Copy Markdown
Member

Annoyingly, in the absence of a soversion bump they would need to be real symbols in libhts.so.

Which would mean keeping this PR as a draft to be applied in the eventual happy future, and making a new smaller one that basically keeps the infrastructure but replaces all the function bodies within knetfile.c. Alas…

@jkbonfield
Copy link
Copy Markdown
Contributor Author

It appears to be trivial to get most of the ABI to still exist, eg:

typedef struct knetFile_s knetFile;
knetFile *knet_open(const char *fn, const char *mode) {
    return (knetFile *)hopen(fn, mode);
}

knetFile *knet_dopen(int fd, const char *mode) {
    return (knetFile *)hdopen(fd, mode);
}

ssize_t knet_read(knetFile *fp, void *buf, size_t len) {
    return hread((hFILE *)fp, buf, len);
}

off_t knet_seek(knetFile *fp, off_t off, int whence) {
    return hseek((hFILE *)fp, off, whence);
}

int knet_close(knetFile *fp) {
    return hclose((hFILE *)fp);
}

API though is a bit less clear due to the hfile_fileno macro. Frankly it looks to be never used outside of ancient copies of bcftools. The macro for knet_tell would need changing too, which is the fly in the ointment as far as ABI goes. It's trivial to retain API compatibility by simply changing it to htell. I could keep track of offset internally of course, but that means allocating new structures etc. Pondering...

@jkbonfield
Copy link
Copy Markdown
Contributor Author

I guess some will consider the removal of htslib/knetfile.h and the functions declared there to be a removal of vital public API functions. Pedantically correct, but… sigh… noone should be using these functions directly.

They appear to be endemic in older bgzf implementations. E.g. see

https://github.com/search?q=knet_dopen+NOT+filename%3Aknetfile.h+NOT+filename%3Akurl.h+NOT+filename%3Amain.c+NOT+filename%3Abgzf.h+NOT+filename%3Aknetfile.c&type=code

However the packages which copied in their own internal copy of bgzf.c have mostly also copied in knet.c (why would you copy one without the other if it has a dependency?). The only package I saw that didn't was EMBOSS, but then the bgzf.c is #ifdefed around _USE_KNETFILE which wouldn't be set anyway.

This cargo-cult meme really makes github searching a royal PITA! It also means people won't get fixes and "problems" are endemic across bioinformatics. :(

I couldn't find much else in usage though. That, kurl obviously, and ancient bcftools copies (again if you've gone to the effort of bundling the entire bcftools source into your package then you'll have also bundled the rest of it too).

@daviesrob
Copy link
Copy Markdown
Member

Simplest solution might be to remove its ability to read anything other than local files, which would make the implementation trivial, and also mark the whole thing as deprecated using HTS_DEPRECATED. Or possibly even refuse to open files at all, which would make the stub even simpler.

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Ok, so this second commit undoes some things in the first, so it needs squashing, but is left as-is for now so reviewers can see the history.

  • htslib/knetfile.h now exists again for API compatibility. I'm still open to removing this (it's not needed by any of this code), which will remove knet from the API without affecting ABI. It depends on whether we wish to give warning of impending demise or not. (It has HTS_DEPRECATED markers).
  • hfile.c now has a dummy set of knet_* functions which are simply wrappers around hopen/hread etc. They manually track offset so the knet_tell macro still works (ABI compat). NB: it doesn't include knetfile.h so it's just a hacked duplicated. Perhaps I could move the hf structure member to in-situ of another char* so the structure size doesn't change incase people are using arrays of knetFiles, but I suspect that's unnecessary.
  • The windows knet functions have gone, but they weren't listed using HTS_EXPORT so I assume they were not publically callable anyway.
  • The knet_fileno macro is hideous to implement for curl (not sure even if possible), but the only uses I could find of it were immediately followed by fstat so I made it work for local files only and it returns -1 otherwise.

I've validated this using this tiny program:

#include <stdio.h>
#include "htslib/knetfile.h"

int main(int argc, char **argv) {
    char dat[100];

    knetFile *fp = knet_open(argv[1], "r");
    if (!fp) {
	perror(argv[1]);
	return 1;
    }
    knet_seek(fp, 15, SEEK_SET);
    int x = knet_read(fp, dat, 50);
    printf("FD %d: Read '%.*s'\n", knet_fileno(fp), x, dat);
    printf("Now at pos %ld\n", (long)knet_tell(fp));
    return knet_close(fp);
}

I built this against htslib 1.11 headers, and have switched dynamic library by use of LD_LIBRARY_PATH. I can confirm it works as expected.

Copy link
Copy Markdown
Member

@daviesrob daviesrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, apart from a few minor issues.

The only stray reference to knet I could find was in a comment in hfile_libcurl.c. It might be a good idea to remove that as well; the comment will still make sense without it.

Could you add a NEWS item, please?

hfile.c Outdated

HTSLIB_EXPORT
knetFile *knet_open(const char *fn, const char *mode) {
knetFile *fp = malloc(sizeof(*fp));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be safer to use calloc here.

hfile.c Outdated
HTSLIB_EXPORT
off_t knet_seek(knetFile *fp, off_t off, int whence) {
off_t r = hseek(fp->hf, off, whence);
if (r > 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= 0 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so.


HTSLIB_EXPORT
knetFile *knet_open(const char *fn, const char *mode);
knetFile *knet_open(const char *fn, const char *mode) lHTS_DEPRECATED("Please use hopen instead");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A stray l has appeared here, so this header currently doesn't compile.

There's little benefit to keeping it (basically FTP and unsecured
http), while curl is already required for htsget, refget (CRAM ref
cache), AWS S3 and GCS protocols.  Curl is available on all platforms
we support.
Plus include stub knet functions for ABI compatibility.

The knetfile.h has HTS_DEPRECATED markers for the knet_* functions, so
anyone compiling against the new code will get a warning that we
intend to delete these in the future.  It's debateable though whether
it should exist at all.  We could just remove this header right now
and break API, but retain the previous ABI.

The stub copy in knetFile is now divorced knetfile.[ch] and has its
own implementation which simply wraps up hFILE instead.

The knet_tell macro does direct access of knetFile::offset so we
cannot simply wrap up htell as we'd like.  Similar knet_fileno
directly queries knetFile::fd.  However the only use I've found of
this was in old copies of bgzf and bcftools main which then did fstat
on it so it's only filled out for local files.  (That's good because
getting it out of curl isn't supported by our code.)
@jkbonfield
Copy link
Copy Markdown
Contributor Author

Thanks. Don't know what went on with that stray "l". I previously had commented out the deprecated lines as it made my testing a pain. I guess I finger slipped removing the ;//.

Comment in libcurl.c has been nuked. I left it there before because I wasn't sure which bit to delete (is the RFC comment for knet or for the whole idea of full pathnames in general?), but I'm sure curl must be pretty compliant and if not I doubt we need to be spelling it out.

@daviesrob
Copy link
Copy Markdown
Member

The RFC comment was about using full pathnames. RFC1738 says you should CWD through each directory, curl did by default but knet did not, which made curl quite a bit slower in comparison.

Git blame suggests you added the original comment in commit 5bd5b95.

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Fair enough. I reinstated that bit of comment again then.

@daviesrob daviesrob merged commit cecf738 into samtools:develop Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants