Conversation
INSTALL
Outdated
| Using libcurl provides HTSlib withnetwork protocol support, for | ||
| example it enables the use of ftp and https:// URLs. It is also | ||
| required if direct access to Amazon S3 or Google Cloud Storage is | ||
| enabled. |
There was a problem hiding this comment.
Fix typo (missing space), and suggested spelling out all the important URL schemes:
| Using libcurl provides HTSlib withnetwork protocol support, for | |
| example it enables the use of ftp and https:// URLs. It is also | |
| required if direct access to Amazon S3 or Google Cloud Storage is | |
| enabled. | |
| Using libcurl provides HTSlib with network protocol support, for example it | |
| enables the use of ftp://, http://, and https:// URLs. It is also required | |
| if direct access to Amazon S3 or Google Cloud Storage is enabled. |
There was a problem hiding this comment.
Odd. I checked the original and the missing space is actually an entire missing " better " word. Must have been some odd finger slippage. [Edit: ignore me - clearly still befuddled!]
Anyway thanks for the updates. I'll amend.
There was a problem hiding this comment.
Removing “better“ here was clearly the right thing to do 😄 Now I'm off to get my third coffee of the morning, to fend off my own befuddlement…
Makefile
Outdated
| # Filter out macros in htslib/hts_defs.h that aren't needed on non-Windows | ||
| # platforms. |
There was a problem hiding this comment.
Suspect “that aren't needed on non-Windows platforms” pertained only to knet_win32_ and can disappear too.
There was a problem hiding this comment.
I was unsure of that and ummed and ahhed over it, but happy to remove. I did start to edit the associated perl script, but figured it's harmless for it to check for functions that don't exist and I'm more likely to break it in doing so. It's perhaps still something that should be changed though.
| hfile_add_scheme_handler("data", &data); | ||
| hfile_add_scheme_handler("file", &file); | ||
| hfile_add_scheme_handler("preload", &preload); | ||
| init_add_plugin(NULL, hfile_plugin_init_net, "knetfile"); |
There was a problem hiding this comment.
Also nuke the hfile_plugin_init_net declaration from hfile_internal.h.
|
Huzzah. This also deserves a NEWS entry, and there are some knetfile/knet_win32 references in test/header_syms.pl and test/maintainer/check_spaces.pl that could go. I guess some will consider the removal of htslib/knetfile.h and the functions declared there to be a removal of vital public API functions. Pedantically correct, but… sigh… noone should be using these functions directly. So either this perhaps warrants a soversion bump, or perhaps as a transition before removal during a later soversion bump in a future release, replace the implementations of |
|
That's a good point - knetfile.h was a public header. Maybe it should have stub functions (static inlines?). |
|
Annoyingly, in the absence of a soversion bump they would need to be real symbols in libhts.so. Which would mean keeping this PR as a draft to be applied in the eventual happy future, and making a new smaller one that basically keeps the infrastructure but replaces all the function bodies within knetfile.c. Alas… |
|
It appears to be trivial to get most of the ABI to still exist, eg: API though is a bit less clear due to the |
They appear to be endemic in older bgzf implementations. E.g. see However the packages which copied in their own internal copy of bgzf.c have mostly also copied in knet.c (why would you copy one without the other if it has a dependency?). The only package I saw that didn't was EMBOSS, but then the bgzf.c is #ifdefed around _USE_KNETFILE which wouldn't be set anyway. This cargo-cult meme really makes github searching a royal PITA! It also means people won't get fixes and "problems" are endemic across bioinformatics. :( I couldn't find much else in usage though. That, kurl obviously, and ancient bcftools copies (again if you've gone to the effort of bundling the entire bcftools source into your package then you'll have also bundled the rest of it too). |
|
Simplest solution might be to remove its ability to read anything other than local files, which would make the implementation trivial, and also mark the whole thing as deprecated using |
|
Ok, so this second commit undoes some things in the first, so it needs squashing, but is left as-is for now so reviewers can see the history.
I've validated this using this tiny program: I built this against htslib 1.11 headers, and have switched dynamic library by use of |
daviesrob
left a comment
There was a problem hiding this comment.
Generally looks good, apart from a few minor issues.
The only stray reference to knet I could find was in a comment in hfile_libcurl.c. It might be a good idea to remove that as well; the comment will still make sense without it.
Could you add a NEWS item, please?
hfile.c
Outdated
|
|
||
| HTSLIB_EXPORT | ||
| knetFile *knet_open(const char *fn, const char *mode) { | ||
| knetFile *fp = malloc(sizeof(*fp)); |
There was a problem hiding this comment.
It would be safer to use calloc here.
hfile.c
Outdated
| HTSLIB_EXPORT | ||
| off_t knet_seek(knetFile *fp, off_t off, int whence) { | ||
| off_t r = hseek(fp->hf, off, whence); | ||
| if (r > 0) |
htslib/knetfile.h
Outdated
|
|
||
| HTSLIB_EXPORT | ||
| knetFile *knet_open(const char *fn, const char *mode); | ||
| knetFile *knet_open(const char *fn, const char *mode) lHTS_DEPRECATED("Please use hopen instead"); |
There was a problem hiding this comment.
A stray l has appeared here, so this header currently doesn't compile.
There's little benefit to keeping it (basically FTP and unsecured http), while curl is already required for htsget, refget (CRAM ref cache), AWS S3 and GCS protocols. Curl is available on all platforms we support.
Plus include stub knet functions for ABI compatibility. The knetfile.h has HTS_DEPRECATED markers for the knet_* functions, so anyone compiling against the new code will get a warning that we intend to delete these in the future. It's debateable though whether it should exist at all. We could just remove this header right now and break API, but retain the previous ABI. The stub copy in knetFile is now divorced knetfile.[ch] and has its own implementation which simply wraps up hFILE instead. The knet_tell macro does direct access of knetFile::offset so we cannot simply wrap up htell as we'd like. Similar knet_fileno directly queries knetFile::fd. However the only use I've found of this was in old copies of bgzf and bcftools main which then did fstat on it so it's only filled out for local files. (That's good because getting it out of curl isn't supported by our code.)
|
Thanks. Don't know what went on with that stray "l". I previously had commented out the deprecated lines as it made my testing a pain. I guess I finger slipped removing the Comment in libcurl.c has been nuked. I left it there before because I wasn't sure which bit to delete (is the RFC comment for knet or for the whole idea of full pathnames in general?), but I'm sure curl must be pretty compliant and if not I doubt we need to be spelling it out. |
|
The RFC comment was about using full pathnames. RFC1738 says you should CWD through each directory, curl did by default but knet did not, which made curl quite a bit slower in comparison. Git blame suggests you added the original comment in commit 5bd5b95. |
|
Fair enough. I reinstated that bit of comment again then. |
Recent bug reports [personal communication] in knet is the straw that broke the camels back. I've been wanting to do this for years though!
There's little benefit to keeping it (basically FTP and unsecured http which has largely gone out of fashion), while curl is already required for htsget, refget (CRAM ref cache), AWS S3 and GCS protocols. Curl is available on all platforms we support.