Skip to content

Adds CRAM 3.1 support (experimental)#929

Merged
daviesrob merged 19 commits intosamtools:developfrom
jkbonfield:htscodecs
Jan 20, 2021
Merged

Adds CRAM 3.1 support (experimental)#929
daviesrob merged 19 commits intosamtools:developfrom
jkbonfield:htscodecs

Conversation

@jkbonfield
Copy link
Copy Markdown
Contributor

This is here as demonstration of how to add this into Htslib (in addition to the existing support in Scramble).

It uses htscodecs as a submodule, but compiles and links the code directly into htslib so there isn't a second library hanging around. This now means scramble and htslib share the same codec implementation and we don't accidentally only fix bugs in one place.

Unlike Scramble there is nothing to actively warn that this is a technology test of CRAM 3.1 and not an official GA4GH standard. I could add this, but I wasn't envisioning merging it until it's an adopted standard anyway. (It's sufficient to be able to state there is support for this format in Samtools and hence maybe get more people looking into assessing the new spec.)

@jkbonfield jkbonfield force-pushed the htscodecs branch 7 times, most recently from 64de0e5 to 7f69043 Compare September 23, 2019 14:08
@jkbonfield
Copy link
Copy Markdown
Contributor Author

jkbonfield commented Sep 23, 2019

Bug-fixed windows / macos (limited stack sizes) and rebased off develop.

Some examples:

Develop vs this PR using an existing local v3.0 CRAM. Tests read this file (a 3.0 file of 10 million NovaSeq records) and convert it to a new BAM/CRAM file and then read it back again to get encode and decode times. NB: This build has libdeflate support. Without it BAM would be around 2x slower.

Branch CRAM vers Size Enc(s) Dec(s)
develop (as BAM) 576789115 53.2 4.3
develop (as SAM.gz) 542040943 60.3 9.5
develop 3.0 208635462 40.0 12.9
htscodecs 2.1 241451288 41.5 13.3
htscodecs 3.0 208392952 38.5 12.0
htscodecs 3.1 fast 231023913 31.5 11.3
htscodecs 3.1 (normal) 175696160 36.4 10.8
htscodecs 3.1 small 166172362 88.8 39.0
htscodecs 3.1 archive 162697852 123.0 38.1

(Edit: added gzipped SAM for Martin :-))

Note: "fast" mode is not only light weight compression, but also fine grained random access (10x finer than normal) so why it doesn't look much faster at streaming it's potentially considerably more with some random access patterns.

Example command line to test with:

perf stat ./test/test_view -C -o version=3.1 -o small ~/scratch/data/novaseq.10m.cram -p /tmp/_v31.cram
ls -l /tmp/_v31.cram
perf stat ./test/test_view -B /tmp/_v31.cram

@jkbonfield jkbonfield force-pushed the htscodecs branch 4 times, most recently from 0a0e6c1 to a22a0af Compare February 13, 2020 14:48
@jkbonfield jkbonfield force-pushed the htscodecs branch 3 times, most recently from 4dc915f to d98e0e3 Compare May 6, 2020 10:46
@jkbonfield
Copy link
Copy Markdown
Contributor Author

Happy Birthday :-)

Given 1.11 is imminent and this PR has missed two (lengthy!) releases already, I am bumping this and request please that it is reviewed early in the 1.12 release cycle. If not I'll continue with my previous assertion that all unreviewed PRs of mine after 1 year get merged anyway under the implicit assumption that noone objects. ;-)

@daviesrob
Copy link
Copy Markdown
Member

The main issue with this pull request is how we manage the new forced dependency on the htscodecs library. It at least needs additional configure options to allow linking against an external htscodecs, as distributors like Debian will want to include it that way. It also currently forces a dependency on autotools which may be unpopular with projects that embed htslib and expect to be able to just use make; and it uses recursive make, which is known to cause problems.

This will also need to be checked carefully to ensure it doesn't introduce any regressions, especially on memory safety and handling invalid inputs. Given the size of this change, that might take a while.

@jkbonfield
Copy link
Copy Markdown
Contributor Author

jkbonfield commented Sep 18, 2020

The main purpose for htscodecs as far as I am concerned is to avoid duplication between io_lib and htslib. I need both because io_lib is much faster moving and has more experimental features (eg it's had CRAM 3.1 and 4 for several years).

It doesn't necessarily have to be explicitly done as a separate library, or submodule. Eg the files could be copied in (klib style), provided we're committed to make changes in the upstream project instead of just forking and diverging. It also doesn't necessarily have to use the build system or be a library. It could be done within the htslib Makefile, even if we import it as a separate git module or package. Eg just add the necessary .c files in Makefile so they end up being linked in as part of htslib. That won't be flavour of the month with Debian though.

Htscodecs has been through rounds of sanitizers and fuzzing, but yes it may not be as robust as the OSSFuzz hammering. I know it'll take time, but that's why I'm asking for it to be consider at the start of the next release cycle. It's always been "this is too complex, so we'll punt it to the next one" which basically means never unless we act early on. Hence this reminder.

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Also, to be frank do we want to encourage continued "cp -r htslib ." cargo-cult style of development? It's a total nightmare for getting bug fixes out there. As long as we make it easy for people to do things the Right Way then it's not so bad. Autotools also isn't that bad from a user perspective - you don't have to install anything (unlike cmake or the myriad of other flavour of the month make replacements). If it's built using --disable-shared then most of the libtool monstrosities go away. :)

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Rebased and added some updates to bring this in line with io_lib's CRAM 4 implementation. I've verified that scramble and test_view can now read and write each others files with compatibility between them. CRAM 4 is even more experimental than 3.1, but we may as well at least be consistent. This now also brings this PR in line with the draft cram4.tex specification, which had a few tweaks and adjustments since Sep 2019 (when this PR was made).

The old branch, before rebasing, can be found here: https://github.com/jkbonfield/htslib/tree/htscodecs.old
(That may be useful if you wish to replay the latest commits on this against your own rebase of the original PR.)

@jkbonfield
Copy link
Copy Markdown
Contributor Author

(Compare with 2nd to top post above, but note these are different files than before.)

Current htscodecs timings - ie this PR as it now stands, with updated htscodecs submodule too.

10 million NovaSeq records:

Branch CRAM vers Size Enc(s) Dec(s)
develop BAM/libdeflate 539562772 44.9 5.3
develop 3.0 223704658 32.9 13.3
htscodecs 3.0 223876657 31.7 10.6
htscodecs 3.1 186554165 34.4 9.4
htscodecs 4.0 182186308 32.2 10.7
htscodecs 3.1 small 175986355 78.4 37.8
htscodecs 4.0 small 170854620 75.0 36.1
scramble 3.0 223761677 33.4 10.3
scramble 3.1 186429613 33.8 8.3
scramble 4.0 182080941 32.3 8.2

It's unclear why scramble's 3.1 is comparable encode speed to 3.0, but not htslib's. It's of note though that scramble's system time is vastly smaller, which is mainly due to it using mallopt to workaround inefficiencies in glib's malloc (which htslib does not do).

On 10 million recs from the HiSeq 2k data (9827_2#49.bam):

Branch CRAM vers Size Enc(s) Dec(s)
develop BAM/libdeflate 1163050644 50.6 7.9
develop 3.0 573365519 36.5 14.4
htscodecs 3.0 573317290 34.4 12.7
htscodecs 3.1 551563284 39.1 13.3
htscodecs 4.0 549018451 38.3 11.9
htscodecs 3.1 small 388439096 111.1 63.3
htscodecs 4.0 small 385983116 108.2 63.5
scramble 3.0 573349740 35.3 11.0
scramble 3.1 551636226 39.5 11.1
scramble 4.0 549116518 38.3 11.5

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Some more benchmarks, this time 10 million NovaSeq records:

CRAM 3.0        Read(s)   Read/Write(s)                                         
---------------------------------------                                         
Develop         13.43     40.11                                                 
Htscodecs       10.60     38.25                                                 

I post these as data to consider for how to progress. @daviesrob I know your preference is to keep cram/rANS_static.c and have htscodecs as an optional build. IMO this is a big mistake. We don't need more variants and ways to build sub-optimal libraries. Either we support CRAM, or we don't. I'd rather us have a --disable-cram option that to have two different choices of codecs. It's more work for us to maintain and I see zero benefit. (We've been down this road before with knet + curl as choices, and it's caused many more problems than it solved.) It'll also lead to poor comparisons in benchmarks run by other people.

The other thing is there is a catch-22 issue here. CRAM 3.1 spec has gained minimal support elsewhere, partly because it's an abstract notion and isn't supported by the main packages. However this PR has basically been ignored for well over a year because CRAM 3.1 is experimental and hasn't been adopted by GA4GH yet. The only way to break this cycle is to lead the community, not sit back and hope it'll all just go away. That way just gives power to the likes of MPEG fostering commercial alternatives that may be able to legitimately compare against CRAM 3.0 and conveniently pretend 3.1 doesn't exist.

As you may gather, I'm getting somewhat annoyed by the endless delays. PLEASE consider merging this. We're more or less half way through our mythical 3 month release cycle. I don't wish there to be yet another excuse of "we can't merge it this late" as happened for the last 2 releases. The sooner it is merged, the sooner it gets into the standard fuzzing and testing regimes and the more confident we'll be that it is solid.


pthread_mutex_lock(&fd->metrics_lock);
for (i = 0; i < DS_END; i++)
fd->m[i]->stats = c->stats[i];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leading to threading problems. fd->m[i] is shared between all threads, so they fight over which c->stats[i] should be in there. When the pointer is used in cram_compress_block2() it may get stats for the wrong container, or in the worst case the memory may have been released already resulting in a use-after-free bug.

Either cram_compress_block2() needs an extra parameter for c->stats[type] (or c->stats[type]->nvals); or the adjustments to method need to be done in this function so there is no need to pass the stats to cram_compress_block2(). After that I think it should be possible to remove cram_metrics::stats as I don't think it's used anywhere else.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong container doesn't matter much as the metrics are on-going trends that aggregate data across containers. However you're correct that having a pointer to it is bad as they may go out of scope.

I think the correct answer is to set a flag (or boost a counter) in metrics so it doesn't need the pointer, but metrics analysis can still use it for biasing which codecs to try.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely looks like a real problem, although I've tried very hard to recreate it and with 100s of tests still cannot.

Anyway, I've now replaced it with a boolean flag which is all we really need.

@jmarshall
Copy link
Copy Markdown
Member

It uses htscodecs as a submodule, but compiles and links the code directly into htslib so there isn't a second library hanging around. This now means scramble and htslib share the same codec implementation and we don't accidentally only fix bugs in one place.

The question of how to incorporate building htscodecs into HTSlib's build infrastructure comes down to this question about whether there should be an htscodecs entity that is versioned independently from htslib, whether to facilitate distribution packagers making such an independent library, and how to make the codec source code available — both in a Git checkout and in an HTSlib source tarball.

FWIW I agree that avoiding having an independently versioned second library for this is the desirable and simple approach. The purpose of the separate htscodecs repository is simply to share the code between HTSlib and e.g. io_lib, not to have them both depend on another separate moving target.

I'm not usually a fan of Git submodules: they make the subproject operate in absolute lockstep with the superproject, so are not a good fit for e.g. the samtools/bcftools relationship with htslib (in which many adjacent versions of htslib will do for a particular samtools/bcftools commit). However for this case, in which htslib wants to share a particular htscodecs commit with scramble, a Git submodule is exactly what is desired.

Now, how to build the codecs and link their object code directly into htslib.a/.so? HTSlib has its own very capable build system already, so the simplest thing (from HTSlib's point of view) would be to have the submodule's .c files built using rules in HTSlib's Makefile. As the htslib git checkout would reference a particular commit of the codecs submodule, there is no difficulty in having the list of codec source files located in an HTSlib file that would need updating whenever the list of source files in the submodule changes (which would be infrequent anyway). In fact, probably the best way to do this would be to have a new HTSlib source file, e.g., codecs.c, that includes all the desired htscodecs source files:

// codecs.c -- HTSlib source file that pulls in code from the htscodecs submodule

#include <config.h>  // i.e. *HTSlib's* config.h

// Any other needed setting up

#include "htscodecs/arith_dynamic.c"
#include "htscodecs/fqzcomp_qual.c"
#include "htscodecs/pack.c"
#include "htscodecs/rANS_static.c"
#include "htscodecs/rANS_static4x16pr.c"
#include "htscodecs/rle.c"
#include "htscodecs/tokenise_name3.c"

Then HTSlib's Makefile would just list codecs.o in LIBHTS_OBJS just like any other HTSlib source file — and would not itself need any lists of codecs source files etc. Or, indeed, any further complexity at all.

And that's it. No need for HTSlib's INSTALL or configure.ac to learn about automake or libtool or do anything at all. Not using htscodecs's makefile or build system, as we are literally using only the codec source code from the subproject. Not running or using the htscodecs test suite either, but the expectation would be that people doing development on the codecs themselves would be doing it from a separate checkout of htscodecs itself (rather than as a submodule) so would be building and testing using htscodecs's own build system and test suite — quite separately from HTSlib; and then when complete the new commits could be git fetched/pulled into HTSlib's referenced htscodecs commit.

Finally, what about HTSlib release source tarballs? There is no git checkout to use to supply files from a submodule, so c-maint/mktarball would need to be extended to copy the htscodecs files (or just the .c/.h files actually needed and the basic README, LICENSE etc) from the referenced repository and commit into a newly mkdired subdirectory within the tarball it's creating. Then the existing codecs.c and its build rules would work exactly as they do in a git checkout.

There are a couple of minor organisational changes that would facilitate all this:

  • I guess before this is merged to samtools/htslib, it would be desirable to move the codecs repository to the samtools organisation. IMHO samtools/hts-codecs would be a slight improvement on htscodecs, with the hyphen paralleling the one in samtools/hts-specs.

  • It would be preferable for mktarball to copy the needed source files into a top-level directory, say htslib-1.x/codecs/*.[ch], so that they're not nested inside a tree of otherwise empty directories. Therefore ideally the codec source code would be at the top-level directory of the submodule, so that #include "codecs/foo.c" (for a submodule path of codecs) would be the right path for both a git checkout and a release tarball.

@jmarshall
Copy link
Copy Markdown
Member

The downside of this is that the codecs would be built in HTSlib by a different system from the one they are built by in the native htscodecs project; in particular, with different compile flags than as determined by the htscodecs build system. OTOH they would be built with the same compile flags as the rest of HTSlib — so there are pros and cons here; it's a tradeoff. My suspicion is that the differences would be minimal anyway.

The big advantage is that htscodecs is free to use automake and libtool (boo, hiss 😄) if it wants to, and this will have no effect on HTSlib. However perhaps it would be worth revisiting the htscodecs build system to focus on just test harnesses for the various codecs.

[Rob]

The main issue with this pull request is how we manage the new forced dependency on the htscodecs library. It at least needs additional configure options to allow linking against an external htscodecs, as distributors like Debian will want to include it that way.

[James]

It also doesn't necessarily have to use the build system or be a library. It could be done within the htslib Makefile […] That won't be flavour of the month with Debian though.

Even though Debian have already jumped the gun and drafted a shared library package from James's draft repository, HTSlib and htscodecs have a choice about whether to facilitate that. It's really a question of managing expectations.

Debian perfectly reasonably likes to debundle separate libraries so that the subproject can be independently updated and the superproject gain from that (getting bug fixes etc).

However one point of view here is that htscodecs really isn't a separate library — it's just a collection of source files. There's no intrinsic reason why HTSlib or htscodecs need to facilitate using or building htscodecs as a library, when they claim that htscodecs is just a git-native way of sharing some code between HTSlib and io_lib — a more reliable form of klib's copying files around between projects.

The claim is that by only having HTSlib (and io_lib) with released versions, and each HTSlib release determining the htscodecs submodule commit that it's used with, maintenance and support are simplified. (E.g. the need to maintain yet another stable API is avoided entirely!) Behaviour from bug reports is easier to understand because there isn't a random combination of htslib version and codec version.

This point of view will be more palatable if htscodecs doesn't present itself as a library. I haven't really looked at how io_lib uses it, but if the htscodecs build system was changed to just build a bunch of *.o files that can be linked with their test harnesses, then it would be clear that this code was not intended to be separated as a library. There may be temporary squealing 😄 but I suspect packagers would come around to just using HTSlib's / io_lib's native inclusion of these source files. Particularly as HTSlib/io_lib releases should be frequent enough to keep up with the (in the fullness of time) likely small infrequent updates to the codecs.

@jkbonfield
Copy link
Copy Markdown
Contributor Author

Even though Debian have already jumped the gun and drafted a shared library package from James's draft repository, HTSlib and htscodecs have a choice about whether to facilitate that. It's really a question of managing expectations.

It's used by a released version of io_lib, so they haven't jumped the gun.

This problem has mainly come about due to our tardiness. I wasn't willing to tie my own releases of io_lib (libstaden_read1 on Debian) to wait years for htslib PRs to get reviewed and with a lack of any feedback here I just went with something that worked for me. I'd rather not rename it now either, but it does need moving to the samtools organisation (that's not something I have permission to do). It was named after htslib, which doesn't have a hyphen.

However one point of view here is that htscodecs really isn't a separate library — it's just a collection of source files. There's no intrinsic reason why HTSlib or htscodecs need to facilitate using or building htscodecs as a library, when they claim that htscodecs is just a git-native way of sharing some code between HTSlib and io_lib — a more reliable form of klib's copying files around between projects.

It can be thought of either way. I'm open to all ways for htslib to use it. My aim is simply to avoid duplication. I've had enough of having to support two variants of the same code, and occasionally finding subtle bugs that we'd already fixed in the other copy. One source location is absolutely the right solution.

As for the tarball, I think it should just be bundled in with it. Definitely for samtools/bcftools, but maybe also for htslib.

@daviesrob
Copy link
Copy Markdown
Member

I'm just getting around to working out how to integrate this into HTSlib without the need for a recursive make (which I'm not in favour of due to the pain it has caused in SAMtools). My current thinking is to have configure options that would allow it to be built using either bundled htscodecs code, or linked to an external library as Debian would like. Some other projects work this way, for example mplayer can build its own copy of various dependencies or link to them externally. It would be useful as long as it doesn't add too much complication if it means distributors don't start adding big patches so they can package htscodecs externally.

James should be able to transfer the htscodecs repository to the samtools organisation now.

daviesrob added a commit to daviesrob/samtools that referenced this pull request Jan 11, 2021
In anticipation of samtools/htslib#929 merge, which adds a
submodule.

Also makes Appveyor use `autoreconf -i` when setting up HTSlib
as that will be required for autoconf 2.70.
jkbonfield and others added 19 commits January 11, 2021 15:33
In tested I spotted v2.1 output still uses the rANS codec, which is
incorrect.  It does this because rans is enabled when opening a file
if the version is >= 3.0, but we set the version after opening the
file and that didn't explicitly disable rans again.

Io_lib's Scramble does these bits in reverse (set default vers and
then open file) so worked fine.
The cram/rANS_static.[ch] implementation has been removed.

Updates the build system to make the required parts of htscodecs.

Co-Authored-By: Rob Davies <rmd+git@sanger.ac.uk>
Note: this is experimental, subject to the 3.1 spec being agreed upon
by the GA4GH File Formats committee.

Example usage (high compression):

    ./test/test_view -C -o version=3.1 -o use_fqz -o use_bzip2 -o use_arith \
        -o seqs_per_slice=100000 /tmp/30.cram -p /tmp/31-max.cram

Also added support for compression profiles.
Primarily these are CRAM only, but for consistency they're also used as
synonyms for compression levels in bgzf.

In the process of adding CRAM 3.1 several improvements and refactoring
took place to the cram encoder (mainly by dint of copying over the
cram_compress_block function from io_lib):

- Added a "Gzip -1" deflate strategy as sometimes it's smaller
  than gzip or comparable size (but considerably fatser).  This is
  purely for metrics and optimisation.

- Refactored the cram_compress_block function so it can deal with many
  more compression method / strategies.  It now has a number of lookup
  tables which need to be kept synchronised (see comments).

- Added a new cram_compress_block2 function which takes a slice.  This
  is used for the fqzcomp code to obtain the read lengths.

- Cram_compress_block now has a consistency metric too.  This is to
  learn when data isn't changing rapidly so don't be too quick to
  reevaluate.
Some rANS codecs require over 2 Mbytes of temporary space
for the encode/decoder state.  On some platforms, notably
MacOS the default thread stack size is not big enough to
hold this.  Previously we fixed this by using malloc instead,
but that has other undesireable side effects such as repeated
mmap/munmap and wasting time zeroing pages over and over.  Our
traditional fix here has been to use pthread_once to get a
single malloc and then reuse this block, to avoid the mmap issues
in glibc.  However this is complicated, ensuring the thread
stack is sufficiently big is much easier.
This cannot be avoided!  It's spam, but it will be removed once the
spec becomes official.  If there are no changes, that does mean this
htslib will be compliant, but spammy, however we can then make a new
release.
Not sure we have any files large enough to really put it through
its paces, but this is a reasonable start.
We can read/write CRAM 3.1.
We can read CRAM 4.0, but writing 4.0 is corrupted.
Normally sorted data deltas the positions and unsorted does not.  This
allows us to explicitly reenable the AP=1 container header field to do
deltaing, as it's sometimes useful on unsorted data.  Eg pairs are
normally neighbouring position, so it's still a bit better to compress
than leaving them as-is.
This does some basic analysis to spot grouped variants and if and only
if long enough and scoring +ve will it switch to 'b'(?) feature.
- We no longer keep GZIP_RLE used when libdeflate is compiled in
  because it doesn't support strategy in the same manner as zlib.
  Instead we use GZIP level 1.

- Adjusted the fixed small-block compensation code.  It's now equiv to
  a 60 byte difference required before changing up to the next tier in
  complexity.

  This had no impact on my novaseq test normally, but with bzip2+lzma
  added in it saved 6% CPU (at a cost of 0.14% file growth).
Same as io_lib's eda7cfe and 16c19ee.

This replaces the length field in containers with a variable size
quantity instead of 32-bit int, for consistency with other fields, and
marks a few fields as using a signed encoding strategy (necessary for
the next commit).

[The CRAM 4 specification moved on slightly since the original date of
this PR.]
This is the htslib equivalent to io_lib's 090dfd3f and d86b008.
It is now in adherance to the draft CRAM 4 spec.

It removes the E_SINT and E_SLONG types and replaces E_EXTERNAL with
E_BYTE, E_VARINT_SIGNED and E_VARINT_UNSIGNED.  Integer data types
aren't explicitly signed or unsigned but simply use the codec that
works for them.  So if all ref-ids are >= 0 it'll be VARINT_UNSIGNED,
otherwise VARINT_SIGNED.  This future proofs things by adding
introspection to the format.

Also removed HUFFMAN from CRAM 4.  It was only being used for storing
constant values, so we now have CONST_BYTE and CONST_INT instead
(which always uses a signed encoding).

This should be the final removal of ever encoding negatives as if they
were positives (eg -1 as 0xFFFFFFFF) and so in turn removes all size
limitations on values.
This has been replaced by an "unpackable" flag.  This avoids
cram_compress_block2 from needing to access metrics->stats, which is a
pointer to the container->stats struct.  In rare cases this could be
freed prior to the compression completing.
Also the public version has had the enum members renamed to include an
HTS_ prefix.  At present they appear to be unused, although it's hard
to know for sure with some of them due to the common hits you get all
over github.  Leaving them as they are though is just waiting for a
hard problem to arrive.

This commit introduces the notion of HTS_COMPAT.  The ABI changes
changed, but the API has (albeit in a way we believe should affect
people).  It's a 3+ digit number with the last two being minor version
and earlier being major.  So htslib version 1.12 would be 112.

The intention is simply to permit packages with dependencies on
specific earlier API names to still compile with the minimal of
changes (even if it's just by overriding CFLAGS).
They are included by adding extra dependencies to the "test" and
"check" targets when htscodecs is bundled via the submodule.
So that anyone who tries to build the library without setting
up the submodule first will get a hint on how to make the build
work.

The check isn't really needed for anyone building from a release
tarball, but it will be harmless as it should always succeed (as
long as the tarball was built correctly and includes the htscodecs
files).
Currently if the submodule files are missing it asks the user
to run "git submodule update" to install them.  It's possible
that the Makefile run this itself and continue, although
I'm not sure if that would be completely safe in parallel builds.

The variables lising dependencies for htscodecs header files
are moved to htscodecs_bundled.mk to ensure they are only set
for submodule builds.
@daviesrob
Copy link
Copy Markdown
Member

I've squashed in a couple of the bugfix commits, and rebased so this will merge cleanly again.

daviesrob added a commit to daviesrob/bcftools that referenced this pull request Jan 12, 2021
submodule.

Also makes everything use autoreconf -i when setting up HTSlib as
that will be required for autoconf 2.70.

Note that both Travis and cirrus-ci use the .travis/clone script.
daviesrob added a commit to daviesrob/bcftools that referenced this pull request Jan 12, 2021
submodule.

Also makes everything use autoreconf -i when setting up HTSlib as
that will be required for autoconf 2.70.

Note that both Travis and cirrus-ci use the .travis/clone script.
jkbonfield pushed a commit to samtools/samtools that referenced this pull request Jan 12, 2021
In anticipation of samtools/htslib#929 merge, which adds a
submodule.

Also makes Appveyor use `autoreconf -i` when setting up HTSlib
as that will be required for autoconf 2.70.
daviesrob added a commit to daviesrob/bcftools that referenced this pull request Jan 13, 2021
In anticipation of samtools/htslib#929 merge, which adds a
submodule.

Also makes everything use autoreconf -i when setting up HTSlib as
that will be required for autoconf 2.70.

Increases the memory limit for the address sanitizer cirrus-ci
test.  While the average memory used seems to be low, occasionally
some test processes may get big enough to trigger an out of memory
killer.

Note that both Travis and cirrus-ci use the .travis/clone script.
@daviesrob
Copy link
Copy Markdown
Member

The samtools and bcftools tests should now work with the submodule, so I'll merge this tomorrow morning unless there are any last-minute hitches.

@daviesrob daviesrob merged commit 01830d7 into samtools:develop Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants