Skip to content

Portability improvements#1153

Closed
anderskaplan wants to merge 7 commits intosamtools:developfrom
anderskaplan:portability-improvements
Closed

Portability improvements#1153
anderskaplan wants to merge 7 commits intosamtools:developfrom
anderskaplan:portability-improvements

Conversation

@anderskaplan
Copy link
Copy Markdown
Contributor

Portability improvements made while getting HTSlib to build with msvc on Windows. Most of the changes are not specific to Windows but should also benefit other platforms.

The PR is split into commits which are all stand-alone and described individually.

…rom htslib instead of the one in stdlib.h. Only the regular int version is available on Windows.
… with a more standards compliant mechanism.

The problem was this test (sam.c line 2343): if (/.../ && fd->dispatcher)
According to the pthreads standard, the pthread_t type should be considered opaque.
On some platforms it is implemented as a struct rather than an integer, causing a compilation error.
In addition, if a pthread_create call fails, the value of the thread identifier is undefined.
…nition; different linkage" errors on Windows.
@mp15
Copy link
Copy Markdown
Member

mp15 commented Oct 2, 2020

Can you be a bit more specific about e46b03f which compiler is faulty?

@jmarshall
Copy link
Copy Markdown
Member

Re e46b03f: that's a C99ism that presumably the OP's compiler and/or compiler configuration doesn't like. It's just test code, so make it a #define!

Re os/rand.c: this file is #included into hts_os.c, which provides the #include "htslib/hts_defs.h". You should only have hts_os.c listed as a translation unit going into your build, not os/rand.c (which you should treat as a header file, not a source file).

@anderskaplan
Copy link
Copy Markdown
Contributor Author

Can you be a bit more specific about e46b03f which compiler is faulty?

Sure. It's the old msvc 2015 which isn't fully C99 compliant. It does not support variable-length arrays.

But in this case the length wasn't really variable, so it was straightforward to change to a regular fixed-length array.

@anderskaplan
Copy link
Copy Markdown
Contributor Author

Re e46b03f: It's just test code, so make it a #define!

I was choosing between a #define or an enum for the constant. Both work fine, but I tend to prefer the enum because it's more in-line with the code and has a tighter scope. If there is a preference for the #define alternative in this code base then I can of course change it. Just let me know.

@jmarshall
Copy link
Copy Markdown
Member

Your problem is with an old compiler that doesn't understand a (C99) array declaration. Other old compilers historically had trouble with array declarations that were anything other than a numeric literal, hence the suggestion to change it to the lowest common denominator, a #define as classically and familiarly used for this purpose. I tend to prefer the #define because it is more familiar for this use (and this is test code; production code would of course not have a fixed-size array so the issue would not arise). However enumeration constants are indeed integral constant expressions in C90 so are valid for this purpose in C90, so ¯\_(ツ)_/¯

The “should be treated as a header file” comment added is pretty opaque in isolation. it's pretty obvious that this is not an ordinary source file if you start from LIBHTS_OBJS and the existing build rules in the Makefile rather than a directory listing…

(I have comments about your other commits too, but much of it is contingent on what the maintainers' plans are re which Windows compilers they wish to pander to support. AIUI there is an existing Windows build that they consider to be supported.)

@jkbonfield
Copy link
Copy Markdown
Contributor

Agreed our supported windows platform for building htslib is gcc (mingw iirc, but maybe msys - I always get them muddled up).

However we do try to ensure people can program and link against htslib using a wider variety of compilers hence we sometimes make tweaks in header files to handle the idiosyncrases of broken compilers. (Plus in that regard it's really no different to all the extern C boilerplate for C++ users.) I think most of this doesn't come under that remit, but it's worth investigating if it doesn't break anything else.

@anderskaplan
Copy link
Copy Markdown
Contributor Author

These changes should all be neutral to non-Windows users, except 5728bdc which introduces a change in how the dispatcher thread for multithreaded SAM reads and writes is initialized. The current code checks if the thread handle is null to find out if the thread has been created or not. The pthreads standard explicitly recommends against this usage pattern, as it is not portable and also because the thread handle is undefined if the call to pthread_create should fail. I.e. this should not be expected to work on linux/gcc either even if it compiles.

@anderskaplan
Copy link
Copy Markdown
Contributor Author

Perhaps it would be better to split the PR into smaller ones?

@jmarshall
Copy link
Copy Markdown
Member

You are quite right that the pthread fd->dispatcher one is of general importance. (I've just changed my fake-stdlib portability aid's pthread_t to a struct as blessed by POSIX, which makes sam.c line 2342 throw an error. Thanks for highlighting this.)

So it would certainly be worth pulling that one out into a separate PR, though the best starting point would be for @jkbonfield to confirm exactly what he meant to test on that line. The trouble with your current suggested fix is that it changes behaviour (or at least gives the impression of doing so).

@anderskaplan
Copy link
Copy Markdown
Contributor Author

Sounds good.

My interpretation of the code with the test ("if fd->dispatcher") is that is checks whether the dispatcher thread has been started or not. Because if it hasn't been started, and we're waiting for it to transition from CLOSE to CLOSED, then we're in for a long wait.

What needs to be done instead is to check the return value of pthread_create() and save that information in the SAM_state somehow. That struct has a comment in it saying that changes to it can have severe performance penalties (caching lines?) so I thought it would be a better idea to slightly re-purpose the thread command/state enum to keep track of this information: introducing a new state to signal that the dispatcher thread has started.

What do you say @jkbonfield -- would you like a new PR with only this fix, or would you rather take care of it yourself?

@jmarshall
Copy link
Copy Markdown
Member

That would be my interpretation too, but notice that a few lines later the same code calls pthread_join(fd->dispatcher, …) without any additional guard — thus this code block in fact assumes that the dispatcher thread has indeed been started. So in fact this information is effectively saved in the SAM_state already, in the fact that fd->h is non-NULL.

Or it ought to be signalled thus — see PR #1154.

@jkbonfield
Copy link
Copy Markdown
Contributor

I'd need to get my head around this code again! It's a bit of a can of worms, but that's not uncommon with complex thread locking. My main problem is it needs some time to fuzz all the combinations as experience has shown it's very easy to get wrong!

@daviesrob
Copy link
Copy Markdown
Member

I guess the check for fd->dispatcher was to ensure that the dispatcher thread (specifically sam_dispatcher_read()) had actually been started so sam_state_destroy() wouldn't get stuck in an infinite loop. Just removing the check completely as in #1154 might therefore be risky if such a state can ever be reached in practice, and I'm not sure if checking fd->h is a good proxy. I don't think 5728bdc is quite right either, but adding a SAM_RUN state with a few tweaks and combining with #1154 could work.

Instead of setting fd->command = SAM_RUN; after pthread_create(), I think we should ensure it's SAM_NONE before pthread_create() and then add this to the start of sam_dispatcher_read():

pthread_mutex_lock(&fd->command_m);
    if (fd->command == SAM_NONE) {
        fd->command = SAM_RUN;
    } else {
        pthread_mutex_unlock(&fd->command_m);
        goto tidyup;
    } 
pthread_mutex_unlock(&fd->command_m);

Then we can use SAM_state::command != SAM_NONE as a test that the dispatcher thread really has started up. sam_state_destroy() would need to test for fd->command == SAM_NONE before setting it to SAM_CLOSE. It should then set it to SAM_CLOSE in either case, but only do the hts_tpool_wake_dispatch() bits if it was not SAM_NONE when checked earlier (which means removing the && fd->dispatcher test is now safe). It should ensure sam_dispatcher_read() shuts down irrespective of where it has got to when sam_state_destroy() is called, and if it hadn't started for some reason we won't try to wait for an event that will never happen.

This may be easier to code up than describe...

@jmarshall
Copy link
Copy Markdown
Member

Just removing the check completely as in #1154 might therefore be risky if such a state can ever be reached in practice

Part of the point of #1154 is that either removing this check is entirely safe or the subsequent pthread_join() invocation is incorrect (as it would need a similar check).

and I'm not sure if checking fd->h is a good proxy.

It may or may not be 😄, but note that in the current code pthread_create(&fd->dispatcher, …) is called if and only if fd->h has just been set.

@jkbonfield
Copy link
Copy Markdown
Contributor

It is true that the pthread_create is only done if fd->h is set, but I am unsure if the reverse is true. Ie can fd->h be set without the thread having been created yet? Less clear, but potentially it could fail in sam_hdr_fill_hrecs causing the state to be shutdown before the dispatcher thread has been created.

Easy way to test it is to add a deliberate return -2 into the code before the pthread_create and test your PR for validity.

@daviesrob
Copy link
Copy Markdown
Member

@jmarshall Hmm, yes. A specific "I started the thread" flag may in that case be a good idea. I was hoping that we could use the existing state machine in fd->command but it looks like that won't quite work. fd->h might be usable for this, but we would have to be very careful when doing this. As @jkbonfield notes, #1154 in its current state still allows it to be set without starting the thread here. A dedicated flag might be safer.

@anderskaplan
Copy link
Copy Markdown
Contributor Author

Would you like to keep this PR open for the discussion? Otherwise I'll close it, as it shouldn't be merged in its current form anyway.

I'm planning to follow up with other PR's for the fixed-size array and similar.

@jkbonfield
Copy link
Copy Markdown
Contributor

Leave it for the time being as it's a reminder that we still need to fix the threading issue. I'm leaning to just having a variable we set along side the thread pointer as it's the safest in terms of 100% understandable and easy to validate it does the same as the current code and doesn't have sneaky race conditions or gotchas with failing at key moments.

I thought Rob was looking at this, but he's unavailable this week so I may do it quickly as a PR myself.

jkbonfield added a commit to jkbonfield/htslib that referenced this pull request Oct 28, 2020
Replaces samtools#1154
See also samtools#1153

It does this by the addition of a separate variable to do the boolean
check on instead of fd->dispatcher itself.  I deemed this easier to
understand than overloading the interpretation of fd->h being set plus
I'm unsure of the potential failure case in sam_hdr_fill_hrecs where
fd->h has been set but we errored before creating the threads.

Also improves error recovering in case of pthread creation failure,
avoiding a false pthread_join later.  Thanks to John Marshall and
Anders Kaplan for identifying that issue.
daviesrob pushed a commit to jkbonfield/htslib that referenced this pull request Nov 5, 2020
Replaces samtools#1154
See also samtools#1153

It does this by the addition of a separate variable to do the boolean
check on instead of fd->dispatcher itself.  I deemed this easier to
understand than overloading the interpretation of fd->h being set plus
I'm unsure of the potential failure case in sam_hdr_fill_hrecs where
fd->h has been set but we errored before creating the threads.

Also improves error recovering in case of pthread creation failure,
avoiding a false pthread_join later.  Thanks to John Marshall and
Anders Kaplan for identifying that issue.
@anderskaplan anderskaplan deleted the portability-improvements branch November 7, 2020 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants