Conversation
Calling serial::write() in quick succession was blowing away the previous buffer, fsync does not guarantee that data is transmitted on serial lines. On the other hand tcdrain waits until the output buffer is empty.
|
|
||
| int written = ::write(_serial_fd, buffer, buffer_size); | ||
| ::fsync(_serial_fd); | ||
| tcdrain(_serial_fd); // Wait until all output is transmitted |
There was a problem hiding this comment.
This looks fine. The original fsync was copied from the GPS serial driver I believe. But this change will also need to be done on the Posix SerialImpl.cpp file as well.
|
This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there: |
|
FYI this is a breaking change for a bunch of things, including bursts of RTCM traffic. We had a problem where a burst of RTCM data was taking down the GPS. This change made the GPS serial port effectively half-duplex, and was blocking the READ part of the code. If I reverted this change, I get a /dev/ttyS0 write error -1 on the write thread and the read loop continues to process correctly. This change affects a whole bunch of serial devices throughout the whole system- were any of those tested as part of this change? |
@jeremyzff What other things? Can you list them out? Looking at this again I can definitely see how this change could cause problems. The core problem with the GPS driver injection logic is that it can get stuck writing large blocks without yielding to read, see #25535 However we shouldn't block while writing in most cases anyway, see #25537 |
|
I tested reverting this commit, and the behavior when RTCM messages burst at the GPS was different- I never lost GPS lock and if the writes went too fast I simply got a write error instead of the system waiting for write, delaying everything else. This wrapper was introduced to try and commonize the serial access for the different Operating systems (nuttx, posix, qurt), and I'm not even sure how many things use it, but that may change in the future, and forcing this write to be blocking could have other yet to be discovered impacts (PX4IO comms, low speed telemetry radios with flow control, who knows). The main difference is that with fsync, the call returns once all the bytes have been passed to the underlying OS- so it would return once all the bytes were in the transfer buffer. This allows for effectively non-blocking writes as long as you stay below the overall rate capability of the channel and don't exceed the buffer in bursts. But waiting for tcdrain won't return until all the bytes are SENT, which causes a bunch of timing problems for periodic writes such as the GPS injections. A good example would be sending a long mavlink message once per second. With fsync, it would get dumped into the buffer and sent and control would return immediately. with this change, you'd have to wait for the entire message to flush before anything else can continue. I understand the use case this was created to address (sustained, reliable writes), but this sort of fix feels like it has overly broad scope to fix that specific problem, with many possible consequences all over the system. |
Calling serial::write() in quick succession was blowing away the previous buffer, fsync does not guarantee that data is transmitted on serial lines. On the other hand tcdrain waits until the output buffer is empty.
I tested on nuttx, probably need to update posix as well. @katzfey was the intention with fsync to flush the output buffer? I discovered this issue in a custom module where I am updating the firmware of a device, reading a firmware file from the file system and writing chunks of 512 bytes at a time. I was discovering that it was rapidly looping and not actually writing each 512 byte chunk to completion.