serial: change fsync to tcdrain by dakejahl · Pull Request #23686 · PX4/PX4-Autopilot

dakejahl · 2024-09-18T03:02:43Z

Calling serial::write() in quick succession was blowing away the previous buffer, fsync does not guarantee that data is transmitted on serial lines. On the other hand tcdrain waits until the output buffer is empty.

I tested on nuttx, probably need to update posix as well. @katzfey was the intention with fsync to flush the output buffer? I discovered this issue in a custom module where I am updating the firmware of a device, reading a firmware file from the file system and writing chunks of 512 bytes at a time. I was discovering that it was rapidly looping and not actually writing each 512 byte chunk to completion.

Calling serial::write() in quick succession was blowing away the previous buffer, fsync does not guarantee that data is transmitted on serial lines. On the other hand tcdrain waits until the output buffer is empty.

katzfey · 2024-09-23T15:59:22Z


 	int written = ::write(_serial_fd, buffer, buffer_size);
-	::fsync(_serial_fd);
+	tcdrain(_serial_fd); // Wait until all output is transmitted


This looks fine. The original fsync was copied from the GPS serial driver I believe. But this change will also need to be done on the Posix SerialImpl.cpp file as well.

DronecodeBot · 2024-09-25T15:54:16Z

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-sync-q-a-sep-25-2024/40853/1

jeremyzff · 2025-09-08T22:54:19Z

FYI this is a breaking change for a bunch of things, including bursts of RTCM traffic. We had a problem where a burst of RTCM data was taking down the GPS. This change made the GPS serial port effectively half-duplex, and was blocking the READ part of the code. If I reverted this change, I get a /dev/ttyS0 write error -1 on the write thread and the read loop continues to process correctly.

This change affects a whole bunch of serial devices throughout the whole system- were any of those tested as part of this change?

dakejahl · 2025-09-09T01:11:40Z

FYI this is a breaking change for a bunch of things

This change affects a whole bunch of serial devices throughout the whole system

@jeremyzff What other things? Can you list them out? Looking at this again I can definitely see how this change could cause problems.

The core problem with the GPS driver injection logic is that it can get stuck writing large blocks without yielding to read, see #25535

However we shouldn't block while writing in most cases anyway, see #25537

jeremyzff · 2025-09-09T16:00:06Z

I tested reverting this commit, and the behavior when RTCM messages burst at the GPS was different- I never lost GPS lock and if the writes went too fast I simply got a write error instead of the system waiting for write, delaying everything else.

This wrapper was introduced to try and commonize the serial access for the different Operating systems (nuttx, posix, qurt), and I'm not even sure how many things use it, but that may change in the future, and forcing this write to be blocking could have other yet to be discovered impacts (PX4IO comms, low speed telemetry radios with flow control, who knows).

The main difference is that with fsync, the call returns once all the bytes have been passed to the underlying OS- so it would return once all the bytes were in the transfer buffer. This allows for effectively non-blocking writes as long as you stay below the overall rate capability of the channel and don't exceed the buffer in bursts. But waiting for tcdrain won't return until all the bytes are SENT, which causes a bunch of timing problems for periodic writes such as the GPS injections. A good example would be sending a long mavlink message once per second. With fsync, it would get dumped into the buffer and sent and control would return immediately. with this change, you'd have to wait for the entire message to flush before anything else can continue.

I understand the use case this was created to address (sustained, reliable writes), but this sort of fix feels like it has overly broad scope to fix that specific problem, with many possible consequences all over the system.

serial: change fsync to tcdrain

bba71c3

Calling serial::write() in quick succession was blowing away the previous buffer, fsync does not guarantee that data is transmitted on serial lines. On the other hand tcdrain waits until the output buffer is empty.

dakejahl requested a review from katzfey September 18, 2024 03:02

AlexKlimaj requested a review from dagar September 23, 2024 15:48

katzfey reviewed Sep 23, 2024

View reviewed changes

dagar approved these changes Sep 25, 2024

View reviewed changes

dagar enabled auto-merge (squash) September 25, 2024 16:04

dagar disabled auto-merge September 25, 2024 16:17

dagar merged commit ba75b9c into PX4:main Sep 25, 2024

dakejahl deleted the pr-serial_drain_tx branch November 20, 2024 02:54

dakejahl mentioned this pull request Sep 9, 2025

serial: nuttx: revert tcdrain back to fsync #25538

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

serial: change fsync to tcdrain#23686

serial: change fsync to tcdrain#23686
dagar merged 1 commit intoPX4:mainfrom
dakejahl:pr-serial_drain_tx

dakejahl commented Sep 18, 2024

Uh oh!

katzfey Sep 23, 2024

Uh oh!

DronecodeBot commented Sep 25, 2024

Uh oh!

jeremyzff commented Sep 8, 2025

Uh oh!

dakejahl commented Sep 9, 2025

Uh oh!

jeremyzff commented Sep 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

dakejahl commented Sep 18, 2024

Uh oh!

katzfey Sep 23, 2024

Choose a reason for hiding this comment

Uh oh!

DronecodeBot commented Sep 25, 2024

Uh oh!

jeremyzff commented Sep 8, 2025

Uh oh!

dakejahl commented Sep 9, 2025

Uh oh!

jeremyzff commented Sep 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants