fix: ensure that tedge mqtt pub fully publishes messages sent with QoS 2 (#3603)
Conversation
Force-pushed from 7690f66 to 0e58116

**albinsuresh** left a comment:
Unrelated to the issue that we're fixing, but, seeing the additional handling required for Outgoing QoS 2 packets published by this client, I'm wondering whether we should do something similar for Incoming QoS 2 packets as well, to make sure that this client properly acknowledges those packets received from the broker.
```rust
let remaining_events_empty =
    event_loop.state.inflight() == 0 && pub_count.load(Ordering::SeqCst) == 0;
```
Couldn't that be just the latter?
Suggested change:

```rust
let remaining_events_empty = pub_count.load(Ordering::SeqCst) == 0;
```
No, because `pub_count == 0` tells us that everything we intended to publish has been published, and `inflight == 0` tells us that all the messages have been acknowledged.
```rust
event = event_loop.poll() => {
    // If we receive more events, we want to reset the disconnect permit
    // so the event loop doesn't block disconnection
    disconnect_permit.take();
```
Does that really help? After the permit is acquired on line 254 during an iteration, we start the next iteration and would already have triggered the disconnection on line 235. Since this `event_loop.poll()` happens after that, setting it back to `None` doesn't really help, does it? The damage (triggering the disconnect task) has already been done.
I thought there was a chance it might not, since we definitely have cases where we acquire the disconnect permit but then can't send the disconnect due to pending packets. But if we have pending packets, that means we have to wait for `event_loop.poll()` to complete before we can meet the precondition for sending the disconnect. So, thinking about it more thoroughly, I think you're correct: this is redundant.
```rust
        awaiting_ack.remove(&p.pkid);
    }
    Ok(Event::Incoming(Packet::PubRec(p))) => {
```
Isn't it safer to remove the packet once PUBCOMP is received rather than PUBREC? Since that guarantees that the broker has released the message to other subscribers?
Suggested change:

```rust
Ok(Event::Incoming(Packet::PubComp(p))) => {
```
Yes this is probably true
Actually true, as PUBCOMP is the QoS 2 equivalent of QoS 1's PUBACK. By closing on PUBREC, the PUBREL might never be sent, leaving the broker waiting for it.
Rerunning the failed test. Flaky, but not sure why.
Yeah, it looked like it was caused by a duplicate operation being received… we could relax the assertion to not check a max message count, but I'm not sure whether that would cause downstream failures anyway… but yes, all unrelated.
…QoS 2

Signed-off-by: James Rhodes <jarhodes314@gmail.com>

Force-pushed from 65840ef to f60f280
Proposed changes
Ensure that all outstanding publishes are sent when running `tedge mqtt pub -q 2 ...`.

The root cause was the queue in rumqttc's event loop. When we attempt to publish a message, it is sent to the event loop via a channel. In the case of this bug, the event loop didn't receive this publish notification before we detected that the sender loop had shut down, and thus we started the disconnection process before publishing the message. The publish would still be processed before the disconnect request, but the disconnection caused us not to respond to the broker's PubRec acknowledgement, so the QoS 2 message was never fully processed.
Types of changes

Paste Link to the issue

Checklist

- `just prepare-dev` (once)
- `just format` as mentioned in CODING_GUIDELINES
- `just check` as mentioned in CODING_GUIDELINES

Further comments