- The network can get stuck in an available loop if the radio is receiving data faster than the network can process it. If this happens, the RX FIFO needs to be flushed
- Also return NETWORK_OVERRUN when this happens
- Add network system type 160 NETWORK_OVERRUN
- Also return before radio.read() if payload too small
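The behavior described in this commit message can be sketched roughly as follows. This is not the RF24Network implementation: `MockRadio`, `FakeClock`, `HEADER_SIZE`, and `update()` are hypothetical stand-ins so the loop can run on a desktop, and reading short payloads only to discard them is just one possible way to clear them from the FIFO. Only the value 160 and the 100 ms timeout come from this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

constexpr uint8_t NETWORK_OVERRUN = 160;  // system type added in this PR
constexpr std::size_t HEADER_SIZE = 8;    // assumed network header size
constexpr uint32_t TIMEOUT_MS = 100;      // timeout value settled on in this thread

// Hypothetical radio: a queue of payloads stands in for the RX FIFO.
struct MockRadio {
    std::deque<std::vector<uint8_t>> fifo;
    bool endless = false;  // when true, a new payload "arrives" after every read

    bool available() const { return !fifo.empty(); }
    std::size_t payloadSize() const { return fifo.front().size(); }
    void read() {
        fifo.pop_front();
        if (endless) fifo.push_back(std::vector<uint8_t>(HEADER_SIZE, 0));
    }
    void flush_rx() { fifo.clear(); }
};

// Hypothetical clock: every call advances time by one millisecond.
struct FakeClock {
    uint32_t now = 0;
    uint32_t millis() { return now++; }
};

// Drains the FIFO. Returns 0 when the FIFO empties on its own, or
// NETWORK_OVERRUN when data keeps arriving past the timeout and the
// FIFO had to be flushed.
uint8_t update(MockRadio& radio, FakeClock& clock, std::size_t& handled) {
    const uint32_t start = clock.millis();
    while (radio.available()) {
        if (clock.millis() - start > TIMEOUT_MS) {
            radio.flush_rx();  // break out of the available() loop
            return NETWORK_OVERRUN;
        }
        if (radio.payloadSize() < HEADER_SIZE) {
            radio.read();      // too short to hold a header: remove it, do not process it
            continue;
        }
        radio.read();
        ++handled;             // a real network layer would parse and route the frame here
    }
    return 0;
}
```

With `endless` left at `false` the loop returns 0 once the queue is empty; setting it to `true` mimics a sender that outpaces the receiver, which is the case where `NETWORK_OVERRUN` is returned.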
nrf_to_nrf lib has no flush_rx function, so the build is failing...

Added flush_rx() to nrf_to_nrf. I'm tempted to just merge this anyway, because it may take a while for the jobs to pick up the new release and run successfully.
- Need to clear the packets from the radio even if they are shorter than RF24Network header size

So current thoughts are that this addresses the problem, but I'm still not entirely sure why the radio would so often go into an available() loop unless there was random interference popping up. It seems to be pretty random and still a bit difficult to recreate on demand. As per below, it seems to happen in bunches, with multiple timeouts happening in short periods of time:

pi@RPi5:~ $ tail -F failLog.txt
10
tail: failLog.txt: file truncated
13
tail: failLog.txt: file truncated
14
tail: failLog.txt: file truncated
15
tail: failLog.txt: file truncated
17
tail: failLog.txt: file truncated
19
tail: failLog.txt: file truncated
20

I also want to play around with the timeout, because 1 second seems like a long time to remain in the available function.

The payload is not removed from the RX FIFO unless the entire payload's length is

100 milliseconds would match the timeout in

Hmm, the DPL is just an added field in the overall packet that gets sent, so it probably would be possible for this to happen. I should be able to do it with NRF52 devices on purpose. I can probably test that out later.

I doubt you'd be able to with the RF24 lib because payloads written to the TX FIFO are truncated to 32 bytes (or they should be). If that software limit didn't exist, then I believe it might be possible to mess up the dynamic payload size by uploading a buffer larger than 32 bytes to the TX FIFO when dynamic payloads are enabled.

Yeah, it looks like I can't replicate that behavior with NRF52 either. 100ms seems to be a good value for the timeout in testing; I think we can go with that.
2bndy5 left a comment
I'm guessing you have local patches for the higher layer(s) to handle this new msg type.
    /**
     * Messages of this type indicate the network is being overrun with data & the RX FIFO has been flushed.
     **/
    #define NETWORK_OVERRUN 160
Here are my thoughts on the actual number. Beware, I'm still thinking in binary from exposing the STATUS byte...
The msg type is an 8-bit var. I was thinking we'd assert the top 3 bits (0b11100000 = 0xE0 = 224) and use the lower 5 bits to indicate the layer.
It doesn't really matter though. As long as the number isn't already used.
If you really want to get fancy about it, we could declare an enum of ErrorKinds (or ReservedMsgTypes). That way the source code (and docs) would be a bit more organized about OSI errors.
This idea can wait I guess, until we flesh out what errors we want to propagate up the stack...
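A rough sketch of what that numbering idea might look like, purely for discussion. The `Layer` values and the helper names are hypothetical (nothing in this thread defines them), and the sketch does not check whether the resulting numbers collide with message types that are already in use.

```cpp
#include <cstdint>

// Hypothetical layer identifiers; the thread does not assign these.
enum class Layer : uint8_t { Radio = 0, Network = 1, Mesh = 2, Gateway = 3 };

constexpr uint8_t ERROR_FLAG_MASK = 0xE0;  // top 3 bits asserted: 0b11100000 = 224
constexpr uint8_t LAYER_MASK = 0x1F;       // lower 5 bits carry the layer

// Builds an error message type for the given layer.
constexpr uint8_t makeErrorType(Layer layer) {
    return static_cast<uint8_t>(ERROR_FLAG_MASK | (static_cast<uint8_t>(layer) & LAYER_MASK));
}

// True when all of the top 3 bits are set.
constexpr bool isErrorType(uint8_t type) {
    return (type & ERROR_FLAG_MASK) == ERROR_FLAG_MASK;
}

// Extracts the layer from the lower 5 bits of an error message type.
constexpr Layer errorLayer(uint8_t type) {
    return static_cast<Layer>(type & LAYER_MASK);
}

static_assert(ERROR_FLAG_MASK == 224, "0b11100000 is 0xE0, which is 224");
```

Under this scheme the value 160 (0b10100000) used in this PR would not be recognized as an error type, since only two of its top three bits are set.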
> I was thinking we'd assert the top 3 bits (0b11100000 = 0xE0 = 224) and use the lower 5 bits to indicate the layer.

I think I'd need to see how this works out before forming an opinion. Knowing you it will all make sense in the end. :p

> If you really want to get fancy about it, we could declare an enum of ErrorKinds.

That I understand, and it sounds like a good idea, but maybe for another PR.
    radio.failureDetected = 1;
    #endif
    break;
    if (millis() > timeout) {
Ah, that sweet performant math. 😆
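For context, a small sketch contrasting the two usual ways to write this check, assuming `timeout` in the diff holds a precomputed deadline (start time plus the timeout). The function names are hypothetical. The deadline form costs one comparison per loop iteration; the elapsed form costs a subtraction per check but keeps working when a 32-bit millisecond counter wraps around.

```cpp
#include <cstdint>

// Deadline form: compute (start + timeout) once, then compare on every pass.
inline bool expiredDeadline(uint32_t now, uint32_t deadline) {
    return now > deadline;
}

// Elapsed form: unsigned subtraction wraps modulo 2^32, so the difference
// is still the true elapsed time after the counter rolls over.
inline bool expiredElapsed(uint32_t now, uint32_t start, uint32_t timeoutMs) {
    return static_cast<uint32_t>(now - start) > timeoutMs;
}
```

Away from the rollover point both forms agree. Near it, a deadline that wraps to a small value makes the first form report expiry early, which for a 100 ms timeout would only matter once every 49.7 days or so.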
Not yet. Currently nothing really needs to be done on encountering this error, but it would be nice for RF24Gateway to display the error count, instead of logging to a file like I have it doing currently.

Yeah, file I/O is so timely. Although on Linux, it's probably the same as writing to stdout or stderr.
- The rx buffer also needs to be flushed if R_RX_PL_WID returns 0
- No need for the delay

nRF24/RF24Network#249
* Fix handling of radio/network overruns
  - The network can get stuck in an available loop if the radio is receiving data faster than the network can process it. If this happens, the RX FIFO needs to be flushed
  - Also return NETWORK_OVERRUN when this happens
  - Add network system type 160 NETWORK_OVERRUN
* Set timeout to 100 in available()
I was finally able to recreate this issue repeatedly using the following Arduino Sketch while the RPi was actively transmitting and receiving data.
The system type used, etc. can be up for debate; I just threw this fix together.
Closes nRF24/RF24#1033