Skip to content

ReaderProxy history reloaded incorrectly missing GAP message [4962]#453

Closed
guillaumeautran wants to merge 1 commit intoeProsima:developfrom
clearpathrobotics:issue_452
Closed

ReaderProxy history reloaded incorrectly missing GAP message [4962]#453
guillaumeautran wants to merge 1 commit intoeProsima:developfrom
clearpathrobotics:issue_452

Conversation

@guillaumeautran
Copy link
Copy Markdown
Contributor

@guillaumeautran guillaumeautran commented Mar 18, 2019

The problem is invalid changes at the end of the stateful writer history don't get added to a newly connected reader proxy cache causing the reader proxy to miss a lot of invalid changed and not generate the proper GAP message.

The fix is to make sure the missing changes are loaded into the reader proxy history so that the GAP message is properly generated.

should fix #452

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@guillaumeautran
Copy link
Copy Markdown
Contributor Author

Looks like the build failure on Linux is due to a infrastructure problem

Cloning repository https://github.com/eProsima/Fast-RTPS.git
 > git init /workspace/FastRTPS Manual Linux # timeout=10
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init /workspace/FastRTPS Manual Linux

Not really sure why the MAC build failed though. The test results is unclear to me...

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@raquelalvarezbanos raquelalvarezbanos changed the title ReaderProxy history reloaded incorrectly missing GAP message ReaderProxy history reloaded incorrectly missing GAP message [4962] Mar 19, 2019
@MiguelCompany MiguelCompany mentioned this pull request Mar 29, 2019
31 tasks
@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines were removed on PR #425, as the error message could incorrectly be shown on correct situations after the changes on PR #411. There could be a situation were we receive an acknack message requesting a single change that is not considered unacknowledged (may be underway), in which case the logError will be incorrectly stated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed this after the fact. Is there a easy way to differentiate those two cases? (when the missing change is legitimate vs. when it is not?).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only non-legitimate situation I can think of is if the acknack requests a change that has been already acknowledged, in which case its sequence number would be lower or equal to the low_mark. We could add some code before line 172. Something like:

if(*sit <= changesFromRLowMark_)
{
    logError(RTPS_WRITER,"Requested already acknowledged change: " << *sit
                   << " (low mark: " << changesFromRLowMark_ << ")");
}
else if(chit != m_changesForReader.end() && UNACKNOWLEDGED == chit->getStatus())

We could even just perform an assert(*sit > changesFromRLowMark_ ) before line 170.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to perform the assert because I know there are cases where the requested change is above the low_mark. But I will most definitely silence the log is the requested changes are below the low_mark.

[2019-04-01 15:23:48.632 UTC] [ERROR] [/bridge]: [fastrtps.RTPS_WRITER] Requested Changes: 14 15 16 17 18 30 31 32 33 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66  not found (low mark: 13)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace shown may just mean those changes are either already marked as requested (i.e. the same acknack was received twice), or considered underway (i.e. recently sent). Both are legal situations. That is the reason the message was originally silenced.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I've modified the code to record those sequence numbers greater than the low mark but not found in the reader history. These are most definitely an error that should not happen. Let me know if that's good now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not found in the reader history

I guess you mean writer history, as the ReaderProxy is part of the writer.

Even though, this will still log errors in legal situations. The changes may have been removed from the writer's history (due to history limits, removed by the user, ...). In that case, after receiving the acknack, a gap message will be sent informing the reader of the missing changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the reader proxy (aka the writer history).

I am still sceptical wrt the GAP message being sent in the cases where there is a GAP in the writer history without a corresponding irrelevant change recorded in the reader proxy.
At least in 1.7.2 release, I do not see this happening. Can you please point me to the code where this is taken care of (ie: a ACKNACK message received generating a GAP message for missing change seqnum > low mark)? IMHO, if that was the case, this whole fix would be mute since the whole problem this fix addresses is the fact that a new reader proxy being created did not get initialized with the missing (irrelevant) change records indicating the gap at the tail end of the Writer History.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a deeper look at this and I've got to the conclusion that you're right. The changes may not be on the writer's history but should be on the ReaderProxy collection. We may refactor that in the future, but right now we're good to go with your changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Thanks!

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

MiguelCompany
MiguelCompany previously approved these changes Apr 8, 2019
@guillaumeautran
Copy link
Copy Markdown
Contributor Author

@MiguelCompany : Just rebased against latest develop. Feel free to merge anytime.

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

The problem is invalid changes at the end of the stateful writer history don't get added to a newly connected reader proxy cache causing the reader proxy to miss a lot of invalid changed and not generate the proper GAP message.

The fix is to make sure the missing changes are loaded into the reader proxy history so that the GAP message is properly generated.

Reworked the error message on missing changes to eliminate legitimate cases (UNACK'ed changed or duplicated ACK request).
The ERROR will only show if the change was not found and the change was not ACK'ed previously.

Remove meaningless assert statement.

issue eProsima#452
@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@richiware
Copy link
Copy Markdown
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@guillaumeautran
Copy link
Copy Markdown
Contributor Author

@MiguelCompany Any help in understanding why the MAC tests are not passing?

@MiguelCompany
Copy link
Copy Markdown
Member

Hi @guillaumeautran

ReaderProxy was refactored and changes are now removed from ReaderProxy when they are removed from writer's history.

I've found that irrelevant changes are added to the collection of the ReaderProxy and they shouldn't.

GAP messages are sent taking into consideration history's next_sequence_number() (see here and here )

So to take already removed changes into account when a new reader matches, and based on your changes, I propose the following patch:

diff --git a/src/cpp/rtps/writer/ReaderProxy.cpp b/src/cpp/rtps/writer/ReaderProxy.cpp
index 442a7abc..cb331fdf 100644
--- a/src/cpp/rtps/writer/ReaderProxy.cpp
+++ b/src/cpp/rtps/writer/ReaderProxy.cpp
@@ -142,6 +142,12 @@ void ReaderProxy::add_change(
         return;
     }
 
+    // Irrelevant changes are not added to the collection
+    if (!change.isRelevant())
+    {
+        return;
+    }
+
     if (changes_for_reader_.push_back(change) == nullptr)
     {
         // This should never happen
diff --git a/src/cpp/rtps/writer/StatefulWriter.cpp b/src/cpp/rtps/writer/StatefulWriter.cpp
index 00f4e2b0..73b0c20f 100644
--- a/src/cpp/rtps/writer/StatefulWriter.cpp
+++ b/src/cpp/rtps/writer/StatefulWriter.cpp
@@ -610,6 +610,13 @@ bool StatefulWriter::matched_reader_add(const ReaderProxyData& rdata)
         for(std::vector<CacheChange_t*>::iterator cit = mp_history->changesBegin();
                 cit != mp_history->changesEnd(); ++cit)
         {
+            // This is to cover the case when there are holes in the history
+            while (current_seq != (*cit)->sequenceNumber)
+            {
+                not_relevant_changes.insert(current_seq);
+                ++current_seq;
+            }
+
             ChangeForReader_t changeForReader(*cit);
 
             if(rp->durability_kind() >= TRANSIENT_LOCAL && this->getAttributes().durabilityKind >= TRANSIENT_LOCAL)
@@ -632,7 +639,12 @@ bool StatefulWriter::matched_reader_add(const ReaderProxyData& rdata)
             ++current_seq;
         }
 
-        assert(last_seq + 1 == current_seq);
+        // This is to cover the case where the last changes have been removed from the history
+        while (current_seq < next_sequence_number())
+        {
+            not_relevant_changes.insert(current_seq);
+            ++current_seq;
+        }
 
         try
         {

@guillaumeautran
Copy link
Copy Markdown
Contributor Author

Sounds good @MiguelCompany
Are you taking care of the new updated changes then or do you want me to make a new PR (I will not be able to test it anytime soon as we are stuck on 1.7.2 until late May).

@MiguelCompany
Copy link
Copy Markdown
Member

Closing in favor of #502

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants