Description
Orchagent received notifications in an order different from the order in which other daemons sent them.
This issue can be reproduced in the dynamic buffer calculation test.
Dynamic buffer calculation is a new feature that is coming soon. In this feature, a buffer profile is dynamically created when a new (speed, cable length) tuple occurs and removed when it is no longer referenced.
One of the test cases has the following flow:
- The port's speed is updated, causing the port to go down and routing entries to be withdrawn. After that, the cable length of the port is updated, causing the buffer profile on the port to be updated.
- buffermgrd creates a new buffer profile and replaces the port's old profile on the lossless PG with the newly created one.
- The old profile is removed.
After step 2 has been executed, the old profile should no longer be referenced, which means it is safe to remove it. However, we observed an error from sairedis saying it is still being referenced. This is because orchagent received the two notifications in the reverse of the order in which they were sent.
I suspect that orchagent handling notifications in a different order than buffermgrd sends them is caused by the low-level mechanism of orchagent, which uses I/O multiplexing to receive notifications from redis-db:
- Each orch in orchagent creates an FD for each table it listens to and inserts all the FDs into something like a select vector, orchdaemon.cpp#L453.
- The main loop blocks on a select-like interface, orchdaemon.cpp#L453.
When the system isn't busy, everything works well. But when the system is busy and notifications are backlogged in the sockets, orchagent handles them in an order such as FD-ascending or FD-descending rather than the order in which they arrived.
My suggestion is to add dependencies between the different buffer tables. In detail, task_need_retry should be returned in the following cases:
- a BUFFER_POOL entry is still referenced by a BUFFER_PROFILE entry when a removal notification is received;
- a BUFFER_PROFILE entry is still referenced by a BUFFER_PG entry when a removal notification is received.
To be more precise, we need to:
- Record the dependencies among BUFFER_POOL, BUFFER_PROFILE and BUFFER_PG in orchagent.
- Add a set to each object containing the objects that depend on it; typically, these are the objects that reference it.
- When a new object is created, add it to the set of the object it depends on. For example, when a new BUFFER_PG is added, it is added to the set of the BUFFER_POOL object it references.
- When an object's dependency is updated, remove it from the set of the old depended-on object and add it to that of the new one. For example, if the profile in a BUFFER_PG is updated from a to b, we need to remove the BUFFER_PG from the set of a and add it to that of b.
- When an object is removed, remove it from the set of the object it depends on.
- Check the dependencies before removing an entry, and return task_need_retry if the set isn't empty.
By doing so, an entry is prevented from being released before all references to it have been removed.
The drawback of this solution is that returning retry blocks all further notifications from that table and makes orchagent keep retrying. But this won't introduce further risk as long as:
- there is no circular dependency, which means all depending entries can be removed within a bounded period of time;
- the notifications are out of order rather than lost, which means the delay won't be too long.
Steps to reproduce the issue:
See the above summary.
Describe the results you received:
Detailed flow and log message:
- The port's speed is updated, causing the port to go down and routing entries to be withdrawn. After that, the cable length of the port is updated, causing the buffer profile on the port to be updated.
- buffermgrd creates a new buffer profile and replaces the port's old profile on the lossless PG with the newly created one.
Jul 31 06:52:08.235333 mtbc-sonic-03-2700 INFO config: 'interface speed Ethernet112 10000' executing...
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 INFO swss#buffermgrd: :- handlePortTable: Updating BUFFER_PG for port Ethernet112 due to speed or port updated
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 INFO swss#buffermgrd: :- allocateProfile: Allocating new BUFFER_PROFILE pg_lossless_10000_40m_profile
Jul 31 06:52:08.346616 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- allocateProfile: Creating new profile 'pg_lossless_10000_40m_profile'
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 INFO swss#orchagent: message repeated 25 times: [ :- doTask: Handling buffer task]
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 INFO swss#orchagent: :- setPortAdminStatus: Set admin status DOWN to port pid:10000000002d6
Jul 31 06:52:08.351527 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- allocateProfile: BUFFER_PROFILE pg_lossless_10000_40m_profile has been created successfully
- buffermgrd removes the port's lossless PG, causing the newly created buffer profile to no longer be referenced.
Jul 31 06:52:22.603972 mtbc-sonic-03-2700 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=True strip_empty_ends=True _raw_params=config interface lossless_pg remove Ethernet112 3-4 removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Jul 31 06:52:23.131873 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- doRemovePgTask: Remove BUFFER_PG Ethernet112:3-4 (profile pg_lossless_10000_40m_profile)
The newly created buffer profile is removed as well.
Jul 31 06:52:23.144138 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- doRemovePgTask: No lossless PG configured on port Ethernet112 anymore, try removing the original profile pg_lossless_10000_40m_profile
Jul 31 06:52:23.144261 mtbc-sonic-03-2700 NOTICE swss#buffermgrd: :- releaseProfile: BUFFER_PROFILE pg_lossless_10000_40m_profile has been released successfully
- bufferorch handles the notifications in reverse order, receiving the buffer profile removal notification first and failing to remove the profile. The failure is caused by the buffer profile still being referenced at that time:
Jul 31 06:52:23.660780 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task
Jul 31 06:52:23.662209 mtbc-sonic-03-2700 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x1900000000074f reference count is 2, can't remove
Jul 31 06:52:23.662372 mtbc-sonic-03-2700 ERR swss#orchagent: :- processBufferProfile: Failed to remove buffer profile pg_lossless_10000_40m_profile with type BUFFER_PROFILE, rv:-5
Jul 31 06:52:23.662503 mtbc-sonic-03-2700 ERR swss#orchagent: :- doTask: Failed to process buffer task, drop it
Jul 31 06:52:23.662724 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task
Jul 31 06:52:23.663667 mtbc-sonic-03-2700 NOTICE swss#orchagent: :- processPriorityGroup: Remove buffer PG Ethernet112:3-4
Jul 31 06:52:23.664102 mtbc-sonic-03-2700 INFO swss#orchagent: :- doTask: Handling buffer task
Describe the results you expected:
No error should be observed.
Additional information you deem important (e.g. issue happens only occasionally):
**Output of `show version`:**
```
(paste your output here)
```
**Attach debug file `sudo generate_dump`:**
```
(paste your output here)
```