At large scale, SGApp slows down the processing of logical port updated events

Bug #1642895 reported by yuan wei
This bug affects 1 person
Affects: DragonFlow
Status: New
Importance: Medium
Assigned to: rajiv
Milestone: (none)

Bug Description

In a small test in my environment, I added two log prints before and after each app's handler is called in AppDispatcher, like below:

def dispatch(self, method, *args, **kwargs):
    errors = []
    for app in self.apps:
        handler = getattr(app, method, None)
        if handler is not None:
            try:
                LOG.info(_LI("Enter app %s"), str(app))
                handler(*args, **kwargs)
                LOG.info(_LI("Exit app %s"), str(app))
            except Exception as e:
                # exception handling as in the original dispatcher, abbreviated here
                errors.append(e)

Then I created 10000 fake logical ports spread across 50 networks within one tenant and put them into the DF DB.
Those ports were designed to be remote ports on a fake host.

Then I restarted the DF local controller, so those ports would be pulled from the DF DB (they belong to the same tenant as a local logical port) and the apps' handlers would be called to process the port updated events.

By observing the printed DF logs, I found that the time SGApp took to process one port updated event gradually increased as the number of processed ports grew (from under 1ms to over 60ms).
The other apps did not show this behaviour.

2016-11-18 13:01:24.450 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.l2_ml2_app.L2App object at 0x7fa0ad3d3c90>
2016-11-18 13:01:24.452 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.l2_ml2_app.L2App object at 0x7fa0ad3d3c90>
2016-11-18 13:01:24.452 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.l3_proactive_app.L3ProactiveApp object at 0x7fa0ad3d3bd0>
2016-11-18 13:01:24.453 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.l3_proactive_app.L3ProactiveApp object at 0x7fa0ad3d3bd0>
2016-11-18 13:01:24.454 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.dhcp_app.DHCPApp object at 0x7fa0ad3892d0>
2016-11-18 13:01:24.454 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.dhcp_app.DHCPApp object at 0x7fa0ad3892d0>
2016-11-18 13:01:24.454 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.dnat_app.DNATApp object at 0x7fa0ad389290>
2016-11-18 13:01:24.455 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.dnat_app.DNATApp object at 0x7fa0ad389290>
2016-11-18 13:01:24.455 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.sg_app.SGApp object at 0x7fa0ad1c2d50>
2016-11-18 13:01:24.521 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.sg_app.SGApp object at 0x7fa0ad1c2d50>
2016-11-18 13:01:24.521 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.portsec_app.PortSecApp object at 0x7fa0ad356190>
2016-11-18 13:01:24.521 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.portsec_app.PortSecApp object at 0x7fa0ad356190>

On deeper inspection, I suspected that the current use of netaddr.IPRange caused this bug. When I replaced SGApp with an older version that used its own implementation instead of netaddr.IPRange, the bug did not occur and the time to process one port updated event stayed under 1ms (a rough sketch of the suspected cost pattern follows the second log excerpt below).

2016-11-18 14:03:16.887 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.l2_ml2_app.L2App object at 0x7f526a4c2c90>
2016-11-18 14:03:16.889 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.l2_ml2_app.L2App object at 0x7f526a4c2c90>
2016-11-18 14:03:16.889 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.l3_proactive_app.L3ProactiveApp object at 0x7f526a4c2bd0>
2016-11-18 14:03:16.890 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.l3_proactive_app.L3ProactiveApp object at 0x7f526a4c2bd0>
2016-11-18 14:03:16.890 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.dhcp_app.DHCPApp object at 0x7f526a4782d0>
2016-11-18 14:03:16.891 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.dhcp_app.DHCPApp object at 0x7f526a4782d0>
2016-11-18 14:03:16.891 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.dnat_app.DNATApp object at 0x7f526a478290>
2016-11-18 14:03:16.891 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.dnat_app.DNATApp object at 0x7f526a478290>
2016-11-18 14:03:16.892 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.sg_app.SGApp object at 0x7f526a27ab10>
2016-11-18 14:03:16.892 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.sg_app.SGApp object at 0x7f526a27ab10>
2016-11-18 14:03:16.893 INFO dragonflow.controller.dispatcher [-] Enter app <dragonflow.controller.portsec_app.PortSecApp object at 0x7f526a27af10>
2016-11-18 14:03:16.893 INFO dragonflow.controller.dispatcher [-] Exit app <dragonflow.controller.portsec_app.PortSecApp object at 0x7f526a27af10>
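
To illustrate the suspicion above, here is a rough, self-contained sketch (an assumption about the cause, not the actual SGApp code): if the handler re-derives netaddr ranges/CIDRs from the whole accumulated address set on every port updated event, each event pays a cost that grows with the set size, while a hand-rolled integer representation keeps each event cheap. All names below are made up for the illustration:

import time

import netaddr


def handle_event_with_netaddr(all_addresses):
    # Rebuild an IPSet over every address seen so far and re-derive its CIDRs;
    # the work grows with the size of the accumulated set.
    ip_set = netaddr.IPSet(all_addresses)
    return list(ip_set.iter_cidrs())


def ip_to_int(addr):
    # "Own implementation": dotted-quad to integer without netaddr objects.
    a, b, c, d = (int(x) for x in addr.split('.'))
    return (a << 24) | (b << 16) | (c << 8) | d


def handle_event_with_plain_ints(all_addresses):
    # Hand-rolled alternative: compare plain integers instead of netaddr objects.
    ints = sorted(ip_to_int(a) for a in all_addresses)
    return ints[0], ints[-1]


addresses = []
for i in range(1, 10001):
    addresses.append('10.0.%d.%d' % ((i // 256) % 256, i % 256))
    if i in (100, 1000, 10000):
        for label, fn in (('netaddr  ', handle_event_with_netaddr),
                          ('plain int', handle_event_with_plain_ints)):
            start = time.time()
            fn(addresses)
            print('%s, %5d ports: %.1f ms'
                  % (label, i, (time.time() - start) * 1000.0))

Whether this is exactly what the SGApp hot path does is the open question of this bug; the sketch only shows why recomputing over an accumulated set with netaddr objects would produce the gradual slowdown seen in the first log excerpt.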

Omer Anson (omer-anson)
Changed in dragonflow:
importance: Undecided → High
rajiv (rajiv-kumar)
Changed in dragonflow:
assignee: nobody → rajiv (rajiv-kumar)
Revision history for this message
Li Ma (nick-ma-z) wrote :

Reduce the importance as it is an improvement.

Changed in dragonflow:
importance: High → Low
Omer Anson (omer-anson)
Changed in dragonflow:
importance: Low → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on dragonflow (master)

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.opendev.org/434133
