Strip null bytes from LLDP chassisid before lookup#3666
Strip null bytes from LLDP chassisid before lookup#3666hamartin wants to merge 1 commit intoUninett:masterfrom
Conversation
Some Aruba switches return LLDP chassis IDs containing null bytes. This caused NAV's ipdevpoll to fail when storing neighbor information, since Django does not allow null bytes in database fields. As a result, topology discovery broke for these devices. This change strips null characters from the chassis ID string and only performs the lookup if the cleaned value is non-empty. With this fix, topology collection from affected Aruba switches works as expected. Also added to the LLDP plugin unit test, a test to verify that null bytes are being removed from the chassis ID.
|
|
There is an alternative to just stripping the string for null bytes and then checking if there actually is any content in it. One could modify the Snmp/init.py safestring function to do the stripping on strings as well and then importing it into the LLDP plugin. I have not checked the codebase if this would have any unwanted side effects. This feels cleaner, but it just adds more code in general so I chose this solution instead. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3666 +/- ##
==========================================
+ Coverage 62.88% 62.94% +0.05%
==========================================
Files 611 611
Lines 45125 45152 +27
Branches 43 43
==========================================
+ Hits 28378 28422 +44
+ Misses 16737 16720 -17
Partials 10 10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
This proposed solution worked for our environment of Aruba 6200F's as well, topology is not crashing anymore with error "A string literal cannot contain NUL (0x00) characters" |
|
I have been trying to modify the unittest by reverting the function to what it was and creating a new function in the same file so that it actually tests the changes I did. However, I am simply not skilled enough and been looking at the NAV source for long enough to do this in any way that I know is good or even "decent enough". Do I have to create a unittest for the change I did? I do not think I am able to create something relatively soon. I simply need more time to get to know the code. |
lunkwill42
left a comment
There was a problem hiding this comment.
Many thanks for contributing this PR!
From what I can see, this should definitely help with #2479 - assuming your tracebacks show the same problem.
However, I would suggest another location for this fix :)
| chassid_str = str(chassid).strip('\x00') | ||
| if chassid_str: | ||
| netbox = lookup(chassid_str) |
There was a problem hiding this comment.
I'm not entirely convinced this is where you should be stripping null bytes.
A remote chassis identifier always comes with a subtype that tells us how to interpret the identifier data string. At this point in the code, chassid has already been parsed according to its subtype, and represents some object that may not even be a string any more (hence the conversion back to a string for the lookup function).
Only three subtypes are supported for lookups in NAV, though:
macAddress: A binary string to be interpreted as a MAC address (which can definitely include null bytes). The resultingchassidvalue is in fact aMacAddressobject, not a string, but can be coerced into a string usingstr()- which will definitely not contain any null bytes.networkAddress: A binary string to be interpreted as an IP address (which can definitely include null bytes). The resultingchassidvalue is in fact anIPobject, not a string, but can be coerced into a string usingstr()- which will definitely not contain any null bytes.local: This one is tricky. The MIB doesn't really specify how to interpret this, other than as some "local" identifier:
The enumeration 'local(7)' represents a chassis identifier based on a locally defined value.
NAV tries to interpret this in a couple of ways: Devices can provide their own local identifier through their LLDP-MIB, so NAV stores this for lookups. But some devices also appear to provide some variant of their sysname here, so NAV tries to look up both using _netbox_from_local().
You didn't provide a traceback that indicates properly where your issue is, but I would assume it arises due to a local identifier that includes null bytes (the other two possibilities at this point will not).
Therefore, I think a more directed solution would be to modify the lookup function _netbox_from_local() to ensure that the value it uses for lookups does not contain null bytes.
There was a problem hiding this comment.
The errors we are getting are indeed because of local() values. I can revert the changes I did at work and record some of the tracebacks and information around the tracebackes at work tomorrow.
I was also thinking this might not be the correct place to put the "fix" as this data is retrieved using SNMP, so I was looking at maybe fixing this at the source when the data is comming in to the system as I cannot see the flow of data through the code for these LLDP objects.
I simply saw a traceback, followed the problem to these specific lines and figured out what I had to do at this specific point in the code to fix the problem. So if anyone else have a better solution which solves the root problem, of course that is the solution that should be used.
Anyway, I will provide tracebacks and information about the interfaces, switches etc. tomorrow.
There was a problem hiding this comment.
I simply saw a traceback, followed the problem to these specific lines and figured out what I had to do at this specific point in the code to fix the problem. So if anyone else have a better solution which solves the root problem, of course that is the solution that should be used.
Anyway, I will provide tracebacks and information about the interfaces, switches etc. tomorrow.
So, I whipped together an alternative implementation and a test in #3695 - it would be great if you're able to test and verify on your system (and provide a traceback if it differs significantly from the one described in #2479), @hamartin
There was a problem hiding this comment.
The following stack trace is for the unmodified NAV. Any DNS information, IP, comment or MAC information has been changed to hide the true values.
2025-12-15 11:25:51,625 [563774] [ERROR jobs.jobhandler] [topo anon-switch-a6148fp.anon.tlp] Plugin nav.ipdevpoll.plugins.lldp.LLDP('anon-switch-a6148fp.anon.tlp') reported an unhandled failure
Traceback (most recent call last):
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks
result = context.run(
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
return g.throw(self.value.with_traceback(self.tb))
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 86, in handle
yield run_in_thread(self._process_remote)
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 269, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
return func(*args, **kw)
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/db.py", line 111, in _reset
return func(*args, **kwargs)
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in _process_remote
neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in <listcomp>
neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 127, in __init__
self.identify()
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 130, in identify
self.netbox = self._identify_netbox()
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 231, in _identify_netbox
netbox = lookup(str(chassid))
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 242, in _netbox_from_local
netbox = self._netbox_query(
File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 204, in _netbox_query
netbox = manage.Netbox.objects.values('id', 'sysname').get(query)
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 635, in get
num = len(clone)
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 382, in __len__
self._fetch_all()
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 1886, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 210, in __iter__
for row in compiler.results_iter(
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1513, in results_iter
results = self.execute_sql(
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
cursor.execute(sql, params)
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
return self._execute_with_wrappers(
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
builtins.ValueError: A string literal cannot contain NUL (0x00) characters.
The following is from the switch NAV has created a stacktrace for above.
anon-switch-a6148fp# show lldp configuration 1/1/37
LLDP Global Configuration
=========================
LLDP Enabled : Yes
LLDP Transmit Interval : 30
LLDP Hold Time Multiplier : 4
LLDP Transmit Delay Interval : 2
LLDP Reinit Time Interval : 2
LLDP Trap Enabled : No
LLDP Port Configuration
=======================
Med Priority Override : No
PORT TX-ENABLED RX-ENABLED INTF-TRAP-ENABLED
--------------------------------------------------------------------------
1/1/37 Yes Yes Yes
TLVs Advertised
===============
Management Address
Port Description
Port VLAN-ID
System Capabilities
System Description
System Name
OUI
Port VLAN-Name
Dot1 Link Aggregation
Dot3 Power
Dot3 Macphy
Dot3 MFS
Med Capability
Med Network Policy
Med Power
anon-switch-a6148fp#
anon-switch-a6148fp# show lldp local-device 1/1/37
Local Port Data
===============
Port-ID : 1/1/37
Port-Desc : "**Some comment**"
Port VLAN ID : 1234
Maximum Frame Size: 1500
Parent Interface : interface 1/1/37
PoE Information
PSE Tlv Sent Type : dot3
PSE Power Type : Type 2 PSE
Power Source : Primary
Power Priority : low
PSE Allocated Power Value : 0.00 W
PD Requested Power Value : 0.00 W
anon-switch-a6148fp#
anon-switch-a6148fp# show lldp neighbor-info 1/1/37
Port : 1/1/37
Neighbor Entries : 1
Neighbor Entries Deleted : 0
Neighbor Entries Dropped : 0
Neighbor Entries Aged-Out : 0
Neighbor System-Name :
Neighbor System-Description :
Neighbor Chassis-ID : ADELLCOMPUTER
Neighbor Management-Address :
Chassis Capabilities Available :
Chassis Capabilities Enabled :
Neighbor Port-ID : 00:11:22:33:44:55
Neighbor Port-Desc :
Neighbor Port VLAN ID :
Neighbor Port VLAN Name :
Neighbor Port MFS : 0
Link aggregation supported :
Link aggregation enabled :
Aggregation port ID :
TTL : 3601
Neighbor MED Capabilities
Neighbor Device class : CLASS_I
MED capabilities enabled : Capabilities
MED capabilities supported : Capabilities
Neighbor Mac-Phy details
Neighbor Auto-neg Supported : true
Neighbor Auto-Neg Enabled : true
Neighbor Auto-Neg Advertised : 1000 BASE_TFD
Neighbor MAU type :
Neighbor EEE information : DOT3
Neighbor TX Wake time : 0 us
Neighbor RX Wake time : 0 us
Neighbor Fallback time : 0 us
Neighbor TX Echo time : 0 us
Neighbor RX Echo time : 0 us
anon-switch-a6148fp#
anon-switch-a6148fp# show lldp oui-tlv-details
LOCAL-PORT CHASSIS-ID PORT-ID OUI OUI-SUBTYPE OUI-DATA
--------------------------------------------------------------------------------
...
...
1/1/37 ADELLCOMPUTER 00:11:2... 00120f 1 0300010000
1/1/37 ADELLCOMPUTER 00:11:2... 0012bb 1 000101
...
...
anon-switch-a6148fp# show lldp tlv
TLVs Advertised
===============
Management Address
Port Description
Port VLAN-ID
System Capabilities
System Description
System Name
OUI
Port VLAN-Name
Dot1 Link Aggregation
anon-switch-a6148fp#
How it looks when I look up the switch after doing the change.

I will test your solution later this evening and give you a report.
There was a problem hiding this comment.
Tried your suggested pr in #3695. I agree that it is a better place to solve the problem, but it seems the solve requires that sysname is returned properly, which it seems, it does not for some or all of the Aruba switches with this problem in our infrastructure. You will find the stack trace I got in your suggested pr.



Some Aruba switches return LLDP chassis IDs containing null bytes. This caused NAV's ipdevpoll to fail when storing neighbor information, since Django does not allow null bytes in database fields. As a result, topology discovery broke for these devices.
This change strips null characters from the chassis ID string and only performs the lookup if the cleaned value is non-empty. With this fix, topology collection from affected Aruba switches works as expected.
Scope and purpose
This commit does not fix a reported issue I think, but it is very closely related to #2479, #3552 and #3559 as it work on the same variables and situations, but where the content is a string, not bytes.
The chassis_id contains null bytes after it has been converted to a string, doing so ipdevpolld shows stacktraces for all switches which return null bytes in their chassis ID string. For the company i work for, this would be Aruba switches in the 6000, 6100 and 6200 series.
Contributor Checklist
Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to NAV can be found in the
Hacker's guide to NAV.
<major>.<minor>.x). For a new feature or other additions, it should be based onmaster.