Skip to content

Strip null bytes from LLDP chassisid before lookup#3666

Closed
hamartin wants to merge 1 commit intoUninett:masterfrom
hamartin:fix-lldp-aruba-nullbytes
Closed

Strip null bytes from LLDP chassisid before lookup#3666
hamartin wants to merge 1 commit intoUninett:masterfrom
hamartin:fix-lldp-aruba-nullbytes

Conversation

@hamartin
Copy link
Copy Markdown

@hamartin hamartin commented Dec 5, 2025

Some Aruba switches return LLDP chassis IDs containing null bytes. This caused NAV's ipdevpoll to fail when storing neighbor information, since Django does not allow null bytes in database fields. As a result, topology discovery broke for these devices.

This change strips null characters from the chassis ID string and only performs the lookup if the cleaned value is non-empty. With this fix, topology collection from affected Aruba switches works as expected.

Scope and purpose

This commit does not fix a reported issue I think, but it is very closely related to #2479, #3552 and #3559 as it work on the same variables and situations, but where the content is a string, not bytes.

The chassis_id contains null bytes after it has been converted to a string, doing so ipdevpolld shows stacktraces for all switches which return null bytes in their chassis ID string. For the company i work for, this would be Aruba switches in the 6000, 6100 and 6200 series.

Contributor Checklist

Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to NAV can be found in the
Hacker's guide to NAV.

  • Added a changelog fragment for towncrier
  • Added/amended tests for new/changed code
  • Added/changed documentation
  • Linted/formatted the code with ruff, easiest by using pre-commit
  • Wrote the commit message so that the first line continues the sentence "If applied, this commit will ...", starts with a capital letter, does not end with punctuation and is 50 characters or less long. See https://cbea.ms/git-commit/
  • Based this pull request on the correct upstream branch: For a patch/bugfix affecting the latest stable version, it should be based on that version's branch (<major>.<minor>.x). For a new feature or other additions, it should be based on master.
  • If applicable: Created new issues if this PR does not fix the issue completely/there is further work to be done
  • If it's not obvious from a linked issue, described how to interact with NAV in order for a reviewer to observe the effects of this change first-hand (commands, URLs, UI interactions)
  • If this results in changes in the UI: Added screenshots of the before and after
  • If this adds a new Python source code file: Added the boilerplate header to that file

Some Aruba switches return LLDP chassis IDs containing null bytes. This
caused NAV's ipdevpoll to fail when storing neighbor information, since
Django does not allow null bytes in database fields. As a result,
topology discovery broke for these devices.

This change strips null characters from the chassis ID string and only
performs the lookup if the cleaned value is non-empty. With this fix,
topology collection from affected Aruba switches works as expected.

Also added to the LLDP plugin unit test, a test to verify that null
bytes are being removed from the chassis ID.
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Dec 5, 2025

CLA assistant check
All committers have signed the CLA.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Dec 5, 2025

@hamartin
Copy link
Copy Markdown
Author

hamartin commented Dec 5, 2025

There is an alternative to just stripping the string for null bytes and then checking if there actually is any content in it.

One could modify the Snmp/init.py safestring function to do the stripping on strings as well and then importing it into the LLDP plugin. I have not checked the codebase if this would have any unwanted side effects. This feels cleaner, but it just adds more code in general so I chose this solution instead.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.94%. Comparing base (9ba8d5c) to head (2946964).
⚠️ Report is 505 commits behind head on master.

Files with missing lines Patch % Lines
python/nav/ipdevpoll/plugins/lldp.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3666      +/-   ##
==========================================
+ Coverage   62.88%   62.94%   +0.05%     
==========================================
  Files         611      611              
  Lines       45125    45152      +27     
  Branches       43       43              
==========================================
+ Hits        28378    28422      +44     
+ Misses      16737    16720      -17     
  Partials       10       10              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@oh01
Copy link
Copy Markdown

oh01 commented Dec 9, 2025

This proposed solution worked for our environment of Aruba 6200F's as well, topology is not crashing anymore with error "A string literal cannot contain NUL (0x00) characters"

@hamartin
Copy link
Copy Markdown
Author

hamartin commented Dec 9, 2025

I have been trying to modify the unittest by reverting the function to what it was and creating a new function in the same file so that it actually tests the changes I did. However, I am simply not skilled enough and been looking at the NAV source for long enough to do this in any way that I know is good or even "decent enough".

Do I have to create a unittest for the change I did? I do not think I am able to create something relatively soon. I simply need more time to get to know the code.

Copy link
Copy Markdown
Member

@lunkwill42 lunkwill42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for contributing this PR!

From what I can see, this should definitely help with #2479 - assuming your tracebacks show the same problem.

However, I would suggest another location for this fix :)

Comment on lines +231 to +233
chassid_str = str(chassid).strip('\x00')
if chassid_str:
netbox = lookup(chassid_str)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced this is where you should be stripping null bytes.

A remote chassis identifier always comes with a subtype that tells us how to interpret the identifier data string. At this point in the code, chassid has already been parsed according to its subtype, and represents some object that may not even be a string any more (hence the conversion back to a string for the lookup function).

Only three subtypes are supported for lookups in NAV, though:

  • macAddress: A binary string to be interpreted as a MAC address (which can definitely include null bytes). The resulting chassid value is in fact a MacAddress object, not a string, but can be coerced into a string using str() - which will definitely not contain any null bytes.
  • networkAddress: A binary string to be interpreted as an IP address (which can definitely include null bytes). The resulting chassid value is in fact an IP object, not a string, but can be coerced into a string using str() - which will definitely not contain any null bytes.
  • local: This one is tricky. The MIB doesn't really specify how to interpret this, other than as some "local" identifier:

The enumeration 'local(7)' represents a chassis identifier based on a locally defined value.

NAV tries to interpret this in a couple of ways: Devices can provide their own local identifier through their LLDP-MIB, so NAV stores this for lookups. But some devices also appear to provide some variant of their sysname here, so NAV tries to look up both using _netbox_from_local().

You didn't provide a traceback that indicates properly where your issue is, but I would assume it arises due to a local identifier that includes null bytes (the other two possibilities at this point will not).

Therefore, I think a more directed solution would be to modify the lookup function _netbox_from_local() to ensure that the value it uses for lookups does not contain null bytes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors we are getting are indeed because of local() values. I can revert the changes I did at work and record some of the tracebacks and information around the tracebackes at work tomorrow.

I was also thinking this might not be the correct place to put the "fix" as this data is retrieved using SNMP, so I was looking at maybe fixing this at the source when the data is comming in to the system as I cannot see the flow of data through the code for these LLDP objects.

I simply saw a traceback, followed the problem to these specific lines and figured out what I had to do at this specific point in the code to fix the problem. So if anyone else have a better solution which solves the root problem, of course that is the solution that should be used.

Anyway, I will provide tracebacks and information about the interfaces, switches etc. tomorrow.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply saw a traceback, followed the problem to these specific lines and figured out what I had to do at this specific point in the code to fix the problem. So if anyone else have a better solution which solves the root problem, of course that is the solution that should be used.

Anyway, I will provide tracebacks and information about the interfaces, switches etc. tomorrow.

So, I whipped together an alternative implementation and a test in #3695 - it would be great if you're able to test and verify on your system (and provide a traceback if it differs significantly from the one described in #2479), @hamartin

Copy link
Copy Markdown
Author

@hamartin hamartin Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following stack trace is for the unmodified NAV. Any DNS information, IP, comment or MAC information has been changed to hide the true values.

2025-12-15 11:25:51,625 [563774] [ERROR jobs.jobhandler] [topo anon-switch-a6148fp.anon.tlp] Plugin nav.ipdevpoll.plugins.lldp.LLDP('anon-switch-a6148fp.anon.tlp') reported an unhandled failure
Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 86, in handle
    yield run_in_thread(self._process_remote)
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 269, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/db.py", line 111, in _reset
    return func(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in _process_remote
    neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in <listcomp>
    neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 127, in __init__
    self.identify()
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 130, in identify
    self.netbox = self._identify_netbox()
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 231, in _identify_netbox
    netbox = lookup(str(chassid))
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 242, in _netbox_from_local
    netbox = self._netbox_query(
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 204, in _netbox_query
    netbox = manage.Netbox.objects.values('id', 'sysname').get(query)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 635, in get
    num = len(clone)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 382, in __len__
    self._fetch_all()
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 1886, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/query.py", line 210, in __iter__
    for row in compiler.results_iter(
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1513, in results_iter
    results = self.execute_sql(
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
    cursor.execute(sql, params)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/venvs/nav/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
builtins.ValueError: A string literal cannot contain NUL (0x00) characters.

The following is from the switch NAV has created a stacktrace for above.

anon-switch-a6148fp# show lldp configuration 1/1/37

LLDP Global Configuration
=========================
LLDP Enabled                 : Yes
LLDP Transmit Interval       : 30
LLDP Hold Time Multiplier    : 4
LLDP Transmit Delay Interval : 2
LLDP Reinit Time Interval    : 2
LLDP Trap Enabled            : No

LLDP Port Configuration
=======================
Med Priority Override        : No

PORT           TX-ENABLED          RX-ENABLED          INTF-TRAP-ENABLED
--------------------------------------------------------------------------
1/1/37         Yes                 Yes                 Yes

TLVs Advertised
===============

Management Address
Port Description
Port VLAN-ID
System Capabilities
System Description
System Name
OUI
Port VLAN-Name
Dot1 Link Aggregation
Dot3 Power
Dot3 Macphy
Dot3 MFS
Med Capability
Med Network Policy
Med Power

anon-switch-a6148fp#
anon-switch-a6148fp# show lldp local-device 1/1/37
Local Port Data
===============

Port-ID           : 1/1/37
Port-Desc         : "**Some comment**"
Port VLAN ID      : 1234
Maximum Frame Size: 1500
Parent Interface  : interface 1/1/37


PoE Information

PSE Tlv Sent Type          : dot3
PSE Power Type             : Type 2 PSE
Power Source               : Primary
Power Priority             : low
PSE Allocated Power Value  : 0.00 W
PD Requested Power Value   : 0.00 W

anon-switch-a6148fp#
anon-switch-a6148fp# show lldp neighbor-info 1/1/37

Port                           : 1/1/37
Neighbor Entries               : 1
Neighbor Entries Deleted       : 0
Neighbor Entries Dropped       : 0
Neighbor Entries Aged-Out      : 0
Neighbor System-Name           :
Neighbor System-Description    :
Neighbor Chassis-ID            : ADELLCOMPUTER
Neighbor Management-Address    :
Chassis Capabilities Available :
Chassis Capabilities Enabled   :
Neighbor Port-ID               : 00:11:22:33:44:55
Neighbor Port-Desc             :
Neighbor Port VLAN ID          :
Neighbor Port VLAN Name        :
Neighbor Port MFS              : 0
Link aggregation supported     :
Link aggregation enabled       :
Aggregation port ID            :
TTL                            : 3601

Neighbor MED Capabilities
Neighbor Device class          : CLASS_I
MED capabilities enabled       : Capabilities
MED capabilities supported     : Capabilities

Neighbor Mac-Phy details
Neighbor Auto-neg Supported    : true
Neighbor Auto-Neg Enabled      : true
Neighbor Auto-Neg Advertised   : 1000 BASE_TFD
Neighbor MAU type              :

Neighbor EEE information       : DOT3
Neighbor TX Wake time          : 0 us
Neighbor RX Wake time          : 0 us
Neighbor Fallback time         : 0 us
Neighbor TX Echo time          : 0 us
Neighbor RX Echo time          : 0 us

anon-switch-a6148fp#
anon-switch-a6148fp# show lldp oui-tlv-details

LOCAL-PORT  CHASSIS-ID         PORT-ID     OUI      OUI-SUBTYPE    OUI-DATA
--------------------------------------------------------------------------------
...
...
1/1/37      ADELLCOMPUTER          00:11:2...  00120f   1              0300010000
1/1/37      ADELLCOMPUTER          00:11:2...  0012bb   1              000101
...
...
anon-switch-a6148fp# show lldp tlv

TLVs Advertised
===============

Management Address
Port Description
Port VLAN-ID
System Capabilities
System Description
System Name
OUI
Port VLAN-Name
Dot1 Link Aggregation
anon-switch-a6148fp#

How it looks when I look up the switch after doing the change.
Skjermbilde 2025-12-15 150347_edit

I will test your solution later this evening and give you a report.

Copy link
Copy Markdown
Author

@hamartin hamartin Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried your suggested pr in #3695. I agree that it is a better place to solve the problem, but it seems the solve requires that sysname is returned properly, which it seems, it does not for some or all of the Aruba switches with this problem in our infrastructure. You will find the stack trace I got in your suggested pr.

@lunkwill42
Copy link
Copy Markdown
Member

Again, thanks a bunch to @hamartin 🫶 who got the ball rolling on this issue with this succinct PR - though, we will be going with the alternate version in #3695

@lunkwill42 lunkwill42 closed this Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Job 'topo' aborted due to plugin failure (cause=ValueError('A string literal cannot contain NUL (0x00) characters.'))

5 participants