Skip to content

Strip null bytes from LLDP local-type chassis IDs before database lookup#3695

Merged
lunkwill42 merged 3 commits intomasterfrom
bugfix/strip-lldp-local-chassisid-nullbytes
Dec 18, 2025
Merged

Strip null bytes from LLDP local-type chassis IDs before database lookup#3695
lunkwill42 merged 3 commits intomasterfrom
bugfix/strip-lldp-local-chassisid-nullbytes

Conversation

@lunkwill42
Copy link
Copy Markdown
Member

Scope and purpose

Fixes #2479 by implementing my alternative suggestion from #3666.

Contributor Checklist

Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to NAV can be found in the
Hacker's guide to NAV.

  • Added a changelog fragment for towncrier
  • Added/amended tests for new/changed code
  • Added/changed documentation
  • Linted/formatted the code with ruff, easiest by using pre-commit
  • Wrote the commit message so that the first line continues the sentence "If applied, this commit will ...", starts with a capital letter, does not end with punctuation and is 50 characters or less long. See https://cbea.ms/git-commit/
  • Based this pull request on the correct upstream branch: For a patch/bugfix affecting the latest stable version, it should be based on that version's branch (<major>.<minor>.x). For a new feature or other additions, it should be based on master.
  • If applicable: Created new issues if this PR does not fix the issue completely/there is further work to be done
  • If it's not obvious from a linked issue, described how to interact with NAV in order for a reviewer to observe the effects of this change first-hand (commands, URLs, UI interactions)
  • If this results in changes in the UI: Added screenshots of the before and after
  • If this adds a new Python source code file: Added the boilerplate header to that file

@lunkwill42 lunkwill42 requested a review from a team December 15, 2025 11:59
@lunkwill42 lunkwill42 self-assigned this Dec 15, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 15, 2025

Test results

    27 files      27 suites   45m 30s ⏱️
 2 738 tests  2 738 ✅ 0 💤 0 ❌
20 294 runs  20 294 ✅ 0 💤 0 ❌

Results for commit 709e71f.

♻️ This comment has been updated with latest results.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.05%. Comparing base (8078f12) to head (709e71f).
⚠️ Report is 503 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3695      +/-   ##
==========================================
+ Coverage   62.88%   63.05%   +0.17%     
==========================================
  Files         611      611              
  Lines       45125    45193      +68     
  Branches       43       43              
==========================================
+ Hits        28376    28498     +122     
+ Misses      16739    16685      -54     
  Partials       10       10              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hamartin
Copy link
Copy Markdown

hamartin commented Dec 15, 2025

I tried your code change and I got the following new stack trace which I haven't seen before.
The stack trace mentions this line return self._netbox_from_sysname(chassid) as a problem and that is logical from what I have seen. The sysname has in some if not all cases I am having this problem also been empty. See the debug output I gave you for the pr in #3666. For all I know, it might contain those pesky null bytes as well.

2025-12-15 21:04:11,023 [582457] [ERROR jobs.jobhandler] [topo host.domain.tld] Plugin nav.ipdevpoll.plugins.lldp.LLDP('host.domain.tld') reported an unhandled failure
Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 86, in handle
    yield run_in_thread(self._process_remote)
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 269, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/db.py", line 111, in _reset
    return func(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in _process_remote
    neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 157, in <listcomp>
    neighbors = [LLDPNeighbor(lldp) for lldp in self.remote]
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 127, in __init__
    self.identify()
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 130, in identify
    self.netbox = self._identify_netbox()
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 231, in _identify_netbox
    netbox = lookup(str(chassid))
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/plugins/lldp.py", line 254, in _netbox_from_local
    return self._netbox_from_sysname(chassid)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/neighbor.py", line 184, in _netbox_from_sysname
    sysname = match.group('sysname')
builtins.AttributeError: 'NoneType' object has no attribute 'group'

@lunkwill42
Copy link
Copy Markdown
Member Author

I tried your code change and I got the following new stack trace which I haven't seen before. The stack trace mentions this line return self._netbox_from_sysname(chassid) as a problem and that is logical from what I have seen. The sysname has in some if not all cases I am having this problem also been empty. See the debug output I gave you for the pr in #3666. For all I know, it might contain those pesky null bytes as well.

The example data you provided seemed to indicate some devices reported only a series of null bytes and nothing else as their local chassis ID. I.e. we end up with an empty string after stripping all null bytes. I guess that would be an issue for the code as well. I'll try to update the PR accordingly.

Some devices report null bytes in their local chassis ID strings;
these are unusable in PostgreSQL queries and should be stripped in
order to avoid spurious crashes.
Early return saves the day.
@lunkwill42 lunkwill42 force-pushed the bugfix/strip-lldp-local-chassisid-nullbytes branch from 397c352 to 709e71f Compare December 16, 2025 14:32
@lunkwill42
Copy link
Copy Markdown
Member Author

If you want, I've pushed another update you can test, @hamartin

@sonarqubecloud
Copy link
Copy Markdown

@hamartin
Copy link
Copy Markdown

If you want, I've pushed another update you can test, @hamartin

I've implemented this change and tried it out over time. There are no stack traces or errors other than the expected warnings for some of our more "special" devices.

When looking at this change I started to wonder. This change assumes that if there is no chassis ID for local, then there also will be no system name. Do we know this for a fact? Also, if we assume that system name can be in the data even though chassisID is not. Is that enough to resolve the host?

@lunkwill42
Copy link
Copy Markdown
Member Author

I've implemented this change and tried it out over time. There are no stack traces or errors other than the expected warnings for some of our more "special" devices.

That's great!

When looking at this change I started to wonder. This change assumes that if there is no chassis ID for local, then there also will be no system name. Do we know this for a fact?

I cannot see that this change makes that assumption. A chassis id of type local is used to look up against a stored chassis id (using a call to self._netbox_query()), and if that fails, it falls back to testing if the chassis id represents the sysname of a device known to NAV. If the stripped version of a local-type chassis id is an empty string, there is nothing to look up, period.

If the remote device could not at all be looked up using a chassis id value, there is an automatic fallback to using the sysname value from the LLDP record for a lookup. See lines 233-234.

Also, if we assume that system name can be in the data even though chassisID is not. Is that enough to resolve the host?

I'm not sure what you mean here. sysname and chassis id are two separate attributes of an LLDP record, but chassis id is preferred for identification because it is more precise. A device's own perception of its sysname and what the rest of the world (including NAV) considers its sysname may differ (and often does), which is why the sysname value is the fallback if chassis id identification was unsuccessful.

@lunkwill42 lunkwill42 merged commit 4198f0f into master Dec 18, 2025
20 checks passed
@lunkwill42
Copy link
Copy Markdown
Member Author

@hamartin We've merged this for the next release, but please do follow up if there is more to be done w/respect to Aruba and LLDP :)

@lunkwill42 lunkwill42 deleted the bugfix/strip-lldp-local-chassisid-nullbytes branch December 18, 2025 13:58
@hamartin
Copy link
Copy Markdown

@hamartin We've merged this for the next release, but please do follow up if there is more to be done w/respect to Aruba and LLDP :)

I will. We have some more issues with these Aruba switches, but they don't generate errors in the traditional way. So I need to sit down and understand them before I forward them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Job 'topo' aborted due to plugin failure (cause=ValueError('A string literal cannot contain NUL (0x00) characters.'))

3 participants