lantiq: Improve gswip, build again #13200
bridge).

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
[Do not disable flooding on CPU port in setup]
In kernel 5.15 gswip_port_enable() is not called for the CPU port. Always enable flooding for the CPU port.
Is this maybe helping? xdarklight/linux@a595ee7
register.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
[removed BR_PORT_LOCKED support for kernel 5.15]
BR_PORT_LOCKED is not supported in kernel 5.15.
  }
+ } else {
+         /* FID of a standalone port (single port bridge) */
+         fid = port + 1;
This patch is better, but it needs some extra features introduced between 5.15 and 6.1:
xdarklight/linux@4aadb97
Unfortunately this is not working. Yes, it silences the warnings that people have been seeing.
But that doesn't mean that it makes the GSWIP driver comply with DSA's expectations.
It's been a while, but from memory the issue is that DSA expects that it can add/remove static FDB entries for a port at any time.
Unfortunately this is tricky to implement for the GSWIP driver.
As long as a port is a standalone port (meaning it's not part of a bridge), the fid = port + 1 logic is correct.
As soon as the port is part of a bridge, this is no longer correct: a different fid (the one assigned to the bridge) has to be used.
But there are cases where a static FDB entry is added early, before the port is part of a bridge. Then the port is added to the bridge, meaning its fid changes. This is not reflected in the static FDB entries, so what basically happens is that any per-port FDB entries that were added before the port became part of a bridge are no longer used by the hardware.
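A minimal user-space model (not driver code) may help make the stale-entry problem concrete. It assumes the switch only matches a static entry when both the FID and the MAC address match; `struct fdb_entry`, `fdb_add()`, `fdb_lookup()`, `standalone_fid()`, the table size and the port count are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS 7  /* hypothetical port count for the model */
#define FDB_SIZE  16 /* hypothetical table size */

/* Hypothetical model of one static FDB entry, keyed by (fid, mac). */
struct fdb_entry {
	bool used;
	uint16_t fid;
	uint8_t mac[6];
	int port;
};

static struct fdb_entry fdb[FDB_SIZE];

/* FID of a standalone port (single port bridge), as in the patch. */
static uint16_t standalone_fid(int port)
{
	assert(port >= 0 && port < NUM_PORTS);
	return (uint16_t)(port + 1);
}

/* Store a static entry under the FID that is valid at the time of the call. */
static int fdb_add(uint16_t fid, const uint8_t mac[6], int port)
{
	for (int i = 0; i < FDB_SIZE; i++) {
		if (!fdb[i].used) {
			fdb[i].used = true;
			fdb[i].fid = fid;
			memcpy(fdb[i].mac, mac, 6);
			fdb[i].port = port;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Lookup as the model switch does it: the FID must match, not just the MAC. */
static int fdb_lookup(uint16_t fid, const uint8_t mac[6])
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (fdb[i].used && fdb[i].fid == fid &&
		    memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	return -1; /* miss */
}
```

In this model, an entry added for port 2 while it is standalone is stored under FID 3. Once the port joins a bridge with some other FID (say a hypothetical 10), `fdb_lookup(10, mac)` misses even though the entry is still in the table, which is the behaviour described above.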
I also forgot: this does not honor DSA_DB_PORT yet.
We should move to something like xdarklight/linux@4aadb97, but again, even that still needs a lot of work.
DSA_DB_PORT is not available in kernel 5.15. It was added with kernel 5.18, see:
torvalds/linux@c269336
Maybe we can migrate this forwarding rule when we add the port to a bridge?
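A rough sketch of what such a migration could look like, using a simplified in-memory table instead of the real GSWIP MAC table. `migrate_port_fdb()`, `struct fdb_entry`, the table size and the port count are hypothetical; on the real switch, changing an entry's FID would presumably mean deleting the old entry and writing a new one through the driver's table access code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS 7  /* hypothetical port count */
#define FDB_SIZE  16 /* hypothetical table size */

/* Hypothetical model of a static FDB entry; the MAC is omitted for brevity. */
struct fdb_entry {
	bool used;
	uint16_t fid;
	int port;
};

static struct fdb_entry fdb[FDB_SIZE];

/*
 * Hypothetical helper: when a port joins (or leaves) a bridge, move its
 * static entries from the old FID to the new one so the switch keeps
 * matching them. Returns the number of entries that were rewritten.
 */
static size_t migrate_port_fdb(int port, uint16_t old_fid, uint16_t new_fid)
{
	size_t moved = 0;

	assert(port >= 0 && port < NUM_PORTS);

	for (size_t i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].used && fdb[i].port == port && fdb[i].fid == old_fid) {
			fdb[i].fid = new_fid;
			moved++;
		}
	}
	return moved;
}
```

Calling it with the standalone FID (port + 1) as `old_fid` when a port joins a bridge, and with the arguments swapped when it leaves, would keep early entries from going stale in this model. It does not address DSA_DB_PORT, which is not available in 5.15 anyway.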

@xdarklight Do you want to send your patches upstream to Linus or should I try it? I would like to get at least the trivial ones into the upstream kernel.

  ds->mtu_enforcement_ingress = true;

- gswip_port_enable(ds, cpu_port, NULL);
I think this is not called by DSA in kernel 5.15.

Tested with FB 7412. No more nasty errors in dmesg. I set the device to wireless relay mode and I see traffic via the Ethernet port.
+{
+         struct gswip_priv *priv = ds->priv;
+
+         if (flags.mask & BR_LEARNING)
Please note that this breaks the kernel selftests, and I think there are some real-world side effects.
It's been a while, but the following patches should work once you're on Linux 6.1:
I can drop this patch for now.
There are a lot of changes in DSA between kernel 5.15 and 6.1. I am fine if the gswip DSA driver has the same bugs and functionality with kernel 5.15 as we had with kernel 5.10.
If you have time and motivation for this, then please feel free to update the patches as needed and upstream them. In case you have any questions: we can do a session together (online or offline; the latter will be harder to organize due to my current schedule) and walk through the selftests and some of the code, so I can share thoughts from my previous session(s) with @sch-m
I would like to get the low-hanging fruit upstream first. Then we can take care of the harder ones independently.
I am also pretty busy the next few days.

It's been some time now since @xdarklight and I could look at this topic, but as far as I can remember, one of the main problems is that the switch does not distribute unknown/multicast/broadcast packets per VLAN or PVID. Instead, it can only be configured globally which ports these types of packets are forwarded to. As a result, we actually always need to flood the CPU port, which in turn means that we will probably never get all selftests "green".
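The constraint can be pictured as one global port bitmask per packet class instead of a per-VLAN one. In this sketch, which is an assumption and not the GSWIP register layout, `struct flood_cfg` holds one mask each for unknown unicast, unknown multicast and broadcast, and the hypothetical `set_port_flood()` never clears the CPU port bit, because the CPU has to see these packets for every bridge and every standalone port. `NUM_PORTS` and `CPU_PORT` are placeholder values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 7 /* hypothetical port count */
#define CPU_PORT  6 /* hypothetical CPU port index */

/* Hypothetical global flood configuration: one port bitmask per packet
 * class, shared by all VLANs and bridges (there is no per-VLAN copy). */
struct flood_cfg {
	uint16_t unknown_uc;
	uint16_t unknown_mc;
	uint16_t broadcast;
};

static void set_bit16(uint16_t *mask, int port, bool on)
{
	if (on)
		*mask |= (uint16_t)(1u << port);
	else
		*mask &= (uint16_t)~(1u << port);
}

/* Enable or disable flooding towards one port. The CPU port is always
 * kept in every mask, whatever the caller asks for. */
static void set_port_flood(struct flood_cfg *cfg, int port, bool on)
{
	assert(port >= 0 && port < NUM_PORTS);

	if (port == CPU_PORT)
		on = true;

	set_bit16(&cfg->unknown_uc, port, on);
	set_bit16(&cfg->unknown_mc, port, on);
	set_bit16(&cfg->broadcast, port, on);
}
```

Because the masks are global in this model, turning off CPU flooding for one bridge would turn it off for all of them, which is why the CPU port bit is pinned rather than configured per bridge.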
lantiq-xrx200 is currently marked as source-only in OpenWrt 23.05, as the switch driver does not work correctly on Linux 5.15. Mark as broken in Gluon as well until the issue is fixed. Upstream PR: openwrt/openwrt#13200

Is there anything Home Hub owners can do to help this ticket progress (beta testing or something)?
I can test on my HH 5; it's not in service, so it's available to test on :) EDIT: if I need to build manually and apply patches, some help would probably be required :)

Same here: I installed HH5s for some of my family members and have a spare unit that I can run tests on (without DSL connectivity, since I no longer have DSL).
This reverts commit 0c117e1.

Activate the lantiq/xrx200 target again. There are still some problems with the GSWIP, but it is not leaking packets to the wrong bridge in normal operations. It shows some error messages at configuration like these:

[ 54.308861] gswip 1e108000.switch: port 5 failed to add ce:9d:84:d1:81:f0 vid 1 to fdb: -22
[ 54.325633] gswip 1e108000.switch: port 5 failed to add e8:de:27:95:c1:b4 vid 0 to fdb: -22
[ 54.351242] gswip 1e108000.switch: port 5 failed to add e8:de:27:95:c1:b4 vid 1 to fdb: -22
[ 54.358311] gswip 1e108000.switch: port 5 failed to delete ce:9d:84:d1:81:f0 vid 1 from fdb: -2

The problems are described in this pull request: openwrt#13200

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
This backports a patch from upstream kernel 5.19 which improves finding the bridge in the gswip_port_fdb() function.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

This backports some patches for the gswip switch driver, which should go into the upstream kernel soon. These patches fix some bugs and add minor new features. I copied them from this repository: https://github.com/xdarklight/linux/commits/lantiq-gswip-integration-20221022

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

Since kernel 5.14, the DSA subsystem also adds the MAC addresses of the interfaces manually to the fdb. This also happens for interfaces which are not part of a bridge. Add support for this case too. This fixes some annoying error messages.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
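The standalone case in this commit boils down to choosing the FID for an FDB operation. A minimal sketch, assuming a hypothetical `bridge_fid` argument that is negative when the port is not part of a bridge (the driver itself looks up the bridge device instead), and a hypothetical helper name `fdb_fid_for_port()`:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 7 /* hypothetical port count */

/*
 * Hypothetical helper: pick the FID to use for an FDB add/delete.
 * bridge_fid < 0 means the port is standalone (not part of a bridge).
 */
static uint16_t fdb_fid_for_port(int port, int bridge_fid)
{
	assert(port >= 0 && port < NUM_PORTS);

	if (bridge_fid >= 0)
		return (uint16_t)bridge_fid; /* FID assigned to the bridge */

	/* FID of a standalone port (single port bridge) */
	return (uint16_t)(port + 1);
}
```

As discussed in the review, this only picks the FID at the time of the call: entries added while the port is standalone keep the old FID when the port later joins a bridge.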
This reverts commit 0c117e1.

Activate the lantiq/xrx200 target again. There are still some problems with the GSWIP, but it is not leaking packets to the wrong bridge in normal operations. It shows some error messages at configuration like these:

[ 54.308861] gswip 1e108000.switch: port 5 failed to add ce:9d:84:d1:81:f0 vid 1 to fdb: -22
[ 54.325633] gswip 1e108000.switch: port 5 failed to add e8:de:27:95:c1:b4 vid 0 to fdb: -22
[ 54.351242] gswip 1e108000.switch: port 5 failed to add e8:de:27:95:c1:b4 vid 1 to fdb: -22
[ 54.358311] gswip 1e108000.switch: port 5 failed to delete ce:9d:84:d1:81:f0 vid 1 from fdb: -2

The problems are described in this pull request: #13200

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
(cherry picked from commit e1aaa1d)

I have a spare HH5 and a VDSL2 line and am happy to flash it and test.

I have a spare Zyxel P-2812 and a VDSL line and I am happy to test. I already use OpenWrt 22.03.5.

Where have we got to with this? I see lantiq is now being built for v23.05.02, but the wiki for the BT Home Hub 5A still points back to this thread. The release notes for 23.05.02 also mention potentially needing a special tweak to the U-Boot environment, but I have been unable to find details.

I successfully flashed 23.05.2 to a device that I want to install for a family member. They also have a setup with a few VLANs, so it will be interesting to see how it behaves.

@hauke @xdarklight @sch-m

Those improvements are fairly mechanical; I don't think they will help much here.
This is an open source community. Not everyone edits the wiki all the time, so documentation is often outdated.
That only(!) affects the 3 Linksys EA6350v3, EA8300 and MR8300 devices.

The release notes for the newest release (https://openwrt.org/releases/23.05/notes-23.05.3) mention that this issue is still ongoing. Is this still the case? I guess they haven't merged this commit yet.
I am wondering the same, especially since OpenWrt 22.03 is reaching end of support soon, so it would be nice to get this merged into 23.05.

Most patches went into main in #15233, which was just merged. This is not needed any more.

Sorry for necroing this, but I still get gswip FDB kernel errors on 23.05.5 and 24.10.0. I'm testing on an AVM FRITZ!Box 3370 Rev. 2 (Micron NAND), xRX200 rev 1.1. Is xrx200 still known to be broken for some devices?

Does the OpenWrt code look like upstream here? From the logs, port 1 (lan4) is under br-dsl. I've no idea why dsa_port_bridge_dev_get() would fail to see that.

The above output is from my own config, where lan4 and the DSL modem are on br-dsl. That way I can use the modem in bridged mode via lan4. For reference, these are the relevant log entries from a default configuration on 24.10.0:

I see no issues in real-world scenarios. I have 2 VLANs and 2 bridges.
This adds multiples patches from this branch: https://github.com/xdarklight/linux/commits/lantiq-gswip-integration-20221022
It also tries to fix the error messages returned in gswip_port_fdb when a MAC address is added to the CPU port for the FDB. Here is a longer discussion about this topic: https://lore.kernel.org/netdev/0e66011d-c3bd-5df2-e81d-5b67e4689330@hauke-m.de/T/
Here is some code to make the selftests work: https://github.com/xdarklight/openwrt-packages/commits/heads/refs/wip-kernel-selftests-net-forwarding-20220727
After these changes we can remove the source only flag.
I plan to backport this after some testing to openwrt 23.05 branch too.
Please test this and report back the results.