Skip to content

Replace scapy pcap read write with apache 2.0 implementation in ptf#5132

Merged
jafingerhut merged 12 commits into
p4lang:mainfrom
jafingerhut:replace-scapy-pcap-read-write-with-apache-2.0-implementation-in-ptf
Feb 19, 2025
Merged

Replace scapy pcap read write with apache 2.0 implementation in ptf#5132
jafingerhut merged 12 commits into
p4lang:mainfrom
jafingerhut:replace-scapy-pcap-read-write-with-apache-2.0-implementation-in-ptf

Conversation

@jafingerhut

Copy link
Copy Markdown
Contributor

No description provided.

Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut

Copy link
Copy Markdown
Contributor Author

There are 6 tests failing in the CI test named "test-tools / build-and-test-tools". Copied from the log file, they are:

The following tests FAILED:
	4460 - testgen-p4c-bmv2-stf/gauntlet_hdr_out_in_action-bmv2.p4 (Failed) testgen-p4c-bmv2-stf
	4686 - testgen-p4c-bmv2-stf/issue447-2-bmv2.p4 (Failed)  testgen-p4c-bmv2-stf
	4687 - testgen-p4c-bmv2-stf/issue447-3-bmv2.p4 (Failed)  testgen-p4c-bmv2-stf
	4688 - testgen-p4c-bmv2-stf/issue447-4-bmv2.p4 (Failed)  testgen-p4c-bmv2-stf
	4745 - testgen-p4c-bmv2-stf/parser-unroll-issue3537-1.p4 (Failed) testgen-p4c-bmv2-stf
	4746 - testgen-p4c-bmv2-stf/parser-unroll-issue3537.p4 (Failed) testgen-p4c-bmv2-stf

From downloading the full logs and looking at them, they all fail because simple_switch is crashing with this assertion:

2025-02-18T18:43:51.9532048Z simple_switch: dev_mgr.cpp:285: std::string bm::DevMgr::sample_packet_data(const char*, int): Assertion `amount > 0' failed.

That is the assertion that I have seen before that led me to create this issue: #5130

Apparently it is not caused by running on an Ubuntu 24.04 system, because here it is happening on an Ubuntu 22.04 system.

I am pretty sure that assertion is firing because BMv2 is processing a packet, and after the deparser it has length 0, and it is attempting to send it.

I do not understand yet why using a different Python implementation of PcapWriter and rdpcap is leading to this situation -- the changes in this PR have no affect on the BMv2 binary being used. My best guess at this time is that the changes of this PR have an effect on the packets sent into BMv2, and thus different packets are coming out of BMv2, but that is just a guess right now.

@fruffy

fruffy commented Feb 18, 2025

Copy link
Copy Markdown
Collaborator

That is the assertion that I have seen before that led me to create this issue: #5130

Well, irrespective of the size, BMv2 shouldn't crash on packet input. That seems like a general bug to me.

@jafingerhut

Copy link
Copy Markdown
Contributor Author

I think I found the root cause of the BMv2 crash. It occurs when you use the command line option --dump-packet-data <n>, which only makes sense if the maximum number of bytes of each received and transmitted packet to record in the BMv2 log output, is at least 1. When you do that, and there is a length 0 packet, it asserts. If you omit the --dump-packet-data <n> option, it does not assert.

I will test BMv2 with that one assert statement omitted and see what happens.

Reference: https://github.com/p4lang/behavioral-model/blob/main/src/bm_sim/dev_mgr.cpp#L285

The function containing that line is only called if --dump-packet-data <n> option is used, with at least 1.

with 0-length packets.

Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut

Copy link
Copy Markdown
Contributor Author

OK, after removing --debug-dump-data 9000 command line option from BMv2 runs in commit 6, there are still the same 6 tests failing that I mentioned in an earlier comment, but they are failing for a different reason than before.

I think they are failing because the new rdpcap implementation we are using in this PR's changes can correctly read 0-length packets output by BMv2, and return them as length 0 packets, but p4testgen still has the workaround to expect a packet with 4 bytes in those situations.

Here is the name of one of the failing tests:

1498/2176 Test #4460: testgen-p4c-bmv2-stf/gauntlet_hdr_out_in_action-bmv2.p4 ....................***Failed    5.08 sec

Below is a copy of the last few lines of output from that test:

Calling target program-options parser
Adding interface pcap0 as port 0 (files)
Obtaining JSON from switch...
Done
Control utility for runtime P4 table manipulation
RuntimeCmd: 
ERROR: Received packet too short 0 vs 8
ERROR: Packet 0 on port 0 differs
ERROR: Test failed

Below is a code snippet from p4c/tools/testutils.py that prints the "Received packet too short" message above:

    if len(received) < len(expected):
        log.error("Received packet too short %s vs %s", len(received), len(expected))
        return FAILURE

From the context of those lines of code, the lengths are given in units of hex digits, so 2 times the number of bytes in the packets involved.

... with a 4-byte expected packet.

Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut

Copy link
Copy Markdown
Contributor Author

OK, commit 7 removes the hack from p4testgen that replaces 0-length expected output packets with a 4-byte expected output packet.

We will see what happens. I won't be surprised if the syntax of the STF file is now flagged as incorrect, because there is no expected packet data on the expect lines in the STF file for such packets.

@fruffy

fruffy commented Feb 18, 2025

Copy link
Copy Markdown
Collaborator

Looks like things are passing now?

I think I found the root cause of the BMv2 crash. It occurs when you use the command line option --dump-packet-data , which only makes sense if the maximum number of bytes of each received and transmitted packet to record in the BMv2 log output, is at least 1. When you do that, and there is a length 0 packet, it asserts. If you omit the --dump-packet-data option, it does not assert.

Personally, this still might be bug that should be fix. The option shouldn't cause assertions to fire that kill the switch. I believe you already added a clarification on the BMV2 issue, right? p4lang/behavioral-model#1291

@fruffy fruffy added the infrastructure Topics related to code style and build and test infrastructure. label Feb 18, 2025
@fruffy fruffy linked an issue Feb 18, 2025 that may be closed by this pull request
@jafingerhut

Copy link
Copy Markdown
Contributor Author

Looks like things are passing now?

All of the p4testgen tests that were failing before now pass. Still waiting for one CI test to finish, but it wasn't presenting a problem before.

I think I found the root cause of the BMv2 crash. It occurs when you use the command line option --dump-packet-data , which only makes sense if the maximum number of bytes of each received and transmitted packet to record in the BMv2 log output, is at least 1. When you do that, and there is a length 0 packet, it asserts. If you omit the --dump-packet-data option, it does not assert.

Personally, this still might be bug that should be fix. The option shouldn't cause assertions to fire that kill the switch. I believe you already added a clarification on the BMV2 issue, right? p4lang/behavioral-model#1291

I agree that this should be fixed, and I think it is just a one-line change of deleting an assert statement that doesn't really need to be there. I will create a PR for BMv2 to address that issue, but I think this PR (and the one on the ptf repo that it is exercising) should be able to proceed without that change. We have been living with that corner case crash for years now, apparently.

@jafingerhut

Copy link
Copy Markdown
Contributor Author

@fruffy If we want to merge this, I am guessing that a reasonable order of events would be:

  1. Review, approve, and merge this PR on the ptf repo: Generalize PcapWriter, and add rdpcap function ptf#214
  2. Update this PR's changes to the requirements.txt file to point at the new ptf commit SHA that results after that PR is merged, and ensure tests here still pass.
  3. Review, approve, and merge this PR.

Sound like the order of events you would expect?

@fruffy

fruffy commented Feb 18, 2025

Copy link
Copy Markdown
Collaborator

@fruffy If we want to merge this, I am guessing that a reasonable order of events would be:

1. Review, approve, and merge this PR on the ptf repo: [Generalize PcapWriter, and add rdpcap function ptf#214](https://github.com/p4lang/ptf/pull/214)

2. Update this PR's changes to the requirements.txt file to point at the new ptf commit SHA that results after that PR is merged, and ensure tests here still pass.

3. Review, approve, and merge this PR.

Sound like the order of events you would expect?

Yes!

Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut jafingerhut requested a review from fruffy February 19, 2025 05:43
@jafingerhut jafingerhut added this pull request to the merge queue Feb 19, 2025

@fruffy fruffy left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost forgot, can you remove Bmv2TestBackend::ZERO_PKT_VAL and Bmv2TestBackend::ZERO_PKT_MAX? They are not needed anymore.

@jafingerhut jafingerhut removed this pull request from the merge queue due to a manual request Feb 19, 2025
Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut

Copy link
Copy Markdown
Contributor Author

Almost forgot, can you remove Bmv2TestBackend::ZERO_PKT_VAL and Bmv2TestBackend::ZERO_PKT_MAX? They are not needed anymore.

Removed in commit 12.

While making those changes, I noticed there is the same 0-length packet workaround in the ebpf back end code of p4testgen.

The changes in this PR do make similar changes to use the new ptf implementation of PcapWriter and rdpcap that I made for bmv2 tests, so my best guess is that none of the ebpf tests cause a 0-length packet to be output.

Should we leave that workaround in the p4testgen ebpf back end, or remove it?

@fruffy

fruffy commented Feb 19, 2025

Copy link
Copy Markdown
Collaborator

Should we leave that workaround in the p4testgen ebpf back end, or remove it?

We could do this in a separate PR given that the eBPF tests currently do not fail.

@jafingerhut

Copy link
Copy Markdown
Contributor Author

ZERO_PKT_* definitions have been removed from p4testgen bmv2 backend code, and tests are still passing, as expected.

@fruffy fruffy self-requested a review February 19, 2025 16:23
@jafingerhut jafingerhut added this pull request to the merge queue Feb 19, 2025
Merged via the queue into p4lang:main with commit f44f445 Feb 19, 2025
@jafingerhut jafingerhut deleted the replace-scapy-pcap-read-write-with-apache-2.0-implementation-in-ptf branch February 19, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Topics related to code style and build and test infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

p4testgen producing incorrect STF test for p4c test program?

2 participants