Skip to content

[RFC] net/netdev: New driver API#9832

Closed
maribu wants to merge 2 commits intoRIOT-OS:masterfrom
maribu:netdev_driver
Closed

[RFC] net/netdev: New driver API#9832
maribu wants to merge 2 commits intoRIOT-OS:masterfrom
maribu:netdev_driver

Conversation

@maribu
Copy link
Copy Markdown
Member

@maribu maribu commented Aug 23, 2018

Contribution description

  • Added netdev_driver_t::drop to drop the received frame
  • Added netdev_driver_t::size to get the expected size of the received frame without dropping it
  • Added module netdev_driver_glue that implements those two functions by calling netdev_driver_t::recv for dropping the received frame and to get its size, which before this commit was the canonical API
  • Added a LOG_WARNING to auto_init when this glue is used to motivate programmers to move to the new API
  • Added this glue to all drivers
  • Added comments to indicate where the dropping frame functionality is not implemented

Issues/PRs references

#9805

- Added `netdev_driver_t::drop` to drop the received frame
- Added `netdev_driver_t::size` to get the expected size of the received frame
  without dropping it
- Added module `netdev_driver_glue` that implements those two functions by
  calling `netdev_driver_t::recv` for dropping the received frame and to get
  its size, which before this commit was the canonical API
- Added a `LOG_WARNING` to auto_init when this glue is used to motivate
  programmers to move to the new API
- Added this glue to all drivers
- Added comments to indicate where the dropping frame functionality is not
  implemented
@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 23, 2018

Btw: I will pause working on this from tomorrow till the last week of September. However, edits from maintainers are enabled :-)

@miri64 miri64 added Area: network Area: Networking Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Area: drivers Area: Device drivers Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet and removed Area: network Area: Networking labels Aug 23, 2018
@miri64
Copy link
Copy Markdown
Member

miri64 commented Aug 23, 2018

While I think it is a good idea to make the code more readable, this also is a major API change, so its needs to be handled with care. Only argument I personally see against it is a minor (so I won't block because of that) increase in RAM and ROM.

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 23, 2018

Hmm, an increase in RAM is unexpected to me :-( Maybe there is something wrong in my code, as this change should only increase ROM size by a few bytes.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Aug 23, 2018

Hmm, an increase in RAM is unexpected to me :-( Maybe there is something wrong in my code, as this change should only increase ROM size by a few bytes.

The size of the driver struct increases due to this, that's mainly what I was getting it with the RAM size increase... But I could be wrong with that.

@bergzand
Copy link
Copy Markdown
Member

I think the initial approach here is good. From my memory, most radio drivers implement the separate usage cases of recv with if statements checking the arguments. Splitting this to separate functions should increase the readability in those drivers too.

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 23, 2018

The size of the driver struct increases due to this, that's mainly what I was getting it with the RAM size increase... But I could be wrong with that.

I think the driver struct is marked as const, so I would believe it should end up in the ROM. But I did not check that.

@bergzand
Copy link
Copy Markdown
Member

I think the driver struct is marked as const, so I would believe it should end up in the ROM. But I did not check that.

Except for netdev_tap unfortunately, see my fix in #9834

@kaspar030
Copy link
Copy Markdown
Contributor

+76b code for gnrc_networking on samr21-xpro.

From my memory, most radio drivers implement the separate usage cases of recv with if statements checking the arguments. Splitting this to separate functions should increase the readability in those drivers too.

I'm not so sure about this. Most drivers do something like:

  1. lock the device
  2. get pkt len
  3. if buf, copy pkt, and remove from device buffer
  4. unlock device
  5. return pkt len

Seperating size() from recv() will add another function, with duplicated device locking, and probably a third internal function that gets the actual size from the device. all additional code, as recv() will look exactly the same (well, maybe slightly reordered to return early on buf==NULL instead of returning the packet length).

I think this is one of the cases where a slight name mismatch (recv() vs recv_and_or_return_pkt_size()) is acceptable, if the alternative is increased code size. Also, noone will call recv() with a NULL pointer as buffer argument and expect any effect but a return value.

drop() is another story.
@maribu is right, this leads to silent bugs where the driver is not really dropping the packet.
BTW, encx24j600, kw2xrf, cc110x, cc2420, mrf24j40, and at86rf2xx don't seem to drop the packet, either.

How about we just add "drop()" to the interface?

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 24, 2018

If I remember correctly, the cc110x does not handle the drop and the size case. Apparently newlib's memcpy checks for NULL pointers and ignores that calls. So even for the more common size case bugs can go unnoticed, if the received frames remains in memory until the next arrives.

Changing the name would also increase awareness.

But it would be interesting to see by how much ROM size increases with the driver being properly adapted to the new API, instead of introducing the extra glue functions. Maybe than it is no longer ~100 Bytes. Also the LOG_WARNING about legacy code in auto_init.c contributes significantly to the ~100B increase when glue code is used.

@kaspar030
Copy link
Copy Markdown
Contributor

If I remember correctly, the cc110x does not handle the drop and the size case. Apparently newlib's memcpy checks for NULL pointers and ignores that calls.

No, cc110x (apart from being undermaintained) uses different higher level code, which handles the "size" and / or allocation problem differently. (Which will obviously be a kick in the balls for the next guy trying to use cc110x with something else than gnrc).

Maybe than it is no longer ~100 Bytes.

I keep hearing that ~N bytes don't matter. Each individual case doesn't, but we have hundreds of these cases. They add up, and in the and, there's a bloated binary.

Should drop() be factored out, would recv() be fine as is, with updated documentation?

@jnohlgard
Copy link
Copy Markdown
Member

I don't like the approach of adding a new module here, it will not detect any existing issues with missing functionality. IMO the only way to do this correctly is to refactor this directly inside the device drivers. A first step would be to fix the existing device drivers so that they all conform to the current API (add missing drop functionality). After the drivers are conformant, we can continue with doing whatever API changes we agree on.

@kaspar030 I see your point about a large overlap between size and recv, but I think the benefit from having a clear API outweighs the cost of two more function pointers with associated function calls. If the split is done correctly, there should be only the cost of a few function calls added to the ROM size: put the common recv/size prologue inside a separate (private) function, call it from recv and size, remove some of the conditionals in the original combined implementation.

@kaspar030
Copy link
Copy Markdown
Contributor

(I'm arguing for keeping size() in recv())

I see your point about a large overlap between size and recv, but I think the benefit from having a clear API outweighs the cost of two more function pointers with associated function calls.

There would not only be more function pointers, it would lead to more complex implementations or duplicated code. In recv() and size() case, I don't see the benefit.

From a netdev user perspective, the expectation would be that recv() called with a buffer returns the next packet, and calling recv() with "NULL" as buffer is a mistake. Should the question arise "how do I find out the next packet size before allocating a buffer?", both existing code and the documention point to the "unclear" method of using recv(..., NULL, ...).

From a device driver developer perspective, having to return the size when the supplied buf is NULL might not be obvious, but will fail very fast, and reading the doxygen or any other driver should clear things up.

The price of getting this clear is quite a lot more boilerplate code in every driver, as both size() and recv() will need to get the actual size, and both lock the device. Using the glue method as in this PR makes things look way more complex with no benefit at all, but correctly implementing the proposed API will lead to more lines of either boilerplate or redundant code.

Personally, I'd implement size() as recv(..., NULL, ...) anyways. At that point, we're paying extra pointers, extra functions, extra indirection and an API change for this "clarity".

I estimate at least two more functions and +~20 lines of code per driver. If we change perspective and size and recv were distinct functions already, and someone would propose a PR integrating "size()" into "recv()", removing the function calls (both ROM and execution overhead), removing ~250 lines of redundant boilerplate code (12 drivers times 20 lines), would we dismiss the PR because it messes up API clarity?

IMO we have two actual issues ("drop" not being implemented by drivers and thus bad behaviour, and "recv()" overloaded so this could stay unnoticed). We should fix those and add "drop()", implementing it for all drivers.

Whether having recv() overloaded with size() reduces clarity or might lead to bugs is academic and we should not invest work, and more code lines, ..., given the fail-fast nature of any misunderstadings.

@jnohlgard
Copy link
Copy Markdown
Member

@kaspar030 good arguments. I am in favour of your proposal: fork drop into a separate function, keep size and recv in the same function.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Aug 24, 2018

Maybe a good compromise in case anyone is still arguing for splitting size and recv would be to have an inline function in net/netdev.h:

static inline int netdev_recv_buf_size(netdev_t *dev)
{
    return dev->driver->recv(dev, NULL, 0, NULL);
}

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 24, 2018

As reply to @kaspar030's arguments:

  1. Even small increases in ROM size do matter as they add up and result in a bloated binary

This is to simplified.

int main(void) {return 0;}

Above program is pretty much the least bloated C program. Adding any code to it would increase RAM & ROM requirements. So obviously, a change that increases ROM/RAM requirements is not necessary bad. You have to check if the benefits outweigh the increase in code size. So careful consideration is better than just dismiss any change increasing ROM/RAM right away. An investigation by how much ROM requirements actually increase by porting a driver (after the drop has been implemented for fair comparison) to the new API candidates would clearly help this careful consideration.

  1. A missing size implementation will be discovered easily

This is only true, if the upper layer do use this feature. In case of the cc110x, it is neither implemented nor used. The same is true for the nrfble and possibly others. In case of separate function pointers, upper layer code could simply check for the presence of the required implementation by a simple assert(driver->size != NULL); in the initialisation code.

  1. A separate size() increases code complexity and adds ~20 lines of boilerplate code

Basically it would change from

int recv(netdev_t *dev, void *buf, size_t len, void *info)
{
    header_t hdr;
    lock(dev);
    parse_header(dev, &hdr);
    if (!buf) {
        unlock(dev);
        return hdr.size;
    }
    [...]
}

to

int size(netdev_t *dev)
{
    header_t hdr;
    lock(dev);
    parse_header(dev, &hdr);
    unlock(dev);
    return hdr.size;
}

int recv(netdev_t *dev, void *buf, size_t len, void *info)
{
    header_t hdr;
    lock(dev);
    parse_header(dev, &hdr);
    [...]
}

The latter removes 4 lines of code and adds 8, so its 4 lines longer. I bet in non-pseudocode it will be at most 8 lines more, when having static helper functions for locking and header parsing. And those helper functions need to be added for the drop function in many drivers anyway, in case the hardware buffers more than one frame (like the en28j60): The frame is dropped by advancing the position in a ring buffer by the frame length.

  1. The benefits of the clean API are purely academic in case of a separate size() function

The current API may have led to the drop functionally being unimplemented in many cases, which let to possible denial of service attacks against the en28j60 driver: Sending two frames fast enough and the network stack got stuck in a loop trying to drop the second frame.

Lets assume an upper layer would simply allocate maximum frame size buffers when there is no memory pressure. Im that case a driver not implementing the size feature would just work. But under memory pressure the upper layer may adapt using the size feature and allocate only as much memory as actually required. With an overloaded recv() function the system again could be attacked by a denial of service attack: Just bring the device in a state of memory pressure and send it a packet, and a segfault occurs. In case of spilt function pointers, the upper layer can assert that all required features are implemented during initialisation. Thus, the missing feature would be noticed and the also the assert makes debugging easy.

Summing up: A clean API regarding the drop feature could very well have prevented a DoS vulnerability in the en28j60 driver. It is easy to imagine the same for a clean API regarding the size feature.

I believe security and maintainability to be bigger challenges in the IoT than the resource requirements. And a clean API can help to increase both. So in case a clean API can be implemented without to much RAM/ROM overhead, it is IMO worth the trouble.

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 25, 2018

To add some numbers to the discussion:

How the numbers have been obtained

  1. enc28j60 old API but fixed
  2. enc28j60 separate drop but no size
  3. enc28j60 separate drop and size
diff --git a/examples/gnrc_networking/Makefile b/examples/gnrc_networking/Makefile
index 86b7a94c1..5fb504c23 100644
--- a/examples/gnrc_networking/Makefile
+++ b/examples/gnrc_networking/Makefile
@@ -34,6 +34,7 @@ USEMODULE += ps
 USEMODULE += netstats_l2
 USEMODULE += netstats_ipv6
 USEMODULE += netstats_rpl
+USEMODULE += enc28j60
 
 # Comment this out to disable code in RIOT that does safety checking
 # which is not needed in a production environment but helps in the

Compiled with make BOARD=nucleo-f446re BUILD_IN_DOCKER=1

Differences in ROM/RAM requirements

   text	   data	    bss	    dec	    hex	filename
  69444	    556	  16632	  86632	  15268	enc28j60-clean-api.elf
  69308	    556	  16632	  86496	  151e0	enc28j60-old-fixed.elf
  69416	    556	  16632	  86604	  1524c	enc28j60-semi-clean-api.elf

So zero RAM overhead in any case, 108 Bytes overhead with separate drop only, 146 Bytes overhead with both drop and size separated.

Differences in lines of code:

-------------------------------------------------------------------------------
API                                         blank        comment           code
-------------------------------------------------------------------------------
old                                            72            131            348
-------------------------------------------------------------------------------
semi-clean                                     76            130            362
-------------------------------------------------------------------------------
clean                                          77            130            370
-------------------------------------------------------------------------------

Summary

Overhead RAM Overhead ROM Overhead Lines of Code
Clean API 0B 146B 22 Lines
Semi-Clean API 0B 108B 14 Lines

So trading in the clean separation between the size() and the recv() in case of the enc28j60 driver on the Nucleo-F446RE has a benefit of 38 bytes ROM size and 8 lines of code. And the largest part of the overhead is already paid when only separating the drop() case.

@kaspar030
Copy link
Copy Markdown
Contributor

I believe security and maintainability to be bigger challenges in the IoT than the resource requirements. And a clean API can help to increase both. So in case a clean API can be implemented without to much RAM/ROM overhead, it is IMO worth the trouble.

This is a text-book argument. Rarely "maintainability" and "security" are not number one points when it comes to networking software developed in a distributed way.

But can you provide numbers for "cleanlyness", security or maintainability benefits?

Touching all those drivers (which are perfectly working apart from the unimplemented "drop()"), refactoring them, testing them, reviewing the changes, changing documentation, (changing API), ... is a shot in the foot, maintanance wise.

If there was an identified security problem (e.g., a bug), it would be worth the trouble. But there is not. On the contrary, there are valid arguments that the size()-in-recv() cannot cause much harm.

Seperating size() from recv(), I see more internal functions, more code, more ROM requirement, more runtime overhead and developer time better invested somewhere else.

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Aug 31, 2018

This is a text-book argument. Rarely "maintainability" and "security" are not number one points when it comes to networking software developed in a distributed way.

I agree, in literature the opinion that security and maintainability is more valuable than size is vastly shared. That would be a point for a separate size function.

But can you provide numbers for "cleanlyness", security or maintainability benefits?

There are no metrics that allow to quantity security or maintainability that have a sound foundation. But concluding to give up on both until adequate metrics come available and focus on size instead is just dangerous!

On the contrary, there are valid arguments that the size()-in-recv() cannot cause much harm.

Please share them. This far, you only shared an argument that it does cause harm:

No, cc110x (apart from being undermaintained) uses different higher level code, which handles the "size" and / or allocation problem differently. (Which will obviously be a kick in the balls for the next guy trying to use cc110x with something else than gnrc).

Seperating size() from recv(), I see more internal functions, more code, more ROM requirement, more runtime overhead and developer time better invested somewhere else.

  1. There is no increase in runtime, both size and recv will get slidely faster by the separation.
  2. There are no more intenal functions except for the size function, as all other functions need to be introduced just for the drop function as well. At least for the enc28j60 this has been the case
  3. Arguing against an API change that whole reason is to make the API cleaner with the required effort it takes is a bit odd. The biggest point for changing it is to reduce the effort for reviewing new netdev_driver implementations, to reduce the effort for understanding the API, to reduce the effort for finding and fixing bugs. So on the long term, more effort is saved by using the clean API.
  4. 75% of the ROM increase is already been "paid" by the change your are in favor, 90% of the time for changing the code and 64% of the lines of code added. The advantage of just separating the drop and keep the size in recv is thus relatively small. (At least for the enc28j60, but feel free to add implementations for the competing proposals.)
  5. The readability of code is not determined by the number of lines of code: Other things like short functions, not too deeply nested control flows, how clearly the API is defined etc. contributes to readability as well. In this regard, one aspect gets worth (lines of code) by separation of the size function, but more aspects get better

@kaspar030
Copy link
Copy Markdown
Contributor

I agree, in literature the opinion that security and maintainability is more valuable than size is vastly shared. That would be a point for a separate size function.

The point here is that "security" and "maintainability" are often used in a technical discussion as joker arguments. Any number of changes could be justified with them. Rewriting everything in Rust would kill a whole class of (security sensitive) bugs and through it's more modern language features probably provides quite some maintainability advantages. Why don't we do that?

Software development is all about tradeoffs. RIOT is an embedded system, low ressource usage is key. If we'd go with "clean" on every occasion (there are hundreds of APIs), the price would add up to hundreds of bytes, maybe kilobytes. For some use cases, that doesn't matter at all. For some, it does.
We do have to make choices to get a usable system in the end. Those choices are not necessarily the textbook ones. If they would be, we'd all be using minix.

Maybe take a look at code usage over time here. RIOT is getting heavier. If that weight is due to necessary features, that is acceptable. If it is through bloat that doesn't make a compiled system any better, it is not. Again, we're developing embedded software. For desktop computers, technological advances make low-level optimization a little less important every day. For embedded systems, technological advances pack the same amount of RAM and flash into smaller packages or make them use less power.
We need to avoid stepping into every "this makes code look nicer, it adds just these >100bytes ROM".

The more I think about this, the more I'm convinced that even seperating drop() is not worth the extra bytes. netdev is more of an internal API. We'd expect maybe a couple of dozen users, compared to more user facing APIs like sock, which will have much higher user counts. It is entirely possible to go through all drivers and make sure they expose dropping functionality as expected.
Because after all, all drivers already have the drop code in there, as recv() implicitly drops after copying the packet. Exposing that should not add more than a couple of bytes (like 8, compared to >100). Every byte more than that is wasted.

IMO we can fix the missing drop without adding ROM usage. "misleading" API can be dealt with differently. Documentation goes a long way and is for free in terms of ROM usage.
A huge warning in netdev.h stating "MAKE SURE TO IMPLEMENT THESE THREE CASES" might be as effective as introducing more function pointers.

There are more options for improving netdev that don't add bloat.
@miri64's proposal for wrapping size()/drop() in static inline functions helps readablilty for code using netdev. We could introduce helpers static inline netdev_check_drop(buf, size) { return !buf && size>0;} so logic within different read() implementations can be shared.

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Sep 2, 2018

The point here is that "security" and "maintainability" are often used in a technical discussion as joker arguments. Any number of changes could be justified with them. Rewriting everything in Rust would kill a whole class of (security sensitive) bugs and through it's more modern language features probably provides quite some maintainability advantages. Why don't we do that?

Because POSIX mandates a C-API, which is more complex to provide on top of a Rust implementation. Because Rust is not available for a lot of embedded devices, while C is available for virtually all platforms. Because Rust is relatively young, which means fewer people have experience and thus less contributors and uses, which means non-backward-compatible language changes are more likely to happen. See? Not a joker argument ;-)

netdev is more of an internal API. We'd expect maybe a couple of dozen users, compared to more user facing APIs like sock, which will have much higher user counts.

I saw this "its ugly, but it is internal, so users are not affected"-argument quite often. Every time an internal and unnecessary bug results of this, people start realising that internal functions do have quite a lot of (indirect) users, that are all affected by the bug.

Also, first-time contributers will likely add device drivers, when they own a particular yet unsupported board and what to use RIOT on it. So even though this API is internal, non-core developers will sooner or later get directly in touch with it.

The documentation already is quite clear, it is the API that is not clear yet. So the API is right now the weakest link in the chain, not the documentation. Reinforcing a strong link in a chain without changing the weakest often doesn't help at all.

IMO we can fix the missing drop without adding ROM usage.

No. But just adding the missing functionality without changing the API would increase the ROM size significantly less. However, this is not a one-time effort: Every added driver will needed to be reviewed with additional care. The fact that even a lot of capable programmer and reviewers overlooked this issue proves how easily this issue is missed.

Also: What about drivers using upper layer not requiring the size() implementation? With the clean API it could be left out without having to worry much: Adding an assert(driver->size != NULL) in all upper layers using the size implementation will make sure changing the upper layer will not result in a long debug session.

There are more options for improving netdev that don't add bloat.
@miri64's proposal for wrapping size()/drop() in static inline functions helps readablilty for code using netdev. We could introduce helpers static inline netdev_check_drop(buf, size) { return !buf && size>0;} so logic within different read() implementations can be shared.

In the upper layers the code is already quite readable, e.g. in int expected_bytes = rhs; the right hand side (rhs) is obviously the expected size of the incoming frame. So the drop-wrapper would certainly increase readability without any cost in terms of ROM/RAM/runtime overhead, but the bigger problem IMHO is in the lower layers. (The fact that the bugs have been there support this point of view.) The netdev_check_drop will make correct recv implementations more readable, but it will not help much to actually get new recv implementations right without careful reviews. Implementers could have looked at existing and correct code and at the documentation before now and easily spot that the drop and size features are required in the recv function, but they still missed it. I bet just adding "textmarker" by highlighting it in the documentation and in the code (using the wrapper) will only reduce the frequency of this bug a bit, but it will remain a hot spot that needs careful reviewing for new netdev_driver implementations.

So, it looks like most arguments for the different approaches on how to react to the reoccurring bug of the missing drop implementation have been exchanged. I updated the issue #9805 and added the two additional proposals and the major pros/cons for each. (This could also be moved to the wiki, so everyone can add pros and cons, to get a more complete list, if there is demand for that.)

@miri64 miri64 added the Discussion: needs consensus Marks a discussion that needs a (not necessarily moderated) consensus label Oct 2, 2018
@maribu
Copy link
Copy Markdown
Member Author

maribu commented Nov 16, 2018

Closed in favor of #10410

@maribu maribu closed this Nov 16, 2018
@maribu maribu deleted the netdev_driver branch December 29, 2018 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: drivers Area: Device drivers Area: network Area: Networking Discussion: needs consensus Marks a discussion that needs a (not necessarily moderated) consensus Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants