
cpu/native/can: Segmentation fault when manipulating vcans through socketcan #13309

@wosym

Description

I'm currently preparing a few PRs related to CAN drivers and candev test apps. During development, I experienced segmentation faults when executing _init() of the native CAN driver.
I traced them back to the can_do_xxx() calls of libsocketcan. To rule out an issue in libsocketcan itself, I wrote a small C file containing only these calls and compiled it with plain gcc: there, the functions return -1 or 0, which is what they should do according to the spec.
When I move the same code into a RIOT example, the segmentation fault is back.
This was odd... is libsocketcan incompatible with RIOT for some reason?
No, because conn_can also uses it. I tried to figure out what else conn_can does that could make the socketcan calls work (e.g. auto_init_can), but I couldn't find anything: removing the auto_init code from conn_can still left the socketcan functions behaving according to spec, and adding the auto_init code to the experiment app still made it segfault.

After a long search, I found one difference: conn_can starts an (unrelated) thread in main(). When I remove this thread creation, conn_can segfaults as well.
So there must be some initialization done in this thread, right? Well... no. When I replace the thread body with ONLY a return 0, it still works without a segfault! I'm really puzzled by this...
The same holds in the experiment app: it works as soon as I start a thread that only returns 0...

Regardless of the above behavior, I did some more research, and my current hypothesis is that the problem is related to vcan: I think the segfaults occur when these functions are used to send commands that only make sense for physical hardware (such as setting the bitrate) to virtual interfaces. Why this hypothesis?

  • The issue only occurs when the interface is a vcan, not a real can. (I have to admit that during testing it once also happened with a regular can interface, but I wasn't able to reproduce that afterwards, so my guess is that it was a different issue.)
  • The conn_can README (line 114) states that you cannot set a bitrate on vcans. The functions that cause the segfault are only used in the _set_bitrate() function in conn_can.
  • In cpu/native/can/candev_native.c, the _set() function (line 295) first checks whether the CAN interface is a vcan or a real can when you try to set the bit timing. If it is not a real can, it reports an error and returns EINVAL. However, this check is NOT done when the same calls are made from _init().
  • This page mentions: "Warning: Although libsocketcan seems to work fine for physical CAN interfaces (e.g. can0), I have had issues when using it with a virtual CAN interface (e.g. vcan0). Specifically, functions such as can_get_state() do not seem to work correctly."

The solution I propose is to perform the same virtual-device check in _init() as is done in _set(). I tested this, and it appears to work perfectly (not only in the experiment app, but also in the candev test app for which I'm preparing a PR).
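A minimal sketch of the proposed guard, in C. The helper names (is_physical_can(), guarded_set_bitrate(), guarded_init_bitrate(), apply_bitrate_fn) are hypothetical, and the "interface name starts with can" test is an assumed heuristic, not the verbatim check from candev_native.c. It contrasts the _set() behaviour described above (reject a vcan with an EINVAL error) with the proposed _init() behaviour (skip the libsocketcan calls for a vcan so initialization can still succeed).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed heuristic: an interface is treated as a physical CAN device when
 * its name starts with "can" (e.g. "can0"); "vcan0" is treated as virtual. */
bool is_physical_can(const char *ifname)
{
    assert(ifname != NULL);
    return strncmp(ifname, "can", strlen("can")) == 0;
}

/* Hypothetical callback standing in for the libsocketcan sequence the driver
 * uses to change the bitrate (stop the interface, set bitrate, restart). */
typedef int (*apply_bitrate_fn)(const char *ifname, unsigned bitrate);

/* _set()-style behaviour: refuse to touch bit timing on a virtual interface. */
int guarded_set_bitrate(const char *ifname, unsigned bitrate,
                        apply_bitrate_fn apply)
{
    if (!is_physical_can(ifname)) {
        return -EINVAL;
    }
    return apply(ifname, bitrate);
}

/* Proposed _init() behaviour: same check, but a vcan simply skips the
 * bitrate setup instead of failing initialization. */
int guarded_init_bitrate(const char *ifname, unsigned bitrate,
                         apply_bitrate_fn apply)
{
    if (!is_physical_can(ifname)) {
        return 0;
    }
    return apply(ifname, bitrate);
}
```

For a real interface the callback would wrap the libsocketcan calls; for "vcan0" both functions return before the callback is ever invoked, so libsocketcan is never reached on a virtual interface.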

Even though this fixes the issue, it does not explain why creating an empty thread also makes it go away...
Nor does it explain why everything works when I compile the same code with plain gcc and run it: with can or vcan, I was never able to get a segmentation fault there...

That's why I opened this issue before opening a PR for this patch: I'm not 100% sure my hypothesis is correct. Maybe there is a bigger issue (in RIOT? In thread.h? In libsocketcan?), and by applying this patch we would simply be treating the symptoms instead of the cause.

Steps to reproduce the issue

  • Install libsocketcan (see the conn_can README, which contains a guide on installing it in a way that works with RIOT).
  • Run this experiment app (uncomment line 7 to add the "magic fix").
    --> It segfaults (at least it did on my computer; tested on both Arch and Ubuntu).

Expected results

The functions should (probably) fail in this case and return -1 (they return 0 on success).

Actual results

Segmentation fault when any of the can_do_xxx() functions is called.

Versions

OS: Arch and Ubuntu 18.04
Build environment: gcc 9.2.0

Labels

Area: drivers (Device drivers), Type: bug