Will Fast DDS work abnormally if CPU affinity is set in a real-time process? #5272

@ClarkZaitun

Description

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

[Screenshot: 20240927-111224]

Current behavior

The EtherCAT protocol requires highly deterministic real-time performance. I use Fast DDS for node communication in the EtherCAT Master.
My EtherCAT Master process has the following characteristics:

  1. The node communication cycle runs at 1000 Hz.
  2. The process locks its memory at startup to prevent pages from being swapped to disk, and it sets its scheduling policy to FIFO (Linux PREEMPT_RT).
  3. Soon after the process starts, it sets thread CPU affinity. I only set the affinity of 4 threads to CPU 0, but the pictures provided show all threads of the process running on CPU 0.

After running for a period of time, the Master stops running completely. Its CPU usage is normally about 15%, but it rises to 97% when the fault occurs, of which the dds.asyn.0.0 thread accounts for 92%.
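For reference, the startup sequence described above can be sketched as follows. This is a minimal sketch, not the Master's actual code: the helper names (`lock_memory`, `set_fifo`, `pin_current_thread`, `cpus_seen_by_thread_created_after_pinning`) are hypothetical, and only standard Linux/glibc calls are used. One relevant detail is that on Linux a thread created with `pthread_create` inherits a copy of its creator's CPU affinity mask, and by default its scheduling policy too. If the pinned thread is the one that later creates the DDS participant, that could explain why every `dds.*` thread ends up on CPU 0. This is an assumption; the issue does not confirm which thread creates the participant.

```cpp
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

#include <cassert>
#include <thread>

// Lock current and future pages in RAM.
// Returns false if the process lacks the privilege or RLIMIT_MEMLOCK is too low.
bool lock_memory()
{
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}

// Switch the calling thread to SCHED_FIFO with the given priority.
// Returns false without CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.
bool set_fifo(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) == 0;
}

// Pin only the calling thread to a single CPU.
bool pin_current_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Number of CPUs the calling thread is allowed to run on, or -1 on error.
int allowed_cpu_count()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
    {
        return -1;
    }
    return CPU_COUNT(&set);
}

// First CPU in the calling thread's current affinity mask, or -1 if none is found.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
    {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &set))
        {
            return cpu;
        }
    }
    return -1;
}

// Pin the caller, then report how many CPUs a thread created afterwards may use.
// The child never sets its own affinity; it only reads the mask it inherited.
int cpus_seen_by_thread_created_after_pinning(int cpu)
{
    if (!pin_current_thread(cpu))
    {
        return -1;
    }
    int count = -1;
    std::thread child([&count] { count = allowed_cpu_count(); });
    child.join();
    return count;
}
```

If inheritance is the cause, one option is to create the DDS entities before pinning the creating thread, or to do the pinning only inside the dedicated real-time threads themselves.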

ps -T -p 3211336

PID SPID TTY TIME CMD
3211336 3211336 pts/6 00:00:02 ethercat_ma
3211336 3211337 pts/6 00:00:00 Log
3211336 3211338 pts/6 00:00:00 ClkTask
3211336 3211385 pts/6 00:00:01 JobTask
3211336 3211386 pts/6 00:00:00 dds.shm.wdog
3211336 3211387 pts/6 00:00:00 dds.ev.0
3211336 3211388 pts/6 00:00:00 dds.udp.20400
3211336 3211389 pts/6 00:00:00 dds.udp.20410
3211336 3211390 pts/6 00:00:00 dds.shm.20411
3211336 3211391 pts/6 00:00:00 dds.udp.20411
3211336 3211392 pts/6 00:00:00 dds.asyn.0.0
3211336 3211393 pts/6 00:00:00 dds.dsha.2820
3211336 3211394 pts/6 00:00:00 dds.dsha.3332
3211336 3211395 pts/6 00:00:00 dds.dsha.3844
3211336 3211396 pts/6 00:00:00 dds.dsha.4356
3211336 3211397 pts/6 00:00:00 ethercat_ma
3211336 3211398 pts/6 00:00:00 ethercat_ma
3211336 3211399 pts/6 00:00:00 ethercat_ma
3211336 3211400 pts/6 00:00:00 ethercat_ma
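The `ps -T` listing above shows thread names but not where each thread is allowed to run. A small diagnostic such as the following could report the affinity of every thread in a process, including the `dds.*` threads. It is a sketch under assumptions: `ThreadInfo`, `list_threads`, and `list_own_threads` are hypothetical names, and it relies on the Linux `/proc/<pid>/task` layout and on `sched_getaffinity` accepting a thread ID.

```cpp
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

struct ThreadInfo
{
    pid_t tid;         // kernel thread ID (the SPID column of ps -T)
    std::string name;  // thread name from /proc/<pid>/task/<tid>/comm
    int allowed_cpus;  // number of CPUs in the affinity mask, or -1 on error
};

// Collect name and affinity-mask size for every thread of the given process.
std::vector<ThreadInfo> list_threads(pid_t pid)
{
    assert(pid > 0);
    std::vector<ThreadInfo> threads;
    const std::filesystem::path task_dir{"/proc/" + std::to_string(pid) + "/task"};
    for (const auto& entry : std::filesystem::directory_iterator(task_dir))
    {
        ThreadInfo info{};
        info.tid = static_cast<pid_t>(std::stol(entry.path().filename().string()));

        // The comm file holds the thread name on a single line.
        std::ifstream comm(entry.path() / "comm");
        std::getline(comm, info.name);

        // On Linux, sched_getaffinity takes a thread ID in its pid argument.
        cpu_set_t set;
        CPU_ZERO(&set);
        info.allowed_cpus =
            sched_getaffinity(info.tid, sizeof(set), &set) == 0 ? CPU_COUNT(&set) : -1;

        threads.push_back(info);
    }
    return threads;
}

// Convenience wrapper for the calling process.
std::vector<ThreadInfo> list_own_threads()
{
    return list_threads(getpid());
}
```

Called with the Master's PID while the fault is active, an `allowed_cpus` value of 1 for `dds.asyn.0.0` would indicate that the thread shares a single CPU with the FIFO tasks.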

Steps to reproduce

  1. Use the ecat_start script to find the configuration of the EtherCAT Master process
  2. The script uses rosa run to run the EtherCAT Master
  3. The EtherCAT Master starts and communicates with another node over topics and services. The fault occurs after 30 minutes to 3 hours.

Fast DDS version/commit

2.14.0

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

Shared Memory Transport (SHM)

Additional context

Linux motion 5.15.158-rt76 #3 SMP PREEMPT_RT Tue Jun 11 07:18:25 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 22.04

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

Metadata

Assignees

No one assigned

    Labels

    need more info (issue that requires more info from contributor)
