stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI()#11830
stm32l152re: fix hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI()#11830fjmolinas wants to merge 2 commits intoRIOT-OS:masterfrom
Conversation
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
- By default openocd sets DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP when examine-end is called. This means that `make reset`, `make debug`, `make flash` set these bits. When enabled HCLK and FCLK are not sopped in sleep mode, this changes the base behaviour of the MCU and increases power consumptio.
|
NOTE: 510ffce could be applied for all STM32 but I want to keep this PR focused on STM32L1 since its the only platform where this triggered hardfaults. I could be extended to other STM32 in subsequent PR's. |
|
@cladmi I don't know why the github api doesn't let me tag you as a reviewer but I would appreciate you taking a look, specially at the openocd stuff :) |
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
|
I'm on the fence about keeping 510ffce in this PR. Would it make more sense to open a PR that implements this for all stm32* boards? I'm trying to PR them upstream but I'm not sure how long it would take, I could be a good option to get this into RIOT and remove it if it gets upstream. @cladmi since you are deeply involved in everything regarding GDB, what do you think of this approach? Would you suggest another one? |
|
BTW in the wiki it says you have this board in Berlin, any chance you can give it a tests @kaspar030 @cladmi? |
|
I would love to look at it, especially completely understand the |
|
Just as a procedural note, PR titles should say "fix problem foo", not "fix issue #1234". References to issues can be added to the PR description and / or commit messages. |
|
I'm going to split this PR into two, I got some comment on my PR upstream regarding the openocd changes. |
- The __NOP() that was added in RIOT-OS#8518 is now remooved. - When DBG_STANDBY, DBG_STOP or DBG_SLEEP are set in DBG_CR a hardfault occurs on wakeup from sleep. This was first diagnosed in RIOT-OS#8518. When enabled, a hardfault occured when returning from a branch to irq_restore() we avoid the call by inlining the function call. See RIOT-OS#11830 for more details.
|
Did the other PR resolve this already? |
Part of it, I was trying to resolve the rest upstream, but it staled. |
Contribution description
When #7385 was introduced when going to sleep (i.e. calling
__WFI()),irq_enablewas changed toirq_restoreand that brokestm32l152re.#8518 fixed the issue by introducing a
__NOP()after__WFI(). This fixed the issue until #11159 where the call tocortexm_sleepwas changed and now__NOP()didn't fix the issue but instead somehow triggered it.After debugging in #11820 it was discovered that the changes in #7385 didn't actually break the code, but broke the code only when
DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEPwhere enabled. By default openocd sets these bits after anexamine-endevent. This is done by default for all stm32 boards.https://github.com/ntfreak/openocd/blob/263deb3802a515ba8155b6c59146f0f539de4e43/tcl/target/stm32l1.cfg#L93-L100
510ffce fixes this behavior by only setting those bit when
gdbis launched. No need to enable debugging in other cases. This fixed #11820 for normal usage. When debugging this issue was still present.Multiple changes/fixes where attempted in #11820 to fix the issue. Non of them sufficiently consistent or un-intrusive.
As for all I can tell, and as stated in #8518, the issue was the loss of the content of
r0when jumping toirq_restorewhich resulted in a corruptedpctrying to execute instructions out of the flash region.Since the problem was the branch, with 5d96127
irq_restorewas replaced by__set_PRIMASK(state);which inlines the function call avoiding the jump and the whole issue all together.The disassembly:
Original disassembly:
Testing procedure
In master this
failshardfaults:make -C tests/xtimer_drift BOARD=nucleo-l152re -j3 flash termStart a debug session and
print *0xE0042004make -C tests/xtimer_drift BOARD=nucleo-l152re debugremove
-event gdb-attachand-event gdb-dettachfromboards/nucleo-l152re/dist/openocd.cfgand see that now the address content is 0.python dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --jobs 4 . nucleo-l152re, I got the following tests fails (expected fails)Issues/PRs references
Fixes #11820
See also #8518 & #8024