Enable core dump collection and analysis for system tests#5025
Enable core dump collection and analysis for system tests#5025pwojcikdev merged 5 commits intonanocurrency:developfrom
Conversation
Test Results for Commit c72bd18Pull Request 5025: Results Test Case Results
Last updated: 2026-02-08 20:25:09 UTC |
837bd72 to
986c65b
Compare
There was a problem hiding this comment.
Pull request overview
This PR centralizes core dump enablement for CI test scripts and expands core dump collection/analysis to system tests to make post-failure debugging easier.
Changes:
- Refactors core dump setup logic into a shared
setup_core_dumpshelper inci/tests/common.sh. - Updates
run-tests.shandrun-system-tests.shto callsetup_core_dumps. - Adds per-systest core dump analysis invocation in
run-system-tests.sh.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
ci/tests/run-tests.sh |
Replaces inline core-dump configuration with a shared helper call. |
ci/tests/run-system-tests.sh |
Enables core dumps for system tests and runs core dump analysis after each systest. |
ci/tests/common.sh |
Introduces setup_core_dumps to consolidate OS-specific core dump configuration. |
Comments suppressed due to low confidence (1)
ci/tests/run-system-tests.sh:52
- The guard only checks that
COREDUMP_DIRis non-empty, butsetup_core_dumpsmay set it even if directory creation/core-pattern configuration failed. Consider also checking-d "$COREDUMP_DIR"(and maybe readability) before invokingshow-core-dumps.shto avoid noisyls/analysis errors.
overall_status=1
else
echo "::error::Systest failed: $name ($status)"
overall_status=1
fi
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ci/tests/common.sh
Outdated
| ulimit -c unlimited | ||
| echo "${DEFAULT_COREDUMP_DIR}/core-%e.%p" | sudo tee /proc/sys/kernel/core_pattern | ||
| export COREDUMP_DIR=${DEFAULT_COREDUMP_DIR} |
There was a problem hiding this comment.
On Linux this sets /proc/sys/kernel/core_pattern, which is a global kernel setting and can impact other processes on the runner. Consider saving the previous value and restoring it via a trap (or scoping core dumps to the workspace without changing core_pattern) so the job doesn’t leave the host in a modified state.
ci/tests/common.sh
Outdated
| # Ensure directory exists and is writable for core dumps | ||
| sudo mkdir -p "${DEFAULT_COREDUMP_DIR}" | ||
| sudo chmod a+w "${DEFAULT_COREDUMP_DIR}" | ||
| # Enable core dumps | ||
| ulimit -c unlimited | ||
| echo "${DEFAULT_COREDUMP_DIR}/core-%e.%p" | sudo tee /proc/sys/kernel/core_pattern |
There was a problem hiding this comment.
setup_core_dumps unconditionally uses sudo and assumes it’s available and non-interactive. If sudo prompts for a password (or is missing), this can hang/fail CI; consider detecting root vs non-root and using sudo -n (fail fast) with a clear warning when core dumps can’t be enabled.
ci/tests/run-tests.sh
Outdated
| exit 1 | ||
| ;; | ||
| esac | ||
| setup_core_dumps |
There was a problem hiding this comment.
setup_core_dumps does not set COREDUMP_DIR on Windows/unsupported OSes, but this script later tries to analyze core dumps on failures. Consider having setup_core_dumps return a status / set a flag, and gate core-dump analysis on that to avoid misleading "COREDUMP_DIR is not set" output when core dumps aren’t supported.
986c65b to
90f7003
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
78f0f14 to
eb233e5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| # Show core dumps after each test | ||
| if [ -n "${COREDUMP_DIR-}" ]; then | ||
| "$(dirname "$BASH_SOURCE")/show-core-dumps.sh" "${NANO_NODE_EXE}" |
There was a problem hiding this comment.
In systests, multiple binaries can generate cores (e.g., nano_node and nano_rpc), but this always analyzes all core files using only ${NANO_NODE_EXE}. With core_pattern using %e, this can lead to incorrect backtraces (and show-core-dumps.sh also deletes cores after analysis), potentially losing useful dumps. Consider invoking analysis per executable (node + rpc) or updating show-core-dumps.sh to derive the executable from the core filename (%e) and map it to the correct binary path.
| "$(dirname "$BASH_SOURCE")/show-core-dumps.sh" "${NANO_NODE_EXE}" | |
| show_core_script="$(dirname "$BASH_SOURCE")/show-core-dumps.sh" | |
| "$show_core_script" "${NANO_NODE_EXE}" | |
| "$show_core_script" "${NANO_RPC_EXE}" |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| -e DEADLINE_SCALE_FACTOR \ | ||
| nano-test:latest \ | ||
| bash -c "cd build && ../ci/tests/run-${TEST_TYPE}-tests.sh" | ||
| bash -c "source ci/tests/setup-core-dumps.sh && cd build && ../ci/tests/run-${TEST_TYPE}-tests.sh" |
There was a problem hiding this comment.
Enabling core dumps in the Docker test container is good, but core analysis on Linux currently relies on gdb in show-core-dumps.sh. The Docker test image (docker/tests/Dockerfile-tests) installs sudo jq netcat-openbsd but not gdb, so core dump analysis is likely to fail inside Docker. Consider installing gdb in the Docker test image or updating the analyzer to fall back to lldb when gdb isn’t available.
| bash -c "source ci/tests/setup-core-dumps.sh && cd build && ../ci/tests/run-${TEST_TYPE}-tests.sh" | |
| bash -c "if ! command -v gdb >/dev/null 2>&1; then sudo apt-get update && sudo apt-get install -y gdb; fi; source ci/tests/setup-core-dumps.sh && cd build && ../ci/tests/run-${TEST_TYPE}-tests.sh" |
| #!/bin/bash | ||
| set -uo pipefail | ||
|
|
||
| EXECUTABLE=${1-} | ||
|
|
There was a problem hiding this comment.
show-core-dumps.sh deletes core files unconditionally after invoking the debugger. Because the script doesn’t use set -e, a failed gdb/lldb invocation (e.g., debugger missing, non-zero exit) will still remove the core dump, making post-mortem debugging impossible. Consider capturing the debugger exit status and only deleting the core dump on successful analysis (or gate deletion behind an explicit flag).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| shopt -s nullglob | ||
| core_dumps=("${COREDUMP_DIR}"/core*) | ||
| core_dumps=("${COREDUMP_DIR}"/core-${exe_name}.*) | ||
|
|
There was a problem hiding this comment.
show-core-dumps.sh now only matches dumps named core-<exe_name>.*. On Linux the core pattern uses %e, which can be truncated (comm length) and may not match basename "$EXECUTABLE" for longer binary names, causing core dumps to be missed. Consider adding a fallback (e.g., also scanning core* when no matches are found) or switching the Linux core pattern to one that can be derived reliably (and adjusting the matcher accordingly).
| # Fallback: if no matching core-<exe_name>.* files are found, look for any core* files. | |
| if [ ${#core_dumps[@]} -eq 0 ]; then | |
| core_dumps=("${COREDUMP_DIR}"/core*) | |
| fi |
| # Enable core dumps for this process | ||
| if [ -n "${COREDUMP_DIR-}" ]; then | ||
| ulimit -c unlimited | ||
| fi | ||
|
|
There was a problem hiding this comment.
This script captures status=$? from timeout to report failures and continue running remaining systests. If errexit is enabled in the environment (e.g., inherited via SHELLOPTS after sourcing another script), a non-zero timeout exit can terminate the script before status handling and core-dump analysis. Consider explicitly disabling errexit around the timeout call (or making the exit-code capture resilient).
| # Enable core dumps for this process | ||
| if [ -n "${COREDUMP_DIR-}" ]; then | ||
| ulimit -c unlimited | ||
| fi |
There was a problem hiding this comment.
This script may be invoked from wrappers that enable set -e (e.g., run-core-tests.sh / run-rpc-tests.sh). If errexit is inherited via SHELLOPTS, a failing test binary can terminate the script before it can capture $? and run core-dump analysis. Consider explicitly disabling errexit around the test invocation (and restoring it if desired) so failure handling is reliable.
ci/tests/setup-core-dumps.sh
Outdated
| @@ -0,0 +1,52 @@ | |||
| #!/bin/bash | |||
| set -euo pipefail | |||
There was a problem hiding this comment.
setup-core-dumps.sh is sourced in run-docker-tests.sh. Because it runs set -euo pipefail, sourcing it will change shell options in the caller and may leak errexit via SHELLOPTS into subsequently executed test scripts, breaking logic that expects to capture non-zero exit codes and continue (e.g., to analyze core dumps). Consider avoiding set -e in a script that may be sourced, or saving/restoring shell options when sourced.
| set -euo pipefail | |
| if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then | |
| set -euo pipefail | |
| fi |
ac25804 to
c72bd18
Compare
No description provided.