Remove overly broad stall regex pattern causing false positive panic detections #4082
Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com>
@LiliDeng LGTM
@copilot yea I prefer having a branch name yzho/lisa-fix, and create a PR from yzho/lisa-fix to main
I understand you prefer the branch name … The code changes are complete and identical; only the branch name differs. To use …
Alternatively, if you have repository admin access, you could rename the existing … Would you like me to provide specific step-by-step instructions for either approach?
…detections (#4082) * Initial plan * Remove overly broad stall regex pattern to prevent false alarms Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com>
…#4058)

* Initial plan
* Add unified message support for iperf3 TCP and UDP performance metrics
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* Refactor unified message methods to use parsed fields and add connections_num as parameter
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* Merge main and use Parameter relativity for connections_num and buffer_size metrics
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* Remove conn_suffix from metric names
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* Revert "Move examples and microsoft directories into the Python package (#4023)" (#4063)
  This reverts commit 89e7b53.
* Reapply "Move examples and microsoft directories into the Python package (#4023)" (#4063)
  This reverts commit efe1cd3.
* runbook: fix path for legacy layout
* Add UnifiedMessage support for NetworkLatencyPerformanceMessage
* kdump: Replace CvmDisabled with before_case SecurityProfile check (#4032)
  * kdump: Replace CvmDisabled with before_case SecurityProfile check
  * kdump: Fix SecurityProfile check to skip only CVM and Stateless VMs
    - Remove empty simple_requirement() calls (unnecessary)
    - Optimize f-string usage (only use f-prefix where needed)
    - Remove unused simple_requirement import
* Add detailed panic categorization and error code extraction
* enrich SerialConsole.check_panic() to return detailed panic
* Added tests for network related components (#4009)
* notifier: remove pytest-html dependency
  Replace pytest-html dependency with custom HTML report generator using string.Template. This change provides better control over report formatting and reduces external dependencies.
* runbook: fix microsoft package name for new paths.
  The new path can still be written like "microsoft/testsuites", so it needs to use "microsoft" instead of "testsuites" as the package name.
* Remove watchdog pattern from serial console panic detection (#4075)
* fix verify_cpu_count and improve PowerShell
  - Implement calculate_vcpu_count() method in WindowsLscpu class to fix verify_cpu_count test failure on Windows
  - Add null check for stderr in PowerShell.wait_result() to prevent errors when PowerShell is used to run cmd commands with no stderr output
* iDRAC: Handle HTTP 500 internal errors with service reset
* Fix Hyper-V Stop-VM to use TurnOff on timeout/failure
* Remove overly broad stall regex pattern causing false positive panic detections (#4082)
  * Initial plan
  * Remove overly broad stall regex pattern to prevent false alarms
    Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com>
* Revert "skip test if hv_netvsc driver is not used"
  This reverts commit f6fdcf7.
* change kselftest required /tmp/ size to 1GB for Overlake SoC limited space
* Add enabled switch for environments and nodes
  This change introduces an `enabled` boolean field at both the environment and node levels, allowing selective loading of configurations through runbook variables.
  Example:

      environment:
        - name: my_env
          enabled: $(use_first_env)  # Variable-controlled
          nodes:
            - type: local
              name: node1
              enabled: true
            - type: local
              name: node2
              enabled: false  # Skip this node

* Process: Raise exception on timeout. (#4077)
* Skip tests on L1VH Nodes (#4078)
  * mshv: skip checking logfile size on l1vh
    L1VH parents by default don't have any entries in mshvlog file. Skip checking logfile size on these nodes.
    Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
  * mshv: skip mshvtrace test on l1vh Nodes
    L1VH nodes cannot collect performance traces. Skip the related test on the L1VH nodes.
    Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
  ---------
  Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
* Set minimum TLS setting 1.2 for storage accounts
  Support for TLS 1.0 and 1.1 will be discontinued for all Azure Storage accounts. The guidance is to migrate to minimum TLS version 1.2.
  https://learn.microsoft.com/en-us/azure/storage/common/transport-layer-security-configure-migrate-to-tls2#why-use-tls-12
* Fix IPTable Test (#4088)
* Add virtualization feature
* doc: fix doc path after test code moved.
* doc: fix some build warnings.
* doc: allow duplicate test case names in different test suites.
* Fix VHD schema documentation to show nested hyperv_generation field (#4100)
* changes to install xxhash tool before building kernel
* Modprobe command update for verbose is false
* Document resource_group_tags parameter for Azure runbook (#4101)
* Add Host version tracking for baremetal and HyperV platforms
* Convert GPU Driver installation to Tool, Add amd-smi (#4080)
* ch perf: Implement comprehensive performance stabilization framework
* Classify /bin/true redirections in kernel modules as not loaded
  Previously, `is_module_loaded` returned True (loaded) when `modprobe -nv` produced a blacklist directive like 'install /bin/true', causing test cases like verify_floppy_module_is_blacklisted to fail although the module was not actually loaded. Added a minimal check for the install /bin/true pattern and now treat it as not loaded, returning False.
* Kdump: Enhance error log for incomplete dump file
* Update Nested Feature Supported list in Azure
* Create dm-cache test (#4093)
* Fix nvme device path fetch logic
* DPDK: add netvsc rescind tests (#4076)
* Remove squirrelsc from CODEOWNERS file
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* UnifiedPerfMessage: add metric_str_value to store string value (#4107)
  * UnifiedPerfMessage: add str_value to store string value
  * Rename str_value to metric_str_value in UnifiedPerfMessage (#4108)
    * Initial plan
    * Rename str_value to metric_str_value for consistency
      Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
    ---------
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
  ---------
  Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
* Pass through MIGRATABLE_VERSION from pipeline environment
* Add UnifiedMessage support for NetworkPPSPerformanceMessage (#4057)
* Initial plan
* Rebase on latest main branch
* Initial plan
* Initial plan
* Rebase on latest main branch
* Sync latest code from main branch
* Clean commit history - single commit for PR changes
* Add connections_num and buffer_size to metric names as suffix
  - Remove separate connections_num and buffer_size_bytes metrics
  - Add suffix format: _conn_{connections_num}_buffer_{buffer_size}
  - Apply suffix to all TCP metrics: rx/tx_throughput_in_gbps, congestion_windowsize_kb, retransmitted_segments
  - Apply suffix to all UDP metrics: rx/tx_throughput_in_gbps, data_loss
  - This allows distinguishing results by connection count and buffer size
  Co-authored-by: LiliDeng <10083705+LiliDeng@users.noreply.github.com>
* Fix flake8 errors: remove trailing whitespace from blank lines
  - Remove trailing whitespace from line 492 in send_iperf3_tcp_unified_perf_messages
  - Remove trailing whitespace from line 534 in send_iperf3_udp_unified_perf_messages
  - Fixes W293 flake8 warnings and BLK100 black formatting issue
  Co-authored-by: LiliDeng <10083705+LiliDeng@users.noreply.github.com>

---------

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: squirrelsc <27178119+squirrelsc@users.noreply.github.com>
Co-authored-by: LiliDeng <lildeng@microsoft.com>
Co-authored-by: Chi Song (from Dev Box) <chisong@microsoft.com>
Co-authored-by: Vivek Yadav <vyadav@microsoft.com>
Co-authored-by: Balashivaram Ganesan <71939272+Balashivaram@users.noreply.github.com>
Co-authored-by: lesscodingmorehappiness <81588170+lesscodingmorehappiness@users.noreply.github.com>
Co-authored-by: Panfeng Xue <paxue@microsoft.com>
Co-authored-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sebastian Heid <8442432+s4heid@users.noreply.github.com>
Co-authored-by: Umang Francis <umfranci@microsoft.com>
Co-authored-by: rabdulfaizy <rabdulfaizy@microsoft.com>
Co-authored-by: Aditya Nagesh <adityanagesh@microsoft.com>
Co-authored-by: Rachel Menge <rachelmenge@microsoft.com>
Co-authored-by: Kanchan Sen Laskar <kasenlaskar@microsoft.com>
Co-authored-by: mcgov <mamcgove@microsoft.com>
Co-authored-by: LiliDeng <10083705+LiliDeng@users.noreply.github.com>
The serial console panic detector was matching any log line containing "stall" (e.g., "install", "reinstall"), triggering false alarms.
Changes
- Removed `re.compile(r"^(.*stall.*)$", re.MULTILINE | re.IGNORECASE)` from `panic_patterns` in `lisa/features/serial_console.py`
- Kept the `rcu_sched` pattern which, combined with the existing categorization logic requiring both "RCU" and "STALL" keywords, still correctly detects legitimate RCU stalls
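The false positive described above can be reproduced in isolation. The sketch below is only an illustration: the broad pattern is copied from this description, while `NARROW_RCU`, `is_rcu_stall`, `matching_lines`, and the sample log lines are hypothetical stand-ins and not the actual code or patterns in `lisa/features/serial_console.py`.

```python
import re

# The pattern removed by this PR (copied from the description).
BROAD_STALL = re.compile(r"^(.*stall.*)$", re.MULTILINE | re.IGNORECASE)

# Hypothetical stand-in for a narrower pattern; the real rcu_sched
# pattern in serial_console.py may look different.
NARROW_RCU = re.compile(r"^(.*rcu_sched.*)$", re.MULTILINE)

# Made-up serial console output, for illustration only.
SAMPLE_LOG = "\n".join(
    [
        "Running install step for package foo",
        "reinstall of grub completed",
        "INFO: rcu_sched detected stalls on CPUs/tasks: { 1 }",
    ]
)


def matching_lines(pattern: re.Pattern, log: str) -> list:
    """Return every full line of the log that the pattern matches."""
    return [match.group(1) for match in pattern.finditer(log)]


def is_rcu_stall(line: str) -> bool:
    """Assumed categorization rule: both keywords must be present."""
    upper = line.upper()
    return "RCU" in upper and "STALL" in upper


# The broad pattern flags all three lines, because "install" and
# "reinstall" both contain the substring "stall".
broad_hits = matching_lines(BROAD_STALL, SAMPLE_LOG)

# The narrower pattern flags only the RCU line.
narrow_hits = matching_lines(NARROW_RCU, SAMPLE_LOG)

print(len(broad_hits), len(narrow_hits))
```

Under these assumptions, the only line that survives the narrower pattern also passes the "RCU" plus "STALL" keyword check, while ordinary package-install messages are no longer reported.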