
Add detection of text missing values in validation_report()#242

Merged: martinctc merged 12 commits into main from copilot/fix-35 on May 28, 2025

Copilot AI commented May 28, 2025


This PR enhances the validation_report() function to detect and report on text strings that likely represent missing values but aren't actual NA values in R.

This PR also bumps the package version to v1.9.2, which is submitted to CRAN.

Changes:

  1. Added new na_values parameter to hrvar_count_all() function with default value c("NA", "N/A", "#N/A", " ")
  2. Updated the missing value calculations to identify both:
    • Actual NA values (using is.na())
    • Text strings that represent missing values (like "NA", "N/A", "#N/A", spaces)
  3. Added reporting of these potential missing values in the output message with a count and the specific values found
  4. Added the na_values parameter to validation_report() so it can be customized when calling the report function
  5. Added a test case to verify the text missing values detection works correctly
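
The detection described in items 2 and 3 can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `count_text_missing()` is a hypothetical helper name, and only the `na_values` default is taken from the description above.

```r
# Hypothetical helper (not part of the wpa package): counts true NA values
# and text placeholders for missing values in a single vector.
count_text_missing <- function(x, na_values = c("NA", "N/A", "#N/A", " ")) {
  x <- as.character(x)

  # Real missing values, as reported by is.na()
  is_true_na <- is.na(x)

  # Text strings that merely look like missing values
  is_text_na <- !is_true_na & x %in% na_values

  list(
    n_true_na = sum(is_true_na),
    n_text_na = sum(is_text_na),
    text_values_found = unique(x[is_text_na]),
    # Share of values that are missing under either definition
    pct_missing = mean(is_true_na | is_text_na)
  )
}

org <- c("Sales", "NA", NA, "N/A", " ", "Finance", "#N/A", "Sales")
res <- count_text_missing(org)

res$n_true_na   # 1
res$n_text_na   # 4
res$pct_missing # 0.625
```

Passing a different `na_values` vector (for example, adding `"-"`) changes which strings are treated as placeholders, which is the customization that item 4 exposes through `validation_report()`.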

Example:

When data contains text strings like "NA" or "N/A" in HR attributes, the validation report will now include a message like:

There are 15 values which may potentially represent missing values: NA, N/A, #N/A.

These values are now also properly counted in the missing values percentage calculations.
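
As a rough sketch, a message of the shape shown above could be assembled with base R string functions. The function name `build_na_message()` is hypothetical and the wording is copied from the example message; the package's real code may build it differently.

```r
# Hypothetical helper (not part of the wpa package): builds a message
# listing the text placeholders found in a vector.
build_na_message <- function(x, na_values = c("NA", "N/A", "#N/A", " ")) {
  x <- as.character(x)

  # Keep only non-NA entries that match one of the placeholder strings
  hits <- x[!is.na(x) & x %in% na_values]

  # Nothing to report when no placeholders are present
  if (length(hits) == 0) {
    return(NULL)
  }

  paste0(
    "There are ", length(hits),
    " values which may potentially represent missing values: ",
    paste(unique(hits), collapse = ", "),
    "."
  )
}

msg <- build_na_message(c("NA", "N/A", "Sales", "#N/A", "NA"))
msg
# "There are 4 values which may potentially represent missing values: NA, N/A, #N/A."
```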

Fixes #35.



Copilot AI and others added 2 commits May 28, 2025 08:42
Co-authored-by: martinctc <17925865+martinctc@users.noreply.github.com>
Co-authored-by: martinctc <17925865+martinctc@users.noreply.github.com>
Copilot AI changed the title from [WIP] Feature: check for spaces and "NA" in validation_report() to Add detection of text missing values in validation_report() on May 28, 2025
Copilot AI requested a review from martinctc May 28, 2025 08:50
@martinctc martinctc marked this pull request as ready for review May 28, 2025 10:32
@martinctc


@copilot - I tried submitting this package to CRAN, but the incoming checks returned the following issues:

  Found the following (possibly) invalid URLs:
    URL: https://docs.microsoft.com/en-us/learn/modules/workplace-analytics-r-package/ (moved to https://learn.microsoft.com/en-us/learn/modules/workplace-analytics-r-package/)
      From: README.md
      Status: 301
      Message: Moved Permanently
    URL: https://docs.microsoft.com/en-us/workplace-analytics/ (moved to https://learn.microsoft.com/en-us/workplace-analytics/)
      From: README.md
      Status: 301
      Message: Moved Permanently
    URL: https://stackoverflow.com/questions/34522732/changing-fonts-in-ggplot2
      From: man/theme_wpa.Rd
      Status: 403
      Message: Forbidden
  For content that is 'Moved Permanently', please change http to https,
  add trailing slashes, or replace the old by the new URL.

Please remove the offending links from the above so that we can pass the checks.

Co-authored-by: martinctc <17925865+martinctc@users.noreply.github.com>

Copilot AI commented May 28, 2025


@copilot - tried submitting this package to CRAN, but returned the following issues in the incoming checks:

  Found the following (possibly) invalid URLs:

...

I've fixed the invalid URLs flagged by the CRAN checks in commit d0b620b:

  1. Updated URLs in README.md from docs.microsoft.com to learn.microsoft.com
  2. Removed the Stack Overflow URL from man/theme_wpa.Rd that was returning a 403 Forbidden status

These changes should resolve the reported issues with the invalid URLs.
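
For illustration only, the first fix amounts to a host substitution. The sketch below applies it to in-memory strings rather than to README.md itself, and the variable names are hypothetical; it does not reproduce what commit d0b620b actually changed.

```r
# URLs flagged as "Moved Permanently" in the CRAN incoming checks
old_urls <- c(
  "https://docs.microsoft.com/en-us/learn/modules/workplace-analytics-r-package/",
  "https://docs.microsoft.com/en-us/workplace-analytics/"
)

# Swap the old host for the new one; fixed = TRUE treats the pattern
# as a literal string rather than a regular expression
new_urls <- sub(
  "https://docs.microsoft.com/",
  "https://learn.microsoft.com/",
  old_urls,
  fixed = TRUE
)

new_urls
```

The resulting strings match the redirect targets reported in the check output.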

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • CRAN.R-project.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • bioconductor.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • cloud.r-project.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)
  • igraph.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • lifecycle.r-lib.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • martinctc.github.io
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • microsoft.github.io
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • rdrr.io
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)
  • www.fstpackage.org
    • Triggering command: /usr/lib/R/bin/exec/R --no-restore --no-echo --args nextArg--as-crannextArgwpa_1.9.2.tar.gz (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@martinctc


package wpa_1.9.2.tar.gz is on its way to CRAN - merging now

@martinctc martinctc merged commit be75742 into main May 28, 2025
3 checks passed
@martinctc martinctc deleted the copilot/fix-35 branch May 28, 2025 17:03

Development

Successfully merging this pull request may close these issues.

Feature: check for spaces and "NA" in validation_report()

2 participants