
DOC: clarify missing-value handling in pandas and NumPy reductions #65441

Merged
rhshadrach merged 1 commit into pandas-dev:main from praneethhere:doc-missing-data-numpy-reductions
May 6, 2026

@praneethhere

Contributor

Closes #56939.

This PR adds a short note to the missing-data user guide explaining that pandas reductions skip missing values by default, while NumPy reductions such as numpy.std return nan when the input contains nan values.

The change is documentation-only and does not modify pandas behavior.
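
For illustration only (this snippet is not part of the PR), the contrast described above can be sketched as follows. The sample values are made up, and the NumPy call is applied to a plain ndarray on purpose, since passing a Series to `np.std` may dispatch to the pandas method instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data containing one missing value.
values = [1.0, 2.0, np.nan, 4.0]
s = pd.Series(values)

# pandas skips NaN by default (skipna=True) and uses ddof=1.
pandas_default = s.std()

# NumPy propagates NaN when reducing a plain ndarray.
numpy_default = np.std(s.to_numpy())

# Asking pandas not to skip missing values also yields NaN.
pandas_no_skip = s.std(skipna=False)

print(pandas_default, numpy_default, pandas_no_skip)
```

Only the first result is a finite number; the other two are `nan`.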

Validation:

  • pre-commit run --files doc/source/user_guide/missing_data.rst
  • python make.py --num-jobs 1 --single user_guide/missing_data.rst

@praneethhere

Contributor Author

The failing check appears to be a CI environment/setup issue rather than a docs failure. The job failed in “Post Set up Conda” with:

ENOENT: no such file or directory, lstat '/home/runner/work/_temp/setup-micromamba/micromamba-shell'

Local validation passed:

  • pre-commit run --files doc/source/user_guide/missing_data.rst
  • python make.py --num-jobs 1 --single user_guide/missing_data.rst

@rhshadrach rhshadrach left a comment

Member


This looks good, but to close out the issue we'd also need to update the API docs for DataFrame.std to read something like

        Notes
        -----
        To have the same behaviour as ``numpy.std``, use ``ddof=0`` (instead of the
        default ``ddof=1``) and ``skipna=False``.

Note I'm also fixing the single backticks here.

In addition, I think we should add this note to all std methods (there are many; you can search the codebase for "def std(").
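
As a quick sanity check of the suggested note (a sketch with made-up data, not code from the PR), the two arguments can be compared against `numpy.std` column by column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" contains a missing value.
arr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0]])
df = pd.DataFrame(arr, columns=["a", "b"])

# pandas defaults: ddof=1 and skipna=True.
pandas_default = df.std()

# Arguments from the suggested note.
pandas_like_numpy = df.std(ddof=0, skipna=False)

# NumPy reference on the underlying ndarray, reducing over rows.
numpy_result = np.std(arr, axis=0)

# equal_nan=True treats NaN in the same position as a match.
print(np.allclose(pandas_like_numpy.to_numpy(), numpy_result, equal_nan=True))
```

Both arguments matter here: `ddof=0` aligns the divisor, and `skipna=False` makes the column with a missing value come out as `nan`, as it does in NumPy.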

Comment thread doc/source/user_guide/missing_data.rst Outdated
<api.series.stats>` and :ref:`here <api.dataframe.stats>`) all
account for missing data.

This default differs from many NumPy reduction functions. For example,

Member


I don't think it's clear here what "This default" refers to. Maybe change to "The default behavior"?

@praneethhere praneethhere force-pushed the doc-missing-data-numpy-reductions branch from 42f5034 to 4fe7196 Compare May 3, 2026 22:51
@praneethhere

Contributor Author

@rhshadrach Thanks for the review. I addressed the requested changes by:

  • Updating the missing-data user guide wording from “This default” to “The default behavior”.
  • Updating the DataFrame.std and Series.std notes to mention both differences from numpy.std: ddof=0 and skipna=False.
  • Adding the same applicable note to GroupBy.std, which supports skipna.
  • Adding a ddof-only note to Resampler.std, since Resampler.std does not expose a skipna parameter.
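
For reference, a minimal sketch of what the ddof-only note for Resampler.std amounts to. The timestamps and values are made up for illustration and are not taken from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two observations on each of two days.
index = pd.to_datetime(
    [
        "2026-01-01 00:00",
        "2026-01-01 12:00",
        "2026-01-02 00:00",
        "2026-01-02 12:00",
    ]
)
s = pd.Series([1.0, 3.0, 2.0, 6.0], index=index)

# Default Resampler.std uses ddof=1 (sample standard deviation).
resampled_default = s.resample("D").std()

# ddof=0 gives the population standard deviation, as numpy.std does by default.
resampled_ddof0 = s.resample("D").std(ddof=0)

# NumPy reference computed per daily bin.
numpy_per_day = np.array([np.std([1.0, 3.0]), np.std([2.0, 6.0])])

print(np.allclose(resampled_ddof0.to_numpy(), numpy_per_day))
```

There are no missing values in this example, so the only difference from `numpy.std` is the `ddof` default.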

Validation:

  • git diff --check
  • pre-commit run --files doc/source/user_guide/missing_data.rst pandas/core/frame.py pandas/core/series.py pandas/core/groupby/groupby.py pandas/core/resample.py
  • python make.py --num-jobs 1 --single user_guide/missing_data.rst

@rhshadrach rhshadrach left a comment

Member


lgtm

@rhshadrach rhshadrach added the Docs and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels May 6, 2026
@rhshadrach rhshadrach added this to the 3.1 milestone May 6, 2026
@rhshadrach rhshadrach merged commit 57c1a81 into pandas-dev:main May 6, 2026
56 of 59 checks passed
@rhshadrach

Member

Thanks @praneethhere


Development

Successfully merging this pull request may close these issues.

DOC: Highlight the difference between DataFrame/pd.Series/numpy ops when there are NA values
