
docs: document approximate algorithm and Dask-specific params in describe()#12300

Merged
jacobtomlinson merged 4 commits into dask:main from
cluster2600:fix-describe-docstring-approximate-10416
Mar 11, 2026

Conversation

@cluster2600
Contributor

Summary

Closes #10416

The describe() method silently uses an approximate algorithm for percentile computation (used for the 25%, 50%, 75% statistics). This can produce results that differ from pandas, which confuses users comparing outputs side-by-side.
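
To show why a partition-wise percentile can disagree with pandas, here is a minimal, stdlib-only sketch. It is not Dask's actual algorithm: the helper names and the scheme (summarize each partition by a few quantile cut points, weight them by partition size, then take a weighted median of the pooled cut points) are illustrative assumptions. The point is only that combining per-partition summaries loses information, so the result can differ from the exact median computed over all rows.

```python
import statistics


def exact_median(values):
    """Exact median over the full, concatenated dataset."""
    return statistics.median(values)


def approx_median(partitions, n=4):
    """Illustrative approximation (NOT Dask's algorithm): summarize each
    partition by its n-quantile cut points, weight every cut point by the
    partition's share of rows, then take the weighted median of the pool."""
    pooled = []
    for part in partitions:
        cuts = statistics.quantiles(part, n=n)  # n - 1 cut points per partition
        weight = len(part) / len(cuts)
        pooled.extend((cut, weight) for cut in cuts)
    pooled.sort()
    half = sum(weight for _, weight in pooled) / 2
    running = 0.0
    for value, weight in pooled:
        running += weight
        if running >= half:
            return value
    return pooled[-1][0]


# Two "partitions" of one column with very different value ranges.
partitions = [[1, 2, 3, 4, 5, 6, 7, 8], [100, 200, 300, 400]]
exact = exact_median([x for part in partitions for x in part])
approx = approx_median(partitions)
print(exact, approx)  # 6.5 6.75
```

The exact median of the twelve values is 6.5, while the pooled-summary estimate lands on 6.75: close, but not identical, which is the kind of discrepancy users see when comparing describe() output side by side with pandas.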

Previous PRs (#11973, #12288, #12289) attempted to address this in docs/source files. A subsequent review on #12113 requested the fix be placed in the describe() docstring directly. This PR does exactly that.

Changes

Added an explicit docstring to both DataFrame.describe() and Series.describe() that:

  1. Adds a .. note:: block explaining that percentiles are computed using an approximate algorithm by default, and that results may differ slightly from pandas.
  2. Documents split_every – a Dask-specific parameter not present in pandas.
  3. Documents percentiles_method – a Dask-specific parameter not present in pandas, offering 'dask' and 'tdigest' options.
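
In Dask, split_every controls how many intermediate results are combined at each step of a tree reduction. The dependency-free sketch below illustrates that idea only; tree_reduce is a hypothetical helper, not a Dask API, and the exact way describe() threads split_every through its internal reductions may differ.

```python
def tree_reduce(chunks, combine, split_every):
    """Hypothetical sketch of a tree reduction: repeatedly combine groups of
    `split_every` partial results until a single result remains.
    Returns (result, number_of_levels)."""
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    chunks = list(chunks)
    levels = 0
    while len(chunks) > 1:
        # Combine each consecutive group of `split_every` partial results.
        chunks = [
            combine(chunks[i:i + split_every])
            for i in range(0, len(chunks), split_every)
        ]
        levels += 1
    return chunks[0], levels


# One partial result per partition (e.g. a per-partition sum).
partials = list(range(16))
for fan_in in (2, 4, 16):
    total, depth = tree_reduce(partials, sum, fan_in)
    print(fan_in, total, depth)
```

The final value is the same for every fan-in; only the shape of the reduction changes (4 levels with a fan-in of 2, 2 levels with 4, 1 level with 16), trading graph depth against the number of inputs each combine step handles.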

Since both methods use @derived_from(pd.DataFrame/pd.Series), the docstring is prepended to the inherited pandas docstring (following the existing pattern used by e.g. quantile()).
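
The docstring-composition pattern described above can be sketched with a simplified, hypothetical decorator. prepend_docstring_from is not Dask's derived_from: the real decorator does more than this (for example, it adds a note that the docstring was copied from pandas). The sketch only shows the core idea of putting a Dask-specific docstring in front of the inherited one.

```python
def prepend_docstring_from(source_cls):
    """Hypothetical, simplified stand-in for dask's @derived_from: place the
    decorated method's own docstring in front of the docstring of the
    same-named method on `source_cls`."""
    def decorator(method):
        own = (method.__doc__ or "").strip()
        inherited = getattr(source_cls, method.__name__).__doc__ or ""
        # Keep the inherited text unchanged if the method has no docstring.
        method.__doc__ = own + "\n\n" + inherited if own else inherited
        return method
    return decorator


class UpstreamFrame:
    def describe(self):
        """Generate descriptive statistics."""


class DaskLikeFrame:
    @prepend_docstring_from(UpstreamFrame)
    def describe(self):
        """Percentiles are computed with an approximate algorithm by default."""


print(DaskLikeFrame.describe.__doc__)
```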

Testing

No new tests needed – this is a documentation-only change. Existing tests continue to pass.

…ribe()

The describe() method uses an approximate algorithm (by default) for
computing percentiles, which can produce results that differ slightly from
pandas. This was undocumented and confused users comparing the two.

Add an explicit docstring to both DataFrame.describe() and Series.describe()
that:
- Notes the approximate nature of percentile computation
- Documents the Dask-specific split_every parameter
- Documents the Dask-specific percentiles_method parameter

Resolves: dask#10416
@github-actions
Contributor

github-actions bot commented Feb 19, 2026

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

21 files ±0, 21 suites ±0, 5h 31m 5s ⏱️ +1m 41s
18,289 tests +5: 17,015 passed ✅ +5, 1,274 skipped 💤 ±0, 0 failed ❌ ±0
317,304 runs +81: 273,550 passed ✅ +70, 43,754 skipped 💤 +11, 0 failed ❌ ±0

Results for commit 143e153. ± Comparison against base commit 45610ac.

♻️ This comment has been updated with latest results.

Comment on lines +3918 to +3925
.. note::

    Dask computes percentiles (used for the ``25%``, ``50%``, and
    ``75%`` statistics) using an **approximate algorithm** by default.
    Results may therefore differ slightly from pandas. Use
    ``percentiles_method='dask'`` for the built-in Dask algorithm or
    ``percentiles_method='tdigest'`` for the t-digest algorithm.
    See :meth:`dask.dataframe.DataFrame.quantile` for details.
Member

Why is this in a note block instead of just in the description?

Contributor Author

good point, will move it into the main description - was overthinking the formatting there

Contributor Author

Good point — moved the text into the main description body and dropped the note block.

Comment on lines +3935 to +3938
percentiles_method : {'default', 'tdigest', 'dask'}, optional
    Method for computing percentiles. ``'default'`` uses the internal
    Dask algorithm. ``'tdigest'`` uses the t-digest algorithm for
    floats and ints and falls back to ``'dask'`` otherwise.
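
The selection rules in this parameter description can be restated as a small, hypothetical helper. resolve_percentiles_method is not a Dask function, and treating unsigned integers as "ints" is an assumption; the sketch only encodes the documented behaviour ("default" means the internal Dask algorithm, and "tdigest" applies to float and integer data, falling back to "dask" otherwise).

```python
def resolve_percentiles_method(method, dtype_kind):
    """Hypothetical helper encoding the documented rules.

    `dtype_kind` is a NumPy-style kind code: "f" float, "i" signed int,
    "u" unsigned int (assumed to count as an int), "M" datetime, "O" object.
    """
    if method in ("default", "dask"):
        return "dask"
    if method == "tdigest":
        # t-digest only for numeric (float/int) data, otherwise fall back.
        return "tdigest" if dtype_kind in ("f", "i", "u") else "dask"
    raise ValueError(f"unknown percentiles_method: {method!r}")


for method, kind in [("default", "f"), ("tdigest", "f"), ("tdigest", "M")]:
    print(method, kind, "->", resolve_percentiles_method(method, kind))
```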
Member

Can we use double quotes for strings?

Contributor Author

yep, sorry about that - will update to double quotes

Contributor Author

Done, switched to double quotes throughout.

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
- Remove note block, move content into main description
- Use double quotes consistently for string values

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@jacobtomlinson merged commit 30ea561 into dask:main Mar 11, 2026
29 checks passed
@cluster2600 deleted the fix-describe-docstring-approximate-10416 branch March 11, 2026 19:05

Development

Successfully merging this pull request may close these issues.

Document that describe is using approximate algorithms

2 participants