Skip to content

Attempt to resolve: https://github.com/dask/dask/issues/6307#6318

Merged
TomAugspurger merged 1 commit intodask:masterfrom
asmith26:add_split_out_doc
Jun 19, 2020
Merged

Attempt to resolve: https://github.com/dask/dask/issues/6307#6318
TomAugspurger merged 1 commit intodask:masterfrom
asmith26:add_split_out_doc

Conversation

@asmith26
Copy link
Contributor

  • Add function (that is called by derived_from) to update docstrings for methods containing the split_out parameter.
  • Add split_out documentation to dataframe best practices.
  • Tests added / passed
  • Passes black dask / flake8 dask

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this would generate different docs for SeriesGroupBy.size, sorry.




By default groupby methods return an object with only 1 partition. This is to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't quite feel like the right section for this. The header here is "Reduce, then use Pandas" so the assumption is that the user wants an in-memory object back.

I think a new section at the end of dataframe-design.rst is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in edfc7a5

@asmith26
Copy link
Contributor Author

Thanks for the feedback @TomAugspurger

I agree, I've created a new section in dataframe-design.rst

Regarding the indentation problem, I've now noticed a few differences to how the online Dask doc looks and what I'm building locally (including for parts of the docs I haven't touched). Is there anywhere I can see how my changes are actually built for the website (e.g. I think they are built by maybe ReadTheDocs)?

@TomAugspurger
Copy link
Member

TomAugspurger commented Jun 15, 2020 via email

@asmith26
Copy link
Contributor Author

asmith26 commented Jun 18, 2020

Perhaps a difference in dependencies? The docs are built with docs/requirements-docs.txt.

Thanks for the suggestion, I'm using these requirements.

I think you can view the build logs at https://readthedocs.org/projects/dask/builds/11243345/ to verify.

Thanks. I had a thought and realized I could build the docs myself with readthedocs. Most things are looking better/as expected, including the new section in dataframe-design.rst: https://asmith26-demo.readthedocs.io/en/latest/dataframe-design.html#groupby

Unfortunately building with readthedocs has not fixed the indentation problem, e.g. drop_duplicates is misaligned, though some are OK (as I found locally) like size. The misalignment appears to be due to some of the original docs using <blockquotes> html tags whilst (I think the majority of) others use just standard <p> tags. E.g. for drop_duplicates:

<blockquote>
<div><p>Return DataFrame with duplicate rows removed.</p>
<p>This docstring was copied from pandas.core.frame.DataFrame.drop_duplicates.</p>
<p>Some inconsistencies with the Dask version may exist.</p></div>
</blockquote>

<p>An explanation of the <cite>split_out</cite> parameter can be found <a class="reference internal" href="dataframe-design.html#dataframe-design-groupby"><span class="std std-ref">here</span></a>.</p>

<blockquote>
<div><p>Considering certain columns is optional. Indexes, including time indexes
are ignored.</p></div>
</blockquote>

Consequently, I feel there are a few ways to proceed:

  1. Just add the new section in the dataframe-design.rst (i.e. not update the API docs). I think this does provide sufficient information for end users.
  2. Try to understand why some of the (I think pandas) docs are creating <blockquote> html tags (I'm pretty flummoxed by this though).
  3. Commit this and don't worry about the misalignment of the doc (not as pretty as it could be, but does inform the user).

I think for simplicity, I'm happy to go with 1. What are your thoughts? Many thanks again for your helps and advice :)

@TomAugspurger
Copy link
Member

TomAugspurger commented Jun 19, 2020 via email

@asmith26 asmith26 marked this pull request as ready for review June 19, 2020 18:04
@asmith26
Copy link
Contributor Author

Thanks for letting me know. I think my latest push completes this now then.

@TomAugspurger TomAugspurger merged commit fc99193 into dask:master Jun 19, 2020
@TomAugspurger
Copy link
Member

Thanks @asmith26!

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants