GH-38341: [Python] Remove usage of pandas internals DatetimeTZBlock #38321

Merged
jorisvandenbossche merged 7 commits into apache:main from jorisvandenbossche:pandas-internals
Jan 8, 2024

Conversation

@jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Oct 18, 2023

Rationale for this change

This usage probably dates from a time when it was required to specify the Block type. Nowadays it is enough to just specify the dtype, which cuts down on our usage of internal pandas objects.

We only need to ensure that the data passed is 2D, because it always goes into a Block of a DataFrame (not a Series); otherwise pandas complains that the placement doesn't match.
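As a rough illustration of the 2D requirement (not the actual pyarrow code), the sketch below reshapes a single 1D column into block-style values of shape `(n_columns, n_rows)` and checks that the first dimension agrees with the placement. The helper names `ensure_2d_for_block` and `placement_matches` are hypothetical, and the check only mimics the kind of consistency test pandas performs.

```python
import numpy as np


def ensure_2d_for_block(values: np.ndarray) -> np.ndarray:
    """Return values shaped (n_columns, n_rows). Hypothetical helper."""
    if values.ndim == 1:
        # A single column becomes one row of the 2D block: shape (1, n_rows).
        values = values.reshape(1, -1)
    return values


def placement_matches(values: np.ndarray, placement: list) -> bool:
    """Hypothetical check: one placement entry per column held by the block."""
    return values.ndim == 2 and values.shape[0] == len(placement)


# One datetime column with three rows, to be placed at column position 0.
column = np.array([1, 2, 3], dtype="datetime64[s]")
placement = [0]

block_values = ensure_2d_for_block(column)
```

With the 1D input, the length (3) would be compared against a placement of length 1, which is the mismatch described above; after the reshape the block holds one column and the two agree.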

Part of #35081

@github-actions

⚠️ GitHub issue #35081 has been automatically assigned in GitHub to PR creator.

@apache apache deleted a comment from github-actions bot Oct 18, 2023
@jorisvandenbossche
Member Author

So it seems that pandas doesn't preserve the unit in the DatetimeArray constructor:

In [4]: arr = np.array([1, 2, 3], dtype="datetime64[s]")

In [5]: dtype = pd.DatetimeTZDtype("s", tz="Europe/Brussels")

In [6]: pd.arrays.DatetimeArray(arr, dtype)
Out[6]: 
<DatetimeArray>
['1970-01-01 01:00:01+01:00', '1970-01-01 01:00:02+01:00',
 '1970-01-01 01:00:03+01:00']
Length: 3, dtype: datetime64[ns, Europe/Brussels]

This seems to be fixed in the latest pandas release (2.1), so we will have to keep the older code that uses internals for older pandas versions.
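A minimal sketch of what such a version gate could look like, assuming the cutoff is pandas 2.1 as observed above. The helpers `version_tuple` and `can_use_dtype_only` are hypothetical names for illustration and are not taken from pyarrow.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract the leading (major, minor) numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_use_dtype_only(pandas_version: str) -> bool:
    """Assumption: pandas >= 2.1 preserves the unit, so the dtype alone suffices."""
    return version_tuple(pandas_version) >= (2, 1)


# Older versions would keep the code path that relies on pandas internals.
for candidate in ["1.5.3", "2.0.3", "2.1.0", "2.2.0"]:
    path = "dtype only" if can_use_dtype_only(candidate) else "internals fallback"
    print(candidate, "->", path)
```

In real code the string would come from `pandas.__version__`; comparing integer tuples rather than raw strings avoids lexicographic surprises such as "2.10" sorting before "2.9".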

@jorisvandenbossche
Member Author

@github-actions crossbow submit -g python

@github-actions

Revision: a133014

Submitted crossbow builds: ursacomputing/crossbow @ actions-eda4ccd5da

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.10-cython2 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-v3.5.0 Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.11-spark-master Github Actions
test-conda-python-3.12 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.5.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-35-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 Github Actions

@jorisvandenbossche jorisvandenbossche changed the title ~~GH-35081: [Python] Remove usage of pandas internals DatetimeTZBlock~~ GH-38341: [Python] Remove usage of pandas internals DatetimeTZBlock Oct 19, 2023
@github-actions

⚠️ GitHub issue #38341 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Member Author

@github-actions crossbow submit -g python

@github-actions

github-actions bot commented Dec 1, 2023

Revision: dfcfa22

Submitted crossbow builds: ursacomputing/crossbow @ actions-6fbf28b83d

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.10-cython2 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-v3.5.0 Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.11-spark-master Github Actions
test-conda-python-3.12 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.5.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-38-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 Github Actions

@jorisvandenbossche jorisvandenbossche merged commit 6b93c4a into apache:main Jan 8, 2024
@jorisvandenbossche jorisvandenbossche removed the awaiting committer review label Jan 8, 2024
@jorisvandenbossche jorisvandenbossche deleted the pandas-internals branch January 8, 2024 13:21
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6b93c4a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…lock (apache#38321)

### Rationale for this change

This usage probably dates from a time when it was required to specify the Block type. Nowadays it is enough to just specify the dtype, which cuts down on our usage of internal pandas objects.

Part of apache#35081

* Closes: apache#38341

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Python] pandas internals: avoid using DatetimeTZBlock
