
API: C-API removals#25292

Merged
mattip merged 15 commits into numpy:main from seberg:more-c-removals
Dec 13, 2023
Conversation

@seberg (Member) commented Dec 1, 2023

This removes some functions from the C-API that seemed rather uncontroversial to me. Many are undocumented, and some just felt a bit odd without any real use-case.

From the release note as an overview:

  • PyArrayFlags_Type and PyArray_NewFlagsObject as well as
    PyArrayFlagsObject are private now.
    There is no known use-case, use the Python API if needed.
  • PyArray_MoveInto, PyArray_CastTo and PyArray_CastAnyTo are removed;
    use PyArray_CopyInto, or if absolutely needed PyArray_CopyAnyInto
    (the latter does a flat copy).
  • PyArray_FillObjectArray is removed, its only true use is for
    implementing np.empty. Create a new empty array or use
    PyArray_FillWithScalar() (decrefs existing objects).
  • PyArray_CompareUCS4 and PyArray_CompareString are removed.
    Use the standard C string comparison functions.
  • PyArray_ISPYTHON is removed as it is misleading, has no known
    use-cases, and is easy to replace.
  • PyArray_FieldNames is removed as it is unclear what it would
    be useful for. It also has incorrect semantics in some possible
    use-cases.
  • PyArray_TypestrConvert is removed since it seems a misnomer and is
    unlikely to be used by anyone. If you know the size or are limited
    to a few types, just convert explicitly; otherwise go via Python
    strings.
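For several of the removals above, the release note points at the Python API as the replacement. A minimal sketch of those Python-level equivalents (illustrative only; the C replacements named above remain the primary route for C extensions):

```python
import numpy as np

# PyArray_MoveInto / PyArray_CastTo were aliases for PyArray_CopyInto,
# whose Python-level counterpart is np.copyto (copies with casting):
src = np.arange(3, dtype=np.int64)
dst = np.empty(3, dtype=np.float64)
np.copyto(dst, src, casting="unsafe")  # casts int64 -> float64 on copy

# PyArrayFlagsObject / PyArray_NewFlagsObject are private now;
# query flags through the Python attribute instead:
assert dst.flags.c_contiguous

# PyArray_CompareUCS4 / PyArray_CompareString: ordinary string
# comparisons (or memcmp/strcmp in C) cover the same ground:
assert "abc" < "abd"
```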

I skipped removing PyArray_CopyAnyInto; it is a raveling copy, which seems dubious to me, but OK. Having three aliases around felt unnecessary, though.

The trickiest part is that I reused the Cast* slots for the new Copy* functions, as they are fully equivalent and it seemed nice to keep that corner of the table tidy.

PyArray_FillObjectArray is also renamed and reduced to only filling with None. It should be replaced, as the concept is quite problematic. I don't see it being used; otherwise the solution would be to attach a big warning...

FieldNames is odd: while we use the same logic during construction, I don't see users doing that. And if you use it to get the names, it reorders them; since name order is important, that seems misleading.
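The reordering concern is visible from Python: `dtype.names` preserves declaration order, which is significant, so an API that hands back a reordered tuple (as PyArray_FieldNames did) would be misleading. A small illustration:

```python
import numpy as np

# Field order in a structured dtype is declaration order, not sorted order.
dt = np.dtype([("b", np.int32), ("a", np.float64)])
assert dt.names == ("b", "a")          # preserved as declared
assert sorted(dt.names) == ["a", "b"]  # what a reordering would yield instead
```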


This should be rather straightforward. If someone feels strongly about being able to compile partial commits, squash them, as I only updated the C version at the end.

@mattip (Member) commented Dec 3, 2023

Doc building is failing. There is a warning about missing files, but I don't think that is the cause:

WARNING: Path to light image logo does not exist: numpylogo.svgumpy.asarray_chkfinite
WARNING: Path to dark image logo does not exist: numpylogo_dark.svg

I couldn't rerun only the circleci CI.

@seberg (Member, Author) commented Dec 3, 2023

Yeah, no idea... just closing/reopening to see if that helps.

@seberg seberg closed this Dec 3, 2023
@seberg seberg reopened this Dec 3, 2023
@seberg seberg force-pushed the more-c-removals branch 2 times, most recently from bf86ada to 7aa9796 Compare December 4, 2023 09:37
@seberg (Member, Author) commented Dec 4, 2023

Still beats me... @tupui maybe a long shot, but do you happen to know if this is a known issue with pydata-sphinx-theme or so?! I don't see why this PR should include anything that would trip up the doc build. The error is:

updating environment: [new config] 2587 added, 0 changed, 0 removed
WARNING: Path to light image logo does not exist: numpylogo.svgumpy.asarray_chkfinite

(Which is clearly a malformed path, but one that seems unrelated to anything here.)

asarray_chkfinite is just the first function whose docs/html we build, I believe. I don't understand why none of the other PRs seem to fail. Could there be some caching that means they simply don't rebuild, or are they getting other dependencies, or is there some odd cause here?

@mattip (Member) commented Dec 4, 2023

Does document building work locally?

@melissawm (Member) commented Dec 4, 2023

Ok while debugging this I hit a nasty segmentation fault which completely froze my computer. I'll try to describe what I did:

  1. Building locally following the usual steps in https://numpy.org/devdocs/dev/howto_build_docs.html doesn't work for me. I get errors about Matplotlib (I think?) being compiled with a different Python ABI. I realized this also has to do with a python=3.9 pin on our environment.yml file that I think needs to be updated (see MAINT: Update environment.yml to match *_requirements.txt #25308)
  2. Next, I created a virtualenv environment to match what is described in https://github.com/numpy/numpy/blob/main/.circleci/config.yml This got me to the error we see here in CI. The logo warning is legitimate but shouldn't block the build (see DOC: Fix path to svg logo files #25309). The next error is the meaningful one:
Traceback (most recent call last):
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/cmd/build.py", line 298, in build_main
    app.build(args.force_all, args.filenames)
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/application.py", line 355, in build
    self.builder.build_update()
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 293, in build_update
    self.build(to_build,
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 313, in build
    updated_docnames = set(self.read())
                           ^^^^^^^^^^^
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 418, in read
    self._read_parallel(docnames, nproc=self.app.parallel)
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 474, in _read_parallel
    tasks.join()
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/util/parallel.py", line 102, in join
    if not self._join_one():
           ^^^^^^^^^^^^^^^^
  File "/home/circleci/repo/venv/lib/python3.11/site-packages/sphinx/util/parallel.py", line 120, in _join_one
    exc, logs, result = pipe.recv()
                        ^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError

Exception occurred:
  File "/home/circleci/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError
The full traceback has been saved in /tmp/sphinx-err-agt51dsa.log, if you want to report the issue to the developers.

So, because I've seen similar issues before, I tried building the docs with no parallel execution (removed -j2 from the docs build command). This gave me a segmentation fault the first time I tried it, and froze my linux computer completely the second time. No idea what this is but it looks like something coming from sphinx...

@ngoldbaum (Member) commented

Docs seem to build fine with this PR checked out on an M3 Mac running Mac OS Sonoma.

@rgommers (Member) commented Dec 8, 2023

That all seems very reasonable. The only one of these that is used in SciPy is PyArrayFlags_Type (once):

https://github.com/scipy/scipy/blob/fa9f13e6906e7d00510d593f7f982db30e4e4f14/scipy/sparse/linalg/_dsolve/_superlumodule.c#L352-L355

I can't say that it's immediately obvious to me what is going on there.

@seberg (Member, Author) commented Dec 8, 2023

Well, let's see if rebasing magically fixes the doc build issue... I have not really tried to dig deeper into it after Melissa had a look.

@mattip (Member) commented Dec 8, 2023

The build is still failing. My guess is that there is a segfault (maybe a header is still there but the implementation is missing?) in the worker thread, it dies without reporting anything, and the main thread can no longer communicate with it. Perhaps changing the doc build to non-parallel might expose the segfault? We are building with

sphinx-build -b html -WT --keep-going -d build/doctrees  -j2 -n source build/html

maybe the -j2 could be forced to -j1

@seberg (Member, Author) commented Dec 8, 2023

Are we just being stupid? We have scipy.linalg imports in the NumPy doc build, and those do the nonsense INCREF that Ralf found. That is bound to segfault.

We change the API in a way that breaks ABI, and in the rare case where it actually matters, our doc builds are bound to be broken until downstream is recompiled.

EDIT: a scipy.signal import also seems to be sufficient; likely more are.

@mattip (Member) commented Dec 8, 2023

Are we just being stupid?

Probably :)

@seberg (Member, Author) commented Dec 8, 2023

Let's see if scipy/scipy#19656 is merged quickly. But I think I am willing to gamble and live with a briefly broken doc build until new downstream nightlies are uploaded.

@rgommers (Member) commented Dec 8, 2023

I'll get that SciPy PR merged and will see if I can get some new wheels up. There are a bunch of other failures preventing successful wheel uploads right now - we're kinda in circular dependencies territory at the moment.

We never use them; they are an odd concept, as they pretend our
doubles are Python doubles, which they are not really (maybe once
upon a time).
The concept thus seems useless to me: nothing maps to a Python object
clearly, and all integers map to Python ints with `.item()`, so what
does it mean?

No idea, I don't think this was ever used in NumPy...

These are effectively aliases around CopyInto and CopyAnyInto,
so move them to those slots.

I have not found a single use of this.  In theory, it is designed
to give 3rd party extensions access to your type number.
In practice, 3rd parties using C don't really exist and should
be able to get the descriptor object via Python.

We never use this, and it wasn't even documented...

How likely is it that anyone needs to parse this?  Just use the type
number directly.
It is probably also wrong if you use it on a field dict directly,
since it mangles the order.  The logic was designed for a *new*
dtype creation from only a dict, and even there it is unclear that
we should use it, since Python guarantees dict order...
In either case, I can't imagine a reasonable use-case, and we don't use it.

The function is what ensures that `np.empty()` leaves None objects
rather than NULL, to help others who don't take care to explicitly
support NULL everywhere.
It serves no other purpose, and that purpose is exceedingly unlikely
to be useful, especially since it requires dirty cleaning out of
the original array first.
It's still terrible, but at least make it clear what it is used for...

There is no reason anyone should really need these in C, and getting
the attribute isn't even so bad even if you do.
I might have been convinced if this was documented; as is, the only
use-case would be a custom arrayscalar, and it can steal the flags
from any of ours if it wants to...
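The behavior the PyArray_FillObjectArray commit message describes is observable from Python: object arrays from `np.empty()` expose None entries, and the suggested replacement for filling with an arbitrary object, PyArray_FillWithScalar, corresponds to `arr.fill(...)`. A small sketch:

```python
import numpy as np

# np.empty() with object dtype exposes None entries rather than NULL:
arr = np.empty(3, dtype=object)
assert all(x is None for x in arr)

# Filling with an arbitrary object (the PyArray_FillWithScalar route,
# i.e. ndarray.fill at the Python level):
arr.fill(42)
assert arr.tolist() == [42, 42, 42]
```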
@seberg (Member, Author) commented Dec 11, 2023

OK, scipy update is through and sufficient to make the docs build pass here.

@seberg (Member, Author) commented Dec 13, 2023

Btw. I think this one is ready to go in now.

@mattip mattip merged commit 3f19f77 into numpy:main Dec 13, 2023
@mattip (Member) commented Dec 13, 2023

Thanks @seberg

@seberg seberg deleted the more-c-removals branch December 13, 2023 22:03
@jakevdp (Contributor) commented Dec 19, 2023

Hi - I just ran into this when attempting to compile ml_dtypes against the nightly numpy release. The ml_dtypes package uses PyArray_TypeNumFromName here: https://github.com/jax-ml/ml_dtypes/blob/6f7b695c95004c9b9b1d070918a5091920624e0c/ml_dtypes/_src/custom_float.h#L873-L882 Hopefully the long comment there gives enough context – can you suggest an alternative API to use for this case?

@jakevdp (Contributor) commented Dec 20, 2023

It might be that our best option is to simply remove the highlighted code block, as the TODO there has been satisfied by the creation of the ml_dtypes package.

@seberg (Member, Author) commented Dec 20, 2023

I thought you could just grab it from Python via PyArray_DescrConverter(unicode_string("name")) or similar. If we support names for this, it seems like they need to work from Python also, and not be their own mechanism?
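What PyArray_DescrConverter does for a string argument corresponds, at the Python level, to a dtype lookup by name; the type number then comes from the descriptor's `.num`. A sketch of that lookup (using "float64" as a stand-in for a third-party registered name like the ones ml_dtypes would resolve):

```python
import numpy as np

# Resolve a dtype by its string name, then read the type number from it.
# This is the Python-level analogue of the suggested C route.
descr = np.dtype("float64")
assert descr.num == np.dtype(np.float64).num
```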

@jakevdp (Contributor) commented Dec 20, 2023

Thanks @seberg - I'll give that a try.

