
Conversation

@ngoldbaum
Member

Fixes #28681.

Hopefully we can create a PyUnstable C API that NumPy can use to replace the use of CPython internals. I'd like to merge this for now to enable downstream 3.14 testing without disabling temporary elision.

Also updates tests to use refcount deltas instead of precise refcount values, allowing the tests to pass on all supported Python versions. There's one test that already uses refcount deltas but still produces different results on 3.14 and 3.13 that I don't understand, but all the other changes make sense given upstream changes to internal refcounting semantics.
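As a sketch of what the delta-based pattern looks like (an illustrative example, not the actual NumPy test code):

```python
import sys

# getrefcount only exists on refcounted implementations (CPython)
HAS_REFCOUNT = hasattr(sys, "getrefcount")

def refcount_delta_demo():
    """Measure a refcount *delta* instead of an absolute value."""
    ob = object()
    container = []
    before = sys.getrefcount(ob)
    container.append(ob)        # the list now holds one extra reference
    after = sys.getrefcount(ob)
    # Absolute counts vary across interpreter versions (e.g. 3.14's
    # borrowed stack references), but the delta from adding a single
    # reference is stable.
    return after - before

if HAS_REFCOUNT:
    assert refcount_delta_demo() == 1
```

Comparing deltas sidesteps version-specific baseline counts, such as the temporary reference that getrefcount itself may or may not add.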

Also fixes a ResourceWarning that, on 3.14, pytest now treats as an error.

Finally, adds CI for 3.14 and 3.14t on the linux smoke tests. I think we're close enough to 3.14 beta 1 that it shouldn't be too noisy to add this, but we can also hold off until beta 1 happens.

if sys.version_info < (3, 14):
    assert_(newrc == rc - 1)
else:
    assert_(newrc == rc)
Member Author

I don't understand why this changed - maybe @seberg or @mhvk might have some insight?

Contributor

I'm even confused why it worked. I guess the idea is that once you take out ob you have a new reference to it, as well as the existing one in the buffer array. Then, the in-place addition means that the object in the buffer array is replaced with a new object, so you get one less.

I guess on 3.14, the reference count does not get increased by taking the object out? I fear I'm as confused as you... Maybe some inspection along the way can help?

Member Author

This has something to do with finalizers and NpyIter_Close. If I move the HAS_REFCOUNT check inside the with block, things work consistently. See f6ce63b.

@mattip it looks like you added the iterator-as-context-manager feature back in 2018. Maybe you have context about whether this refcounting behavior change is something to worry about.
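The shape of that fix can be sketched in isolation, using contextlib.nullcontext as a hypothetical stand-in for the nditer context manager: taking both measurements inside the with block keeps the two loads of ob in the same binding context, so the delta comes out consistently.

```python
import sys
from contextlib import nullcontext

HAS_REFCOUNT = hasattr(sys, "getrefcount")

def check_inside_with():
    buf = [object()]
    with nullcontext():            # stand-in for `with nditer(...) as i:`
        ob = buf[0]
        if HAS_REFCOUNT:
            rc = sys.getrefcount(ob)
        buf[0] = object()          # drop the container's reference,
                                   # loosely analogous to the buffer write-back
        if HAS_REFCOUNT:
            newrc = sys.getrefcount(ob)
            # Both measurements were taken in the same binding context,
            # so the difference is exactly the one dropped reference.
            assert newrc == rc - 1
    return True

check_inside_with()
```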

@ngoldbaum
Member Author

I ran into issues with setup-uv on 3.14, so switched this workflow back to setup-python.

See astral-sh/uv#12950.

Contributor

@mhvk left a comment

Nice! I think the changes from absolute to relative are better also just for understanding what actually is going on. Unfortunately, no understanding of the puzzle you mention inline -- only a suggestion to increase the number of CI runs by one instead of two...

 strategy:
   matrix:
-    version: ["3.11", "3.12", "3.13", "3.13t"]
+    version: ["3.11", "3.12", "3.13", "3.13t", "3.14-dev", "3.14t-dev"]
Contributor

Since free-threading is still rapidly evolving, perhaps remove 3.13t already? It seems unlikely it would test anything not done by 3.14t-dev.


Contributor

@rossbar left a comment

Thanks @ngoldbaum, the necessary test suite changes provide a nice bit of concrete context for #28681.

Contributor

@mhvk left a comment

Looks all good to me now. Being inside the with statement does feel more secure regardless.

-        assert_(sys.getrefcount(ob) == rc - 1)
+        if HAS_REFCOUNT:
+            newrc = sys.getrefcount(ob)
+            assert_(newrc == rc - 1)
Member

Truthfully, I am not sure what the refcount part of this test is supposed to be testing. It seems to be checking some internal state that I am not sure is really part of the API of iterators.

Member

@seberg Apr 18, 2025

It's supposed to check cleanup within the buffered casting. The buffer cleanup probably happens either way as the iterator is exhausted, but there should be no reason to move it inside and if anything it might be wrong.

Did this really fail? The test seems OK as it was, at first sight? Unless Python re-uses the large integer more aggressively now? Maybe try adding some large number instead of the += 1?

EDIT: To explain, a contains no objects, but the iterator is set up with O, so buffers for O are created and must be cleaned up again.

Member Author

Did this really fail? The test seems OK as it was, at first sight? Unless Python re-uses the large integer more aggressively now? Maybe try adding some large number instead of the += 1?

EDIT: To explain, a contains no objects, but the iterator is set up with O, so buffers for O are created and must be cleaned up again.

I tried adding a large integer, it still seems to fail:

_______________________________________________ test_iter_object_arrays_conversions _______________________________________________

    def test_iter_object_arrays_conversions():
        # Conversions to/from objects
        a = np.arange(6, dtype='O')
        i = nditer(a, ['refs_ok', 'buffered'], ['readwrite'],
                        casting='unsafe', op_dtypes='i4')
        with i:
            for x in i:
                x[...] += 1
        assert_equal(a, np.arange(6) + 1)

        a = np.arange(6, dtype='i4')
        i = nditer(a, ['refs_ok', 'buffered'], ['readwrite'],
                        casting='unsafe', op_dtypes='O')
        with i:
            for x in i:
                x[...] += 1
        assert_equal(a, np.arange(6) + 1)

        # Non-contiguous object array
        a = np.zeros((6,), dtype=[('p', 'i1'), ('a', 'O')])
        a = a['a']
        a[:] = np.arange(6)
        i = nditer(a, ['refs_ok', 'buffered'], ['readwrite'],
                        casting='unsafe', op_dtypes='i4')
        with i:
            for x in i:
                x[...] += 1
        assert_equal(a, np.arange(6) + 1)

        #Non-contiguous value array
        a = np.zeros((6,), dtype=[('p', 'i1'), ('a', 'i4')])
        a = a['a']
        a[:] = np.arange(6) + 98172488
        i = nditer(a, ['refs_ok', 'buffered'], ['readwrite'],
                        casting='unsafe', op_dtypes='O')
        with i:
            ob = i[0][()]
            if HAS_REFCOUNT:
                rc = sys.getrefcount(ob)
            for x in i:
                x[...] += 123456789
        if HAS_REFCOUNT:
            newrc = sys.getrefcount(ob)
>           assert_(newrc == rc - 1)
E           AssertionError

a          = array([221629277, 221629278, 221629279, 221629280, 221629281, 221629282],
      dtype=int32)
i          = <numpy.nditer object at 0x106d01ed0>
newrc      = 2
ob         = 98172488
rc         = 2
x          = array(None, dtype=object)

Do you happen to know where the buffer cleanup is implemented? I'm not terribly familiar with the nditer internals and unfortunately I don't have time right now to bisect to figure out where the different behavior is triggered.

Member

@seberg Apr 22, 2025

@ngoldbaum I know where it's implemented, but I suspect it is a red herring.

I.e. NumPy fills the buffer on construction here, meaning we should see an initial refcount of 2+1, and it should clean up during the iteration (if not, then it should clean up on with statement exit).

Could it be that sys.getrefcount() gets a new reference (the +1) sometimes and sometimes it doesn't!? I.e. Python doing fancy optimization that makes it very hard to predict what sys.getrefcount() returns?!

In which case, I have to ask whether Python should do something about that, or at least document clearly in sys.getrefcount() when you see a +1 and when you don't. Right now that documentation still says:

Return the reference count of the object. The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().

which, I admit, says "generally", but before it was always right, while now it seems to often be wrong?

Member

I wonder if trying to use sys.getrefcount(obj) is becoming too weak a proxy for what the test is actually checking. Is there a way to add something to the cleanup call to directly verify that the cleanup is happening?

Member

@seberg Apr 22, 2025

This shows the behavior:

import sys
from contextlib import nullcontext

l = list(range(500000, 5000005))

def test(l):
    with nullcontext():
        obj = l[1]
        print(sys.getrefcount(obj))  # 2

    print(sys.getrefcount(obj))  # 3


test(l)

running with python3.14t will print 2 and 3.

Member

(Actually, adding obj = None before the with statement makes it print 2 in both cases.)

Member Author

Weird. Kinda feels like a reference counting bug somewhere in CPython.

Contributor

@kumaraditya303 Apr 22, 2025

This isn't a bug but a perf optimization to avoid an incref where the frame's reference surely outlives the reference on the operand stack.

import sys
from contextlib import nullcontext

l = list(range(500000, 5000005))

def test(l):
    with nullcontext():
        obj = l[1] 
        print(sys.getrefcount(obj)) # uses LOAD_FAST_BORROW so it doesn't increase the refcount as obj cannot be unbound

    print(sys.getrefcount(obj)) # uses LOAD_FAST_CHECK as obj can be unbound so it increases the refcount 


test(l)
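One way to inspect this directly (a sketch; the exact opcode names depend on the CPython version) is to disassemble such a function and look at which LOAD instruction feeds each use of obj:

```python
import dis
from contextlib import nullcontext

def f(l):
    with nullcontext():
        obj = l[1]
        x = id(obj)    # load of obj inside the with block
    y = id(obj)        # load of obj after the block
    return x, y

# On CPython 3.14+ the in-block load of obj can compile to
# LOAD_FAST_BORROW (no incref), while the post-block load uses an
# owning load (e.g. LOAD_FAST_CHECK) that does incref the object.
dis.dis(f)
```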

Member

Thanks, well, NumPy runs into it once, so maybe it is rather niche. Not sure if sys.getrefcount can/should explain or even account for this.

// that an object is a unique temporary variable. We scan the top of the
// interpreter stack to check if the object exists as an "owned" reference
// in the temporary stack.
PyFrameObject *frame = PyEval_GetFrame();
Contributor

You can use _PyEval_GetFrame here to get the _PyInterpreterFrame of the current frame directly, without having to materialize a Python-level frame object; this would also be more performant.

Member Author

It looks like _PyEval_GetFrame isn't exposed in the public ABI, at least as of CPython commit 70b322d:

dlopen(/Users/goldbaum/Documents/numpy/build-install/usr/lib/python3.14/site-packages/numpy/_core/_multiarray_umath.cpython-314-darwin.so, 0x0002): symbol not found in flat namespace '__PyEval_GetFrame'

Member

@seberg left a comment

I'll approve, now that I understand why this is moved into the with, I think it is a meaningless enough change.

Was PY_VERSION_HEX enough to not use this in alternative implementations? (But we can fix it when it comes up.)


@ngoldbaum
Member Author

ngoldbaum commented Apr 22, 2025

Was PY_VERSION_HEX enough to not use this in alternative implementations? (But we can fix it when it comes up.)

It looks like elsewhere we also guard with && !defined(PYPY_VERSION). I'll go ahead and do that here, no reason to leave behind a time bomb for when (if?) pypy adds 3.14 support.

@ngoldbaum ngoldbaum merged commit 8df09df into numpy:main Apr 22, 2025
73 checks passed
@seberg
Member

seberg commented Nov 5, 2025

We were just debugging something in sunpy that ended up being due to this not existing in the old 2.2.6 release.

If this is easy (and I don't know that it is), we could yank 2.2.6 and make a 2.2.6.post1 release without the Python 3.14 wheels.

(The reason we found this is that there was a bad <2.3 numpy pin, but...)

@ngoldbaum
Member Author

Sorry, I'm confused what the problem is. Can you link to the sunpy discussion?

I'm not sure how disruptive it would be to yank wheels that have been up for a while and not replace them on 3.14. I could definitely see someone pinning that somewhere.

@seberg
Member

seberg commented Nov 5, 2025

Yeah, we would at least need a .post1 I think and that is painful too.

The discussion isn't online, sorry. But their Python 3.14 test suite is failing on one test (that we also have in our CI). And we narrowed it down that it must be because of their CI picking up NumPy 2.2.6.

@ngoldbaum
Member Author

Maybe raise it on the mailing list so others see?

@seberg
Member

seberg commented Nov 8, 2025

OK, this is a bit stranger than I thought. @Cadair just FYI, the NumPy release doesn't advertise support for Python 3.14, so I think we should ignore this. It seems your CI must have downloaded and compiled NumPy from scratch. There are no actual wheels for Python 3.14 and NumPy 2.2.x.

MaxGhenis added a commit to MaxGhenis/policyengine-core that referenced this pull request Dec 2, 2025
NumPy 2.1.x has a bug on Python 3.14 where "temporary elision" optimization
incorrectly identifies arrays as temporary due to the new LOAD_FAST_BORROW
bytecode instruction, causing destructive in-place modifications.

This caused dividend_income_tax to return £0 instead of £22bn in UK
microsimulations.

Fix: numpy/numpy#28748 (released in NumPy 2.3.0)
Issue: numpy/numpy#28681

Closes PolicyEngine#407

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
MaxGhenis added a commit to PolicyEngine/policyengine-core that referenced this pull request Dec 3, 2025
* Allow numpy 2.3+ for Python 3.14 compatibility

(same commit message as above)

* Add changelog entry for numpy Python 3.14 fix

---------

Co-authored-by: Claude <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

BUG: temporary elision heuristics are broken on Python 3.14

6 participants