Fixup a refnanny-related performance regression by da-woods · Pull Request #5685 · cython/cython

da-woods · 2023-09-06T21:12:34Z

Originally reported in scikit-learn/scikit-learn#27086

Essentially:

I'd made functions raising an exception require refnanny even since prange/parallel block exception handling required it
I actually don't think this is sufficient and that there's other ways of raising an exception within a parallel block.
The upshot is that any function with a call to a function with a checked exception now requires refnanny.
For functions with object arguments (i.e. any cdef class method) that leads to the GIL always being acquired around the refnanny setup (even when refnanny is disabled).

I've fixed it by avoiding using refnanny in parallel blocks unless we know it's available. I think it's too hard to tell reliably ahead of time (at least for me).

I think the have_object_args and has_with_gil_block criteria for acquiring the GIL around function definitions might be excessive, and possibly unnecessary some of the time. It's possibly mainly reassigned object args. This might be worth refactoring but I don't think it's necessary to fix this particular performance regression.

Closes #5700

See scikit-learn/scikit-learn#27086 Essentially: * I'd made functions raising an exception require refnanny even since prange/parallel block exception handling required it * I actually don't think this is sufficient and that there's other ways of raising an exception within a parallel block. * The upshot is that *any* function with a call to a function with a checked exception now requires refnanny. * For functions with object arguments (i.e. any cdef class method) that leads to the GIL always being acquired around the refnanny setup (even when refnanny is disabled). I've fixed it by avoiding using refnanny in parallel blocks unless we know it's available. I think it's too hard to tell reliably ahead of time (at least for me). I think the `have_object_args and has_with_gil_block` criteria for acquiring the GIL around function definitions might be excessive, and possibly unnecessary some of the time. It's possibly mainly reassigned object args. This might be worth refactoring but I don't think it's necessary to fix this particular performance regression.

scoder · 2023-09-07T16:28:26Z

I haven't seen the resulting code, but if this fixes the problem at hand, it's probably ok for the time being.

If the issue is acquiring the GIL uselessly at call time just for the (unused) refnanny, then

__Pyx_RefNannySetupContext() can acquire the GIL itself at need, it just needs 1 as second argument
we might be able to generate the refnanny setup and the entry acquire-GIL/release-GIL dance in retrospect after generating the whole function, when we know if we actually needed it along the way or not.

da-woods · 2023-09-08T07:08:43Z

we might be able to generate the refnanny setup and the entry acquire-GIL/release-GIL dance in retrospect after generating the whole function, when we know if we actually needed it along the way or not.

Yes - that's probably the way to do it for both of them. Unless there's another release imminent I'll try to do that rather than merging this right now.

__Pyx_RefNannySetupContext() can acquire the GIL itself at need, it just needs 1 as second argument

Yes - and that'd presumably be a non-op without the refnanny so be reasonably OK. I think the issue was a combination of refnanny + entry-require/release code that wasn't actually needed.

djhoese · 2023-09-08T15:39:48Z

Sorry for commenting on this pull request, but I was working on a MCVE so I could file a bug report for what I think this PR is trying to fix. I don't know enough about the ref nanny, but it sounds like if I have a class method, then even if it is nogil the ref nanny will still need to be created and will usually (?) need the GIL to be created?

I have a decently simple method that accepts a ton of memory views and doesn't return anything (void), but it does call other nogil methods of the class. With just this basic description, does this sound like a place where the ref nanny is needed and should acquire the GIL. The generated C has this at the top of the declaration:

static void __pyx_fuse_1__pyx_f_12geotiepoints_19_modis_interpolator_17MODISInterpolator__get_atrack_ascan(struct __pyx_obj_12geotiepoints_19_modis_interpolator_MODISInterpolator *__pyx_v_self, __Pyx_memviewslice __pyx_v_satz, __Pyx_memviewslice __pyx_v_x, __Pyx_memviewslice __pyx_v_y, __Pyx_memviewslice __pyx_v_c_exp_coarse, __Pyx_memviewslice __pyx_v_c_ali_coarse, __Pyx_memviewslice __pyx_v_c_exp_fine, __Pyx_memviewslice __pyx_v_c_ali_fine, __Pyx_memviewslice __pyx_v_a_track, __Pyx_memviewslice __pyx_v_a_scan) {
  __Pyx_memviewslice __pyx_v_satz_a_view = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_satz_b_view = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_c_exp_view2 = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_c_ali_view2 = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice __pyx_t_1 = { 0, 0, { 0 }, { 0 }, { 0 } };
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  #ifdef WITH_THREAD
  PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_RefNannySetupContext("__pyx_fuse_1_get_atrack_ascan", 0);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif

If this GIL acquire is expected, then if I extracted this method to be outside of the class would you expect the GIL acquire and ref nanny to be unnecessary?

da-woods · 2023-09-08T16:29:50Z

@djhoese that looks like exactly the same bug and should be fixed by this PR. Thanks for your efforts towards preparing a good bug report though, even if not needed

djhoese · 2023-09-08T16:42:39Z

Ok so this is a bug. Good because I'm having trouble recreating it in a simple example.

djhoese · 2023-09-08T16:56:19Z

FYI I tried installing this PR and running my code through it and the Ensure/Release around the __Pyx_RefNannySetupContext is still there.

da-woods · 2023-09-08T17:28:23Z

FYI I tried installing this PR and running my code through it and the Ensure/Release around the __Pyx_RefNannySetupContext is still there.

I'm going to have a bit of a rewrite of this PR with a slightly more intelligent approach. So possibly wait for that before spending too much time filing a separate bug.

djhoese · 2023-09-08T17:32:24Z

Yeah I've been walking through the Node.py code and figuring out what the difference is between my test code that doesn't reproduce the issue and the realworld code that does. Currently getting lost in if self is considered a python argument and if that is different in my test code.

scoder · 2023-09-09T10:26:44Z

We have neither a ticket nor a reproducing test, though. I think both would help.

djhoese · 2023-09-09T15:20:34Z

Ok I tracked it down quite a bit. Here's the example:

# cython: language_level=3, boundscheck=False, cdivision=True, wraparound=False, initializedcheck=False, nonecheck=False
cimport cython
import numpy as np
cimport numpy as np

np.import_array()

def test_func():
    cdef np.ndarray[float, ndim=2] arr = np.zeros((5, 5), dtype=np.float32)
    cdef float[:, ::1] arr_view = arr
    t = Test(5.0)
    t.call_me(arr_view)


cdef class Test:

    cdef float _a

    def __cinit__(self, float a):
        self._a = a

    cdef void call_me(self, float[:, ::1] my_arr) noexcept:
        with nogil:
            self._call_me(my_arr)

    cdef void _call_me(self, float[:, ::1] my_arr) noexcept nogil:
        cdef Py_ssize_t idx
        cdef float[:, :] my_arr2 = _get_upper_left_corner(my_arr)
        for idx in range(my_arr2.shape[0]):
            my_arr2[idx, 0] = self._a


cdef inline float[:, :] _get_upper_left_corner(float[:, ::1] arr) noexcept nogil:
    return arr[:3, :3]

This is a version from my other recent issues like #5670 and the original SO question here: https://stackoverflow.com/questions/76972950/cython-returned-memoryview-is-always-considered-uninitialized/76974615

The result of those discussions was that declaring a C function that returns a memory view as noexcept was pointless because there would always be some exception handling on the error case of slicing/clearing/initializing the memoryviews. However, in the above example if you remove noexcept from _get_upper_left_corner then the function is marked as having exception handling so it hits this block:

cython/Cython/Compiler/ExprNodes.py

Lines 6345 to 6349 in dd62cba

    
           if (func_type.exception_value is not None or 
        
                   func_type.exception_check and func_type.exception_check != '+'): 
        
               if env.nogil: 
        
                   # need to generate refnanny for testing because of exception check 
        
                   env.has_with_gil_block = True

Setting has_with_gil_block which makes acquire_gil_for_var_decls_only True here:

https://github.com/da-woods/cython/blob/81db672ab7ab6510ef6f761f1ce481d731b8b583/Cython/Compiler/Nodes.py#L2066-L2068

I'm still not entirely sure why this generates the C code exactly as it does (GIL ensure/release only wrapping a refnanny), but my guess is that it is ensuring the GIL for variable declarations but I don't have any variables that need it here. It thinks that with the with gil (that doesn't actually exist) and the self argument as a Python object that it needs to declare variables with the GIL...I think. About here in the code:

https://github.com/da-woods/cython/blob/81db672ab7ab6510ef6f761f1ce481d731b8b583/Cython/Compiler/Nodes.py#L2066-L2079

Then it does the refnanny because it is a nogil function and thinks it has a with gil block internally. Then when it is done "declaring variables" and the ref nanny it releases the GIL. There just weren't actually any variables that needed to be declared. At least that's my guess.

scoder · 2023-09-20T09:52:37Z

Cython/Compiler/Code.py

+        if type.refcounting_tracked_by_refnanny and self.funcstate:
+            self.funcstate.needs_refnanny = True


Wow, this code multiplication is awful. Why not at least move this into PyObjectType? In the methods there, you could just always say code.funcstate.needs_refnanny = nanny/True, without further conditions.

That said, if these refcounting code bits are really the only point where we can take this decision, then the granularity is ok. But the question remains if we need to decide it here, at the very bottom of the code generation. Isn't it enough to check the types of named variables and temp variables after generating the function? Or, if we have local (parallel?) temp scoping for some reason, let the code funcstate remember whether it created any Python object temps along the way, and check for that flag.

You probably put a lot of thoughts and effort into this, so, could you explain why keeping the decision in the function generation code (rather than the low-level refcounting code) can't work?

Wow, this code multiplication is awful. Why not at least move this into PyObjectType? In the methods there, you could just always say code.funcstate.needs_refnanny = nanny/True, without further conditions.

That would be better - I'd missed that as an option. I'll do that

But the question remains if we need to decide it here, at the very bottom of the code generation. Isn't it enough to check the types of named variables and temp variables after generating the function? Or, if we have local (parallel?) temp scoping for some reason, let the code funcstate remember whether it created any Python object temps along the way, and check for that flag.

That's what we did before, and it mostly works. I think we'd mainly need to add a flag that things creating local Python variables can set. Where our current code fails a little is for Python object arguments that aren't reassigned (e.g. self quite frequently) which don't need reference counting, but get considered like they do. The nice thing about just tracking all decref/gotref/incref/etc is that it should give the right answer.

scoder · 2023-09-28T07:01:24Z

Cython/Compiler/PyrexTypes.py

+        if code.funcstate:
+            code.funcstate.needs_refnanny = code.funcstate.needs_refnanny or nanny


Suggested change

if code.funcstate:

code.funcstate.needs_refnanny = code.funcstate.needs_refnanny or nanny

if code.funcstate and nanny:

code.funcstate.needs_refnanny = True

funcstate looks to be unset in the module deallocation functions. nanny is set to false though, so testing it first is enough to not need the funcstate check I think.

scoder · 2023-09-28T07:06:04Z

Cython/Compiler/PyrexTypes.py


    def generate_incref(self, code, cname, nanny):
        if nanny:
+            if code.funcstate:


In which case do we not have a funcstate? Maybe we should get rid of that case instead?
I can't think of a place where we generated increfs outside of user/wrapper functions or the module init function.

scoder · 2023-09-28T07:36:36Z

tests/run/gil_in_var_initialization_tests.pyx

+        // Note - I'm using a 5 second timeout here to avoid the test
+        // being "deadlock forever". This could occasionally give false
+        // failures if things are running really slowly


Yeah, that would be annoying. I dislike uncontrolled concurrency in tests for pretty much this reason. Threads are one of those tools that programmers easily consider appropriate and that rarely are. Especially if test failures necessarily make the code run into timeouts, rather than failing timely.

Some people, when confronted with a problem, say "I know, I'll use threads". And now problems two have they. :)

Can't we use a C code validation check that errors if it finds any refnanny usage before the module init function, or something like that? That seems doable with a regex. (Which, admittedly, is the other tool which gives you two problems from one for free…)

We could also extend the C code validation to allow start and end markers, to match a specific function or section. sed uses /startregex/,/endregex/ command, for example, where start and end are optional. For our use case, we could use something like /startregex/,/endregex/ innerregex, and fail if the inner regex is (not) found, as well as if start or end are not found (because we always expect those to be present either way).

Improved C code validation is implemented in 09bb9e1

Pattern format is /start/:/end/pattern, or something like /start/ : /end/ pattern if you prefer adding whitespace. Both start and end are optional.

I'll add a C code validation test (but not today...).

I'd like to leave this test in though - I think it is the "right" test and making sure the actual problem doesn't recur.. I think the only thing that is non-deterministic is the slight chance it takes more than 5 seconds. But realistically that's a huge overestimate of the amount of time it needs

I've added some C code tests. I think we should skip the more complicated patterns on annotated HTML though - they seem almost impossible to make work there

Cython/Compiler/Nodes.py

Co-authored-by: scoder <stefan_ml@behnel.de>

scoder · 2023-09-29T10:12:41Z

Thanks for fixing this. It's sad to have performance regressions in Cython 3.0 compared to 0.29.x that keep people from adopting it.

scoder · 2023-10-01T10:03:19Z

There is a bug in the logic for closure functions. We generate the "ensure GIL" part (due to var_decls_definitely_need_gil), but not the "release GIL" part. You can see the unused variable '__pyx_gilstate_save' messages in the test logs. We incref the closure reference, so it's actually needed, but it's never cleaned up (unless lenv.nogil is set, which shouldn't be the case for closure functions).

scoder · 2023-10-02T00:54:27Z

There is a bug in the logic for closure functions.

Resolved in #5738

djhoese mentioned this pull request Sep 8, 2023

[BUG] "__pyx_memoryview_slice_memviewslice" always acquires GIL #5688

Closed

djhoese mentioned this pull request Sep 8, 2023

Upgrade to Cython 3+ in building pytroll/python-geotiepoints#57

Merged

5 tasks

da-woods added 3 commits September 10, 2023 11:21

Generate refnanny code on demand

398b7b2

Added tests

641b468

Code-style

f442d22

matusvalo added performance Code Generation labels Sep 10, 2023

matusvalo added this to the 3.0.3 milestone Sep 10, 2023

da-woods mentioned this pull request Sep 13, 2023

[BUG] Gil required but seems unnecessary #5700

Closed

scoder reviewed Sep 20, 2023

View reviewed changes

da-woods mentioned this pull request Sep 20, 2023

PERF Regression in HistGradientBoostingClassifier with cython 3 scikit-learn/scikit-learn#27426

Closed

Move needs_refnanny logic into PyObjectType

3518acb

rhshadrach mentioned this pull request Sep 22, 2023

DEP: Use Cython 3.0 pandas-dev/pandas#55179

Merged

5 tasks

scoder reviewed Sep 28, 2023

View reviewed changes

da-woods and others added 5 commits September 28, 2023 21:35

Update Cython/Compiler/Nodes.py

231f0b9

Co-authored-by: scoder <stefan_ml@behnel.de>

I think we can get away without code funcstate

2e39b69

Add C code matching tests and fix matching

9126dca

Merge remote-tracking branch 'real_origin/master' into refnanny-nogil

fff1169

Fix tests

48cb192

scoder approved these changes Sep 29, 2023

View reviewed changes

scoder merged commit b622247 into cython:master Sep 29, 2023

da-woods deleted the refnanny-nogil branch September 29, 2023 10:20

		if type.refcounting_tracked_by_refnanny and self.funcstate:
		self.funcstate.needs_refnanny = True

		if code.funcstate:
		code.funcstate.needs_refnanny = code.funcstate.needs_refnanny or nanny

Uh oh!

Conversation

da-woods commented Sep 6, 2023 • edited by scoder Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

scoder commented Sep 7, 2023

Uh oh!

da-woods commented Sep 8, 2023

Uh oh!

djhoese commented Sep 8, 2023

Uh oh!

da-woods commented Sep 8, 2023

Uh oh!

djhoese commented Sep 8, 2023

Uh oh!

djhoese commented Sep 8, 2023

Uh oh!

da-woods commented Sep 8, 2023

Uh oh!

djhoese commented Sep 8, 2023

Uh oh!

scoder commented Sep 9, 2023

Uh oh!

djhoese commented Sep 9, 2023

Uh oh!

scoder Sep 20, 2023

Choose a reason for hiding this comment

Uh oh!

da-woods Sep 20, 2023

Choose a reason for hiding this comment

Uh oh!

scoder Sep 28, 2023

Choose a reason for hiding this comment

Uh oh!

da-woods Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scoder Sep 28, 2023

Choose a reason for hiding this comment

Uh oh!

scoder Sep 28, 2023

Choose a reason for hiding this comment

Uh oh!

scoder Sep 28, 2023

Choose a reason for hiding this comment

Uh oh!

da-woods Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

da-woods Sep 29, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

scoder commented Sep 29, 2023

Uh oh!

scoder commented Oct 1, 2023

Uh oh!

scoder commented Oct 2, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

da-woods commented Sep 6, 2023 •

edited by scoder

Loading

da-woods Sep 28, 2023 •

edited

Loading

da-woods Sep 28, 2023 •

edited

Loading