Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up Modules/gcmodule.c #113688

Closed
colesbury opened this issue Jan 3, 2024 · 10 comments
Closed

Split up Modules/gcmodule.c #113688

colesbury opened this issue Jan 3, 2024 · 10 comments
Labels
3.13 new features, bugs and security fixes topic-free-threading type-feature A feature request or enhancement

Comments

@colesbury
Copy link
Contributor

colesbury commented Jan 3, 2024

Feature or enhancement

The free-threaded builds require substantial changes to the garbage collection implementation, but we don't want to make those changes to the default build. In other parts of CPython, we have used "local" #ifdef Py_GIL_DISABLED guards around free-threaded specific bits, but I think doing this throughout Modules/gcmodule.c will make the GC harder to maintain than having separate files for the free-threaded and default builds. Additionally, I think @markshannon already suggested splitting the core GC parts from the Python "gc" module.

There's definitely an increased maintenance cost for having two GC implementations in separate files, but I think this maintenance will be less than having a single file with intermixed implementations and lots of #ifdef Py_GIL_DISABLED guards.

Here's a proposed split:

Modules/gcmodule.c - gc Python package interface

Python/gc.c - core GC functionality common to both build configurations (GIL & free-threaded)
Python/gc_gil.c - GIL (non-free-threaded) specific implementations
Python/gc_free_threaded.c - free-threaded specific implementations

At first, I'd expect most code to be in either Modules/gcmodule.c or Python/gc.c. As work on the free-threaded build continues, I would expect Python/gc.c to shrink as code is moved into Python/gc_gil.c and Python/gc_free_threaded.c.

@nascheme @markshannon @pablogsal @DinoV @vstinner

Linked PRs

@colesbury colesbury added type-feature A feature request or enhancement 3.13 new features, bugs and security fixes topic-free-threading labels Jan 3, 2024
@nascheme
Copy link
Member

nascheme commented Jan 3, 2024

A single file with both versions of the GC sounds quite confusing so I favor splitting it. Your scheme looks okay to me. Possibly we need an internal header file that's shared between the .c files. The original gcmodule.c was written with the assumption that small functions could be inlined by the compiler as it wishes. With split files, an LTO build probably results in the same inlining. Otherwise, we might need some inline functions in the .h file. As with any optimization, it's probably best to do that only as required for payoff.

@pablogsal
Copy link
Member

I’m fine with the splitting as well as I really don’t think we should have both versions in the same file so let’s go ahead unless someone has some strong reservations.

Refarding the performance concer a, we do have benchmarks that are checking the performance of GC runs in the pyperformance suite so we can always monitor if changes we made are significant. I am mentioning this so we are not too conservative with the changes and being worried that they are going to affect performance a lot because at the end of the day is very difficult to actually predict how these kind of alterations are going to affect performance.

So my proposal is that we do optimise for maintainability and then check performance and adapt from there.

@gpshead
Copy link
Member

gpshead commented Jan 3, 2024

Agreed, I think this split makes sense. What y'all have already said above.

@corona10
Copy link
Member

corona10 commented Jan 4, 2024

I agree, too. Other runtime parts are still possible to maintain implementation details except caring whether GIL exists or not. (or with a small cost)

But GC is wholly different. GC should care where objects are located. For example, some objects existed in the interpreter area only, but now we should move them into the thread area. Those kinds of things require separation, and this suggestion will improve maintainability.
And if we introduce more GC implementation details(STW, deferred refcount) in a single file, I can not imagine what will happen :)

@pablogsal
Copy link
Member

pablogsal commented Jan 4, 2024

But GC is wholly different. GC should care where objects are locate

Technically, the current one doesn't care, and that was an advantage. The new one does, which is a different advantage ;)

@colesbury
Copy link
Contributor Author

Great - it seems like there's consensus on splitting the files.

@nascheme, I was thinking that maybe Python/gc.c could #include one of Python/gc_gil.c or Python/gc_free_threaded.c. Something like:

// in Python/gc.c
#ifdef Py_GIL_DISABLED
#  include "gc_free_threaded.c"
#else
#  include "gc_gil.c"
#endif

We already do this in a few places, with clinic generated files, in Modules/socketmodule.c, obmalloc with mimalloc files, and a few other modules.

It's a bit icky, but it avoids both the concerns around inlining and complexity with configuring the build system to include one or the other file since that's handled at the macro pre-processor level.

@nascheme
Copy link
Member

nascheme commented Jan 4, 2024

I'd prefer not to do that include of .c files. It might make debugging harder, for example. I'd include both files in the build always and then put ifdefs inside of them to disable code if needed. That avoids needing extra logic when generating the makefile or whatever.

@gpshead
Copy link
Member

gpshead commented Jan 4, 2024

This is something we should be able to adjust the how of based on practical experiences if the initial approach winds up causing any issues. Debuggers know which file any given line of code came from, I'm not worried at all about that for #includes.

@colesbury
Copy link
Contributor Author

@nascheme - that sounds fine. I'll follow that approach.

colesbury added a commit to colesbury/cpython that referenced this issue Jan 4, 2024
This splits part of Modules/gcmodule.c of into Python/gc.c, which
now contains the core garbage collection implementation. The Python
module remain in the Modules/gcmodule.c file.
colesbury added a commit to colesbury/cpython that referenced this issue Jan 4, 2024
This splits part of Modules/gcmodule.c of into Python/gc.c, which
now contains the core garbage collection implementation. The Python
module remain in the Modules/gcmodule.c file.
colesbury added a commit to colesbury/cpython that referenced this issue Jan 5, 2024
nascheme pushed a commit that referenced this issue Jan 5, 2024
This splits part of Modules/gcmodule.c of into Python/gc.c, which
now contains the core garbage collection implementation. The Python
module remain in the Modules/gcmodule.c file.
kulikjak added a commit to kulikjak/cpython that referenced this issue Jan 8, 2024
@kulikjak
Copy link
Contributor

kulikjak commented Jan 8, 2024

This change broke --with-dtrace build on Solaris (or other systems where object files need to be modified with dtrace before linking) with error like this:

Undefined			first referenced
 symbol  			    in file
__dtraceenabled_python___gc__start  Python/gc.o
__dtraceenabled_python___gc__done   Python/gc.o
__dtrace_python___gc__start         Python/gc.o
__dtrace_python___gc__done          Python/gc.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

#113814 fixes the issue.

gpshead pushed a commit that referenced this issue Jan 8, 2024
(the gcmodule -> gc refactoring broke it)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 new features, bugs and security fixes topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants