Issue535905
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2002-03-27 20:47 by gvanrossum, last changed 2022-04-10 16:05 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| tup.py | tim.peters, 2002-03-27 21:04 | Test case to provoke the crash | ||
| trashcan_gc.diff | nascheme, 2002-03-27 22:18 | |||
| lis.py | gvanrossum, 2002-03-28 01:17 | |||
| trash-2.1.2.diff | gvanrossum, 2002-03-28 02:15 | Patch for 2.1.2 | ||
| ceval.c.diff | gvanrossum, 2002-03-28 18:21 | Separate fix for SETLOCAL() race in ceval.c | ||
| Messages (17) | |||
|---|---|---|---|
| msg10060 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-27 20:47 | |
This bug is only a hypothetical bug -- from source code inspection it appears clear that it can happen, and we have a Zope user who is reporting crashes that are consistent with this, but it isn't confirmed that this is indeed the cause. Tim is attempting to reproduce it while I'm typing this. The trashcan code stores NULL or other values in the ob_type field of an object, while that object is still in the GC doubly-linked list of container objects. When the collector is invoked, it dereferences ob_type (specifically, in subtract_refs, it retrieves ob_type- >tp_traverse and calls it). Obviously, the value stored in ob_type by the trashcan code causes this to fail. This bug occurs in all versions of Python that have the GC code enabled (i.e. 2.0 and above). In 2.1 and before, the code looks different than in 2.2, but the effect is the same: the trashcan code abuses ob_type of an object that's in a GC generational chain. If we fix this, a 2.1.3 release including the fix should be issued, and we should hold up the final release of 2.2.1 to include a fix as well. |
|||
| msg10061 - (view) | Author: Tim Peters (tim.peters) * ![]() |
Date: 2002-03-27 21:04 | |
Logged In: YES user_id=31435 Attaching a brief program (tup.py) that provokes a crash quickly, at least on Windows. The crash goes away if the trashcan limit is boosted above the depth of tuple nesting this program creates. |
|||
| msg10062 - (view) | Author: Neil Schemenauer (nascheme) * ![]() |
Date: 2002-03-27 21:55 | |
Logged In: YES user_id=35752 The fix should be simple. Coming soon. :-) |
|||
| msg10063 - (view) | Author: Neil Schemenauer (nascheme) * ![]() |
Date: 2002-03-27 22:18 | |
Logged In: YES user_id=35752 Well, not as simple as I had hoped. Attached is a rough patch that fixes the bug (I think, I haven't had time to really study the interaction between GC and the trashcan macros). |
|||
| msg10064 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 01:17 | |
Logged In: YES user_id=6380 Alas, that fixes it for tuples, but not for lists. I've uploaded the list example as lis.py. We also need a fix for 2.1.2 (to help the Zope user who originally ran into this) and one for 2.2.1 -- the latter is needed quickly if we want to make the final release. |
|||
| msg10065 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 02:15 | |
Logged In: YES
user_id=6380
Here's a patch for the 2.1 maintenance branch. AFAICT this
fixes the problem for lists, tuples and dicts, which is a
100% fix (the other trashcan objects, frames and tracebacks,
don't use GC).
Note: the dict test case is just like the tuple or list
testcase, but has
t = {1: t, 2: Ouch()}
as the last line of f().
|
|||
| msg10066 - (view) | Author: Matthew T. Kromer (zerohero) | Date: 2002-03-28 14:10 | |
Logged In: YES user_id=378059 Leo in Brazil (the Zope customer experiencing the problem) has bumped the trashcan depth limit up to 5 million, and been stable for the entire overnight and morning production periods. He confirms the description and I believe he will be trying the patch today. |
|||
| msg10067 - (view) | Author: Christian Tismer (old) (tismer) | Date: 2002-03-28 14:20 | |
Logged In: YES user_id=105700 Ouchh! I will look into this and find an alternative implementation. My current abuse of the type field interacts with that fact that GC still needs it. We just moved into a new house, I will not be online often over Easter. Will sove it in the coming week. ciao - chris |
|||
| msg10068 - (view) | Author: Christian Tismer (old) (tismer) | Date: 2002-03-28 14:20 | |
Logged In: YES user_id=105700 Ouchh! I will look into this and find an alternative implementation. My current abuse of the type field interacts with that fact that GC still needs it. We just moved into a new house, I will not be online often over Easter. Will sove it in the coming week. ciao - chris |
|||
| msg10069 - (view) | Author: Neil Schemenauer (nascheme) * ![]() |
Date: 2002-03-28 15:58 | |
Logged In: YES user_id=35752 I'm not sure the abuse of the ob_type pointer is the only problem we are facing here. I've have a patch that changes trashcan so that it doesn't overwrite the ob_type pointer. I can still make the interpreter crash. I'm still trying to understand this problem completely. :-( |
|||
| msg10070 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 16:28 | |
Logged In: YES user_id=6380 Well, it *also* abuses the ob_refcnt field. How about this wild idea (which Tim & I had simultaneously yesterday): change the trashcan code to simply leave the object in the GC generational list, and let the GC code special-case objects with zero refcnt so that they are eventually properly disposed? |
|||
| msg10071 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 18:21 | |
Logged In: YES user_id=6380 Looks like there are at least two independent bugs exposed by the lis.py example in the CVS trunk/head: on Windows, traversing the frame object can encounter freed object because of the way the SETLOCAL() macro first decrefs fastlocals[i] and then assigns it a new value. I've got a fix for this (separate, see ceval.c.diff attached). On Linux, the traceback points to the brand new semaphore code; haven't researched this further yet. |
|||
| msg10072 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 19:32 | |
Logged In: YES user_id=6380 The new semaphore code is innocent -- it wasn't even used in this case. I cannot reproduce the bug with Neil's patch + my ceval.c patch any more, so I consider this case closed. I'll check the necessary things in to the 2.1 and 2.2 maintenance branches. Neil, can you check in your patch to the trunk? |
|||
| msg10073 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2002-03-28 20:44 | |
Logged In: YES user_id=6380 Closing this; fixes have been checked in. Maybe someone (Tim?) can check in the test case? Only the trunk seems sufficient. Christian: no need for a new trashcan IMO. |
|||
| msg10074 - (view) | Author: Tim Peters (tim.peters) * ![]() |
Date: 2002-03-28 22:10 | |
Logged In: YES user_id=31435 Just noting that I added a test case to test_gc.py in the trunk. |
|||
| msg10075 - (view) | Author: Neil Schemenauer (nascheme) * ![]() |
Date: 2002-03-28 22:58 | |
Logged In: YES user_id=35752 I'm working on the trashcan code fix right now (use gc_prev instead of ob_type). I was going to check it in but an interpreter built --without-cycle-gc and with Py_DEBUG defined crashes. I don't know if our recent changes broke things of if --without-cycle-gc has been broken for a while. |
|||
| msg10076 - (view) | Author: Tim Peters (tim.peters) * ![]() |
Date: 2002-03-28 23:28 | |
Logged In: YES user_id=31435 Sorry, Neil, no clue -- PythonLabs likely builds every day in release and debug (Py_DEBUG) modes, each using all platform defaults, but I don't believe any of us ever builds in any other way. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-10 16:05:10 | admin | set | github: 36341 |
| 2002-03-27 20:47:53 | gvanrossum | create | |
