Detect and log objects being cleaned up by the Python cyclic garbage collector, and remove several more reference cycles#11499
Conversation
… collector. Includes COM pointers, and most NVDA-specific classes (AutoPropertyObject, _eventExecutor, WavePlayer, sayAll readers).
…dded by MozillaCompoundTextInfo, in order to break a reference cycle.
…hin an exception caluse where the Exception is exposed as a local variable within the clause.
…ce cycle when there is an exception when executing an event.
LeonarddeR
left a comment
There was a problem hiding this comment.
Looks good to me
I think there are additional objects that can benefit from being a subclass of TrackedObject.
- updateCheck.UpdateChecker
- updateCheck.UpdateDownloader
- Several GUI classes, including gui.SettingsPanel and gui.SettingsDialog
|
Hi, let’s put this in for potential Python 3.8 test blocker (later). Thanks.
|
feerrenrut
left a comment
There was a problem hiding this comment.
Looks good! Thanks for the detailed PR description! One minor thing, but happy for you to merge this when you are ready.
| _reportCountDuringCollection = 0 | ||
| elif action == "stop": | ||
| if _reportCountDuringCollection > 0: | ||
| log.error(f"Found at least {_reportCountDuringCollection} unreachable objects in run") |
There was a problem hiding this comment.
It might be nice if this message said that it marked the end of this garbage collection run. Note that this error means that two errors will be logged if there are any object deletions.
0c401fe
|
I'm getting lots of errors when arrowing through a content recognition result. |
|
Same here. Also I receive many errors as follows: |
|
I have found the reference cycles on the ContentRecog classes.
The ContentRecog NVDAObject stores a TextInfo instance as _selection on
itself. But, it also patches the TextInfo's updateCaret and
updateSelection methods to use self._selection. It is the fact these
lambdas reference self (the ContentRecog NVDAObject) and they are stored
on the TextInfo instance, thus a reference cycle.
I think I'd like to see a base TextInfo class that all ContentRecog
TextInfos should inherit from, which implements updateCaret /
updateSelection. In deed we already have
cursorManager._reviewCursorTextInfo which does just this.
|
|
As expected, there will be a number of issues detected by this code with lot of log entries. |
Link to issue number:
Summary of the issue:
The lifetime of objects in Python are managed by reference counting. When an object's reference count hits 0, it is deleted straight away. However, if an object directly or indirectly holds a reference to itself, then the reference count will never reach 0 and therefore the object would not normally be cleaned up. Python therefore has a Garbage Collector (gc) which periodically searches through all objects that can hold references, and forcefully deletes any object found to be unreachable. As this check is periodic, and can happen in any thread, objects that participate in reference cycles may get deleted at strange or unpredictable times. This is especially problematic for comtypes COM objects as when they destruct, they call their IUnknown::Release method, which may block or cause other issues as it may be called from the wrong thread.
We try not to cause reference cycles in NVDA, but they easily can happen by accident.
It would be good if we were able to detect reference cycles, or at very least, when an object is being deleted by the garbage collector, and clearly log this.
In future we may also wish to take complete control of garbage collection and perform it at predictable times in our main thread, but for now at least knowing when it is already going wrong would be a big step forward in debugging.
Description of how this pull request fixes the issue:
This pr adds a new garbageHandler module to NVDA which hooks into the Python garbage collector, such that it knows when a garbage collection run starts and stops. It provides a module-level function (notifyObjectDeletion) which an object can call from its
__del__method. If this function detects that the caller is currently inside a garbage collection run (one is in progress and it is in the same thread) then a warning is logged stating that this object is being deleted by the Garbage collector.Further to this, the very first time a deleted object is logged in this garbage collection run, an error is logged to clearly notify us that the garbage collector has found one or more important objects in this run. this error message also contains the full stack at this point, so that we can clearly see exactly where the garbage collector run happened, before anything bad happens (such as a freeze).
Of course the long-term goal is to not see any errors/warnings at all, as we will have removed all reference cycles.
the
__del__method on comtypes COM objects has been patched to call this function, thus we now can see in the log as soon as a COM object is about to be released due to the garbage collector.So if for instance a COM object happened to be released by the garbage collector which caused a freeze, we now have very clear logging to understand what happened.
garbageHandler also provides a new TrackedObject class, which also provides a
__del__that does this. And now all major NVDA classes inherit from TrackedObject. Thus we now can see in the log when one of these objects is deleted due to the garbage collector. Object's that inherit from TrackedObject include nvwave.WavePlayer, the sayAll reader classes, eventHandler._eventExecutor, and baseObject.AutoPropertyObject (which therefore provides it for appModules, synthDrivers, globalPlugins, NVDAObjects, treeInterceptors etc).Note that before Python 3.4 we could not have done this, as it used to be the case that any object with a
__del__would not be cleaned up by the garbage collector.Thanks to this functionality, several more reference cycles were identified, and also addressed in further commits on this pr.
They include:
except Exception as e), a reference cycle would be created on the frame holding that variable, thus all the frames (and local variables in each) to do with that exception were held until the garbage collector could forcefully delete them. This bug would have been new to Python 3, as Python 3 holds the traceback and frames on the Exception object. Python 2 does not do this. The bug was specifically in logHandler.getCodePath, where when fetching the given frame's local variables, they were being cached on the frame. It seems that a function Frame's f_locals member creates a new locals dictionary and caches it on itself each time this member is accessed. Therefore, we now clear this dictionary once we have fetched what we need.Testing performed:
Ran NVDA for extended periods performing daily tasks. Running with just the first commit, causes a lot of errors to be logged about deleting unreachable objects. After the rest of the commits, I am so far not seeing any errors at all. Though there very well may be more -- but the whole point of this pr is to keep detecting them and fixing them.
Known issues with pull request:
None.
Change log entry:
tbd.