Skip to content

squid: mgr: fix PyObject* refcounting in TTLCache and cleanup logic#66483

Open
NitzanMordhai wants to merge 5 commits intoceph:squidfrom
NitzanMordhai:wip-74057-squid
Open

squid: mgr: fix PyObject* refcounting in TTLCache and cleanup logic#66483
NitzanMordhai wants to merge 5 commits intoceph:squidfrom
NitzanMordhai:wip-74057-squid

Conversation

@NitzanMordhai
Copy link
Contributor

backport tracker: https://tracker.ceph.com/issues/74057


backport of #65245
parent tracker: https://tracker.ceph.com/issues/68989

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Copy link
Contributor

@anthonyeleven anthonyeleven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs lgtm

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a nit: the differences from cherry-picking need to be explicitly mentioned the commit description (Conflicts).

import enum
from collections import namedtuple
from collections import OrderedDict
from tempfile import NamedTemporaryFile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a difference from main which was including tempfile already.

import time
import enum
from collections import namedtuple
from collections import OrderedDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so the discrepancy between main and the backport has been tackled not by adding the Conflict section but rather by squeezing the diff. Fine!

@ljflores
Copy link
Member

ljflores commented Dec 8, 2025

@NitzanMordhai see #65245 (comment)

@NitzanMordhai
Copy link
Contributor Author

don't merge until #66571 merged

@batrick batrick marked this pull request as draft February 18, 2026 15:42
@batrick
Copy link
Member

batrick commented Feb 18, 2026

Converting to draft based on #66483 (comment)

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-project-automation github-project-automation bot moved this from New to Reviewer approved in Ceph-Dashboard Mar 5, 2026
This patch introduces several improvements to the Prometheus module:

 - Introduces `HealthHistory._prune()` to drop stale and inactive health checks.
  Limits the in-memory healthcheck dict to a configurable max_entries (default 1000).
  TTL for stale entries is configurable via `healthcheck_history_stale_ttl` (default 3600s).

 - Refactors HealthHistory.check() to use a unified iteration over known and current checks,
  improving concurrency and minimizing redundant updates.

 - Use cherrypy.tools.gzip instead of manual gzip.compress() for cleaner
  HTTP compression with proper header handling and client negotiation.

 - Introduces new module options:
    - `healthcheck_history_max_entries`

 - Add proper error handling for CherryPy engine startup failures
 - Remove os._exit monkey patch in favor of proper exception handling
 - Remove manual Content-Type header setting (CherryPy handles automatically)

Fixes: https://tracker.ceph.com/issues/68989
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit be28901)
Fix incorrect reference counting and memory retention behavior in TTLCache
when storing PyObject* values.
Previously, TTLCache::insert did not increment the reference count,
and `erase` / `clear` did not correctly decref the values, leading
to use-after-free or leaks depending on usage.

Changes:
- Move Py_INCREF from cacheable_get_python() to TTLCache::insert()
- Add `TTLCache::clear()` method for proper memory cleanup
- Ensure TTLCache::get() returns a new reference
- Fix misuse of std::move on c_str() in PyJSONFormatter

These changes prevent both memory leaks and use-after-free errors when
mgr modules use cached Python objects logic.

Fixes: https://tracker.ceph.com/issues/68989
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit 7fadf6a)
NitzanMordhai and others added 3 commits March 5, 2026 08:12
The HealthHistory.check() method acquires the lock and then calls
HealthHistory.save(), which also tries to acquire the same lock.
With a regular Lock(), the same thread blocks trying to re-acquire it (deadlock).
Switch to RLock to allow nested acquisition by the same thread.
PR ceph#65245 added the locks.

Fixes: https://tracker.ceph.com/issues/74148
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 26394ca)
PR ceph#65245 drop the header set for standby module,
we should still have it.

Fixes: https://tracker.ceph.com/issues/74149
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 2ef12b2)
Fixes: https://tracker.ceph.com/issues/74149
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit e3de8c9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Reviewer approved

Development

Successfully merging this pull request may close these issues.

6 participants