Skip to content

log: save/fetch thread name infra#60058

Merged
yuriw merged 1 commit intoceph:mainfrom
mchangir:log-save-thread-name-in-log-entries
Oct 16, 2024
Merged

log: save/fetch thread name infra#60058
yuriw merged 1 commit intoceph:mainfrom
mchangir:log-save-thread-name-in-log-entries

Conversation

@mchangir
Copy link
Contributor

@mchangir mchangir commented Sep 30, 2024

pthread name is saved against the pthread ID in a class-wide static unordered_map as part of the Thread class

save/fetch methods are implemented to manage the thread names
thread names are never removed from the map

Fixes: https://tracker.ceph.com/issues/50743
Signed-off-by: Milind Changire mchangir@redhat.com

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@vshankar vshankar requested review from a team, badone, ljflores and rzarzynski September 30, 2024 12:46
@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 8d71897 to 2b08e2c Compare October 1, 2024 03:15
@mchangir
Copy link
Contributor Author

mchangir commented Oct 1, 2024

jenkins test make check

@mchangir
Copy link
Contributor Author

mchangir commented Oct 1, 2024

jenkins test api

@mchangir
Copy link
Contributor Author

mchangir commented Oct 1, 2024

jenkins test make check arm64

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global registry might be worrisome from the performance point of view. I'm not sure we really need it. Please see my comments.

int cpuid;
std::string thread_name;
static std::mutex threadid_name_mutex;
static std::unordered_map<pthread_t, std::string> threadid_name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this look like reinventing the thread local storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look like it ... but I'm trying to help Log::dump_recent() find the thread name after a thread ceases to exist.
I'm also worried that a pthread_t may be recycled by the time Log::dump_recent() eventually calls pthread_getname_np().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what's your intention. My point is recording in TLS would happen when the thread is still alive, not at the dump_recent() stage.

@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 2b08e2c to 00386c1 Compare October 2, 2024 09:38
@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch 2 times, most recently from eebb5a5 to 67d2532 Compare October 4, 2024 12:02
@mchangir
Copy link
Contributor Author

mchangir commented Oct 4, 2024

  • now storing global thread IDs in the thread_id --> thread_name lookup table
  • thread names are std::array<char, 16> in the lookup table instead of std::string

@mchangir
Copy link
Contributor Author

mchangir commented Oct 4, 2024

jenkins test api

@mchangir
Copy link
Contributor Author

mchangir commented Oct 4, 2024

jenkins test make check arm64

@badone badone requested a review from markhpc October 6, 2024 01:08
@badone
Copy link
Contributor

badone commented Oct 6, 2024

@markhpc so it is on your radar

@mchangir
Copy link
Contributor Author

mchangir commented Oct 6, 2024

jenkins test api

@mchangir
Copy link
Contributor Author

mchangir commented Oct 6, 2024

jenkins test make check arm64

@mchangir mchangir changed the title log: save/fetch/remove thread name infra log: save/fetch thread name infra Oct 6, 2024
@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 67d2532 to 87ac83c Compare October 6, 2024 12:02
@mchangir
Copy link
Contributor Author

mchangir commented Oct 6, 2024

  • thread names are never removed from the map

@vshankar
Copy link
Contributor

vshankar commented Oct 7, 2024

  • thread names are never removed from the map

Practically this map should not grow too huge to cause problems, but still I'm asking this for brevity - are we okay with this?

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually seems fine. Hopefully we could use C++'s thread_local for delegating the TLS interaction out.

* we do not allocate extra memory to store the 32bit global ID
* we just type cast the 32bit value to a (void*) and save it in the TLS
*/
pthread_once(&global_thread_id_key_once, global_thread_id_key_alloc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ has the thread_local that takes from you the responsibility for this low level stuff.

src/log/Log.cc Outdated
for (const auto& e : t) {
recent_pthread_ids.emplace(e.m_thread);
recent_pthread_ids.emplace(std::make_pair(e.m_thread,
Thread::fetch_thread_name(e.m_global_thread_id)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 87ac83c to 3e7bdfa Compare October 8, 2024 03:14
@mchangir
Copy link
Contributor Author

mchangir commented Oct 8, 2024

  • now using thread_local to save the unique thread ID

@mchangir
Copy link
Contributor Author

mchangir commented Oct 8, 2024

jenkins test api

@mchangir
Copy link
Contributor Author

mchangir commented Oct 8, 2024

jenkins test windows

@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 3e7bdfa to 8f7d3f6 Compare October 8, 2024 10:47
@mchangir
Copy link
Contributor Author

mchangir commented Oct 8, 2024

  • class static members have now been declared as static inline to do away with the class static member declaration and definition confusion

#endif

// STATIC
std::unordered_map<unsigned int, std::array<char, 16>> Thread::threadid_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unordered_map<unsigned int, std::array<char, 16>> Thread::threadid_name;
std::unordered_map<unsigned int, boost::container::small_vector<char, 16>> Thread::threadid_name;

src/log/Entry.h Outdated
time m_stamp;
pthread_t m_thread;
short m_prio, m_subsys;
unsigned int m_global_thread_id; // for Log::dump_recent()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be much simpler to:

  • save the thread's name in thread_local storage (once)
  • copy the 128 (16 byte) value in this Entry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, when the thread is gone during dump_recent(), where does the thread name gets fetched?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the Entry itself. It can be copied out into the recent_pthread_ids set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to add that since there is strong interest to get this fix into older releases, I would strongly recommend this simple solution. I suspect the performance and memory cost to be inconsequential. It can be explored afterwards whether optimizations are worthwhile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had indeed started this PR with copying the thread name into the Entry object ... but was discouraged by Radek to do so due to memory and time consumption in the hot path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it wasn't just copying of 16 bytes; it was a call to the lock-protected Thread::fetch_thread_name() (#60058 (comment)) which spurs the concern.

IIUC Patrick is arguing for the simpler approach where the thread name is copied from TLS by the Entry constructor:

Maybe instead of this complex, lock-protected container every Thread could cache its name in a thread_local variable from which the Entry constructor would simply copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then we wouldn't require any thread local storage at all.
We could simply call ceph_pthread_getname(pthread_self(), ...) in the Entry constructor and save it for future reference.
Does this sound reasonable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I understand:

  • Save the thread name (once) in TLS during thread creation (one call to pthread_getname_np(...))
  • In the ctor for Entry, fetch the thread name from TLS (no locking required)
  • copy the thread name in the Entry object

So, repeated calls to pthread_getname_np(...) is avoided and the overhead is the 16byte copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @mchangir a bit and it's boiling down to deciding if we want to store the thread name in TLS vs calling pthread_getname_np() when creating Entry object. As @rzarzynski mentioned, the pthread call might have a more optimized implementation. So, I'm really on the fence hence to choose between the two.

#endif

// Thread Local Storage
thread_local unsigned int global_thread_id = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ allows for thread_local static class member while the ID (or cached thread name) conceptually belongs to Thread.

The global in the name is bit misleading IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm

@vshankar
Copy link
Contributor

jenkins test api

@vshankar
Copy link
Contributor

jenkins test windows

@mchangir
Copy link
Contributor Author

jenkins test api

@mchangir
Copy link
Contributor Author

jenkins test windows

@mchangir
Copy link
Contributor Author

jenkins test make check arm64

@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 30d4875 to f385c43 Compare October 14, 2024 10:26
@mchangir
Copy link
Contributor Author

  • rebased
  • PR update with revised thread name fetch procedure

@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from f385c43 to 1837811 Compare October 14, 2024 11:16
@mchangir
Copy link
Contributor Author

jenkins test api

@mchangir
Copy link
Contributor Author

jenkins test make check arm64

* pthread name is saved in a thread_local storage
* the thread_local name is copied into Entry object's ctor
* Log::dump_recent() reads the thread name from the Entry
  object's data member when dumping logs

Fixes: https://tracker.ceph.com/issues/50743
Signed-off-by: Milind Changire <mchangir@redhat.com>
@mchangir mchangir force-pushed the log-save-thread-name-in-log-entries branch from 1837811 to 0be8d01 Compare October 15, 2024 11:57
@mchangir
Copy link
Contributor Author

  • minor change to set m_thread_name[15] = '\0'; in class Entry ctor

@mchangir
Copy link
Contributor Author

jenkins test api

@mchangir
Copy link
Contributor Author

jenkins test make check

@ljflores
Copy link
Member

@yuriw yuriw merged commit 6e72881 into ceph:main Oct 16, 2024
@batrick
Copy link
Member

batrick commented Oct 24, 2024

int detach();
int set_affinity(int cpuid);
static const std::string get_thread_name() {
return Thread::thread_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many threads in Ceph (and the MDS) which do not use this class Thread. So the "name" will wrongly be empty every time this is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to make those threads use Thread class then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea behind helpers like make_named_thread() was the opposite -- slowly move off of Thread and just use std::thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants