[Dashboard] Getting stacktrace from the dashboard triggers "failed to get os threadid" when --native option is used #30566

@rkooo567

What happened + What you expected to happen

https://console.anyscale.com/o/anyscale-internal/projects/prj_FKRmeV5pA6X72aVscFALNC32/clusters/ses_GrgDrjtJWSw9A6wM9sbSwfwe?command-history-section=command_history

When I clicked the stack trace link for the JobSupervisorActor in the dashboard, I saw the following error:

Failed to execute `sudo -n $(which py-spy) dump -p 830 --native`.

Note that this command requires `py-spy` to be installed with root permissions. You
can install `py-spy` and give it root permissions as follows:
  $ pip install py-spy
  $ sudo chown root:root `which py-spy`
  $ sudo chmod u+s `which py-spy`

Alternatively, you can start Ray with passwordless sudo / root permissions.

=== stdout ===
Process 830: ray::JobSupervisor
Python v3.7.7 (/home/ray/anaconda3/bin/python3.7)



=== stderr ===
Error: failed to get os threadid

Running the same command directly from a terminal fails with the same error.

Searching turned up only one related issue, benfred/py-spy#490, but it seems irrelevant to our case (it says the command should work when it runs inside a container).

We need to either figure out the root cause or stop passing --native by default (alternatively, we can let users turn the flag on or off).
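One way the "make --native optional" idea could look is sketched below. This is only an illustration, not the dashboard's actual code: `build_dump_cmd`, `run_subprocess`, and `dump_stack` are hypothetical names, and retrying without `--native` after any failed native dump is an assumption about the desired behavior. The command line mirrors the one in the error message above.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes a command and returns (exit code, stdout, stderr).
Runner = Callable[[List[str]], Tuple[int, str, str]]


def build_dump_cmd(pid: int, native: bool, py_spy: str = "py-spy") -> List[str]:
    """Build the py-spy dump command; --native is added only on request."""
    cmd = ["sudo", "-n", py_spy, "dump", "-p", str(pid)]
    if native:
        cmd.append("--native")
    return cmd


def run_subprocess(cmd: List[str]) -> Tuple[int, str, str]:
    """Default runner: execute the command and capture its output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def dump_stack(pid: int, native: bool = True, runner: Runner = run_subprocess) -> str:
    """Return the stack dump for pid.

    Hypothetical policy: if the native dump fails (for example with
    "failed to get os threadid"), retry once with Python-only frames.
    """
    code, out, err = runner(build_dump_cmd(pid, native))
    if code != 0 and native:
        code, out, err = runner(build_dump_cmd(pid, native=False))
    if code != 0:
        raise RuntimeError(f"py-spy dump failed for pid {pid}: {err.strip()}")
    return out
```

With this shape, a caller could pass `native=False` to skip native frames entirely, while the default still attempts them and degrades to a Python-only trace instead of surfacing the error to the user.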

Versions / Dependencies

master

Reproduction script

Not sure how to reproduce this reliably, but the problem shows up readily in the nightly tests.

Issue Severity

No response

Metadata

Labels

P0: Issues that should be fixed in short order
bug: Something that is supposed to be working; but isn't
dashboard: Issues specific to the Ray Dashboard
release-blocker: P0 Issue that blocks the release
