Skip to content

Implement basic signal emulation#1881

Merged
sporksmith merged 3 commits intoshadow:mainfrom
sporksmith:signal-state
Feb 4, 2022
Merged

Implement basic signal emulation#1881
sporksmith merged 3 commits intoshadow:mainfrom
sporksmith:signal-state

Conversation

@sporksmith
Copy link
Copy Markdown
Contributor

@sporksmith sporksmith commented Jan 29, 2022

Implements most of MS2 from #1851.

  • Adds an explicit locking mechanism to the shim-shared-memory structures. The "protected" bits of these structures are now protected by a host-wide lock. While we could have used atomics again here, there are many operations that touch multiple parts of this state; it's safer to use a lock to ensure such operations get a consistent view.
  • Adds signal state to the thread and process shim-shared-memory structures, including configured masks, pending signals, etc.
  • Adds basic signal handling to preload mode including configuring signal handlers and masks, sending signals to threads and processes, and interrupting blocked syscalls with signals.
  • Adds tests for the new signal functionality and enables both the old and new tests for preload mode. I left the old signal test in place to help ensure this doesn't introduce regressions in ptrace mode's signal handling. ptrace mode doesn't pass the new tests (in particular it doesn't support interrupting blocked syscalls).

Notable pieces still missing:

  • SA_RESTART: automatically restart interrupted syscalls.
  • SA_ONSTACK: run syscall handlers in stack previously provided in sigaltstack.

Neither of these are very difficult, but seemed worth leaving out of this already-large-PR, and maybe deferring to MS3.

@github-actions github-actions bot added Component: Build Build/install tools and dependencies Component: Libraries Support functions like LD_PRELOAD and logging Component: Main Composing the core Shadow executable Component: Testing Unit and integration tests and frameworks labels Jan 29, 2022
@sporksmith sporksmith added this to the Support for Signals milestone Jan 29, 2022
@sporksmith sporksmith changed the title Signal state Implement basic signal emulation Jan 31, 2022
@sporksmith sporksmith marked this pull request as ready for review January 31, 2022 22:31
@sporksmith
Copy link
Copy Markdown
Contributor Author

Sorry this PR grew into a bit of a monster. I didn't want to merge the new data structures without the code that used those data structure, nor that code without tests to exercise it. I might be able to split this PR into some smaller individual commits along those lines if that'd make things easier for you, though I'm not sure they'd build or make a lot of sense on their own.

@sporksmith
Copy link
Copy Markdown
Contributor Author

Fixes #1455

Copy link
Copy Markdown
Contributor

@stevenengler stevenengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I learned a lot about signals reading it :)

Also, I was testing the PR and I ran into the error "Lock is already held. This is probably a deadlock".

Here's the contrived simulation config if you want to look at it:

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: /bin/python3
      args: -m http.server 80
      start_time: 3s
    - path: /bin/kill
      args: '1000'
      start_time: 4s
    - path: /bin/true
      start_time: 5s

Edit:

I also get a different error with the following config:

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: /bin/sleep
      args: '5'
      start_time: 3s
    - path: /bin/kill
      args: '1000'
      start_time: 4s
    - path: /bin/true
      start_time: 5s
00:00:00.043300 [581144:shadow-worker] 00:00:04.000000000 [INFO] [server:11.0.0.1] [process.c:373] [_process_getAndLogReturnCode] main success code '0' for process 'server.kill.1001'
**ERROR ENCOUNTERED**
	At process: 581136 (parent 565393)
	At file: src/main/host/syscall/time.c
	At line: 68
	At function: _syscallhandler_nanosleep_helper
	Message: nanosleep unblocked but a timer is still pending.
**BEGIN BACKTRACE**
Obtained 16 stack frames:
    build/src/main/shadow(+0x128efd5) [0x5634e1752fd5]
    build/src/main/shadow(utility_handleError+0xeb) [0x5634e17531af]
    build/src/main/shadow(+0x12b57ab) [0x5634e17797ab]
    build/src/main/shadow(syscallhandler_nanosleep+0x67) [0x5634e177987e]
    build/src/main/shadow(syscallhandler_make_syscall+0x204a) [0x5634e176cced]
    build/src/main/shadow(threadpreload_resume+0x25e) [0x5634e1743634]
    build/src/main/shadow(thread_resume+0xcb) [0x5634e173e346]
    build/src/main/shadow(process_continue+0x14c) [0x5634e1736638]
    build/src/main/shadow(+0x127e241) [0x5634e1742241]
    build/src/main/shadow(task_execute+0x78) [0x5634e17598fb]
    build/src/main/shadow(event_execute+0x16b) [0x5634e1759384]
    build/src/main/shadow(worker_runEvent+0x30) [0x5634e172f94f]
    build/src/main/shadow(+0x129018d) [0x5634e175418d]
    build/src/main/shadow(+0x126b7fb) [0x5634e172f7fb]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f078dab0609]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f078d51c293]
**END BACKTRACE**
**ABORTING**

@sporksmith
Copy link
Copy Markdown
Contributor Author

Also, I was testing the PR and I ran into the error "Lock is already held. This is probably a deadlock".

Ah nice. I failed to release the hostlock before raising a fatal signal natively in the shim-side handler. Fixed: ee7dc2b

@sporksmith
Copy link
Copy Markdown
Contributor Author

I also get a different error with the following config:

I actually do have some fixes for how nanosleep handles getting interrupted. It was originally in this PR but I broke it out to slim this one down. I'll resurrect that one after getting this merged.

We'll probably need to fix up some other syscalls as well; e.g. I think there are others with timeouts that assume the timeout expired if _syscallhandler_wasBlocked was true.

Opened #1889

@sporksmith
Copy link
Copy Markdown
Contributor Author

Btw while debugging your first example, I learned that the python simplehttp server doesn't install a handler to gracefully shut down on SIGTERM. It does for SIGINT. With this config the process and shadow exit gracefully:

general:
  stop_time: 10s

network:
  graph:
    type: 1_gbit_switch

hosts:
  server:
    network_node_id: 0
    processes:
    - path: /bin/python3
      args: -m http.server 80
      start_time: 3s
    - path: /bin/kill
      args: -SIGINT '1000'
      start_time: 4s
    - path: /bin/true
      start_time: 5s

It looks like nginx also shuts down on SIGINT (in addition to SIGTERM): http://nginx.org/en/docs/control.html

Maybe SIGINT will be a better default "graceful shutdown" signal, though we'll probably want a way to override it as well.

@sporksmith sporksmith enabled auto-merge February 4, 2022 18:17
@sporksmith sporksmith merged commit b3b8ffd into shadow:main Feb 4, 2022
@sporksmith sporksmith deleted the signal-state branch February 4, 2022 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Build Build/install tools and dependencies Component: Libraries Support functions like LD_PRELOAD and logging Component: Main Composing the core Shadow executable Component: Testing Unit and integration tests and frameworks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants