Ability to shut down managed processes gracefully #1491

@marcosimioni

Describe the issue
It would be nice to have a per-process option that chooses how Shadow behaves when stop_time is reached, i.e. whether a SIGKILL or a SIGTERM is sent to the process, so that the process can shut down gracefully.

Alternatively, Shadow could use a better default strategy, such as sending SIGTERM first and SIGKILL only after a timeout, rather than going straight to SIGKILL.
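
To make the "SIGTERM first, SIGKILL after a timeout" idea concrete, here is a minimal standalone sketch in plain POSIX C. It is not Shadow code: terminate_with_grace() and spawn_sleeper() are hypothetical names invented for this illustration, the 10 ms polling step is an arbitrary choice, and the helper assumes the target is a direct child of the caller so that waitpid() can reap it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Shadow): send SIGTERM to child `pid`,
 * poll for up to `timeout_ms`, then fall back to SIGKILL.
 * Returns the signal that terminated the child, 0 if it exited on its
 * own, or -1 on error. */
static int terminate_with_grace(pid_t pid, int timeout_ms) {
    assert(pid > 0);
    if (kill(pid, SIGTERM) != 0 && errno != ESRCH) {
        return -1;
    }

    const struct timespec step = {0, 10 * 1000 * 1000}; /* 10 ms */
    int status = 0;
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
        }
        if (r < 0) {
            return -1;
        }
        nanosleep(&step, NULL);
    }

    /* Grace period expired: force termination and reap the child. */
    if (kill(pid, SIGKILL) != 0 && errno != ESRCH) {
        return -1;
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Demo helper: fork a child that sleeps forever. The SIGTERM disposition
 * is set in the parent before fork() so the child inherits it without a
 * race, and is restored in the parent afterwards. */
static pid_t spawn_sleeper(int ignore_term) {
    void (*old)(int) = signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    signal(SIGTERM, old);
    return pid;
}
```

With these helpers, terminate_with_grace(spawn_sleeper(0), 1000) should report SIGTERM, while a child that ignores SIGTERM should only go away once the grace period has expired and SIGKILL is sent. Whether the timeout should be measured in wall-clock or simulated time inside Shadow is a separate design question that this sketch does not address.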

What I've tried
By default, _process_terminate_threads() sends a SIGKILL, so I’ve modified it to send a SIGTERM instead as follows:

static void _process_terminate_threads(Process* proc) {
    trace("Terminating threads");
    if (process_isRunning(proc)) {
        if (kill(proc->nativePid, SIGTERM)) {
            warning("kill(pid=%d) error %d: %s", proc->nativePid, errno, g_strerror(errno));
        }
        process_markAsExiting(proc);
    }

    _process_handleProcessExit(proc);
}

With this change, when stop_time is reached, Shadow hangs waiting for the process to exit. Here’s the relevant line from shadow.log:

00:01:01.443199 [worker-7] 00:06:00.000000000 [INFO] [hiddenserver:11.0.0.5] [process.c:567] [process_stop] terminating process 'hiddenserver.tor.1001'

However, it looks like the Tor process 1001 does not receive the TERM signal, and it keeps running as if nothing happened:

$ tail -n 20 hiddenserver.tor.1001.stdout
Jan 01 00:05:59.000 [notice] While bootstrapping, fetched this many bytes: 2173 (consensus network-status fetch); 1496 (authority cert fetch); 1325 (microdescriptor fetch)
Jan 01 00:05:59.000 [notice] While not bootstrapping, fetched this many bytes: 21120 (hidden-service descriptor upload)
Jan 01 00:05:59.000 [info] log_heartbeat(): Average packaged cell fullness: 98.094%. TLS write overhead: 1%
Jan 01 00:05:59.000 [notice] Our onion service received 0 v2 and 0 v3 INTRODUCE2 cells and attempted to launch 0 rendezvous circuits.
Jan 01 00:05:59.000 [info] should_remove_intro_point(): Intro point $4EBB385C80A2CA5D671E16F1C722FBFB5F176891 has expired (retried: 1 times). Removing it.
Jan 01 00:05:59.000 [info] circuit_mark_for_close_(): Circuit 3281966364 (id: 276) marked for close at src/feature/hs/hs_service.c:2476 (orig reason: 9, new reason: 0)
Jan 01 00:05:59.000 [info] pick_intro_point(): Picked intro point: $A52CA5B56C64D864F6AE43E56F29ACBD5706DDA1~4uthority [cHIspRhOlXa8bNg7cqiafKEsbT/bUEcCWU2wPMOwRWs] at 100.0.0.1
Jan 01 00:05:59.000 [info] update_service_descriptor_intro_points(): Service r4aj4kaqf46mala2yykldkvwrrwjagab2qppuqtvgdxwh6spsulwu2qd just picked 1 intro points and wanted 1 for current descriptor. It currently has 2 intro points. Launching ESTABLISH_INTRO circuit shortly.
Jan 01 00:05:59.000 [info] extend_info_from_node(): Including Ed25519 ID for $A52CA5B56C64D864F6AE43E56F29ACBD5706DDA1~4uthority [cHIspRhOlXa8bNg7cqiafKEsbT/bUEcCWU2wPMOwRWs] at 100.0.0.1
Jan 01 00:05:59.000 [info] hs_circ_launch_intro_point(): Launching a circuit to intro point $A52CA5B56C64D864F6AE43E56F29ACBD5706DDA1~4uthority [cHIspRhOlXa8bNg7cqiafKEsbT/bUEcCWU2wPMOwRWs] at 100.0.0.1 for service r4aj4kaqf46mala2yykldkvwrrwjagab2qppuqtvgdxwh6spsulwu2qd.
Jan 01 00:05:59.000 [info] circuit_launch_by_extend_info(): Cannibalizing circ 3905584307 (id: 302) for purpose 16 (Hidden service: Establishing introduction point)
Jan 01 00:05:59.000 [info] circuit_send_intermediate_onion_skin(): Sending extend relay cell.
Jan 01 00:05:59.000 [info] circuit_free_(): Circuit 0 (id: 276) has been freed.
Jan 01 00:05:59.400 [info] circuit_finish_handshake(): Finished building circuit hop:
Jan 01 00:05:59.400 [info] internal (high-uptime) circ (length 4, last hop 4uthority): $FF197204099FA0E507FA46D41FED97D3337B4BAA(open) $0A9B1B207FD13A6F117F95CAFA358EEE2234F19A(open) $3FB0BD7827C760FE7F9DD810FCB10322D63AB4CF(open) $A52CA5B56C64D864F6AE43E56F29ACBD5706DDA1(open)
Jan 01 00:05:59.400 [info] entry_guards_note_guard_success(): Recorded success for primary confirmed guard relay2 ($FF197204099FA0E507FA46D41FED97D3337B4BAA)
Jan 01 00:05:59.400 [info] circuit_build_no_more_hops(): circuit built!
Jan 01 00:05:59.400 [info] hs_circ_service_intro_has_opened(): Introduction circuit 3905584307 established for service r4aj4kaqf46mala2yykldkvwrrwjagab2qppuqtvgdxwh6spsulwu2qd.
Jan 01 00:05:59.400 [info] internal (high-uptime) circ (length 4): $FF197204099FA0E507FA46D41FED97D3337B4BAA(open) $0A9B1B207FD13A6F117F95CAFA358EEE2234F19A(open) $3FB0BD7827C760FE7F9DD810FCB10322D63AB4CF(open) $A52CA5B56C64D864F6AE43E56F29ACBD5706DDA1(open)
Jan 01 00:05:59.800 [info] service_handle_intro_established(): Successfully received an INTRO_ESTABLISHED cell on circuit 3905584307 for service r4aj4kaqf46mala2yykldkvwrrwjagab2qppuqtvgdxwh6spsulwu2qd

It looks like more is needed to implement this than just changing the signal.
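
One way to narrow this down might be to check how the kernel sees SIGTERM for the managed process. The sketch below reads the signal masks that Linux exposes in /proc/<pid>/status (SigPnd, ShdPnd, SigBlk, SigIgn, SigCgt). The helper name proc_signal_in_mask() and the convention that pid 0 means "this process" are assumptions made for this example; nothing here is Shadow API.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical diagnostic helper (Linux only): report whether `signo` is
 * set in one of the hexadecimal signal masks in /proc/<pid>/status.
 * `field` is the line name without the colon, e.g. "SigCgt" or "SigPnd".
 * pid <= 0 inspects the calling process.
 * Returns 1 if the bit is set, 0 if it is clear, -1 if the file or the
 * field could not be read. */
static int proc_signal_in_mask(pid_t pid, const char* field, int signo) {
    assert(signo >= 1 && signo <= 64);

    char path[64];
    if (pid > 0) {
        snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    } else {
        snprintf(path, sizeof path, "/proc/self/status");
    }

    FILE* f = fopen(path, "r");
    if (!f) {
        return -1;
    }

    char line[256];
    size_t n = strlen(field);
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            /* The mask is printed in hex; bit (signo - 1) is signal signo. */
            unsigned long long mask = strtoull(line + n + 1, NULL, 16);
            result = (int)((mask >> (signo - 1)) & 1ULL);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Called with the native PID of the Tor process and SIGTERM, a set bit in SigCgt would mean a handler is installed, and a set bit in SigPnd or ShdPnd would mean the signal was sent but is still pending. This only shows the kernel's view; it does not show whether Shadow's own process management gets in the way of delivery.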

Operating System:

  • OS and version:
$ lsb_release -d
Description:    Ubuntu 18.04.5 LTS
  • Kernel version:
$ uname -a
Linux ***** 4.15.0-137-generic #141-Ubuntu SMP Fri Feb 19 13:46:27 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Shadow:

  • Version:
$ shadow --version
Shadow 2.0.0-pre.1
  • Plug-ins in use:
    Tor

Labels

Component: Main (composing the core Shadow executable)
Type: Enhancement (new functionality or improved design)