[Robustness] ServiceManager.GetAllServices — Parallel.ForEach has no per-service timeout, single hung SCM RPC blocks a worker

**Severity:** Info

**File:** `src/Servy.Core/Services/ServiceManager.cs`
**Lines:** 830–885 (`GetAllServices`, `Parallel.ForEach` body)

**Description:**

`GetAllServices` enumerates the SCM list and calls `PopulateNativeDetails` for each service in parallel:

```csharp
Parallel.ForEach(services, new ParallelOptions
{
    CancellationToken = cancellationToken,
    MaxDegreeOfParallelism = Math.Min(Environment.ProcessorCount, MaxParallelScmQueries),
},
service =>
{
    try
    {
        if (cancellationToken.IsCancellationRequested) return;

        ServiceInfo info = new ServiceInfo { ... };

        // Fetch deep details natively
        PopulateNativeDetails(scmHandle, info);

        results.Add(info);
    }
    finally
    {
        service.Dispose();
    }
});
```

The `CancellationToken` only prevents *new* iterations from starting; it cannot interrupt an **in-flight** native SCM call. `PopulateNativeDetails` issues `QueryServiceConfig` / `QueryServiceConfig2W` against the SCM, and these calls have been observed to hang on:
- protected services where the calling token lacks the right access mask
- driver services in transitional states
- corrupted service registry entries
- machines where a kernel filter driver intercepts SCM calls

When that happens, one of the (typically 4–8) parallel workers stays blocked until the native call returns. Because `MaxDegreeOfParallelism = min(ProcessorCount, MaxParallelScmQueries)`, a handful of concurrent hangs can occupy every worker, and the user-visible Manager UI stalls for as long as they last. Cancelling does not help: the token stops queued iterations from starting, but `Parallel.ForEach` does not return (or throw `OperationCanceledException`) until every in-flight body has finished, so the cancel request only completes when the hung RPCs do.
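The cancellation behaviour described above can be reproduced in isolation. The following is a minimal sketch, not code from Servy: `CancellationDemo` and its parameters are hypothetical names, and `Thread.Sleep` stands in for a native SCM call that ignores the token.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Returns how long Parallel.ForEach took to return when cancellation
    // is requested after cancelAfterMs but each body blocks for blockMs.
    public static TimeSpan Run(int blockMs, int cancelAfterMs)
    {
        using (var cts = new CancellationTokenSource())
        {
            var options = new ParallelOptions
            {
                CancellationToken = cts.Token,
                MaxDegreeOfParallelism = 2,
            };

            cts.CancelAfter(cancelAfterMs);
            var stopwatch = Stopwatch.StartNew();
            try
            {
                Parallel.ForEach(new[] { 1, 2 }, options, item =>
                {
                    // Stand-in for a blocking native call that never checks the token.
                    Thread.Sleep(blockMs);
                });
            }
            catch (OperationCanceledException)
            {
                // Raised only after the in-flight bodies have returned.
            }
            return stopwatch.Elapsed;
        }
    }
}
```

Calling `CancellationDemo.Run(2000, 100)` should take roughly two seconds rather than 100 ms, because the loop waits for the body that is already running before it reports the cancellation.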

**Reproduction (general shape):**
1. Have a service with an unusual access ACL or a driver service in `START_PENDING` for an extended period.
2. Open the Manager UI on that machine.
3. Click cancel — observe the UI does not actually unblock until each in-flight native call returns on its own.

**Suggested fix:**

Wrap `PopulateNativeDetails` in a `Task.Run(...).Wait(timeoutMs, cancellationToken)` so a stuck call cannot keep a worker indefinitely:

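```csharp
service =>
{
    try
    {
        if (cancellationToken.IsCancellationRequested) return;

        ServiceInfo info = new ServiceInfo { ... };

        // Run the native query on a separate task and wait for a bounded time.
        // Wait returns false on timeout, throws OperationCanceledException if the
        // token is cancelled first, and throws AggregateException if the query faults.
        bool populated = Task.Run(() => PopulateNativeDetails(scmHandle, info), cancellationToken)
                             .Wait(AppConfig.PopulateNativeDetailsTimeoutMs, cancellationToken);

        if (!populated)
        {
            // Emit the basic info we already have rather than dropping the service entirely.
            // Caveat: the abandoned task may still write to `info` when it eventually returns.
            info.Description = "(details unavailable: native query timed out)";
        }

        results.Add(info);
    }
    catch (OperationCanceledException) { /* token cancelled; skip this service */ }
    finally
    {
        service.Dispose();
    }
}
```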

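Pick a sensible default in `AppConfig` (e.g. 1000–2000 ms per service). The abandoned native call still runs until it returns on its own, and it occupies a thread-pool thread while it does, but it no longer holds up the `Parallel.ForEach` workers or the cancellation path. Two consequences need handling: `scmHandle` should stay open until any abandoned calls have finished, and a late write into `info` can race with readers of `results`.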
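One possible refinement of the snippet above is to have the bounded call return its result by value, so a call that times out has nothing shared left to write into. This is a sketch under assumptions: `BoundedCall.TryRun` is a hypothetical helper, not an existing Servy API, and it uses a `LongRunning` task so that an abandoned call does not tie up a regular thread-pool worker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedCall
{
    // Runs `query` on a dedicated (LongRunning) task and waits up to timeoutMs.
    // Returns true and sets `result` only if the query finished in time.
    // Throws OperationCanceledException if `token` is cancelled while waiting,
    // and AggregateException if the query itself throws within the timeout.
    public static bool TryRun<T>(Func<T> query, int timeoutMs, CancellationToken token, out T result)
    {
        Task<T> task = Task.Factory.StartNew(
            query,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);

        // Observe a failure that arrives after we stopped waiting, so it does
        // not surface later as an unobserved task exception.
        task.ContinueWith(
            t => { var ignored = t.Exception; },
            TaskContinuationOptions.OnlyOnFaulted);

        if (task.Wait(timeoutMs, token))
        {
            result = task.Result;
            return true;
        }

        // Timed out: the query keeps running, but its result is discarded.
        result = default(T);
        return false;
    }
}
```

Using it in the loop body would require a value-returning variant of `PopulateNativeDetails` (hypothetical here), with the returned details copied into `info` only when `TryRun` returns `true`. The caveat about keeping `scmHandle` open until abandoned calls finish still applies.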

**Severity rationale:**
Marked `Info` rather than `Warning` because the failure mode requires a specifically misbehaving service in the SCM list — most production environments will never trip it. Where it does trip, however, the symptom (Manager UI permanently unresponsive, cancel button "doing nothing") is severe and hard to diagnose without the source context above.
