fix: track real server status and uptime in health endpoint (#2938)
Conversation
Replace hardcoded `'running'`/`0` in `GetServerStatus()` with actual runtime state tracking:

- Add `ServerState` struct and `serverStartTimes`/`serverErrors` maps to `Launcher` for tracking connection outcomes
- Record start time on successful launch, error on failure
- Add `GetServerState()` method to query real server state
- Update `GetServerStatus()` to return actual status and uptime
- Servers report `'stopped'` before launch, `'running'` with real uptime after successful connection, `'error'` on failure
- `BuildHealthResponse` now correctly marks the gateway `'unhealthy'` when servers have errors

Adds 7 launcher state tests and 2 health response tests. Fixes Section 8 (Health Monitoring) compliance gap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR addresses the MCP Gateway health-monitoring compliance gap by replacing the `GetServerStatus()` stub with launcher-tracked server state and computed uptime, so `/health` can reflect non-running backends.
Changes:
- Track backend server "started at" timestamps and most-recent launch errors in `internal/launcher`.
- Update `UnifiedServer.GetServerStatus()` to return launcher-derived status and uptime (seconds since recorded start).
- Add unit tests for launcher state reporting and health becoming unhealthy when a backend launch fails.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/server/unified.go | Uses launcher state to report per-server status and uptime instead of hardcoded values. |
| internal/server/health_test.go | Adds tests asserting error status/uptime and unhealthy gateway status on backend launch failure. |
| internal/launcher/launcher.go | Introduces basic server state tracking (start time + last error) and exposes GetServerState. |
| internal/launcher/launcher_test.go | Adds tests for new server state tracking behavior. |
```go
// recordError records a server launch failure.
func (l *Launcher) recordError(serverID string, errMsg string) {
	l.serverErrors[serverID] = errMsg
	logLauncher.Printf("Recorded server error: serverID=%s, error=%s", serverID, errMsg)
}
```
`recordError` logs the raw `errMsg` into logs. Since `errMsg` can include untrusted/secret-bearing content (e.g., backend stderr or URLs with tokens), this can leak sensitive data into launcher logs. Sanitize/truncate the error string before logging (or log a stable error code/type and keep the full error only in memory).
Suggested change:

```go
const maxLoggedErrorLength = 512

// truncateErrorForLog returns a version of the error message that is safe to log by
// limiting its length. The full error is still kept in memory (e.g., in serverErrors).
func truncateErrorForLog(errMsg string) string {
	if len(errMsg) <= maxLoggedErrorLength {
		return errMsg
	}
	return errMsg[:maxLoggedErrorLength] + "...(truncated)"
}

// recordError records a server launch failure.
func (l *Launcher) recordError(serverID string, errMsg string) {
	// Store the full error message in memory for internal use.
	l.serverErrors[serverID] = errMsg
	// Log only a truncated version to reduce the risk of leaking sensitive data.
	safeErrMsg := truncateErrorForLog(errMsg)
	logLauncher.Printf("Recorded server error: serverID=%s, error=%s", serverID, safeErrMsg)
}
```
```go
// GetServerState returns the observed runtime state for a single server.
func (l *Launcher) GetServerState(serverID string) ServerState {
	l.mu.RLock()
	defer l.mu.RUnlock()

	if errMsg, hasErr := l.serverErrors[serverID]; hasErr {
		return ServerState{
			Status:    "error",
			LastError: errMsg,
		}
	}

	if startedAt, ok := l.serverStartTimes[serverID]; ok {
		return ServerState{
			Status:    "running",
			StartedAt: startedAt,
		}
	}

	return ServerState{Status: "stopped"}
}
```
`GetServerState` derives "running" solely from `serverStartTimes`, but nothing clears/updates that state when connections are closed (e.g., `Launcher.Close()` or session pool cleanup). This can cause `/health` to report servers as running with increasing uptime even after shutdown/close. Consider clearing `serverStartTimes`/`serverErrors` in `Close()`, and/or recording a "stopped" state when a connection is closed so the reported status reflects reality.
```go
firstStart := l.serverStartTimes["test-server"]

// Second call should be a no-op
l.recordStart("test-server")
secondStart := l.serverStartTimes["test-server"]
```
This test reaches into `l.serverStartTimes` directly to assert behavior. That couples the test to the Launcher's internal representation and bypasses the public surface (`GetServerState`). Prefer asserting via `GetServerState().StartedAt` across two `recordStart` calls so internal refactors (e.g., switching to a different storage/locking strategy) don't break the test unnecessarily.
Suggested change:

```go
firstState := l.GetServerState("test-server")
firstStart := firstState.StartedAt

// Second call should be a no-op
l.recordStart("test-server")
secondState := l.GetServerState("test-server")
secondStart := secondState.StartedAt
```
```go
	cfg := &config.Config{
		Servers: map[string]*config.ServerConfig{
			"server-a": {Type: "stdio", Command: "docker", Args: []string{"run", "test"}},
		},
	}

	ctx := context.Background()
	us, err := NewUnified(ctx, cfg)
	require.NoError(err)
	t.Cleanup(func() { us.Close() })

	status := us.GetServerStatus()
	require.Contains(status, "server-a")
	assert.Equal("error", status["server-a"].Status, "Server should report 'error' after failed launch")
	assert.Equal(0, status["server-a"].Uptime, "Uptime should be 0 for errored server")
}

// TestBuildHealthResponse_UnhealthyOnError verifies that failed servers make the gateway unhealthy
func TestBuildHealthResponse_UnhealthyOnError(t *testing.T) {
	assert := assert.New(t)
	require := require.New(t)

	cfg := &config.Config{
		Servers: map[string]*config.ServerConfig{
			"server-a": {Type: "stdio", Command: "docker", Args: []string{"run", "test"}},
		},
	}

	ctx := context.Background()
	us, err := NewUnified(ctx, cfg)
	require.NoError(err)
	t.Cleanup(func() { us.Close() })

	response := BuildHealthResponse(us)
	assert.Equal("unhealthy", response.Status, "Gateway should be unhealthy when a server has errors")
}
```
These tests call `NewUnified()`, which triggers tool registration and will attempt to launch the configured stdio backend. Using `docker run test` can hang/pull images and then wait up to the default 60s startup timeout, making the unit test slow/flaky depending on the environment. Use a deterministic fast-fail command (e.g., a clearly nonexistent binary, like other launcher tests do) and/or set `cfg.Gateway.StartupTimeout` to a very small value in the test.
Problem
`GetServerStatus()` hardcoded every configured server as `"running"` with `Uptime: 0`, regardless of actual connection state. The `/health` endpoint always reported all servers as healthy, even if they had crashed or never been contacted.

Spec reference: Section 8 — Health Monitoring requires accurate server status and uptime tracking.
Fix
- `ServerState` struct with `serverStartTimes` and `serverErrors` maps
- `recordStart()` on successful launch, `recordError()` on failure, across both `GetOrLaunch` and `GetOrLaunchForSession` paths
- `GetServerState()` returns `"stopped"` (never launched), `"running"` (successfully connected), or `"error"` (launch failed)
- Uptime computed via `time.Since(startedAt).Seconds()` instead of hardcoded 0
- `BuildHealthResponse()` now correctly marks the gateway as `"unhealthy"` when servers have errors

Files Changed
- `internal/launcher/launcher.go` — `ServerState`, state tracking maps, `recordStart`/`recordError`/`GetServerState`
- `internal/server/unified.go` — `GetServerStatus()` queries launcher for real state + uptime
- `internal/launcher/launcher_test.go`
- `internal/server/health_test.go`

Testing
`make agent-finished` passes (format, build, lint, all tests).