Description
Background
With PR #483, Fedify now has comprehensive OpenTelemetry instrumentation that records span events containing full activity JSON payloads. This enables developers to build custom observability tools by implementing a SpanExporter that captures ActivityPub activities as they flow through the system.
However, the example FedifyDebugExporter shown in the documentation stores captured activities in an in-memory Map. This works fine when the entire application runs in a single process, but Fedify applications commonly run in distributed environments where:
- The web server handling HTTP requests runs on different nodes than the background workers processing the message queue.
- Multiple worker nodes may process queued messages in parallel.
- The debug dashboard itself may run on yet another node.
In such environments, a custom SpanExporter with in-memory storage cannot aggregate traces across nodes. Each node would only see its own spans, making it impossible to view the complete picture of a distributed trace.
Proposed solution
We should provide a SpanExporter implementation that persists trace data to Fedify's KvStore, which is already designed to be shared across nodes. This allows all nodes in a distributed deployment to write to the same storage, and the debug dashboard can query this shared storage to display complete traces.
The key insight is that OpenTelemetry's context propagation (which Fedify already implements via the traceContext field in InboxMessage, OutboxMessage, and FanoutMessage) ensures that all spans belonging to the same logical request share the same trace ID, even when they execute on different nodes. By storing spans keyed by their trace ID, we can later retrieve all activities that occurred within a single request.
A rough sketch of the API might look like:
import { createFederation } from "@fedify/fedify";
import { RedisKvStore } from "@fedify/redis";
import { FedifySpanExporter } from "@fedify/fedify/otel";
import { NodeTracerProvider, SimpleSpanProcessor } from "@opentelemetry/sdk-trace-node";

const kv = new RedisKvStore(redis);

// Create the exporter that writes to KvStore
const fedifyExporter = new FedifySpanExporter(kv, {
  ttl: Temporal.Duration.from({ hours: 1 }),
});

const tracerProvider = new NodeTracerProvider();
tracerProvider.addSpanProcessor(new SimpleSpanProcessor(fedifyExporter));

const federation = createFederation({
  kv,
  tracerProvider,
});

// Later, in the debug dashboard endpoint:
app.get("/debug/traces/:traceId", async (req, res) => {
  const activities = await fedifyExporter.getActivitiesByTraceId(req.params.traceId);
  res.json(activities);
});

app.get("/debug/traces", async (req, res) => {
  const recentTraces = await fedifyExporter.getRecentTraces({ limit: 100 });
  res.json(recentTraces);
});

Storage strategy
There are two possible approaches for storing trace data, depending on what operations the KvStore supports.
When the KvStore provides a list() operation (see #498), each activity record can be stored under its own key using a pattern like ["traces", traceId, spanId]. Retrieving all activities for a trace becomes a prefix scan over ["traces", traceId]. This approach is simple and avoids any concurrency issues since each span writes to a unique key.
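To make the key layout concrete, here is a minimal sketch of the list()-based approach. It uses a synchronous in-memory stand-in rather than the real KvStore (which is asynchronous), and the shape of list() is an assumption, since #498 has not settled it. The names MemoryListStore, activityKey, tracePrefix, and hasPrefix are hypothetical.

```typescript
type KvKey = readonly string[];

// Key for one activity record: ["traces", traceId, spanId].
function activityKey(traceId: string, spanId: string): KvKey {
  return ["traces", traceId, spanId];
}

// Prefix shared by every record of one trace: ["traces", traceId].
function tracePrefix(traceId: string): KvKey {
  return ["traces", traceId];
}

// True when `key` begins with every part of `prefix`, in order.
function hasPrefix(key: KvKey, prefix: KvKey): boolean {
  return prefix.length <= key.length &&
    prefix.every((part, i) => key[i] === part);
}

// Hypothetical in-memory stand-in for a KvStore with list().
// Assumption: list(prefix) returns all entries whose key starts with prefix.
class MemoryListStore {
  private entries = new Map<string, { key: KvKey; value: unknown }>();

  set(key: KvKey, value: unknown): void {
    this.entries.set(JSON.stringify(key), { key, value });
  }

  list(prefix: KvKey): { key: KvKey; value: unknown }[] {
    return [...this.entries.values()].filter((e) => hasPrefix(e.key, prefix));
  }
}
```

Because every span writes to its own key, two nodes recording spans for the same trace never overwrite each other, and a prefix scan over tracePrefix(traceId) collects both.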
When the KvStore only provides cas() but not list(), activities must be stored as a list under a single key ["traces", traceId]. Appending a new activity requires reading the existing list, adding the new item, and writing it back using compare-and-swap to handle concurrent writes. This works but is less efficient and may experience contention under high load.
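The read-modify-write loop described above could look roughly like the following. This is a sketch under assumptions: CasStore is a local interface modeled loosely on KvStore's get() and optional cas(), MemoryCasStore is a hypothetical in-memory stand-in that compares values by their JSON serialization, and appendActivity and its maxAttempts parameter are illustrative names, not part of Fedify.

```typescript
type KvKey = readonly string[];

// Assumed minimal surface: get() plus compare-and-swap.
interface CasStore {
  get(key: KvKey): Promise<unknown>;
  cas(key: KvKey, expected: unknown, next: unknown): Promise<boolean>;
}

// Hypothetical in-memory stand-in; values are compared by JSON serialization.
class MemoryCasStore implements CasStore {
  private data = new Map<string, string>();

  async get(key: KvKey): Promise<unknown> {
    const raw = this.data.get(JSON.stringify(key));
    return raw === undefined ? undefined : JSON.parse(raw);
  }

  async cas(key: KvKey, expected: unknown, next: unknown): Promise<boolean> {
    const k = JSON.stringify(key);
    const currentRaw = this.data.get(k);
    const expectedRaw = expected === undefined
      ? undefined
      : JSON.stringify(expected);
    if (currentRaw !== expectedRaw) return false; // someone else wrote first
    this.data.set(k, JSON.stringify(next));
    return true;
  }
}

// Append one record to the list stored under ["traces", traceId],
// retrying when a concurrent writer wins the race.
async function appendActivity(
  store: CasStore,
  traceId: string,
  record: unknown,
  maxAttempts = 5,
): Promise<void> {
  const key: KvKey = ["traces", traceId];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = (await store.get(key)) as unknown[] | undefined;
    const next = [...(current ?? []), record];
    if (await store.cas(key, current, next)) return;
  }
  throw new Error(
    `Failed to append to trace ${traceId} after ${maxAttempts} attempts`,
  );
}
```

The retry bound is where contention shows up: under heavy parallel writes to one trace, a writer can lose the race repeatedly, which is why the list()-based layout is preferable when available.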
If the KvStore provides neither list() nor cas(), the exporter should throw an error at construction time with a clear message explaining which KvStore capabilities are required.
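One way to express this capability check is a small helper the constructor could call. The sketch below is illustrative only: MinimalKvStore, StorageStrategy, and selectStrategy are hypothetical names, and the check assumes list() and cas() appear as optional methods on the store object.

```typescript
// Assumed minimal surface; list and cas are optional capabilities.
interface MinimalKvStore {
  get(key: readonly string[]): Promise<unknown>;
  set(key: readonly string[], value: unknown): Promise<void>;
  list?: unknown;
  cas?: unknown;
}

type StorageStrategy = "list" | "cas";

// Prefer list() when available, fall back to cas(), otherwise fail fast.
function selectStrategy(kv: MinimalKvStore): StorageStrategy {
  if (typeof kv.list === "function") return "list";
  if (typeof kv.cas === "function") return "cas";
  throw new TypeError(
    "FedifySpanExporter requires a KvStore that implements list() or cas(); " +
      "the given store provides neither.",
  );
}
```

Failing in the constructor surfaces a misconfiguration at startup instead of silently dropping spans at export time.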
Data model
Each stored record should contain enough information to reconstruct the activity flow:
interface TraceActivityRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  direction: "inbound" | "outbound";
  activityType: string;
  activityId?: string;
  activityJson: string;
  verified?: boolean;
  timestamp: string;
  // For outbound activities
  inboxUrl?: string;
}

The exporter should extract this information from the span events added in PR #483, specifically activitypub.activity.received and activitypub.activity.sent.
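As a rough illustration of that extraction step, the sketch below maps span events to records. The two event names come from this issue; everything else is an assumption: SpanLike and SpanEventLike are local stand-ins for OpenTelemetry's span shapes, and the attribute names activitypub.activity.json, activitypub.activity.verified, and activitypub.inbox.url are hypothetical placeholders that would need to match whatever PR #483 actually records.

```typescript
type HrTime = [number, number]; // [seconds, nanoseconds]

interface SpanEventLike {
  name: string;
  time: HrTime;
  attributes?: Record<string, unknown>;
}

interface SpanLike {
  spanContext(): { traceId: string; spanId: string };
  parentSpanId?: string;
  events: SpanEventLike[];
}

interface TraceActivityRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  direction: "inbound" | "outbound";
  activityType: string;
  activityId?: string;
  activityJson: string;
  verified?: boolean;
  timestamp: string;
  inboxUrl?: string;
}

// Event names are from the issue text.
const DIRECTIONS = new Map<string, "inbound" | "outbound">([
  ["activitypub.activity.received", "inbound"],
  ["activitypub.activity.sent", "outbound"],
]);

// Hypothetical attribute names; adjust to the real instrumentation.
const ATTR_JSON = "activitypub.activity.json";
const ATTR_VERIFIED = "activitypub.activity.verified";
const ATTR_INBOX_URL = "activitypub.inbox.url";

function extractRecords(span: SpanLike): TraceActivityRecord[] {
  const { traceId, spanId } = span.spanContext();
  const records: TraceActivityRecord[] = [];
  for (const event of span.events) {
    const direction = DIRECTIONS.get(event.name);
    if (direction === undefined) continue; // not an activity event

    const json = event.attributes?.[ATTR_JSON];
    if (typeof json !== "string") continue;

    let activity: unknown;
    try {
      activity = JSON.parse(json);
    } catch {
      continue; // skip payloads that are not valid JSON
    }
    if (typeof activity !== "object" || activity === null) continue;

    const { type, id } = activity as { type?: unknown; id?: unknown };
    const verified = event.attributes?.[ATTR_VERIFIED];
    const inboxUrl = event.attributes?.[ATTR_INBOX_URL];
    const [seconds, nanos] = event.time;

    records.push({
      traceId,
      spanId,
      parentSpanId: span.parentSpanId,
      direction,
      activityType: typeof type === "string" ? type : "Unknown",
      activityId: typeof id === "string" ? id : undefined,
      activityJson: json,
      verified: typeof verified === "boolean" ? verified : undefined,
      timestamp: new Date(seconds * 1000 + nanos / 1e6).toISOString(),
      inboxUrl: direction === "outbound" && typeof inboxUrl === "string"
        ? inboxUrl
        : undefined,
    });
  }
  return records;
}
```

Spans without either event produce no records, so the exporter can run over every exported span and only persist the ones that carry an activity.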
Relationship to other issues
This issue depends on #498 (adding optional list() operation to KvStore), which should be implemented first. While the exporter can fall back to cas() for KvStore implementations that lack list(), the list() operation provides a more robust and efficient solution.
This issue is part of the larger effort to build a real-time ActivityPub debug dashboard (originally discussed in #234 and #323).