Skip to content

Feature Proposal: Tracing Channel / Diagnostics Channel support #2590

@tlhunter

Description

@tlhunter

Motivation

Hello!

I'm writing this as a member of the Datadog Node.js Tracer team where we work on an Application Performance Monitoring (APM) product (available at DataDog/dd-trace-js if you're curious). Prior to working at DataDog I spoke at RedisConf17 and RedisConf19, with both talks making use of node-redis.

I'm sending similar requests to maintainers of libraries that are instrumented by DataDog (not to mention other APM products). The tl;dr is that I would like to make a Pull Request to node-redis and would like your go-ahead before I start development work. This Pull Request will add support for a new feature that has landed in Node.js, TracingChannel. Does this sound like a change you'd be interested in accepting?

The rest of this is just a whole bunch of background in case you have questions:


Regarding definitions, an APM is a library that users install in their applications. Some popular APM products include DataDog, Elastic, New Relic, AppSignal, and Open Telemetry. These libraries do many things, but some common examples include sending metrics and logs for incoming HTTP requests, tracking how long a database query takes, and even building a graph of relationships between distributed systems. An incoming HTTP request might encompass an outbound HTTP request, a gRPC request, write to a file on disk, and cache something in Redis. As you can imagine, keeping track of how all of these I/O operations are connected is no easy task, and that's the sort of magic that happens with an APM library.

Depending on the type of library an APM tool might need to extract metadata about a call that an application is making to a library. For example, the APM tool might want to know the HTTP method and URL for an inbound or outbound request. Sometimes additional metadata needs to be inserted as well. For example, distributed tracing headers may be added to an outbound HTTP request or to a database query. The most exciting part about all this, in my opinion, is that it can be done automatically for a user's application.

Despite the name, "automatic instrumentation" is a very manual process, done by the authors of the APM libraries by way of monkey patching. In fact, here's dd-trace's redis instrumentation code. Essentially, monkey patching works by either replacing methods on a library, or by creating a new proxy interface in front of the library. When we get enough user requests to instrument a particular library we'll go through the code, figure out what needs to be patched to support expected use-cases, make the changes, and release them in dd-trace.

Of course, the other APM products do the same thing. node-redis has probably been instrumented a dozen times by now!

Monkey patching requires things like calling require() on a deeply nested file and replacing methods that might not even be documented. Of course, this isn't part of any sort of public API for a library, so even SemVer "patch" releases of a library can lead to APM engineers scrambling to fix compatibility issues.

At this point you might be thinking that APM maintainers have a fool's errand and that compatibility must be breaking all the time. And you'd be mostly right. But that sort of leads into the proposal:

I would like to implement a change to node-redis that adds support for TracingChannel (TC), a feature that has landed in Node.js v19.9, which it itself a superset of functionally provided by Diagnostics Channel (DC), a feature that is available in Node.js v14.17. These Node.js features were created specifically with APM support in mind. It is essentially a named, global pub/sub mechanism, as if someone declared a global EventEmitter instance (similar, of course, to Redis's PUBLISH/SUBSCRIBE, though limited to a single Node.js process). Code in one end of an application can publish to a channel and code in another end can subscribe to the channel. A TC channel is essentially an opinionated collection of five different DC channels (start, end, asyncStart, asyncEnd, error) for keeping track of what a library is doing. Channels are then named after the library, such as tracing:redis.query:start.

There are several benefits to incorporating TracingChannel directly into node-redis:

  • Performance: A user's application will be much quicker if instrumentation happens using these channels as compared to monkey patching.
  • Stability: A user's application is much less likely to have compatibility issues when upgrading libraries.
  • Standardization: Each APM vendor can now rely on a common interface for consuming data from libraries.

The DataDog Node.js tracer team is full of engineers who care a lot about Node.js and the ecosystem. Stephen Belanger, aka @Qard, has made many contributions to Node.js core, including Diagnostics Channel and Tracing Channel. Bryan English, aka @bengl, has also made several contributions to Node.js core and, along with myself, co-authored the book Multithreaded JavaScript. I have also written Distributed Node.js. All three of us have given many different conference talks. I guess what I'm trying to say is that we really do intend for these changes to be beneficial to the Node.js ecosystem at large, and not just for paid APM products.

Basic Code Example

// Note that this wouldn't change any user-facing APIs,
// though user code may subscribe to the channels if they desire.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions