Skip to content

ActiveSupport::Notifications::Event#parent_of? is not reliable in environments with poor clock resolution.#5932

Closed
gnarg wants to merge 1 commit intorails:3-2-stablefrom
gnarg:3-2-stable
Closed

ActiveSupport::Notifications::Event#parent_of? is not reliable in environments with poor clock resolution.#5932
gnarg wants to merge 1 commit intorails:3-2-stablefrom
gnarg:3-2-stable

Conversation

@gnarg
Copy link
Copy Markdown

@gnarg gnarg commented Apr 23, 2012

Hi,

Some system clocks, particularly on virtual non-realtime environments can as tick with very low clock resolution. In such an environment the Event#parent_of? method can erroneously conclude that multiple instrumentation events that take place within a single tick are all each other's parents.

To work around this I implemented entry and exit counters for instrumentation events. These counters should reset with each transaction.

I'm not married to this implementation, but I would love to work with you guys on the issue. It's one of the things that is holding us up from implementing the Notifications API in the New Relic Ruby agent.

Regards,
Jon Guymon

…ironments with poor clock resolution. This is an attempt to resolve that by implementing per-transaction entry and exit counters.
@tenderlove
Copy link
Copy Markdown
Member

@gnarg can you take a look at f08f875? I think we could change the log subscriber to respond to start and finish, then we can assemble event object trees without depending on Time. wdyt?

@tenderlove
Copy link
Copy Markdown
Member

If we backport f08f875, we can apply this patch to make Event not time based:


diff --git a/activesupport/lib/active_support/log_subscriber.rb b/activesupport/lib/active_support/log_subscriber.rb
index d2a6e1b..124eb1d 100644
--- a/activesupport/lib/active_support/log_subscriber.rb
+++ b/activesupport/lib/active_support/log_subscriber.rb
@@ -61,7 +61,7 @@ module ActiveSupport
         @@flushable_loggers = nil

         log_subscriber.public_methods(false).each do |event|
-          next if 'call' == event.to_s
+          next if %w{ start finish }.include?(event.to_s)

           notifier.subscribe("#{event}.#{namespace}", log_subscriber)
         end
@@ -86,14 +86,35 @@ module ActiveSupport
       end
     end

-    def call(message, *args)
+    def initialize
+      @event_stack = Hash.new { |h,id|
+        h[id] = Hash.new { |ids,name| ids[name] = [] }
+      }
+      super
+    end
+
+    def start(name, id, payload)
+      return unless logger
+
+      e = ActiveSupport::Notifications::Event.new(name, Time.now, nil, id, payload)
+      parent = @event_stack[id][name].last
+      parent << e if parent
+
+      @event_stack[id][name].push e
+    end
+
+    def finish(name, id, payload)
       return unless logger

-      method = message.split('.').first
+      finished  = Time.now
+      event     = @event_stack[id][name].pop
+      event.end = finished
+
+      method = name.split('.').first
       begin
-        send(method, ActiveSupport::Notifications::Event.new(message, *args))
+        send(method, event)
       rescue Exception => e
-        logger.error "Could not log #{message.inspect} event. #{e.class}: #{e.message} #{e.backtrace}"
+        logger.error "Could not log #{name.inspect} event. #{e.class}: #{e.message} #{e.backtrace}"
       end
     end

diff --git a/activesupport/lib/active_support/notifications/instrumenter.rb b/activesupport/lib/active_support/notifications/instrumenter.rb
index 58e292c..2449f72 100644
--- a/activesupport/lib/active_support/notifications/instrumenter.rb
+++ b/activesupport/lib/active_support/notifications/instrumenter.rb
@@ -31,7 +31,8 @@ module ActiveSupport
     end

     class Event
-      attr_reader :name, :time, :end, :transaction_id, :payload, :duration
+      attr_reader :name, :time, :transaction_id, :payload
+      attr_accessor :end

       def initialize(name, start, ending, transaction_id, payload)
         @name           = name
@@ -39,12 +40,19 @@ module ActiveSupport
         @time           = start
         @transaction_id = transaction_id
         @end            = ending
-        @duration       = 1000.0 * (@end - @time)
+        @children       = []
+      end
+
+      def duration
+        1000.0 * (self.end - time)
+      end
+
+      def <<(event)
+        @children << event
       end

       def parent_of?(event)
-        start = (time - event.time) * 1000
-        start <= 0 && (start + duration >= event.duration)
+        @children.include? event
       end
     end
   end

@gnarg
Copy link
Copy Markdown
Author

gnarg commented Apr 26, 2012

Hi Aaron,

Brilliant, this is even better. I'll try cooking up a prototype agent that exercises this then get back to you with the results. I anticipate that this is all we need (with the possible exception of #5933) but I'll confirm that.

Thanks for this!

@gnarg
Copy link
Copy Markdown
Author

gnarg commented Jun 15, 2012

Hi Aaron,

I hacked up a quick proof of concept. I set this in a Rails 4 project (with the above patch applied) and everything seems to be working well. To walk the call graph of a node I need access to Event's "@children", so I had to use:

event.instance_variable_get(:@children)

An "attr_reader :children" would be swell.

https://gist.github.com/2938578

It would be great if you landed the patch above in Rails 4, but I don't think back-porting to Rails 3 is necessary yet, at least from New Relic's perspective. We can wait until we are at least close to having an agent that makes full use of the mechanism.

Thanks for you're help with this. I'm going to continue to work on this agent prototype and will let you know if I run into any snags.

-jon

@tenderlove
Copy link
Copy Markdown
Member

@gnarg cool, thanks for the feedback. Ya, this missing attr_reader on children was an oversight. I'll add that to the patch and commit it. Thanks for the feedback!

@tenderlove
Copy link
Copy Markdown
Member

If I apply my patch to master, do you still need this patch against 3-2-stable? Or should I just close this?

@gnarg
Copy link
Copy Markdown
Author

gnarg commented Jun 15, 2012

You can close it, thanks!

@gnarg gnarg closed this Jun 15, 2012
takoyakikamen added a commit to takoyakikamen/rails that referenced this pull request Jun 18, 2012
carlosantoniodasilva pushed a commit to carlosantoniodasilva/rails that referenced this pull request Jun 19, 2012
make events not use date and time to determine parent_of. fixes rails#5932
croaky added a commit to thoughtbot/suspenders that referenced this pull request Aug 1, 2017
`ActiveJob` attaches a single `LogSubscriber` to be notified of events.
When one job enqueues another, that's treated as a child event and
stored on the same stack as the parent. If a large number of children
are enqueued that leads to R14 errors on Heroku.

The fix here is to unsubscribe all existing "enqueue" listeners and
then add a new subscriber for just that event. The original handler
is copied from the parent class using `define_method` so that it's
picked up by `attach_to`.

Related:

rails/rails#21036
rails/rails#5932

Before and after:

![Heroku metrics chart showing 1GB drop in memory usage after this change deployed](https://user-images.githubusercontent.com/47136/28436362-b51b3d1e-6d64-11e7-8b24-fc67927e7081.png)
croaky added a commit to thoughtbot/suspenders that referenced this pull request Aug 8, 2017
`ActiveJob` attaches a single `LogSubscriber` to be notified of events.
When one job enqueues another, that's treated as a child event and
stored on the same stack as the parent. If a large number of children
are enqueued that leads to R14 errors on Heroku.

The fix here is to unsubscribe all existing "enqueue" listeners and
then add a new subscriber for just that event. The original handler
is copied from the parent class using `define_method` so that it's
picked up by `attach_to`.

Related:

rails/rails#21036
rails/rails#5932

Before and after:

![Heroku metrics chart showing 1GB drop in memory usage after this change deployed](https://user-images.githubusercontent.com/47136/28436362-b51b3d1e-6d64-11e7-8b24-fc67927e7081.png)
Web-Go-To added a commit to Web-Go-To/rails_suspenders that referenced this pull request Mar 23, 2023
`ActiveJob` attaches a single `LogSubscriber` to be notified of events.
When one job enqueues another, that's treated as a child event and
stored on the same stack as the parent. If a large number of children
are enqueued that leads to R14 errors on Heroku.

The fix here is to unsubscribe all existing "enqueue" listeners and
then add a new subscriber for just that event. The original handler
is copied from the parent class using `define_method` so that it's
picked up by `attach_to`.

Related:

rails/rails#21036
rails/rails#5932

Before and after:

![Heroku metrics chart showing 1GB drop in memory usage after this change deployed](https://user-images.githubusercontent.com/47136/28436362-b51b3d1e-6d64-11e7-8b24-fc67927e7081.png)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants