Skip to content

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

@ellispritchard

Description

@ellispritchard

I'm trying to add some custom metrics using the perform_job.good_job notification, but I'm finding that it's not obvious how to distinguish between successful completions and discards, and I wonder if this is an oversight?

The code in question is this:

ActiveSupport::Notifications.instrument("perform_job.good_job", { job: self, execution: execution, process_id: current_thread.process_id, thread_name: current_thread.thread_name }) do |instrument_payload|
value = ActiveJob::Base.execute(active_job_data)
if value.is_a?(Exception)
handled_error = value
value = nil
end
handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard
error_event = if handled_error == current_thread.error_on_discard
:discarded
elsif handled_error == current_thread.error_on_retry
:retried
elsif handled_error == current_thread.error_on_retry_stopped
:retry_stopped
elsif handled_error
:handled
end
instrument_payload.merge!(
value: value,
handled_error: handled_error,
retried: current_thread.retried_job.present?,
error_event: error_event
)
ExecutionResult.new(value: value, handled_error: handled_error, error_event: error_event, retried_job: current_thread.retried_job)

Specifically the issue is here (annotated ##):

            ##  If the job execution succeeds, `handled_error` is `nil`, since so is 
            ## `current_thread.error_on_retry ||  current_thread.error_on_discard`.
            handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard
   
            error_event = if handled_error == current_thread.error_on_discard  ## nil == nil
                            :discarded  ##  error_event = :discarded
                          elsif handled_error == current_thread.error_on_retry
                            :retried
                          elsif handled_error == current_thread.error_on_retry_stopped
                            :retry_stopped
                          elsif handled_error
                            :handled
                          end

due to the logic here, error_event ends up :discarded even if there was no error, and this is what ends up on payload[:error_event]:

(byebug) event.payload
{:job=>#<GoodJob::Job id: "32da31bc-10e5-422f-9458-8ca96830baf6", queue_name: "default", priority: 0, serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", performed_at: "2025-02-21 17:57:22.821524000 +0000", finished_at: nil, error: nil, created_at: "2025-02-21 17:57:22.786207000 +0000", updated_at: "2025-02-21 17:57:22.830189000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", concurrency_key: nil, cron_key: nil, cron_at: nil, batch_id: nil, batch_callback_id: nil, executions_count: 1, job_class: "TestJob", error_event: nil, labels: nil, locked_by_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", locked_at: "2025-02-21 17:57:22.821696000 +0000">, :execution=>#<GoodJob::Execution id: "a568dbe9-a7a0-426f-991e-7952076362c2", created_at: "2025-02-21 17:57:22.821524000 +0000", updated_at: "2025-02-21 17:57:22.822639000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", job_class: "TestJob", queue_name: "default", serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", finished_at: nil, error: nil, error_event: nil, error_backtrace: nil, process_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", duration: nil>, :process_id=>32948, :thread_name=>"GoodJob::Scheduler(queues=* max_threads=5)-thread-1", :value=>nil, :handled_error=>nil, :retried=>false, :error_event=>:discarded}

As a work-around you can instead check that handled_error (and unhandled_error?) are nil (which is what the following code in fact does) but this is not super intuitive.

Calling the helper methods on payload[:job] or payload[:execution] doesn't help either, since the fields haven't been updated yet.

Have you a recommendation?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions