Nagios parser returns not supported return codes and not enough information #11061

@Sakerdotes

Description

Relevant telegraf.conf

[[inputs.exec]]
commands = ['/opt/checks/nagios_check.pl']
interval = '30s'
timeout = '15s'
name_suffix = '__nagios_check'
data_format = 'nagios'

Logs from Telegraf

2022-05-03T11:55:04Z E! [inputs.exec] Failed to add nagios state: exec: get exit code: exit status 127

System info

Telegraf 1.22.3, Oracle Linux Server 8.5

Docker

No response

Steps to reproduce

  1. Remove a dependency that a Nagios check needs in order to execute (which one depends on the check)
  2. Let telegraf execute the check

Expected behavior

Metric 1: => nagios_state__tcp,host=home1 service_output="/some/path/telegraf/commands/check_tcp: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory",state=3i 1651661042000000000

Metric 2: => nagios_state__tcp,host=home1 service_output="fork/exec /some/path/telegraf/commands/check_tcp: permission denied",state=3i 1651665933000000000

Actual behavior

Metric 1: => nagios_state__tcp,host=home1 service_output="",state=127i 1651661431000000000
Metric 2: => nagios_state__tcp,host=home1 service_output="" 1651665481000000000

Additional info

1. Unsupported status codes are returned:

As shown under 'Actual behavior', the status code '127' is returned. This is not a supported status code and should be converted into '3' (unknown). See 'Plugin Return Codes' in https://nagios-plugins.org/doc/guidelines.html
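To illustrate the conversion requested here, a minimal sketch in Python (not Telegraf's Go code; the names `NAGIOS_STATES` and `normalize_state` are hypothetical) could map every exit code outside the four documented plugin return codes to unknown:

```python
# The four return codes defined by the Nagios plugin guidelines.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
UNKNOWN = 3


def normalize_state(exit_code: int) -> int:
    """Return exit_code if it is a documented plugin return code, else UNKNOWN.

    Hypothetical helper for illustration only; it is not part of Telegraf.
    """
    return exit_code if exit_code in NAGIOS_STATES else UNKNOWN
```

With this rule, the exit status 127 from the log above would be reported as state=3i instead of state=127i.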

2. Missing error information:

The error message from this check execution is nowhere to be found: neither in the error.log file nor in the service_output field of the metric. The message is thrown away in the parsing process if the error is not of type ExitError.

3. Error can only be found in error.log

The error can only be found in the Telegraf error log when it occurs. It should be shown in the service_output field itself.

Solution

The solution would be to return an 'unknown' state in case of an error and to put the error message into the service_output field. No error needs to be logged to the Telegraf error.log, because an unknown state with proper information is a valid check result.
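The proposed behavior can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Telegraf implementation: `run_check` is a hypothetical helper, the fallback from stdout to stderr is my assumption about where a loader error such as the missing libssl.so.10 message would appear, and the timeout default simply mirrors the '15s' from the config above.

```python
import subprocess

UNKNOWN = 3  # Nagios "unknown" state


def run_check(argv, timeout=15.0):
    """Run a check command and return (state, service_output).

    Hypothetical helper for illustration; it is not part of Telegraf.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as err:
        return UNKNOWN, f"check timed out after {err.timeout}s"
    except OSError as err:
        # The process never started (missing file, permission denied, ...).
        # Report the error text instead of dropping it.
        return UNKNOWN, str(err)

    # Any exit code outside 0..3 (for example 127) becomes UNKNOWN.
    state = proc.returncode if proc.returncode in (0, 1, 2, 3) else UNKNOWN
    # Assumption: prefer stdout, fall back to stderr when stdout is empty.
    output = proc.stdout.strip() or proc.stderr.strip()
    return state, output
```

In this sketch both failure modes from 'Expected behavior' end up as state 3 with a non-empty service_output, so nothing has to be written to the error log for the result to be usable.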


Labels: area/exec, bug (unexpected problem or unintended behavior)
