Skip to content

LoggingHandler: logger.exception fails to serialize exceptions as expected #4409

@nhubbard

Description

@nhubbard

Describe your environment

OS: Debian Bookworm (from python:3.12-slim Docker image)
Python version: 3.12.8
SDK version: 1.29.0
API version: 1.29.0

What happened?

We are trying to log exceptions from Pika using the Logs interface that's still in beta. When we get an AMQPConnectionError (or other exception), our code does the following:

try:
  # ...
except AMQPConnectionError as e:
  logger.exception(e)
  # other code, including a trace session

where logger is a standard Python logger that has a LoggingHandler added to it.

When such an error occurs, we instead get an exception related to failure to serialize the exception.

Steps to Reproduce

Install the following dependencies:

pika==1.3.2
opentelemetry-api==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-instrumentation-logging

Create a Python script, call it whatever you'd like. You don't have to spin up a collector or RabbitMQ instance to reproduce the bug.

import os
import pika
import logging

from opentelemetry import _logs as logs
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs.export import ConsoleLogExporter, BatchLogRecordProcessor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk.resources import Resource


# The function we use to create our LoggingHandler
def create_logger(name: str, console: bool = False) -> LoggingHandler:
    log_provider = LoggerProvider(
        resource=Resource.create(
            {"service.name": name, "service.instance.id": os.uname().nodename}
        )
    )
    logs.set_logger_provider(log_provider)

    # Create OTLP collector log exporter
    # Example URL we use in our Docker setup, feel free to change as necessary
    otlp_exporter = OTLPLogExporter(endpoint="http://collector:4317", insecure=True)
    log_provider.add_log_record_processor(BatchLogRecordProcessor(otlp_exporter))

    # Create console log exporter
    if console:
        console_exporter = ConsoleLogExporter()
        log_provider.add_log_record_processor(BatchLogRecordProcessor(console_exporter))

    return LoggingHandler(level=logging.DEBUG, logger_provider=log_provider)


logger = logging.getLogger(__name__)
logger.addHandler(create_logger(__name__, console=True))

try:
    # Don't start a RabbitMQ server for this issue -- it's easiest to reproduce the error
    # when the connection fails!
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(
            host="localhost:5672",
            heartbeat=600,
            blocked_connection_timeout=300,
        )
    )
    channel = connection.channel()
    channel.queue_declare(queue="example_queue", durable=True)
    channel.basic_qos(prefetch_count=1)
except Exception as e:
    logger.error(f"Failed to initialize Pika connection: {str(e)}")
    # This line triggers the issue
    logger.exception(e)

Expected Result

We were thinking that if the serialization doesn't pass, it could either ignore the log entry, or convert the exception to a string, possibly using traceback.format_exc. Instead, it throws an exception; I'm thinking that it could be an issue to discourage people from using logger.exception because OTel can't serialize the exception.

Actual Result

2025-02-03 11:06:58 Exception: Invalid type <class 'pika.exceptions.AMQPConnectionError'> of value 
2025-02-03 11:07:03 Exception while exporting logs.
2025-02-03 11:07:03 Traceback (most recent call last):
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 307, in _export_batch
2025-02-03 11:07:03     self._exporter.export(self._log_records[:idx])  # type: ignore
2025-02-03 11:07:03     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 111, in export
2025-02-03 11:07:03     return self._export(batch)
2025-02-03 11:07:03            ^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 300, in _export
2025-02-03 11:07:03     request=self._translate_data(data),
2025-02-03 11:07:03             ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 108, in _translate_data
2025-02-03 11:07:03     return encode_logs(data)
2025-02-03 11:07:03            ^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 37, in encode_logs
2025-02-03 11:07:03     return ExportLogsServiceRequest(resource_logs=_encode_resource_logs(batch))
2025-02-03 11:07:03                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 72, in _encode_resource_logs
2025-02-03 11:07:03     pb2_log = _encode_log(sdk_log)
2025-02-03 11:07:03               ^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 58, in _encode_log
2025-02-03 11:07:03     body=_encode_value(body) if body is not None else None,
2025-02-03 11:07:03          ^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py", line 90, in _encode_value
2025-02-03 11:07:03     raise Exception(f"Invalid type {type(value)} of value {value}")
2025-02-03 11:07:03 Exception: Invalid type <class 'pika.exceptions.AMQPConnectionError'> of value 

Note that you're more likely to get a socket.gaierror as the invalid type, but the traceback is still identical.

Workaround: inside the except block, use traceback.format_exc() to get a string representation of the traceback, and log that at the error level instead.

Additional context

The example should work just fine; I tested it on my machine. Hopefully it works for you as well.

We also record the exception using a Tracer span, but it has proven easier to extract exceptions from the logs than the tracer using Grafana, so until we figure out how to use Tempo to get exceptions, we've decided to do both logs and traces for capturing exceptions.

Would you like to implement a fix?

I'm not familiar with the inner workings of OTel, or how the specification defines such behavior, so I don't think I would be the right person to make a fix.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions