Skip to content

feat: improve error handling#140

Merged
NguyenHoangSon96 merged 10 commits intomainfrom
feat/improve-error-handling
May 27, 2025
Merged

feat: improve error handling#140
NguyenHoangSon96 merged 10 commits intomainfrom
feat/improve-error-handling

Conversation

@NguyenHoangSon96
Copy link
Contributor

@NguyenHoangSon96 NguyenHoangSon96 commented May 21, 2025

Closes #134

Proposed Changes

  • Query api will throws InfluxdbClientQueryError when catching ArrowException exceptions from gRPC servers
  • I want to create InfluxdbClientWriteError but it will break backwards compatibility for the write api If users already use InfluxDBError class
  • Some try catch blocks quite useless If we just catch generic exceptions like Exception class then throws - I think

Checklist

  • CHANGELOG.md updated
  • Rebased/mergeable
  • A test has been added if appropriate
  • Tests pass
  • Commit messages are conventional
  • Sign CLA (if not already signed)

@NguyenHoangSon96 NguyenHoangSon96 self-assigned this May 21, 2025
@NguyenHoangSon96 NguyenHoangSon96 linked an issue May 21, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.43%. Comparing base (cd55bcd) to head (e9b7ce4).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...write_client/client/util/multiprocessing_helper.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #140      +/-   ##
==========================================
- Coverage   64.70%   63.43%   -1.27%     
==========================================
  Files          33       34       +1     
  Lines        2207     2210       +3     
==========================================
- Hits         1428     1402      -26     
- Misses        779      808      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@karel-rehor karel-rehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at #134 I think the new classes InfluxdDBClientError and InfluxdbClientQueryError are a good start, but oclyke is looking for a taxonomy of more specific error definitions. Also class names could include the version 3 value to make sure they are distinct from other influxdb libraries.

Compare this to the Arrow Flight taxonomy of error classes. (https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightError.html).

Firstly from oclyke's comment I think the user does not want to see these Arrow Flight errors or to deal with them.

Secondly also from oclyke's comment I think the user wants to see an error class that is self documenting or self explanatory so that it is readily understood.

Thirdly we could implement a taxonomy similar to that of Arrow, which would wrap specific Arrow exceptions and their contents.

e.g.

class InfluxDB3ClientQueryServerError(InfluxDB3ClientError):
...
class InfluxDB3ClientQueryServerUnkownDBError(InfluxDB3ClientQueryServerError):
...
class InfluxDB3ClientQueryUnauthenticatedError(InfluxDB3ClientError):
...
class InfluxDB3ClientQueryUnavailableError(InfluxDB3ClientError):
...
etc.

We could then maybe have an ArrowErrorHandler class with a static method to map caught Arrow errors to classes in our error taxonomy

Copy link
Contributor

@karel-rehor karel-rehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. An item for a more finely defined error taxonomy across all client libraries has been opened in the backlog. So this PR represents only a partial solution to #140
  2. 1 older unit test was failing locally, so I've updated it. You may want to review or revert these changes.
  3. Some refactoring comments
    1. Since there already exists an InfluxDBError class in the write_client package, I think this needs to be taken into account.
    2. Since InfluxdbClientQueryError only deals with query errors, it could be declared in the query package
    3. another alternative might be to declare all exceptions/errors in a new exceptions package.

BTW. When running unit tests locally, even though all tests now pass, once they are finished, I keep running across a threading issue associated with urllib3 after the tests complete. I don't recall seeing this in the past. Needs further investigation.

...
RuntimeError: can't create new thread at interpreter shutdown
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x71b678a54b90>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x71b678a57e00>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
...

@NguyenHoangSon96
Copy link
Contributor Author

  1. An item for a more finely defined error taxonomy across all client libraries has been opened in the backlog. So this PR represents only a partial solution to feat: improve error handling #140

  2. 1 older unit test was failing locally, so I've updated it. You may want to review or revert these changes.

  3. Some refactoring comments

    1. Since there already exists an InfluxDBError class in the write_client package, I think this needs to be taken into account.
    2. Since InfluxdbClientQueryError only deals with query errors, it could be declared in the query package
    3. another alternative might be to declare all exceptions/errors in a new exceptions package.

BTW. When running unit tests locally, even though all tests now pass, once they are finished, I keep running across a threading issue associated with urllib3 after the tests complete. I don't recall seeing this in the past. Needs further investigation.

...
RuntimeError: can't create new thread at interpreter shutdown
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x71b678a54b90>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x71b678a57e00>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
...

Hi @karel-rehor
I have moved all exceptions to the exception package
Regarding the InfluxDBError class I don't want to touch it because of backward compatibility
And I changed all exception names to include the number 3

Copy link
Contributor

@karel-rehor karel-rehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package exceptions could be improved with an import statement in __init__.py.

It's best to avoid, whenever possible, imports that repeat the same token twice, like...

from influxdb_client_3.exceptions.exceptions import InfluxDB3ClientQueryError

Copy link
Contributor

@karel-rehor karel-rehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I located the source of the problem with urllib3 connection attempts cycling in _write_batching, so I updated this test with a forced disposal.

RuntimeError: can't create new thread at interpreter shutdown
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x79d5a779a780>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x79d5a7799520>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x79d5a779a630>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The retriable error occurred during request. Reason: '<urllib3.connection.HTTPSConnection object at 0x79d5a77986b0>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)'.
The batch item wasn't processed successfully because: HTTPSConnectionPool(host='none', port=443): Max retries exceeded with url: /api/v2/write?org=org&bucket=bucket&precision=ns (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x79d5a77aa7e0>: Failed to resolve 'none' ([Errno -3] Temporary failure in name resolution)"))

Test pipeline now takes 2 minutes instead of 5 minutes.

I see code coverage is missing something now in write/retry.py. This is likely a side-effect of this change. Retry was being called after the test_write_api_custom_options_no_error() test had completed. Execution of these code blocks in retry.py was inadvertent, because the test is supposed to be testing only options. A specific retry test for these dropped blocks could be added. But that now goes too far beyond the scope of this PR.

@NguyenHoangSon96
Copy link
Contributor Author

@karel-rehor Thank you 👨‍🔬

@NguyenHoangSon96 NguyenHoangSon96 merged commit 39bbea9 into main May 27, 2025
13 of 14 checks passed
@NguyenHoangSon96 NguyenHoangSon96 deleted the feat/improve-error-handling branch May 27, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Client exception handling improvements

2 participants