Take into account queue length in autoscaling#5684
Conversation
| try: | ||
| self._run() | ||
| except Exception: | ||
| logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
If you don't do this, the error message is lost forever trying to terminate nodes.
There was a problem hiding this comment.
Does this mean nodes probably weren't cleaned up? If so, would be good to print that in the error message.
There was a problem hiding this comment.
That's done below actually
3422e4a to
512d70c
Compare
|
Test PASSed. |
|
Test PASSed. |
| def testUpdate(self): | ||
| lm = LoadMetrics() | ||
| lm.update("1.1.1.1", {"CPU": 2}, {"CPU": 1}) | ||
| lm.update("1.1.1.1", {"CPU": 2}, {"CPU": 1}, {}) |
There was a problem hiding this comment.
Add some test cases for the new metric
There was a problem hiding this comment.
Yeah there are a couple entries below in testLoadMessages
| try: | ||
| self._run() | ||
| except Exception: | ||
| logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
Does this mean nodes probably weren't cleaned up? If so, would be good to print that in the error message.
| try: | ||
| self._run() | ||
| except Exception: | ||
| logger.exception("Error in monitor loop") |
There was a problem hiding this comment.
| logger.exception("Error in monitor loop") | |
| logger.exception("Error in monitor loop.") |
There was a problem hiding this comment.
I don't believe in periods in log messages
|
Merging so I can test this for real on compiled wheels -- had some issues trying to set it up before. |
|
Test FAILed. |
|
I just tested this on a real cluster and it works as expected. |
Why are these changes needed?
Related issue number
Checks
scripts/format.shto lint the changes in this PR.