Description
While working on #3756 and #3867, as well as a variety of early-stopping-related things (like #6424), I've found that the heavy use of positional indexing and tuple unpacking in early stopping and related code makes it really difficult to understand and modify.
For example:
https://github.com/microsoft/LightGBM/blob/53e0ddf7cd6eb281e3bec6273b19ff541c69bfa6/python-package/lightgbm/callback.py#L167-L176
and:
https://github.com/microsoft/LightGBM/blob/53e0ddf7cd6eb281e3bec6273b19ff541c69bfa6/python-package/lightgbm/callback.py#L410-L412
I'm opening this issue to track some work I'd like to do to simplify that.
Benefits of this work
Reduces the effort required to finish these:
And to add finer-grained control over early stopping and validation, e.g.:
Approach
I'm planning a series of PRs with the following types of changes:
- unpacking tuples into named variables and using those named variables
- introducing collections.namedtuple where appropriate to allow for named-attribute access while preserving backwards compatibility with the significant amount of custom code in the world that relies on lightgbm expecting tuples for things like custom metrics:
https://github.com/microsoft/LightGBM/blob/53e0ddf7cd6eb281e3bec6273b19ff541c69bfa6/python-package/lightgbm/engine.py#L135-L138
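A minimal sketch of why a namedtuple fits here. The EvalResult name, its field names, and the format_eval_result helper are hypothetical and chosen only for illustration; they are not lightgbm's actual API.

```python
from collections import namedtuple

# Hypothetical type and field names, for illustration only.
EvalResult = namedtuple(
    "EvalResult",
    ["dataset_name", "metric_name", "value", "is_higher_better"],
)


def format_eval_result(result):
    """Build a log line using named attributes instead of result[0], result[1], ..."""
    return f"{result.dataset_name}'s {result.metric_name}: {result.value:g}"


result = EvalResult("valid_0", "auc", 0.91, True)

# New code can read fields by name.
assert result.metric_name == "auc"

# Existing code that treats the value as a plain tuple keeps working:
# positional indexing, unpacking, isinstance checks, and equality.
dataset_name, metric_name, value, is_higher_better = result
assert result[2] == 0.91
assert isinstance(result, tuple)
assert result == ("valid_0", "auc", 0.91, True)
```

Because a namedtuple is a tuple subclass, functions that receive one can move to attribute access gradually, without breaking callers that still index or unpack positionally.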
Notes
Looking at the current state of this code, it's important to remember that when the lightgbm Python package was first introduced 8+ years ago:
- dataclasses was not in the standard library yet (that came in Python 3.7, first released in June 2018)