[python-package] stop relying on string concatenation / splitting for cv() eval results by jameslamb · Pull Request #6761 · lightgbm-org/LightGBM

jameslamb · 2024-12-17T05:37:54Z

Contributes to #6748

There are a few activities where lightgbm (the Python package) needs to inspect the output of one or more evaluation metrics on one or more datasets.

For example:

early stopping
printing evaluation results
recording evaluation results in memory (e.g. in a dictionary) for use after training

train() and the other APIs that build on it track these results using a list of tuples like this (pseudocode):

[
  ({dataset_name}, {metric_name}, {is_higher_better}, {metric_value}),
  ...
]

cv() does something similar. However, its "metric value" is actually a mean of such values taken over all cross-validation folds. Because multiple values are being aggregated, it appends a 5th item with the standard deviation.

[
  ({dataset_name}, {metric_name}, {is_higher_better}, mean({metric_value}), stddev({metric_value})),
  ...
]
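To make the two shapes concrete, here is a minimal sketch of how per-fold results might be collapsed into one 5-element tuple per (dataset, metric) pair. The function name `aggregate_folds` and the sample numbers are hypothetical, and `statistics.pstdev` stands in for a population standard deviation. The element order follows the engine.py lines quoted later in this PR (value at index 2, higher-is-better flag at index 3) rather than the pseudocode ordering.

```python
from statistics import fmean, pstdev

# Hypothetical per-fold results, one inner list per cross-validation fold.
# Each tuple: (dataset_name, metric_name, metric_value, is_higher_better)
fold_results = [
    [("valid1", "auc", 0.80, True), ("valid1", "l2", 0.30, False)],
    [("valid1", "auc", 0.90, True), ("valid1", "l2", 0.10, False)],
]


def aggregate_folds(folds):
    """Collapse per-fold 4-tuples into one 5-tuple per (dataset, metric)."""
    values = {}
    higher_better = {}
    for fold in folds:
        for dataset_name, metric_name, value, is_higher_better in fold:
            key = (dataset_name, metric_name)
            values.setdefault(key, []).append(value)
            higher_better[key] = is_higher_better
    # The 5th element is the standard deviation across folds.
    return [
        (key[0], key[1], fmean(v), higher_better[key], pstdev(v))
        for key, v in values.items()
    ]


aggregated = aggregate_folds(fold_results)
```

Each element of `aggregated` has five items, while each per-fold tuple has four; that difference in length is the only structural difference between the two shapes in this sketch.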

Some code in callback.py needs to know, given a list of such tuples, whether they were produced by cross-validation or regular train().

To facilitate that while still somewhat preserving the schema for the tuples, the cv() code:

concatenates the first and second elements into one string
prepends the string literal "cv_agg" to the tuple

So e.g. ("valid1", "auc", ...) becomes ("cv_agg", "valid1 auc", ...). That happens here:
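As a rough illustration of that legacy transformation (not the actual engine.py code), the helper below builds the old-style tuple from separate names. `to_legacy_cv_tuple` is a hypothetical name, and the position of the mean, flag, and standard deviation follows the removed engine.py line shown in the diff further down.

```python
def to_legacy_cv_tuple(dataset_name, metric_name, mean_value, is_higher_better, stdv):
    """Build the pre-PR cv() tuple: a "cv_agg" marker plus a space-joined name."""
    # The dataset and metric names are merged into a single string...
    combined_name = f"{dataset_name} {metric_name}"
    # ...and the literal "cv_agg" takes the first slot of the tuple.
    return ("cv_agg", combined_name, mean_value, is_higher_better, stdv)


legacy = to_legacy_cv_tuple("valid1", "auc", 0.85, True, 0.05)
```

The result still has a string in each of the first two slots, which is how the tuple "somewhat" preserved the original schema while carrying the marker.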

https://github.com/microsoft/LightGBM/blob/480600b3afaf2a0a6f32cf417edf9567f625b2c3/python-package/lightgbm/engine.py#L580-L592

Every place that handles such tuples then has to account for that, including splitting and re-combining that second element. Like this:

https://github.com/microsoft/LightGBM/blob/480600b3afaf2a0a6f32cf417edf9567f625b2c3/python-package/lightgbm/callback.py#L416-L418
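The kind of handling this forces on consumers can be sketched as follows. This is an assumption-laden illustration, not the code at the link above: `split_legacy_result` is a hypothetical helper that recovers the dataset and metric names from either tuple shape.

```python
def split_legacy_result(result):
    """Return (dataset_name, metric_name) for a train()-style or legacy cv() tuple."""
    if result[0] == "cv_agg":
        # Legacy cv() shape: the names were joined with a single space,
        # so they have to be split apart again.
        dataset_name, metric_name = result[1].split(" ", 1)
        return dataset_name, metric_name
    # train() shape: the names are already separate elements.
    return result[0], result[1]


names_from_cv = split_legacy_result(("cv_agg", "valid1 auc", 0.85, True, 0.05))
names_from_train = split_legacy_result(("valid1", "auc", 0.85, True))
# A dataset name containing a space cannot be recovered unambiguously.
names_ambiguous = split_legacy_result(("cv_agg", "my valid auc", 0.85, True, 0.05))
```

In this sketch, the last call returns `("my", "valid auc")` even though the names could originally have been `"my valid"` and `"auc"`, which illustrates why round-tripping names through a joined string is fragile.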

This proposes changes to remove that, so that the cv() and train() tuples follow a similar schema and all the complexity of splitting and re-combining names can be removed.
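With a shared schema, a consumer could unpack names directly and treat the standard deviation as an optional trailing element. The sketch below is one possible shape for such a consumer, not necessarily the code in this PR; `format_eval_result` and its `show_stdv` parameter are hypothetical here, and the element order again follows the engine.py diff (value at index 2).

```python
def format_eval_result(result, show_stdv=True):
    """Format a 4-element train() tuple or a 5-element cv() tuple as text."""
    # The first three elements have the same meaning in both shapes.
    dataset_name, metric_name, metric_value = result[0], result[1], result[2]
    out = f"{dataset_name}'s {metric_name}: {metric_value:g}"
    # Only aggregated cv() results carry a 5th element (standard deviation).
    if show_stdv and len(result) == 5:
        out += f" + {result[4]:g}"
    return out


train_text = format_eval_result(("valid1", "auc", 0.85, True))
cv_text = format_eval_result(("valid1", "auc", 0.85, True, 0.05))
```

No string splitting is needed in this version, since the dataset and metric names stay in separate tuple slots for both code paths.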

It also standardizes on the names from #6749 (comment)

Notes for Reviewers

This change should be completely backwards-compatible, including with user-provided custom metric functions. The code paths here are well-covered by tests (as I found out from many failed tests while developing this 😅 ).

jameslamb · 2024-12-17T05:38:49Z

python-package/lightgbm/engine.py

-            metric_type[key] = one_line[3]
-            cvmap.setdefault(key, [])
-            cvmap[key].append(one_line[2])
-    return [("cv_agg", k, float(np.mean(v)), metric_type[k], float(np.std(v))) for k, v in cvmap.items()]


This, removing this "cv_agg" string literal, is the key change... everything else flows from that.

StrikerRUS

LGTM! Thanks a lot for the clear refactoring!
Just some minor suggestions.

StrikerRUS

LGTM! Thank you for working on this and for taking my comments into account!

github-actions · 2025-12-25T00:12:48Z

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added 8 commits December 16, 2024 20:35

unpack into named variables

3c97410

still working

3c94663

more simplification

57a080f

more refactoring

599a732

more refactoring

4fce6ad

simplify _is_train_set()

7386897

bit more refactoring

cf12c81

update _using_cv() check

a989f5f

jameslamb added the in progress and maintenance labels Dec 17, 2024

jameslamb commented Dec 17, 2024

formatting

4176327

jameslamb changed the title ~~WIP: [python-package] stop relying on string concatenation / splitting for cv() eval results~~ [python-package] stop relying on string concatenation / splitting for cv() eval results Dec 17, 2024

jameslamb added the awaiting review label and removed the in progress label Dec 17, 2024

jameslamb marked this pull request as ready for review December 17, 2024 05:56

jameslamb requested review from StrikerRUS, borchero, guolinke, jmoralez and shiyu1994 as code owners December 17, 2024 05:56

StrikerRUS requested changes Dec 17, 2024

python-package/lightgbm/callback.py (2 review threads, outdated)

python-package/lightgbm/engine.py (2 review threads, outdated)

jameslamb and others added 5 commits December 17, 2024 23:26

Update python-package/lightgbm/engine.py

d116279

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

simplify

5b25aed

merge master

47bd9d7

Merge branch 'python/remove-cv-agg' of github.com:microsoft/LightGBM into python/remove-cv-agg

c5a69fd

ruff auto-formatting

b8220b7

jameslamb requested a review from StrikerRUS December 18, 2024 06:05

StrikerRUS approved these changes Dec 19, 2024

jameslamb removed the awaiting review label Dec 22, 2024

jameslamb merged commit 4ee0bc0 into master Dec 22, 2024

jameslamb deleted the python/remove-cv-agg branch December 22, 2024 15:27

ffineis mentioned this pull request Feb 19, 2025

[fix] lgbm 4.6.0 compatibility optuna/optuna-integration#207

Merged

github-actions bot locked as resolved and limited conversation to collaborators Dec 25, 2025

jameslamb merged 14 commits into master from python/remove-cv-agg
