
Conversation

@ValentinGebhart (Collaborator) commented Jul 17, 2025

While checking the impact yearsets tutorial, I noticed a few potential improvements to the yearset module. The main conceptual point was that the impact events were sampled without replacement. This seemed to work well if all events had the same frequency, but resulted in strange behavior if the frequencies varied a lot (see the example in the comment below). Also, the frequency of the final impact object was unclear.

Changes proposed in this PR:

  • When sampling without replacement (the default behavior), a warning is now issued if the frequencies of the different events are not equal.
  • The yearly impact object now has a frequency of 1/n_years, where n_years is the number of sampled years.
  • The yearly impact object used to be a deep copy of the original impact object with a few parameters changed (e.g., the impact matrix was transferred, which I think does not make sense). Now the yearly impact object is a fresh impact object filled with the relevant attributes only.
  • The correction factor is now applied by multiplication rather than division, so that the yearly impact aai_agg equals the aai_agg of the original impact object (see the sketch after this list).
  • Minor changes in the tutorial.
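
To illustrate the correction-factor point, here is a minimal sketch (with illustrative names, not the module's exact code) of computing the factor from the original aai_agg and applying it by multiplication:

import numpy as np

def apply_correction(yearly_at_event, original_aai_agg):
    # average annual impact of the sampled years (each year has frequency 1 / n_years)
    sampled_aai = np.mean(yearly_at_event)
    # multiplying by original_aai_agg / sampled_aai rescales the samples so that
    # the aai_agg of the year set equals the aai_agg of the original impact object
    correction_fac = original_aai_agg / sampled_aai
    return yearly_at_event * correction_fac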

PR Author Checklist

PR Reviewer Checklist

@ValentinGebhart (Collaborator Author) commented Jul 17, 2025

This is an example for which the earlier sampling without replacement was not reasonable. One of the impact events has a significantly larger frequency than the others. Once it was drawn, the remaining events were drawn until none were left. With replacement, the high-frequency event would be drawn more often.

import numpy as np
import climada.util.yearsets as yearsets
from climada.engine import Impact

imp = Impact()
imp.at_event = np.arange(10, 110, 10)
imp.frequency = np.concatenate([np.array([1.]), np.ones(9) * 0.01]) # highly unbalanced frequencies, first event 100 times more likely

# the number of years to sample impacts for (len(yimp.at_event) == sampled_years)
sampled_years = 10

# sample number of events per sampled year
lam = np.sum(imp.frequency)
events_per_year = yearsets.sample_from_poisson(sampled_years, lam, seed=123)
sampling_vect = yearsets.sample_events(events_per_year, imp.frequency, seed=123)
sampling_vect
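
The first result below was presumably obtained by additionally passing with_replacement=False to sample_events; the keyword name is taken from the warning message, and attaching it to sample_events is an assumption:

# reproduce the old behavior: draw the events without replacement
sampling_vect_no_repl = yearsets.sample_events(
    events_per_year, imp.frequency, seed=123, with_replacement=False
)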

Result when setting the optional parameter with_replacement=False, similar to the old version

2025-07-22 16:59:12,925 - climada.util.yearsets - WARNING - The frequencies of the different events are not equal. This can lead to distorted sampling if the frequencies vary significantly. To avoid this, please set with_replacement=True to sample with replacement instead.
[array([0]),
 array([], dtype=int64),
 array([7, 1]),
 array([6, 2]),
 array([8]),
 array([5]),
 array([], dtype=int64),
 array([9]),
 array([], dtype=int64),
 array([], dtype=int64)]

Result in the new version

[array([0]),
 array([], dtype=int64),
 array([0, 0]),
 array([0, 0]),
 array([0]),
 array([1]),
 array([], dtype=int64),
 array([0]),
 array([], dtype=int64),
 array([], dtype=int64)]

@ValentinGebhart ValentinGebhart requested review from carmensteinmann and removed request for emanuel-schmid and peanutfun July 17, 2025 14:08

@chahank (Member) left a comment

Good catch! I think the implementation goes in the right direction, but I would improve the way it is presented to the user a bit.

Also, the other improvements look good to me. I might look at the code in detail later to be sure it does what it should do if you want.

Comment on lines 242 to 246

if with_replacement is False:
    LOGGER.warning(
        "Sampling without replacement not implemented for events with varying "
        "frequencies. Events are sampled with replacement."
    )

Member

This is a very confusing way to implement things. Having parameter combinations that work and others that do not should be avoided if possible.

Also, I am a bit confused here. This is just a warning, while I think it should be an error, since you want the code not to run for this combination of input parameters. Silently changing inputs during calculations is also to be avoided.

But more generally, why raise an error if the frequencies are not equal? Would it not make sense to have a default behaviour (unequal frequencies with replacement, equal frequencies without replacement) that the user can overwrite if they want? Or, even more simply, always sample with replacement?

Collaborator Author

Thanks for the feedback :)

I see your point. Actually, I also thought the easiest way would be to just always sample with replacement, but maybe there was a reason to implement sampling without replacement that I am not aware of.

I suggest this implementation:

  • default: sample with replacement
  • Optional parameter with_replacement can be set to False to sample without replacement. If frequencies of the events are not equal, we raise an error.

What do you think?

Member

Optional parameter with_replacement can be set to False to sample without replacement. If frequencies of the events are not equal, we raise an error.

Would you say that it is always wrong to sample without replacement when the frequencies are unequal? I understood that it is only a problem in certain cases. So maybe rather raise a warning and explain the potential problem?

Collaborator Author

Hm, good question. I think that, strictly speaking, sampling without replacement (as it was implemented) is always wrong if the frequencies are unequal. However, I guess the error is small if the frequencies are not too different, and it only becomes significant in cases like the one above.
For me it would be okay to just raise a warning, as you suggest. Should I implement that?

Member

Sounds like a good compromise. Anyway, the default will be with replacement, and thus most users won't have problems.

Collaborator Author

I have now changed the default to sample with replacement, and a warning is issued when sampling without replacement with unequal event frequencies.
I also changed the default to not apply the correction factor, so that by default the yearly impacts are actual samples of the original impact object.
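
A minimal sketch of what this check could look like (function and variable names are assumed, not the exact PR code); the warning text mirrors the log message shown above:

import logging
import numpy as np

LOGGER = logging.getLogger(__name__)

def warn_if_unequal_frequencies(frequencies, with_replacement):
    # warn when sampling without replacement although event frequencies differ,
    # since this can distort the sampling
    if not with_replacement and not np.allclose(frequencies, frequencies[0]):
        LOGGER.warning(
            "The frequencies of the different events are not equal. This can lead "
            "to distorted sampling if the frequencies vary significantly. To avoid "
            "this, please set with_replacement=True to sample with replacement "
            "instead."
        )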

Member

Great! Just so that we do not introduce a confusing default change, can @carmensteinmann please confirm that setting the correction factor application to false by default is a good idea?


def impact_yearset_from_sampling_vect(
-    imp, sampled_years, sampling_vect, correction_fac=True
+    imp, sampled_years, sampling_vect, correction_fac=False

Member

If we keep it as is (pending confirmation by @carmensteinmann), we should adjust the docstring. I unfortunately cannot suggest a change here directly in the comment.

@chahank (Member) commented Jul 23, 2025

So that we do not forget, the changelog needs to be updated once done.

    event_id=np.arange(1, len(sampled_years) + 1),
    unit=imp.unit,
    haz_type=imp.haz_type,
    frequency=np.ones(len(sampled_years)) / len(sampled_years),

@chahank (Member) Sep 10, 2025

Suggested change:
-    frequency=np.ones(len(sampled_years)) / len(sampled_years),
+    frequency=np.ones(len(sampled_years)) / len(sampled_years),
+    frequency_unit='1/year'
+    aai_agg=np.sum(yimp.frequency * yimp.at_event)

Member

And eai_exp is not defined?

Collaborator Author

To your suggestion: I think yimp is not yet defined at that point, so I cannot use it in its own initialization (when defining aai_agg), right? I could define it in the initialization using aai_agg=np.sum(imp_per_year) / len(sampled_years); would that be better? Then the frequency would be hard-coded to be constant, though.

To eai_exp: I believe the impact year set does not have spatial information anymore: it builds upon imp.at_event of the original impact object, so it is aggregated over the locations. We could add eai_exp with one entry equal to the aai_agg, if you think that would be useful?

Member

Ah, correct. Yes, just use the frequency definition instead, so

Suggested change:
-    frequency=np.ones(len(sampled_years)) / len(sampled_years),
+    frequency=frequency,
+    aai_agg=np.sum(frequency * imp_per_year)

and somewhere above frequency=np.ones(len(sampled_years)) / len(sampled_years).
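
Putting the two suggestions together, the initialization could look roughly like this (a sketch only; at_event=imp_per_year and the variable names around it are assumed from the snippets above):

frequency = np.ones(len(sampled_years)) / len(sampled_years)

yimp = Impact(
    event_id=np.arange(1, len(sampled_years) + 1),
    unit=imp.unit,
    haz_type=imp.haz_type,
    at_event=imp_per_year,  # assumed: aggregated impact per sampled year
    frequency=frequency,
    frequency_unit="1/year",
    aai_agg=np.sum(frequency * imp_per_year),
)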

Member

In the previous implementation, eai_exp remained whatever was in imp.eai_exp. Not that this was necessarily a good idea, but the current implementation changes the output.

Collaborator Author

Yes, in the previous implementation, yimp was simply a copy of the original impact object with just a few attributes changed. E.g., the impact matrix was also transferred if the original impact object had one. I think this results in a quite confusing output object (a kind of hybrid between original events and sampled yearly impacts). That is why I changed it so that yimp only has attributes that correspond to the yearly impacts, and yes, that output can look quite different than before.

Does it make sense to change the output in this way? Or are you saying that it is problematic to change the output?

Collaborator Author

Or are you saying that it would be valuable to additionally transfer just imp.eai_exp, as there is still some spatial info in it? Even though the numbers of imp.eai_exp would then not add up to imp.aai_agg if we do not use the correction factor.

Member

I see, it is much clearer now without carrying over the attributes from before, I agree. Maybe mention it in the changelog and we are good.

The eai_exp should be computable from the impact matrix and the sampling vector, right? But this would probably be a separate PR.
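
For reference, a hypothetical sketch of how eai_exp could be derived from the original impact matrix and the sampling vector (not part of this PR; imp_mat is assumed to be an (n_events x n_exposure_points) matrix, and each sampled year is assumed to carry frequency 1/n_years):

import numpy as np

def yearly_eai_exp(imp_mat, sampling_vect):
    # sum the per-location impacts of the events drawn in each sampled year,
    # then average over the years (each year carries frequency 1 / n_years)
    n_years = len(sampling_vect)
    total = np.zeros(imp_mat.shape[1])
    for events in sampling_vect:
        if len(events) > 0:
            total += np.asarray(imp_mat[events, :].sum(axis=0)).ravel()
    return total / n_years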

@ValentinGebhart (Collaborator Author) Sep 11, 2025

Perfect, I'll adapt the changelog.

Yes, using the original impact matrix and the sampling vector, one could compute the eai_exp of the sampled impacts. Still, the yimp does not have any location information anymore (e.g. centroids), so maybe it would also be confusing to save the eai_exp in the yimp object.
A different way would be to compute a spatially explicit yearly impact set, i.e., one where the sampled yearly impacts are saved per centroid (and not aggregated). There the eai_exp would make a lot of sense. If we think this would be interesting, we can do it in a separate PR :)

Member

I think this would be interesting in the context of the multi-hazard impact module #1077. I would suggest focusing on the multi-hazard impact module instead of tweaking the existing yearset util methods.

    frequency=np.ones(len(sampled_years)) / len(sampled_years),
)

yimp.aai_agg = np.sum(yimp.frequency * yimp.at_event)

Member

Suggested change (remove the line):
-yimp.aai_agg = np.sum(yimp.frequency * yimp.at_event)

@chahank (Member) commented Sep 10, 2025

Small cosmetic suggestion and a question regarding the change in the eai_exp value compared to the previous implementation.

Once this last question is resolved, good to go!

@ValentinGebhart (Collaborator Author)

After discussion with Carmen, we decided to keep the default parameters as before:

  • sampling WITHOUT replacement: better inclusion of all original events in the sampling, hoping that situations with very unbalanced event frequencies are "uncommon";
  • applying the correction factor: the aai_agg of the year set and of the original impact object are equal, even though the sampled impacts have been changed.

I adapted the docstrings, tutorial and changelog accordingly (see the sketch below).
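
For completeness, a hedged usage sketch continuing the example from the first comment; the function name and the correction_fac parameter appear in the diff above, while the year labels and the aai_agg assignment on the toy impact are assumptions:

# the correction factor compares against the original impact's aai_agg, so set
# it on the toy object from above (an assumption about what the function needs)
imp.aai_agg = np.sum(imp.frequency * imp.at_event)

years = list(range(2001, 2001 + sampled_years))  # assumed: a list of year labels
yimp = yearsets.impact_yearset_from_sampling_vect(
    imp, years, sampling_vect, correction_fac=True  # defaults kept as before
)
print(yimp.at_event, yimp.frequency, yimp.aai_agg)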

@ValentinGebhart ValentinGebhart merged commit 16a5ca7 into develop Sep 23, 2025
17 of 19 checks passed
@ValentinGebhart ValentinGebhart deleted the feature/review_yearsets_tutorial branch September 23, 2025 07:52