[ENH] Add variance_factor to DummyProbaRegressor #860

Open
hrshx3o5o6 wants to merge 2 commits into sktime:main from hrshx3o5o6:feature/scaled-variance-dummy

@hrshx3o5o6

This PR adds a variance_factor option to the DummyProbaRegressor when the underlying strategy is normal.

Previously, the normal strategy correctly predicted a Normal distribution with the empirical standard deviation. However, probabilistic modeling often requires baselines whose predictive uncertainty is deliberately widened or reduced. This enhancement lets practitioners scale the predicted variance directly, so that the baseline predicts a Normal distribution with variance $\sigma_{training}^2 \times \text{factor}$; equivalently, the standard deviation is multiplied by $\sqrt{\text{factor}}$. I ran the validations locally and both passed.

Resolves #7, implementing a still-unchecked baseline from that roadmap: "always predict a Gaussian with mean = training mean, var = training var".
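
For illustration, the behavior described above can be sketched with Python's standard library alone. The helper `fit_scaled_normal_baseline` is hypothetical and is not the code in this PR; it assumes the population standard deviation (ddof=0) and returns a `statistics.NormalDist` whose mean is the training mean and whose variance is the training variance times `variance_factor`. With `variance_factor=1` it reduces to the baseline from #7.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_scaled_normal_baseline(y_train, variance_factor=1.0):
    """Hypothetical sketch: one Normal distribution predicted for every test point.

    mean     = training mean
    variance = training variance * variance_factor
    Assumes the population standard deviation (ddof=0); the PR's code may differ.
    """
    if variance_factor <= 0:
        raise ValueError("variance_factor must be positive")
    mu = fmean(y_train)
    sigma = pstdev(y_train, mu)
    # Multiplying the variance by f multiplies the standard deviation by sqrt(f).
    return NormalDist(mu, sigma * math.sqrt(variance_factor))


y_train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population std 2

base = fit_scaled_normal_baseline(y_train)        # factor 1: the #7 baseline
wide = fit_scaled_normal_baseline(y_train, 2.25)  # variance x 2.25, std x 1.5
print(base.mean, base.stdev, wide.stdev)          # 5.0 2.0 3.0
```

In the estimator itself this would correspond to the parameter proposed here, e.g. `DummyProbaRegressor(strategy="normal", variance_factor=2.25)`.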

@fkiraly fkiraly left a comment

Collaborator

Thanks. Can you give an example in which this would be useful?

@hrshx3o5o6

Author

> Thanks. Can you give an example in which this would be useful?

The main use case is creating tunable, conservative probabilistic baselines. In real-world applications (finance, supply chain, forecasting), future data is often more volatile than training data. The standard DummyProbaRegressor can be unrealistically confident.

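By setting variance_factor > 1, users can easily construct a baseline with wider prediction intervals. When the test data really are more volatile than the training data, this wider baseline is better calibrated and scores better on proper scoring rules such as CRPS or log-likelihood, so it is a stricter benchmark to beat than a naive, unscaled dummy model.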
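
A minimal sketch of the CRPS argument, assuming Normal predictive distributions and using the standard closed-form CRPS of a Normal. The data are synthetic and deterministic (evenly spaced quantiles instead of random draws): the baseline is assumed to be fitted on training data with mean 0 and standard deviation 1, and it is scored on test data with standard deviation 2 in one case and 1 in the other. The function names are illustrative only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_crps(dist, y):
    """Closed-form CRPS of a Normal predictive distribution `dist` at observation y."""
    z = (y - dist.mean) / dist.stdev
    return dist.stdev * (
        z * (2 * STD_NORMAL.cdf(z) - 1) + 2 * STD_NORMAL.pdf(z) - 1 / math.sqrt(math.pi)
    )


def mean_crps(dist, ys):
    """Average CRPS over a test set (lower is better)."""
    return sum(normal_crps(dist, y) for y in ys) / len(ys)


def normal_quantiles(sigma, n=2000):
    """Deterministic stand-in for n draws from Normal(0, sigma): evenly spaced quantiles."""
    dist = NormalDist(0.0, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Baselines fitted on training data assumed to have mean 0 and std 1.
unscaled = NormalDist(0.0, 1.0)                 # variance_factor = 1
scaled = NormalDist(0.0, 1.0 * math.sqrt(4.0))  # variance_factor = 4 -> std doubled

y_volatile = normal_quantiles(sigma=2.0)  # test data twice as volatile as training
y_same = normal_quantiles(sigma=1.0)      # test data as volatile as training

print(mean_crps(unscaled, y_volatile), mean_crps(scaled, y_volatile))  # scaled is lower
print(mean_crps(unscaled, y_same), mean_crps(scaled, y_same))          # unscaled is lower
```

The widened baseline wins only when the test data are in fact more volatile; when they are not, the unscaled baseline has the lower CRPS, so the factor has to be chosen with some care.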

@fkiraly

fkiraly commented Mar 14, 2026

Collaborator

How would you set the variance_factor in practice?

@hrshx3o5o6

Author

> How would you set the variance_factor in practice?

In practice, the value of the variance_factor could be determined in these ways:

  • Hyperparameter tuning: when stacking models or using the dummy as a fallback regressor, variance_factor can be tuned with GridSearchCV against a probabilistic metric to find an optimal value.

  • Domain knowledge / heuristics: in domains such as finance or retail, practitioners may already have a volatility multiplier in mind (e.g., "sales might be 20% more volatile this quarter"). Setting variance_factor accordingly injects that domain knowledge directly into the baseline.

  • Calibration from residual statistics: say the unscaled dummy's prediction intervals cover only 80% of the test set, but the target is 95% coverage. In that case, variance_factor can be increased until the baseline achieves the target empirical coverage.
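
The tuning and calibration approaches above can be sketched with the standard library, assuming a Normal baseline with training mean `mu` and standard deviation `sigma` and a held-out validation set. `calibrate_variance_factor` picks the smallest factor on a grid whose central 95% interval reaches 95% empirical coverage; `best_factor_by_log_likelihood` mimics what a grid search scored by log-likelihood would do. Both helpers and the candidate grid are hypothetical, and the validation data are synthetic.

```python
import math
from statistics import NormalDist

# Hypothetical grid of candidate variance factors: 1.00, 1.05, ..., 10.00
CANDIDATES = tuple(round(1.0 + 0.05 * k, 2) for k in range(181))


def interval_coverage(mu, sigma, y_val, level=0.95):
    """Fraction of y_val inside the central `level` interval of Normal(mu, sigma)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = mu - z * sigma, mu + z * sigma
    return sum(lo <= y <= hi for y in y_val) / len(y_val)


def calibrate_variance_factor(mu, sigma, y_val, level=0.95, candidates=CANDIDATES):
    """Smallest candidate factor whose interval reaches `level` empirical coverage."""
    for f in candidates:
        if interval_coverage(mu, sigma * math.sqrt(f), y_val, level) >= level:
            return f
    return None  # no candidate on the grid reached the target coverage


def mean_log_likelihood(mu, sigma, y_val, factor):
    """Average Normal log-density of y_val under variance sigma**2 * factor."""
    var = sigma**2 * factor
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var) for y in y_val
    ) / len(y_val)


def best_factor_by_log_likelihood(mu, sigma, y_val, candidates=CANDIDATES):
    """Grid search: the candidate factor with the highest mean log-likelihood."""
    return max(candidates, key=lambda f: mean_log_likelihood(mu, sigma, y_val, f))


# Synthetic example: baseline fitted on data with mean 0 and std 1, while the
# validation targets are more volatile (std 1.5), via evenly spaced quantiles.
mu, sigma = 0.0, 1.0
n = 1000
y_val = [NormalDist(0.0, 1.5).inv_cdf((i + 0.5) / n) for i in range(n)]

print(interval_coverage(mu, sigma, y_val))             # unscaled: well below 0.95
print(calibrate_variance_factor(mu, sigma, y_val))     # close to 1.5**2 = 2.25
print(best_factor_by_log_likelihood(mu, sigma, y_val)) # also close to 2.25
```

If a domain expert's "20% more volatile" refers to the standard deviation, then under the variance = $\sigma_{training}^2 \times \text{factor}$ definition the matching setting is variance_factor = 1.2² = 1.44 rather than 1.2.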

@hrshx3o5o6

Author

@fkiraly just checking in to see if there is any update on this?

Labels

enhancement module:regression probabilistic regression module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] roadmap of probabilistic regressors to implement or to interface

2 participants