A wishlist for probabilistic regression methods to implement or interface.
This is partly copied from the list I made when designing the R counterpart, mlr-org/mlr3proba#32.
The number of stars at the end of each item is the estimated difficulty or time investment.
GLM
generalized linear model(s) with continuous response family, e.g., Gaussian *
Gaussian family, statsmodels
further continuous families: Gamma, Tweedie, inverse Gaussian
generalized linear model(s) with count response family, e.g., Poisson *
Poisson family, statsmodels
Poisson family, sklearn
further families: Binomial
heteroscedastic linear regression ***
Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***
KRR aka Gaussian process regression
vanilla kernel ridge regression with fixed kernel parameters and variance *
kernel ridge regression with MLE for kernel parameters and regularization parameter **
heteroscedastic KRR or Gaussian processes ***
CDE
variants of conditional density estimation (Nadaraya-Watson type) **
reduction to density estimation by binning of input variables, then apply unconditional density estimation **
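To make the Nadaraya-Watson item concrete, here is a minimal sketch for a single feature, using only the Python standard library. The function name `nw_conditional_density` and the bandwidth arguments `h_x`, `h_y` are hypothetical, and Gaussian kernels are an assumption; nothing here reflects an existing package interface.

```python
from statistics import NormalDist


def nw_conditional_density(x_train, y_train, h_x, h_y):
    """Return an estimate of f(y | x) built from Gaussian kernels in x and y.

    Hypothetical helper for illustration; h_x and h_y are fixed bandwidths.
    """
    kernel_x = NormalDist(0.0, h_x)

    def density(y, x):
        # Kernel weights: training points close to x count more.
        weights = [kernel_x.pdf(x - xi) for xi in x_train]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("x is too far from all training points")
        # Weighted mixture of Gaussian bumps centred at the training targets.
        mix = sum(
            w * NormalDist(yi, h_y).pdf(y) for w, yi in zip(weights, y_train)
        )
        return mix / total

    return density


x_train = [0.0, 0.1, 1.0, 1.1]
y_train = [0.0, 0.2, 2.0, 2.2]
density = nw_conditional_density(x_train, y_train, h_x=0.2, h_y=0.3)
print(density(0.1, 0.0), density(2.0, 0.0))
```

Near `x = 0` the estimate puts most of its mass around the targets of the nearby training points. Bandwidth selection is left out on purpose, and is probably where most of the real work sits.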
Gradient boosting and tree-based
ngboost package interface *
probabilistic residual boosting **
probabilistic regression trees **
Neural networks
interface tensorflow probability - some hard-coded NN architectures **
generic tensorflow probability interface - some hard-coded NN architectures ***
Composite techniques, reduction to deterministic regression
plug the mean and sd of a deterministic regressor which already returns both into some location/scale distr family (Gaussian, Laplace) *
use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
reduction via bootstrapped sampling of a deterministic regressor **
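As an illustration of the "model 1 for the mean, model 2 fit to residuals" item, here is a rough sketch with a single feature and plain least squares for both models. The names `fit_line`, `fit_two_stage` and `min_sd` are hypothetical. The sketch assumes a Gaussian family, absolute residuals as the second target, and a lower bound on the sd in the spirit of the thresholder item.

```python
from math import pi, sqrt
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with a single feature."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda t: a + b * t


def fit_two_stage(x, y, min_sd=1e-6):
    """Model 1 predicts the mean, model 2 predicts the absolute residual."""
    mean_model = fit_line(x, y)
    abs_resid = [abs(yi - mean_model(xi)) for xi, yi in zip(x, y)]
    scale_model = fit_line(x, abs_resid)

    def predict(t):
        # For a Gaussian, E|error| = sd * sqrt(2 / pi), so rescale to an sd.
        sd = sqrt(pi / 2.0) * scale_model(t)
        # Forced lower bound, so the predicted sd is never zero or negative.
        return NormalDist(mean_model(t), max(sd, min_sd))

    return predict


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.5, 1.0, 5.5, 4.0, 10.5, 7.0, 15.5, 10.0]  # noise grows with x
predict = fit_two_stage(x, y)
print(predict(1.0).stdev, predict(6.0).stdev)
```

Any deterministic regressor could stand in for `fit_line`; squared or log residuals would need a different conversion to the scale parameter.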
Ensembling type pipeline elements and compositors
simple bagging, averaging of pdf/cdf **
probabilistic boosting ***
probabilistic stacking ***
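The "averaging of pdf/cdf" in simple bagging amounts to an equal-weight mixture of the members' predictive distributions. A minimal sketch, assuming each ensemble member returns a `statistics.NormalDist` for the test point (the class name `EqualWeightMixture` is hypothetical):

```python
from statistics import NormalDist, fmean


class EqualWeightMixture:
    """Average of the pdfs/cdfs of several predictive distributions."""

    def __init__(self, components):
        self.components = list(components)
        if not self.components:
            raise ValueError("need at least one component")

    def pdf(self, y):
        return fmean([c.pdf(y) for c in self.components])

    def cdf(self, y):
        return fmean([c.cdf(y) for c in self.components])

    @property
    def mean(self):
        return fmean([c.mean for c in self.components])

    @property
    def variance(self):
        # Law of total variance for an equal-weight mixture.
        second_moment = fmean([c.variance + c.mean ** 2 for c in self.components])
        return second_moment - self.mean ** 2


# Predictions of two hypothetical ensemble members for the same test point.
mix = EqualWeightMixture([NormalDist(-1.0, 1.0), NormalDist(1.0, 1.0)])
print(mix.cdf(0.0), mix.mean, mix.variance)
```

Note that the mixture variance exceeds the members' variances whenever the members disagree on the mean, which is one reason to average pdfs rather than parameters.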
baselines
always predict a Gaussian with mean = training mean, var = training var *
unconditional densities via distfit package, interface *
IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **
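The simplest baseline above (Gaussian with training mean and variance) can be sketched in a few lines. The class name `GaussianBaseline` and the `fit` / `predict_proba` method names are assumptions made for illustration, not a settled interface.

```python
from statistics import NormalDist, fmean, pvariance


class GaussianBaseline:
    """Featureless baseline: ignores X, predicts N(training mean, training var)."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        self.sd_ = pvariance(y) ** 0.5  # population variance of the training targets
        return self

    def predict_proba(self, X):
        # One identical predictive distribution per test row.
        return [NormalDist(self.mean_, self.sd_) for _ in X]


X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]
baseline = GaussianBaseline().fit(X_train, y_train)
preds = baseline.predict_proba([[10.0], [20.0]])
print(preds[0].mean, preds[0].variance)
```

The deterministic style baseline differs only in the mean: it would take the mean from a fitted deterministic regressor per test row and keep the training sample variance.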
Other reduction from/to probabilistic regression
reducing deterministic regression to probabilistic regression - take mean, median or mode **
reduction(s) to quantile regression, use predictive quantiles to make a distr ***
reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
reduction to survival, as the sub-case of no censoring **
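For the quantile regression reductions, one possible way to "make a distr" from predictive quantiles is a piecewise-linear cdf through the predicted points. The sketch below is only one such choice. The class name `QuantileDistribution` is hypothetical, the tail mass beyond the outermost quantiles is assumed to sit at those quantiles, and crossing quantiles are repaired by sorting.

```python
from bisect import bisect_right


class QuantileDistribution:
    """Distribution with a piecewise-linear cdf through (quantile, level) pairs."""

    def __init__(self, alphas, quantiles):
        if len(alphas) != len(quantiles) or len(alphas) < 2:
            raise ValueError("need at least two matching levels and quantiles")
        self.alphas = sorted(alphas)
        # Sorting the quantiles separately repairs quantile crossing.
        self.quantiles = sorted(quantiles)

    def cdf(self, y):
        qs, al = self.quantiles, self.alphas
        if y < qs[0]:
            return 0.0  # assumption: tail mass sits at the outermost quantile
        if y >= qs[-1]:
            return 1.0
        i = bisect_right(qs, y)  # qs[i - 1] <= y < qs[i]
        w = (y - qs[i - 1]) / (qs[i] - qs[i - 1])
        return al[i - 1] + w * (al[i] - al[i - 1])

    def ppf(self, p):
        qs, al = self.quantiles, self.alphas
        if p <= al[0]:
            return qs[0]
        if p >= al[-1]:
            return qs[-1]
        i = bisect_right(al, p)  # al[i - 1] <= p < al[i]
        w = (p - al[i - 1]) / (al[i] - al[i - 1])
        return qs[i - 1] + w * (qs[i] - qs[i - 1])


# Quantile predictions for one test point from a hypothetical quantile regressor.
dist = QuantileDistribution([0.1, 0.5, 0.9], [1.0, 2.0, 4.0])
print(dist.cdf(1.5), dist.ppf(0.5))
```

The reverse direction, from a probabilistic to a deterministic prediction, then falls out as `dist.ppf(0.5)` for the median.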
Bayesian toolboxes
Pipeline elements for target transformation