[ENH] roadmap of probabilistic regressors to implement or to interface  #7

@fkiraly

A wishlist for probabilistic regression methods to implement or interface.
This is partly copied from the list I made when designing the R counterpart, mlr-org/mlr3proba#32.
The number of stars at the end of each item is its estimated difficulty or time investment.

GLM

  • generalized linear model(s) with continuous response family, e.g., Gaussian *
    • Gaussian family, statsmodels
    • further continuous families: Gamma, Tweedie, inverse Gaussian
  • generalized linear model(s) with count response family, e.g., Poisson *
    • Poisson family, statsmodels
    • Poisson family, sklearn
    • further families: Binomial
  • heteroscedastic linear regression ***
  • Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian family ***

KRR aka Gaussian process regression

  • vanilla kernel ridge regression with fixed kernel parameters and variance *
  • kernel ridge regression with MLE for kernel parameters and regularization parameter **
  • heteroscedastic KRR or Gaussian processes ***

CDE

  • variants of conditional density estimation (Nadaraya-Watson type) **
  • reduction to density estimation by binning of input variables, then apply unconditional density estimation **
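
As an illustration of the Nadaraya-Watson type estimator in the first item, here is a minimal sketch for a single scalar feature. The function names, the Gaussian kernel, and the fixed bandwidths `h_x` and `h_y` are assumptions made for the example, not an existing skpro interface.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_conditional_density(x_train, y_train, x, y, h_x=1.0, h_y=1.0):
    """Estimate f(y | x) as a kernel-weighted average of kernels in y.

    Weights come from the distance of x to each training x_i; each
    training y_i contributes a kernel bump of bandwidth h_y around it.
    """
    weights = [gaussian_kernel((x - xi) / h_x) for xi in x_train]
    total = sum(weights)  # can underflow to 0.0 if x is far from all x_i
    bumps = sum(
        w * gaussian_kernel((y - yi) / h_y) / h_y
        for w, yi in zip(weights, y_train)
    )
    return bumps / total


x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 4.0]
density_at_point = nw_conditional_density(x_train, y_train, x=1.0, y=1.5)
```

For fixed `x` the estimate is a mixture of Gaussians in `y`, so it integrates to one over `y` by construction.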

Gradient boosting and tree-based

  • ngboost package interface *
  • probabilistic residual boosting **
  • probabilistic regression trees **

Neural networks

  • interface tensorflow probability - some hard-coded NN architectures **
  • generic tensorflow probability interface - some hard-coded NN architectures ***

Bayesian toolboxes

  • generic pymc3 interface ***
  • generic pyro interface ****
  • generic Stan interface ****
  • generic JAGS interface ****
  • generic BUGS interface ****
  • generic Bayesian interface - prior-valued hyperparameters *****

Pipeline elements for target transformation

  • distr fixed target transformation **
  • distr predictive target calibration **
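
A fixed target transformation can be sketched as a wrapper around a predictive distribution. The example below assumes a log transform of a strictly positive target and uses `statistics.NormalDist` from the standard library as a stand-in for a distr object; the class name `LogTransformedDist` is hypothetical.

```python
import math
from statistics import NormalDist


class LogTransformedDist:
    """Distribution of Y = exp(Z), where Z follows the wrapped distribution."""

    def __init__(self, inner):
        self.inner = inner  # predictive distribution on the log scale

    def cdf(self, y):
        # P(Y <= y) = P(Z <= log y) for y > 0
        return self.inner.cdf(math.log(y)) if y > 0 else 0.0

    def pdf(self, y):
        # change of variables: f_Y(y) = f_Z(log y) / y
        return self.inner.pdf(math.log(y)) / y if y > 0 else 0.0

    def inv_cdf(self, p):
        # quantiles map through the monotone transform
        return math.exp(self.inner.inv_cdf(p))


log_scale_prediction = NormalDist(mu=0.0, sigma=1.0)
prediction = LogTransformedDist(log_scale_prediction)
median = prediction.inv_cdf(0.5)
```

The same pattern applies to any strictly increasing transform with a known inverse and derivative.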

Composite techniques, reduction to deterministic regression

  • plug the mean and sd from a deterministic regressor that already returns them into some location/scale distr family (Gaussian, Laplace) *
  • use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) **
  • upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
  • generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
  • reduction via bootstrapped sampling of a deterministic regressor **
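
The two-model reduction (model 1 for the mean, model 2 fit to residuals) could look roughly like the sketch below. It assumes a single feature, ordinary least squares for both models, squared residuals as the target of model 2, and a Gaussian output family; `min_var` plays the role of the forced lower variance bound. All function names are hypothetical, and `statistics.NormalDist` stands in for a distr object.

```python
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def fit_mean_and_residual_models(x, y):
    """Model 1 predicts the mean, model 2 predicts the squared residual."""
    a_mean, b_mean = fit_line(x, y)
    sq_resid = [(yi - (a_mean + b_mean * xi)) ** 2 for xi, yi in zip(x, y)]
    a_var, b_var = fit_line(x, sq_resid)
    return (a_mean, b_mean), (a_var, b_var)


def predict_gaussian(models, x_new, min_var=1e-6):
    """Plug both predictions into a Gaussian, clipping the variance from below."""
    (a_mean, b_mean), (a_var, b_var) = models
    mean = a_mean + b_mean * x_new
    var = max(a_var + b_var * x_new, min_var)
    return NormalDist(mean, var ** 0.5)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.3, 2.6, 4.8, 4.1]
pred = predict_gaussian(fit_mean_and_residual_models(x, y), x_new=6.0)
```

With absolute instead of squared residuals, the second model predicts a mean absolute deviation, which would need rescaling before being used as a Gaussian standard deviation.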

Ensembling type pipeline elements and compositors

  • simple bagging, averaging of pdf/cdf **
  • probabilistic boosting ***
  • probabilistic stacking ***
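
Simple bagging by averaging pdf/cdf amounts to forming an equal-weight mixture of the ensemble members' predictive distributions. A minimal sketch, assuming each member predicts a `statistics.NormalDist` for the same test point (the helper names are hypothetical):

```python
from statistics import NormalDist, fmean


def mixture_pdf(components, y):
    """Average of the member densities at y."""
    return fmean(d.pdf(y) for d in components)


def mixture_cdf(components, y):
    """Average of the member cdfs at y."""
    return fmean(d.cdf(y) for d in components)


def mixture_mean(components):
    """Mean of the equal-weight mixture."""
    return fmean(d.mean for d in components)


def mixture_variance(components):
    """Variance of the mixture: E[Y^2] - (E[Y])^2."""
    second_moment = fmean(d.variance + d.mean ** 2 for d in components)
    return second_moment - mixture_mean(components) ** 2


members = [NormalDist(-1.0, 1.0), NormalDist(1.0, 1.0)]
ensemble_cdf_at_zero = mixture_cdf(members, 0.0)
ensemble_variance = mixture_variance(members)
```

Note that the mixture variance includes the spread between member means, so the averaged prediction is generally wider than any single member and is usually not Gaussian itself.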

Baselines

  • always predict a Gaussian with mean = training mean, var = training var *
  • unconditional densities via distfit package, interface *
  • IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
  • IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by deterministic regressor, var = training sample var, distr type = Gaussian (or Laplace) **
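
The Gaussian baselines above can be sketched in a few lines. This assumes predictions are represented as `statistics.NormalDist` objects from the standard library rather than any particular distr type, and the function names are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def fit_unconditional_gaussian(y_train):
    """Featureless baseline: one Gaussian for every test point, ignoring X."""
    mu = fmean(y_train)
    sigma = variance(y_train) ** 0.5  # sample variance, n - 1 denominator
    return NormalDist(mu, sigma)


def deterministic_style_baseline(point_predictions, y_train):
    """Mean from a deterministic regressor, variance from the training sample."""
    sigma = variance(y_train) ** 0.5
    return [NormalDist(m, sigma) for m in point_predictions]


y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
baseline = fit_unconditional_gaussian(y_train)

# point_predictions would come from any fitted deterministic regressor
preds = deterministic_style_baseline([2.5, 4.0], y_train)
```

Swapping the Gaussian for a Laplace distribution only changes how the sample variance is converted into the family's scale parameter.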

Other reduction from/to probabilistic regression

  • reducing deterministic regression to probabilistic regression - take mean, median or mode **
  • reduction(s) to quantile regression, use predictive quantiles to make a distr ***
  • reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
  • reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
  • reduction to survival, as the sub-case of no censoring **
  • reduction to classification, by binning ***
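
For the reductions involving quantiles, one simple option is to match a parametric family to two predicted quantiles, and conversely to read point predictions off a predictive distribution. The sketch below assumes a Gaussian family and uses `statistics.NormalDist`; the function name and the default quantile levels are hypothetical choices for the example.

```python
from statistics import NormalDist


def gaussian_from_quantiles(q_lo, q_hi, alpha_lo=0.1, alpha_hi=0.9):
    """Gaussian whose alpha_lo and alpha_hi quantiles equal q_lo and q_hi.

    Uses q_alpha = mu + sigma * z_alpha and solves the two equations
    for mu and sigma.
    """
    std_normal = NormalDist()
    z_lo = std_normal.inv_cdf(alpha_lo)
    z_hi = std_normal.inv_cdf(alpha_hi)
    sigma = (q_hi - q_lo) / (z_hi - z_lo)
    mu = q_lo - sigma * z_lo
    return NormalDist(mu, sigma)


# quantile regression output -> predictive distribution
dist = gaussian_from_quantiles(q_lo=1.0, q_hi=3.0)

# predictive distribution -> deterministic and quantile predictions
point_mean = dist.mean
point_median = dist.median
upper_quantile = dist.inv_cdf(0.9)
```

With more than two predicted quantiles, a non-parametric alternative is to interpolate the cdf between the quantile points, which is not shown here.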

Labels: good first issue, implementing algorithms, interfacing algorithms, module:regression