A Python package to solve, simulate and estimate separable matching models
- Free software: MIT license
- Documentation: https://bsalanie.github.io/cupid_matching
- See also: An interactive Streamlit app
pip install [-U] cupid_matching
The package relies on utilities from bs_python_utils; installing with pip pulls it in automatically, but you will need it available locally when running the test suite.
For instance:
from cupid_matching.min_distance import estimate_semilinear_mde
The following only describes the general ideas. See here for more background and the API reference on this site for the technical documentation.
The cupid_matching package has code
- to solve for the stable matching using our Iterative Projection Fitting Procedure (IPFP) in variants of the model of bipartite, one-to-one matching with perfectly transferable utility. It has IPFP solvers for variants of the Choo and Siow 2006 model with or without singles, homoskedastic and heteroskedastic; and also for a class of nested logit models
- to estimate the parameters of separable models with semilinear surplus and entropy using a minimum distance estimator
- to estimate the parameters of semilinear Choo and Siow models using a Poisson GLM estimator
- for a Streamlit interactive app that demonstrates solving and estimating the Choo and Siow model using the cupid_matching package. You can try it here.
Incidentally, my ipfp_R Github repository contains R code to solve for equilibrium in (only) the basic version of the Choo and Siow model.
The package builds on the pioneering work of Choo and Siow JPE 2006 and on my work with Alfred Galichon, especially our REStud 2022 paper and this working paper.
At this stage, it only deals with bipartite models. As the heterosexual marriage market is a leading example, I will refer to the two sides as men and women. Each man
The primitives of this class of matching models are
- the margins: the numbers $n_x$ of men of type $x=1,\ldots,X$ and the numbers $m_y$ of women of type $y=1,\ldots,Y$
- the joint surplus created by the match of a man $m$ of type $x$ and a woman $w$ of type $y$. We assume separability: this joint surplus takes the form $$ \Phi_{xy}+\varepsilon_{my} +\eta_{xw}. $$

I will only describe here the case when the data also has singles, i.e., when $x=y=0$ is a possible type. The fuller documentation explains how to deal with the case when singles are not observed.
A single man has utility
The modeler chooses the distributions of the vectors
We denote
The total numbers must add up to the margins:
$$
\sum_{y=1}^Y \mu_{xy}+\mu_{x0}=n_x \; \text{ and } \;
\sum_{x=1}^X \mu_{xy}+\mu_{0y}=m_y.
$$
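As a small illustration of these adding-up constraints, the sketch below builds the margins from a toy matching with NumPy. The numbers and the variable names (`muxy`, `mux0`, `mu0y`, `n`, `m`) are made up for the example and do not come from the package.

```python
import numpy as np

# Hypothetical toy matching with X=2 types of men and Y=3 types of women.
muxy = np.array([[10.0, 5.0, 2.0],
                 [4.0, 8.0, 6.0]])   # couples in each (x, y) cell
mux0 = np.array([3.0, 2.0])          # single men of each type
mu0y = np.array([1.0, 4.0, 5.0])     # single women of each type

# Margins implied by the adding-up constraints.
n = muxy.sum(axis=1) + mux0          # men of each type x
m = muxy.sum(axis=0) + mu0y          # women of each type y
```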
The total number of individuals is
Galichon-Salanié (REStud 2022) shows that in large markets, if the vectors
The files choo_siow.py, choo_siow_gender_heteroskedastic.py, choo_siow_heteroskedastic.py, and nested_logit.py provide EntropyFunctions objects that compute the generalized entropy and at least its first derivative for, respectively,
1. the original Choo and Siow 2006 model, in which the $\varepsilon$ and $\eta$ terms are iid draws from a type I extreme value distribution
2. the same model, without singles (to be used when only couples are observed)
3. an extension of 1. that allows for a scale parameter $\tau$ for the distribution of $\eta$
4. an extension of 3. that has type-dependent scale parameters $\sigma_x$ and $\tau_y$ (with $\sigma_1=1$)
5. a two-layer nested logit model in which singles (type 0) are in their own nest and the user chooses the structure of the other nests.
Users of the package are welcome to code EntropyFunctions objects for different distributions of the unobserved heterogeneity terms.
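For the original Choo and Siow 2006 model with singles, the link between matching patterns and the joint surplus takes the well-known form $\Phi_{xy} = 2\log\mu_{xy} - \log\mu_{x0} - \log\mu_{0y}$. The sketch below codes this formula and its inverse directly in NumPy. The function names are illustrative only; they are not part of the cupid_matching API, and the package's EntropyFunctions objects are organized differently.

```python
import numpy as np

def choo_siow_surplus(muxy, mux0, mu0y):
    """Joint surplus implied by the Choo and Siow (2006) matching formula."""
    return 2.0 * np.log(muxy) - np.log(mux0)[:, None] - np.log(mu0y)[None, :]

def choo_siow_couples(Phi, mux0, mu0y):
    """Inverse mapping: couples implied by Phi and the numbers of singles."""
    return np.sqrt(np.outer(mux0, mu0y)) * np.exp(Phi / 2.0)

# Made-up matching patterns, for illustration only.
muxy = np.array([[10.0, 5.0], [4.0, 8.0]])
mux0 = np.array([3.0, 2.0])
mu0y = np.array([1.0, 4.0])

Phi_hat = choo_siow_surplus(muxy, mux0, mu0y)
muxy_back = choo_siow_couples(Phi_hat, mux0, mu0y)
```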
Given any joint surplus matrix
For the five classes of models above, this can be done efficiently using the IPFP algorithm in Galichon-Salanié (REStud 2022), which is coded in ipfp_solvers.py for the four Choo and Siow variants and in model_classes.py for the nested logit.
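To give a feel for how IPFP works, here is a minimal sketch for the basic (homoskedastic, with singles) Choo and Siow model only. It is not the package's solver: the function name, arguments and stopping rule are assumptions made for this illustration. It uses the fact that in this model $\mu_{xy}=\sqrt{\mu_{x0}\mu_{0y}}\exp(\Phi_{xy}/2)$, so that each margin equation is a quadratic in the square root of the number of singles, and it solves these quadratics alternately for men and for women.

```python
import numpy as np

def ipfp_choo_siow_sketch(Phi, n, m, tol=1e-12, max_iter=10_000):
    """Illustrative IPFP for the basic Choo and Siow model with singles."""
    K = np.exp(Phi / 2.0)
    tx = np.sqrt(n)                  # starting guess for sqrt(mu_x0)
    ty = np.sqrt(m)                  # starting guess for sqrt(mu_0y)
    for _ in range(max_iter):
        # Men's margins: tx**2 + tx * (K @ ty) = n, solved for tx > 0.
        b = K @ ty
        tx_new = (np.sqrt(b * b + 4.0 * n) - b) / 2.0
        # Women's margins: ty**2 + ty * (K.T @ tx) = m, solved for ty > 0.
        c = K.T @ tx_new
        ty_new = (np.sqrt(c * c + 4.0 * m) - c) / 2.0
        delta = max(np.max(np.abs(tx_new - tx)), np.max(np.abs(ty_new - ty)))
        tx, ty = tx_new, ty_new
        if delta < tol:
            break
    muxy = np.outer(tx, ty) * K      # couples in each (x, y) cell
    return muxy, tx**2, ty**2        # couples, single men, single women

# Made-up surplus matrix and margins, for illustration only.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(3, 4))
n = np.array([10.0, 20.0, 15.0])
m = np.array([12.0, 8.0, 14.0, 11.0])
muxy, mux0, mu0y = ipfp_choo_siow_sketch(Phi, n, m)
```

The heteroskedastic and nested logit variants require different updates; see ipfp_solvers.py and model_classes.py for the actual implementations.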
Here is an example, given a Numpy array
import numpy as np
from cupid_matching.ipfp_solvers import ipfp_gender_heteroskedastic_solver
solution = ipfp_gender_heteroskedastic_solver(Phi, n, m, tau)
mus, error_x, error_y = solution
muxy = mus.muxy
The mus above is an instance of a Matching object (defined in matching_utils.py). mus.muxy has the numbers of couples by type of man and type of woman; mus.mux0 and mus.mu0y contain the numbers of single men and women of each type.
The vectors error_x and error_y are estimates of the precision of the solution (see the code in ipfp_solvers.py).
Given observed matching patterns
The package provides two estimators, which are described extensively in this paper:
- the minimum distance estimator in min_distance.py
- the Poisson estimator in poisson_glm.py, which only applies to the Choo and Siow homoskedastic model.
At this stage cupid_matching only allows for linear models of the joint surplus:
where the basis functions
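As a rough illustration of a linear specification, the sketch below combines an $(X,Y,K)$ array of basis functions with a vector of $K$ coefficients to obtain an $(X,Y)$ joint surplus matrix. The two basis functions and the coefficient values (`coeffs`) are hypothetical choices made for this example.

```python
import numpy as np

X, Y, K = 2, 3, 2
x_idx = np.arange(X)[:, None]                   # men's types, as a column
y_idx = np.arange(Y)[None, :]                   # women's types, as a row

phi_bases = np.zeros((X, Y, K))
phi_bases[:, :, 0] = 1.0                        # a constant term
phi_bases[:, :, 1] = -np.abs(x_idx - y_idx)     # a penalty on type distance

coeffs = np.array([0.5, 1.0])                   # hypothetical coefficients
Phi = phi_bases @ coeffs                        # (X, Y, K) @ (K,) gives (X, Y)
```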
The minimum distance estimator works as follows, given
- an observed matching stored in a Matching object mus
- an EntropyFunctions object entropy_model that allows for p parameters in $\alpha$
- an $(X,Y,K)$ Numpy array of basis functions phi_bases:
mde_results = estimate_semilinear_mde(
mus, phi_bases, entropy_model)
mde_results.print_results(n_alpha=p)
The mde_results object contains the estimated
The Poisson-GLM estimator of the Choo and Siow homoskedastic model takes as input the observed matching and the basis functions, and returns the estimated
poisson_results = choo_siow_poisson_glm(mus_sim, phi_bases)
_, mux0_sim, mu0y_sim, n_sim, m_sim = mus_sim.unpack()
poisson_results.print_results()
The poisson_results object contains the estimated
The following can be found in the examples folder of the package:
- example_choosiow.py shows how to run minimum distance and Poisson estimators on a Choo and Siow homoskedastic model.
- example_choosiow_no_singles.py does the same for a model without singles.
- example_nested_logit.py shows how to run minimum distance estimators on a two-layer nested logit model.
- many of these models (including all variants of Choo and Siow) rely heavily on logarithms and exponentials. It is easy to generate examples where numeric instability sets in.
- as a consequence, the numeric versions of the minimum distance estimator (which use numerical derivatives) are not recommended.
- the bias-corrected minimum distance estimator (corrected) may have a larger mean-squared error and/or introduce numerical instabilities.
- the estimated variance of the estimators assumes that the observed matching was sampled at the household level, and that sampling weights are all equal.
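The kind of numerical instability mentioned above is easy to reproduce outside the package. The sketch below is a generic floating-point illustration, not code from cupid_matching: with a large surplus value (made up here), a naive exponential overflows, while the same quantity computed in logs with np.logaddexp stays finite.

```python
import numpy as np

big_phi = 1600.0                                 # made-up, very large surplus value

# Naive evaluation of log(1 + exp(Phi / 2)): exp(800) overflows to inf.
with np.errstate(over="ignore"):
    naive = np.log(1.0 + np.exp(big_phi / 2.0))

# Same quantity computed as log(exp(0) + exp(Phi / 2)) without overflow.
stable = np.logaddexp(0.0, big_phi / 2.0)
```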
- switched to uv for package management.
- included an examples directory.
- added CupidMatchingDoc.pdf on Github, with detailed explanations of the methods.
- incorporates models without singles for both MDE and Poisson; example in example_choo_siow_no_singles.py.
- fixed URL of Streamlit app.
- improved the Streamlit app, now in two files: cupid_streamlit.py and cupid_streamlit_utils.py.
- improved documentation
- the package now relies on my utilities package bs_python_utils. The VarianceMatching class in matching_utils.py is new; this should be transparent for the user.
- deleted spurious print statement.
- fixed error in bias-correction term.
- corrected typo.
- simplified the bias-correction for the minimum distance estimator in the Choo and Siow homoskedastic model.
- added an optional bias-correction for the minimum distance estimator in the Choo and Siow homoskedastic model, to help with cases when the matching patterns vary a lot across cells.
- added two complete examples: example_choosiow.py and example_nestedlogit.py.