I'm working on an IV example that previously worked in linearmodels but raises an error in pyfixest. It's possible that I've just misunderstood the syntax though!
reprex:
In this case, the model will be
$$
\text{Price}_i = \hat{\pi_0} + \hat{\pi_1} \text{SalesTax}_i + v_i
$$
in the first stage regression and
$$
\text{Packs}_i = \hat{\beta_0} + \hat{\beta_2}\widehat{\text{Price}_i} + \hat{\beta_1} \text{RealIncome}_i + u_i
$$
in the second stage.
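To make the two stages concrete, here is a minimal NumPy sketch of 2SLS done by hand on synthetic data. Everything in it (variable names, coefficients, the data-generating process) is made up for illustration and is unrelated to the CigarettesSW data. One assumption to flag: the first stage here also includes the exogenous control, as standard 2SLS does, whereas the first-stage equation above writes only the instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data (hypothetical, not the CigarettesSW data).
sales_tax = rng.normal(size=n)      # instrument
real_income = rng.normal(size=n)    # exogenous control
confounder = rng.normal(size=n)     # unobserved; makes price endogenous
price = 1.0 + 0.8 * sales_tax + 0.5 * confounder + rng.normal(size=n)
packs = 2.0 - 1.2 * price + 0.4 * real_income + confounder + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# First stage: regress the endogenous variable on the instrument
# (plus the constant and the exogenous control).
Z = np.column_stack([const, sales_tax, real_income])
price_hat = Z @ ols(Z, price)

# Second stage: replace price with its first-stage fitted values.
X_second = np.column_stack([const, price_hat, real_income])
beta_2sls = ols(X_second, packs)

# Naive OLS for comparison; it is biased by the confounder.
beta_ols = ols(np.column_stack([const, price, real_income]), packs)

print("2SLS price coefficient:", beta_2sls[1])
print("OLS price coefficient: ", beta_ols[1])
```

The second-stage standard errors from this manual approach are not valid as printed by a plain OLS routine, which is one reason to use a dedicated IV estimator rather than two separate regressions.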
Data:
import pandas as pd
from linearmodels.iv import IV2SLS
dfiv = pd.read_csv(
"https://vincentarelbundock.github.io/Rdatasets/csv/AER/CigarettesSW.csv",
dtype={"state": "category", "year": "category"},
).assign(
rprice=lambda x: x["price"] / x["cpi"],
rincome=lambda x: x["income"] / x["population"] / x["cpi"],
)
dfiv.head()
linearmodels runs okay:
results_iv2sls = IV2SLS.from_formula(
"np.log(packs) ~ 1 + np.log(rincome) + C(year) + C(state) + [np.log(rprice) ~ taxs]",
dfiv,
).fit(cov_type="clustered", clusters=dfiv["year"])
print(results_iv2sls.summary)
IV-2SLS Estimation Summary

| | | | |
|---|---|---|---|
| Dep. Variable: | np.log(packs) | R-squared: | 0.9659 |
| Estimator: | IV-2SLS | Adj. R-squared: | 0.9279 |
| No. Observations: | 96 | F-statistic: | -1.296e+17 |
| Date: | Thu, Oct 26 2023 | P-value (F-stat) | 1.0000 |
| Time: | 09:31:50 | Distribution: | chi2(50) |
| Cov. Estimator: | clustered | | |

Parameter Estimates

| | Parameter | Std. Err. | T-stat | P-value | Lower CI | Upper CI |
|---|---|---|---|---|---|---|
| Intercept | 9.4924 | 0.0263 | 360.24 | 0.0000 | 9.4407 | 9.5440 |
| np.log(rincome) | 0.4434 | | | | | |
| C(year)[T.1995] | -0.0328 | | | | | |
| C(state)[T.AR] | 0.1770 | 0.0531 | 3.3338 | 0.0009 | 0.0729 | 0.2810 |
| C(state)[T.AZ] | -0.0899 | 0.0132 | -6.8132 | 0.0000 | -0.1158 | -0.0640 |
| C(state)[T.CA] | -0.2781 | 0.0214 | -12.996 | 0.0000 | -0.3200 | -0.2361 |
| C(state)[T.CO] | -0.2479 | 0.0090 | -27.625 | 0.0000 | -0.2655 | -0.2303 |
| C(state)[T.CT] | -0.0171 | 0.0196 | -0.8720 | 0.3832 | -0.0556 | 0.0213 |
| C(state)[T.DE] | 0.1110 | 0.0291 | 3.8105 | 0.0001 | 0.0539 | 0.1682 |
| C(state)[T.FL] | 0.0762 | 0.0142 | 5.3596 | 0.0000 | 0.0483 | 0.1041 |
| C(state)[T.GA] | -0.0695 | 0.0251 | -2.7706 | 0.0056 | -0.1186 | -0.0203 |
| C(state)[T.IA] | 0.0120 | 0.0739 | 0.1629 | 0.8706 | -0.1328 | 0.1569 |
| C(state)[T.ID] | -0.1272 | 0.0077 | -16.597 | 0.0000 | -0.1423 | -0.1122 |
| C(state)[T.IL] | -0.0339 | 0.0081 | -4.1912 | 0.0000 | -0.0497 | -0.0180 |
| C(state)[T.IN] | 0.1198 | 0.0611 | 1.9609 | 0.0499 | 5.573e-05 | 0.2395 |
| C(state)[T.KS] | -0.0910 | 0.0305 | -2.9884 | 0.0028 | -0.1507 | -0.0313 |
| C(state)[T.KY] | 0.3525 | 0.0631 | 5.5906 | 0.0000 | 0.2289 | 0.4761 |
| C(state)[T.LA] | 0.1315 | 0.0104 | 12.664 | 0.0000 | 0.1112 | 0.1519 |
| C(state)[T.MA] | -0.0403 | 0.0069 | -5.8826 | 0.0000 | -0.0538 | -0.0269 |
| C(state)[T.MD] | -0.2322 | 0.0239 | -9.7376 | 0.0000 | -0.2790 | -0.1855 |
| C(state)[T.ME] | 0.2008 | 0.0574 | 3.5011 | 0.0005 | 0.0884 | 0.3133 |
| C(state)[T.MI] | 0.1268 | 0.0745 | 1.7009 | 0.0890 | -0.0193 | 0.2728 |
| C(state)[T.MN] | 0.0568 | 0.0490 | 1.1595 | 0.2463 | -0.0392 | 0.1529 |
| C(state)[T.MO] | 0.0640 | 0.0476 | 1.3454 | 0.1785 | -0.0292 | 0.1572 |
| C(state)[T.MS] | 0.1501 | 0.0272 | 5.5267 | 0.0000 | 0.0969 | 0.2034 |
| C(state)[T.MT] | -0.1522 | 0.0054 | -28.250 | 0.0000 | -0.1627 | -0.1416 |
| C(state)[T.NC] | 0.0396 | 0.0191 | 2.0655 | 0.0389 | 0.0020 | 0.0771 |
| C(state)[T.ND] | -0.0311 | 0.0399 | -0.7787 | 0.4361 | -0.1092 | 0.0471 |
| C(state)[T.NE] | -0.0741 | 0.0375 | -1.9765 | 0.0481 | -0.1476 | -0.0006 |
| C(state)[T.NH] | 0.3504 | 0.0315 | 11.114 | 0.0000 | 0.2886 | 0.4122 |
| C(state)[T.NJ] | -0.0873 | 6.107e-05 | -1429.3 | 0.0000 | -0.0874 | -0.0872 |
| C(state)[T.NM] | -0.2858 | 0.0040 | -71.049 | 0.0000 | -0.2937 | -0.2779 |
| C(state)[T.NV] | 0.1789 | 0.0259 | 6.9075 | 0.0000 | 0.1281 | 0.2296 |
| C(state)[T.NY] | -0.0719 | 0.0032 | -22.256 | 0.0000 | -0.0782 | -0.0655 |
| C(state)[T.OH] | 0.0325 | 0.0402 | 0.8088 | 0.4186 | -0.0463 | 0.1114 |
| C(state)[T.OK] | 0.0946 | 0.0538 | 1.7572 | 0.0789 | -0.0109 | 0.2000 |
| C(state)[T.OR] | -0.0153 | 0.0673 | -0.2269 | 0.8205 | -0.1471 | 0.1166 |
| C(state)[T.PA] | -0.0031 | 0.0006 | -4.8401 | 0.0000 | -0.0044 | -0.0019 |
| C(state)[T.RI] | 0.1394 | 0.0921 | 1.5136 | 0.1301 | -0.0411 | 0.3200 |
| C(state)[T.SC] | -0.0212 | 0.0334 | -0.6345 | 0.5257 | -0.0866 | 0.0442 |
| C(state)[T.SD] | -0.0675 | 0.0711 | -0.9488 | 0.3427 | -0.2069 | 0.0719 |
| C(state)[T.TN] | 0.1473 | 0.0470 | 3.1340 | 0.0017 | 0.0552 | 0.2394 |
| C(state)[T.TX] | -0.0579 | 0.0136 | -4.2560 | 0.0000 | -0.0845 | -0.0312 |
| C(state)[T.UT] | -0.4899 | 0.0276 | -17.776 | 0.0000 | -0.5440 | -0.4359 |
| C(state)[T.VA] | -0.0559 | 0.0471 | -1.1875 | 0.2350 | -0.1482 | 0.0364 |
| C(state)[T.VT] | 0.2209 | 0.0467 | 4.7267 | 0.0000 | 0.1293 | 0.3125 |
| C(state)[T.WA] | 0.0064 | 0.0011 | 6.0151 | 0.0000 | 0.0043 | 0.0085 |
| C(state)[T.WI] | 0.0741 | 0.0590 | 1.2569 | 0.2088 | -0.0415 | 0.1897 |
| C(state)[T.WV] | 0.1576 | 0.0582 | 2.7097 | 0.0067 | 0.0436 | 0.2716 |
| C(state)[T.WY] | -0.0169 | 0.0590 | -0.2858 | 0.7750 | -0.1325 | 0.0988 |
| np.log(rprice) | -1.2793 | | | | | |

Endogenous: np.log(rprice)
Instruments: taxs
Clustered Covariance (One-Way)
Debiased: False
Num Clusters: 2

pyfixest produces an "UnderDeterminedIVError". Code:
from pyfixest.estimation import feols  # import path matches the traceback below
results_iv = feols("np.log(packs) ~ 1 + np.log(rincome) | C(year) + C(state) | np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
results_iv.summary()
---------------------------------------------------------------------------
UnderDeterminedIVError Traceback (most recent call last)
/Users/aet/Documents/git_projects/coding-for-economists/econmt-regression.ipynb Cell 67 line 1
----> 1 results_iv = feols("np.log(packs) ~ 1 + np.log(rincome) + C(year) + C(state) | np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
2 results_iv.summary()
File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/estimation.py:129, in feols(fml, data, vcov, ssc, fixef_rm, collin_tol)
126 _estimation_input_checks(fml, data, vcov, ssc, fixef_rm, collin_tol)
128 fixest = FixestMulti(data=data)
--> 129 fixest._prepare_estimation("feols", fml, vcov, ssc, fixef_rm)
131 # demean all models: based on fixed effects x split x missing value combinations
132 fixest._estimate_all_models(vcov, fixest._fixef_keys, collin_tol=collin_tol)
File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/FixestMulti.py:85, in FixestMulti._prepare_estimation(self, estimation, fml, vcov, ssc, fixef_rm)
82 self._fixef_keys = None
83 self._is_multiple_estimation = None
---> 85 fxst_fml = FixestFormulaParser(fml)
86 fxst_fml.get_fml_dict() # fxst_fml._fml_dict might look like this: {'0': {'Y': ['Y~X1'], 'Y2': ['Y2~X1']}}. Hence {FE: {DEPVAR: [FMLS]}}
87 if fxst_fml._is_iv:
File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/FormulaParser.py:100, in FixestFormulaParser.__init__(self, fml)
98 if endogvars is not None:
99 if len(endogvars) > len(instruments):
--> 100 raise UnderDeterminedIVError(
101 "The IV system is underdetermined. Only fully determined systems are allowed. Please provide as many instruments as endogenous variables."
102 )
103 else:
104 pass
UnderDeterminedIVError: The IV system is underdetermined. Only fully determined systems are allowed.
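The last frame of the traceback shows the error being raised when `len(endogvars) > len(instruments)`. Below is a minimal sketch of that counting rule applied to the IV part of my formula. It assumes a naive split on `~` and `+`; the helper names `split_iv_part` and `check_order_condition` are hypothetical, the exception class is a local stand-in, and none of this is pyfixest's actual parser.

```python
class UnderDeterminedIVError(Exception):
    """Local stand-in for the pyfixest exception of the same name."""


def split_iv_part(iv_part):
    """Naively split 'endog1 + endog2 ~ z1 + z2' into two lists of terms."""
    endog_str, instr_str = iv_part.split("~")
    endog = [term.strip() for term in endog_str.split("+") if term.strip()]
    instruments = [term.strip() for term in instr_str.split("+") if term.strip()]
    return endog, instruments


def check_order_condition(endog, instruments):
    """Raise if there are more endogenous variables than instruments."""
    if len(endog) > len(instruments):
        raise UnderDeterminedIVError(
            "The IV system is underdetermined: "
            f"{len(endog)} endogenous variable(s), {len(instruments)} instrument(s)."
        )


# The IV part of the formula I intended to pass.
endog, instr = split_iv_part("np.log(rprice) ~ taxs ")
check_order_condition(endog, instr)  # one endogenous variable, one instrument
print(endog, instr)
```

By this count the intended IV part has one endogenous variable and one instrument, so I would not expect the check to fail. That makes me suspect the formula is being split into parts differently from what I intended, but I haven't verified that against the parser itself.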
Grateful for any pointers!