Issue with predict.plm() when using lag() inside the regression #67

@yyatsyn

I've recently encountered a problem with the predict() function for a plm regression that uses an AR(1) term in the form of lag().

Here is a toy example: 3 countries, 3 periods, current population as a function of lagged population. All models are estimated with pooled OLS.

library(plm)

data <- data.frame(country = rep(c("A", "B", "C"), each = 3),
                  year = rep(c(1, 2, 3), 3),
                  population = c(100, 150, 200, 1000, 1200, 1250, 10, 20, 45)) |>
        pdata.frame(index = c('country', 'year'))

data$lag_population <- plm::lag(data$population, k = 1)

lm_model <- lm(population ~ lag_population, data)
plm_model <- plm(population ~ plm::lag(population, k = 1), data, model = 'pooling')
plm_model2 <- plm(population ~ lag_population, data, model = 'pooling')

All three models are identical, yet their forecasts for the same data differ:

  • The lm()-based forecast is exactly what is expected, with NA in the first period of each cross-section (because first periods have no lagged observation).
  • The plm() forecast that uses the generated variable lag_population instead of the lag() function matches the one from lm(), except that the NAs are dropped.
  • The plm() forecast that uses the lag() function is the most problematic: values are shifted 1 period back, and values are also generated for the last periods. The model is pooled, without fixed effects, so how are forecasts generated for the first periods at all, especially given that na.fill = FALSE is used?
predict_plm <- predict(plm_model, newdata = data, na.fill = FALSE)
predict_plm2 <- predict(plm_model2, newdata = data, na.fill = FALSE)
predict_lm <- predict(lm_model, newdata = data)

> print(predict_lm) 
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
        NA  139.50424  193.43973         NA 1110.34313 1326.08511         NA   42.42035   53.20745 
> print(predict_plm2)
       A-2        A-3        B-2        B-3        C-2        C-3 
 139.50424  193.43973 1110.34313 1326.08511   42.42035   53.20745 
> print(predict_plm)
       A-1        A-2        A-3        B-1        B-2        B-3        C-1        C-2        C-3 
 139.50424  193.43973  247.37522 1110.34313 1326.08511 1380.02060   42.42035   53.20745   80.17519 
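
The shift can be checked against the printed numbers. The sketch below uses base R only, so it does not depend on plm: it rebuilds the toy data as a plain data frame, computes the within-country lag by hand with ave(), refits the same regression with lm(), and compares two candidate forecasts to the predict_plm values pasted above. The names d, expected, unlagged and reported_plm are made up for this illustration. It only shows what the numbers are consistent with; it is not a diagnosis of what predict.plm() does internally.

```r
# Toy data from the report, as a plain data frame (no pdata.frame needed here).
d <- data.frame(country = rep(c("A", "B", "C"), each = 3),
                year = rep(1:3, 3),
                population = c(100, 150, 200, 1000, 1200, 1250, 10, 20, 45))

# One-period lag within each country (rows are already sorted by country, year).
d$lag_population <- ave(d$population, d$country,
                        FUN = function(x) c(NA, head(x, -1)))

# Same pooled regression as lm_model in the report.
fit <- lm(population ~ lag_population, data = d)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

# Forecast built from the lagged regressor: NA in each country's first period.
expected <- b[1] + b[2] * d$lag_population

# Forecast built from the current (unlagged) population.
unlagged <- b[1] + b[2] * d$population

# Values printed for predict_plm in the report, copied by hand.
reported_plm <- c(139.50424, 193.43973, 247.37522,
                  1110.34313, 1326.08511, 1380.02060,
                  42.42035, 53.20745, 80.17519)

# Compare the unlagged forecast with the reported predict_plm output.
print(all.equal(unlagged, reported_plm, tolerance = 1e-6))
```

If the comparison holds, the predict_plm values are what one gets by plugging the current population into the fitted line, i.e. as if the lag() term were left unshifted when newdata is evaluated. That would explain both the one-period shift and the extra values in the last periods.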
