This README file provides information about the replication repository (computer code and data) used to generate the results presented in Ardia & Bluteau (2025), *Optimal Text-Based Time-Series Indices*, conditionally accepted at the *International Journal of Forecasting*. The latest version of the paper, `optimize_attention.pdf`, is available in the replication repository.

- David Ardia – CIRANO & GERAD & HEC Montréal, Canada
- Keven Bluteau – University of Sherbrooke, Canada

The textual data used in this project are proprietary and cannot be shared publicly because of licensing restrictions. Access to these data is granted exclusively to the Editor of the International Journal of Forecasting for review purposes. All other users are provided with pseudo-data.

The computations are very demanding. On a modern computer, it takes about one day to generate an illustrative setup and several days to produce the complete set of results.
You must use a Windows machine with R version 4.2.3, RStudio, Rtools, and at least 64 GB of RAM. Compatibility with this specific R version is critical; we recommend using rig to manage R versions.
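
Because the results depend on R 4.2.3, it may help to check the running version before restoring packages. The following is a minimal sketch using only base R; the helper `check_r_version()` is hypothetical and is not part of the repository.

```r
# Hypothetical helper (not part of the repository): compare the running
# R version with the version used to generate the paper's results.
check_r_version <- function(required = "4.2.3") {
  current <- getRversion()
  ok <- current == package_version(required)
  if (!ok) {
    warning(sprintf("Results were generated with R %s, but R %s is running.",
                    required, as.character(current)))
  }
  invisible(ok)
}

# Returns TRUE (invisibly) on R 4.2.3; otherwise warns and returns FALSE.
check_r_version()
```

If the check fails, `rig` can be used to install and switch to R 4.2.3 before continuing.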

We use the R package `renv` to install the exact versions of the packages used. If installation with `renv` fails, you can run `99_run_install_packages.R` to install the dependencies manually.

See the file `session_info.txt` in the repository for full details of the R session used to generate the results.

- Clone the repository to your computer.
- Open the R project `optimize_attention.Rproj`.
- Run `renv::restore()` and confirm with “y”.
- If needed, install any packages that failed, using the specific versions listed in `session_info.txt` (available from the CRAN archives); see `99_run_install_packages.R` below.
- Run `00_run_all.R` to generate all results, or run the individual scripts described below.

| Script | Description |
|---|---|
| `00_run_all.R` | Master wrapper to run all scripts sequentially |
| `01_replicate_epu.R` | Generates results of Section 4.2 |
| `02_plot_epu.R` | Generates Figure 2 |
| `03_forecast_inflation.R` | Generates forecasting results (Section 5.3) |
| `04_nowcast_inflation.R` | Generates nowcasting results (Section 5.3) |
| `05_measure_performance.R` | Generates Table 1 and Table 2 |
| `06_analyze_topics.R` | Generates Figure 4 |
| `07_plot_sentiment.R` | Generates Figure 3 |
| `99_run_install_packages.R` | Fallback installer if `renv` fails |
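
To rerun only part of the pipeline, individual scripts can be sourced from the project root. The sketch below is hypothetical (the helper `run_scripts()` is not part of the repository) and assumes, without confirmation from this README, that running scripts in numerical order from the project root mirrors what `00_run_all.R` does.

```r
# Hypothetical helper (not part of the repository): source a subset of the
# replication scripts in the order given, stopping at the first missing file.
run_scripts <- function(scripts, dir = ".") {
  for (s in scripts) {
    path <- file.path(dir, s)
    if (!file.exists(path)) {
      stop("Script not found: ", path)
    }
    message("Running ", s, " ...")
    # Evaluate in the global environment, as a plain source() call would.
    source(path, local = FALSE, echo = FALSE)
  }
  invisible(scripts)
}

# Example (from the project root): results of Section 4.2, then Figure 2.
# run_scripts(c("01_replicate_epu.R", "02_plot_epu.R"))
```

Given the run times noted above, running a single script this way can be a convenient check before launching the full set of results.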

| Folder | Content |
|---|---|
| `data/` | Datasets used by the scripts |
| `figures/` | Figures generated by the scripts |
| `functions/` | R functions used by the scripts |
| `output/` | Outputs generated by the scripts |
| `renv/` | Metadata for the `renv` setup |
| `tables/` | Tables generated by the scripts |

The `data/` folder includes several precomputed `.rda` files and two CSV files downloaded from FRED:

- `dfm_filtered_resolved_unigram_ManualvocFilt_sentiment_accronym.rda`: original document-feature matrix (DFM), granted exclusively to the Editor of the International Journal of Forecasting for review purposes. This dataset is not available in the public repository.
- `dfm_filtered_resolved_unigram_ManualvocFilt_sentiment_accronym_pseudo.rda`: pseudo-data DFM, available in the public repository.
- `wv_keywords_fintext.rda`: pretrained FinText word vectors, filtered to include only the keywords used in this project.
- `T5YIEM.csv`: 5-Year Breakeven Inflation Rate, downloaded from Federal Reserve Economic Data (FRED).
- `CPILFESL.csv`: core CPI for All Urban Consumers (excluding food and energy), also from FRED.
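
This README does not list the names of the objects stored in the `.rda` files, so one way to inspect them is to load each file into a separate environment. The sketch below also reads the FRED CSV files. Both helpers (`load_rda()` and `read_fred_csv()`) are hypothetical and not part of the repository, and the CSV reader assumes FRED's usual layout: a date column followed by a value column, with "." marking missing observations.

```r
# Hypothetical helpers (not part of the repository).

# Load an .rda file into a fresh environment and return its objects as a
# named list, so nothing in the global environment is overwritten.
load_rda <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  mget(obj_names, envir = env)
}

# Read a FRED CSV download, assuming the first column holds ISO dates
# and "." marks missing observations.
read_fred_csv <- function(path) {
  df <- read.csv(path, na.strings = ".", stringsAsFactors = FALSE)
  names(df)[1] <- "date"
  df$date <- as.Date(df$date)
  df
}

# Example (from the project root):
# dfm <- load_rda("data/dfm_filtered_resolved_unigram_ManualvocFilt_sentiment_accronym_pseudo.rda")
# names(dfm)
# cpi <- read_fred_csv("data/CPILFESL.csv")
```

Loading into a fresh environment avoids silently replacing objects of the same name already in the session.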

Other files:

- `renv.lock`: used by `renv` to reproduce the R environment.
- `session_info.txt`: records the full session details of the system used to generate the results.

Ardia, D., & Bluteau, K. (2025). Optimal Text-Based Time-Series Indices. *International Journal of Forecasting* (conditionally accepted). Available at SSRN: https://dx.doi.org/10.2139/ssrn.4830848

We appreciate your interest in our work and encourage you to reach out if you have any questions regarding the replication process.