Sourish-07/ml-asset-pricing-study

Exploratory study applying machine learning techniques to asset pricing models, including factor analysis, prediction tasks, and performance evaluation using Jupyter notebooks.

This project studies short-horizon stock return predictability using large-scale daily equity data and standard machine learning models.

I constructed a unified panel of U.S. equities by processing raw daily price files into a clean return dataset containing over fourteen million observations. The pipeline includes price validation, return construction, lagged feature generation, and volatility estimation.
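As a rough sketch of the aggregation and price-validation steps, the snippet below stacks per-ticker files into one long panel and applies basic checks. The file layout (one CSV per ticker named `<TICKER>.csv` with `date` and `close` columns) and the helper names `load_price_files` and `validate_prices` are hypothetical; the project's actual scripts may organize this differently.

```python
from pathlib import Path

import pandas as pd


def load_price_files(directory: str) -> pd.DataFrame:
    """Stack per-ticker CSV files into one long (ticker, date, close) panel.

    Hypothetical layout: one file per ticker named <TICKER>.csv with
    at least 'date' and 'close' columns.
    """
    frames = []
    for path in sorted(Path(directory).glob("*.csv")):
        df = pd.read_csv(path, usecols=["date", "close"], parse_dates=["date"])
        df["ticker"] = path.stem  # file name without extension, e.g. "AAPL"
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {directory}")
    return pd.concat(frames, ignore_index=True)


def validate_prices(panel: pd.DataFrame) -> pd.DataFrame:
    """Basic price validation before any returns are computed."""
    clean = panel.dropna(subset=["ticker", "date", "close"])
    # Non-positive prices make returns undefined or meaningless.
    clean = clean[clean["close"] > 0]
    # Keep one row per stock-day; duplicates would create spurious zero returns.
    clean = clean.drop_duplicates(subset=["ticker", "date"], keep="last")
    return clean.sort_values(["ticker", "date"]).reset_index(drop=True)
```

Typical use would be `panel = validate_prices(load_price_files("data/raw"))`, where `data/raw` is a placeholder path.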

The primary goal of this project is not to claim strong predictability of daily returns, but to empirically evaluate how much signal remains after standard preprocessing and feature engineering. This mirrors a central challenge in modern asset pricing: at short horizons, noise dominates.

Data Construction

Raw daily price files were aggregated into a single panel and cleaned for missing values and structural issues. Daily returns were computed at the stock level. I then constructed lagged return features at one-, five-, and twenty-day horizons, along with rolling twenty-day volatility estimates.
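A minimal pandas sketch of the return and feature construction follows. It assumes a validated long panel with `ticker`, `date`, and `close` columns, interprets each "lagged return at horizon h" as the compounded return over the previous h trading days, and uses hypothetical column names (`ret`, `ret_lag_1`, `ret_lag_5`, `ret_lag_20`, `vol_20`). The exact definitions used in the project may differ.

```python
import numpy as np
import pandas as pd


def add_features(panel: pd.DataFrame, horizons=(1, 5, 20), vol_window=20) -> pd.DataFrame:
    """Add daily returns, trailing-return features and rolling volatility.

    Expects a validated long panel with 'ticker', 'date' and 'close' columns.
    The target is the same-day return 'ret'; every feature is shifted by one
    day so it only uses information available before that day.
    """
    df = panel.sort_values(["ticker", "date"]).reset_index(drop=True)

    # Simple daily return within each stock, so no return spans two tickers.
    prev_close = df.groupby("ticker")["close"].shift(1)
    df["ret"] = df["close"] / prev_close - 1.0

    rets = df.groupby("ticker")["ret"]
    for h in horizons:
        # Compounded return over the previous h trading days (hypothetical definition).
        df[f"ret_lag_{h}"] = rets.transform(
            lambda r, h=h: np.expm1(np.log1p(r).rolling(h).sum()).shift(1)
        )

    # Rolling standard deviation of daily returns, also lagged by one day.
    df[f"vol_{vol_window}"] = rets.transform(
        lambda r: r.rolling(vol_window).std().shift(1)
    )

    # Drop the early rows of each stock where the windows are not yet full.
    return df.dropna().reset_index(drop=True)
```

Grouping by ticker before every shift and rolling window is the key design choice: without it, the first rows of one stock would borrow prices from the previous stock in the panel.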

Models

I evaluated two baseline models commonly used in empirical finance and machine learning.

A linear regression model serves as a benchmark for linear predictability.

A random forest model captures potential nonlinear interactions between lagged returns and volatility.
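A minimal scikit-learn sketch of the two baselines is shown below. The feature names, the hyperparameters (`n_estimators`, `max_depth`, `min_samples_leaf`), and the helpers `make_models` and `fit_and_importances` are hypothetical illustrations, not the project's settings. The forest's impurity-based `feature_importances_` attribute is one common way to produce feature-importance plots like those kept in the figures folder.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical feature names.
FEATURES = ["ret_lag_1", "ret_lag_5", "ret_lag_20", "vol_20"]


def make_models(seed: int = 0) -> dict:
    """Linear benchmark plus a random forest for nonlinear interactions.

    Hyperparameters are illustrative assumptions.
    """
    return {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(
            n_estimators=200,
            max_depth=6,           # shallow trees limit overfitting to noise
            min_samples_leaf=500,  # large leaves average over many stock-days
            n_jobs=-1,
            random_state=seed,
        ),
    }


def fit_and_importances(models: dict, X: pd.DataFrame, y):
    """Fit every model and return the forest's impurity-based importances."""
    fitted = {name: model.fit(X, y) for name, model in models.items()}
    importances = pd.Series(
        fitted["random_forest"].feature_importances_, index=X.columns
    )
    return fitted, importances.sort_values(ascending=False)
```

With noisy daily returns, constraining tree depth and leaf size is a reasonable default, since an unconstrained forest tends to memorize noise.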

Models were evaluated out of sample using mean squared error (MSE) and R-squared.
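Out-of-sample evaluation can be sketched as a chronological split followed by scikit-learn's `mean_squared_error` and `r2_score`. The single calendar cutoff, the `date` and `ret` column names, and the helper `evaluate_out_of_sample` are assumptions. The project may use a different scheme, such as a rolling or expanding window.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score


def evaluate_out_of_sample(model, df: pd.DataFrame, split_date, features, target="ret") -> dict:
    """Fit on observations before split_date, score on those on or after it.

    Splitting by calendar date, rather than shuffling rows, keeps every test
    observation later than all training data, so the model never trains on
    the future it is evaluated on.
    """
    cutoff = pd.Timestamp(split_date)
    train = df[df["date"] < cutoff]
    test = df[df["date"] >= cutoff]
    model.fit(train[features], train[target])
    pred = model.predict(test[features])
    return {
        "mse": mean_squared_error(test[target], pred),
        # r2_score benchmarks against the mean of the test-set target.
        "r2": r2_score(test[target], pred),
        "n_train": len(train),
        "n_test": len(test),
    }
```

Any scikit-learn estimator with `fit` and `predict` can be passed as `model`, so both baselines share the same split and metrics.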

Results

Both models produced negative out-of-sample R-squared values, meaning neither beat a naive constant forecast, with the random forest achieving lower error than the linear model. This outcome is consistent with prior research showing that short-horizon equity returns are difficult to predict, even with flexible nonlinear models.
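To make the sign of the result concrete, out-of-sample R-squared over a test set $\mathcal{T}$ of stock-days can be written as below, with $\hat{r}_{i,t}$ the model forecast and $\bar{r}$ a constant benchmark forecast. Taking $\bar{r}$ as the test-set mean (as scikit-learn's `r2_score` does) is an assumption here; some studies instead use a zero forecast in the denominator.

```latex
R^2_{\text{oos}}
  = 1 - \frac{\sum_{(i,t) \in \mathcal{T}} \left(r_{i,t} - \hat{r}_{i,t}\right)^2}
             {\sum_{(i,t) \in \mathcal{T}} \left(r_{i,t} - \bar{r}\right)^2}
  = 1 - \frac{\mathrm{MSE}_{\text{model}}}{\mathrm{MSE}_{\bar{r}}}
```

So $R^2_{\text{oos}} < 0$ exactly when $\mathrm{MSE}_{\text{model}} > \mathrm{MSE}_{\bar{r}}$. Because both models are scored on the same test set, the denominator is shared, and the model with the less negative $R^2_{\text{oos}}$ is also the one with lower MSE.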

Rather than treating this as a failure, I read the results as reinforcing a key lesson in asset pricing: apparent in-sample patterns often fail to generalize, so careful out-of-sample evaluation is essential.
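The gap between in-sample fit and out-of-sample performance can be illustrated on synthetic data where, by construction, there is no signal at all. This is a toy illustration under assumed parameters (Gaussian features, a target with 2% daily volatility, default random forest settings), not a reproduction of the project's experiment; `noise_only_r2` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def noise_only_r2(n_train: int = 2000, n_test: int = 2000, seed: int = 0):
    """Return (in-sample R^2, out-of-sample R^2) for a forest fit to pure noise."""
    rng = np.random.default_rng(seed)
    # Features and target are independent draws: there is nothing to learn.
    X = rng.normal(size=(n_train + n_test, 4))
    y = rng.normal(scale=0.02, size=n_train + n_test)
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    # Default settings grow deep trees that can memorize the training targets.
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    return r2_score(y_tr, model.predict(X_tr)), r2_score(y_te, model.predict(X_te))
```

On pure noise the fully grown trees fit the training targets closely, producing a high in-sample R-squared, while the out-of-sample value comes out negative because the forecasts add variance without tracking the target.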

Structure

scripts contains the data processing pipelines.

notebooks contains the modeling and analysis notebook.

results contains the model evaluation outputs.

figures contains the feature importance visualizations.
