COMPSCI5088 Big Data: Systems, Programming and Management M - 2024-25
This project is a batch-processing Spark application that ranks financial assets by past performance using large-scale historical price data and asset metadata. It was developed as part of the Big Data coursework and received 24/25 marks, demonstrating both correctness and scalability.
Task Sheet: Assessed-Exercise.pdf
The goal of this Spark-based Java application is to recommend the top 5 financial assets for investment by computing technical indicators such as returns and volatility from historical stock data. The data is processed using a pipeline of distributed transformations and actions, optimized for performance and minimal shuffling.
- Apache Spark (4.0.0-preview2)
- Java 21
- RDD and Dataset APIs
- Broadcast Variables
- Distributed Filtering, Mapping, Grouping, and Sorting
- Functional decomposition using custom transformation classes (a minimal sketch follows this list)
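
As a minimal illustration of the last two items, the sketch below shows a custom transformation class that consumes a broadcast variable through Spark's Java Dataset API. All class, field, and variable names here are hypothetical and are not taken from the submission.

```java
import java.io.Serializable;
import java.util.Set;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.broadcast.Broadcast;

// Hypothetical bean for one price row; the real project defines its own classes.
class PriceRow implements Serializable {
    public String symbol;
    public double close;
}

// A transformation packaged as its own class: it keeps only the price rows whose
// symbol appears in a small, broadcast set (e.g. assets that survived the metadata
// filter). Broadcasting ships the set once per executor instead of once per task
// and avoids the shuffle that an equivalent join would cause.
class KeepBroadcastSymbolsFilter implements FilterFunction<PriceRow> {
    private final Broadcast<Set<String>> allowedSymbols;

    public KeepBroadcastSymbolsFilter(Broadcast<Set<String>> allowedSymbols) {
        this.allowedSymbols = allowedSymbols;
    }

    @Override
    public boolean call(PriceRow row) {
        return allowedSymbols.value().contains(row.symbol);
    }
}

// Driver-side usage, assuming `prices` is a Dataset<PriceRow> and `symbolSet` a Set<String>:
//   JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
//   Broadcast<Set<String>> allowed = jsc.broadcast(symbolSet);
//   Dataset<PriceRow> kept = prices.filter(new KeepBroadcastSymbolsFilter(allowed));
```

Packaging each step in a named class like this keeps the driver readable as a linear pipeline and lets individual transformations be tested in isolation.
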
Driver Code: AssessedExercise.java
- Loading large-scale CSV and JSON datasets using Spark's SQL API.
- Filtering asset metadata based on P/E ratio and data completeness.
- Time-range filtering of stock price data (the year preceding a reference date).
- Computing returns and volatility per asset using a 5-day and 251-day window, respectively.
- Filtering high-volatility stocks and broadcasting necessary metadata.
- Ranking assets by returns and returning the top 5 (a condensed sketch of these steps follows this list).
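
Purely as an illustration of the pipeline's shape, the sketch below strings the steps above together using Spark's SQL API. The file paths, column names (`symbol`, `date`, `close`, `peRatio`), the reference date, the volatility threshold, and the indicator formulas are all assumptions, not the values used in the coursework data or the marked submission.

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PipelineSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("AssetRankingSketch")
                .master("local[*]")
                .getOrCreate();

        // 1. Load the large CSV and JSON inputs with Spark's SQL API (paths/columns assumed).
        Dataset<Row> prices = spark.read()
                .option("header", "true").option("inferSchema", "true")
                .csv("data/prices.csv");                               // symbol, date, close
        Dataset<Row> metadata = spark.read().json("data/assets.json"); // symbol, peRatio, ...

        // 2. Metadata filter: keep assets with a present, positive P/E ratio.
        Dataset<Row> eligible = metadata.filter(
                col("peRatio").isNotNull().and(col("peRatio").gt(0)));

        // 3. Time-range filter: keep roughly one year of prices up to a reference date.
        Column refDate = lit("2024-01-01").cast("date");               // assumed reference date
        Dataset<Row> recent = prices.filter(
                to_date(col("date")).between(date_sub(refDate, 365), refDate));

        // 4. Per-asset indicators: overall return across the window, and the standard
        //    deviation of the close as a simple volatility proxy (illustrative formulas).
        Dataset<Row> indicators = recent.groupBy("symbol").agg(
                        min(struct(col("date"), col("close"))).getField("close").alias("firstClose"),
                        max(struct(col("date"), col("close"))).getField("close").alias("lastClose"),
                        stddev("close").alias("volatility"))
                .withColumn("ret",
                        col("lastClose").minus(col("firstClose")).divide(col("firstClose")));

        // 5. Drop high-volatility assets, keep only eligible symbols, rank by return.
        Dataset<Row> top5 = indicators
                .filter(col("volatility").lt(10.0))                    // assumed threshold
                .join(eligible.select("symbol"), "symbol")
                .orderBy(col("ret").desc())
                .limit(5);

        top5.show();
        spark.stop();
    }
}
```

The sketch joins the filtered metadata only for brevity; as the steps above state, the actual submission broadcasts the necessary metadata instead (avoiding that shuffle) and ranks the results with a distributed sort.
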
Feedback: Assessed-Exercise-Feedback.pdf
- Correct use of solution components like broadcast variables and distributed sorting (10/10)
- Full marks on code correctness and output (5/5)
- Efficient use of Spark (4/5 for performance; 33s task time vs. a 30s benchmark)
- Clean, well-documented, and scalable code (5/5)