Motor Data Downsampling & Visualization

Gallery images: all 500,000 data points; hybridized distribution (with curve); proportional distribution; uniform distribution (with curve)

Please use the Google Drive link below to access our final submission and full documentation!

What it does

The Baker Hughes Challenge presents a common problem in modern algorithmic modeling: building accurate data representations with limited computational power. In support of a mission to improve energy efficiently and safely, accurate models that better predict the lifespan of products are critical for maintenance and repair procedures. The following documentation outlines how we prepared the training data for a model of an electric motor.

How we built it

The final report can be found here: https://drive.google.com/file/d/1JMLWMsQDiDozFRWFXhn752RjBwWNU_9C/view?usp=sharing

A full write-up and documentation of our design process may be found here: https://drive.google.com/drive/folders/1I0dAvUnOGW33BLSLVn-N_0BPp0K12vEa?usp=sharing

The final project presentation can be found here: https://drive.google.com/file/d/1OocvVIj5e6Kwx19Tlhgwdx1I6v3cADn5/view?usp=sharing

Challenges we ran into

Our team had the most trouble converting 2D scatter plots into 3D histograms that display the frequency of data points across various domains. The sticking points were keeping bins with a value of 0 from being drawn on the graph and adjusting the area covered by each histogram bar.
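As a rough illustration of the zero-bin problem, the sketch below bins two columns with NumPy and keeps only the non-empty bins, so that no zero-height bars are ever handed to the plotting call. The function name histogram_bars and the bins value are hypothetical, and this is not necessarily how the team's plotting code works.

```python
import numpy as np

def histogram_bars(x1, x2, bins=20):
    """Return bar positions, footprints and heights for non-empty bins only."""
    counts, x_edges, y_edges = np.histogram2d(x1, x2, bins=bins)
    # Lower-left corner of every bin, laid out to match the shape of counts.
    x_pos, y_pos = np.meshgrid(x_edges[:-1], y_edges[:-1], indexing="ij")
    # Bar footprint: the width of each bin along each axis.
    dx = np.diff(x_edges)[:, None] * np.ones_like(counts)
    dy = np.diff(y_edges)[None, :] * np.ones_like(counts)
    keep = counts > 0  # drop empty bins so zero-height bars are never drawn
    return x_pos[keep], y_pos[keep], dx[keep], dy[keep], counts[keep]
```

The returned arrays are in the order that matplotlib's 3D bar call expects, for example ax.bar3d(x, y, np.zeros_like(h), dx, dy, h) on an axis created with projection="3d". Scaling dx and dy by a factor below 1 is one way to shrink the area each bar covers.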

Another major issue was getting the uniform distribution to be even. A bug produced large, artificial frequency spikes near the extrema of the x1 and x2 values, and it took a long time to fix.
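For readers unfamiliar with uniform downsampling, here is a minimal sketch of one common approach: lay a grid over the (x1, x2) plane and keep at most a fixed number of points per cell. The function name and the bins, per_cell and seed parameters are assumptions made for illustration, not the team's implementation. The clipping step shows one place where edge handling matters in binned sampling; it is not a claim about the cause of the bug described above.

```python
import numpy as np

def uniform_downsample(points, bins=10, per_cell=5, seed=0):
    """Return indices keeping at most `per_cell` random points per grid cell."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on a flat axis
    # Map each coordinate to a cell index; clip so that points exactly at the
    # maximum land in the last cell rather than in an extra cell past the edge.
    cells = np.clip(((points - lo) / span * bins).astype(int), 0, bins - 1)
    cell_ids = cells[:, 0] * bins + cells[:, 1]
    keep = []
    for cell in np.unique(cell_ids):
        members = np.flatnonzero(cell_ids == cell)
        take = min(per_cell, members.size)
        keep.extend(rng.choice(members, size=take, replace=False))
    return np.sort(np.array(keep, dtype=int))
```

Because every occupied cell contributes at most per_cell points, dense regions are thinned heavily while sparse regions are kept almost intact, which is what flattens the frequency surface.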

Accomplishments that we're proud of

The team is most proud of getting all distributions of data organized and completed, as well as all graphs and CSV files generated.

Additionally, this was the team's first data science project and second-ever hackathon. Delegating the work effectively and leaving plenty of time for the write-up at the end was a big accomplishment.

What we learned

Through the data downsampling project, the team learned data sampling techniques, primarily k-means clustering and convex hulls for domain sampling. The team also improved its Python and project management skills.
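To make the k-means idea concrete, the sketch below clusters the points and keeps the real data point closest to each centroid as that cluster's representative. It is a standalone NumPy version of Lloyd's algorithm with plain random initialisation, written only for illustration; the project itself lists scikit-learn, whose KMeans class (with init="k-means++") would be the usual choice. The function name and the iters and seed parameters are hypothetical.

```python
import numpy as np

def kmeans_downsample(points, k, iters=20, seed=0):
    """Return indices of up to k real points, one nearest to each centroid."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Plain random initialisation (not k-means++), to keep the sketch short.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # For each centroid, pick the closest original point as its representative.
    # np.unique removes duplicates if two centroids share a nearest point.
    return np.unique(d2.argmin(axis=0))
```

The distance matrix holds n * k entries, so for a dataset on the order of 500,000 points this version would need to be run in batches or replaced with a library implementation.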

What's next for Motor Data Downsampling & Visualization

The next step is to develop more hybrid downsampling methods by tuning parameters such as the relative weight given to even coverage versus density. Currently, because the hybrid method relies on k-means++, an end user cannot adjust the degree of hybridization.
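One possible shape for such a tunable hybrid, offered purely as an assumption and not as the team's planned design, is to blend two sampling probabilities per point: a proportional one (every point equally likely, which preserves density) and a uniform one (every occupied grid cell equally likely). The parameter alpha, the grid size bins, and the function name are all hypothetical.

```python
import numpy as np

def hybrid_downsample(points, n_samples, alpha=0.5, bins=10, seed=0):
    """Sample indices; alpha=0 is density-proportional, alpha=1 is grid-uniform."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on a flat axis
    cells = np.clip(((points - lo) / span * bins).astype(int), 0, bins - 1)
    cell_ids = cells[:, 0] * bins + cells[:, 1]
    _, inverse, counts = np.unique(cell_ids, return_inverse=True, return_counts=True)
    # Proportional term: each point has the same chance of being picked.
    proportional = np.full(len(points), 1.0 / len(points))
    # Uniform term: each occupied cell has the same total chance, shared
    # equally among the points inside it.
    uniform = 1.0 / (len(counts) * counts[inverse])
    p = (1.0 - alpha) * proportional + alpha * uniform
    p /= p.sum()  # guard against floating-point drift
    return rng.choice(len(points), size=n_samples, replace=False, p=p)
```

With this formulation an end user would adjust a single number: values of alpha near 0 keep the sample concentrated where the data is dense, and values near 1 spread it evenly across the occupied domain.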

Built With

dbscan
matplotlib
numpy
python
scikit-learn
scipy

Submitted to

TAMU Datathon 2024
- Winner Baker Hughes' Challenge 1st Place

Created by

Created uniform and hybrid uniform-proportional distribution algorithms using k-means and convex hulls. Added 2D and 3D representations of the downsampled data.

Haoze Wang
Created uniform and density-based data selection algorithms using k-means and DBSCAN. Added 2D and 3D data visualizations for frequency-power pairs with clustering. Completed code documentation and formal report.

Jason Xiong
Computer Science @ TAMU
Angela Yue
Michael Rao

Updates

Haoze Wang started this project — Nov 10, 2024 11:28 AM EST
