Inspiration
With the advancements in AI technology, the energy needed to train models continues to grow exponentially, and the environmental costs grow with it. Some solutions are already in place: the Massachusetts Green High Performance Computing Center (MGHPCC) and Google have begun building renewable energy farms powered by solar, wind, and nuclear power. But this approach brings new challenges, such as maintenance difficulty, cost, and hard limits on compute, so we came up with a new form of HPC: High Phone Computing!
What it does
A web server that parallelizes a model’s training procedure and assigns subtasks to iPhones, using their A18 Pro chips with 5 GPU cores. The idea is to use iPhones’ idle compute to train a large GPU-intensive model using a distributed system. A user simply uploads a pre-processed dataset and gets a trained model back!
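As a minimal sketch of the idea, the uploaded dataset can be split into one shard per participating phone before training begins (the function and names here are illustrative, not our production code):

```python
# Hypothetical sketch: split a pre-processed dataset into per-phone shards.
def shard_dataset(samples, num_phones):
    """Assign each sample to a phone round-robin; returns one shard per phone."""
    shards = [[] for _ in range(num_phones)]
    for i, sample in enumerate(samples):
        shards[i % num_phones].append(sample)
    return shards

shards = shard_dataset(list(range(10)), 3)
# Phone 0 trains on samples [0, 3, 6, 9], phone 1 on [1, 4, 7], phone 2 on [2, 5, 8].
```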
In theory, we save 28.8–46.7% in energy consumption. There are approximately 90 million iPhone 15s and 16s in the world; assuming each iPhone is rented out for 4 hours per day, that adds the equivalent of 10 million NVIDIA A100 40 GB GPUs and about $14 billion in value.
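These headline numbers imply roughly nine iPhones per A100-equivalent; a quick back-of-envelope check (the figures below restate our estimates above, not independent measurements):

```python
# Back-of-envelope check of the headline figures (estimates, not measurements).
iphones = 90_000_000           # estimated iPhone 15/16 units in the wild
a100_equivalents = 10_000_000  # claimed added A100 40 GB equivalents

phones_per_a100 = iphones / a100_equivalents
print(phones_per_a100)  # -> 9.0, i.e. ~9 iPhones per A100-equivalent

added_value_usd = 14e9
value_per_a100_equiv = added_value_usd / a100_equivalents
print(value_per_a100_equiv)  # -> 1400.0 USD per A100-equivalent
```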
How we built it
We built this system to enable distributed machine learning model training across multiple iPhones, leveraging their A18 Pro chips for GPU-based computation. The front end was developed using Vue.js, providing an intuitive interface for users to upload models and datasets. A REST API handles the uploads, while Redis optimizes performance by temporarily caching requests and ensuring smooth task queuing.

The backend uses a round-robin system to distribute tasks evenly among nodes, preventing any single device from becoming overloaded. A database tracks the status of tasks and manages synchronization across nodes.

On the iPhone 16 Pro devices, we implemented a hybrid Rust and JavaScript app. Rust handles performance-critical and security-sensitive operations, while JavaScript manages communication with the backend and local training tasks. Nuxt.js, a Vue.js meta-framework, powers each node's local interface. For GPU optimizations, we used Metal Shading Language (MSL) to optimize kernel operations, taking full advantage of the A18 Pro chip's capabilities.

The training process uses TensorFlow's multi-threading to split and parallelize tasks across the nodes, with model parallelism enabling the division of large models across multiple devices. Each node processes tensors independently; once training is complete, the system synchronizes the tensors and aggregates the results into a single trained model, which is sent back to the user through the upload portal. This approach combines high-performance on-device GPU compute with distributed task execution, ensuring both speed and scalability.
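The round-robin assignment described above can be sketched in a few lines (the class and node names are illustrative stand-ins, not our production scheduler):

```python
from collections import deque

class RoundRobinScheduler:
    """Distribute subtasks evenly across registered phone nodes."""

    def __init__(self, nodes):
        self.nodes = deque(nodes)

    def assign(self, task):
        node = self.nodes[0]
        self.nodes.rotate(-1)  # next call picks the next node in the ring
        return node, task

sched = RoundRobinScheduler(["phone-a", "phone-b", "phone-c"])
assignments = [sched.assign(t)[0] for t in range(6)]
# -> ['phone-a', 'phone-b', 'phone-c', 'phone-a', 'phone-b', 'phone-c']
```

Rotating a deque keeps assignment O(1) per task while guaranteeing no node receives two tasks before every other node has received one.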
Challenges we ran into
The first and most brutal challenge was the iPhone's very tight security and getting TensorFlow to run on iPhones at all. Since TensorFlow for Swift was deprecated, we had to create a workaround using an unlikely culprit: TensorFlow.js, specifically invoked through Rust's JavaScript bindings as an extension. The next issue was that, in order to optimize kernel operations on ARM chips, we had to use MSL (Metal Shading Language) to optimize MPS (Metal Performance Shaders), Apple's equivalent of CUDA for NVIDIA GPUs.
Accomplishments that we're proud of
Overcoming Apple's sandbox, which blocks developers from training models on iOS, by building a combined Rust, JavaScript, and Swift application! Splitting, synchronizing, and optimizing a training task into subtasks using Metal Shading Language (MSL) and Keras. Managing multiple iPhone nodes on a server and assigning them tasks.
What we learned
A lot! We built an iOS application using Rust, JavaScript, Swift, and Nuxt.js—we'd never used such a combination of technologies and learned it along the way! We built the backend using FastAPI, Postgres, Redis, and a RabbitMQ server to ensure an efficient distribution algorithm. Lastly, we split our main model into shards, allowing parallel training, and combined the multiple tensor outputs from the nodes into a single model using Metal Shading Language and Keras. We'd never built such a complicated pipeline before, with so many different components talking to each other.
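The final combining step can be sketched as a simple element-wise average of per-node weights (a pure-Python stand-in for the aggregation our Keras/MSL pipeline performs, with hypothetical data):

```python
# Illustrative sketch: merge per-node weight lists by element-wise averaging.
def average_weights(node_weights):
    """All nodes return weight lists of the same shape; average each position."""
    n = len(node_weights)
    return [sum(vals) / n for vals in zip(*node_weights)]

# Hypothetical outputs from three phone nodes:
merged = average_weights([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# -> [3.0, 4.0]
```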
What's next for HPC: High Phone Computing
We'd like to expand and scale our application to other compute tasks such as ray tracing, graphics rendering, and other computationally intensive workloads. Furthermore, we'd like to let users dynamically set their idle time and add themselves as "workers." Lastly, we aim to get every iPhone user on our application and increase the world's compute by the equivalent of 10 million NVIDIA A100 40 GB GPUs!
Built With
- fastapi
- javascript
- metal
- metal-performance-shaders
- metal-shader-language
- python
- rabbitmq
- rust
- swift
- tauri
- tensorflow
- vue