TotallyNotAVirus

Inspiration

This project was motivated by COVID-19 and its widespread harm to human wellbeing. Testing for COVID is available, but the traditional swab-test process has downsides. Some limitations include:

Availability (geographical/temporal factors)
Scarcity/expense of clinical tests
In-person testing is required, which exposes the general public and health care professionals to more risk
Long turnaround to get results back

What our app does

Our app lets anyone with a phone or computer and an internet connection pre-screen themselves for COVID-19. The user records a few seconds of coughing, and our app returns a predicted test result.

Project Goals

Train a model to accurately predict whether a patient has COVID-19 from the sound of their cough
Act as a preliminary screen ahead of more in-depth clinical tests
Reduce the cost and the delay associated with regular testing
Avoid the need for in-person testing until deemed crucial

How we built it

Our app is written primarily in Python, with TensorFlow for the Convolutional Neural Network (CNN) and OpenCV for image preprocessing. Other libraries we leveraged include scikit-learn, NumPy, pydub and librosa (for Mel spectrograms). The backend is a Flask app, and on the frontend we're using React for the audio upload interface.

Summary of how it works:

A. Preprocess Training Data -> Build CNN -> Save CNN
B. User uploads cough audio -> Audio is converted to a Mel spectrogram -> CNN predicts whether the user may have COVID-19 -> User sees the prediction with a confidence rating (%)

Backend

Preprocessing the Training Data
This was a fun challenge. We sourced our training data from Kaggle: a dataset containing around 300 COVID-negative cough spectrograms and close to 1,600 COVID-positive cough spectrograms. We cut the positive samples down to around 300 to match the number of negatives. See Challenges for more details.
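The balancing step can be sketched in a few lines. This is only an illustration: the writeup does not say how the positive samples were chosen, so random undersampling with a fixed seed is an assumption, and the file names below are hypothetical placeholders.

```python
import random


def undersample(samples, target_count, seed=0):
    """Randomly keep `target_count` items from `samples` (without replacement)."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(samples, target_count)


# Hypothetical file lists standing in for the Kaggle spectrogram images.
positives = [f"pos_{i:04d}.png" for i in range(1600)]
negatives = [f"neg_{i:04d}.png" for i in range(300)]

# Cut the positives down to the size of the negative class.
balanced_positives = undersample(positives, len(negatives))

# Label 1 = COVID-positive, 0 = COVID-negative (assumed encoding).
dataset = [(path, 1) for path in balanced_positives] + [(path, 0) for path in negatives]
```

Undersampling throws away most of the positive class, which is one reason a larger, more balanced dataset would help.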
Building the CNN
Based around this research paper:
Max pool 2x2
Convolve 16 filters 5x5 relu, x2
Max pool 2x2
Dropout 0.2
Convolve 16 filters 5x5 relu, x2 (same as above)
Max pool 2x2
Dropout 0.2
Flatten
Dropout 0.3
Dense layer, 256 units
Dense layer, 2 labels, softmax.
The model is trained with categorical cross-entropy loss.
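For readers who want to see the layer list as code, here is a minimal Keras sketch. Several details are assumptions that the list above does not specify: the 128x128 grayscale input shape, the default "valid" padding, the ReLU activation on the 256-unit dense layer, and the Adam optimizer.

```python
INPUT_SHAPE = (128, 128, 1)  # assumed: 128x128 grayscale spectrogram images


def build_cough_cnn(input_shape=INPUT_SHAPE):
    """Build and compile a CNN that follows the layer list above."""
    # Imported inside the function so this file can be loaded without TensorFlow.
    import tensorflow as tf

    layers = tf.keras.layers

    def conv():
        # 16 filters, 5x5 kernel, ReLU; padding left at the Keras default ("valid").
        return layers.Conv2D(16, (5, 5), activation="relu")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        conv(), conv(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        conv(), conv(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(256, activation="relu"),   # activation is an assumption
        layers.Dense(2, activation="softmax"),  # two labels: negative / positive
    ])
    # Optimizer is an assumption; the loss matches the writeup.
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Because the loss is categorical cross-entropy, the training labels would need to be one-hot encoded (two columns) rather than single integers.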
Model Serialization
Serializing the trained model lets us make predictions without retraining the CNN on every run, so our app can classify a user's cough sample as fast as possible. The loaded model returns a ‘positive’ or ‘negative’ result to the user, along with a confidence rating (percent).
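A rough sketch of the save, load, and interpret steps might look like the following. The file name `cough_cnn.keras`, the helper names, and the class order in `LABELS` are all hypothetical; the writeup does not state which serialization format was used.

```python
LABELS = ("negative", "positive")  # assumed order of the two softmax outputs


def save_model(model, path="cough_cnn.keras"):
    """Write a trained Keras model to disk so it can be reused without retraining."""
    model.save(path)


def load_model(path="cough_cnn.keras"):
    """Load a previously saved Keras model."""
    # Imported here so the pure-Python helper below works without TensorFlow.
    import tensorflow as tf

    return tf.keras.models.load_model(path)


def interpret(probabilities, labels=LABELS):
    """Turn one softmax output row into (label, confidence in percent)."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], round(100.0 * float(probabilities[best]), 1)
```

With a loaded model, `interpret(model.predict(batch)[0])` would give the label and percentage for the first image in `batch`.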

Frontend

React UI
On our frontend, the user can easily upload an audio file (.wav or .mp3) of their cough. The file is sent in a POST request to the Flask app, which forwards it to our spectrogram generation function.
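The receiving end of that POST request could be sketched as below. The `/predict` route, the `audio` form field, and the `classify` callback are hypothetical names chosen for illustration, not the project's actual API.

```python
import os

ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def is_allowed(filename):
    """Accept only the audio formats the upload form supports."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


def create_app(classify):
    """Build a Flask app; `classify` maps raw audio bytes to (label, confidence)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        upload = request.files.get("audio")  # hypothetical form field name
        if upload is None or not is_allowed(upload.filename or ""):
            return jsonify(error="expected a .wav or .mp3 file"), 400
        label, confidence = classify(upload.read())
        return jsonify(label=label, confidence=confidence)

    return app
```

Passing `classify` in as an argument keeps the web layer separate from the spectrogram and CNN code, so each part can be tested on its own.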
Spectrogram Generation
On the backend, we generate a Mel Spectrogram of the audio waveform. From there, the spectrogram is forwarded to our serialized classifier, which almost instantaneously judges whether a 'patient' may have COVID-19 or not.
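As an illustration of the spectrogram step, the sketch below uses librosa's standard Mel spectrogram functions. The sample rate of 22050 Hz and the 128 Mel bands are assumed values, and the helper names are hypothetical.

```python
def load_cough(path, sample_rate=22050):
    """Load an audio file as a mono waveform; returns (waveform, sample_rate)."""
    import librosa  # imported lazily; see the note below

    return librosa.load(path, sr=sample_rate, mono=True)


def mel_spectrogram_db(waveform, sample_rate, n_mels=128):
    """Convert a waveform into a Mel spectrogram on a decibel scale."""
    import numpy as np
    import librosa

    # Power spectrogram mapped onto `n_mels` Mel-frequency bands.
    mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=n_mels)
    # Decibels relative to the loudest bin, so the maximum value is 0 dB.
    return librosa.power_to_db(mel, ref=np.max)
```

The result is a 2D array with one row per Mel band and one column per time frame. It would still need to be resized or rendered to match the image shape the CNN was trained on.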
Back to the Front
The results are sent back to the frontend, where the user is shown the prediction and its confidence. (TotallyNotAVirus is meant only as a supplement to an actual medical assessment.)

Challenges we ran into

Lack of sufficient data, in both quantity and quality. The dataset we found ended up working well; however, more than 10K training samples would be ideal for real-world application.
Difficulty with image preprocessing! It was challenging to find precise coordinates in order to apply the affine and perspective transformations to correct the input images.
Difficulty using librosa to generate the Mel spectrograms on some of our local machines
Sleep? or Nah?

Accomplishments that we're proud of

Pairing a CNN with a frontend framework!
Almost meeting the submission deadline! (We were a little too ambitious this time around, but the learning experience is what really matters)

What we learned

There are so many parameters to tweak in machine learning; it's part of the learning process
Many libraries have unexpected dependencies, and these can be a pain to install sometimes

What's next for TotallyNotAVirus?

Integrating a fully-fledged frontend interface with user accounts
Seeking higher-quality datasets as they become available to increase the feasibility of personal, at-home 'pre-diagnosis' for COVID-19.

Submitted to

NewHacks - IEEE University of Toronto

Created by

I mainly worked on image preprocessing and tuning/serialization of the CNN. I used OpenCV to apply a perspective transform to the original spectrogram images, expanding and rotating them so that the training data was more uniform. I tweaked the Conv2D, pooling and dropout parameters in our CNN to try to yield a better prediction. I then serialized the model to return fast predictions to the frontend.

Ian Webster
Harris Zheng
Wenfei134 He

Updates

Ian Webster started this project — Nov 08, 2020 11:01 AM EST
