Inspiration

In everyday business settings, companies need to turn transaction receipts into electronic records. However, this process is often time-consuming and error-prone. Our track aims to tackle this challenge.

What it does

Specifically, we want to accurately and robustly match receipt images with electronic transaction records, which may contain typos and other human errors.

How we built it

First, we used OCR packages to extract all text from a given receipt image and combined it into a single paragraph. This transforms the vision-and-language problem into an NLP problem: matching two pieces of text. Next, we got our hands dirty with the datasets and designed a policy-based (rule-based) system for computing a similarity score between each OCR text paragraph and a given transaction record. To further improve performance and make the model more extensible, we recast the matching problem as a binary classification task: given an OCR text and a transaction record, predict whether they correspond to the same source. We applied logistic regression and XGBoost models and achieved strong performance.
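The policy-based scoring step can be sketched as follows. This is a minimal illustration, not the team's actual code: the record fields (`merchant`, `amount`), the feature set, and the weights are all assumptions, and `difflib` stands in for whatever fuzzy-matching rules were actually used.

```python
import difflib


def similarity_features(ocr_text: str, record: dict) -> dict:
    """Rule-based ("policy") similarity signals between an OCR paragraph
    and a transaction record. Field names are illustrative assumptions."""
    text = ocr_text.lower()
    words = text.split()
    # 1) Fuzzy merchant-name match: tolerant of OCR typos.
    merchant = record["merchant"].lower()
    merchant_sim = max(
        (difflib.SequenceMatcher(None, merchant, w).ratio() for w in words),
        default=0.0,
    )
    # 2) Exact amount match: does the recorded total appear in the text?
    amount_hit = f"{record['amount']:.2f}" in text
    return {"merchant_sim": merchant_sim, "amount_match": float(amount_hit)}


def match_score(ocr_text: str, record: dict) -> float:
    """Combine the signals into one score; weights here are arbitrary."""
    f = similarity_features(ocr_text, record)
    return 0.7 * f["merchant_sim"] + 0.3 * f["amount_match"]
```

In the classification formulation, features like these would be fed to logistic regression or XGBoost instead of being combined with hand-picked weights, which is what makes the approach extensible.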

Challenges we ran into

1) We first wanted to use document embeddings to infer the similarity between texts, but we found that this did not work well because those embeddings are not robust to typos. Thus, we switched to a policy-based system. 2) We had a hard time parsing the many different datetime formats found on receipts. Luckily, Python's built-in datetime utilities gave us a more robust way to compare the datetimes.
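The normalize-then-compare approach to dates can be sketched with the standard library's `datetime.strptime`. The format list below is illustrative, not the one the team used:

```python
from datetime import datetime

# Candidate formats commonly seen on receipts; extend as needed.
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%b %d, %Y", "%m/%d/%y"]


def parse_receipt_date(raw: str):
    """Try each known format; return a date, or None if nothing matches."""
    raw = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    return None


def same_date(a: str, b: str) -> bool:
    """Compare two date strings by normalizing both to datetime.date."""
    da, db = parse_receipt_date(a), parse_receipt_date(b)
    return da is not None and da == db
```

Normalizing both sides to a `date` object means "03/14/2022" and "2022-03-14" compare equal regardless of how each source formatted them.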

Accomplishments that we're proud of

1) No neural network is used in our main model. 2) Performance of our best-performing model (XGBoost): test accuracy 96.0%, training time 112 s, prediction time 9 s.

What we learned

Don't always start with fancy models; start from the data, then rephrase the problem into a tractable one.

What's next for Gimmereceipt

1) We could create noisier datasets in which many receipt images match none of the transaction records in the table. We believe our model extends to those scenarios thanks to the log-probability measures we created. 2) We could tune hyperparameters to obtain even higher accuracy. 3) We could make further use of the spatial information in the OCR results.
