<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://gaborvecsei.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://gaborvecsei.com/" rel="alternate" type="text/html" /><updated>2026-01-19T08:39:02+00:00</updated><id>https://gaborvecsei.com/feed.xml</id><title type="html">Gábor Vecsei</title><subtitle>Personal webpage</subtitle><entry><title type="html">Fog and the balcony - wide crops from photos</title><link href="https://gaborvecsei.com/fog-and-the-balcony-gallery/" rel="alternate" type="text/html" title="Fog and the balcony - wide crops from photos" /><published>2022-12-09T00:00:00+00:00</published><updated>2022-12-09T00:00:00+00:00</updated><id>https://gaborvecsei.com/fog-and-the-balcony-gallery</id><content type="html" xml:base="https://gaborvecsei.com/fog-and-the-balcony-gallery/"><![CDATA[]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Art" /><category term="Photography" /><category term="Pixel7Pro" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Neural Network Steganography</title><link href="https://gaborvecsei.com/Neural-Network-Steganography/" rel="alternate" type="text/html" title="Neural Network Steganography" /><published>2022-05-20T00:00:00+00:00</published><updated>2022-05-20T00:00:00+00:00</updated><id>https://gaborvecsei.com/Neural-Network-Steganography</id><content type="html" xml:base="https://gaborvecsei.com/Neural-Network-Steganography/"><![CDATA[<h1 id="introduction">Introduction</h1>

<p><em>We all have secrets, and now you can share these with your favorite neural network</em> 🤫</p>

<p><a href="https://github.com/gaborvecsei/Neural-Network-Steganography">gaborvecsei/Neural-Network-Steganography - <strong>Code and notebooks for the experiments</strong></a>.
I encourage everybody to follow the post alongside the code for an easier and more practical understanding.</p>

<p>Steganography is the practice of concealing a message within another message or a physical object <a href="#references"><em>[1]</em></a>.
Hiding a message in a picture, or a picture within another picture, are good examples of how you can break down the
two entities (what we can call <em>base data</em> and <em>secret data</em>) and slightly alter the base to hide your secret.
The idea is that you can make really small modifications to the base, usually impossible to spot with your eyes,
and those modifications contain what you wanted to hide.</p>

<blockquote>
  <p>Imagine increasing every $R$ value in the $(R, G, B)$ representation of an image by $1$ wherever $R &lt; 255$.
The result is a brand new image in which you’ve hidden your “secret”, and you will still hardly be able to tell the two apart.</p>
</blockquote>
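<p>The quoted pixel trick can be sketched in a few lines of NumPy (a toy example of my own, not from the repository):</p>

```python
import numpy as np

# A tiny 2x2 RGB "image" with values in 0..255
image = np.array([[[10, 20, 30], [255, 0, 0]],
                  [[100, 100, 100], [254, 1, 2]]], dtype=np.uint8)

# Increment every R value by 1 where R < 255
stego = image.copy()
red = stego[..., 0]          # view of the R channel
red[red < 255] += 1          # in-place update of the copy

print(stego[..., 0])         # R channel: [[11, 255], [101, 255]]
```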

<p>The idea is the same with neural networks, as a NN can contain millions of parameters which we can smartly modify to
embed some secrets.
This is what we can read about in the publication <em>“EvilModel: Hiding Malware Inside of Neural Network Models”</em> <a href="#references"><em>[2]</em></a>,
which I wanted to test with my own implementation.</p>

<h1 id="floating-points-and-how-to-modify-them">Floating-Points and how to modify them</h1>

<p>In computer science we can only approximate real numbers, as you’d need infinitely many bits to represent a real number
with infinite precision. This is why we use floating-point numbers, with which we can represent these numbers
with a fixed number of bits, to a certain precision and range.
In this post I will be using the single-precision, 32-bit representation (<code class="language-plaintext highlighter-rouge">float32</code>), but you could easily extend the theory
to representations with more/fewer bits.</p>

<h2 id="structure-of-a-fp32">Structure of a FP32</h2>

<p>I won’t cover the whole story of floating points; there are several well-written articles, which you can read up on here: <a href="#references"><em>[3]</em></a>.
As a quick refresher, this is what you need to know for these experiments.
We can split the binary representation into 3 parts and then use these to calculate the value of the number:</p>
<ul>
  <li><em>sign</em> ($s$) - the 1st bit</li>
  <li><em>exponent</em> ($E$) - 8 bits after the sign bit</li>
  <li><em>fraction</em> ($F$) - 23 bits after the last bit of the exponent</li>
</ul>

<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Float_example.svg/885px-Float_example.svg.png" width="400" alt="" /></p>

<p>(<em>LSB</em> visualization, source: <a href="#references"><em>[3]</em></a>)</p>

<p>Modifying these binary representations allows us to store some data while giving up some precision, which we can control by
deciding how many and which bits to change in the original bit sequence.
The formula to calculate the real number is the following, from which we can see that modifying $F$ does the least harm <a href="#references"><em>[6]</em></a>:</p>

<p>$x = (-1)^s \times (1.F) \times 2^{E-127}$</p>

<h2 id="floating-point-experiment">Floating-Point experiment</h2>

<p>As an experiment, let’s say we would like to modify the number $x=-69.420$. I wrote a little utility class <a href="#references"><em>[4]</em></a> with which we
can easily experiment with the representation.
Let’s take $x$, convert it to the mentioned
binary representation: $11000010100010101101011100001010$, and then calculate its value again with the formula: $-69.41999816894531$.
It’s not the same as the original one… 🤔 and yeah, that’s the whole point: the difference is $1.8310546892053026e-06$.
And kids, this is why we are not doing <del>drugs</del> equality checks with floats.</p>
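<p>If you don’t want to open the notebook, this round trip is easy to reproduce with the standard <code class="language-plaintext highlighter-rouge">struct</code> module (a sketch of what the utility class does under the hood):</p>

```python
import struct

x = -69.420

# float64 -> nearest float32 bit pattern -> back to a Python float
bits = struct.unpack(">I", struct.pack(">f", x))[0]
binary = format(bits, "032b")
x32 = struct.unpack(">f", struct.pack(">I", bits))[0]

print(binary)        # 11000010100010101101011100001010
print(x32)           # -69.41999816894531
print(abs(x - x32))  # ~1.83e-06
```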

<p>As another experiment, we can take $16$ bits from the fraction of $x$ and play around with them, “simulating” how the value changes as we change these bits.
Randomly doing this $1000$ times yields the following plot:</p>

<p><img src="https://gaborvecsei.github.io/assets/images/blog/nn_steganography/fp32_modification_randomly.png" width="400" alt="" /></p>

<p>But of course we can also just set all the bits to $0$s or $1$s, and then we have the “range of change”.
The wider this range is, the more afraid we should be when modifying the NN, as predictions can and will be altered.</p>

<p>You can also experiment more with <code class="language-plaintext highlighter-rouge">float32</code>s just <a href="https://github.com/gaborvecsei/Neural-Network-Steganography/blob/master/float_investigation.ipynb">run this notebook</a>.</p>

<h1 id="hiding-the-secrets-in-neural-networks">Hiding the secrets in Neural Networks</h1>

<p>The process is the following:</p>

<ol>
  <li>🔎 Evaluate your NN without any modification on a test dataset
    <ul>
      <li>Store each individual prediction, not just the overall metrics (e.g. f1 score) - for a more thorough evaluation</li>
    </ul>
  </li>
  <li>0️⃣1️⃣ Convert your data/secret to binary representation</li>
  <li>🤓 Calculate how many bits are needed to hide this data, then check if you have the available “storage” in your NN
    <ul>
      <li>$storage = \text{nb\_bits} \times \text{nb\_parameters}$</li>
      <li>Remember that there is a quality-quantity trade-off, so try to use a low number of bits</li>
    </ul>
  </li>
  <li>🤖 Go over the parameters in the network, convert each to binary format, then switch the selected bits to bits from the secret</li>
  <li>🔎 Evaluate the NN again, and inspect the differences</li>
</ol>
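<p>Steps 3-4 boil down to bit surgery on individual parameters. A minimal sketch with hypothetical helper names (the repository’s actual API may differ):</p>

```python
import struct

def embed_bits(param: float, secret_bits: str) -> float:
    """Overwrite the last len(secret_bits) fraction bits of a float32 parameter."""
    n = len(secret_bits)
    assert 1 <= n <= 23, "only fraction bits should be touched"
    bits = struct.unpack(">I", struct.pack(">f", param))[0]
    bits = (bits >> n << n) | int(secret_bits, 2)  # replace the n LSBs
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def extract_bits(param: float, n: int) -> str:
    """Read back the n least-significant fraction bits of a float32 parameter."""
    bits = struct.unpack(">I", struct.pack(">f", param))[0]
    return format(bits & ((1 << n) - 1), f"0{n}b")

p = embed_bits(0.12345, "1011001110001111")
print(extract_bits(p, 16))  # 1011001110001111
print(abs(p - 0.12345))     # tiny: only the low fraction bits moved
```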

<h2 id="quality---quantity-trade-off">Quality - Quantity trade-off</h2>

<p>There is a trade-off that we need to consider when modifying bits of parameters in a neural network:
the more precision you give up at each value, the more data you can store.
But think about what this precision means in a NN.
You are using these parameters to perform the forward pass and receive a prediction, and you’d like to keep this prediction as close as you can to the original one.
In the worst case, the outputs of the network will be so different that the 24 days of training you did go to waste.</p>

<p>As a general rule, just try to compress your data and use fewer bits from the fraction.
Empirically, it’s better to modify fewer bits everywhere in the network than to modify more bits in certain selected layers.
I will include such measurements in upcoming posts.</p>

<h2 id="experiment">Experiment</h2>

<p>After all this theory, let’s see an actual experiment 🥳.
I wrote the tools to use them, not just to sit on them 🦾.</p>

<h3 id="parameters">Parameters</h3>

<p>I used the well-known <code class="language-plaintext highlighter-rouge">ResNet50</code> network trained on <code class="language-plaintext highlighter-rouge">ImageNet</code>, which is easily accessible in every deep learning framework.
But how much data can we store here? Actually… a lot, but that should not be surprising given the number of parameters.
After I decided to run the experiment, where I change $16$ bits of the <em>fraction</em> of every parameter (in every <code class="language-plaintext highlighter-rouge">Conv2D</code> layer), I could calculate the amount of data I can store.
Here you can see the layer-wise breakdown:</p>

<p><img src="https://gaborvecsei.github.io/assets/images/blog/nn_steganography/resnet50_conv2d_storage_capacity.png" width="640" alt="" /></p>

<p>Adding up all the bits for the params in the 53 layers, it turns out we can easily store $44$ MB of data.
And keep in mind that today this is an average-sized CV model.
It would be really easy to hide a few Trojan viruses here <a href="#references"><em>[5]</em></a>.</p>

<p>We can also take a look at some basic statistics of the parameters, to get a hint of how much precision we need to retain; these
would help with any fancier placement of the secret bits (e.g. calculating the number of bits to use per cluster of values), but I will be using a simple iterative method.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Min: -0.7719802856445312
Abs. Min: 8.192913014681835e-10
Max: 0.9003667831420898
Mean: -0.0007807782967574894
---
Nb total values: 23454912
Nb values &lt; 10e-4: 1486452 - 6.3375%
Nb values &lt; 10e-3: 13138630 - 56.0165%
Nb negatives: 12746193 - 54.3434%
Nb positives: 10708719 - 45.6566%
---
(Maximum) Storage capacity is 44.0 MB for the 53 layers with the 16 bits modification
</code></pre></div></div>
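<p>As a sanity check, the reported capacity follows directly from the parameter count above and the 16 modified bits per parameter:</p>

```python
nb_parameters = 23_454_912  # Conv2D parameters across the 53 layers (stats above)
nb_bits = 16                # fraction bits replaced per parameter

capacity_bytes = nb_parameters * nb_bits // 8
capacity_mib = capacity_bytes / 1024 ** 2
print(capacity_bytes)          # 46909824
print(round(capacity_mib, 1))  # ~44.7, in line with the reported 44 MB
```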

<h3 id="placement-of-the-bits">Placement of the bits</h3>

<p>For quick experimentation I chose to generate $44$ MB of random data as a secret, and a simple iterative approach for hiding it,
where I slide a stepping window over the secret bits and, starting from the first layer’s first parameter, make the modifications.
In the first iteration I take bits 0 to 15 of the secret, convert the first parameter in the first <code class="language-plaintext highlighter-rouge">conv2d</code> layer to its binary representation,
take the last 16 bits of the fraction and switch the two. In the second iteration I step the window, take bits 16 to 31, and switch them with the last 16 bits of the second parameter.
And this goes on until we have no more bits to hide.</p>
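<p>The stepping-window iteration can be sketched over a flat list of parameters (hypothetical function names; the real implementation walks the model layer by layer):</p>

```python
import struct

def set_lsbs(value: float, payload: int, n: int) -> float:
    # Overwrite the n least-significant fraction bits of a float32 value
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    bits = (bits >> n << n) | payload
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def hide(params, secret_bits: str, n: int = 16):
    """Slide an n-bit window over secret_bits, consuming one parameter per step."""
    out = list(params)
    for i in range(0, len(secret_bits), n):
        chunk = secret_bits[i:i + n].ljust(n, "0")  # zero-pad the tail chunk
        out[i // n] = set_lsbs(out[i // n], int(chunk, 2), n)
    return out

params = [0.25, -0.5, 0.125]
stego = hide(params, "10101010101010101111000011110000")
# Only the first two parameters were needed for the 32 secret bits
```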

<h3 id="reconstruction">Reconstruction</h3>

<p>I think the backward process is obvious, so I won’t waste virtual paper on it (it’s homework for everyone reading this 😉),
but there are 3 things you need to remember for the reconstruction:</p>
<ul>
  <li>The order in which you modified the layers and the parameters (now it’s ordered by the index of the layer in the NN)</li>
  <li>The number of bits used for each parameter (fortunately it’s global for this algo)</li>
  <li>The index of the last modified parameter, so we can stop the process
    <ul>
      <li>(If your data is $8$ bits while your NN has $20,000,000$ parameters, you hide a single bit in each of the first $8$ parameters)</li>
    </ul>
  </li>
</ul>

<h3 id="evaluation---how-much-the-predictions-changed">Evaluation - How much did the predictions change?</h3>

<p>As the test dataset I used images randomly found on my laptop, as we are not necessarily interested in the predictions themselves,
only in how much they differ from the ones produced in the original state. You only need to pay attention that the
dataset is diverse enough, so it covers all the cases the network can encounter.</p>

<p>With my $14,241$ images the results are the following:</p>

<p>Analyzing the <em>softmax</em> output values with the 1000 classes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Min abs difference: 0.0
Max abs difference: 0.11202079057693481
Number of changed prediction values: 14240972 / 14241000 | 99.9998%
</code></pre></div></div>

<p>Looking only at the changes where the prediction label (<code class="language-plaintext highlighter-rouge">np.argmax(output)</code>) is different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Changed number of predictions: 146 / 14241 | 1.0252089038691103%
</code></pre></div></div>

<p>So we can see that almost all outputs changed slightly, and in some cases (approx. $1\%$) this resulted in a new output label.
This is not surprising given the maximum change values; just imagine a 3-class case where the original values are
$[0.3, 0.34, 0.36]$ (the label would be <em>2</em>) and after the modification they are $[0.3, 0.351, 0.349]$ (where the label is <em>1</em>).</p>

<h1 id="conclusion">Conclusion</h1>

<p>Even with a relatively simple approach, it is clear that we can use NNs to hide secrets. A lot of secrets…
Keeping in mind the introduced trade-offs, and testing the approaches, we can modify the network while losing little accuracy.</p>

<p>I am sure you are already thinking about smarter and more sophisticated approaches; in a follow-up post I would like to test those
and evaluate a wider range of models.</p>

<h1 id="references">References</h1>

<p><em>[1]</em> - <a href="https://en.wikipedia.org/wiki/Steganography">Steganography - Wikipedia</a></p>

<p><em>[2]</em> - <a href="https://arxiv.org/abs/2107.08590">EvilModel: Hiding Malware Inside of Neural Network Models</a></p>

<p><em>[3]</em> - <a href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format">Single-precision floating-point format - Wikipedia</a></p>

<p><em>[4]</em> - <a href="https://github.com/gaborvecsei/Neural-Network-Steganography/blob/master/float_investigation.ipynb">Floating point investigation notebook</a></p>

<p><em>[5]</em> - <a href="https://nakedsecurity.sophos.com/2010/07/27/large-piece-malware/#">How large is a piece of Malware? - SophosLabs</a></p>

<p><em>[6]</em> - <a href="https://www.sciencedirect.com/topics/computer-science/single-precision-format">Single-Precision Format - ScienceDirect</a></p>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Deep Learning" /><category term="Machine Learning" /><category term="Steganography" /><summary type="html"><![CDATA[Hiding secrets and malicious software in any neural network]]></summary></entry><entry><title type="html">Let your NeoVim remember where you’ve been with the Memento.nvim plugin</title><link href="https://gaborvecsei.com/Memento-NeoVim-Plugin/" rel="alternate" type="text/html" title="Let your NeoVim remember where you’ve been with the Memento.nvim plugin" /><published>2021-11-15T00:00:00+00:00</published><updated>2021-11-15T00:00:00+00:00</updated><id>https://gaborvecsei.com/Memento-NeoVim-Plugin</id><content type="html" xml:base="https://gaborvecsei.com/Memento-NeoVim-Plugin/"><![CDATA[]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Lua" /><category term="NeoVim" /><category term="Vim" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">CryptoPrice Neovim Plugin to check your favourite coins</title><link href="https://gaborvecsei.com/Cryptoprice-NeoVim-Plugin/" rel="alternate" type="text/html" title="CryptoPrice Neovim Plugin to check your favourite coins" /><published>2021-11-08T00:00:00+00:00</published><updated>2021-11-08T00:00:00+00:00</updated><id>https://gaborvecsei.com/Cryptoprice-NeoVim-Plugin</id><content type="html" xml:base="https://gaborvecsei.com/Cryptoprice-NeoVim-Plugin/"><![CDATA[]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Lua" /><category term="NeoVim" /><category term="Vim" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Barlow Twins Tensorflow Implementation - Self-Supervised Learning via Redundancy Reduction</title><link 
href="https://gaborvecsei.com/Barlow-Twins-Tensorflow/" rel="alternate" type="text/html" title="Barlow Twins Tensorflow Implementation - Self-Supervised Learning via Redundancy Reduction" /><published>2021-07-06T00:00:00+00:00</published><updated>2021-07-06T00:00:00+00:00</updated><id>https://gaborvecsei.com/Barlow-Twins-Tensorflow</id><content type="html" xml:base="https://gaborvecsei.com/Barlow-Twins-Tensorflow/"><![CDATA[]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Tensorflow" /><category term="Implementation" /><category term="Python" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Backtesting Mad-Money recommendations and the Cramer-effect</title><link href="https://gaborvecsei.com/Mad-Money-Backtesting/" rel="alternate" type="text/html" title="Backtesting Mad-Money recommendations and the Cramer-effect" /><published>2021-06-14T00:00:00+00:00</published><updated>2021-06-14T00:00:00+00:00</updated><id>https://gaborvecsei.com/Mad-Money-Backtesting</id><content type="html" xml:base="https://gaborvecsei.com/Mad-Money-Backtesting/"><![CDATA[<p>When it comes to trading, investors listen a lot to other people’s opinions without looking into the data and
the background of the company. And the more credible the source is (at least in theory), the more people pay attention
without a second thought. This is the case with the show <em>Mad Money</em> on CNBC <em>[1]</em>, hosted by <em>Jim Cramer</em>.
In this post I will show you how the “Mad Money” portfolio could have performed and what the Cramer-effect <em>[4]</em> looks like
(if there is such a thing).</p>

<p>To achieve this, I scraped the historical buy recommendations from the show, then backtested every company which was
on the list as a “buy recommendation”.</p>

<p>Find the GitHub repo with all the code and data used to write this post:
<a href="https://github.com/gaborvecsei/Mad-Money-Backtesting">https://github.com/gaborvecsei/Mad-Money-Backtesting</a></p>

<p><img src="https://raw.githubusercontent.com/gaborvecsei/Mad-Money-Backtesting/master/art/cramer.gif" width="400" alt="Cramer" /></p>

<h1 id="the-cramer-effect-and-his-recommendations">The Cramer-effect and his recommendations</h1>

<p><strong>The Cramer-Effect (Cramer Bounce)</strong>:</p>

<p>After the show Mad Money, the recommended stocks are bought by viewers almost immediately (in after-hours trading)
or at the next day’s market open, increasing the price for a short period of time. <em>[4]</em></p>

<p>This is really interesting but not surprising, as I already pointed out in the intro how this works for most people.
Kind of sad, but who doesn’t want to get rich without any work 🤑?</p>

<p>Other than this, I wanted to take a bigger timeframe and see what would have happened if I had followed the investment ideas
of the stock-picking guru and his team.</p>

<h1 id="recommendations-data-from-the-show">Recommendations data from the show</h1>

<p>Fortunately the data is available, as Cramer’s team publishes it on their own website <em>[2]</em>; we just need
to get it from there.</p>

<p>You can find a table on the site which holds the stocks mentioned on the show, and the corresponding actions, for a single day. As you can see,
there are some basic options where we can select a price threshold and, most importantly, the day when there was a show.
If we look closely and inspect the HTTP request (via the browser’s dev tools), we can see that a simple POST
request is sent with some form data (<code class="language-plaintext highlighter-rouge">application/x-www-form-urlencoded</code>) which contains the different “filterings”.
<a href="https://github.com/gaborvecsei/Mad-Money-Backtesting/blob/master/mad_money_backtesting/data.py#L26">This can be easily constructed</a>,
so once we have the contents of the page, we only need to parse it. I used
<code class="language-plaintext highlighter-rouge">BeautifulSoup</code> for that.</p>

<p>You can try this for yourself with <a href="https://github.com/gaborvecsei/Mad-Money-Backtesting/blob/master/scrape_mad_money.py">this little script</a>.</p>
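<p>Once the POST response is in hand, the parsing side with <code class="language-plaintext highlighter-rouge">BeautifulSoup</code> looks roughly like this. The HTML snippet and column layout below are made up for illustration; see the linked script for the real request payload and table structure:</p>

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text from the POST request
html = """
<table>
  <tr><th>Symbol</th><th>Call</th></tr>
  <tr><td>AAPL</td><td>Buy</td></tr>
  <tr><td>GME</td><td>Sell</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # skip the header row (it has no <td> cells)
        rows.append(cells)

print(rows)  # [['AAPL', 'Buy'], ['GME', 'Sell']]
```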

<h2 id="automation-w-github-actions">Automation w/ GitHub Actions</h2>

<p>Let’s be honest, we can do much better than manually preparing the data. Even better, as the resulting file is
not huge, we can keep it in the version control system. This is not just a fancy addition, but can actually help, as
it can be used directly by everyone and, more importantly, we can see how the contents of the file change over time.
Maybe you think it’s not a big deal, but this way, if there were a “problem” on the Mad Money crew’s end, and they
messed up the recommendations for some dates (in the present everyone is smarter about the past 😉, wink wink), then we would see it.
We can get rid of the “it works on my computer” problem, but for data.</p>

<p>Also, with the Flat Data Viewer <em>[3]</em>, we get a cool visualization:
<a href="https://flatgithub.com/gaborvecsei/Mad-Money-Backtesting">https://flatgithub.com/gaborvecsei/Mad-Money-Backtesting</a></p>

<p><img src="https://raw.githubusercontent.com/gaborvecsei/Mad-Money-Backtesting/master/art/flat_data_preview.png" width="600" alt="flatdata" /></p>

<p>This is all achieved with <em>GitHub Actions</em>. Without going into the details it’s as simple as:</p>
<ul>
  <li>Checkout master branch</li>
  <li>Prepare Python with the necessary dependencies</li>
  <li>Use the scraper code to retrieve and transform the data</li>
  <li>If there was a change in the contents of the <code class="language-plaintext highlighter-rouge">.csv</code> file, then let’s commit it</li>
  <li>Enjoy the fruits of this really cool feature</li>
</ul>

<p>(Idea is from <em>[7]</em>)</p>

<h1 id="backtesting">Backtesting</h1>

<p>Now that everything is covered (what the goal is and how we got the data), we can start to look into the backtesting and
the results.</p>

<p>For the backtesting I used the <code class="language-plaintext highlighter-rouge">backtesting.py</code> <em>[5]</em> package (<code class="language-plaintext highlighter-rouge">backtrader</code> is just as good),
and I got the historical stock data with <code class="language-plaintext highlighter-rouge">yfinance</code> <em>[6]</em>.</p>

<p>For the simulations, each mentioned stock is tested individually, then the overall results are calculated.
(We also store the individual results in an html file.)
I defined a fixed amount which I would invest in a stock. This stays the same no matter the company, as we want to
spend equally, since we don’t know how the stock will perform.
At each buy recommendation we go “all in” and buy as many shares as we can with the money. Once we sell, we sell all of them.
The buy and sell dates are defined in the backtesting classes, and they are “calculated” from the recommendation dates.</p>

<p>$\text{Recommendation dates} \rightarrow \text{Buy dates} \rightarrow \text{Sell dates}$</p>

<p>If a company was mentioned more times, then based on the strategy we can buy and sell more times.</p>
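<p>As a sketch of this mapping (simplified, ignoring weekends and market holidays), a rule that buys at the next day’s open and sells at that same day’s close would be:</p>

```python
from datetime import date, timedelta

def next_day_trades(recommendation_dates):
    """Map each recommendation date to a (buy, sell) pair on the following day."""
    trades = []
    for rec in recommendation_dates:
        buy_date = rec + timedelta(days=1)  # next day's market open
        sell_date = buy_date                # same day's market close
        trades.append((buy_date, sell_date))
    return trades

trades = next_day_trades([date(2021, 6, 1), date(2021, 6, 3)])
# buy/sell on 2021-06-02 and on 2021-06-04
```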

<h2 id="challenges">Challenges</h2>

<p>Before I show the results, I would like to write a bit about the challenges. These are important factors, as all of
them can alter the final results.</p>

<p>Fortunately, if you have better data, they are easily curable.</p>

<h3 id="after-hours-data">After hours data</h3>

<p>This is one of the biggest problems 😢, as we literally don’t have it. That is a bit of an exaggeration,
as <code class="language-plaintext highlighter-rouge">yfinance</code> can provide it, but it is sparse. I solved this by stating that the price at showtime is the same as
it is at market close. Of course, with this (dummy) extrapolation, we cannot count on the profits/losses from the after-hours volatility which
would (in theory) be generated by the show.</p>

<p>If you have a (maybe paid) data source, then by adjusting the buy/sell date calculations you can easily adapt the
strategies to a proper after-hours trading session, which would provide accurate results for the Cramer effect.</p>

<h3 id="missing-days">Missing days</h3>

<p>There are a few days for each stock where we have missing data. The problem comes in when there was a recommendation
around that date and we would like to buy/sell on the missing day. To overcome this, I made a simple function with which
you can transform these dates at buy/sell date calculation: either you drop the date, or you use the next “closest” one.</p>

<p>Dropping would mean that we won’t buy/sell on that date at all, while using the closest date could result in lower
accuracy in returns, as that is also an approximation. By the way, in a real-life scenario, if we did not strictly follow
the buy patterns and bought at most a few business days later, that would match this approximation.</p>
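<p>A minimal version of such a date-fixing function could look like this (hypothetical names; the repository’s version differs in the details):</p>

```python
from datetime import date, timedelta

def adjust_date(target, available_dates, drop_missing=False):
    """Snap a buy/sell date to the next date that has price data, or drop it."""
    if target in available_dates:
        return target
    if drop_missing:
        return None  # skip this trade entirely
    d = target
    while d <= max(available_dates):
        if d in available_dates:
            return d
        d += timedelta(days=1)
    return None  # ran past the end of the price history

# 2021-06-05 and 06-06 fall on a weekend in this toy price index
available = {date(2021, 6, 4), date(2021, 6, 7), date(2021, 6, 8)}
print(adjust_date(date(2021, 6, 5), available))  # 2021-06-07
```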

<h3 id="data-quality">Data Quality</h3>

<p>I don’t have any measures for this, but from what I saw, free sources of financial data have their own problems and are
usually not accurate.
When measuring short-term effects, a few cents can make a difference. We need to keep this in mind as well.</p>

<p>(But take this point with a grain of salt, as I only used free data sources.)</p>

<h2 id="trading-strategies">Trading Strategies</h2>

<p>Multiple trading strategies are implemented to test the Cramer effect and his “portfolio”:</p>
<ul>
  <li>$A$) <em>BuyAndHold</em> (and repeat)
    <ul>
      <li>The stocks are bought at the first mention on the show, then held for $N$ days. On the $N$th day the positions are
closed. If there were other mentions after we sold, we repeat this process. (If at the end of the simulation we still
have open positions, those are closed automatically.)</li>
    </ul>
  </li>
  <li>$B$) <em>AfterShowBuyNextDayCloseSell</em>
    <ul>
      <li>We buy the mentioned stocks at the end of the show and then sell on the next day at market close</li>
    </ul>
  </li>
  <li>$C$) <em>AfterShowBuyNextDayOpenSell</em>
    <ul>
      <li>We buy the mentioned stocks at the end of the show and then sell on the next day at market open</li>
    </ul>
  </li>
  <li>$D$) <em>NextDayOpenBuyNextDayCloseSell</em>
    <ul>
      <li>We buy the mentioned stocks at the next day’s market open and then sell them on the same day at market close</li>
    </ul>
  </li>
</ul>

<p>The Cramer-effect is simulated with strategies $B$, $C$ and $D$, as we are aiming for the short-term effect.
Strategy $D$ is the one, where no after-hours trading is involved.</p>

<h2 id="results">Results</h2>

<p>Results are obtained by observing stock values and company mentions from <code class="language-plaintext highlighter-rouge">2020-01-01</code> to <code class="language-plaintext highlighter-rouge">2021-06-04</code>.</p>

<p>At every show there are “buy” recommendations and also “positive” mentions. The latter means that there is a bigger
chance of seeing a bullish market, but it’s not as strong a signal as a buy recommendation. Accordingly, we should see more
consistent returns with the buy signals, so that is what I used for the backtesting.</p>

<p>For each unique stock I invested \$1000 and set a commission of 2%.</p>
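<p>The “all in with fixed capital” sizing, with the commission applied on the buy side, can be sketched as follows (my own simplification, not how <code class="language-plaintext highlighter-rouge">backtesting.py</code> accounts for commissions internally):</p>

```python
import math

def open_position(capital: float, price: float, commission: float = 0.02):
    """Buy as many whole shares as the capital allows, paying commission on the buy."""
    cost_per_share = price * (1 + commission)
    shares = math.floor(capital / cost_per_share)
    leftover = capital - shares * cost_per_share
    return shares, leftover

shares, leftover = open_position(1000, price=48.0)
print(shares, round(leftover, 2))  # 20 shares, ~20.8 left in cash
```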

<p>(In the code there is an option to use stop-loss and take-profit, but results were calculated without these).</p>

<h3 id="buy-and-hold-and-repeat">Buy and Hold (and repeat)</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: right">Days Held</th>
      <th style="text-align: right">Negative Returns</th>
      <th style="text-align: right">Positive Returns</th>
      <th style="text-align: right">Mean Return %</th>
      <th style="text-align: right">Median Return %</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">1</td>
      <td style="text-align: right">543</td>
      <td style="text-align: right">170</td>
      <td style="text-align: right">-4.85436</td>
      <td style="text-align: right">-3.01154</td>
    </tr>
    <tr>
      <td style="text-align: right">2</td>
      <td style="text-align: right">523</td>
      <td style="text-align: right">190</td>
      <td style="text-align: right">-4.38844</td>
      <td style="text-align: right">-3.3714</td>
    </tr>
    <tr>
      <td style="text-align: right">5</td>
      <td style="text-align: right">481</td>
      <td style="text-align: right">232</td>
      <td style="text-align: right">-3.03959</td>
      <td style="text-align: right">-2.72434</td>
    </tr>
    <tr>
      <td style="text-align: right">10</td>
      <td style="text-align: right">455</td>
      <td style="text-align: right">258</td>
      <td style="text-align: right">-3.09772</td>
      <td style="text-align: right">-3.42916</td>
    </tr>
    <tr>
      <td style="text-align: right">30</td>
      <td style="text-align: right">385</td>
      <td style="text-align: right">328</td>
      <td style="text-align: right">1.84899</td>
      <td style="text-align: right">-1.93449</td>
    </tr>
    <tr>
      <td style="text-align: right">60</td>
      <td style="text-align: right">348</td>
      <td style="text-align: right">365</td>
      <td style="text-align: right">9.75003</td>
      <td style="text-align: right">0.699654</td>
    </tr>
    <tr>
      <td style="text-align: right">90</td>
      <td style="text-align: right">329</td>
      <td style="text-align: right">383</td>
      <td style="text-align: right">12.1096</td>
      <td style="text-align: right">2.70547</td>
    </tr>
    <tr>
      <td style="text-align: right">120</td>
      <td style="text-align: right">295</td>
      <td style="text-align: right">418</td>
      <td style="text-align: right">17.6343</td>
      <td style="text-align: right">5.15033</td>
    </tr>
    <tr>
      <td style="text-align: right">240</td>
      <td style="text-align: right">227</td>
      <td style="text-align: right">486</td>
      <td style="text-align: right">31.5762</td>
      <td style="text-align: right">12.1968</td>
    </tr>
    <tr>
      <td style="text-align: right">365</td>
      <td style="text-align: right">215</td>
      <td style="text-align: right">498</td>
      <td style="text-align: right">38.8041</td>
      <td style="text-align: right">18.1675</td>
    </tr>
    <tr>
      <td style="text-align: right">373</td>
      <td style="text-align: right">185</td>
      <td style="text-align: right">528</td>
      <td style="text-align: right">42.6746</td>
      <td style="text-align: right">20.8505</td>
    </tr>
  </tbody>
</table>

<p><img src="https://raw.githubusercontent.com/gaborvecsei/Mad-Money-Backtesting/master/art/buy_and_hold_returns_mean_median.png" width="600" alt="returns" /></p>

<p><img src="https://raw.githubusercontent.com/gaborvecsei/Mad-Money-Backtesting/master/art/buy_and_hold_returns_pos_neg.png" width="600" alt="returns" /></p>

<p><em>(in this last plot the y axis shows counts, not returns)</em></p>

<h3 id="cramer-effect">Cramer Effect</h3>

<p>These are the short-term trading strategies which I tested.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Strategy</th>
      <th style="text-align: right">Negative Returns</th>
      <th style="text-align: right">Positive Returns</th>
      <th style="text-align: right">Mean Return %</th>
      <th style="text-align: right">Median Return %</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">AfterShowBuyNextDayCloseSell</td>
      <td style="text-align: right">546</td>
      <td style="text-align: right">166</td>
      <td style="text-align: right">-5.01226</td>
      <td style="text-align: right">-3.12014</td>
    </tr>
    <tr>
      <td style="text-align: left">AfterShowBuyNextDayOpenSell</td>
      <td style="text-align: right">570</td>
      <td style="text-align: right">142</td>
      <td style="text-align: right">-5.10033</td>
      <td style="text-align: right">-3.16921</td>
    </tr>
    <tr>
      <td style="text-align: left">NextDayOpenBuyNextDayCloseSell</td>
      <td style="text-align: right">543</td>
      <td style="text-align: right">169</td>
      <td style="text-align: right">0.83846</td>
      <td style="text-align: right">-2.9403</td>
    </tr>
  </tbody>
</table>

<p>In the repo, under the <code class="language-plaintext highlighter-rouge">art/</code> folder, you can find visualizations of the results of each strategy.</p>

<h1 id="conclusion">Conclusion</h1>

<p>From the <strong>Buy and Hold</strong> results it is clear that building a diverse portfolio and holding the positions yields greater returns.
So no magic 🧙 here, just the golden rule of investing - diversify and hold 💎👐.
(Also keep in mind that the tested period was bullish most of the time, so it was “easier” to generate profits.)</p>

<p>Of course this is nowhere near a real-life scenario. Let’s think about it: there are more than 700 unique stocks and
I invested \$1000 per stock, which in the end adds up to an investment of more than \$700,000.
We could fix this by using a smaller amount per stock, which would force us to exclude some stocks, or we could set a daily budget
and select positions to buy based on some logic, which again results in excluding stocks.</p>

<p>On the <strong>Cramer-Effect and short-term investment</strong> side, I don’t have any convincing results. Based on these
numbers I would say that the Cramer effect is not present, but keep in mind that I used multiple approximations
because of the incomplete/missing data. So if there were a small upside, we would not catch it this way.
But it is true that there are no significant short-term returns based on these strategies.</p>

<h1 id="references">References</h1>

<p>[1] <a href="https://en.wikipedia.org/wiki/Mad_Money">Mad Money show</a></p>

<p>[2] <a href="https://madmoney.thestreet.com/screener">Mad Money screener</a></p>

<p>[3] <a href="https://octo.github.com/projects/flat-data">GitHub Flat Data Viewer</a></p>

<p>[4] <a href="https://www.investopedia.com/terms/c/cramerbounce.asp#:~:text=The%20Cramer%20bounce%20refers%20to%20the%20increase%20in%20a%20stock's,Jim%20Cramer's%20show%20Mad%20Money.&amp;text=Research%20has%20shown%20an%20average,the%20effect%20is%20short%2Dlived.">The Cramer Effect</a></p>

<p>[5] <a href="https://github.com/kernc/backtesting.py">Backtesting.py repo</a></p>

<p>[6] <a href="https://github.com/ranaroussi/yfinance">Yahoo Finance API</a></p>

<p>[7] <a href="https://simonwillison.net/2020/Oct/9/git-scraping/">Git scraping: track changes over time by scraping to a Git repository</a></p>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Stocks" /><category term="Finance" /><category term="Python" /><category term="Backtesting" /><summary type="html"><![CDATA[When it comes to trading, investors are listening a lot on other people opinions without looking into the data and the background of the company. And the more credible (at least in theory) the source is, the more people pay attention without any second thought. This is the case with the show Mad Money on CNBC [1] with the host Jim Cramer. In this post I will show you how the “Mad Money” portfolio could have performed and what the Cramer-effect [4] looks like (if there is such a thing).]]></summary></entry><entry><title type="html">Finding patterns in stock data with similarity matching - Stock Pattern Analyzer</title><link href="https://gaborvecsei.com/Stock-Pattern-Analyzer/" rel="alternate" type="text/html" title="Finding patterns in stock data with similarity matching - Stock Pattern Analyzer" /><published>2021-03-01T00:00:00+00:00</published><updated>2021-03-01T00:00:00+00:00</updated><id>https://gaborvecsei.com/Stock-Pattern-Analyzer</id><content type="html" xml:base="https://gaborvecsei.com/Stock-Pattern-Analyzer/"><![CDATA[<p>In financial (or almost any type of) forecasting we build models which learn the patterns of the given series.
Partly this is possible because investors and traders tend to make the same choices they did in the past,
as they follow the same analysis techniques and their objective rules (e.g.: if a given stock drops below $x$,
then I’ll sell).
Just look at the textbook examples below <em>[1]</em>.</p>

<p>Algorithmic trading, which accounts for ~80% <em>[8] [9]</em> of all trading activity, introduces similar patterns,
as it is often based on the same techniques and fundamentals.
These patterns can be observed at different scales (e.g. in high-frequency trading).</p>

<p>In this pet-project I wanted to create a tool with which we can directly explore the most similar patterns
in $N$ series given a query $q$. The idea was that if I look at the last $M$ trading days of a selected stock and
find the most similar matches among the other $N$ stocks, where I also know the “future” values (the ones that come after the
$M$ days), then that can give me a hint on how the selected stock will move in the future.
For example, I can look at the top $k$ matches and let a majority vote decide
whether a bullish or bearish period is coming.</p>

<p><img src="https://www.newtraderu.com/wp-content/uploads/2020/06/Trading-Patterns-Cheat-Sheet.jpg" alt="stock patterns" width="300" /></p>

<p><em>(Source: [1])</em></p>
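<p>The majority vote mentioned above can be sketched in a few lines. This is only an illustration with made-up numbers: in the real tool the signs would come from the “future” returns of the top $k$ matched windows.</p>

```python
import numpy as np

# Hypothetical outcome signs of the k=5 most similar historical windows:
# +1 if the matched stock went up after its window, -1 if it went down
future_return_signs = np.array([1, 1, -1, 1, -1])

# Majority vote over the matches decides the predicted direction
vote = future_return_signs.sum()
prediction = "bullish" if vote > 0 else "bearish"
print(prediction)  # bullish (3 ups vs 2 downs)
```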

<h1 id="search-engine">Search Engine</h1>

<p>The approach would be quite simple if we didn’t care about runtime.
Just imagine a sliding window over all the selected stocks, then a calculation of some distance metric and bummm 💥,
you have the closest matches.
<strong>But we want better than that</strong>, as even a 1-second response time can be depressing for an end user.
Let’s see how it can be “optimized”.</p>

<p>Instead of the naive sliding-window approach, which has the worst runtime complexity,
we can use more advanced similarity search methods <em>[2]</em>.
In the project I used 2 different solutions:</p>
<ul>
  <li>KDTree <em>[7]</em> <em>[3]</em></li>
  <li>Faiss Quantized Index <em>[4]</em></li>
</ul>

<p>Both are blazing ⚡ fast compared to the basic approach.
The only drawback is that you need to build a data model to enable this speed and keep it in memory.
As long as you don’t care how much memory is allocated, you can choose whichever you want.
But when you do, I’d recommend the quantized approximate similarity search from Faiss.
By quantizing your data you can reduce the memory footprint of the objects by more than an order of magnitude.
Of course the price is that this is an approximate solution <em>[2]</em>, but you will still get satisfying results.
At least this was the case in this stock similarity search project.</p>
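<p>To get a feel for why quantization helps, here is a toy sketch using plain 8-bit scalar quantization. (Note: Faiss’s product quantization is more aggressive - it codes whole sub-vectors with shared codebooks - which is how the order-of-magnitude savings are reached.)</p>

```python
import numpy as np

# 10k windows with 20 dimensions, stored as float32 (4 bytes per value)
X = np.random.rand(10_000, 20).astype(np.float32)

# Map every value to a single byte using per-dimension min-max ranges
mins = X.min(axis=0)
scales = X.max(axis=0) - mins
Xq = np.round((X - mins) / scales * 255).astype(np.uint8)

print(X.nbytes // Xq.nbytes)  # 4 -> 4x smaller just by going from 32 to 8 bits
```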

<p>You can see the comparison of the different solutions in the measurements section.</p>

<h2 id="window-extraction">Window extraction</h2>

<p>To build a search model for a given window length (measured in days), which we’ll call the number of dimensions, you need to prepare
the data in which you would like to search later on.
In our case this means a sliding window <em>[6]</em> across the data with a step size of one.
To speed it up we can vectorize this step with <code class="language-plaintext highlighter-rouge">numpy</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">window_indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">values</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">window_size</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)[:,</span> <span class="bp">None</span><span class="p">]</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">window_size</span><span class="p">)</span>
<span class="n">extracted_windows</span> <span class="o">=</span> <span class="n">values</span><span class="p">[</span><span class="n">window_indices</span><span class="p">]</span>
</code></pre></div></div>

<p>Now that we have the extracted windows we can build the search model.
But wait… It sounds great that we have the windows, but we actually don’t care about the “real 💲💲💲 values” in a
given window. We are interested in the patterns, the ups 📈🦧 and downs 📉.
We can solve this by min-max scaling the values of each window (this is also vectorized).
This way we can directly compare the patterns in them.</p>
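<p>The per-window scaling can be vectorized as well. A minimal sketch (the helper in the repo may differ in details):</p>

```python
import numpy as np

def min_max_scale(windows: np.ndarray) -> np.ndarray:
    # Scale every window (row) independently into [0, 1] so only the
    # shape of the pattern matters, not the absolute price level
    mins = windows.min(axis=1, keepdims=True)
    maxs = windows.max(axis=1, keepdims=True)
    return (windows - mins) / (maxs - mins)

# Two windows at very different price levels but with the same pattern
windows = np.array([[10.0, 20.0, 15.0],
                    [100.0, 300.0, 200.0]])
print(min_max_scale(windows))  # both rows become [0. 1. 0.5]
```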

<p>Building the search tree/index is different for each library, with <code class="language-plaintext highlighter-rouge">scipy</code>’s <code class="language-plaintext highlighter-rouge">cKDTree</code> it looks like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="n">min_max_scale</span><span class="p">(</span><span class="n">extracted_windows</span><span class="p">)</span>
<span class="c1"># At this point the shape of X is: (n_windows, dimensions)
</span><span class="n">model</span> <span class="o">=</span> <span class="n">cKDTree</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
</code></pre></div></div>

<p>(To build a Faiss index, you can check the code <a href="https://github.com/gaborvecsei/Stocks-Pattern-Analyzer/blob/master/stock_pattern_analyzer/search_index.py#L50">at my repo</a>)</p>

<p>The RAM allocations and build times are compared below in the measurements section.</p>

<h2 id="query">Query</h2>

<p>We have a search model; now we can use it to find the most similar patterns in our dataset.
We only need to define a constant $k$, which sets how many (approximate) top results we would like to receive, and a
min-max scaled query with the same number of dimensions as the data we used to build the model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">top_k_distances</span><span class="p">,</span> <span class="n">top_k_indices</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">query_values</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>

<p>The query speed of the models can be found in the measurement table below.</p>

<h1 id="measurement-results">Measurement results</h1>

<div style="overflow-x: auto;">
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="5" halign="left">Build Time (ms)</th>
      <th colspan="5" halign="left">Memory Footprint (Mb)</th>
      <th colspan="5" halign="left">Query Speed (ms)</th>
    </tr>
    <tr>
      <th>window sizes</th>
      <th>5</th>
      <th>10</th>
      <th>20</th>
      <th>50</th>
      <th>100</th>
      <th>5</th>
      <th>10</th>
      <th>20</th>
      <th>50</th>
      <th>100</th>
      <th>5</th>
      <th>10</th>
      <th>20</th>
      <th>50</th>
      <th>100</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>FastIndex</th>
      <td>0.80</td>
      <td>1.30</td>
      <td>1.81</td>
      <td>13.57</td>
      <td>26.98</td>
      <td>3.48</td>
      <td>6.96</td>
      <td>13.92</td>
      <td>34.80</td>
      <td>69.58</td>
      <td>1.03</td>
      <td>1.27</td>
      <td>2.47</td>
      <td>7.79</td>
      <td>15.56</td>
    </tr>
    <tr>
      <th>MemoryEfficientIndex</th>
      <td>6967.66</td>
      <td>7058.32</td>
      <td>5959.26</td>
      <td>8216.05</td>
      <td>7485.01</td>
      <td>2.27</td>
      <td>2.28</td>
      <td>2.12</td>
      <td>2.33</td>
      <td>2.22</td>
      <td>0.23</td>
      <td>0.35</td>
      <td>0.25</td>
      <td>0.23</td>
      <td>0.42</td>
    </tr>
    <tr>
      <th>cKDTreeIndex</th>
      <td>110.23</td>
      <td>135.16</td>
      <td>206.76</td>
      <td>319.94</td>
      <td>484.34</td>
      <td>10.60</td>
      <td>17.57</td>
      <td>31.49</td>
      <td>73.24</td>
      <td>142.80</td>
      <td>0.08</td>
      <td>1.38</td>
      <td>24.68</td>
      <td>30.82</td>
      <td>40.92</td>
    </tr>
  </tbody>
</table>
</div>

<ul>
  <li><em>RAM allocation measurements are over-approximations</em>
    <ul>
      <li><em>Search object is serialized then the size of the file is reported here</em></li>
    </ul>
  </li>
  <li><em>Measurements were done on an average laptop (Lenovo Y520 with a “medium” HW config)</em></li>
  <li><em>No GPUs were used, all calculations are made on CPUs</em></li>
  <li><em>Query speed is measured as the average of 10 queries with the given model</em></li>
</ul>

<h1 id="the-tool">The tool</h1>

<p>Now that we have the search models, we can build the whole tool. There are 2 different parts:</p>
<ul>
  <li>RestAPI (FastAPI) - <em>as the backend</em> - this allows us to search in the stocks</li>
  <li>Dash client app - <em>as the frontend</em>
    <ul>
      <li>I had to use this to quickly create a shiny frontend (I am more of a backend guy 😉) but ideally this
  should be a React frontend which is responsive and looks much better</li>
    </ul>
  </li>
</ul>

<p><img src="https://github.com/gaborvecsei/Stocks-Pattern-Analyzer/raw/master/art/homepage.png" alt="stock patterns tool" width="640" /></p>

<h2 id="restapi">RestAPI</h2>

<p>When we start the stock API, a bunch of stocks (S&amp;P 500 and a few additional ones) are downloaded and prepared,
and then we start to build the above-mentioned search models.
For each window length we would like to investigate, a new model gets created with the appropriate number of dimensions.
To speed up the process, we can download the data and create the models in parallel (with <code class="language-plaintext highlighter-rouge">concurrent.futures</code>).</p>
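<p>As a rough sketch of the parallel part (the <code class="language-plaintext highlighter-rouge">download_stock</code> function here is only a stand-in for the real yfinance-based download):</p>

```python
from concurrent.futures import ThreadPoolExecutor

def download_stock(ticker: str):
    # Stand-in for the real download - would return the historical prices
    return ticker, [1.0, 2.0, 3.0]

tickers = ["AAPL", "MSFT", "GOOG"]

# Downloading is I/O bound, so a thread pool gives a near-linear speedup
with ThreadPoolExecutor(max_workers=8) as executor:
    data = dict(executor.map(download_stock, tickers))

print(sorted(data))  # ['AAPL', 'GOOG', 'MSFT']
```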

<p>For the simplicity of this tool, a scheduled background process updates both the stock data and the search models
twice a day (because of the different markets).
In a more advanced (non-MVP) version you would only need to download the latest values for each stock after market close,
create one extra sliding window containing the new values, and then add it to the search model.
This would save bandwidth and some CPU power.
In my code, I just re-download everything and re-build the search models 😅.</p>

<p>After starting the script, the endpoints are visible at <code class="language-plaintext highlighter-rouge">localhost:8001/docs</code>.</p>

<h2 id="client-dash-app">Client Dash app</h2>

<p>I really can’t say anything interesting about this; I tried to keep the code to a minimum while the site
stays usable and looks pretty (as long as you are using a desktop).</p>

<p>Dash is perfect for quickly creating frontends if you know how to use <code class="language-plaintext highlighter-rouge">plotly</code>, but for a production-scale app, as I mentioned,
I would go with React, Angular or any other alternative.</p>

<h1 id="making-trading-decisions-based-on-the-patters">Making trading decisions based on the patterns</h1>

<p><strong>Please just don’t.</strong> I mean, it is really fun to look at the graphs and check which are the most similar
stocks out there and what patterns you can find, but let’s be honest:</p>

<blockquote>
  <p><strong>This will only fuel your confirmation bias</strong>.</p>
</blockquote>

<p>A weighted ensemble of different forecasting techniques would be my first go-to method 🤫.</p>

<p>My only advice:
<strong>Hold</strong> 💎👐💎👐</p>

<h1 id="demo--code">Demo &amp; Code</h1>

<p>You can find a <a href="https://stock-dash-client.herokuapp.com/">Demo</a>, which is deployed to Heroku. You may need to wait a few minutes before the page “wakes up”.</p>
<ul>
  <li><a href="https://stock-dash-client.herokuapp.com">https://stock-dash-client.herokuapp.com</a></li>
</ul>

<p>You can find the code in my <a href="https://github.com/gaborvecsei/Stocks-Pattern-Analyzer">Stock Pattern Analyzer</a> GitHub repo:</p>
<ul>
  <li><a href="https://github.com/gaborvecsei/Stocks-Pattern-Analyzer">https://github.com/gaborvecsei/Stocks-Pattern-Analyzer</a></li>
</ul>

<h1 id="references">References</h1>

<p>[1] <a href="https://www.newtraderu.com/2020/06/15/trading-patterns-cheat-sheet/">Trading Patterns Cheat Sheet</a></p>

<p>[2] <a href="https://github.com/erikbern/ann-benchmarks">Benchmarking nearest neighbors</a></p>

<p>[3] <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html">Scipy cKDTree</a></p>

<p>[4] <a href="https://github.com/facebookresearch/faiss">Faiss GitHub repository</a></p>

<p>[5] <a href="https://vladfeinberg.com/2019/07/18/faiss-pt-2.html">Big Lessons from FAISS - Vlad Feinberg</a></p>

<p>[6] <a href="https://stackoverflow.com/questions/8269916/what-is-sliding-window-algorithm-examples">What is Sliding Window Algorithm?</a></p>

<p>[7] <a href="https://en.wikipedia.org/wiki/K-d_tree">K-d tree Wikipedia</a></p>

<p>[8] <a href="https://seekingalpha.com/article/4230982-algo-trading-dominates-80-of-stock-market">Algo Trading Dominates 80% Of Stock Market</a></p>

<p>[9] <a href="https://en.wikipedia.org/wiki/Algorithmic_trading">Algorithmic trading - Wikipedia</a></p>

<p>🚀🚀🌑</p>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Stocks" /><category term="Optimization" /><category term="Python" /><summary type="html"><![CDATA[In financial (or almost any type of) forecasting we build models which learn the patterns of the given series. Partially this can be done because the investors and traders tend to make the same choices they did in the past, as they follow the same analysis techniques and their objective rules (e.g.: if a given stock drop below $x$, then I’ll sell). Just look at the textbook examples below [1].]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://gaborvecsei.com/assets/images/stock_analyzer/stock_analyzer_image.png" /><media:content medium="image" url="https://gaborvecsei.com/assets/images/stock_analyzer/stock_analyzer_image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reproduction of Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis</title><link href="https://gaborvecsei.com/SLE-GAN/" rel="alternate" type="text/html" title="Reproduction of Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis" /><published>2020-12-05T00:00:00+00:00</published><updated>2020-12-05T00:00:00+00:00</updated><id>https://gaborvecsei.com/SLE-GAN</id><content type="html" xml:base="https://gaborvecsei.com/SLE-GAN/"><![CDATA[<p><a href="https://github.com/gaborvecsei/SLE-GAN"><strong>GitHub project page</strong></a></p>

<p><a href="https://openreview.net/forum?id=1Fqg133qRaI"><strong>Paper</strong></a></p>

<h1 id="usage">Usage</h1>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">import</span> <span class="nn">sle_gan</span>

<span class="n">G</span> <span class="o">=</span> <span class="n">sle_gan</span><span class="p">.</span><span class="n">Generator</span><span class="p">(</span><span class="n">output_resolution</span><span class="o">=</span><span class="mi">512</span><span class="p">)</span>
<span class="n">G</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="s">"generator_weights.h5"</span><span class="p">)</span>

<span class="n">input_noise</span> <span class="o">=</span> <span class="n">sle_gan</span><span class="p">.</span><span class="n">create_input_noise</span><span class="p">(</span><span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">generated_images</span> <span class="o">=</span> <span class="n">G</span><span class="p">(</span><span class="n">input_noise</span><span class="p">)</span>
<span class="n">generated_images</span> <span class="o">=</span> <span class="n">sle_gan</span><span class="p">.</span><span class="n">postprocess_images</span><span class="p">(</span><span class="n">generated_images</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">uint8</span><span class="p">).</span><span class="n">numpy</span><span class="p">()</span>
</code></pre></div></div>

<h1 id="generated-images">Generated Images</h1>

<p>These are not cherry-picked.</p>

<p><img src="https://github.com/gaborvecsei/SLE-GAN/raw/master/art/generated_flowers_512.png" alt="generated images 1" height="300" /></p>

<p><img src="https://github.com/gaborvecsei/SLE-GAN/raw/master/art/flower_interpolation_512.png" alt="generated images 2" height="300" /></p>

<p><img src="https://github.com/gaborvecsei/SLE-GAN/raw/master/art/flower_interpolation_512_v2.png" alt="generated images 3" height="300" /></p>

<h1 id="difficulties-throughout-reproduction">Difficulties throughout reproduction</h1>

<p>When I was reading the paper and started the implementation, I felt that a lot of small but important details were missing.
You can guess some of them from previous experience, but I would love to see a more detailed description of these for a 100%
reproduction.</p>

<p>Some of these:</p>
<ul>
  <li>Architecture discussion in detail: how the smaller variants (resolutions of 256 and 512) are built up - which layers are skipped, which have reduced
filter counts, etc.</li>
  <li>Training Schedule and some visualization of the loss(es) when training the network</li>
  <li>FID score throughout the training and comparison with the other discussed SOTA models</li>
  <li>Hyperparameters chosen for the different datasets</li>
  <li>Is there any change needed for training with small datasets (&lt;1k images) and big datasets?</li>
</ul>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Machine Learning" /><category term="Deep Learning" /><category term="GAN" /><summary type="html"><![CDATA[GitHub project page]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://gaborvecsei.com/noimage" /><media:content medium="image" url="https://gaborvecsei.com/noimage" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Run Machine Learning Experiments with Docker Containers</title><link href="https://gaborvecsei.com/Run-Machine-Learning-Experiments-With-Docker-Containers/" rel="alternate" type="text/html" title="Run Machine Learning Experiments with Docker Containers" /><published>2020-04-18T00:00:00+00:00</published><updated>2020-04-18T00:00:00+00:00</updated><id>https://gaborvecsei.com/Run-Machine-Learning-Experiments-With-Docker-Containers</id><content type="html" xml:base="https://gaborvecsei.com/Run-Machine-Learning-Experiments-With-Docker-Containers/"><![CDATA[<p>Docker images and containers. You have heard about them, or someone even uses them for model deployment. But it’s not so common to run experiments with them. Most researchers, data scientists and machine learning engineers find it cumbersome to set up and pay attention to an extra tool in the workflow. You can easily feel this, because most tutorials and blog posts about docker containers in machine learning target only the deployment phase. But containerization is just as important in the experimentation period as at the end in production, for the following simple reasons: <strong>Quick Reproducibility</strong> and <strong>Mobility</strong>. Nowadays you can hear these terms more often than ever (there is a reproducibility crisis in the field <em>[1]</em>), but still, people don’t really care about the fact that they will forget the little tricks that make their code work and produce the same results as before.
Nor that they are working in a team, where anyone should easily be able to get the same setup and reproduce the same runs as any other member.</p>

<p>I am not saying that containers are the only way to solve this problem, but with a combination of Docker and an experiment tracking system you are performing better than 70% <em>[2]</em> of companies where ML is used. In this post I would like to show you the benefits of Docker and the workflow/setup which I found the most useful.</p>

<h1 id="why-would-you-care">Why would you care?</h1>

<p>I have come across multiple projects where there was a “golden” model which was trained by one guy/girl over and over again without recording any history of loaded weights, modifications or requirements. Then, as always, the time came when someone wanted to run the experiments again, and it took days before the training/evaluation started. Why? Because none of the dependencies were recorded. Not a single <code class="language-plaintext highlighter-rouge">requirements.txt</code> file. Not even a napkin with some hand notes and ketchup. As I said, it was trained over and over again, so if you’d like to follow that chain you need multiple trainings, and of course the requirements changed, so you had to spend another 2-3 days on the setup and on finding the perfect combination of package versions. Finally you managed to run the training, but it’s nowhere near the recorded KPIs. After a week of debugging you realize that <code class="language-plaintext highlighter-rouge">python 3.7</code> was used, but with <code class="language-plaintext highlighter-rouge">python 3.5</code> and another package version combination you can get the desired results… hopefully.</p>

<p>Unfortunately, even this is not always the case. In the real world, <strong>even the author of the model can not reproduce the recorded numbers</strong>.</p>

<h1 id="experiments-inside-containers">Experiments inside containers</h1>

<p>In the above-mentioned scenario, with a proper image the process would have taken a few minutes or hours. This is just one benefit of this method, but there are others. You can limit the HW access <em>[3]</em> of the containers. This comes in handy when multiple researchers use the same HW, so a single experiment won’t eat all the CPUs because of a misconfiguration. Also, you won’t need to look through processes and manage tmux/screen sessions: all your running trainings will be visible with a simple <code class="language-plaintext highlighter-rouge">docker ps</code> command. You can even configure the containers to restart themselves when the power comes back after an outage (on the weekend).</p>
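<p>As a concrete example (the image name and the limits are made up), such a capped, auto-restarting training container could be started like this:</p>

```shell
# Cap CPU and memory so a misconfigured experiment can't eat the whole machine,
# pin the run to one GPU, and restart automatically after a power outage.
# "my-training-image" is a placeholder image name.
docker run -d \
    --cpus="8" \
    --memory="16g" \
    --gpus '"device=0"' \
    --restart unless-stopped \
    my-training-image python train.py
```

<p>The <code class="language-plaintext highlighter-rouge">--restart unless-stopped</code> policy is what brings a training back after the machine reboots.</p>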

<p>Let’s see how we can use containers for the experiments.</p>

<p>First of all, it’s good to know what the main components are:</p>
<ul>
  <li>Scripts - training, evaluation, visualization, etc.</li>
  <li>Configuration</li>
  <li>Data data data</li>
  <li>Runtime environment</li>
</ul>

<p>With the scripts we can run the experiments, and often we can set their parameters with a configuration file or simple command line arguments. Data is often huge, stored on a disk, and without it you won’t run the experiment 😉. The runtime environment enables us to run these scripts.</p>

<h2 id="workflows">Workflows</h2>

<p>We have multiple choices when setting up the workflow. We can be sure of one thing: we will mount the data into the container, as it is too big to include in the image itself.</p>

<p>One of the best solutions from the reproducibility perspective is when we <strong>build a new image before every run and our code and configuration are copied into the image itself</strong>. After this you can sleep well, as the current state of the code and setup is preserved within the image. Unfortunately, this way we need to store all of our built images in a container registry, which would take up a lot of space, so we can try to get rid of the experiment images which did not bring any value. Just pay attention: in machine learning experiments, low KPIs can be valuable too, as they show direction. Still, even with this reduction we can end up with many images, and a single image can take up 5-10 GB.</p>

<p>Instead, <strong>we can wrap only the runtime into an image</strong> and rebuild and store it only when the environment changes. <em>But then how do we run the code if we can not access it?</em> Just create a bash script which is the entrypoint of the container and does the following: it clones a specific commit hash, then executes the main file which starts the experiment (the well known <code class="language-plaintext highlighter-rouge">train.py</code> file 😏). The config file should be mounted, as it could contain secrets (e.g. database access) and we do not want to trash our repo with unnecessary config files for every experiment. <em>Why not mount the code as well?</em> That would lead to confusion: if you run multiple experiments and you just change the branch, the code changes in every one of your containers (as it is just mounted).</p>
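<p>A minimal sketch of such an entrypoint script (the repository URL and the mount paths are placeholders):</p>

```shell
#!/usr/bin/env bash
set -euo pipefail

# The commit hash pins the exact code state we want to reproduce
COMMIT_HASH="${1:?usage: start.sh GIT_COMMIT_HASH}"

# Placeholder repository URL
git clone https://github.com/your-org/your-ml-project.git /workspace
cd /workspace
git checkout "$COMMIT_HASH"

# Config (with secrets) and data are mounted into the container, not baked in
python train.py --config /config.yaml --data-dir /data
```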

<p>This workflow also protects against dirty, uncommitted changes, as you need to push your changes to a branch before starting anything. Rebuilding is only needed when the dependencies change, which results in far fewer images and makes them easy to manage.</p>

<p>There are also things to pay attention to. When you start a run, note down the docker image name and tag, the git commit hash you are using and the configuration file, along with your results. In a setup like this, these ensure reproducibility. Basically, next time you’d like to run it again, just <code class="language-plaintext highlighter-rouge">docker run -d -v config.yaml:/config.yaml -v data:/data RUNTIME_IMAGE start.sh GIT_COMMIT_HASH</code>. Or if your coworker would like to test your code, you only need to point to your docker image in the container registry and a 3-line experiment note (which contains the commit hash, etc.).</p>

<h1 id="conclusion">Conclusion</h1>

<p>In this post we saw how we can put just a bit of extra work into our projects which pays off in the near future. This is not a “toy model”; the workflow is actually in use in my teams. Based on my experience, within 1-2 weeks everyone gets used to the new tool and way of working. Of course, with different teams, workloads and projects this might be different, so take my words with a pinch of salt. You should experiment with different workflows which fit your load and infrastructure.</p>

<p>[1] <a href="https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/">Pete Warden - The Machine Learning Reproducibility Crisis</a></p>

<p>[2] This is just a personal feeling about the situation, based on my experience at multiple companies (startups and multinational) and based on discussions with other ML practitioners (scientists and engineers) (in Central Europe)</p>

<p>[3] <a href="https://docs.docker.com/config/containers/resource_constraints/">Runtime options with Memory, CPUs, and GPUs</a></p>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Machine Learning" /><category term="Docker" /><category term="Experiments" /><summary type="html"><![CDATA[Docker images and containers. You have heard about them, or maybe someone around you uses them for model deployment. But it’s not so common to run experiments with them. Most researchers, data scientists and machine learning engineers find it cumbersome to set up and pay attention to an extra tool in the workflow. You can easily feel this, because most of the tutorials and blog posts about Docker containers in machine learning target only the deployment phase. But containerization is just as important in the experimentation period as it is in production, for one simple reason: quick reproducibility and mobility. Nowadays you can hear these terms more often than ever (there is a reproducibility crisis in the field [1]), but still, people don’t really consider that they will forget the little tricks which make their code work and produce the same results as before.
And that they are working in a team, so every member should easily get the same setup and reproduce the same runs.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://gaborvecsei.com/noimage" /><media:content medium="image" url="https://gaborvecsei.com/noimage" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Machine Learning Inference with GitHub Actions</title><link href="https://gaborvecsei.com/Machine-Learning-Inference-with-GitHub-Actions/" rel="alternate" type="text/html" title="Machine Learning Inference with GitHub Actions" /><published>2020-03-13T00:00:00+00:00</published><updated>2020-03-13T00:00:00+00:00</updated><id>https://gaborvecsei.com/Machine-Learning-Inference-with-GitHub-Actions</id><content type="html" xml:base="https://gaborvecsei.com/Machine-Learning-Inference-with-GitHub-Actions/"><![CDATA[<p>I just looked into the recently introduced <a href="https://github.com/features/actions">GitHub Actions</a> and my first thought was to create a quick example project where we “deploy” an ML model with this new feature. Of course this is not a “real deployment”, but it can be used to test your model inside the repository without any additional coding.
You can also look super cool when you tell your boss: “just leave an issue comment at the repo I’ve created”. Moments later a new notification will appear for them with the model’s prediction.</p>

<p><a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions">You can find the <strong>complete GitHub repo</strong> here</a></p>

<p>GitHub Actions is an automation tool for building, testing and deployment. A quick example: every time you create a Pull Request (with a certain tag), a new build is triggered for your application, and then a message can be sent to a senior developer asking for a quick look at your code.</p>

<h1 id="what-will-we-create">What will we create?</h1>

<p>We will create a custom action and an automatic workflow on top of a repository, with which you can use your trained model; it is triggered when a new comment arrives under an issue. You can also find the model training and inference code in the repo. I wanted to be super hardcore, so I chose the <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">Iris dataset</a> and a <a href="https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html">Random Forest Classifier</a>. This tree-ensemble model is trained to identify flowers based on the sepal and petal lengths and widths.</p>

<p>Training of the model was done in a Jupyter Notebook <a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions/blob/master/train_model.ipynb">here</a> (I’ll leave the explanation out, as there is nothing special about it and you can find dozens of tutorials online). The notebook trains and serializes the model which we will use for the predictions. The GitHub Actions workflow is triggered when an issue receives a comment. If the comment starts with the <code class="language-plaintext highlighter-rouge">/predict</code> prefix, we parse the comment, make a prediction and construct a reply. As the final step this message is sent back to the user by a bot under the same issue. To make things better, this whole custom action runs inside a Docker container.</p>
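<p>The comment-parsing step above can be sketched like this (a minimal, hypothetical version for illustration; the function name and error handling are mine, not necessarily the repo’s actual code):</p>

```python
def parse_predict_comment(comment):
    # Expected format: "/predict <sepal_len> <sepal_width> <petal_len> <petal_width>"
    if not comment.startswith("/predict"):
        return None
    parts = comment.split()[1:]  # drop the "/predict" prefix
    if len(parts) != 4:
        return None
    try:
        return [float(p) for p in parts]
    except ValueError:
        return None


features = parse_predict_comment("/predict 5.6 2.9 3.6 1.3")
print(features)  # → [5.6, 2.9, 3.6, 1.3]
```

<p>Returning <code class="language-plaintext highlighter-rouge">None</code> for malformed comments lets the caller simply skip replying instead of crashing the workflow run.</p>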

<p><img src="https://gaborvecsei.github.io/assets/images/blog/ml_github_actions/issue_comment_prediction.png" alt="sample comment prediction" /></p>

<p>In a <strong>workflow</strong> we will find <strong>steps</strong> and for certain steps <strong>we can create individual actions</strong>. One workflow can contain multiple actions, but in this project we will use a single one.</p>

<h1 id="create-an-action">Create an Action</h1>

<p>As a first step we should create our action description in the repository root, in a file named <code class="language-plaintext highlighter-rouge">action.yaml</code>. In it we describe the <em>inputs</em>, <em>outputs</em> and the run environment.</p>

<script src="https://gist.github.com/gaborvecsei/45a7a0a1c681d23370d233fb26ebeaf2.js"> </script>

<p>From top to bottom you can see 3 defined inputs and a single output. At the end the <code class="language-plaintext highlighter-rouge">runs</code> key describes the environment where our code will run. This is a Docker container, to which the inputs are passed as arguments. Therefore the entry point of the container should accept these 3 arguments in the defined order.</p>
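<p>For orientation, an <code class="language-plaintext highlighter-rouge">action.yaml</code> for a setup like this might look roughly as follows (a hedged sketch: the input names are illustrative and may differ from the ones in the repo; only the <code class="language-plaintext highlighter-rouge">issue_comment_reply</code> output name is taken from the post):</p>

```yaml
name: "Iris prediction"
description: "Parses an issue comment and replies with a model prediction"
inputs:
  issue_comment_body:
    description: "Body of the issue comment that triggered the workflow"
    required: true
  issue_number:
    description: "Number of the issue to reply under"
    required: true
  repo_token:
    description: "Token used to post the reply"
    required: true
outputs:
  issue_comment_reply:
    description: "Reply message containing the prediction"
runs:
  using: "docker"
  image: "Dockerfile"
  args:
    - ${{ inputs.issue_comment_body }}
    - ${{ inputs.issue_number }}
    - ${{ inputs.repo_token }}
```

<p>The order of the <code class="language-plaintext highlighter-rouge">args</code> list is what fixes the argument order the container entry point must accept.</p>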

<h2 id="the-container">The container</h2>

<p>When we take a closer look at the <a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions/blob/master/Dockerfile"><em>Dockerfile</em></a> we can see how our run environment is built up. First we install all the listed Python requirements. Then <code class="language-plaintext highlighter-rouge">entrypoint.sh</code> is copied and made executable, so it can be run inside the container. Lastly the serialized sklearn model file is copied into the container, so we can use it for making predictions (in a real-life scenario you should not store model files in a repo; this is just for the sake of quick demonstration and my laziness).</p>

<script src="https://gist.github.com/gaborvecsei/c0cfd1fb8e4e0dbcbeb0a8b8fa0fac64.js"> </script>

<h1 id="define-the-workflow">Define the Workflow</h1>

<p><img src="https://gaborvecsei.github.io/assets/images/blog/ml_github_actions/job_steps.png" alt="job steps" /></p>

<p>An action cannot be used without a workflow, which defines the different steps you would like to take in your pipeline. You can find it at <a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions/blob/master/.github/workflows/main.yaml"><code class="language-plaintext highlighter-rouge">.github/workflows/main.yaml</code></a>.</p>

<script src="https://gist.github.com/gaborvecsei/c57a7fe8e16cdc645d01d96366d743dc.js"> </script>

<p>First of all, <code class="language-plaintext highlighter-rouge">on: [issue_comment]</code> defines that I would like to trigger this flow when an issue receives a comment (from anyone, on any issue). Then in my job I define the VM type with <code class="language-plaintext highlighter-rouge">runs-on: ubuntu-latest</code> (this can be <a href="https://help.github.com/en/actions/configuring-and-managing-workflows/configuring-a-workflow#choosing-a-runner">self-hosted or hosted by GitHub</a>). Now comes the interesting part: the steps I mentioned before.</p>

<ul>
  <li><em>Checkout step</em>: with this step we move to the desired branch in our repository (this is a GitHub action as well).</li>
  <li><em>See the payload</em>: I left it here for debugging. It shows the whole payload received after a comment arrives under an issue. This contains the comment, the issue number, the user who left the comment, etc.</li>
  <li><em>Make the prediction</em>: This is the one for our custom action. The <code class="language-plaintext highlighter-rouge">if: startsWith(github.event.comment.body, '/predict')</code> line makes sure this step runs only if a valid prediction request comes in (so it contains the <code class="language-plaintext highlighter-rouge">/predict</code> prefix). You can see the inputs are defined under the <code class="language-plaintext highlighter-rouge">with</code> keyword and the values are added from the payload through their keys (like <code class="language-plaintext highlighter-rouge">github.event.comment.body</code>).</li>
  <li><em>Print the reply</em>: The constructed reply is echoed to the log. It uses the defined output of our previous step: <code class="language-plaintext highlighter-rouge">steps.make_prediction.outputs.issue_comment_reply</code>.</li>
  <li><em>Send reply</em>: The message containing the prediction is posted back under the issue by the script <code class="language-plaintext highlighter-rouge">issue_comment.sh</code>.</li>
</ul>
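<p>Put together, the steps above might look roughly like this in workflow YAML (a sketch under the assumption that the custom action lives in the repository root; input names are illustrative):</p>

```yaml
name: Predict on issue comment
on: [issue_comment]

jobs:
  predict:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Make the prediction
        id: make_prediction
        # Only run for comments carrying the /predict prefix
        if: startsWith(github.event.comment.body, '/predict')
        uses: ./   # the custom Docker action in the repo root
        with:
          issue_comment_body: ${{ github.event.comment.body }}
          issue_number: ${{ github.event.issue.number }}
          repo_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Print the reply
        if: startsWith(github.event.comment.body, '/predict')
        run: echo "${{ steps.make_prediction.outputs.issue_comment_reply }}"
```
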

<p>Every step runs on the selected runner (<code class="language-plaintext highlighter-rouge">ubuntu-latest</code>), except our action, which runs inside the created container. This container is built when the workflow is triggered. (I could have cached it, so a run of the flow could reuse a previously built image, but again, I was too lazy to add that to this example.)</p>
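<p>For reference, the reply step boils down to a single GitHub REST API call (create an issue comment). A hedged shell sketch of what a script like <code class="language-plaintext highlighter-rouge">issue_comment.sh</code> could do; variable names are illustrative and the actual network call is left commented out:</p>

```shell
#!/bin/bash
# Sketch: build the GitHub REST API request that posts a comment under an issue.
# On a real Actions runner, GITHUB_REPOSITORY ("owner/repo") is set for you.
GITHUB_REPOSITORY="gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions"
ISSUE_NUMBER=3
MESSAGE="Prediction: Iris-versicolor"

API_URL="https://api.github.com/repos/${GITHUB_REPOSITORY}/issues/${ISSUE_NUMBER}/comments"
PAYLOAD=$(printf '{"body": "%s"}' "$MESSAGE")

echo "$API_URL"
# The actual call needs a token (e.g. the workflow's GITHUB_TOKEN):
#   curl -X POST -H "Authorization: token ${REPO_TOKEN}" \
#        -H "Content-Type: application/json" \
#        -d "${PAYLOAD}" "${API_URL}"
```
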

<h1 id="making-the-prediction">Making the prediction</h1>

<p>There is one thing I’ve not talked about: how is the prediction made? You can easily figure this out by looking at the <a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions/blob/master/main.py"><code class="language-plaintext highlighter-rouge">main.py</code></a> script.</p>

<script src="https://gist.github.com/gaborvecsei/d5836136a8b32391d9f490064f701643.js"> </script>

<p>First of all, the serialized sklearn model is loaded. Then the comment is parsed and we get the 4 features which identify the flower (<code class="language-plaintext highlighter-rouge">sepal length, sepal width, petal length, petal width</code>). With these 4 floats we use the model to make a prediction. The last step is to construct the reply message and set it as an output, which is done with the <code class="language-plaintext highlighter-rouge">print(f"::set-output name=issue_comment_reply::{reply_message}")</code> line.</p>
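<p>A self-contained sketch of that flow, with the serialized RandomForest replaced by a stand-in predictor so the snippet runs on its own (the function names are mine, not the repo’s):</p>

```python
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def predict_stub(features):
    # Stand-in for the real model.predict(); the actual script would load
    # the serialized sklearn model from disk and call it here.
    return CLASS_NAMES[1]


def handle_comment(comment_body):
    # "/predict 5.6 2.9 3.6 1.3" -> four floats -> prediction -> reply text
    values = [float(v) for v in comment_body.split()[1:]]
    assert len(values) == 4, "expected 4 features"
    reply_message = f"Prediction: {predict_stub(values)}"
    # Expose the reply as a step output (the syntax used at the time of writing)
    print(f"::set-output name=issue_comment_reply::{reply_message}")
    return reply_message


reply = handle_comment("/predict 5.6 2.9 3.6 1.3")
```

<p>Note that on current runners the <code class="language-plaintext highlighter-rouge">::set-output</code> command has since been deprecated in favor of appending <code class="language-plaintext highlighter-rouge">name=value</code> lines to the file pointed to by <code class="language-plaintext highlighter-rouge">$GITHUB_OUTPUT</code>; the post predates that change.</p>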

<p>That’s it! Okay… I know what you are thinking. This is too simple: the input, the dataset, the model, the storage of the model, the way the request is handled, etc. But I am sure you can figure out how to develop your own method from here. (E.g. for image inputs you could decode the image from a base64 string and then run it through your Deep Learning model stored in Git LFS.)</p>

<h1 id="try-it-out">Try it out</h1>

<p>Now that you have read all this, try it for yourself: just <a href="https://github.com/gaborvecsei/Machine-Learning-Inference-With-GitHub-Actions/issues/3">go here</a> and send a new comment like this:</p>

<blockquote>
  <p>/predict 5.6 2.9 3.6 1.3</p>
</blockquote>

<p>You will receive the prediction in 1-2 minutes.</p>]]></content><author><name>gaborvecsei</name></author><category term="blog" /><category term="Machine Learning" /><category term="Deployment" /><category term="GitHub" /><category term="CI/CD" /><summary type="html"><![CDATA[This post demonstrated how you can use GitHub Actions to perform inference with your ML models inside GitHub]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://gaborvecsei.github.io/assets/images/blog/ml_github_actions/issue_comment_prediction.png" /><media:content medium="image" url="https://gaborvecsei.github.io/assets/images/blog/ml_github_actions/issue_comment_prediction.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>