<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://wgunderwood.github.io//feed.xml" rel="self" type="application/atom+xml" /><link href="https://wgunderwood.github.io//" rel="alternate" type="text/html" /><updated>2026-04-10T14:48:39+00:00</updated><id>https://wgunderwood.github.io//feed.xml</id><title type="html">William G. Underwood</title><subtitle>Welcome to my website.</subtitle><author><name>William G. Underwood</name></author><entry><title type="html">Minimax-optimal In-context Regression with Transformers</title><link href="https://wgunderwood.github.io//2026/01/22/in-context-transformers.html" rel="alternate" type="text/html" title="Minimax-optimal In-context Regression with Transformers" /><published>2026-01-22T00:00:00+00:00</published><updated>2026-01-22T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2026/01/22/in-context-transformers</id><content type="html" xml:base="https://wgunderwood.github.io//2026/01/22/in-context-transformers.html"><![CDATA[<p>Excited to share a collaboration with three brilliant undergraduate/masters
students, Michelle Ching, Ioana Popescu and Nico Smith!</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<p>In “Efficient and
minimax-optimal in-context nonparametric regression with transformers”, we show
that deep transformer networks require only order $\log n$ parameters to attain
optimal rates in smooth regression problems.
The paper is joint work with
<a href="https://www.linkedin.com/in/michelle-ching-408557294/">Michelle Ching</a>,
<a href="https://www.linkedin.com/in/ioana-popescu-409294253/">Ioana Popescu</a>,
<a href="https://www.linkedin.com/in/nico-smith-4730562a1/">Nico Smith</a>,
<a href="https://tianyima2000.github.io/">Tianyi Ma</a>
and
<a href="https://www.statslab.cam.ac.uk/~rjs57/">Richard Samworth</a>,
and can be found at
<a href="https://arxiv.org/abs/2601.15014">arXiv:2601.15014</a>.</p>

<h1 id="abstract">Abstract</h1>

<p>We study in-context learning for nonparametric regression with
$\alpha$-Hölder smooth regression functions, for some $\alpha&gt;0$. We
prove that, with $n$ in-context examples and $d$-dimensional
regression covariates, a pretrained
transformer with $\Theta(\log n)$ parameters and
$\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can
achieve the minimax-optimal rate of convergence
$O\bigl(n^{-2\alpha/(2\alpha+d)}\bigr)$ in mean squared error.
Our result requires substantially fewer
transformer parameters and pretraining sequences
than previous results in the literature.
This is achieved by showing that
transformers are able to
approximate local polynomial estimators efficiently
by implementing a
kernel-weighted polynomial basis
and then running gradient descent.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[Excited to share a collaboration with three brilliant undergraduate/masters students, Michelle Ching, Ioana Popescu and Nico Smith!]]></summary></entry><entry><title type="html">Early Career Researcher Grant</title><link href="https://wgunderwood.github.io//2025/10/06/g-research-grant.html" rel="alternate" type="text/html" title="Early Career Researcher Grant" /><published>2025-10-06T00:00:00+00:00</published><updated>2025-10-06T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2025/10/06/g-research-grant</id><content type="html" xml:base="https://wgunderwood.github.io//2025/10/06/g-research-grant.html"><![CDATA[<p>Delighted to have been awarded an
<a href="https://www.gresearch.com/news/g-research-august-2025-grant-winners/">Early Career Researcher Grant</a>
by G-Research!</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[Delighted to have been awarded an Early Career Researcher Grant by G-Research!]]></summary></entry><entry><title type="html">MRC Biostatistics Unit Talk</title><link href="https://wgunderwood.github.io//2025/10/02/bsu-seminar.html" rel="alternate" type="text/html" title="MRC Biostatistics Unit Talk" /><published>2025-10-02T00:00:00+00:00</published><updated>2025-10-02T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2025/10/02/bsu-seminar</id><content type="html" xml:base="https://wgunderwood.github.io//2025/10/02/bsu-seminar.html"><![CDATA[<p>I presented my recent work on “Upgrading survival models with CARE”
at the MRC Biostatistics Unit in Cambridge.</p>

<p>The talk is available on <a href="https://www.youtube.com/watch?v=Bh5C3NwxRFA">YouTube</a>.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[I presented my recent work on “Upgrading survival models with CARE” at the MRC Biostatistics Unit in Cambridge.]]></summary></entry><entry><title type="html">Upgrading Survival Models with CARE</title><link href="https://wgunderwood.github.io//2025/07/01/care.html" rel="alternate" type="text/html" title="Upgrading Survival Models with CARE" /><published>2025-07-01T00:00:00+00:00</published><updated>2025-07-01T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2025/07/01/care</id><content type="html" xml:base="https://wgunderwood.github.io//2025/07/01/care.html"><![CDATA[<p>I’m pleased to share my new preprint,
“Upgrading survival models with CARE”.</p>

<p>It is authored with
<a href="https://henryreeve.netlify.app/">Henry Reeve</a>,
<a href="https://sites.google.com/view/oyfeng20/home">Oliver Feng</a>,
<a href="https://www.phpc.cam.ac.uk/staff/dr-samuel-lambert">Samuel Lambert</a>,
<a href="https://ysph.yale.edu/profile/bhramar-mukherjee/">Bhramar Mukherjee</a>
and <a href="https://www.statslab.cam.ac.uk/~rjs57/">Richard Samworth</a>,
and can be found at
<a href="https://arxiv.org/abs/2506.23870">arXiv:2506.23870</a>.</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h1 id="abstract">Abstract</h1>

<p>Clinical risk prediction models are regularly updated as new data, often with
additional covariates, become available. We propose CARE (Convex Aggregation of
relative Risk Estimators) as a general approach for combining existing
“external” estimators with a new data set in a time-to-event survival analysis
setting. Our method initially employs the new data to fit a flexible family of
reproducing kernel estimators via penalised partial likelihood maximisation.
The final relative risk estimator is then constructed as a convex combination
of the kernel and external estimators, with the convex combination coefficients
and regularisation parameters selected using cross-validation. We establish
high-probability bounds for the $L_2$-error of our proposed aggregated
estimator, showing that it achieves a rate of convergence that is at least as
good as both the optimal kernel estimator and the best external model.
Empirical results from simulation studies align with the theoretical results,
and we illustrate the improvements our methods provide for cardiovascular
disease risk modelling. Our methodology is implemented in the Python package
<a href="https://github.com/WGUNDERWOOD/care-survival">care-survival</a>.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[I’m pleased to share my new preprint, “Upgrading survival models with CARE”.]]></summary></entry><entry><title type="html">Sharp Anti-Concentration Inequalities for Extremum Statistics via Copulas</title><link href="https://wgunderwood.github.io//2025/02/13/anti-concentration.html" rel="alternate" type="text/html" title="Sharp Anti-Concentration Inequalities for Extremum Statistics via Copulas" /><published>2025-02-13T00:00:00+00:00</published><updated>2025-02-13T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2025/02/13/anti-concentration</id><content type="html" xml:base="https://wgunderwood.github.io//2025/02/13/anti-concentration.html"><![CDATA[<p>My new preprint is titled
“Sharp Anti-Concentration Inequalities for Extremum Statistics via Copulas”,
with
<a href="https://mdcattaneo.github.io/">Matias Cattaneo</a>
and
<a href="https://anson.ucdavis.edu/~rmasini/bio.html">Ricardo Masini</a>.</p>

<p>It can be found at
<a href="https://arxiv.org/abs/2502.07699">arXiv:2502.07699</a>.</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h1 id="abstract">Abstract</h1>

<p>We derive sharp upper and lower bounds for the pointwise concentration function
of the maximum statistic of $d$ identically distributed real-valued random
variables. Our first main result places no restrictions either on the common
marginal law of the samples or on the copula describing their joint
distribution. We show that, in general, strictly sublinear dependence of the
concentration function on the dimension $d$ is not possible. We then introduce
a new class of copulas, namely those with a convex diagonal section, and
demonstrate that restricting to this class yields a sharper upper bound on the
concentration function. This allows us to establish several new
dimension-independent and poly-logarithmic-in-$d$ anti-concentration
inequalities for a variety of marginal distributions under mild dependence
assumptions. Our theory improves upon the best known results in certain special
cases. Applications to high-dimensional statistical inference are presented,
including a specific example pertaining to Gaussian mixture approximations for
factor models, for which our main results lead to superior distributional
guarantees.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[My new preprint is titled “Sharp Anti-Concentration Inequalities for Extremum Statistics via Copulas”, with Matias Cattaneo and Ricardo Masini.]]></summary></entry><entry><title type="html">MondrianForests.jl</title><link href="https://wgunderwood.github.io//2023/11/04/mondrian-julia.html" rel="alternate" type="text/html" title="MondrianForests.jl" /><published>2023-11-04T00:00:00+00:00</published><updated>2023-11-04T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2023/11/04/mondrian-julia</id><content type="html" xml:base="https://wgunderwood.github.io//2023/11/04/mondrian-julia.html"><![CDATA[<p>My new Julia package for Mondrian random forest
regression is available on
<a href="https://github.com/WGUNDERWOOD/MondrianForests.jl">GitHub</a>.</p>

<p>This package provides:</p>

<ul>
  <li>Fitting (debiased) Mondrian random forests</li>
  <li>Parameter selection with
polynomial estimation or generalized cross-validation</li>
</ul>

<p><a href="https://github.com/WGUNDERWOOD/MondrianForests.jl" target="_blank" rel="noopener noreferrer">
<img style="width: 300px; margin-top: 0px" src="/assets/posts/mondrian_julia/logo.svg" />
</a></p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[My new Julia package for Mondrian random forest regression is available on GitHub.]]></summary></entry><entry><title type="html">Inference with Mondrian random forests</title><link href="https://wgunderwood.github.io//2023/10/17/mondrian-inference.html" rel="alternate" type="text/html" title="Inference with Mondrian random forests" /><published>2023-10-17T00:00:00+00:00</published><updated>2023-10-17T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2023/10/17/mondrian-inference</id><content type="html" xml:base="https://wgunderwood.github.io//2023/10/17/mondrian-inference.html"><![CDATA[<p>I’m excited to share my new preprint,
titled “Inference with Mondrian random forests”
and coauthored with
<a href="https://mdcattaneo.github.io/">Matias Cattaneo</a>
and
<a href="https://klusowski.princeton.edu/">Jason Klusowski</a>.</p>

<p>It can be found at
<a href="https://arxiv.org/abs/2310.09702">arXiv:2310.09702</a>.</p>

<div class="frame">
<a href="https://arxiv.org/abs/2310.09702">
<img style="width: 190px; margin-top: 5px; margin-left: 20px; margin-bottom: 15px;" src="/assets/posts/mondrian_inference/piet.jpg" />
</a>
</div>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h1 id="abstract">Abstract</h1>

<p>Random forests are popular methods for classification and regression,
and many different variants have been proposed in recent years.
One interesting example is the Mondrian random forest,
in which the underlying trees are constructed according to a Mondrian process.
In this paper we give a central limit theorem
for the estimates made by a Mondrian random forest
in the regression setting.
When combined with a bias characterization and a consistent variance estimator,
this allows one to perform asymptotically valid statistical inference,
such as constructing confidence intervals, on the unknown regression function.
We also provide a debiasing procedure for Mondrian random
forests which allows them to achieve minimax-optimal estimation rates
with $\beta$-Hölder regression functions, for all $\beta$
and in arbitrary dimension, assuming appropriate parameter tuning.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[I’m excited to share my new preprint, titled “Inference with Mondrian random forests” and coauthored with Matias Cattaneo and Jason Klusowski.]]></summary></entry><entry><title type="html">Bernstein’s Inequality</title><link href="https://wgunderwood.github.io//2023/02/22/bernstein.html" rel="alternate" type="text/html" title="Bernstein’s Inequality" /><published>2023-02-22T00:00:00+00:00</published><updated>2023-02-22T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2023/02/22/bernstein</id><content type="html" xml:base="https://wgunderwood.github.io//2023/02/22/bernstein.html"><![CDATA[<p>Bernstein’s inequality is an important concentration inequality.
In this post we motivate, state and prove a
“maximal inequality” version which I think
is clearer than the usual formulation.</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<div style="display:none">
  $ \newcommand \R {\mathbb{R}} $
  $ \newcommand \P {\mathbb{P}} $
  $ \newcommand \E {\mathbb{E}} $
  $ \newcommand \I {\mathbb{I}} $
  $ \newcommand \V {\mathbb{V}} $
  $ \newcommand \cX {\mathcal{X}} $
  $ \newcommand \Ber {\mathrm{Ber}} $
  $ \newcommand \Pois {\mathrm{Pois}} $
  $ \newcommand \Bin {\mathrm{Bin}} $
  $ \newcommand \cN {\mathcal{N}} $
  $ \newcommand \N {\mathbb{N}} $
  $ \newcommand \diff {\,\mathrm{d}} $
</div>

<h2 id="introduction">Introduction</h2>

<p><a href="https://en.wikipedia.org/wiki/Concentration_inequality">Concentration inequalities</a>
are central to
probability theory, mathematical statistics and theoretical machine learning,
providing a mathematical framework for the notion that
“with enough samples you eventually get the right answer.”
More precisely, they provide bounds on the
typical deviations
that a random variable makes from its expected value.
<a href="https://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory)">Bernstein’s inequality</a>
allows us to control
the size of a sum of independent zero-mean random variables
when variance and almost sure bounds on the summands are available.</p>

<p>In this post we state and prove Bernstein’s inequality,
and also demonstrate its approximate optimality
by establishing two different lower bounds.
The Julia code for the simulations is available on
<a href="https://github.com/WGUNDERWOOD/wgunderwood.github.io/tree/main/_posts/bernstein">GitHub</a>.</p>

<h2 id="motivation">Motivation</h2>

<p>In many applications we need to control the maximum
of a collection of random variables.
For example,
we might be proving uniform convergence of a
<a href="https://en.wikipedia.org/wiki/Kernel_regression">statistical estimator</a>,
establishing consistency for a
<a href="https://en.wikipedia.org/wiki/Binary_classification">binary classifier</a>
under
<a href="https://en.wikipedia.org/wiki/Empirical_risk_minimization">empirical risk minimization</a>,
or controlling the regret of a
<a href="https://en.wikipedia.org/wiki/Reinforcement_learning">reinforcement learning</a>
algorithm.
Understanding how this maximum behaves as a function
of the number of variables is essential
(Figure 1).
In this post we focus on maximal inequalities for
sums of independent random variables.</p>

<figure style="display: block; margin-left: auto; margin-right: auto;">
<img style="width: 600px; margin-left: auto; margin-right: auto;" src="/assets/posts/bernstein/maximum.svg" />
<figcaption>
  Fig. 1: With $X_j \sim \cN(0,1)$ independent,
  $\max_{1 \leq j \leq d} |X_j|$ grows with $d$.
</figcaption>
</figure>

<h3 id="setup">Setup</h3>

<p>We propose the following setup, which is widely applicable
in practice.</p>

<ul>
  <li>
    <p>$X_{i j}$ are real-valued random variables for
$1 \leq i \leq n$ and $1 \leq j \leq d$.</p>
  </li>
  <li>
    <p>$X_{1j}, \ldots, X_{n j}$ are
independent and identically distributed (i.i.d.)
for each $j$.</p>
  </li>
  <li>
    <p>$\E[X_{1j}] = 0$ for each $j$.</p>
  </li>
  <li>
    <p>$\max_{1 \leq j \leq d} \V[X_{1j}] \leq \sigma^2$.</p>
  </li>
  <li>
    <p>$\max_{1 \leq j \leq d} |X_{1j}| \leq M$ almost surely (a.s.).</p>
  </li>
  <li>
    <p>We will provide expectation bounds for the variable
$\max_{1 \leq j \leq d} \left| \sum_{i=1}^n X_{i j} \right|$.</p>
  </li>
</ul>

<p>A brief discussion is in order.
Firstly, the mean-zero property and the variance bound
tell us that
$\max_{1 \leq j \leq d}
\E\big[\left| \sum_{i=1}^n X_{i j} \right|\big] \leq \sqrt{n\sigma^2}$.
However, in order to put the maximum inside the expectation,
we need finer control on the tails of the summands,
attained by imposing the almost sure bound.
Note that we do not make any assumptions
regarding the dependencies between
different values of $j$.</p>
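
<p>For completeness, here is a sketch of the short argument behind the
$\sqrt{n \sigma^2}$ bound for a single $j$.
It uses Jensen’s inequality, followed by independence and the mean-zero property
so that the cross terms vanish:</p>

\[\begin{align*}
\E\left[ \left| \sum_{i=1}^n X_{i j} \right| \right]
&amp;\leq
\sqrt{\E\left[ \left( \sum_{i=1}^n X_{i j} \right)^2 \right]}
= \sqrt{\sum_{i=1}^n \V[X_{i j}]}
\leq \sqrt{n \sigma^2}.
\end{align*}\]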

<h2 id="bernsteins-inequality">Bernstein’s inequality</h2>

<p>Now let’s state the main result,
a maximal version of Bernstein’s inequality.
The proof is provided later in the post,
to avoid distracting from the discussion.</p>

<div class="box-rounded">

<h4> Theorem (Bernstein's maximal inequality) </h4>

For each $1 \leq j \leq d$,
let $X_{1j}, \ldots, X_{n j}$ be
i.i.d. real-valued random variables.
Suppose $\E[X_{1j}] = 0$ for each $j$,
$\max_{1 \leq j \leq d} \V[X_{1j}] \leq \sigma^2$
and
$\max_{1 \leq j \leq d} |X_{1j}| \leq M$ a.s. Then

$$
\E\left[
\max_{1 \leq j \leq d}
\left|
\sum_{i=1}^n X_{i j}
\right|
\right]
\leq
\sqrt{2 n \sigma^2 \log 2d}
+ \frac{M}{3} \log 2d.
$$

</div>

<h3 id="why-this-formulation">Why this formulation?</h3>

<p>If you have seen Bernstein’s inequality before,
you may notice a few differences between this version
and the usual formulation, including:</p>

<ul>
  <li>
    <p>This result is stated for the maximum of $d$ random variables.
This is to highlight the dependence of the resulting bound on $d$.</p>
  </li>
  <li>
    <p>This version is stated as an expectation rather than a tail probability
to avoid notational complexity,
but can be easily strengthened to include the tail bound.</p>
  </li>
  <li>
    <p>The terms on the right hand side are rather complicated
and perhaps unfamiliar,
but allow us to more directly parse the bound on the maximum.
The more standard version of Bernstein’s inequality makes it difficult
to read off this value.</p>
  </li>
</ul>
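
<p>For comparison, the usual tail-probability formulation reads as follows
for a single $j$ and any $x &gt; 0$.
This is the standard textbook statement, quoted here for reference
rather than derived in this post:</p>

\[\P\left(
\left| \sum_{i=1}^n X_{i j} \right| \geq x
\right)
\leq
2 \exp\left(
- \frac{x^2 / 2}{n \sigma^2 + M x / 3}
\right).\]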

<h3 id="interpreting-the-bound">Interpreting the bound</h3>

<p>The resulting bound of
$\sqrt{2 n \sigma^2 \log 2d} + \frac{M}{3} \log 2d$
consists of two terms
which are worth discussing separately.</p>

<ul>
  <li>
    <p>The first term is
$\sqrt{2 n \sigma^2 \log 2d}$,
which depends on $n$ and $\sigma^2$ but not $M$,
and has a sub-Gaussian-type dependence on the dimension.
This is the bound obtained if we assume that
each $X_{1j}$ is $\sigma^2$-sub-Gaussian,
and this term corresponds
to the
<a href="https://en.wikipedia.org/wiki/Central_limit_theorem">central limit theorem</a>
for
$\frac{1}{\sqrt{n \sigma^2}}\sum_{i=1}^n X_{i j}$.</p>
  </li>
  <li>
    <p>The second term is
$\frac{M}{3} \log 2d$
and depends on $M$ but not $n$ or $\sigma^2$.
This is a sub-exponential-type tail
which captures rare event phenomena associated with
bounded random variables.</p>
  </li>
</ul>
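
<p>To get a feel for when each term dominates, one can equate them.
As a rough sketch, the two terms coincide when</p>

\[\sqrt{2 n \sigma^2 \log 2d}
= \frac{M}{3} \log 2d
\quad \iff \quad
n = \frac{M^2 \log 2d}{18 \sigma^2},\]

<p>so the first term is the larger one when $n$ exceeds this threshold
and the second is the larger one otherwise.
For instance, with the illustrative values $\sigma = 1$, $M = 10$ and $d = 1000$,
the threshold is $100 \log(2000) / 18 \approx 42.2$.</p>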

<p>It is worth remarking at this point that
<a href="https://en.wikipedia.org/wiki/Bennett%27s_inequality">Bennett’s inequality</a>
provides a further refinement of Bernstein’s inequality,
but the difference is minor in many applications.</p>

<h2 id="approximate-optimality">Approximate optimality</h2>

<p>In this section we provide two explicit examples
which show why each of the two terms discussed above is necessary.
It is somewhat remarkable that these examples are so easy to find,
providing a straightforward demonstration of the
near-optimality of Bernstein’s maximal inequality.</p>

<h3 id="example-1-central-limit-theorem">Example 1: central limit theorem</h3>

<p>Let $X_{i j} = \pm \sigma$
with equal probability
be i.i.d. for $1 \leq i \leq n$
and $1 \leq j \leq d$.
Note $\E[X_{i j}] = 0$, $\V[X_{i j}] = \sigma^2$
and $|X_{i j}| = \sigma$ a.s., so Bernstein’s inequality gives</p>

\[\begin{align*}
\E\left[
\max_{1 \leq j \leq d}
\left|
\sum_{i=1}^n X_{i j}
\right|
\right]
&amp;\leq
\sqrt{2 n \sigma^2 \log 2d}
+ \frac{\sigma}{3} \log 2d,
\end{align*}\]

<p>and hence for fixed $d \geq 1$</p>

\[\begin{align*}
\limsup_{n \to \infty}
\E\left[
\max_{1 \leq j \leq d}
\left|
\frac{1}{\sqrt{n \sigma^2}}
\sum_{i=1}^n X_{i j}
\right|
\right]
&amp;\leq
\sqrt{2 \log 2d}.
\end{align*}\]

<p>However, we also have by the central limit theorem that</p>

\[\begin{align*}
\frac{1}{\sqrt{n \sigma^2}}
\sum_{i=1}^n
(X_{i1}, \ldots, X_{id})
\rightsquigarrow
(Z_1, \ldots, Z_d)
\end{align*}\]

<p>as $n \to \infty$,
where $Z_j \sim \cN(0,1)$ are i.i.d.
So by a Gaussian lower bound in the appendix,</p>

\[\begin{align*}
\lim_{n \to \infty} \,
\E\left[
\max_{1 \leq j \leq d}
\left|
\frac{1}{\sqrt{n \sigma^2}}
\sum_{i=1}^n X_{i j}
\right|
\right]
= \E\left[
\max_{1 \leq j \leq d}
|Z_j|
 \right]
\geq
\frac{1}{2}
\sqrt{\log d}.
\end{align*}\]

<p>Thus the first term in Bernstein’s maximal inequality
is unimprovable up to constants.</p>

<figure style="display: block; margin-left: auto; margin-right: auto;">
<img style="width: 600px; margin-left: auto; margin-right: auto;" src="/assets/posts/bernstein/normal.svg" />
<figcaption>
  Fig. 2: Bernstein's upper bound of
  $\sqrt{2 n \sigma^2 \log 2d} + \frac{\sigma}{3} \log 2d$
  and simulated <br /> values of
  $\E\left[\max_{1 \leq j \leq d}
  \left| \sum_{i=1}^n X_{i j} \right| \right]$
  for Example 1 with $n = 50$ and $\sigma^2 = 1$.
</figcaption>
</figure>
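
<p>The comparison in Example 1 can be reproduced approximately with a short Monte Carlo experiment.
The following is a minimal Python sketch using only the standard library;
it is not the Julia code used for the figures,
and the function names <code>bernstein_bound</code> and <code>simulate_example_1</code>,
the seed and the number of repetitions are assumptions made for illustration.</p>

```python
import math
import random


def bernstein_bound(n, sigma2, M, d):
    """Right-hand side of the maximal inequality stated in the theorem."""
    log_2d = math.log(2 * d)
    return math.sqrt(2 * n * sigma2 * log_2d) + M / 3 * log_2d


def simulate_example_1(n, sigma, d, reps, rng):
    """Monte Carlo estimate of E[max_j |sum_i X_ij|] for X_ij = +/- sigma."""
    total = 0.0
    for _ in range(reps):
        # Each column sum is sigma times a sum of n independent random signs.
        sums = [sigma * sum(rng.choice((-1, 1)) for _ in range(n)) for _ in range(d)]
        total += max(abs(s) for s in sums)
    return total / reps


rng = random.Random(0)  # fixed seed so that the run is reproducible
n, sigma, d = 50, 1.0, 10
estimate = simulate_example_1(n, sigma, d, reps=1000, rng=rng)
bound = bernstein_bound(n, sigma**2, sigma, d)
print(f"simulated: {estimate:.2f}, Bernstein bound: {bound:.2f}")
```

<p>Here $M = \sigma$ because the summands are bounded by $\sigma$ in absolute value.
The simulated value should fall below the bound, up to Monte Carlo error.</p>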

<h3 id="example-2-poisson-weak-convergence">Example 2: Poisson weak convergence</h3>

<p>Now let $X_{i j} = M\left(1 - \frac{1}{n}\right)$
with probability $1/n$
and $-\frac{M}{n}$ with probability $1 - 1/n$ be
i.i.d. for $1 \leq i \leq n$
and $1 \leq j \leq d$.
Note that $\E[X_{i j}] = 0$,
$\V[X_{i j}] = \frac{n-1}{n^2} M^2$
and $|X_{i j}| \leq \frac{n-1}{n} M$ a.s.,
so Bernstein’s inequality gives</p>

\[\E\left[
\max_{1 \leq j \leq d}
\left|
\sum_{i=1}^n X_{i j}
\right|
\right]
\leq
\sqrt{2 M^2 \log 2d}
+ \frac{M}{3} \log 2d,\]

<p>and hence</p>

\[\limsup_{d \to \infty}
\limsup_{n \to \infty}
\E\left[
\max_{1 \leq j \leq d}
\left|
\frac{1}{M \log d}
\sum_{i=1}^n X_{i j}
\right|
\right]
\leq
\frac{1}{3}.\]

<p>However, also note the binomial distribution limit</p>

\[\begin{align*}
\P\left(\sum_{i=1}^n \left(\frac{X_{i j}}{M} + \frac{1}{n}\right) = k\right)
&amp;= \frac{n!}{k!(n-k)!}
\left(\frac{1}{n}\right)^k
\left(1 - \frac{1}{n}\right)^{n-k}
\to \frac{1}{e k!}
\end{align*}\]

<p>as $n \to \infty$.
Thus we have the Poisson weak convergence</p>

\[\begin{align*}
\frac{1}{M}
\sum_{i=1}^n
(X_{i1}, \ldots, X_{id})
+ (1, \ldots, 1)
\rightsquigarrow
(Z_1, \ldots, Z_d)
\end{align*}\]

<p>as $n \to \infty$ where $Z_j \sim \Pois(1)$ are i.i.d.
So by a Poisson lower bound in the appendix,</p>

\[\begin{align*}
\liminf_{d \to \infty}
\lim_{n \to \infty}
\E\left[
\max_{1 \leq j \leq d}
\left|
\frac{\log \log d}{M \log d}
\sum_{i=1}^n X_{i j}
\right|
\right]
 =
\liminf_{d \to \infty}
\frac{\log \log d}{\log d}
\left(
\E\left[\max_{1 \leq j \leq d} Z_j \right] - 1
\right)
\geq
\frac{1}{6}.
\end{align*}\]

<p>Hence the second term in Bernstein’s inequality
is tight up to a factor of $\log \log d$.
This factor diverges so slowly that
Bernstein’s inequality is practically optimal
in many applications.
For example, $\log \log d \geq 6$
already requires ${d &gt; 10^{175}}$, far more than the number
of particles in the universe!</p>

<figure style="display: block; margin-left: auto; margin-right: auto;">
<img style="width: 600px; margin-left: auto; margin-right: auto;" src="/assets/posts/bernstein/poisson.svg" />
<figcaption>
  Fig. 3: Bernstein's upper bound of
  $\sqrt{2 M^2 \log 2d} + \frac{M}{3} \log 2d$
  and simulated <br /> values of
  $\E\left[\max_{1 \leq j \leq d}
  \left| \sum_{i=1}^n X_{i j} \right| \right]$
  for Example 2 with $n = 50$ and $M = 1$.
</figcaption>
</figure>
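
<p>Example 2 can be simulated in the same spirit.
Writing $B_j$ for the number of summands in column $j$ that take the large value
$M\left(1 - \frac{1}{n}\right)$, we have $B_j \sim \Bin(n, 1/n)$ and
$\sum_{i=1}^n X_{i j} = M (B_j - 1)$.
The following Python sketch relies on this identity;
again it is not the Julia code behind the figures,
and the name <code>simulate_example_2</code>, the seed and the number of repetitions
are illustrative assumptions.</p>

```python
import math
import random


def simulate_example_2(n, M, d, reps, rng):
    """Monte Carlo estimate of E[max_j |sum_i X_ij|] in Example 2."""
    total = 0.0
    for _ in range(reps):
        # randrange(n) == 0 has probability 1/n, so each entry of counts is a
        # Binomial(n, 1/n) draw and the j-th column sum equals M * (count - 1).
        counts = [sum(rng.randrange(n) == 0 for _ in range(n)) for _ in range(d)]
        total += max(abs(M * (b - 1)) for b in counts)
    return total / reps


rng = random.Random(0)  # fixed seed so that the run is reproducible
n, M, d = 50, 1.0, 10
estimate = simulate_example_2(n, M, d, reps=1000, rng=rng)
log_2d = math.log(2 * d)
# Upper bound in the form used in the caption of Fig. 3.
bound = math.sqrt(2 * M**2 * log_2d) + M / 3 * log_2d
print(f"simulated: {estimate:.2f}, Bernstein bound: {bound:.2f}")
```

<p>Sampling the binomial counts directly avoids storing the individual summands,
which keeps the sketch short.</p>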

<h2 id="references">References</h2>
<ul>
  <li>
    <p><a href="https://arxiv.org/abs/1612.06661">Four lectures on probabilistic
methods for data science</a>
by Roman Vershynin</p>
  </li>
  <li>
    <p>The University of Oxford’s course on Algorithmic Foundations of Learning,
taught by
<a href="https://www.stats.ox.ac.uk/~rebeschi/">Patrick Rebeschini</a>
in 2018</p>
  </li>
  <li>
    <p>Princeton University’s course on Probability in High Dimension,
taught by
<a href="https://web.math.princeton.edu/~rvan/">Ramon van Handel</a>
in 2021</p>
  </li>
  <li>
    <p><a href="https://arxiv.org/abs/0903.4373">A note on the distribution of the maximum
of a set of Poisson random variables</a>
by K. M. Briggs, L. Song and T. Prellberg, 2009</p>
  </li>
  <li>
    <p><a href="https://link.springer.com/article/10.1007/BF00533727">A note on Poisson maxima</a>
by A. C. Kimber, 1983</p>
  </li>
</ul>

<h2 id="appendix-proofs">Appendix: proofs</h2>

<p>We begin by proving the main result
of this post.</p>

<div class="box-rounded">

<h4> Proof of Bernstein's maximal inequality </h4>

We first bound the
moment generating function
of $X_{i j}$. Let $t &gt; 0$ and note that by
the mean-zero property and the variance
and almost sure bounds,

$$
\begin{align*}
\E\left[
e^{t X_{i j}}
\right]
&amp;=
1 + \sum_{k=2}^\infty
\frac{t^k \E[X_{i j}^k]}{k!}
\leq
1 + t^2 \sigma^2
\sum_{k=2}^\infty
\frac{t^{k-2} M^{k-2}}{k!}.
\end{align*}
$$

Now since
$k! \geq 2 \cdot 3^{k-2}$ for all $k \geq 2$
and $1 + x \leq e^x$ for all $x$,
we have for $t &lt; 3/M$

$$
\begin{align*}
\E\left[
e^{t X_{i j}}
\right]
&amp;\leq
1 + \frac{t^2 \sigma^2}{2}
\sum_{k=2}^\infty
\left( \frac{t M}{3} \right)^{k-2}
 =
1 + \frac{t^2 \sigma^2/2}{1 - t M/3}
\leq
\exp\left(
\frac{t^2 \sigma^2/2}{1 - t M/3}
\right).
\end{align*}
$$

Now we bound the expected maximum,
using Jensen's inequality
on the concave logarithm function,
to see

$$
\begin{align*}
\E\left[
\max_{1 \leq j \leq d}
\sum_{i=1}^n X_{i j}
\right]
&amp;\leq
\frac{1}{t}
\log \E\left[
\exp
\max_{1 \leq j \leq d}
\sum_{i=1}^n t X_{i j}
\right] \\
&amp;\leq
\frac{1}{t}
\log \E\left[
\sum_{j=1}^d
\exp
\sum_{i=1}^n t X_{i j}
\right] \\
&amp;\leq
\frac{1}{t}
\log
\big(
d \,
\E\left[
\exp
t X_{i j}
\right]^n
\big) \\
&amp;\leq
\frac{1}{t}
\log d
+ \frac{n \sigma^2 t / 2}{1 - Mt/3}.
\end{align*}
$$

Minimizing the bound using calculus
by selecting
$t = \frac{2}{2M/3 + \sqrt{2n \sigma^2 / \log d}}$
gives

$$
\begin{align*}
\E\left[
\max_{1 \leq j \leq d}
\sum_{i=1}^n X_{i j}
\right]
&amp;\leq
\sqrt{2 n \sigma^2 \log d}
+ \frac{M}{3} \log d.
\end{align*}
$$

Finally we set $X_{i (d+j)} = -X_{i j}$ for $1 \leq j \leq d$
to see that

$$
\begin{align*}
\E\left[
\max_{1 \leq j \leq d}
\left|
\sum_{i=1}^n X_{i j}
\right|
\right]
&amp;\leq
\sqrt{2 n \sigma^2 \log 2d}
+ \frac{M}{3} \log 2d.
\end{align*}
$$


</div>
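
<p>The choice of $t$ in the proof can be checked directly.
As a sketch, write $a = \log d$, $b = n \sigma^2 / 2$ and $c = M/3$
(shorthand introduced only for this check),
so that the bound to be minimized over $0 &lt; t &lt; 1/c$ is $a/t + bt/(1 - ct)$.
Substituting $u = 1/t - c &gt; 0$ gives</p>

\[\begin{align*}
\frac{a}{t} + \frac{b t}{1 - c t}
&amp;=
a (u + c) + \frac{b}{u}
\geq
a c + 2 \sqrt{a b}
\end{align*}\]

<p>by the AM-GM inequality, with equality when $u = \sqrt{b/a}$.
This corresponds to $1/t = M/3 + \sqrt{n \sigma^2 / (2 \log d)}$,
which is the value of $t$ selected in the proof,
and the minimum is
$a c + 2 \sqrt{a b} = \frac{M}{3} \log d + \sqrt{2 n \sigma^2 \log d}$.</p>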

<p>Next we prove the Gaussian lower bound
used in Example 1.</p>

<div class="box-rounded">

<h4> Lemma (Gaussian lower bound) </h4>

Let $Z_1, \ldots, Z_d$ be i.i.d.
$\cN(0,1)$ random variables.
Then

$$
\E\left[
\max_{1 \leq j \leq d}
|Z_j|
\right]
\geq \frac{1}{2} \sqrt{\log d}.
$$

<h4> Proof </h4>

For any $t &gt; 0$, we have by Markov's
inequality

$$
\E\left[
\max_{1 \leq j \leq d}
|Z_j|
\right]
\geq t \, \P\left(
\max_{1 \leq j \leq d}
|Z_j| \geq t
\right)
 =
t \left(1 - \left(1 -
\P\left(|Z_j| \geq t \right)
\right)^d \right).
$$

Now note that by the Gaussian
density function and since
$s^2 \leq 2(s-t)^2 + 2t^2$,

$$
\begin{align*}
\P\left(|Z_j| \geq t \right)
&amp;=
\sqrt\frac{2}{\pi}
\int_t^\infty e^{-s^2/2} \diff{s}
\geq
\sqrt\frac{2}{\pi}
e^{-t^2}
\int_t^\infty e^{-(s-t)^2} \diff{s}
=
\frac{1}{\sqrt 2}
e^{-t^2},
\end{align*}
$$

where the last step uses
$\int_0^\infty e^{-u^2} \diff{u} = \sqrt{\pi}/2$.
Hence because $1-x \leq e^{-x}$,

$$
\E\left[
\max_{1 \leq j \leq d}
|Z_j|
\right]
\geq
t \left(1 - \left(1 -
\frac{1}{\sqrt 2} e^{-t^2}
\right)^d \right)
\geq
t \left(1 -
\exp\left(-\frac{d}{\sqrt 2} e^{-t^2}
\right) \right).
$$

Finally set $t = \sqrt{\log d}$
and note that $e^{-1/\sqrt 2} \approx 0.493 \leq 1/2$
to see

$$
\E\left[
\max_{1 \leq j \leq d}
|Z_j|
\right]
\geq
\sqrt{\log d} \left(1 - e^{-1/\sqrt 2} \right)
\geq
\frac{1}{2}\sqrt{\log d}.
$$

</div>
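
<p>The Gaussian lower bound can also be checked numerically.
The following Python sketch estimates $\E\left[\max_{1 \leq j \leq d} |Z_j|\right]$
by Monte Carlo and prints it next to the lower bound $\frac{1}{2}\sqrt{\log d}$
and the upper bound $\sqrt{2 \log 2d}$ appearing in Example 1.
The function name <code>mean_max_abs_gaussian</code>, the seed,
the values of $d$ and the number of repetitions are illustrative assumptions.</p>

```python
import math
import random


def mean_max_abs_gaussian(d, reps, rng):
    """Monte Carlo estimate of E[max_j |Z_j|] for d i.i.d. N(0,1) draws."""
    total = 0.0
    for _ in range(reps):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(d))
    return total / reps


rng = random.Random(1)  # fixed seed so that the run is reproducible
for d in (2, 10, 100, 1000):
    estimate = mean_max_abs_gaussian(d, reps=1000, rng=rng)
    lower = 0.5 * math.sqrt(math.log(d))
    upper = math.sqrt(2 * math.log(2 * d))
    print(f"d={d}: lower {lower:.2f}, simulated {estimate:.2f}, upper {upper:.2f}")
```

<p>The simulated values should sit between the two bounds, up to Monte Carlo error.</p>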

<p>Finally we establish the
Poisson lower bound
used in Example 2.</p>

<div class="box-rounded">

<h4> Lemma (Poisson lower bound) </h4>

Let $Z_1, \ldots, Z_d$ be i.i.d.
$\Pois(1)$ random variables.
Then for $d \geq 16$,

$$
\E\left[
\max_{1 \leq j \leq d}
Z_j
\right]
\geq \frac{\log d}{6 \log \log d}.
$$

<h4> Proof </h4>

As for the Gaussian lower bound, we have for any integer $t \geq 2$

$$
\E\left[
\max_{1 \leq j \leq d}
Z_j
\right]
\geq
t \left(1 - \left(1 -
\P\left(Z_j \geq t \right)
\right)^d \right).
$$

Now note that

$$
\begin{align*}
\P\left(Z_j \geq t \right)
&amp;=
\frac{1}{e} \sum_{k=t}^\infty
\frac{1}{k!}
\geq
\frac{1}{e t!}
\geq
\frac{1}{e t^t}.
\end{align*}
$$

Hence noting that $\log \log \log d \geq 0$ and setting
$\frac{e-1}{e}\frac{\log d}{\log \log d}
\leq t \leq \frac{\log d}{\log \log d}$
gives

$$
\begin{align*}
\E\left[
\max_{1 \leq j \leq d}
Z_j
\right]
&amp;\geq
t \left(1 - \left(1 -
\frac{1}{e t^t}
\right)^d \right) \\
&amp;\geq
t \left(1 - \left(1 -
e^{-1} \exp(-t \log t)
\right)^d \right) \\
&amp;\geq
\frac{e-1}{e}
\frac{\log d}{\log \log d}
\left(1 - \left(1 -
e^{-1} \exp\left(
-\frac{\log d}{\log \log d}
\log \frac{\log d}{\log \log d}
\right)
\right)^d \right) \\
&amp;\geq
\frac{e-1}{e}
\frac{\log d}{\log \log d}
\left(1 - \left(1 -
\frac{1}{e d}
\right)^d \right) \\
&amp;\geq
\frac{e-1}{e}
\frac{\log d}{\log \log d}
\left(1 - e^{-1/e} \right) \\
&amp;\geq
\frac{\log d}{6 \log \log d}.
\end{align*}
$$
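
For completeness, here is a sketch of the steps behind the last three inequalities.
The fourth inequality is where $\log \log \log d \geq 0$ is used, since

$$
\frac{\log d}{\log \log d}
\log \frac{\log d}{\log \log d}
=
\log d
- \frac{\log d \, \log \log \log d}{\log \log d}
\leq
\log d,
$$

so the exponential term is at least $e^{-\log d} = 1/d$.
The fifth inequality is $1 - x \leq e^{-x}$ with $x = 1/(ed)$,
and the last holds because
$\frac{e-1}{e} \left(1 - e^{-1/e}\right) \approx 0.19 \geq 1/6$.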

</div>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[Bernstein’s inequality is an important concentration inequality. In this post we motivate, state and prove a “maximal inequality” version which I think is clearer than the usual formulation.]]></summary></entry><entry><title type="html">Advent of Code 2022</title><link href="https://wgunderwood.github.io//2023/01/11/advent-of-code.html" rel="alternate" type="text/html" title="Advent of Code 2022" /><published>2023-01-11T00:00:00+00:00</published><updated>2023-01-11T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2023/01/11/advent-of-code</id><content type="html" xml:base="https://wgunderwood.github.io//2023/01/11/advent-of-code.html"><![CDATA[<p>In 2022 I tackled
<a href="https://adventofcode.com/">Advent of Code</a>
for the first time, using the Julia language.
Here are some of my thoughts;
my code is on
<a href="https://github.com/WGUNDERWOOD/advent-of-code-2022">GitHub</a>.</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h2 id="first-impressions">First impressions</h2>

<p><a href="https://adventofcode.com/">Advent of Code</a>
(AoC) consists of twenty-five problems increasing in difficulty
from the 1st to the 25th of December.
I enjoyed solving all the problems,
which were varied enough to avoid repetition,
but which also fell into a few distinct categories
giving some familiarity.
The first few problems were very easy,
typically requiring just a few lines of code
and no particular insights – the obvious solution “just worked.”
The later problems presented various challenges,
including knowledge of network search algorithms,
computational complexity
and memory considerations,
and plenty of general-purpose problem solving.</p>

<div class="frame">
<a href="https://julialang.org/">
<img style="width: 170px; margin-top: 25px; margin-left: 15px" src="/assets/posts/advent_of_code/julia.svg" />
</a>
</div>

<h2 id="the-julia-language">The Julia language</h2>

<p>Since this was my first attempt at AoC,
I decided to use a language which I was already comfortable with
but wanted to learn more about.
<a href="https://julialang.org/">Julia</a>
is a high-level yet high-performance language which is
as easy to write as Python and
<a href="https://julialang.org/benchmarks/">almost as fast as C</a>.
Some of my favourite features
which I tried to understand more fully during AoC include
the following.</p>

<h3 id="first-class-arrays">First-class arrays</h3>

<p><a href="https://docs.julialang.org/en/v1/manual/arrays/">Arrays</a>
in Julia are very easy to use,
working seamlessly with the rest of its type system
and requiring no additional packages
(I’m looking at you, <a href="https://numpy.org/">numpy</a>).
For example, dimension is well-defined
and included in the type declaration:</p>

<ul>
  <li>
    <p><code class="language-julia highlight highlighter-rouge"><span class="kt">Array</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="mi">1</span><span class="x">}</span></code>
or <code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>
is a one-dimensional array of integers.</p>
  </li>
  <li>
    <p><code class="language-julia highlight highlighter-rouge"><span class="kt">Array</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="mi">2</span><span class="x">}</span></code>
or <code class="language-julia highlight highlighter-rouge"><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>
is a two-dimensional array of integers.</p>
  </li>
  <li>
    <p><code class="language-julia highlight highlighter-rouge"><span class="kt">Array</span><span class="x">{</span><span class="kt">Array</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="mi">1</span><span class="x">},</span> <span class="mi">1</span><span class="x">}</span></code>
or <code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}}</span></code>
is a one-dimensional array of one-dimensional arrays of integers.
It is not equivalent to <code class="language-julia highlight highlighter-rouge"><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>.</p>
  </li>
</ul>

<p>This is useful in many circumstances – for
a rectangular array, use <code class="language-julia highlight highlighter-rouge"><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>.
For a ragged array,
use <code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}}</span></code>.
Julia has no confusion between two-dimensional
arrays with only one column and one-dimensional arrays:
the former has size <code class="language-julia highlight highlighter-rouge"><span class="x">(</span><span class="n">n</span><span class="x">,</span> <span class="mi">1</span><span class="x">)</span></code>
while the latter has size <code class="language-julia highlight highlighter-rouge"><span class="x">(</span><span class="n">n</span><span class="x">,)</span></code>.
Unlike dimension, the size of an array
is not included in the type, allowing
pushing and popping.</p>
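
<p>A minimal sketch of these distinctions, using small made-up arrays
(the variable names are purely illustrative):</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">v = [1, 2, 3]                  # Vector{Int}: one-dimensional
m = reshape([1, 2, 3], 3, 1)   # Matrix{Int}: two-dimensional with one column
ragged = [[1, 2], [3]]         # Vector{Vector{Int}}: rows of different lengths

size(v)        # (3,)
size(m)        # (3, 1)
ndims(ragged)  # 1, since the outer array is one-dimensional

# The length is not part of the type, so a vector can grow and shrink.
push!(v, 4)    # v is now [1, 2, 3, 4]
pop!(v)        # returns 4, leaving v as [1, 2, 3]
</code></pre></figure>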

<h3 id="just-in-time-compilation">Just-in-time compilation</h3>

<p>Julia is compiled
<a href="https://en.wikipedia.org/wiki/
Just-in-time_compilation">just-in-time</a>
(JIT),
an approach offering a compromise between
<a href="https://en.wikipedia.org/wiki/Compiler">ahead-of-time</a>
compilation
(where the program is first compiled to
machine code and then executed, like C)
and
<a href="https://en.wikipedia.org/wiki/
Interpreter_(computing)">interpretation</a>
(where the program is run line-by-line, like Python).
In Julia, each function is compiled to machine code the first time it is called
with a given combination of argument types,
so the first call pays a one-off compilation cost and later calls run at full speed.</p>

<h3 id="fast-loops">Fast loops</h3>

<p>A great example of the benefit of JIT compilation is Julia’s
<a href="https://julialang.org/benchmarks/">fast loop</a>
implementation,
which requires no explicit vectorisation
(unlike Python, where loops are painfully slow
unless replaced by list comprehensions or vectorised with numpy).
Simply write out the loops you need and the JIT compiler will handle
the execution.
That said, Julia does have
<a href="https://docs.julialang.org/en/v1/
manual/mathematical-operations/#man-dot-operators">vectorised “dot” operators</a>
which can allow for neater code.
For example, to add one to each element of the array
<code class="language-julia highlight highlighter-rouge"><span class="n">A</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>,
we write <code class="language-julia highlight highlighter-rouge"><span class="n">A</span> <span class="o">.+</span> <span class="mi">1</span></code>,
using the dot operator to broadcast over
the elements of <code class="language-julia highlight highlighter-rouge"><span class="n">A</span></code>.</p>
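
<p>As a small illustration, here is a sketch with a hypothetical
<code class="language-plaintext highlighter-rouge">sum_squares</code> function
written as a plain loop, followed by the two forms of broadcasting:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Sum of squares written as a plain loop; no vectorisation is needed for speed.
function sum_squares(A::Vector{Int})
    total = 0
    for a in A
        total += a^2
    end
    return total
end

A = [1, 2, 3]
sum_squares(A)   # 14

B = A .+ 1       # broadcasting returns a new array [2, 3, 4]
A .+= 1          # the in-place form updates A itself to [2, 3, 4]
</code></pre></figure>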

<h3 id="type-annotation">Type annotation</h3>

<p>Julia’s
<a href="https://docs.julialang.org/en/v1/manual/types/">type system</a>
is another place where it combines the best of
low-level compiled languages and high-level interpreted languages.
Type annotations are <em>optional</em> in Julia.
If you don’t want to think about types,
you can opt to leave variables and function arguments untyped,
as in Python or R.
However adding type annotations such as
<code class="language-julia highlight highlighter-rouge"><span class="n">x</span><span class="o">::</span><span class="kt">Float64</span></code>
or <code class="language-julia highlight highlighter-rouge"><span class="n">add_one</span><span class="x">(</span><span class="n">x</span><span class="o">::</span><span class="kt">Float64</span><span class="x">)</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span></code>
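can document your intent
and prevent you from making mistakes such as passing the wrong
object into a function.
Annotating the fields of a struct with concrete types also helps the compiler
generate fast code, whereas annotating function arguments usually makes
no difference to speed, since Julia compiles a specialised version of each
function for the argument types it actually receives.</p>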

<h3 id="multiple-dispatch">Multiple dispatch</h3>

<p>Julia allows the creation of
<a href="https://docs.julialang.org/en/v1/
manual/types/#Composite-Types">composite types</a>
(also known as structs or objects).</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">struct</span><span class="nc"> Elf</span>
    <span class="n">height</span><span class="o">::</span><span class="kt">Int</span>
    <span class="n">weight</span><span class="o">::</span><span class="kt">Int</span>
<span class="k">end</span></code></pre></figure>

<p>However, unlike object-oriented languages such as Python,
Julia does not attach methods to its objects.
Instead, we write a regular
<a href="https://docs.julialang.org/en/v1/manual/functions/">function</a>
and annotate its arguments
to be of certain types.
For example,
<code class="language-julia highlight highlighter-rouge"><span class="n">bmi</span><span class="x">(</span><span class="n">elf</span><span class="o">::</span><span class="n">Elf</span><span class="x">)</span> <span class="o">=</span> <span class="n">elf</span><span class="o">.</span><span class="n">weight</span> <span class="o">/</span> <span class="n">elf</span><span class="o">.</span><span class="n">height</span><span class="o">^</span><span class="mi">2</span></code>
rather than
<code class="language-julia highlight highlighter-rouge"><span class="n">elf</span><span class="o">.</span><span class="n">bmi</span><span class="x">()</span> <span class="o">=</span> <span class="n">elf</span><span class="o">.</span><span class="n">weight</span> <span class="o">/</span> <span class="n">elf</span><span class="o">.</span><span class="n">height</span><span class="o">^</span><span class="mi">2</span></code>.
Multiple dispatch means that the method to call is chosen using the type of <em>every</em> argument,
not just the first.</p>
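
<p>A short sketch of dispatch on two arguments.
The <code class="language-plaintext highlighter-rouge">Reindeer</code> type and the
<code class="language-plaintext highlighter-rouge">greet</code> function are hypothetical,
introduced only for this example:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">struct Elf
    height::Int
    weight::Int
end

# Hypothetical second type, introduced only for this example.
struct Reindeer
    weight::Int
end

# Three methods of one function; Julia selects a method using the
# types of both arguments.
greet(a::Elf, b::Elf) = "elf meets elf"
greet(a::Elf, b::Reindeer) = "elf meets reindeer"
greet(a::Reindeer, b::Elf) = "reindeer meets elf"

elf = Elf(100, 30)
rudolph = Reindeer(150)

greet(elf, elf)       # "elf meets elf"
greet(elf, rudolph)   # "elf meets reindeer"
greet(rudolph, elf)   # "reindeer meets elf"
# greet(rudolph, rudolph) throws a MethodError, as no method matches
</code></pre></figure>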

<p>There is plenty more to like about Julia,
including its
<a href="https://docs.julialang.org/en/v1/
manual/types/#Parametric-Types">parametric type system</a>,
its great standard libraries for
<a href="https://docs.julialang.org/en/v1/base/multi-threading/">multi-threading</a>,
<a href="https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/">linear algebra</a>
and <a href="https://docs.julialang.org/en/v1/stdlib/Statistics/">statistics</a>,
its <a href="https://docs.julialang.org/en/v1/stdlib/Pkg/">package manager</a>
and its
<a href="https://docs.julialang.org/en/v1/stdlib/Test/">testing</a>
functionality,
but I won’t go into any more detail here.</p>

<h2 id="general-comments">General comments</h2>

<p>I learned more than I expected to while completing Advent of Code.
While the solutions are perhaps not very surprising,
the actual implementation details are worth thinking about
more carefully.
In particular, I definitely improved my understanding of the following.</p>

<h3 id="data-structures">Data structures</h3>

<p>Since I have never studied computer science formally,
I would often be confused by the obsession with
“data structures” – what’s wrong with arrays?
AoC definitely helped me appreciate this more – in fact
I think the hardest part of most of the problems was finding the right
data structures to use.
I became more familiar with
<a href="https://docs.julialang.org/en/v1/base/collections/#Dictionaries">dictionaries</a>
(hash maps) and
<a href="https://docs.julialang.org/en/v1/manual/functions/#Tuples">tuples</a>,
and their respective strengths and weaknesses.
<a href="https://docs.julialang.org/en/v1/manual/types/#Type-Unions">Type unions</a>
were also useful, particularly for handling
missing or non-existent values through the
<code class="language-julia highlight highlighter-rouge"><span class="kt">Nothing</span></code>
and
<code class="language-julia highlight highlighter-rouge"><span class="kt">Union</span><span class="x">{</span><span class="n">T</span><span class="x">,</span> <span class="kt">Nothing</span><span class="x">}</span></code> types.</p>
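
<p>To illustrate how these fit together, here is a sketch
(with made-up names, not taken from any particular day)
of a dictionary keyed by tuples, with a lookup that returns
<code class="language-plaintext highlighter-rouge">nothing</code>
when the key is absent:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical example: steps taken to reach squares on a grid,
# keyed by (row, column) tuples.
steps = Dict{Tuple{Int, Int}, Int}()
steps[(1, 1)] = 0
steps[(1, 2)] = 1

# Returns the stored value, or nothing if the square was never reached.
function steps_to(steps::Dict{Tuple{Int, Int}, Int},
                  square::Tuple{Int, Int})::Union{Int, Nothing}
    return get(steps, square, nothing)
end

steps_to(steps, (1, 2))              # 1
isnothing(steps_to(steps, (5, 5)))   # true
</code></pre></figure>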

<h3 id="network-algorithms">Network algorithms</h3>

<p>While I had seen dynamic programming methods such as
<a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">Dijkstra’s algorithm</a>
for shortest paths before, I had no experience in implementing them.
This turned out to be relatively straightforward,
keeping track of temporary variables in a new array.</p>

<p>More challenging for me were the network search algorithms,
including
<a href="https://en.wikipedia.org/wiki/Breadth-first_search">breadth-first search</a>
(BFS) and
<a href="https://en.wikipedia.org/wiki/Depth-first_search">depth-first search</a>
(DFS).
I didn’t really understand which one to use in a given scenario
and ended up doing quite a bit of trial and error
with various heuristics to keep the run time reasonable.
Julia’s
<code class="language-julia highlight highlighter-rouge"><span class="n">push!</span></code>
and <code class="language-julia highlight highlighter-rouge"><span class="n">pushfirst!</span></code>
functions allow easy switching between
<a href="https://computersciencewiki.org/index.php/Stack">stack</a>
and <a href="https://computersciencewiki.org/index.php/Queue">queue</a>
behaviour,
along with the analogous functions
<code class="language-julia highlight highlighter-rouge"><span class="n">pop!</span></code>
and
<code class="language-julia highlight highlighter-rouge"><span class="n">popfirst!</span></code>.</p>
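
<p>The following sketch shows the idea on a tiny made-up graph.
The function name <code class="language-plaintext highlighter-rouge">explore</code>
and the adjacency-list representation are assumptions for illustration;
the only difference between the two searches is which end of the frontier is popped.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Explore a graph given as adjacency lists. The frontier acts as a
# stack (pop!) for depth-first search or a queue (popfirst!) for
# breadth-first search.
function explore(neighbours::Vector{Vector{Int}}, start::Int;
                 depth_first::Bool=false)
    frontier = [start]
    seen = Set([start])
    order = Int[]
    while !isempty(frontier)
        node = depth_first ? pop!(frontier) : popfirst!(frontier)
        push!(order, node)
        for nb in neighbours[node]
            if !(nb in seen)
                push!(seen, nb)
                push!(frontier, nb)
            end
        end
    end
    return order
end

# A small tree: node 1 has children 2 and 3, and node 2 has children 4 and 5.
tree = [[2, 3], [4, 5], Int[], Int[], Int[]]

explore(tree, 1)                     # breadth-first: [1, 2, 3, 4, 5]
explore(tree, 1, depth_first=true)   # depth-first:   [1, 3, 2, 5, 4]
</code></pre></figure>

<p>Breadth-first search visits nodes in order of their distance from the start,
which is why it suits shortest-path questions on unweighted networks,
while depth-first search follows one branch to its end before backtracking.</p>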

<h3 id="avoiding-unnecessary-allocation-and-copying">Avoiding unnecessary allocation and copying</h3>

<p>Many problems in AoC (particularly part 2 of the problems)
have inputs so large that the obvious solutions
are infeasible due to the memory they would need.
One common fix is to represent the data without allocating much memory,
for example by representing ranges of numbers by their start and end points,
either manually or by using an
<code class="language-julia highlight highlighter-rouge"><span class="kt">AbstractRange</span></code>
object, rather than collecting all the elements into an array.
Another common solution was to use various divisibility tricks,
storing only the remainders of numbers modulo some divisor,
rather than keeping the whole number.
This helped avoid storing huge integers.</p>

<p>A similar challenge was to avoid copying large arrays.
Julia uses exclamation marks to distinguish “in-place” functions
such as
<code class="language-julia highlight highlighter-rouge"><span class="n">sort!</span></code>,
which modify their arguments, from
“returning” functions such as
<code class="language-julia highlight highlighter-rouge"><span class="n">sort</span></code>
which simply return a new value,
in this case the sorted array.
This often allows one to reuse the same memory,
modifying the entries of an array without copying the whole object.</p>
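
<p>Both ideas can be seen in a few lines; this is a minimal sketch
with arbitrary example values:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">r = 1:1_000_000_000   # a UnitRange stores only its two endpoints
length(r)             # 1000000000, without allocating a billion integers
500 in r              # true, checked against the endpoints

x = [3, 1, 2]
y = sort(x)    # returns a new sorted array; x is still [3, 1, 2]
sort!(x)       # sorts in place; x is now [1, 2, 3]
</code></pre></figure>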

<h2 id="day-by-day">Day-by-day</h2>

<p>Some brief comments on each day’s problems are given below,
along with the approximate execution time after precompilation.
My solutions are available on
<a href="https://github.com/WGUNDERWOOD/advent-of-code-2022">GitHub</a>.</p>

<h3>
<a href="https://adventofcode.com/2022/day/1" style="color:#F1FA9C">
Day 1: Calorie Counting
</a>
<span style="float: right; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day01.jl" style="color:#777777">
0.034 s
</a>
</span>
</h3>

<p>Iterating through the input lines allowed us to calculate the
total calories for each elf,
storing the totals in a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>.
Finding the most calorific elf (part 1)
and the most calorific three elves (part 2)
was easy with the
<code class="language-julia highlight highlighter-rouge"><span class="n">maximum</span></code>
and
<code class="language-julia highlight highlighter-rouge"><span class="n">sort</span></code>
functions.</p>
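
<p>A simplified sketch of this approach (not the exact code from the repository),
run on the example input from the puzzle description.
The function name <code class="language-plaintext highlighter-rouge">elf_totals</code>
is made up for illustration.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">function elf_totals(lines::Vector{String})
    totals = [0]
    for line in lines
        if isempty(line)
            push!(totals, 0)                  # a blank line starts the next elf
        else
            totals[end] += parse(Int, line)
        end
    end
    return totals
end

lines = ["1000", "2000", "3000", "", "4000", "", "5000", "6000", "",
         "7000", "8000", "9000", "", "10000"]

totals = elf_totals(lines)          # [6000, 4000, 11000, 24000, 10000]
maximum(totals)                     # 24000
sum(sort(totals, rev=true)[1:3])    # 45000
</code></pre></figure>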

<h3>
<a href="https://adventofcode.com/2022/day/2" style="color:#F1FA9C">
Day 2: Rock Paper Scissors
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day02.jl" style="color:#777777">
0.080 s
</a>
</span>
</h3>

<p>For each part I kept a lookup table of type
<code class="language-julia highlight highlighter-rouge"><span class="kt">Dict</span><span class="x">{</span><span class="kt">String</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span></code>
for the score based on the input.
For example for part 1 we have
<code class="language-julia highlight highlighter-rouge"><span class="s">"A X"</span> <span class="o">=&gt;</span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">3</span></code>
while for part 2 we have
<code class="language-julia highlight highlighter-rouge"><span class="s">"A X"</span> <span class="o">=&gt;</span> <span class="mi">3</span> <span class="o">+</span> <span class="mi">0</span></code>.
Summing the scores over an iterator of the input file
gave the answer without allocating much memory.</p>

<h3>
<a href="https://adventofcode.com/2022/day/3" style="color:#F1FA9C">
Day 3: Rucksack Reorganization
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day03.jl" style="color:#777777">
0.434 s
</a>
</span>
</h3>

<p>This problem was about finding common characters in several strings.
While I could have just done this for two or three strings as required
in the question, I wrote a recursive function to handle
an arbitrary length
<code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">String</span><span class="x">}</span></code>
based on intersecting the first pair repeatedly until only one
string remains.</p>
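
<p>The recursion might look something like the sketch below
(a simplified version with a hypothetical function name,
not necessarily matching the code in the repository):</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Characters common to every string, found by intersecting the
# first pair repeatedly until only one string remains.
function common_chars(strings::Vector{String})
    if length(strings) == 1
        return strings[1]
    end
    shared = intersect(collect(strings[1]), collect(strings[2]))
    return common_chars(vcat([join(shared)], strings[3:end]))
end

common_chars(["abcd", "bcde", "cdef"])   # "cd"
</code></pre></figure>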

<h3>
<a href="https://adventofcode.com/2022/day/4" style="color:#F1FA9C">
Day 4: Camp Cleanup
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day04.jl" style="color:#777777">
0.173 s
</a>
</span>
</h3>

<p>For this question we were given a list of pairs of integer ranges
and asked in how many pairs one range
fully contains the other (part 1) or the two ranges overlap (part 2).
Despite the numbers in this question being quite small,
it still seemed sensible to represent the ranges by their endpoints
using a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span></code>
rather than collecting all the values in between into a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>.
Given the endpoints it was easy to write
functions to check if one contains the other or if they overlap.</p>
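
<p>A sketch of the two checks on the example pairs from the puzzle description;
the function names are assumptions, and each range is assumed to have its
start no larger than its end:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Each range is a (start, stop) tuple.
fully_contains(a, b) = a[1] ≤ b[1] ≤ b[2] ≤ a[2]     # b lies inside a
overlaps(a, b) = max(a[1], b[1]) ≤ min(a[2], b[2])   # they share a point

assignments = [((2, 4), (6, 8)), ((2, 3), (4, 5)), ((5, 7), (7, 9)),
               ((2, 8), (3, 7)), ((6, 6), (4, 6)), ((2, 6), (4, 8))]

# Part 1: either range contains the other.
count(fully_contains(a, b) || fully_contains(b, a) for (a, b) in assignments)   # 2

# Part 2: the ranges overlap at all.
count(overlaps(a, b) for (a, b) in assignments)   # 4
</code></pre></figure>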

<h3>
<a href="https://adventofcode.com/2022/day/5" style="color:#F1FA9C">
Day 5: Supply Stacks
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day05.jl" style="color:#777777">
0.179 s
</a>
</span>
</h3>

<p>Did someone say stacks?
Moving one crate at a time (part 1) was easily achieved by using
<code class="language-julia highlight highlighter-rouge"><span class="n">push!</span></code>
and
<code class="language-julia highlight highlighter-rouge"><span class="n">pop!</span></code>
multiple times on the relevant
<code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Char</span><span class="x">}</span></code>
and this extended to moving multiple crates (part 2)
by pushing them first to a
temporary “holding array” and then to their final destinations.
The hardest part was reading the oddly-formatted input.</p>

<h3>
<a href="https://adventofcode.com/2022/day/6" style="color:#F1FA9C">
Day 6: Tuning Trouble
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day06.jl" style="color:#777777">
0.020 s
</a>
</span>
</h3>

<p>This was the first puzzle where I predicted what the second part
might be and planned accordingly.
Both parts involved finding the first position in a string where
the preceding $n$ characters were all distinct,
with $n=4$ in the first part and $n=14$ in the second.
As such I made sure the complexity of
my solution did not depend much on the size of $n$.
Using a simple loop through the string checking at each point if the
preceding $n$ characters are distinct was enough to solve this.</p>
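
<p>Such a loop can be sketched as follows, assuming an ASCII input string
so that character positions can be indexed directly.
The function name <code class="language-plaintext highlighter-rouge">first_marker</code>
is made up, and the test string is an example from the puzzle description.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># First position at which the preceding n characters are all distinct.
function first_marker(signal::String, n::Int)
    for i in n:length(signal)
        if allunique(signal[(i - n + 1):i])
            return i
        end
    end
    return nothing   # no such position exists
end

first_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 4)    # 7
first_marker("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 14)   # 19
</code></pre></figure>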

<h3>
<a href="https://adventofcode.com/2022/day/7" style="color:#F1FA9C">
Day 7: No Space Left On Device
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day07.jl" style="color:#777777">
0.548 s
</a>
</span>
</h3>

<p>This was the first day I found difficult,
and also the first for which I used custom composite types.
The goal was to imitate a simple file system which could run
<code class="language-plaintext highlighter-rouge">cd</code> and <code class="language-plaintext highlighter-rouge">ls</code>.
After some unsuccessful attempts at recursive types
(directories in directories etc.),
I settled on the following simple data structure,
with structs
<code class="language-julia highlight highlighter-rouge"><span class="n">File</span></code>
and <code class="language-julia highlight highlighter-rouge"><span class="n">Directory</span></code>
each containing
<code class="language-plaintext highlighter-rouge">name</code>, <code class="language-plaintext highlighter-rouge">size</code> and <code class="language-plaintext highlighter-rouge">parent</code> fields:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">mutable struct</span><span class="nc"> Filesystem</span>
    <span class="n">structure</span><span class="o">::</span><span class="kt">Dict</span><span class="x">{</span><span class="kt">String</span><span class="x">,</span> <span class="kt">Union</span><span class="x">{</span><span class="n">File</span><span class="x">,</span> <span class="n">Directory</span><span class="x">}}</span>
    <span class="n">cwd</span><span class="o">::</span><span class="kt">String</span>
<span class="k">end</span></code></pre></figure>

<p>Firstly I made a pass through the problem input to get the structure
of the file system and the sizes of the files,
setting each directory size to
<code class="language-julia highlight highlighter-rouge"><span class="nb">NaN</span></code>,
Julia’s “not a number” value.
The main challenge was to then recursively
propagate the file sizes up to the
directories which contain them.
I did this by looping over every directory
in the file system
and checking that</p>

<ul>
  <li>
    <p>Its current <code class="language-plaintext highlighter-rouge">size</code> was
<code class="language-julia highlight highlighter-rouge"><span class="nb">NaN</span></code></p>
  </li>
  <li>
    <p>All of its children had a <code class="language-plaintext highlighter-rouge">size</code> which was not
<code class="language-julia highlight highlighter-rouge"><span class="nb">NaN</span></code></p>
  </li>
</ul>

<p>If so then the size of this directory was the sum of the sizes of
its children.
I repeated this process $d$ times where $d$ is the depth
of the file system to ensure propagation had terminated.
I’m sure there are better ways to do this, but it worked for me.</p>

<h3>
<a href="https://adventofcode.com/2022/day/8" style="color:#F1FA9C">
Day 8: Treetop Tree House
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day08.jl" style="color:#777777">
0.266 s
</a>
</span>
</h3>

<p>For this problem I wrote a running maximum function to keep track
of the largest tree in a particular direction,
and then used some simple logic to establish whether
there was a line of sight from a tree to the edge of the grid.
I handled the four different directions by permuting the input first
using Julia’s
<code class="language-julia highlight highlighter-rouge"><span class="n">reverse</span></code>
and
<code class="language-julia highlight highlighter-rouge"><span class="n">permutedims</span></code>
functions.</p>

<p>Getting the scenic score for each tree was easily implemented
by moving in each direction until finding a higher tree.</p>

<h3>
<a href="https://adventofcode.com/2022/day/9" style="color:#F1FA9C">
Day 9: Rope Bridge
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day09.jl" style="color:#777777">
0.186 s
</a>
</span>
</h3>

<p>For this question I needed only two functions,
one to move the head of the rope according to the instructions
and another to adjust the tail to keep up with the head.
To do this one needed only to check
that the tail segment is two squares away (in $L^\infty$ norm)
from the head and then adjust it with
<code class="language-julia highlight highlighter-rouge"><span class="n">tail</span> <span class="o">+=</span> <span class="n">sign</span><span class="x">(</span><span class="n">head</span> <span class="o">-</span> <span class="n">tail</span><span class="x">)</span></code>.</p>

<p>The extension to longer ropes was easy,
making sure to adjust the rope from head to tail
in order to propagate the motion correctly.
Finally the
<code class="language-julia highlight highlighter-rouge"><span class="n">unique</span></code>
function made counting the visited squares trivial.</p>
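
<p>The tail adjustment can be sketched as below.
Here positions are assumed to be two-element integer vectors,
so the sign is broadcast over both coordinates with
<code class="language-plaintext highlighter-rouge">sign.</code>;
the function name <code class="language-plaintext highlighter-rouge">follow</code>
is hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Move the tail one step towards the head if they are no longer touching.
function follow(head::Vector{Int}, tail::Vector{Int})
    if maximum(abs.(head - tail)) ≥ 2     # distance in the L-infinity norm
        return tail + sign.(head - tail)
    end
    return tail
end

follow([2, 0], [0, 0])   # [1, 0]: a straight move
follow([2, 1], [0, 0])   # [1, 1]: a diagonal move
follow([1, 1], [0, 0])   # [0, 0]: still touching, so no move
</code></pre></figure>

<p>For a longer rope, the same function can be applied to each consecutive
pair of knots in turn, starting from the head.</p>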

<h3>
<a href="https://adventofcode.com/2022/day/10" style="color:#F1FA9C">
Day 10: Cathode-Ray Tube
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day10.jl" style="color:#777777">
0.157 s
</a>
</span>
</h3>

<p>The following two simple functions carried out the bulk of
the work for this question,
with part 2 requiring some modular arithmetic
to get the location of the cursor on the CRT.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">function</span><span class="nf"> noop!</span><span class="x">(</span><span class="n">register</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">})</span>
    <span class="n">push!</span><span class="x">(</span><span class="n">register</span><span class="x">,</span> <span class="n">register</span><span class="x">[</span><span class="k">end</span><span class="x">])</span>
    <span class="k">return</span> <span class="nb">nothing</span>
<span class="k">end</span>

<span class="k">function</span><span class="nf"> addx!</span><span class="x">(</span><span class="n">v</span><span class="o">::</span><span class="kt">Int</span><span class="x">,</span> <span class="n">register</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">})</span>
    <span class="n">noop!</span><span class="x">(</span><span class="n">register</span><span class="x">)</span>
    <span class="n">push!</span><span class="x">(</span><span class="n">register</span><span class="x">,</span> <span class="n">register</span><span class="x">[</span><span class="k">end</span><span class="x">]</span> <span class="o">+</span> <span class="n">v</span><span class="x">)</span>
    <span class="k">return</span> <span class="nb">nothing</span>
<span class="k">end</span></code></pre></figure>

<h3>
<a href="https://adventofcode.com/2022/day/11" style="color:#F1FA9C">
Day 11: Monkey in the Middle
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day11.jl" style="color:#777777">
0.923 s
</a>
</span>
</h3>

<p>I found this problem quite difficult,
and eventually settled on the following composite type
for each monkey.
In light of part 2, which required modular arithmetic
to prevent numbers growing too large,
each item was stored as a dictionary of
divisor -&gt; remainder pairs.
The <code class="language-julia highlight highlighter-rouge"><span class="n">operation</span></code>
field contained each monkey’s operation as an anonymous function
while <code class="language-julia highlight highlighter-rouge"><span class="n">test</span></code>
stored the divisor used for its divisibility test.
The potential destination monkeys for the items were given in
<code class="language-julia highlight highlighter-rouge"><span class="n">dest</span></code>,
and I kept track of the number of times the monkey inspected an item in
<code class="language-julia highlight highlighter-rouge"><span class="n">inspections</span></code>.
Once the input had been parsed, performing the rounds of
throwing was quite straightforward.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">mutable struct</span><span class="nc"> Monkey</span>
    <span class="n">id</span><span class="o">::</span><span class="kt">Int</span>
    <span class="n">items</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Dict</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}}</span>
    <span class="n">operation</span><span class="o">::</span><span class="kt">Function</span>
    <span class="n">test</span><span class="o">::</span><span class="kt">Int</span>
    <span class="n">dest</span><span class="o">::</span><span class="kt">Dict</span><span class="x">{</span><span class="kt">Bool</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
    <span class="n">inspections</span><span class="o">::</span><span class="kt">Int</span>
<span class="k">end</span></code></pre></figure>
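
<p>To show why storing remainders is enough, here is a sketch
with hypothetical helper functions (not the code from the repository).
It relies on the operations in part 2 being additions and multiplications,
which commute with taking remainders.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Represent an item's worry level by its remainders modulo each divisor.
function to_remainders(value::Int, divisors::Vector{Int})
    return Dict(zip(divisors, mod.(value, divisors)))
end

# Apply a monkey's operation to every remainder, reducing modulo its divisor.
function inspect!(item::Dict{Int, Int}, operation::Function)
    for d in collect(keys(item))
        item[d] = mod(operation(item[d]), d)
    end
    return item
end

times_nineteen(old) = old * 19   # an example operation

item = to_remainders(79, [23, 19, 13, 17])
inspect!(item, times_nineteen)
item[23]   # 6, the same as mod(79 * 19, 23)
</code></pre></figure>

<p>Each divisibility test then only needs to check whether the relevant
remainder is zero, and no stored number ever exceeds the largest divisor.</p>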

<h3>
<a href="https://adventofcode.com/2022/day/12" style="color:#F1FA9C">
Day 12: Hill Climbing Algorithm
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day12.jl" style="color:#777777">
1.151 s
</a>
</span>
</h3>

<p>This was a classic shortest path problem,
with the network edges determined by the heights of neighbouring squares.
For reading the data, Julia’s
<code class="language-julia highlight highlighter-rouge"><span class="sc">'a'</span><span class="o">:</span><span class="sc">'z'</span></code>
syntax was a neat way to get the alphabet.
I solved the main problem using
<a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">Dijkstra’s algorithm</a>,
keeping track of visited squares and their distances in
auxiliary arrays.
By running the algorithm to completion,
not stopping after reaching the target square but
rather finding the shortest path to all reachable squares,
part 2 followed immediately without having to run
the algorithm again.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">struct</span><span class="nc"> Hill</span>
    <span class="n">heights</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span>
    <span class="n">distances</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Union</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Float64</span><span class="x">}}</span>
    <span class="n">visited</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Bool</span><span class="x">}</span>
    <span class="n">S</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
    <span class="n">E</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
<span class="k">end</span></code></pre></figure>

<p>Note that each distance was of type
<code class="language-julia highlight highlighter-rouge"><span class="kt">Union</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Float64</span><span class="x">}</span></code>
to allow for the value
<code class="language-julia highlight highlighter-rouge"><span class="nb">Inf</span><span class="o">::</span><span class="kt">Float64</span></code>.</p>
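
<p>A minimal sketch of running the search to completion, not the repository code, might look as follows.
It assumes the search starts from the target square and walks backwards, so that a single run gives the distance from every square.
The name <code class="language-plaintext highlighter-rouge">all_distances</code> is hypothetical, and distances are stored as plain <code class="language-plaintext highlighter-rouge">Float64</code> for brevity.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: Dijkstra with unit edge weights on a height grid,
# searching backwards from `source` (assumed here to be the target square).
function all_distances(heights::Matrix{Int}, source::Tuple{Int, Int})
    distances = fill(Inf, size(heights))
    visited = falses(size(heights))
    distances[source...] = 0
    while true
        # pick the unvisited square with the smallest finite distance
        best, current = Inf, (0, 0)
        for idx in CartesianIndices(heights)
            if !visited[idx] &amp;&amp; distances[idx] &lt; best
                best, current = distances[idx], Tuple(idx)
            end
        end
        best == Inf &amp;&amp; break  # every reachable square has been settled
        visited[current...] = true
        for step in ((1, 0), (-1, 0), (0, 1), (0, -1))
            next = current .+ step
            checkbounds(Bool, heights, next...) || continue
            # walking backwards: the forward move next -&gt; current may climb at most 1
            heights[current...] - heights[next...] &lt;= 1 || continue
            distances[next...] = min(distances[next...], best + 1)
        end
    end
    return distances
end</code></pre></figure>

<p>Under these assumptions, the shortest route from any lowest square is
<code class="language-plaintext highlighter-rouge">minimum(distances[heights .== 0])</code>,
with unreachable squares left at <code class="language-plaintext highlighter-rouge">Inf</code>.</p>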

<h3>
<a href="https://adventofcode.com/2022/day/13" style="color:#F1FA9C">
Day 13: Distress Signal
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day13.jl" style="color:#777777">
0.271 s
</a>
</span>
</h3>

<p>This problem involved parsing some possibly deeply nested
vectors of vectors of… of integers and comparing them
to each other with some specific rules.
My data structure probably wasn’t the best for this day:
I don’t think Julia supports arbitrarily nested arrays as types,
so I represented them with
a <code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">String</span><span class="x">}</span></code>,
that is, I only treated the outer-most array as a vector
and just kept its elements as strings such as
<code class="language-julia highlight highlighter-rouge"><span class="s">"[[2],5]"</span></code>.
Comparing using the rules was then fairly straightforward,
recursing and unwrapping the strings into arrays until
the comparison was determined.
The unwrapping was done by counting opening and closing
brackets to understand the depth of each element.</p>
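
<p>The bracket-counting idea can be sketched roughly as below.
The function name <code class="language-plaintext highlighter-rouge">unwrap</code> is hypothetical and the input is assumed to be ASCII,
so this is an illustration rather than the exact code I used.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: split "[[2],5]" into its top-level elements ["[2]", "5"].
# Assumes an ASCII string wrapped in one outer pair of brackets.
function unwrap(packet::String)
    elements = String[]
    depth = 0
    start = 2  # index just after the outer opening bracket
    for (i, c) in enumerate(packet)
        if c == '['
            depth += 1
        elseif c == ']'
            depth -= 1
        end
        # a comma at depth 1, or the final closing bracket, ends an element
        if (c == ',' &amp;&amp; depth == 1) || (c == ']' &amp;&amp; depth == 0)
            i &gt; start &amp;&amp; push!(elements, packet[start:i-1])
            start = i + 1
        end
    end
    return elements
end</code></pre></figure>

<p>An empty list <code class="language-plaintext highlighter-rouge">"[]"</code> gives an empty vector,
and any nested element is returned as a string to be unwrapped again when the comparison recurses.</p>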

<p>Part 2 was easy once I could compare the objects,
as I could pass the entire list into Julia’s flexible
<code class="language-julia highlight highlighter-rouge"><span class="n">sort</span></code> function
with a custom comparison function,
so I did not have to implement a sorting algorithm such as
<a href="https://en.wikipedia.org/wiki/Quicksort">quicksort</a> myself.</p>

<h3>
<a href="https://adventofcode.com/2022/day/14" style="color:#F1FA9C">
Day 14: Regolith Reservoir
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day14.jl" style="color:#777777">
0.694 s
</a>
</span>
</h3>

<p>This problem was fun to solve as it was easy to visualise and check everything
was working properly.
The first challenge was to parse the layout of the cave from the input,
which amounted to modifying entries in a matrix along a line
given the start and end points of that line.
I then represented the cave using</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">mutable struct</span><span class="nc"> Cave</span>
    <span class="n">layout</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Char</span><span class="x">}</span>
    <span class="n">current</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
    <span class="n">start</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
    <span class="n">terminated</span><span class="o">::</span><span class="kt">Bool</span>
<span class="k">end</span></code></pre></figure>

<p>where <code class="language-julia highlight highlighter-rouge"><span class="n">layout</span></code>
showed the cave structure as in the problem statement,
<code class="language-julia highlight highlighter-rouge"><span class="n">current</span></code>
gave the current location of the falling unit of sand,
<code class="language-julia highlight highlighter-rouge"><span class="n">start</span></code>
was the starting point of the sand and
<code class="language-julia highlight highlighter-rouge"><span class="n">terminated</span></code>
determined whether to stop the simulation.
Implementing the falling sand logic was not too hard,
and for part 2 I simply added an extra path of solid rock
to represent the floor,
making it at least as wide as twice the height of the
starting point.</p>
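
<p>The parsing step mentioned above, filling in matrix entries along a line between two points, could be sketched like this.
The name <code class="language-plaintext highlighter-rouge">draw_line!</code> and the use of
<code class="language-plaintext highlighter-rouge">'#'</code> for rock are assumptions made for illustration.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: mark rock ('#') along a horizontal or vertical line
# between two (row, column) endpoints, given in either order.
function draw_line!(layout::Matrix{Char}, from::Tuple{Int, Int}, to::Tuple{Int, Int})
    rows = min(from[1], to[1]):max(from[1], to[1])
    cols = min(from[2], to[2]):max(from[2], to[2])
    layout[rows, cols] .= '#'
    return layout
end</code></pre></figure>

<p>The same helper could draw the floor for part 2, as one long horizontal line.</p>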

<h3>
<a href="https://adventofcode.com/2022/day/15" style="color:#F1FA9C">
Day 15: Beacon Exclusion Zone
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day15.jl" style="color:#777777">
0.264 s
</a>
</span>
</h3>

<p>This day was quite tricky.
The obvious thing to do was to represent the positions of everything in a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Char</span><span class="x">}</span></code>,
as suggested by the example maps given in the problem.
However, when looking at the problem input, it quickly became apparent
that this wouldn’t work as the numbers involved
(and hence the size of the matrix needed)
were very large.
Hence I looked for a solution which avoided allocating large amounts of memory,
storing only the key properties of each sensor and
using Julia’s inbuilt
<code class="language-julia highlight highlighter-rouge"><span class="kt">UnitRange</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span></code>
type to represent intervals of numbers.</p>

<p>For part 1, I calculated the interval of points precluded by a given sensor
at a given $y$ value,
then wrote a function to simplify a collection of such intervals
into disjoint intervals.
Counting the number of points in disjoint intervals was then easy.</p>
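
<p>A rough sketch of these two steps is given below.
The names <code class="language-plaintext highlighter-rouge">excluded</code> and
<code class="language-plaintext highlighter-rouge">simplify</code> are hypothetical,
sensors are assumed to be stored as $(x, y)$ tuples,
and the radius is the Manhattan distance from a sensor to its closest beacon.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: the x values covered by a sensor on row y
function excluded(sensor::Tuple{Int, Int}, radius::Int, y::Int)
    half = radius - abs(y - sensor[2])
    return (sensor[1] - half):(sensor[1] + half)  # empty when half &lt; 0
end

# Hypothetical helper: merge overlapping or adjacent integer intervals
function simplify(intervals::Vector{UnitRange{Int}})
    merged = UnitRange{Int}[]
    for r in sort(filter(!isempty, intervals), by=first)
        if !isempty(merged) &amp;&amp; first(r) &lt;= last(merged[end]) + 1
            # overlaps or touches the previous interval, so extend it
            merged[end] = first(merged[end]):max(last(merged[end]), last(r))
        else
            push!(merged, r)
        end
    end
    return merged
end</code></pre></figure>

<p>The number of covered points is then <code class="language-plaintext highlighter-rouge">sum(length, simplify(intervals))</code>.</p>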

<p>It took me a long time to figure out part 2.
My final solution relied on the logic that if there is only one point
which is not in any of the sensed regions,
then it must be near a “corner” of two sensed regions.
Thus I first calculated the intersection points of the boundaries
of the sensed regions for each pair of sensors.
I then searched the neighbours of these points to find a point
which was in none of the sensed regions.</p>

<h3>
<a href="https://adventofcode.com/2022/day/16" style="color:#F1FA9C">
Day 16: Proboscidea Volcanium
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day16.jl" style="color:#777777">
0.923 s
</a>
</span>
</h3>

<p>This was one of the hardest problems, involving a network search which
was large enough to require some thought to get it to finish quickly.
Firstly I did some pre-processing of the input,
using Dijkstra’s algorithm to get the shortest path length between any two
valves, thus yielding a complete network.
I then dropped all the valves (nodes) with a
zero flow rate as there was no point
going to them except en route to another valve.</p>

<p>For part 1, I used a
<a href="https://en.wikipedia.org/wiki/Depth-first_search">depth-first search</a>
(DFS), storing the state of each valve (open or closed),
the amount of time used up,
the total pressure relieved
and the current position
to describe each state.
I found that using
<code class="language-julia highlight highlighter-rouge"><span class="kt">UInt8</span></code>
and <code class="language-julia highlight highlighter-rouge"><span class="kt">UInt16</span></code>
where possible gave a noticeable speed-up over the more typical
<code class="language-julia highlight highlighter-rouge"><span class="kt">Int64</span></code>.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">struct</span><span class="nc"> State</span>
    <span class="n">opens</span><span class="o">::</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Bool</span><span class="x">}</span>
    <span class="n">time</span><span class="o">::</span><span class="kt">UInt16</span>
    <span class="n">pressure</span><span class="o">::</span><span class="kt">UInt16</span>
    <span class="n">position</span><span class="o">::</span><span class="kt">UInt8</span>
<span class="k">end</span></code></pre></figure>

<p>For part 2, I first reasoned that there was no need
for both you and the elephant
to ever visit the same valve (in the processed complete network).
Therefore I used a heuristic to find all of the “good”
paths within the time limit,
meaning those with a sufficiently high total pressure release,
again using DFS.
I then searched among these paths to find the best pair which visit
disjoint sets of valves.
The cutoff value for the good paths was chosen by trial and error,
but could be found using a binary search.</p>
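
<p>The final pairing step can be illustrated with a short sketch.
Here each “good” path is assumed to be summarised by the valves it opens and its total pressure,
and <code class="language-plaintext highlighter-rouge">best_disjoint_pair</code> is a hypothetical name.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: each path is (valves opened, total pressure released)
function best_disjoint_pair(paths::Vector{Tuple{Vector{Bool}, Int}})
    best = 0
    for i in eachindex(paths), j in (i + 1):lastindex(paths)
        opens_i, pressure_i = paths[i]
        opens_j, pressure_j = paths[j]
        # disjoint when no valve is opened on both paths
        if !any(opens_i .&amp; opens_j)
            best = max(best, pressure_i + pressure_j)
        end
    end
    return best
end</code></pre></figure>

<p>This loop is quadratic in the number of stored paths,
which is why a cutoff on the pressure of each path matters.</p>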

<h3>
<a href="https://adventofcode.com/2022/day/17" style="color:#F1FA9C">
Day 17: Pyroclastic Flow
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day17.jl" style="color:#777777">
0.645 s
</a>
</span>
</h3>

<p>This problem was fun to solve, especially the insights required for part 2.
Part 1 involved implementing
<a href="https://en.wikipedia.org/wiki/Tetris">Tetris</a>
but without rotating the blocks.</p>

<p>I represented the tower of blocks using a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Char</span><span class="x">}}</span></code>
rather than a
<code class="language-julia highlight highlighter-rouge"><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Char</span><span class="x">}</span></code>
so I could easily push new rows onto the top to make the whole assembly taller.
Replicating the behaviour was straightforward,
checking which action needed to be applied at each time step.</p>

<p>Part 2 was initially very challenging, asking to predict the height of the tower
after 1000000000000
(yes, a trillion) blocks had fallen.
Obviously it was not possible to simulate this fully,
but I realised that the pattern of jets and the block shapes were periodic.
After checking that the top part of the tower of blocks was also periodic,
the problem became much easier.
The answer could be calculated by running the simulation
for a few thousand iterations
to observe the repeating section
and then counting the number of repeats needed to reach
1000000000000 blocks.</p>
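
<p>Once the period is known, the extrapolation is simple arithmetic.
The sketch below assumes that the simulated heights are stored in a vector and that the
offset and period have already been found;
all of the names are hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: `heights[n]` is the tower height after n blocks, assumed to
# grow by a fixed amount every `period` blocks once at least `offset` blocks have fallen.
# Requires target &gt;= offset and length(heights) &gt;= offset + period.
function extrapolate_height(heights::Vector{Int}, offset::Int, period::Int, target::Int)
    growth = heights[offset + period] - heights[offset]
    cycles, remainder = divrem(target - offset, period)
    return heights[offset + remainder] + cycles * growth
end</code></pre></figure>

<p>Only <code class="language-plaintext highlighter-rouge">offset + period</code> simulated blocks are needed,
and the result fits comfortably in an <code class="language-plaintext highlighter-rouge">Int64</code>.</p>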

<h3>
<a href="https://adventofcode.com/2022/day/18" style="color:#F1FA9C">
Day 18: Boiling Boulders
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day18.jl" style="color:#777777">
1.176 s
</a>
</span>
</h3>

<p>This was also a fun problem.
Part 1 was easy, calculating the total area of all the cubes
and then subtracting non-exposed faces
which made contact with another cube.</p>

<p>For part 2 I wrote a function to “complete” a set of cubes
by filling in any internal cavities.
This let me reuse my surface area function
from part 1 to get the exterior surface area.
The completion function worked by drawing a box around the lava droplet
and marking cubes on the boundary of this box as “outside” the droplet.
I then marked any non-droplet neighbours of an
“outside” cube as also being “outside”,
iterating with a DFS until termination.
The completed droplet was then simply the union of the original droplet
and the cubes which were not outside.</p>
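
<p>Both parts can be sketched in a few lines.
This version stores the cubes in a <code class="language-plaintext highlighter-rouge">Set</code> of coordinate tuples
and starts the flood fill from one corner of the bounding box rather than from its whole boundary,
so it is an illustration under those assumptions and the names are hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">const Cube = Tuple{Int, Int, Int}
const NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

# count the faces of each cube which do not touch another cube
function surface_area(cubes::Set{Cube})
    return sum(count(d -&gt; !((c .+ d) in cubes), NEIGHBOURS) for c in cubes)
end

# fill internal cavities by flood-filling the "outside" within a padded bounding box
function complete(cubes::Set{Cube})
    lo = minimum(minimum(c) for c in cubes) - 1
    hi = maximum(maximum(c) for c in cubes) + 1
    outside = Set{Cube}()
    stack = [(lo, lo, lo)]  # a corner of the padded box is never part of the droplet
    while !isempty(stack)
        c = pop!(stack)
        (c in outside || c in cubes) &amp;&amp; continue
        all(x -&gt; lo &lt;= x &lt;= hi, c) || continue
        push!(outside, c)
        for d in NEIGHBOURS
            push!(stack, c .+ d)
        end
    end
    box = Set{Cube}((x, y, z) for x in lo:hi, y in lo:hi, z in lo:hi)
    return setdiff(box, outside)  # the droplet together with its cavities
end</code></pre></figure>

<p>The exterior surface area is then <code class="language-plaintext highlighter-rouge">surface_area(complete(cubes))</code>.</p>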

<h3>
<a href="https://adventofcode.com/2022/day/19" style="color:#F1FA9C">
Day 19: Not Enough Minerals
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day19.jl" style="color:#777777">
1.062 s
</a>
</span>
</h3>

<p>I thought this was the hardest problem on the calendar,
and it definitely took me the longest to solve.
This was another network search-type problem,
so I again went for a DFS approach.
However this instance was significantly larger than that from Day 16,
and I had to use the following pruning strategies to avoid
searching too large a space.</p>

<ul>
  <li>
    <p>If you can buy a geode bot, you must do so.</p>
  </li>
  <li>
    <p>If in the previous state you could have bought bot X
but instead did nothing, then you should not buy bot X now.</p>
  </li>
  <li>
    <p>Suppose in the current state you could spend the rest of the time
only building geode bots. If this doesn’t get more geodes
than the current best strategy, then drop the current state.</p>
  </li>
  <li>
    <p>If no object requires more than X of resource Y,
then don’t buy more than X of bot Y.</p>
  </li>
</ul>

<p>Thankfully this approach was good enough to solve both parts 1 and 2,
but it took a lot of trial and error to work out which optimisations
were worth using.</p>
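
<p>The third pruning rule in the list above amounts to an optimistic upper bound,
which can be written down explicitly.
The sketch assumes that one new geode bot could be built in every remaining minute;
the function names are hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: optimistic bound on the final geode count
function geode_upper_bound(geodes::Int, geode_bots::Int, time_left::Int)
    # existing bots collect geode_bots per minute, and a bot started in minute k
    # of the remaining time collects for time_left - k minutes,
    # giving 0 + 1 + ... + (time_left - 1) extra geodes
    return geodes + geode_bots * time_left + time_left * (time_left - 1) ÷ 2
end

# drop the state if even the optimistic bound cannot beat the best result so far
should_prune(geodes, geode_bots, time_left, best) =
    geode_upper_bound(geodes, geode_bots, time_left) &lt;= best</code></pre></figure>

<p>The bound is cheap to evaluate, so it can be checked at every node of the search.</p>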

<h3>
<a href="https://adventofcode.com/2022/day/20" style="color:#F1FA9C">
Day 20: Grove Positioning System
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day20.jl" style="color:#777777">
0.274 s
</a>
</span>
</h3>

<p>This problem was a welcome break, and was quite easy.
The main challenge here was using the right indices
and applying the correct modulo functions.
I kept track of both the mixed list and the original
index of each element in that list,
making it easy to work out which element to move next.
I was worried that part 2 might ask me to mix the list
a huge number of times, but thankfully it didn’t and
I was able to use the same code for both parts.</p>
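
<p>A minimal sketch of the index bookkeeping is shown below, under the assumption that the mixed
list is stored as a vector of original indices.
The names <code class="language-plaintext highlighter-rouge">mix</code> and
<code class="language-plaintext highlighter-rouge">grove_sum</code> are hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical sketch: mix a list once, tracking each element's original index
function mix(numbers::Vector{Int})
    n = length(numbers)
    order = collect(1:n)  # order[k] is the original index of the element now at position k
    for original in 1:n
        pos = findfirst(==(original), order)
        deleteat!(order, pos)
        # the list is circular with n - 1 other elements, so shift modulo n - 1
        new_pos = mod(pos - 1 + numbers[original], n - 1) + 1
        insert!(order, new_pos, original)
    end
    return numbers[order]
end

# sum of the 1000th, 2000th and 3000th values after the zero, wrapping around
function grove_sum(mixed::Vector{Int})
    zero_pos = findfirst(==(0), mixed)
    return sum(mixed[mod1(zero_pos + k, length(mixed))] for k in (1000, 2000, 3000))
end</code></pre></figure>

<p>Since the list is circular, the output may be a rotation of the ordering shown in the problem
statement, which does not affect the final sum.</p>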

<h3>
<a href="https://adventofcode.com/2022/day/21" style="color:#F1FA9C">
Day 21: Monkey Math
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day21.jl" style="color:#777777">
0.366 s
</a>
</span>
</h3>

<p>Part 1 of this problem was easy but
after seeing part 2 I rewrote all my code,
keeping track of expressions for each monkey rather than values.
I then wrote a function to check for a given monkey whether
both of its “child” monkeys (the monkeys that provide its inputs)
had had their expressions parsed.
If so, I could parse the current monkey by substituting the children’s
expressions into its own expression.
Repeating this for all the monkeys until termination propagated
the expressions right up to the <code class="language-plaintext highlighter-rouge">root</code> monkey.
For part 1, this could then be directly evaluated using Julia’s
<a href="https://docs.julialang.org/en/v1/manual/metaprogramming/">metaprogramming</a>
facilities:
<code class="language-julia highlight highlighter-rouge"><span class="n">Meta</span><span class="o">.</span><span class="n">parse</span></code>
converts a
<code class="language-julia highlight highlighter-rouge"><span class="kt">String</span></code>
to an
<code class="language-julia highlight highlighter-rouge"><span class="kt">Expr</span></code>,
and this is evaluated to a number with
<code class="language-julia highlight highlighter-rouge"><span class="n">eval</span></code>.</p>
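
<p>For example, with a made-up expression standing in for that of the
<code class="language-plaintext highlighter-rouge">root</code> monkey:</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">expression = "(4 + (2 * (5 - 3))) * 5"  # hypothetical fully-substituted expression
parsed = Meta.parse(expression)          # an Expr representing the arithmetic
value = eval(parsed)                     # evaluates to 40</code></pre></figure>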

<p>For part 2 I retrieved the expressions of the two children of <code class="language-plaintext highlighter-rouge">root</code>.
One of these evaluated to a number directly,
and the other retained the variable <code class="language-plaintext highlighter-rouge">humn</code>.
To find the value of <code class="language-plaintext highlighter-rouge">humn</code> which makes these equal to each other,
I implemented a
<a href="https://en.wikipedia.org/wiki/Binary_search_algorithm">binary search</a>.
This was probably not guaranteed to work in all situations,
but was fine for my instance.</p>
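
<p>A sketch of such a search is given below.
It assumes that the expression is monotonically increasing in
<code class="language-plaintext highlighter-rouge">humn</code> over the search range,
which is exactly the condition that may fail in general.
The name <code class="language-plaintext highlighter-rouge">solve_monotone</code> is hypothetical.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># Hypothetical helper: find an integer x in lo:hi with f(x) == target,
# assuming f is monotonically increasing on that range
function solve_monotone(f, target, lo::Int, hi::Int)
    while lo &lt;= hi
        mid = lo + (hi - lo) ÷ 2
        value = f(mid)
        if value == target
            return mid
        elseif value &lt; target
            lo = mid + 1
        else
            hi = mid - 1
        end
    end
    return nothing  # no integer solution in the range
end</code></pre></figure>

<p>For a decreasing expression, the same code works after negating both the function and the target.</p>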

<h3>
<a href="https://adventofcode.com/2022/day/22" style="color:#F1FA9C">
Day 22: Monkey Map
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day22.jl" style="color:#777777">
0.863 s
</a>
</span>
</h3>

<p>This was a challenging but fun problem,
requiring the most lines of code out of all of the problems.
Part 1 was quite easy, especially since we had already solved
a few of these “follow the rules on a grid”-type problems
(day 9, day 14, day 17),
with the only tricky task being to write down the rules for wrapping around
to the other side of the map.</p>

<p>Part 2 was much harder but also quite interesting.
The central data structure I used was to represent each face of
the cube by the following.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">mutable struct</span><span class="nc"> Face</span>
    <span class="n">id</span><span class="o">::</span><span class="kt">Int</span>
    <span class="n">board</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Char</span><span class="x">}</span>
    <span class="n">face_coords</span><span class="o">::</span><span class="kt">Matrix</span><span class="x">{</span><span class="kt">Int</span><span class="x">}</span>
    <span class="n">corner_loc</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
<span class="k">end</span></code></pre></figure>

<p>Here, <code class="language-plaintext highlighter-rouge">id</code> identified each face uniquely,
<code class="language-plaintext highlighter-rouge">board</code> provided the layout of open tiles and walls on the face,
<code class="language-plaintext highlighter-rouge">face_coords</code> gave the coordinates in 3D space of the
top-left, top-right, bottom-left and bottom-right corners of the face,
and <code class="language-plaintext highlighter-rouge">corner_loc</code> recorded where the face was located on the original
input net.
To parse the faces, I set the first face to have coordinates
<code class="language-plaintext highlighter-rouge">[-1 1 -1 1; 1 1 -1 -1; 1 1 1 1]</code> and then
traced over the net, applying the appropriate
<a href="https://en.wikipedia.org/wiki/Rotation_matrix">matrix operation</a>
every time I went over an edge to keep track of the
coordinates of each face.
The main part of the problem was then to handle the logic
for going over an edge on the resulting cube.
I did this in stages, first identifying the coordinates of the
edge to cross, then matching this to determine which face we ended up on.
Again matching coordinates identified which side of this face we emerged on,
and I then had to work out where on this edge we would appear,
and which way we would then be facing.
It took me a long time to realise that I had forgotten the last step:
we had to check we wouldn’t emerge into a wall,
as then we would not make the transition at all.
My code ended up being pretty long, but I think it should work
in generality as I never “hard-coded” my cube’s net.</p>
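
<p>As a small illustration of the matrix operation, the sketch below rotates the first face by
90 degrees about the $x$-axis.
Which rotation belongs to which edge of the net depends on orientation conventions,
so the choice of axis and sign here is an assumption.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"># rotation by 90 degrees about the x-axis, mapping (x, y, z) to (x, -z, y)
const ROTATE_X = [1 0 0; 0 0 -1; 0 1 0]

# each column is one corner of the first face, which lies in the plane z = 1
face_coords = [-1 1 -1 1; 1 1 -1 -1; 1 1 1 1]

# the rotated face lies in the plane y = -1
rotated = ROTATE_X * face_coords</code></pre></figure>

<p>Applying the same rotation four times returns the original coordinates,
which is a handy sanity check when tracing over the net.</p>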

<h3>
<a href="https://adventofcode.com/2022/day/23" style="color:#F1FA9C">
Day 23: Unstable Diffusion
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day23.jl" style="color:#777777">
1.670 s
</a>
</span>
</h3>

<p>This problem was conceptually not too hard but seemed to need
many lines of code.
I initially tried using a dictionary,
but it turned out that iterating through this was too slow
due to the number of elves,
and in fact just keeping an array of all the possible locations
was faster, since the operations are all local.
I therefore used</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">mutable struct</span><span class="nc"> Elf</span>
    <span class="n">id</span><span class="o">::</span><span class="kt">Int</span>
    <span class="n">prop_loc</span><span class="o">::</span><span class="kt">Union</span><span class="x">{</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">},</span> <span class="kt">Nothing</span><span class="x">}</span>
<span class="k">end</span></code></pre></figure>

<p>and
<code class="language-julia highlight highlighter-rouge"><span class="n">Elves</span> <span class="o">=</span> <span class="kt">Matrix</span><span class="x">{</span><span class="kt">Union</span><span class="x">{</span><span class="n">Elf</span><span class="x">,</span> <span class="kt">Nothing</span><span class="x">}}</span></code>.
Each round consisted of a few tasks:
firstly I checked if any elves were near the edge of the array,
extending the array in all directions by ten units if so.
Then I found the proposed location for each elf according to the rules.
Next I updated the locations, checking that
no elves tried to go to the same location.
Finally I reset all the proposed locations to
<code class="language-julia highlight highlighter-rouge"><span class="nb">nothing</span></code>.</p>

<p>For part 2 I was concerned that I might need a huge number of rounds
before termination, but it ended up not being too bad
and I could just reuse the code,
checking at each step if any elf still had neighbours.</p>

<h3>
<a href="https://adventofcode.com/2022/day/24" style="color:#F1FA9C">
Day 24: Blizzard Basin
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day24.jl" style="color:#777777">
1.711 s
</a>
</span>
</h3>

<p>One final network search problem, this time with
a time-varying network.
I stored the blizzard locations as
<code class="language-julia highlight highlighter-rouge"><span class="n">Blizzard</span> <span class="o">=</span> <span class="kt">Matrix</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Int</span><span class="x">}}</span></code>,
recording the number of blizzards in each direction at every location.</p>

<p>For part 1 I first calculated all of the blizzard locations at each
time step following the rules,
up to some time limit chosen by trial and error.
I then used DFS again for the main search problem, with</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia"><span class="k">struct</span><span class="nc"> State</span>
    <span class="n">loc</span><span class="o">::</span><span class="kt">Tuple</span><span class="x">{</span><span class="kt">Int</span><span class="x">,</span> <span class="kt">Int</span><span class="x">}</span>
    <span class="n">time</span><span class="o">::</span><span class="kt">Int</span>
<span class="k">end</span></code></pre></figure>

<p>I used the following pruning strategy to reduce run-time.
I noted that the blizzards were periodic with period given
by the least common multiple of the dimensions of the valley.
Thus if we were in the same location at the same time modulo this period,
then the state had already been seen and could be discarded.</p>
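
<p>This check can be sketched as follows, with the
<code class="language-plaintext highlighter-rouge">State</code> struct repeated so that the block is self-contained.
The helper <code class="language-plaintext highlighter-rouge">mark_seen!</code> is hypothetical,
and discarding repeats in this way is only safe if the earlier equivalent state was reached no later than the current one.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">struct State
    loc::Tuple{Int, Int}
    time::Int
end

# Hypothetical helper: record a state, returning false if an equivalent one
# (same location, same time modulo the blizzard period) was already seen
function mark_seen!(seen::Set{Tuple{Tuple{Int, Int}, Int}}, state::State, period::Int)
    key = (state.loc, mod(state.time, period))
    key in seen &amp;&amp; return false
    push!(seen, key)
    return true
end

# for a valley with a 4 by 6 interior (made-up dimensions) the period is 12
period = lcm(4, 6)</code></pre></figure>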

<p>For part 2 I used the same approach,
swapping the start and end points and resuming the
blizzards after each trip to get the total round trip time.</p>

<h3>
<a href="https://adventofcode.com/2022/day/25" style="color:#F1FA9C">
Day 25: Full of Hot Air
</a>
<span style="float: right; color: #777777; font-size: 24px;">
<a href="https://github.com/WGUNDERWOOD/
advent-of-code-2022/blob/main/src/day25.jl" style="color:#777777">
0.105 s
</a>
</span>
</h3>

<p>This final problem turned out to be trickier than I expected,
though it didn’t need many lines in the end.
Manipulating numbers in a different
<a href="https://en.wikipedia.org/wiki/Radix">base</a>
didn’t seem too bad,
but the inclusion of negative coefficients made this very confusing.
Writing a
<code class="language-julia highlight highlighter-rouge"><span class="n">snafu_to_decimal</span></code>
function was straightforward,
looking up the coefficients in a dictionary and
calculating in base five.
The <code class="language-julia highlight highlighter-rouge"><span class="n">decimal_to_snafu</span></code>
function was much harder,
and my final solution was recursive,
first finding a large enough power of five for the leading digit
and calculating the coefficient, then
calling the function again with the power reduced by one
on the remainder to get the next digit.
I finally trimmed any leading zeros.</p>
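
<p>For comparison, here is a sketch of both conversions.
The first follows the dictionary approach described above,
while the second uses a different technique from my recursive solution:
it works from the least significant digit, shifting each remainder into the range $-2$ to $2$.
It assumes a non-negative input.</p>

<figure class="highlight"><pre><code class="language-julia" data-lang="julia">const SNAFU_DIGITS = Dict('=' =&gt; -2, '-' =&gt; -1, '0' =&gt; 0, '1' =&gt; 1, '2' =&gt; 2)

function snafu_to_decimal(snafu::String)
    total = 0
    for c in snafu
        total = 5 * total + SNAFU_DIGITS[c]
    end
    return total
end

# iterative alternative to the recursive approach, producing no leading zeros
function decimal_to_snafu(n::Int)
    n == 0 &amp;&amp; return "0"
    chars = Char[]
    while n != 0
        digit = mod(n + 2, 5) - 2          # balanced digit in -2:2
        push!(chars, "=-012"[digit + 3])   # look up the matching character
        n = (n - digit) ÷ 5
    end
    return String(reverse(chars))
end</code></pre></figure>

<p>For instance, <code class="language-plaintext highlighter-rouge">decimal_to_snafu(2022)</code> gives
<code class="language-plaintext highlighter-rouge">"1=11-2"</code>, matching the example in the problem statement.</p>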

<h2 id="concluding-remarks">Concluding remarks</h2>

<p>Julia was overall a good language to use,
offering a compromise between ease of use
(for those familiar with Python, R and MATLAB, at least),
and performance.
One downside is that Julia is not a scripting language and has a poor
startup time. This means that most of the time is spent
compiling rather than running the program, and it is difficult
to separate these two steps.
I aimed for each solution to execute in around one second,
and managed to get them all under two seconds,
which I’m fairly happy with.
Next time I might try to learn
<a href="https://www.rust-lang.org/">Rust</a>.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[In 2022 I tackled Advent of Code for the first time, using the Julia language. Here are some of my thoughts; my code is on GitHub.]]></summary></entry><entry><title type="html">Yurinskii’s Coupling for Martingales</title><link href="https://wgunderwood.github.io//2022/10/04/martingale-yurinskii.html" rel="alternate" type="text/html" title="Yurinskii’s Coupling for Martingales" /><published>2022-10-04T00:00:00+00:00</published><updated>2022-10-04T00:00:00+00:00</updated><id>https://wgunderwood.github.io//2022/10/04/martingale-yurinskii</id><content type="html" xml:base="https://wgunderwood.github.io//2022/10/04/martingale-yurinskii.html"><![CDATA[<p>I’m pleased to share my new preprint,
titled “Yurinskii’s Coupling for Martingales”
and coauthored with
<a href="https://mdcattaneo.github.io/">Matias Cattaneo</a>
and
<a href="https://anson.ucdavis.edu/~rmasini/bio.html">Ricardo Masini</a>.</p>

<p>It can be found at
<a href="https://arxiv.org/abs/2210.00362">arXiv:2210.00362</a>.</p>

<script type="text/x-mathjax-config">
 MathJax.Hub.Config({
     tex2jax: {
         inlineMath: [['$','$'], ['\\(','\\)']],
         processEscapes: true
     },
     "HTML-CSS": {
         scale: 85
     },
 });
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h1 id="abstract">Abstract</h1>

<p>Yurinskii’s coupling is a popular tool for finite-sample distributional
approximation in mathematical statistics and applied probability, offering a
Gaussian strong approximation for sums of random vectors under easily verified
conditions with an explicit rate of approximation. Originally stated for sums of
independent random vectors in $\ell^2$-norm, it has recently been extended to
the $\ell^p$-norm, where $1 \leq p \leq \infty$, and to vector-valued
martingales in $\ell^2$-norm under some rather strong conditions. We provide as
our main result a generalization of all of the previous forms of Yurinskii’s
coupling, giving a Gaussian strong approximation for martingales in
$\ell^p$-norm under relatively weak conditions. We apply this result to some
areas of statistical theory, including high-dimensional martingale central limit
theorems and uniform strong approximations for martingale empirical processes.
Finally we give a few illustrative examples in statistical methodology, applying
our results to partitioning-based series estimators for nonparametric
regression, distributional approximation of $\ell^p$-norms of high-dimensional
martingales, and local polynomial regression estimators. We address issues of
feasibility, demonstrating implementable statistical inference procedures in
each section.</p>]]></content><author><name>William G. Underwood</name></author><summary type="html"><![CDATA[I’m pleased to share my new preprint, titled “Yurinskii’s Coupling for Martingales” and coauthored with Matias Cattaneo and Ricardo Masini.]]></summary></entry></feed>