<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Rachel Lawrence</title><link href="https://rlaw.me/" rel="alternate"/><link href="https://rlaw.me/feeds/all.atom.xml" rel="self"/><id>https://rlaw.me/</id><updated>2023-12-18T00:00:00-05:00</updated><entry><title>PhD Thesis (director’s cut)</title><link href="https://rlaw.me/phd-thesis-directors-cut.html" rel="alternate"/><published>2023-12-18T00:00:00-05:00</published><updated>2023-12-18T00:00:00-05:00</updated><author><name>Rachel Lawrence</name></author><id>tag:rlaw.me,2023-12-18:/phd-thesis-directors-cut.html</id><summary type="html">&lt;style&gt;
#thesis-embed { height: 640px; }
.aside { font-size: 0.9em; color: gray; }
&lt;/style&gt;&lt;p&gt;My thesis is finally signed, sealed, delivered, filed, and ready to see the
bright light of day. Feel free to take a look, browse, poke around to learn a
bit about mass-action dynamics, simplicial complexes, the Global Attractor
Conjecture, and how …&lt;/p&gt;</summary><content type="html">&lt;style&gt;
#thesis-embed { height: 640px; }
.aside { font-size: 0.9em; color: gray; }
&lt;/style&gt;&lt;p&gt;My thesis is finally signed, sealed, delivered, filed, and ready to see the
bright light of day. Feel free to take a look, browse, poke around to learn a
bit about mass-action dynamics, simplicial complexes, the Global Attractor
Conjecture, and how they're all related in pursuit of building rigorously
understood mathematical/computational tools on top of nonlinear dynamical
systems. Or, for a look at the Minimum Rank problem and its computational
complexity, check out Chapter 4; and continue on to Chapter 5 to find out how it
relates to a randomized graph coloring process.&lt;/p&gt;
&lt;p&gt;It's been an unforgettable journey getting here. Now on to the next adventure!&lt;/p&gt;
&lt;p class="aside"&gt;(The copy below is the &amp;quot;director's cut&amp;quot; AKA &amp;quot;Rachel's Version&amp;quot;: Now
featuring an index of terms and a few extra colors, not allowed in the
official print copy!)&lt;/p&gt;
&lt;embed src="/downloads/Rachel-Lawrence-PhD-Thesis-Rachels-Version.pdf" alt="PDF embed" style="display: block; width: 80%; margin: 0 auto" id="thesis-embed"&gt;</content><category term="blog"/></entry><entry><title>Adversarial Examples, meet Computational Complexity</title><link href="https://rlaw.me/adversarial-examples-meet-computational-complexity.html" rel="alternate"/><published>2021-03-30T00:00:00-04:00</published><updated>2021-03-30T00:00:00-04:00</updated><author><name>Rachel Lawrence</name></author><id>tag:rlaw.me,2021-03-30:/adversarial-examples-meet-computational-complexity.html</id><summary type="html">&lt;p&gt;This blog post summarizes my notes
on &lt;a href="https://arxiv.org/abs/1805.10204"&gt;Adversarial Examples from Computational Constraints&lt;/a&gt;
by Sébastien Bubeck, Eric Price, and Ilya Razenshteyn (2018). The paper asks a
fundamental question in machine learning: Why are adversarial examples a thing,
anyway? What's stopping us from building the robust classifiers of our dreams?&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/tabby-1024x392.jpg"&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;An adversarial …&lt;/p&gt;&lt;/figcaption&gt;&lt;/figure&gt;</summary><content type="html">&lt;p&gt;This blog post summarizes my notes
on &lt;a href="https://arxiv.org/abs/1805.10204"&gt;Adversarial Examples from Computational Constraints&lt;/a&gt;
by Sébastien Bubeck, Eric Price, and Ilya Razenshteyn (2018). The paper asks a
fundamental question in machine learning: Why are adversarial examples a thing,
anyway? What's stopping us from building the robust classifiers of our dreams?&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/tabby-1024x392.jpg"&gt;&lt;/p&gt;
&lt;figcaption&gt;
&lt;p&gt;An adversarial example. Never heard of 'em?
Read &lt;a href="https://openai.com/blog/adversarial-example-research"&gt;this&lt;/a&gt;
first. (&lt;a href="https://github.com/anishathalye/obfuscated-gradients"&gt;Cat citation&lt;/a&gt;)&lt;/p&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There are several schools of thought on this, and they're not mutually
exclusive.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Possibilities-1024x391.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This paper is the first one that I'm aware of to propose option (3), and the
authors present a setting in which it is actually the case! They construct a
family of distributions such that, even with enough data that it is
information-theoretically possible to learn a robust classifier, it is
impossible to find a robust classifier in polynomial time under the SQ
("Statistical Query") model of computation. This suggests a new trade-off between
computational complexity and robustness in classification algorithms.&lt;/p&gt;
&lt;p&gt;But before we get there, let's review what the goal is: &lt;strong&gt;robust learning&lt;/strong&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Robust-learning-1024x519.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;Now we can roughly understand the main theorems from the paper. Here I'm writing
them in qualitative terms, but we'll get to the rigorous ones in a bit.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Results-1-and-2.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;Now, let's talk a bit about that SQ ("Statistical Query") model of computation.
It's one I hadn't heard of before this, so maybe you haven't either. The gist
isn't too complicated, though: There's a secret distribution that the SQ oracle
knows, but you don't. You get to ask it a question in the form of a
[0,1]-valued function, and get back an expected value over the distribution
(plus a little noise). Very reasonable!&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/SQ-model.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
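&lt;p&gt;To make the model concrete, here's a tiny Python sketch of my own (not from the paper): the oracle holds a secret distribution, you hand it a [0,1]-valued query function, and it answers the expectation up to tolerance tau. One simplification to note: the real SQ model lets the oracle err &lt;em&gt;adversarially&lt;/em&gt; within tau, whereas here I just add random noise.&lt;/p&gt;

```python
import random

def make_sq_oracle(sample_secret, tau, n_mc=100_000):
    """Toy SQ oracle over a secret distribution.

    Answers E[q(x)] for a [0,1]-valued query q, up to error tau.
    (The real SQ model allows adversarial error up to tau; random
    noise is used here purely for illustration.)"""
    def oracle(q):
        # Monte Carlo estimate of the true expectation.
        est = sum(q(sample_secret()) for _ in range(n_mc)) / n_mc
        # Perturb the answer by at most tau.
        return est + random.uniform(-tau, tau)
    return oracle

# Secret distribution: a standard Gaussian.
oracle = make_sq_oracle(lambda: random.gauss(0.0, 1.0), tau=0.01)

# Query: "how much of the mass is nonnegative?" -- a [0,1]-valued function.
answer = oracle(lambda x: 1.0 if x >= 0 else 0.0)  # close to 0.5
```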
&lt;p&gt;Of course, it does matter quite a bit how much noise we get in our answers. To
make this a fair fight, since the paper is going to show a hardness result (that
the SQ model isn't able to find our robust classifier efficiently), we'll assume
it's quite a precise oracle, with only a little noise.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Oracle-prescision.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;That's a lot of precision! You'd need more than a polynomial number of samples
to achieve it (polynomial in &lt;em&gt;d&lt;/em&gt;, the problem dimension).&lt;/p&gt;
&lt;p&gt;So it's actually pretty surprising that the authors find that &lt;em&gt;even&lt;/em&gt; with this
high precision, there's still a task out there that requires an &lt;strong&gt;exponential&lt;/strong&gt;
number of queries for robust learning... &lt;strong&gt;and yet&lt;/strong&gt; is easy to learn without
robustness... &lt;strong&gt;and yet&lt;/strong&gt; robust learning is information-theoretically possible!&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Main-Theorem.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;Now, the SQ model is not the be-all end-all of computational hardness, of
course. In fact, we know of some problems that have exponential SQ hardness but
admit polynomial-time algorithms via other methods. They're rare (so far), and
we don't really know when to expect that to happen. We're really just grazing
the tip of the iceberg on SQ here, but suffice it to say that an efficient
algorithm for this problem going outside the SQ model would be quite the
breakthrough in its own right.&lt;/p&gt;
&lt;p&gt;Okay, now is a great time to skip to the takeaways at the end if you've only got
a few minutes to spare on this blog post, because we're about to get into the
nitty-gritty. It's time for everyone's favorite: a ~definition dump~. (Don't
worry, I annotated it!)&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Defs-1.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;So far, so good: We've got a loss function that takes into account an
epsilon-ball around each data point, and a definition for a robust classifier
based on that loss function. Now, let's set up our classification task.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Defs-2.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
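&lt;p&gt;Here's a toy version of that epsilon-ball loss in Python (my own 1-D illustration, not the paper's notation): a point only counts as robustly correct if the classifier gets it right on &lt;em&gt;every&lt;/em&gt; perturbation within distance epsilon.&lt;/p&gt;

```python
def robust_zero_one_loss(classifier, data, eps, n_grid=101):
    """Fraction of points misclassified somewhere in their eps-ball.

    1-D toy version: the ball is approximated by a finite grid of
    perturbations spanning [-eps, +eps]. A point is robustly correct
    only if the classifier is right on EVERY perturbed copy of it."""
    errors = 0
    for x, y in data:
        deltas = [eps * (2 * i / (n_grid - 1) - 1) for i in range(n_grid)]
        if any(classifier(x + d) != y for d in deltas):
            errors += 1
    return errors / len(data)

# A threshold classifier on well-separated 1-D data:
clf = lambda x: 1 if x >= 0 else 0
data = [(-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1)]

plain = robust_zero_one_loss(clf, data, eps=0.0)   # ordinary 0-1 loss: 0.0
robust = robust_zero_one_loss(clf, data, eps=0.6)  # near-boundary points fail: 0.5
```

&lt;p&gt;Same classifier, same data -- but once the adversary gets an epsilon-ball to play in, the points sitting near the decision boundary become errors.&lt;/p&gt;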
&lt;p&gt;Whoo, okay! So we've got two types of robustness: robustly learnable and
robustly feasible. Feasible's the easier one: it just says that there's a robust
classifier out there somewhere. Learnable says we can actually find it with a
learning algorithm and the right amount of data. But don't get too
attached to that distinction -- we're about to find out that they're one and the
same!&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Feasible-Learnable.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;Neat! Let's prove it. This commences the "Information Theory" part of our
program. As far as I could tell from the paper, the proof is really just a bunch
of Chernoff bounds daisy-chained together.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Robust-Learning-with-Few-Samples.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Robust-Learning-with-Few-Samples-2.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
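&lt;p&gt;For intuition on where a bound like that comes from, here's the "daisy-chained Chernoff bounds" shape in miniature -- a standard Hoeffding-plus-union-bound calculation, my paraphrase rather than the paper's exact statement. One tail bound per classifier, union-bounded over the whole finite class, which is where the log dependence on the number of classifiers comes from.&lt;/p&gt;

```python
import math

def sample_bound(num_classifiers, eps, delta):
    """Samples sufficient so that, with probability at least 1 - delta,
    the empirical loss of EVERY classifier in a finite class is within
    eps of its true loss: one Hoeffding bound per classifier, then a
    union bound over all of them."""
    return math.ceil(math.log(2 * num_classifiers / delta) / (2 * eps ** 2))

n = sample_bound(num_classifiers=10**6, eps=0.05, delta=0.01)
# A million classifiers, and the bound only feels it logarithmically:
# squaring the size of the class merely doubles the log term.
```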
&lt;p&gt;Great! Well, almost. There's a pesky little note up there on the right side of
the claim: &lt;em&gt;Assuming a finite set of classifiers&lt;/em&gt;. That's quite the assumption,
because usually when we're talking robust classifiers we're looking at neural
networks. And neural networks can represent an (approximately) continuous space
of classifiers.&lt;/p&gt;
&lt;p&gt;Now it turns out we can handle this, although I haven't included the full
details in my notes for the sake of space. The basic idea is that you use this
thing called a "covering number", which more-or-less tells you how many
epsilon-balls you need to cover the continuous space. If the covering number is
small enough, everything works out -- you get a result of the same form as
before, but now &lt;em&gt;n&lt;/em&gt; has a log dependence on the covering number instead of the
total number of classifiers. I'm hiding the hand-waving behind an HTML tag, but
you're welcome to take a peek.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Details please!&lt;/summary&gt;
&lt;p&gt;&lt;img alt="Generative Model" src="https://rlaw.me/images/adversarial-examples/Generative-model.jpg"&gt;&lt;/p&gt;
&lt;/details&gt;
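&lt;p&gt;A cartoon of the covering-number idea, in my own toy one-dimensional version: cover a continuous parameter space with epsilon-balls, count the balls, and the sample bound picks up the log of that count in place of log(number of classifiers).&lt;/p&gt;

```python
import math

def interval_covering_number(length, eps):
    """Minimum number of radius-eps balls needed to cover a segment:
    each ball covers a stretch of 2*eps, so roughly length/(2*eps)."""
    return math.ceil(length / (2 * eps))

# A classifier family parameterized by a single number in [0, 1],
# covered at resolution eps = 1/64:
N = interval_covering_number(1.0, 0.015625)  # 32 balls
log_term = math.log(N)  # this replaces log(number of classifiers)
```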
&lt;p&gt;Okay, we've been good, eaten our vegetables, and proved learnable == feasible.
Let's get to the fun stuff! What's the distribution that's learnable, but not
efficiently?&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Complexity-1.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;It's this guy! More or less. We've got two discretized Gaussian-looking things,
which are beautifully interleaved with each other. I bet that won't make it
super hard to learn using our expected value SQ oracle or anything... &lt;em&gt;shifty
eyes&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;We've got some technical properties (proofs of which I'm not including), but the
main thing to understand is that the two distributions are well-separated from
each other, and each one has its own non-overlapping set of intervals where
almost all of its sample points will fall. Oh, and a whole bunch of their
moments match. (Remember that telling-things-apart-by-expected-value thing?
Yeah, that's no accident.)&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Complexity-2.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
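&lt;p&gt;The moment-matching trick deserves a tiny demo. Here's a miniature of my own, much cruder than the paper's construction: two distributions with completely disjoint supports whose first several moments agree exactly, so any query given by a low-degree polynomial returns identical expected values on both.&lt;/p&gt;

```python
import math

# D_A: a fair coin on {-1, +1}.
supp_A = [(-1.0, 0.5), (1.0, 0.5)]
# D_B: mass 1/4 at each of -sqrt(2) and +sqrt(2), mass 1/2 at 0.
r = math.sqrt(2.0)
supp_B = [(-r, 0.25), (0.0, 0.5), (r, 0.25)]

def moment(support, k):
    """k-th moment of a finitely supported distribution."""
    return sum(p * x ** k for x, p in support)

# Moments 0 through 3 agree, so ANY query that is a polynomial of degree
# at most 3 has the same expected value under both distributions...
gaps = [abs(moment(supp_A, k) - moment(supp_B, k)) for k in range(4)]
# ...even though the supports are disjoint and the 4th moments differ
# (1 for D_A vs 2 for D_B).
```

&lt;p&gt;Actual samples would tell these two apart instantly -- the supports don't even touch -- which is exactly the gap being exploited: expected-value access is much weaker than sample access.&lt;/p&gt;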
&lt;p&gt;D&lt;sub&gt;A&lt;/sub&gt; and D&lt;sub&gt;B&lt;/sub&gt; aren't actually our distributions, though --
there's one more step to extend this to higher dimensions. First we set up a
family of &lt;em&gt;k&lt;/em&gt;-dimensional subspaces that are pairwise close-to-orthogonal.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Complexity-3.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Complexity-4.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;And that's all there is to it!&lt;/p&gt;
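&lt;p&gt;Why is it easy to find a large family of close-to-orthogonal subspaces in the first place? It's a high-dimensional concentration effect: independent random directions are nearly orthogonal with overwhelming probability. A quick sanity check of my own, with random unit vectors standing in for the paper's subspaces:&lt;/p&gt;

```python
import math
import random

def random_unit_vector(d, rng):
    """A uniformly random direction on the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)
d = 2000
vecs = [random_unit_vector(d, rng) for _ in range(20)]

# Pairwise inner products concentrate around 0 at scale ~1/sqrt(d),
# about 0.022 here -- close-to-orthogonal essentially for free.
max_overlap = max(abs(dot(vecs[i], vecs[j]))
                  for i in range(20) for j in range(i + 1, 20))
```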
&lt;p&gt;Now, we can prove a few lemmas to show that the properties we wanted all hold:&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Lemma-4-3.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;details&gt;
&lt;summary&gt;Proof Sketch&lt;/summary&gt;
&lt;p&gt;&lt;img alt="Proof Sketch 4.3" src="https://rlaw.me/images/adversarial-examples/Proof-4-3.jpg"&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Lemma-4-4.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;details&gt;
&lt;summary&gt;Proof Sketch&lt;/summary&gt;
&lt;p&gt;&lt;img alt="Proof Sketch 4.4" src="https://rlaw.me/images/adversarial-examples/Proof-4-4.jpg"&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;And all together, this means our fancy distribution does, in fact, admit a
robust classifier.&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Point-of-the-Lemmas.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;Let's recap. What was the point of all this again?&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/The-Point.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;This last bit is new! This part makes some intuitive sense to me: our
distribution looks a lot like a (discretized) Gaussian, so it stands to reason
it should take lots of queries to tell it apart from one, if all we get to look
at are expected values. The specifics of this are actually completely nontrivial
to prove, and it goes much deeper into the SQ model -- but for that, you'll need
to read the paper. ;) (Check out Appendix B.)&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="The Point 2" src="https://rlaw.me/images/adversarial-examples/The-Point-2.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;As promised, it's also easy to learn without robustness. This one's a
hand-wavy sketch, but there's not much else to see here. We've accomplished
what we came for!&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Take-Home.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;p&gt;I'll leave you with some closing thoughts and a few vague pointers to other work
in this space. And feel free to let me know if there's a paper you'd like to see
covered here next -- maybe it'll make it into a post!&lt;/p&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/Questions.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;p&gt;&lt;img alt="" src="https://rlaw.me/images/adversarial-examples/past-work.jpg"&gt;&lt;/p&gt;
&lt;/figure&gt;</content><category term="blog"/></entry><entry><title>Where the Blog Walk Ends</title><link href="https://rlaw.me/where-the-blog-walk-ends.html" rel="alternate"/><published>2021-03-07T00:00:00-05:00</published><updated>2021-03-07T00:00:00-05:00</updated><author><name>Rachel Lawrence</name></author><id>tag:rlaw.me,2021-03-07:/where-the-blog-walk-ends.html</id><content type="html">&lt;p&gt;You’ve come to the end of my blog! This is (was) the first post.&lt;/p&gt;</content><category term="blog"/></entry></feed>