<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Statisfaction - I can&#39;t get no</title>
<link>https://statisfaction-blog.github.io/index.html</link>
<atom:link href="https://statisfaction-blog.github.io/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.3.450</generator>
<lastBuildDate>Mon, 24 Nov 2025 23:00:00 GMT</lastBuildDate>
<item>
  <title>The MAP is not the territory</title>
  <dc:creator>Rémi Bardenet</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/25-11-2025-the-MAP-is-not-the-territory/the-map-is-not-the-territory.html</link>
  <description><![CDATA[ 




<p>TL;DR: The MAP estimator is sometimes a Bayes estimator in disguise, but this comes at a price.</p>
<p>Say you are inferring a parameter <img src="https://latex.codecogs.com/png.latex?x%5Cin%5Cmathbb%7BR%7D%5Ed">, and you have come up with a prior density <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot)"> with respect to the Lebesgue measure on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, and a family of densities <img src="https://latex.codecogs.com/png.latex?%5C%7Bp(%5Ccdot%5Cvert%20x),%20x%5Cin%5Cmathbb%7BR%7D%5Ed%5C%7D"> for your observations, with respect to some reference measure on the space where the data live. After observing data <img src="https://latex.codecogs.com/png.latex?y">, a common practice, especially in the literature on inverse problems, is to estimate <img src="https://latex.codecogs.com/png.latex?x"> by maximizing the posterior density <span id="eq-map"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Chat%20x_%7B%5Cmathrm%7BMAP%7D%7D%20=%20%5Carg%5Cmax_x%20%5Clog%20p(y%5Cvert%20x)%20+%20%5Clog%20p(x).%0A%5Ctag%7B1%7D"></span> With the right assumptions on the two densities in the RHS of Equation&nbsp;1, the argmax is unique, thus justifying the definition. The MAP estimator is popular in inverse problems, e.g.&nbsp;for restoring corrupted images, where <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot%5Cvert%20x)"> is typically Gaussian or Poisson and the prior <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot)"> typically expresses a regularization, e.g.&nbsp;a soft constraint on coefficients in a basis or a frame. This popularity of the MAP estimator is largely explained, I think, by the availability of efficient numerical optimization procedures to solve Equation&nbsp;1 for the likelihood-prior pairs that are common in inverse problems.</p>
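<p>To make Equation&nbsp;1 concrete, here is a minimal sketch of a MAP computation in a toy denoising model. Everything specific in it is an assumption of mine for illustration, not something taken from the papers discussed in this post: the model <code>y = x + noise</code> with Gaussian noise of standard deviation <code>sigma</code>, the Gaussian prior with standard deviation <code>tau</code>, and the names <code>neg_log_post</code>, <code>grad</code> and <code>x_closed</code>. The objective is quadratic here, so the output of gradient descent can be compared with the closed-form shrinkage of <code>y</code>.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, tau = 5, 0.5, 2.0          # hypothetical dimension, noise std, prior std
y = rng.normal(size=d)               # stand-in for observed data

def neg_log_post(x):
    """Minus (log-likelihood + log-prior), up to an additive constant."""
    return 0.5 * np.sum((y - x) ** 2) / sigma**2 + 0.5 * np.sum(x**2) / tau**2

def grad(x):
    """Gradient of neg_log_post."""
    return (x - y) / sigma**2 + x / tau**2

# Plain gradient descent; the Hessian is (1/sigma^2 + 1/tau^2) times the identity,
# so any step below the inverse of that constant converges.
step = 0.5 / (1.0 / sigma**2 + 1.0 / tau**2)
x_map = np.zeros(d)
for _ in range(200):
    x_map = x_map - step * grad(x_map)

# Closed-form minimizer of the same quadratic objective.
x_closed = y * tau**2 / (sigma**2 + tau**2)
print(x_map, x_closed)
```

<p>For the likelihood-prior pairs used in actual inverse problems, the gradient loop would be replaced by a dedicated solver, but the structure of the computation is the same.</p>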
<p>Yet some Bayesians dislike the MAP estimator. For starters, the primitives of Bayesian inferential procedures are usually probability measures, not their densities. In particular, I can arbitrarily change the MAP estimator by modifying e.g.&nbsp;the prior density <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot)"> in Equation&nbsp;1 on a set of Lebesgue measure zero. That alone was enough of an argument for me against the MAP until about ten years ago. At that time, I saw a talk by Marcelo Pereyra in Bordeaux, presenting this <a href="https://arxiv.org/abs/1612.06149">paper</a>. Marcelo was trying to salvage the MAP estimator, by casting the MAP as a Bayes action in a (twisted) decision-theoretical framework. I remember thinking a lot about this at the time, after which I put these thoughts in a mental drawer for a while. At coffee time during the last <a href="https://gretsi.fr/colloque2025/">GRETSI</a>, Rémi Gribonval mentioned his past work on exactly this issue, and I couldn’t but reopen that drawer. I thought the basic ingredients of this discussion would make a nice blog post.</p>
<p>As a palate cleanser before the theorems, the picture in Figure&nbsp;1 shows Alfred Korzybski (1879-1950), the Polish-American philosopher of science who coined the sentence behind the punny title of this post. According to <a href="https://en.wikipedia.org/wiki/Science_and_Sanity">Wikipedia</a>, his views are that our understanding of the world is impeded by our nervous system, language, etc. and that mathematics is a language that helps us formulate a discourse that best approximates reality. Amusingly, this resonates with the post’s content: we will see that talking of the MAP as maximizing a posterior, and thus intuitively according some modelling role to the densities appearing in Equation&nbsp;1, is maybe not the best way to express the mental assumptions we are making about the world when choosing the MAP estimator.</p>
<div id="fig-korzybski" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/25-11-2025-the-MAP-is-not-the-territory/korzybski_original_and_restored.jpg" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">Figure&nbsp;1: Left: Alfred Korzybski, whose Wikipedia picture would actually benefit from some denoising. Right: Denoising is classically implemented as a MAP, here I used a Gaussian likelihood and a “total variation” prior; see the <a href="https://scikit-image.org/docs/0.23.x/api/skimage.restoration.html#skimage.restoration.denoise_tv_chambolle">documentation</a> of <code>scikit-image</code>.</figcaption>
</figure>
</div>
<p>We start with part of Theorem 3.1 in Marcelo Pereyra’s above-mentioned <a href="https://arxiv.org/abs/1612.06149">paper</a>. With the notation of Equation&nbsp;1, let <img src="https://latex.codecogs.com/png.latex?%5CPhi(x)%20%5Cpropto%20-%20%5Clog%20p(y%5Cvert%20x)%20-%20%5Clog%20p(x)"> be minus the log density of the posterior obtained from <img src="https://latex.codecogs.com/png.latex?p(y%5Cvert%20%5Ccdot)"> and <img src="https://latex.codecogs.com/png.latex?p(%5Ccdot)">. Note that I use <img src="https://latex.codecogs.com/png.latex?%5Cpropto"> to say “up to an additive constant” here. Assume <img src="https://latex.codecogs.com/png.latex?%5CPhi"> is strongly convex and <img src="https://latex.codecogs.com/png.latex?C%5E3">, and that <img src="https://latex.codecogs.com/png.latex?p(x%5Cvert%20y)%20=%20%5Cexp(-%5CPhi(x))"> decays fast as <img src="https://latex.codecogs.com/png.latex?%5CVert%20x%5CVert"> grows. Consider the so-called Bregman divergence <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20D_%5CPhi(u,x)%20=%20%5CPhi(u)%20-%20%5CPhi(x)%20-%20%5Clangle%20%5Cnabla%20%5CPhi(x),%20u-x%5Crangle.%0A"> The result states that <span id="eq-pereyra"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Chat%7Bx%7D_%5Cmathrm%7BMAP%7D%20=%20%5Carg%5Cmin_u%20%5Cint%20D_%5CPhi(u,x)%20p(x%5Cvert%20y)%5Cmathrm%7Bd%7D%20x,%0A%5Ctag%7B2%7D"></span> and in particular the minimum in the RHS is unique. 
Informally, the proof follows from plugging the definition of <img src="https://latex.codecogs.com/png.latex?D_%5CPhi"> in the expectation, and noting that the only non-trivial term is <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cint%20%5Cnabla%5CPhi(x)%20p(x%5Cvert%20y)%5Cmathrm%7Bd%7D%20x%20=%20-%20%5Cint%20%5Cfrac%7B%5Cnabla%20p(x%5Cvert%20y)%7D%7Bp(x%5Cvert%20y)%7D%20p(x%5Cvert%20y)%20%5Cmathrm%7Bd%7Dx%20=%20-%20%5Cint%20%5Cnabla%20p(x%5Cvert%20y)%20%5Cmathrm%7Bd%7Dx.%0A"> The latter integral is zero under the right decay assumption by the divergence theorem.</p>
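<p>Equation&nbsp;2 and the proof sketch can be checked numerically in one dimension. The snippet below is only an illustration under assumptions of my own: the potential <code>phi(x) = x**2/2 + exp(x)</code> (strongly convex and smooth, but not taken from Pereyra's paper), the integration grids, and all function names. It approximates posterior expectations by Riemann sums, then compares the minimizer of the expected Bregman loss with the MAP, and the minimizer of the loss with swapped arguments with the posterior mean.</p>

```python
import numpy as np

# Hypothetical 1D potential: Phi(x) = x^2/2 + exp(x), strongly convex and smooth.
def phi(x):
    return 0.5 * x**2 + np.exp(x)

def dphi(x):
    return x + np.exp(x)

# Posterior density proportional to exp(-Phi(x)), normalized on a grid.
x = np.linspace(-12.0, 6.0, 18001)
dx = x[1] - x[0]
post = np.exp(-phi(x))
post /= post.sum() * dx

def expect(f_values):
    """Riemann-sum approximation of a posterior expectation."""
    return np.sum(f_values * post) * dx

u_grid = np.linspace(-2.0, 1.0, 3001)

# Expected Bregman loss E[D_Phi(u, X)] as a function of the action u (Equation 2).
risk = np.array([expect(phi(u) - phi(x) - dphi(x) * (u - x)) for u in u_grid])
# Same loss with its two arguments swapped: E[D_Phi(X, u)].
risk_swapped = np.array([expect(phi(x) - phi(u) - dphi(u) * (x - u)) for u in u_grid])

x_map = u_grid[np.argmin(phi(u_grid))]   # maximizer of the posterior density
u_bregman = u_grid[np.argmin(risk)]
u_swapped = u_grid[np.argmin(risk_swapped)]
post_mean = expect(x)
mean_grad = expect(dphi(x))              # the integral shown to vanish in the proof

print("MAP:", x_map, "| argmin of expected Bregman loss:", u_bregman)
print("posterior mean:", post_mean, "| argmin with swapped arguments:", u_swapped)
print("E[grad Phi]:", mean_grad)
```

<p>The potential is deliberately asymmetric, so that the MAP and the posterior mean differ and the two minimizers can be told apart.</p>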
<p>Now, Equation&nbsp;2 implies that the MAP estimator is a Bayes estimator, in the sense that it maximizes an expected utility (equivalently, minimizes an expected loss) with respect to a probability measure, here the posterior measure. The twist is that the loss function depends on the posterior through its negative log density <img src="https://latex.codecogs.com/png.latex?%5CPhi">, and in particular it depends on the data <img src="https://latex.codecogs.com/png.latex?y">. In subjective Bayes terms, the utility is state-dependent. This violates the most common sets of axioms of Bayesian decision theory, such as the Anscombe-Aumann axioms presented in Schervish’s <a href="https://link.springer.com/book/10.1007/978-1-4612-4250-5">book</a>. In particular, this makes it hard to interpret the posterior as a degree of belief. Yet, I have grown inclined to weaken my definition of being Bayesian, and it would be interesting to understand how the choice of <img src="https://latex.codecogs.com/png.latex?D_%5CPhi"> as a loss impacts the statistician’s ranking of actions. As a remark of independent interest, <img src="https://latex.codecogs.com/png.latex?D_%5CPhi"> is not symmetric, and if one swaps <img src="https://latex.codecogs.com/png.latex?u"> and <img src="https://latex.codecogs.com/png.latex?x"> in Equation&nbsp;2, Pereyra shows that the Bayes action becomes the posterior mean!</p>
<p>I can’t have an exhaustive bibliography in this post, but I should at least mention that Pereyra’s result generalizes an earlier <a href="https://arxiv.org/abs/1402.5297">result</a> by Burger and Lucka who focussed on Gaussian likelihoods. Pereyra also mentions a <a href="https://arxiv.org/abs/1608.07483">generalization</a> to non-Gaussian likelihoods akin to his by Burger, Dong, and Sciacchitano. Pereyra also cites a 2011 <a href="https://inria.hal.science/inria-00486840/document">paper</a> by Rémi Gribonval, which was the start of a line of work by Gribonval and Nikolova. The next result I’d like to cover is in the last paper in that line of work, a 2019 <a href="https://arxiv.org/abs/1807.04021">paper</a> by Gribonval and Nikolova. According to a footnote, Mila Nikolova passed away during their writing of the paper. I have good memories of her lectures on optimization in Cachan.</p>
<p>Gribonval and Nikolova’s take is rather different. They start from a posterior mean estimator, where the posterior is defined by what I will call the initial likelihood-prior pair. Under conditions on the initial likelihood, they manage to rewrite the posterior mean estimator in the form of Equation&nbsp;1, for a different likelihood-prior pair than the initial one. Let me call this new pair of densities appearing in the MAP reformulation the computational pair. For the authors, the two densities in the computational pair are not thought of as modelling the data generation process or a prior belief; they are simply intermediate quantities that appear in a formal rewriting of the original Bayesian estimator. Compared to Pereyra and Burger et al., the procedure has the benefit of keeping the loss function untouched: it remains the squared loss throughout. The price to pay is, from what I understand, a limited number of initial likelihoods that can be treated, and a rather intricate definition of the computational prior. An important message from the paper is that, if you choose to go for a MAP estimator (say, a LASSO estimator in linear regression, or the total-variation denoiser I used in Figure&nbsp;1), your likelihood-prior pair is of the computational kind: your modelling choices are encoded in the implicit initial pair of densities.</p>
<p>Their fundamental tool is their Lemma 1 on proximal operators, i.e.&nbsp;operators that map data to the solution of a regularized least-squares problem. Formally, for a function <img src="https://latex.codecogs.com/png.latex?%5Cvarphi:%5Cmathbb%7BR%7D%5Ed%5Crightarrow%20%5Cmathbb%7BR%7D%5Ccup%5C%7B+%5Cinfty%5C%7D"> that is not identically <img src="https://latex.codecogs.com/png.latex?+%5Cinfty">, define <span id="eq-proximal"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathrm%7Bprox%7D_%5Cvarphi(y)%20:=%20%5Carg%5Cmin_%7Bx%5Cin%5Cmathbb%7BR%7D%5Ed%7D%20%5Cfrac12%5CVert%20y-x%5CVert%5E2%20+%20%5Cvarphi(x).%0A%5Ctag%7B3%7D"></span> Proximal operators are a key notion in optimization of non-differentiable functions, as solving a regularized least squares problem can intuitively replace a gradient descent step. In a companion <a href="https://arxiv.org/abs/1807.04014">paper</a>, Gribonval and Nikolova had found a characterization of proximal operators, and they apply it here to posterior means: under (stringent) conditions on the initial likelihood-prior pairs, the mean of the posterior can be rewritten as a MAP for a different likelihood-prior pair. They give many examples of the resulting MAP reformulations. To cite only one, their Proposition 1 states that if <img src="https://latex.codecogs.com/png.latex?Y%5Cvert%20X"> is a Poisson law, and the prior on <img src="https://latex.codecogs.com/png.latex?X"> is whatever you want, then there exists a function <img src="https://latex.codecogs.com/png.latex?%5Ctilde%5Cvarphi"> on the positive reals such that <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BE%7D(X%5Cvert%20Y=n)%20=%20%5Carg%5Cmin_x%20%5Cfrac12%5Cvert%20n%20-%20x%5Cvert%5E2%20+%20%5Ctilde%5Cvarphi(x).%0A"> Put differently, the posterior mean for a model with Poisson noise has a MAP formulation as in Equation&nbsp;1, for a computational likelihood that looks like a Gaussian!</p>
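<p>For readers who have not met proximal operators before, here is a small sketch of Equation&nbsp;3 in one dimension. It is not an implementation of Gribonval and Nikolova's lemma: the penalty (an absolute value scaled by a hypothetical weight <code>lam</code>, as in the LASSO), the grid search in <code>prox_numeric</code>, and the function names are all choices of mine. For this penalty the proximal operator is known in closed form as soft-thresholding, which gives something to compare the brute-force minimization against.</p>

```python
import numpy as np

def prox_numeric(y, varphi, grid):
    """Brute-force evaluation of Equation 3 over a 1D grid of candidate values."""
    objective = 0.5 * (y - grid) ** 2 + varphi(grid)
    return grid[np.argmin(objective)]

def soft_threshold(y, lam):
    """Closed-form proximal operator of x -> lam * |x| (soft-thresholding)."""
    return np.sign(y) * max(abs(y) - lam, 0.0)

lam = 0.5                                   # hypothetical regularization weight
grid = np.linspace(-5.0, 5.0, 100001)       # grid spacing 1e-4

results = []
for y in (-2.0, 0.3, 1.5):
    numeric = prox_numeric(y, lambda x: lam * np.abs(x), grid)
    exact = soft_threshold(y, lam)
    results.append((y, numeric, exact))
    print(y, numeric, exact)
```

<p>Observations smaller than <code>lam</code> in absolute value are mapped to zero, which is the sparsity-inducing behaviour that makes this kind of regularizer popular.</p>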
<p>Overall, a MAP can hide a Bayes estimator, but at a price: either the loss function is data-dependent, or your MAP problem is the proximal rewriting of a posterior mean corresponding to a different likelihood-prior pair! Note that I’ve only scratched the surface of the papers I mention, and they all contain more nuggets than what I dug out.</p>



 ]]></description>
  <category>inference</category>
  <category>inverse problems</category>
  <category>foundations</category>
  <guid>https://statisfaction-blog.github.io/posts/25-11-2025-the-MAP-is-not-the-territory/the-map-is-not-the-territory.html</guid>
  <pubDate>Mon, 24 Nov 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Testing the RSS feed</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/04-10-2025-test/index.html</link>
  <description><![CDATA[ 




<p>This is just a test. Sorry for the inconvenience. I will delete this post shortly.</p>



 ]]></description>
  <category>test</category>
  <category>sorry</category>
  <guid>https://statisfaction-blog.github.io/posts/04-10-2025-test/index.html</guid>
  <pubDate>Fri, 03 Oct 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Nested sampling and SMC: numerical experiments</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/30-06-2025-nested-sampling-experiments/index.html</link>
  <description><![CDATA[ 




<div class="hidden">
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cnewcommand%7B%5Cind%7D%7B%5Cmathbb%7B1%7D%7D%0A"></p>
</div>
<p>By “popular” demand (i.e., Adrien asked for it), here are the numerical experiments I promised in my previous <a href="../04-06-2025-simpler-nested-sampling-identity/nested.html">post</a>. I did these experiments initially to illustrate some points in the talk I gave at <a href="https://www.ipp.mpg.de/5281701">MaxEnt 2023</a>.</p>
<section id="ns-smc-vs-tempering-smc-for-a-gaussian-like-target" class="level1">
<h1>NS-SMC vs tempering SMC for a Gaussian-like target</h1>
<p><a href="https://arxiv.org/abs/1805.03924">Salomone et al</a> discuss how NS (nested sampling, both the vanilla and the SMC versions) may outperform tempering whenever the target distribution exhibits pathologies such as <strong>phase transition</strong>. It is not easy (to me at least) to grasp how phase transition may occur in a Bayesian posterior, but I suspect it tends to occur when the posterior is multi-modal. Please have a look at their paper for more details on this and their first numerical experiment which illustrates this point.</p>
<p>In this first experiment, I wanted to see whether NS-SMC is competitive with tempering SMC on a less challenging target distribution, i.e.&nbsp;the good old logistic regression posterior, which is typically Gaussian-like, and therefore unimodal.</p>
<p>In the plots below, I compare two instances of waste-free SMC, one based on a tempering sequence, the other on the NS sequence discussed in the previous post. Both algorithms automatically derive the next element in these sequences so that the ESS (effective sample size) is <img src="https://latex.codecogs.com/png.latex?%5Calpha%20N">. Both rely on random walk Metropolis kernels, which are calibrated on the current particle sample.</p>
<p>I consider the sonar dataset (dim=61). The plot below shows how the MSE (over 100 runs) of the log-marginal likelihood evolves as a function of <img src="https://latex.codecogs.com/png.latex?%5Calpha"> for both algorithms; the values considered for <img src="https://latex.codecogs.com/png.latex?%5Calpha"> are <img src="https://latex.codecogs.com/png.latex?0.05,%200.10,%20%5Cdots,%200.95">. This plot is a bit misleading, because, when <img src="https://latex.codecogs.com/png.latex?%5Calpha"> changes, the CPU cost changes as well: the larger <img src="https://latex.codecogs.com/png.latex?%5Calpha"> is, the larger the number of intermediate distributions.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/30-06-2025-nested-sampling-experiments/sonar_nested_vs_tempering_mse.png" class="img-fluid figure-img" style="width:60.0%"></p>
</figure>
</div>
<p>So let’s do a second plot, where the <img src="https://latex.codecogs.com/png.latex?y-">axis is the work-normalised MSE; that is, MSE times number of total evaluations of the likelihood (which is a good proxy for overall CPU cost). See below.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/30-06-2025-nested-sampling-experiments/sonar_nested_vs_tempering_work_normalised.png" class="img-fluid figure-img" style="width:60.0%"></p>
</figure>
</div>
<p>Both variants seem to lead to the same level of performance (i.e.&nbsp;CPU vs error trade-off). One point to note is that the best performance for NS is obtained by taking <img src="https://latex.codecogs.com/png.latex?%5Calpha"> small.</p>
<p>Bottom line: NS-SMC seems indeed competitive with tempering SMC. This is a bit surprising to me (as they rely on two very different sequences of distributions), but it shows that NS-SMC may deserve more scrutiny from the Bayesian computation community, I think.</p>
</section>
<section id="a-limitation-of-vanilla-ns" class="level1">
<h1>A limitation of vanilla NS</h1>
<p>Another experiment, possibly of more limited interest. For the same type of target distribution (logistic regression posterior, this time for the Pima dataset), the plot below illustrates the bias of vanilla NS as a function of the number <img src="https://latex.codecogs.com/png.latex?k"> of MCMC steps performed at each iteration. Recall that in vanilla NS, you discard one particle at each time <img src="https://latex.codecogs.com/png.latex?t"> (the one with the smallest likelihood), randomly choose one of the <img src="https://latex.codecogs.com/png.latex?N-1"> remaining ones, apply <img src="https://latex.codecogs.com/png.latex?k"> MCMC steps to this selected particle, and add the output back to the particle sample.</p>
<p>This plot suggests that NS may be biased if <img src="https://latex.codecogs.com/png.latex?k"> is too small. I am not sure why this is happening. This may be because NS is valid only when <img src="https://latex.codecogs.com/png.latex?k%5Cto%20%5Cinfty">. Or maybe because of the adaptive MCMC strategy I’m using: as in the previous section, I use random walk Metropolis, and I recursively adapt the proposal covariance to the empirical covariance matrix of the <img src="https://latex.codecogs.com/png.latex?N"> particles.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/30-06-2025-nested-sampling-experiments/pima_bias_vanilla.png" class="img-fluid figure-img" style="width:60.0%"></p>
</figure>
</div>
</section>
<section id="how-to-replicate-these-results" class="level1">
<h1>How to replicate these results</h1>
<p>Some time ago, I added a <code>nested</code> module to <a href="https://github.com/nchopin/particles">particles</a>, which implements both vanilla NS and NS-SMC. The numerical experiments reported above may be reproduced by running the scripts in the folder <code>papers/nested</code>.</p>


</section>

 ]]></description>
  <category>Nested sampling</category>
  <category>SMC</category>
  <category>numerics</category>
  <guid>https://statisfaction-blog.github.io/posts/30-06-2025-nested-sampling-experiments/index.html</guid>
  <pubDate>Sun, 29 Jun 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>A simpler nested sampling identity</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/04-06-2025-simpler-nested-sampling-identity/nested.html</link>
  <description><![CDATA[ 




<div class="hidden">
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cnewcommand%7B%5Cind%7D%7B%5Cmathbb%7B1%7D%7D%0A"></p>
</div>
<p>In this post, I am trying to come up with a simple introduction to NS (nested sampling), through the lens of SMC samplers. It should be interesting to readers who are familiar with the latter but not with the former.</p>
<p>This post is inspired by <a href="https://arxiv.org/abs/1805.03924">this</a> paper by Salomone et al, which has just been accepted in JRSSB. Congrats to the authors!</p>
<section id="set-up" class="level1">
<h1>Set up</h1>
<p>Consider a model with parameter <img src="https://latex.codecogs.com/png.latex?x">, prior <img src="https://latex.codecogs.com/png.latex?p(x)">, and likelihood <img src="https://latex.codecogs.com/png.latex?L(x)">. The posterior is then <img src="https://latex.codecogs.com/png.latex?%5Cpi(x)%20=%20%5Cfrac%7Bp(x)L(x)%7D%7BZ%7D,%5Cquad%20Z%20=%20%5Cint%20p(x)L(x)dx."> (The Bayesian interpretation is not essential. More generally, <img src="https://latex.codecogs.com/png.latex?p"> could be a proposal distribution, <img src="https://latex.codecogs.com/png.latex?%5Cpi"> a target distribution, and <img src="https://latex.codecogs.com/png.latex?L"> a function proportional to <img src="https://latex.codecogs.com/png.latex?%5Cpi/p">.)</p>
<p>Let’s now introduce the following family of distributions: <img src="https://latex.codecogs.com/png.latex?%5Cpi_%5Clambda(x)%20=%20%5Cfrac%7Bp(x)%20%5Cind%5C%7B%20L(x)%20%3E%20%5Clambda%5C%7D%7D%7BZ(%5Clambda)%7D,%5Cquad%0AZ(%5Clambda)%20=%20%5Cmathbb%7BP%7D_%5Cmathrm%7Bprior%7D(L(X)%20%3E%20%5Clambda)."> In words, <img src="https://latex.codecogs.com/png.latex?%5Cpi_%5Clambda"> is the prior truncated to the region <img src="https://latex.codecogs.com/png.latex?%5C%7Bx:%20L(x)%20%3E%20%5Clambda%5C%7D">, and the normalising constant <img src="https://latex.codecogs.com/png.latex?Z(%5Clambda)"> is the prior probability that <img src="https://latex.codecogs.com/png.latex?L(X)%3E%5Clambda">.</p>
<p>If we introduce a sequence <img src="https://latex.codecogs.com/png.latex?0=%5Clambda_%7B0%7D%20%3C%20%5Clambda_1%20%3C%20%5Cdots%20%3C%20%5Clambda_%7BT+1%7D%20=%20%5Cinfty">, we can use a SMC sampler to approximate recursively <img src="https://latex.codecogs.com/png.latex?%5Cpi_%7B%5Clambda_t%7D"> and its normalising constant <img src="https://latex.codecogs.com/png.latex?Z(%5Clambda_t)">. Note that the particle weights at each time will be 0 or 1 in this particular SMC sampler, since: <img src="https://latex.codecogs.com/png.latex?%20%5Cfrac%7B%5Cpi_%7B%5Clambda_t%7D(x)%7D%7B%5Cpi_%7B%5Clambda_%7Bt-1%7D%7D(x)%7D%20%5Cpropto%20%5Cind%5C%7BL(x)%20%3E%0A%5Clambda_t%5C%7D.%0A"></p>
</section>
<section id="the-simpler-ns-identity" class="level1">
<h1>The simpler NS identity</h1>
<p>Now comes the identity. Let <img src="https://latex.codecogs.com/png.latex?%5Cgamma(%5Cvarphi)%20=%20%5Cint%20p%20L%20%5Cvarphi"> for an arbitrary function <img src="https://latex.codecogs.com/png.latex?%5Cvarphi"> of <img src="https://latex.codecogs.com/png.latex?x">. Then: <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Balign*%7D%0A%5Cgamma(%5Cvarphi)%20&amp;%20=%20%5Cint%20p%20L%20%5Cvarphi%20%5C%5C%0A%20%20%20%20&amp;%20=%20%5Csum_%7Bt=0%7D%5ET%20%5Cint%20p%20L%20%5Cvarphi%20%5Cind%5C%7B%20%5Clambda_%7Bt%7D%20%3C%20L%20%5Cleq%20%5Clambda_%7Bt+1%7D%20%5C%7D%20%5C%5C%0A%20%20%20%20&amp;%20=%20%5Csum_%7Bt=0%7D%5ET%20Z(%5Clambda_t)%20%5Cpi_%7B%5Clambda_t%7D%5Cleft(L%20%5Cvarphi%20%5Ctimes%20%5Cind%5C%7B%20L%20%5Cleq%0A%20%20%20%20%5Clambda_%7Bt+1%7D%5C%7D%5Cright)%0A%5Cend%7Balign*%7D%0A"></p>
<p>Thus, if we implement a SMC sampler that tracks the sequence <img src="https://latex.codecogs.com/png.latex?%5Cpi_%7B%5Clambda_t%7D">, we will be able to approximate all the above quantities, and thus, through this identity, to approximate the marginal likelihood, <img src="https://latex.codecogs.com/png.latex?Z=%5Cgamma(1)">, and posterior moments, <img src="https://latex.codecogs.com/png.latex?%5Cpi(%5Cvarphi)%20=%20%5Cgamma(%5Cvarphi)%20/%20%5Cgamma(1)">.</p>
</section>
<section id="choosing-the-lambda_ts" class="level1">
<h1>Choosing the <img src="https://latex.codecogs.com/png.latex?%5Clambda_t">’s</h1>
<p>In practice, we need to choose the <img src="https://latex.codecogs.com/png.latex?%5Clambda_t">’s. As in tempering, it seems reasonable to set them automatically, in such a way that the ESS (effective sample size) is <img src="https://latex.codecogs.com/png.latex?%5Calpha%20N">, for some <img src="https://latex.codecogs.com/png.latex?%5Calpha%5Cin(0,%201)">. Because the weight function is <img src="https://latex.codecogs.com/png.latex?0/1">, this amounts to taking <img src="https://latex.codecogs.com/png.latex?%5Clambda_t"> to be the <img src="https://latex.codecogs.com/png.latex?%5Calpha-">upper quantile of the <img src="https://latex.codecogs.com/png.latex?L(X_t%5En)">, where the <img src="https://latex.codecogs.com/png.latex?X_t%5En">’s are the <img src="https://latex.codecogs.com/png.latex?N"> particles sampled at time <img src="https://latex.codecogs.com/png.latex?t"> by our SMC sampler. This is what Salomone et al recommend. In this case, we can replace <img src="https://latex.codecogs.com/png.latex?Z(%5Clambda_t)"> in the identity above by <img src="https://latex.codecogs.com/png.latex?%5Calpha%5Et">, at least for <img src="https://latex.codecogs.com/png.latex?t%3CT">.</p>
<p>The corresponding estimate will be something like: <img src="https://latex.codecogs.com/png.latex?%0A%5Cwidehat%7B%5Cgamma%7D(%5Cvarphi)%0A%20%20%20%20%20=%20%5Csum_%7Bt=0%7D%5E%7BT-1%7D%20%5Calpha%5Et%20%5Cleft%5B%20%5Cfrac%7B1%7D%7BN%7D%20%5Csum_%7Bn=1%7D%5EN%20%5Cvarphi(X_%7Bt+1%7D%5En)%20L(X_%7Bt+1%7D%5En)%0A%20%20%20%20%5Cind%5C%7B%20L(X_%7Bt+1%7D%5En)%20%5Cleq%20%5Clambda_%7Bt+1%7D%5C%7D%20%5Cright%5D%20+%20%5Cdots%0A"> where I omitted the <img src="https://latex.codecogs.com/png.latex?T-">th term (it has a slightly different expression, i.e. <img src="https://latex.codecogs.com/png.latex?Z(%5Clambda_T)%5Cneq%20%5Calpha%5ET">), and I used the fact that the <em>unweighted</em> sample <img src="https://latex.codecogs.com/png.latex?X_%7Bt+1%7D%5E%7B1:N%7D"> generated at the beginning of iteration <img src="https://latex.codecogs.com/png.latex?t+1"> currently targets <img src="https://latex.codecogs.com/png.latex?%5Cpi_%7B%5Clambda_t%7D">.</p>
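<p>The estimate above can be tried on a toy problem where everything is available in closed form. The sketch below is a simplification of mine, not the algorithm of Salomone et al: the prior is Uniform(0, 1), the likelihood is <code>exp(-c * x)</code>, and the MCMC moves of a real SMC sampler are replaced by exact sampling from the truncated prior (which is itself uniform in this example). The number of iterations <code>T</code> is fixed in advance, and the last term simply uses <code>alpha**T</code> as an approximation of the final normalising constant. All names are hypothetical.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha, T = 20_000, 0.5, 15
c = 5.0                                   # hypothetical likelihood sharpness

def lik_fn(x):
    """Toy likelihood, decreasing in x."""
    return np.exp(-c * x)

def sample_truncated_prior(lam, size):
    """Exact draws from the Uniform(0, 1) prior restricted to {L > lam}."""
    # {L(x) > lam} is the interval [0, r) with r = -log(lam) / c, capped at 1.
    r = 1.0 if lam <= np.exp(-c) else -np.log(lam) / c
    return rng.uniform(0.0, r, size)

lam, Z_hat = 0.0, 0.0
for t in range(T):
    x = sample_truncated_prior(lam, N)            # unweighted sample targeting pi_{lambda_t}
    lik = lik_fn(x)
    lam_next = np.quantile(lik, 1.0 - alpha)      # a fraction alpha of particles has L > lam_next
    Z_hat += alpha**t * np.mean(lik * (lik <= lam_next))
    lam = lam_next

# Last term of the identity: no upper threshold (lambda_{T+1} is infinite).
Z_hat += alpha**T * np.mean(lik_fn(sample_truncated_prior(lam, N)))

Z_true = (1.0 - np.exp(-c)) / c                   # integral of exp(-c x) over (0, 1)
print(Z_hat, Z_true)
```

<p>In a real problem the exact sampling step is not available, which is precisely where the resampling and MCMC steps of the SMC sampler come in.</p>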
</section>
<section id="vanilla-ns-as-a-particular-waste-free-smc-sampler" class="level1">
<h1>Vanilla NS as a particular waste-free SMC sampler</h1>
<p>Now assume that, in your adaptive NS-SMC sampler, you set <img src="https://latex.codecogs.com/png.latex?%5Calpha=(1%20-%20%5Cfrac%201%20N)"> (or equivalently, <img src="https://latex.codecogs.com/png.latex?%5Clambda_%7Bt+1%7D=%5Cmin_n%20L(X_%7Bt+1%7D%5En)">); that is, you discard only <em>one</em> particle, the one with smallest likelihood. In other words, you decide to move as slowly as possible up the likelihood function.</p>
<p>If you resampled the <img src="https://latex.codecogs.com/png.latex?N-1"> surviving particles, and applied <img src="https://latex.codecogs.com/png.latex?k"> MCMC steps to each of them, you would get a very expensive sampler: increasing <img src="https://latex.codecogs.com/png.latex?N"> means you increase both the cost of a single iteration, and the total number of iterations (since it makes <img src="https://latex.codecogs.com/png.latex?%5Calpha"> larger).</p>
<p>A cheaper alternative is to randomly choose one of the <img src="https://latex.codecogs.com/png.latex?N-1"> surviving particles, apply an MCMC step to it, and take the output as your new <img src="https://latex.codecogs.com/png.latex?N-">th particle. Then, you get an algorithm which is very close to the original NS one. In particular, your estimate of <img src="https://latex.codecogs.com/png.latex?Z=%5Cgamma(1)"> becomes: <img src="https://latex.codecogs.com/png.latex?%0A%5Cwidehat%7B%5Cgamma%7D(1)%0A%20%20%20%20%20=%20%5Csum_%7Bt=0%7D%5E%7BT-1%7D%20%5Cfrac%7B1%7D%7BN%7D%20(1-%20%5Cfrac%201%20N)%5Et%20L_%7Bt+1%7D%20+%20%5Cdots%0A"> with <img src="https://latex.codecogs.com/png.latex?L_%7Bt+1%7D%20=%20%5Cmin_n%20L(X_%7Bt+1%7D%5En)">. (The original NS estimate has <img src="https://latex.codecogs.com/png.latex?(1-1/N)%5Et/N"> replaced by <img src="https://latex.codecogs.com/png.latex?%5Cexp(-t/N)%20-%20%5Cexp(-(t+1)/N)">, which should be very close numerically for large <img src="https://latex.codecogs.com/png.latex?N">.)</p>
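<p>The closeness of the two sets of weights is easy to check numerically. The values of <code>N</code> and of the number of iterations below are arbitrary choices of mine, made only for illustration.</p>

```python
import numpy as np

N = 1000                                        # hypothetical number of particles
t = np.arange(5 * N)                            # iterations 0, ..., 5N - 1

w_smc = (1.0 - 1.0 / N) ** t / N                # weights in the SMC-style estimate
w_ns = np.exp(-t / N) - np.exp(-(t + 1) / N)    # weights in the original NS estimate

max_rel_diff = np.max(np.abs(w_smc - w_ns) / w_ns)
print(max_rel_diff)
```

<p>The relative difference grows slowly with the iteration index, so the agreement is best over the early iterations and degrades for very long runs.</p>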
<p>This idea of resampling <img src="https://latex.codecogs.com/png.latex?N-1"> particles, and moving only one of them, is reminiscent of <a href="https://arxiv.org/abs/2011.02328">waste-free SMC</a>. In waste-free SMC, you resample only <img src="https://latex.codecogs.com/png.latex?M"> particles out of <img src="https://latex.codecogs.com/png.latex?N">, <img src="https://latex.codecogs.com/png.latex?M%3CN">. Then, assuming <img src="https://latex.codecogs.com/png.latex?M"> divides <img src="https://latex.codecogs.com/png.latex?N">, i.e., <img src="https://latex.codecogs.com/png.latex?N=M%5Ctimes%20P"> for some <img src="https://latex.codecogs.com/png.latex?P%5Cgeq%202">, you apply to each resampled particle <img src="https://latex.codecogs.com/png.latex?(P-1)"> MCMC steps, and gather the resulting <img src="https://latex.codecogs.com/png.latex?M%5Ctimes%20P"> states to form a new particle sample of size <img src="https://latex.codecogs.com/png.latex?N">. What if <img src="https://latex.codecogs.com/png.latex?M"> does not divide <img src="https://latex.codecogs.com/png.latex?N">, i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?N=M%20%5Ctimes%20k%20+r">, <img src="https://latex.codecogs.com/png.latex?0%3Cr%3CM">? Then it makes sense to generate <img src="https://latex.codecogs.com/png.latex?r"> MCMC chains of length <img src="https://latex.codecogs.com/png.latex?k+1">, and <img src="https://latex.codecogs.com/png.latex?M-r"> chains of length <img src="https://latex.codecogs.com/png.latex?k">. This is what happens here, with <img src="https://latex.codecogs.com/png.latex?M=N-1">, <img src="https://latex.codecogs.com/png.latex?r=1">.</p>
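<p>The bookkeeping in the previous paragraph fits in a few lines. The helper <code>chain_lengths</code> below is hypothetical (it is not a function of any package I know of); it counts the length of a chain in states, starting point included, so that a chain of length 2 corresponds to a single MCMC step.</p>

```python
def chain_lengths(N, M):
    """Split N particles into M chains: r of length k + 1 and M - r of length k."""
    k, r = divmod(N, M)          # N = M * k + r with 0 <= r < M
    return [k + 1] * r + [k] * (M - r)

# Vanilla NS case: M = N - 1 resampled particles, so k = 1 and r = 1.
lengths_vanilla = chain_lengths(N=10, M=9)
# Standard waste-free case where M divides N (here P = 3).
lengths_even = chain_lengths(N=12, M=4)
print(lengths_vanilla, lengths_even)
```

<p>In the vanilla NS case, all chains but one have length one, i.e.&nbsp;only one surviving particle is actually moved.</p>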
</section>
<section id="why-did-i-say-we-get-a-simpler-identity" class="level1">
<h1>Why did I say we get a “simpler” identity?</h1>
<p>The original NS algorithm by Skilling derives essentially the same identity as above, but through more convoluted steps, which involve the CDF of the random variable <img src="https://latex.codecogs.com/png.latex?L(X)">, when <img src="https://latex.codecogs.com/png.latex?X%5Csim%20p">, its inverse, Beta distributions, etc. I find the derivation above simpler (at least, again, if you are familiar with SMC samplers). Of course, in return, you get a justification which is a bit hand-wavy for vanilla NS (but for NS-SMC, it is perfectly solid).</p>
</section>
<section id="should-i-care-about-ns" class="level1">
<h1>Should I care about NS?</h1>
<p>There are two sub-questions:</p>
<ol type="1">
<li><p>NS vs NS-SMC: Salomone et al.&nbsp;give numerical evidence (and arguments) suggesting that NS-SMC outperforms NS.</p></li>
<li><p>NS-SMC vs tempering SMC or other SMC schemes: Salomone et al.&nbsp;also give numerical evidence suggesting NS-SMC is competitive with tempering SMC, which is intriguing (and in line with independent numerical experiments I did).</p></li>
</ol>
<p>I will elaborate on these two points in my next post (coming soon). In the meantime, feel free to have a look at the aforementioned <a href="https://arxiv.org/abs/1805.03924">paper</a>; it is well worth a read.</p>


</section>

 ]]></description>
  <category>SMC</category>
  <category>nested sampling</category>
  <category>paper</category>
  <guid>https://statisfaction-blog.github.io/posts/04-06-2025-simpler-nested-sampling-identity/nested.html</guid>
  <pubDate>Tue, 03 Jun 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Numpy broke my heart</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/25-04-2024-numpy-broke-my-heart/numpy-broke-my-heart.html</link>
  <description><![CDATA[ 




<p>I swear, the title is kind of funny in French (try to figure out why). Anyway, in this post I wanted to dispel a misconception I had until recently about Python, numpy and multi-processing, and which led me to say something silly in our SMC book.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/25-04-2024-numpy-broke-my-heart/python_broke_my_heart.jpg" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">no comment</figcaption>
</figure>
</div>
<section id="python-and-multi-processing" class="level1">
<h1>Python and multi-processing</h1>
<p>Most modern computers have several CPU cores; I guess even potato computers have at least two these days? On the other hand, a program written in Python will be executed on a single core, because of the <a href="https://en.wikipedia.org/wiki/Global_interpreter_lock">GIL</a>. This means that all the other cores will stay idle while you run your program. Which is frustrating when said program takes forever to complete.</p>
<p>There are different ways to make all your CPU cores work for you, but I will discuss the only two ways which I am (a bit) familiar with:</p>
<ol type="1">
<li><p>Use <a href="https://joblib.readthedocs.io">joblib</a> or a similar library. (But seriously, just use joblib, it’s great.) This requires a bit of work, as you have to state explicitly which parts of your program may be turned into independent tasks that will be performed in parallel. The typical use case for me is to run the same SMC algorithm several times (perhaps with different parameters, e.g.&nbsp;a different number of particles); see for instance <a href="https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/notebooks/advanced_tutorial_ssm.html#Running-many-particle-filters-in-one-go">this</a>.</p></li>
<li><p>Do nothing, and pray that your program relies on those Numpy operations which are already parallelised for you (multithreaded). Numpy relies on low-level (C/Fortran) linear algebra libraries such as BLAS and LAPACK, and these libraries are able to implement certain operations (e.g.&nbsp;matrix multiplication) on multiple cores. In this case, your python script still runs on a single core, but, when it encounters a multithreaded numpy operation, this operation spawns (temporarily, for this operation only) several threads that are executed on different cores.</p></li>
</ol>
</section>
<section id="my-bad" class="level1">
<h1>My bad</h1>
<p>Ok, now for my misconception (a.k.a. what an idiot I am). When I run on a standard PC the following <a href="https://github.com/nchopin/particles/blob/master/book/smc_samplers/logistic_reg.py">script</a>, which implements the numerical experiment of Chapter 17 (on SMC samplers) in our book, all the cores are kept busy during the execution. This script does not rely on any form of explicit parallelism. Several SMC samplers are run, but sequentially (I don’t use <code>multiSMC</code> in this script). So clearly it’s numpy that is doing its thing (point 2 above). In fact, by profiling it, one can see that most of the CPU time is spent in the one line that computes the log-likelihood of the logistic regression model, and this involves a matrix multiplication. So this makes sense.</p>
<p>In the book (page 352 if you want to check), I said naively: if you have k cores, you get a x k speed-up for free in this particular experiment. I thought that that was the case, because all my CPU cores were 100% busy the whole time.</p>
<p>However, I did some more testing recently and tried to compare the running time of this script when numpy does multithreading and when it does not. (See <a href="https://superfastpython.com/numpy-number-blas-threads/">here</a> on how you may disable multithreading in numpy.)</p>
<ol type="1">
<li><p>On a standard PC, the speed-up is more like… one per cent?</p></li>
<li><p>On a certain cloud-based <a href="https://github.com/InseeFrLab/onyxia">architecture</a> that I’m currently playing with (and which relies on Kubernetes containers), multithreading can actually slow down the script by a factor of 10 or more.</p></li>
</ol>
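<p>For readers who want to repeat this kind of comparison, here is a minimal sketch. It assumes the usual mechanism of limiting BLAS threads through environment variables set before numpy is first imported; which variable actually matters depends on the BLAS backend behind your numpy. The helper names (<code>limit_blas_threads</code>, <code>best_time</code>, <code>time_matvec</code>) and the matrix sizes are hypothetical and are not taken from the book's script.</p>

```python
import os
import time

# Environment variables commonly read by OpenMP, OpenBLAS and MKL.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Ask the BLAS backends for at most n_threads threads.

    Call this before numpy is imported for the first time: these settings
    are typically read once, when the library is loaded.
    """
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n_threads)


def best_time(func, repeats=5):
    """Smallest wall-clock time (in seconds) over several calls of func."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_matvec(n_rows=10_000, n_cols=20, repeats=50):
    """Time an (N, d) matrix times d-vector product (sizes are illustrative)."""
    import numpy as np  # imported late, so that the thread limit applies

    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_rows, n_cols))
    beta = rng.standard_normal(n_cols)
    return best_time(lambda: X @ beta, repeats=repeats)
```

<p>One way to use it: run the script once with <code>limit_blas_threads(1)</code> called at the very top and once without, and compare the values returned by <code>time_matvec()</code>. The comparison has to be made across two separate Python processes, since the limit cannot reliably be changed this way once numpy is loaded.</p>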
</section>
<section id="whats-going-on" class="level1">
<h1>What’s going on?</h1>
<p>I am not sure, I’m a bit out of my depth here. I guess what happens is that, for this particular script, the speed-up brought by multithreading is cancelled by the time it takes to generate new threads at the beginning of the numpy operation. (Remember that this must be done each time a line with a multithreaded numpy operation is executed.) In fact, the multithreaded operation seems to be a matrix/vector multiplication, where the matrix is not very large. (It’s of size <img src="https://latex.codecogs.com/png.latex?(N,%20d)">, where <img src="https://latex.codecogs.com/png.latex?N"> is the number of particles. I tried to increase N several times over, but it did not change the results.)</p>
<p>And things may get worse in containers, where either numpy might make wrong assumptions about the available resources, or you simply share resources with many other users. (Disclaimer, I don’t know what I’m talking about.)</p>
<p>Also, of course, this kind of result may depend on your hardware, the version of python and related libraries you are using, and so on. In particular, it may depend on whether you use the OpenBLAS version of BLAS or the MKL one, which is specific to Intel CPUs; to see this, check the output of <code>numpy.show_config()</code> on your machine. The picture below summarises the situation.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/25-04-2024-numpy-broke-my-heart/Alice_looking_down_cpu_cores.jpg" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">Alice decided to better understand multiprocessing in Python</figcaption>
</figure>
</div>
</section>
<section id="enter-joblib" class="level1">
<h1>Enter joblib</h1>
<p>The discussion above assumes you run a single program, and that Numpy may or may not get access to all the cores. What if you try to implement multi-processing (using joblib, multiprocessing or something else), but each task performs numpy operations? You could have an <em>over-subscription</em> problem, that is, you end up with many threads (more than the number of cores), and the computer wastes a lot of time trying to juggle between all these threads.</p>
<p>Fortunately, joblib is smart enough to tell numpy to calm down and generate fewer threads. This point is discussed <a href="https://joblib.readthedocs.io/en/stable/parallel.html#avoiding-over-subscription-of-cpu-resources">here</a> in the documentation. Well worth a read.</p>
<p>I managed to speed up my script significantly by using joblib, but I still cannot obtain a x 24 factor on my nifty 24-core PC. I am still <del>crying</del> investigating.</p>
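<p>As a rough illustration of the over-subscription arithmetic, here is a small sketch. The thread budget <code>max(n_cores // n_jobs, 1)</code> is my reading of the heuristic described on the joblib documentation page linked above, so treat it as an assumption rather than a guarantee. The helper names <code>threads_per_worker</code> and <code>run_in_parallel</code> are hypothetical; only <code>Parallel</code> and <code>delayed</code> come from joblib.</p>

```python
import os


def threads_per_worker(n_jobs, n_cores=None):
    """Thread budget per worker process: max(n_cores // n_jobs, 1)."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(n_cores // n_jobs, 1)


def run_in_parallel(task, inputs, n_jobs=4):
    """Run task on each input in separate worker processes with joblib."""
    from joblib import Parallel, delayed  # third-party, imported on demand

    return Parallel(n_jobs=n_jobs)(delayed(task)(x) for x in inputs)


# On a 24-core machine: 4 workers may use 6 threads each,
# while 24 workers are each restricted to a single thread.
print(threads_per_worker(n_jobs=4, n_cores=24))
print(threads_per_worker(n_jobs=24, n_cores=24))
```

<p>In both cases the total number of threads stays at or below the number of cores, which is the whole point. A typical call would be <code>run_in_parallel(my_smc_run, list_of_seeds, n_jobs=4)</code>, where <code>my_smc_run</code> and <code>list_of_seeds</code> stand for your own function and inputs.</p>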
</section>
<section id="take-home-messages" class="level1">
<h1>Take-home messages</h1>
<ul>
<li><p>Just because all your 20 CPU cores are busy does not mean that your script is running 20 times faster.</p></li>
<li><p>If you actually want to achieve a substantial speed-up on multi-core hardware, you might need to try different things, and check the actual results (i.e.&nbsp;measure the total running time).</p></li>
<li><p>Read <a href="https://superfastpython.com/numpy-number-blas-threads/">this</a> and <a href="https://superfastpython.com/numpy-blas-multiprocessing/">this</a> if you want to learn more about multiprocessing and numpy; I found these pages clear and authoritative on this topic.</p></li>
<li><p>Don’t believe everything you read in books? :-)</p></li>
</ul>


</section>

 ]]></description>
  <category>python</category>
  <category>joblib</category>
  <category>multiprocessing</category>
  <category>numpy</category>
  <guid>https://statisfaction-blog.github.io/posts/25-04-2024-numpy-broke-my-heart/numpy-broke-my-heart.html</guid>
  <pubDate>Wed, 24 Apr 2024 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Quantum workers in Bernoulli factories</title>
  <dc:creator>Rémi Bardenet</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/14-02-2024-quantum-bernoulli-factories/quantum-bernoulli-factories.html</link>
  <description><![CDATA[ 




<p>TL;DR: A quantum computer lets you provably build more general Bernoulli factories than your laptop.</p>
<p>I have developed an interest in quantum computing, both for fun and because it naturally applies to sampling my favourite distribution, <a href="https://arxiv.org/abs/2305.15851">determinantal point processes</a>. One of the natural (and still quite open) big questions in quantum computing is, for a given computational task such as solving a linear system, whether having access to a quantum computer gives you any <em>advantage</em> over using your laptop in the smartest way possible. Maybe the quantum computer lets you solve part of your problem faster, or maybe it allows you to solve a more general class of problems. <a href="https://www.nature.com/articles/ncomms9203">Dale, Jennings, and Rudolph (2015)</a> prove a quantum advantage of the latter kind, for a task that appeals to a computational statistician: a quantum computer gives you access to strictly more Bernoulli factories than your laptop does. In this post, I discuss one of their examples.</p>
<div id="fig-smbc" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/14-02-2024-quantum-bernoulli-factories/smbc.png" class="img-fluid figure-img"></p>
<figcaption class="figure-caption">Figure&nbsp;1: An excerpt from an excellent <a href="https://www.smbc-comics.com/comic/the-talk-3">comic strip</a> by Scott Aaronson and Zach Weinersmith.</figcaption>
</figure>
</div>
<section id="bernoulli-factories" class="level2">
<h2 class="anchored" data-anchor-id="bernoulli-factories">Bernoulli factories</h2>
<p>First, I need to define what a Bernoulli factory is. Loosely speaking, a Bernoulli factory is an algorithm that, when fed with i.i.d. draws from a Bernoulli random variable <img src="https://latex.codecogs.com/png.latex?B(p)"> with unknown parameter <img src="https://latex.codecogs.com/png.latex?p">, outputs a stream of independent Bernoullis with parameter <img src="https://latex.codecogs.com/png.latex?f(p)">. The algorithm does not have access to the value of <img src="https://latex.codecogs.com/png.latex?p">, and needs to work for as large a range of values of <img src="https://latex.codecogs.com/png.latex?p"> as possible. For instance, a trick attributed to von Neumann gives you a Bernoulli factory for the constant function <img src="https://latex.codecogs.com/png.latex?f%5Cequiv%201/2">. Can you guess how? If you have never seen this trick, take a break and think about it. Here is a hint: try to pair Bernoulli draws and define two events of equal probability.</p>
<p>The problem of determining what Bernoulli factories can be constructed on a <em>classical</em> (as opposed to <em>quantum</em>) computer has been answered by <a href="https://dl.acm.org/doi/10.1145/175007.175019">Keane and O’Brien (1994)</a>. Essentially, it is necessary and sufficient that <img src="https://latex.codecogs.com/png.latex?(i)"> <img src="https://latex.codecogs.com/png.latex?f"> be continuous on its domain <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BP%7D">, and that <img src="https://latex.codecogs.com/png.latex?(ii)"> either <img src="https://latex.codecogs.com/png.latex?f"> is constant or there exists an integer <img src="https://latex.codecogs.com/png.latex?n"> such that, for all <img src="https://latex.codecogs.com/png.latex?p%5Cin%5Cmathcal%7BP%7D">, <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmin%5B%20f(p),%201-f(p)%5D%20%5Cgeq%20%5Cmin%20%5B%20p%5En,%20(1-p)%5En%20%5D.%0A"> In particular, a non-constant <img src="https://latex.codecogs.com/png.latex?f"> should not take the values <img src="https://latex.codecogs.com/png.latex?0"> or <img src="https://latex.codecogs.com/png.latex?1"> in <img src="https://latex.codecogs.com/png.latex?(0,1)">, and cannot approach these extreme values too fast. In particular, the doubling function <img src="https://latex.codecogs.com/png.latex?f_%5Cmathrm%7Bdouble%7D:p%5Cmapsto%202p"> defined on <img src="https://latex.codecogs.com/png.latex?%5B0,1/2%5D"> does not correspond to a Bernoulli factory, while its restriction to <img src="https://latex.codecogs.com/png.latex?%5B0,1/2-%5Cepsilon%5D"> does, for any <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%3E0">. Another simple example is <span id="eq-quadratic"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20f_%5Cmathrm%7Bquadratic%7D:p%5Cmapsto%204p(1-p)%0A%5Ctag%7B1%7D"></span> defined on <img src="https://latex.codecogs.com/png.latex?%5B0,1%5D">, which does not correspond to a Bernoulli factory. 
Yet, the rest of the post shows that <img src="https://latex.codecogs.com/png.latex?f_%5Cmathrm%7Bquadratic%7D"> does correspond to a specific weakening of the notion of Bernoulli factory, one that is natural in quantum computing.</p>
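<p>To see the Keane and O’Brien condition at work, here is a small numerical sketch that checks the inequality on a finite grid. A grid check is only a sanity check, not a proof, and the names <code>violates_keane_obrien</code>, <code>grid</code> and <code>small_grid</code> are hypothetical.</p>

```python
def violates_keane_obrien(f, n, grid):
    """Grid points p where min[f, 1-f] falls below min[p^n, (1-p)^n]."""
    return [
        p for p in grid
        if min(p**n, (1 - p) ** n) > min(f(p), 1 - f(p))
    ]


def f_quadratic(p):
    return 4 * p * (1 - p)


grid = [i / 100 for i in range(101)]  # grid on [0, 1], includes p = 1/2

# f_quadratic equals 1 at p = 1/2, where the right-hand side is 2^(-n) > 0:
# the bound fails there whatever the value of n.
print(all(0.5 in violates_keane_obrien(f_quadratic, n, grid) for n in range(1, 30)))

# The doubling function restricted to [0, 0.4] passes the grid check with n = 2.
small_grid = [i / 250 for i in range(101)]  # grid on [0, 0.4]
print(violates_keane_obrien(lambda p: 2 * p, 2, small_grid))
```

<p>The first check illustrates why the quadratic function is out of reach of classical factories: it touches the value one in the interior of its domain. The second illustrates why cutting off a small neighbourhood of one half rescues the doubling function.</p>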
</section>
<section id="quantum-computers-and-quantum-coins" class="level2">
<h2 class="anchored" data-anchor-id="quantum-computers-and-quantum-coins">Quantum computers and quantum coins</h2>
<p>Now buckle up, because I need to define a mathematical model for a quantum computer. This model only requires basic algebra, albeit with strange notation. Let <img src="https://latex.codecogs.com/png.latex?N"> be a positive integer, and <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BH%7D%20=%20(%5Cmathbb%7BC%7D%5E2)%5E%7B%5Cotimes%20N%7D%20=%20%5Cmathbb%7BC%7D%5E2%5Cotimes%20%5Cdots%20%5Cotimes%20%5Cmathbb%7BC%7D%5E2,%0A"> where the tensor product is taken <img src="https://latex.codecogs.com/png.latex?N"> times. An <img src="https://latex.codecogs.com/png.latex?N">-qubit quantum computer is a machine that, when fed with</p>
<ol type="1">
<li>a positive semi-definite, Hermitian operator <img src="https://latex.codecogs.com/png.latex?%5Crho"> acting on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D">, with trace norm <img src="https://latex.codecogs.com/png.latex?1"> (the <em>state</em>),</li>
<li>a Hermitian operator <img src="https://latex.codecogs.com/png.latex?A"> on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D"> (the <em>observable</em>),</li>
</ol>
<p>outputs a draw from the random variable <img src="https://latex.codecogs.com/png.latex?X_%7BA,%5Crho%7D">, with support included in the spectrum of <img src="https://latex.codecogs.com/png.latex?A">, defined by <span id="eq-born"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BE%7D%20g(X_%7BA,%5Crho%7D)%20=%20%5Cmathrm%7BTr%7D(%5Crho%20g(A)),%20%5Cquad%20g:%5Cmathbb%7BR%7D%5Crightarrow%20%5Cmathbb%7BR%7D_+.%0A%5Ctag%7B2%7D"></span> Here <img src="https://latex.codecogs.com/png.latex?g(A)"> is the operator that has the same eigenvectors as <img src="https://latex.codecogs.com/png.latex?A">, but where each eigenvalue <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is replaced by <img src="https://latex.codecogs.com/png.latex?g(%5Clambda)">. The correspondence in Equation&nbsp;2 between a state-observable pair and a probability distribution on the spectrum of the observable is a cornerstone of quantum physics called <em>Born’s rule</em>, and it is the only bit of quantum theory we shall need. In other words, we see a quantum computer as a procedure to draw from probability distributions parametrized by state-observable pairs. We give two fundamental examples of such state-observable pairs, which can be respectively interpreted as describing one quantum coin and two quantum coins.</p>
<p><em>The quantum coin.</em> Consider a one-qubit computer, i.e.&nbsp;<img src="https://latex.codecogs.com/png.latex?N=1">. Then <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D%20=%20%5Cmathbb%7BC%7D%5E2"> has dimension <img src="https://latex.codecogs.com/png.latex?2">, and we fix an orthonormal basis, which we denote by <img src="https://latex.codecogs.com/png.latex?(%5Cket%7B0%7D,%20%5Cket%7B1%7D)">. The strange notation <img src="https://latex.codecogs.com/png.latex?%5Cket%7B%5Ccdot%7D"> is inherited from physics, and is very practical in computations, as you will see. In short, denote by <img src="https://latex.codecogs.com/png.latex?%5Cbraket%7B%5Ccdot%5Cvert%5Ccdot%7D"> (a <em>bracket</em>, or <em>bra-ket</em>) the inner product in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D">. Now, a vector in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D"> is written <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bv%7D"> (a <em>ket</em>). Similarly, define the linear form <img src="https://latex.codecogs.com/png.latex?%5Cbra%7Bv%7D"> (a <em>bra</em>) by <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cbra%7Bv%7D:%20%5Cket%7Bu%7D%20%5Cmapsto%20%5Cbraket%7Bv%5Cvert%20u%7D.%0A"> By construction, we can write things like <img src="https://latex.codecogs.com/png.latex?%0A%5Cbra%7Bv%7D%20%5Cket%7Bu%7D%20=%20%5Cbraket%7Bv%5Cvert%20u%7D,%0A"> so that the bra-ket notation for linear forms and vectors is consistent with the inner product.</p>
<p>Now, remember we have fixed a basis <img src="https://latex.codecogs.com/png.latex?(%5Cket%7B0%7D,%20%5Cket%7B1%7D)"> of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D">. For <img src="https://latex.codecogs.com/png.latex?p%5Cin%5B0,1%5D">, we define <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cket%7Bp%7D%20=%20%5Csqrt%7B1-p%7D%20%5Cket%7B0%7D%20+%20%5Csqrt%7Bp%7D%5Cket%7B1%7D.%0A"> This definition is consistent with earlier notation, as <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bp%7D%20=%20%5Cket%7B0%7D"> when <img src="https://latex.codecogs.com/png.latex?p=0">, for instance. Now, we define a quantum coin as the state <img src="https://latex.codecogs.com/png.latex?%5Crho_%7B%5Cmathrm%7Bqc%7D%7D%20=%20%5Cket%7Bp%7D%5Cbra%7Bp%7D">. It is the projection onto <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BC%7D%5Cket%7Bp%7D">, and in particular it is a positive semi-definite, Hermitian operator of trace <img src="https://latex.codecogs.com/png.latex?1">, and hence a valid state. As observable, we take the projection onto the second vector of the basis, which we denote in the bra-ket notation by <img src="https://latex.codecogs.com/png.latex?%5Cket%7B1%7D%5Cbra%7B1%7D">. What random variable <img src="https://latex.codecogs.com/png.latex?X_%7B%5Cket%7B1%7D%5Cbra%7B1%7D,%20%5Crho_%7B%5Cmathrm%7Bqc%7D%7D%7D"> does this state-observable pair define in Equation&nbsp;2?</p>
<p>Well, the spectrum of the observable is <img src="https://latex.codecogs.com/png.latex?%5C%7B0,1%5C%7D">, so we have defined a Bernoulli random variable. Moreover, the probability that it is equal to <img src="https://latex.codecogs.com/png.latex?1"> is given by taking <img src="https://latex.codecogs.com/png.latex?g:%5Clambda%5Cmapsto%20%5Cmathbf%7B1%7D_%7B%5Clambda=1%7D"> in Equation&nbsp;2, yielding <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7B%5Cket%7B1%7D%5Cbra%7B1%7D,%20%5Crho_%7B%5Cmathrm%7Bqc%7D%7D%7D%20=%201)%20=%20%5Cmathrm%7BTr%7D%5Cleft%5B%20%5Cket%7B1%7D%5Cbra%7B1%7D%20%5Cket%7Bp%7D%5Cbra%7Bp%7D%20%20%5Cright%5D%20=%20%5Cvert%20%5Cbraket%7B1%5Cvert%20p%7D%5Cvert%5E2%20=%20p,%0A"> by cyclicity of the trace. All of this to define a <img src="https://latex.codecogs.com/png.latex?B(p)"> variable! Things get more interesting when you try to create two dependent Bernoulli variables.</p>
<p><em>Two quantum coins.</em> Consider now a computer with two qubits, so that the Hilbert space is <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D=%5Cmathbb%7BC%7D%5E2%5Cotimes%5Cmathbb%7BC%7D%5E2">. From our orthonormal basis <img src="https://latex.codecogs.com/png.latex?(%5Cket%7B0%7D,%20%5Cket%7B1%7D)"> of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BC%7D%5E2">, we can build an orthonormal basis <img src="https://latex.codecogs.com/png.latex?(%5Cket%7Bi%7D%5Cotimes%5Cket%7Bj%7D,%20i,j%5Cin%5C%7B0,1%5C%7D)"> of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BH%7D">. To keep expressions short, it is customary to write <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bi%7D%5Cotimes%5Cket%7Bj%7D"> as <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bij%7D">. To define a pair of quantum coins, we now consider the tensor product of two quantum coins, <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cket%7Bp%7D%5Cotimes%20%5Cket%7Bp%7D%20=%20(1-p)%20%5Cket%7B00%7D%20+%20%5Csqrt%7Bp(1-p)%7D%5Cket%7B01%7D%20+%20%5Csqrt%7Bp(1-p)%7D%5Cket%7B10%7D+%20p%20%5Cket%7B11%7D.%0A"> We think of the corresponding state <img src="https://latex.codecogs.com/png.latex?%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%20=%20(%5Cket%7Bp%7D%5Cotimes%20%5Cket%7Bp%7D)(%5Cbra%7Bp%7D%5Cotimes%20%5Cbra%7Bp%7D)"> as two quantum coins. Now consider for your observable an operator <img src="https://latex.codecogs.com/png.latex?B"> with four distinct eigenvalues, say <img src="https://latex.codecogs.com/png.latex?%5Clambda_%7Bij%7D%20%5Cin%5Cmathbb%7BC%7D"> for <img src="https://latex.codecogs.com/png.latex?i,%20j%5Cin%5C%7B0,1%5C%7D">, each corresponding to eigenvector <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bij%7D">. 
In other words, the spectral decomposition of <img src="https://latex.codecogs.com/png.latex?B"> is <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20B%20=%20%5Csum_%7Bi,%20j%5Cin%5C%7B0,1%5C%7D%7D%20%5Clambda_%7Bij%7D%20%5Cket%7Bij%7D%5Cbra%7Bij%7D.%0A"> The random variable <img src="https://latex.codecogs.com/png.latex?X_%7BB,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">, associated through Equation&nbsp;2 to two quantum coins and our newly defined observable <img src="https://latex.codecogs.com/png.latex?B">, has support in <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5C%7B%5Clambda_%7B00%7D,%20%5Clambda_%7B01%7D,%20%5Clambda_%7B10%7D,%20%5Clambda_%7B11%7D%5C%7D.%0A"> Moreover, taking <img src="https://latex.codecogs.com/png.latex?g:%5Clambda%5Cmapsto%20%5Cmathbf%7B1%7D_%7B%5Clambda=%5Clambda_%7Bij%7D%7D"> in Equation&nbsp;2, we obtain <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7BB,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D%20=%20%5Clambda_%7Bij%7D)%20=%20%5Cmathrm%7BTr%7D%5Cleft%5B(%5Cket%7Bp%7D%5Cotimes%20%5Cket%7Bp%7D)(%5Cbra%7Bp%7D%5Cotimes%20%5Cbra%7Bp%7D)%20%5Cket%7Bij%7D%5Cbra%7Bij%7D%5Cright%5D%20=%20p%5E%7Bi%7D(1-p)%5E%7B1-i%7D%20%5Ctimes%20p%5E%7Bj%7D(1-p)%5E%7B1-j%7D,%0A"> again by cyclicity of the trace and then carefully distributing our multiplication, noting that most terms are zero by orthogonality. Put differently, the indices <img src="https://latex.codecogs.com/png.latex?(i,j)"> of the eigenvalue returned by <img src="https://latex.codecogs.com/png.latex?X_%7BB,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D"> are a pair of independent Bernoullis with equal parameter <img src="https://latex.codecogs.com/png.latex?p">. Again, this might feel like a lot of algebraic pain for no gain, but wait for it.</p>
<p>What if we had taken the same state, but with another observable? Say the observable <img src="https://latex.codecogs.com/png.latex?C"> with four distinct eigenvalues <img src="https://latex.codecogs.com/png.latex?%5Clambda_%7B%5Cphi%5E+%7D,%20%5Clambda_%7B%5Cphi%5E-%7D,%20%5Clambda_%7B%5Cpsi%5E+%7D,%20%5Clambda_%7B%5Cpsi%5E-%7D%5Cin%20%5Cmathbb%7BC%7D">, and corresponding eigenvectors <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cket%7B%5Cphi%5E%7B%5Cpm%7D%7D%20=%20%5Cfrac%7B%5Cket%7B00%7D%5Cpm%5Cket%7B11%7D%7D%7B%5Csqrt%7B2%7D%7D,%20%5Cquad%0A%20%20%20%20%5Cket%7B%5Cpsi%5E%7B%5Cpm%7D%7D%20=%20%5Cfrac%7B%5Cket%7B01%7D%5Cpm%5Cket%7B10%7D%7D%7B%5Csqrt%7B2%7D%7D.%0A"> Then, the random variable <img src="https://latex.codecogs.com/png.latex?X_%7BC,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D"> defined by Born’s rule in Equation&nbsp;2 is supported in <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5C%7B%5Clambda_%7B%5Cphi%5E+%7D,%20%5Clambda_%7B%5Cphi%5E-%7D,%20%5Clambda_%7B%5Cpsi%5E+%7D,%20%5Clambda_%7B%5Cpsi%5E-%7D%5C%7D,%0A"> with <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7BC,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D%20=%20%5Clambda_%7B%5Cphi%5E+%7D)%20=%20%5Cmathrm%7BTr%7D%5Cleft%5B%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%20%5Cket%7B%5Cphi%5E+%7D%5Cbra%7B%5Cphi%5E+%7D%20%5Cright%5D%20=%20%5Cvert%20(%5Cbra%7Bp%7D%5Cotimes%5Cbra%7Bp%7D)%5Cket%7B%5Cphi%5E+%7D%20%5Cvert%5E2%20=%20%5Cfrac12.%0A"> Similarly, <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7BC,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D%20=%20%5Clambda_%7B%5Cphi%5E-%7D)%20=%20%5Cvert%20(%5Cbra%7Bp%7D%5Cotimes%5Cbra%7Bp%7D)%5Cket%7B%5Cphi%5E-%7D%20%5Cvert%5E2%20=%20%5Cfrac%7B(2p-1)%5E2%7D%7B2%7D,%0A"> <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7BC,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D%20=%20%5Clambda_%7B%5Cpsi%5E+%7D)%20=%20%5Cvert%20(%5Cbra%7Bp%7D%5Cotimes%5Cbra%7Bp%7D)%5Cket%7B%5Cpsi%5E+%7D%20%5Cvert%5E2%20=%202p(1-p),%0A"> and <img 
src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cmathbb%7BP%7D(X_%7BC,%20%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D%20=%20%5Clambda_%7B%5Cpsi%5E-%7D)%20=%20%5Cvert%20(%5Cbra%7Bp%7D%5Cotimes%5Cbra%7Bp%7D)%5Cket%7B%5Cpsi%5E-%7D%20%5Cvert%5E2%20=%200.%0A"> You can check that the four probabilities sum to <img src="https://latex.codecogs.com/png.latex?1">. This time, if you map, e.g., <img src="https://latex.codecogs.com/png.latex?%5Cphi%5E+"> to the string <img src="https://latex.codecogs.com/png.latex?00">, <img src="https://latex.codecogs.com/png.latex?%5Cphi%5E-"> to <img src="https://latex.codecogs.com/png.latex?11">, <img src="https://latex.codecogs.com/png.latex?%5Cpsi%5E+"> to <img src="https://latex.codecogs.com/png.latex?01">, and <img src="https://latex.codecogs.com/png.latex?%5Cpsi%5E-"> to <img src="https://latex.codecogs.com/png.latex?10">, we no longer have independent Bernoulli draws, but a rather strange correlation structure. We shall see that <img src="https://latex.codecogs.com/png.latex?X_%7BC,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D"> allows us to build a Bernoulli factory that is beyond the reach of a classical computer.</p>
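<p>If you would rather let a computer do the algebra, the four probabilities can be checked numerically. The sketch below expands the two-coin state on the product basis, computes its overlaps with the four eigenvectors, and squares them. All amplitudes are real here, so no complex conjugation is needed. The function name <code>bell_probabilities</code> and the value <code>p = 0.3</code> are arbitrary choices for illustration.</p>

```python
import math


def bell_probabilities(p):
    """Born probabilities of the two-coin state in the basis (phi+, phi-, psi+, psi-)."""
    # Amplitudes of |p> (x) |p> on |00>, |01>, |10>, |11>.
    a00 = 1 - p
    a01 = a10 = math.sqrt(p * (1 - p))
    a11 = p
    s = math.sqrt(2)
    overlaps = {
        "phi+": (a00 + a11) / s,
        "phi-": (a00 - a11) / s,
        "psi+": (a01 + a10) / s,
        "psi-": (a01 - a10) / s,
    }
    # Born's rule: the probability is the squared modulus of the overlap.
    return {name: amp**2 for name, amp in overlaps.items()}


probs = bell_probabilities(0.3)
print(probs)
print(sum(probs.values()))
```

<p>Up to floating-point error, the output should agree with the closed forms above: one half, <code>(2p-1)**2 / 2</code>, <code>2p(1-p)</code> and zero.</p>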
</section>
<section id="a-quantum-bernoulli-factory" class="level2">
<h2 class="anchored" data-anchor-id="a-quantum-bernoulli-factory">A quantum Bernoulli factory</h2>
<p>Imagine the following procedure. Draw the random variable <img src="https://latex.codecogs.com/png.latex?X_%7BC,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">. If you obtain <img src="https://latex.codecogs.com/png.latex?%5Clambda_%7B%5Cphi%5E-%7D"> or <img src="https://latex.codecogs.com/png.latex?%5Clambda_%7B%5Cpsi%5E+%7D">, then stop, and respectively output <img src="https://latex.codecogs.com/png.latex?0"> and <img src="https://latex.codecogs.com/png.latex?1">. Otherwise, draw another independent realization of <img src="https://latex.codecogs.com/png.latex?X_%7BC,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">, etc. This is reminiscent of the von Neumann trick we mentioned earlier. What have we achieved? Well, the output is a Bernoulli draw with parameter <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Cfrac%7B2p(1-p)%7D%7B2p(1-p)+%5Cfrac%7B(2p-1)%5E2%7D%7B2%7D%7D%20=%204p(1-p).%0A"> Repeating the procedure as many times as you want draws, we thus have a Bernoulli factory for <img src="https://latex.codecogs.com/png.latex?f_%7B%5Cmathrm%7Bquadratic%7D%7D"> in Equation&nbsp;1, which we know to be beyond the reach of classical Bernoulli factories!</p>
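<p>The rejection procedure is easy to mimic on a laptop, with one big caveat: the sketch below uses the value of p to simulate the measurement outcomes, so it is <em>not</em> a Bernoulli factory. It only illustrates what the stream coming out of the quantum procedure would look like. The function name <code>quantum_factory_draw</code>, the seed and the value <code>p = 0.2</code> are arbitrary.</p>

```python
import random


def quantum_factory_draw(p, rng):
    """One output of the procedure, with measurements of C simulated classically."""
    outcomes = ["phi+", "phi-", "psi+"]  # psi- has probability zero
    weights = [0.5, (2 * p - 1) ** 2 / 2, 2 * p * (1 - p)]
    while True:
        outcome = rng.choices(outcomes, weights=weights)[0]
        if outcome == "phi-":
            return 0
        if outcome == "psi+":
            return 1
        # outcome is phi+: discard it and measure a fresh pair of quantum coins


rng = random.Random(1)
p = 0.2
draws = [quantum_factory_draw(p, rng) for _ in range(20_000)]
print(sum(draws) / len(draws), 4 * p * (1 - p))
```

<p>The empirical frequency of ones should be close to <code>4p(1-p)</code>. Each output requires two measurements on average, since a measurement is rejected with probability one half whatever the value of p.</p>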
<p>The difference is that our Bernoulli factory is a <em>quantum</em> Bernoulli factory. In particular, our basic resource is (physically) independent copies of <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bp%7D">. This is asking for strictly more than (statistically) independent Bernoulli draws. Indeed, depending on your observable, two physically independent copies of <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bp%7D"> can give you two i.i.d. Bernoullis <img src="https://latex.codecogs.com/png.latex?X_%7BB,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">, or something more complicated like <img src="https://latex.codecogs.com/png.latex?X_%7BC,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">. If you consider as equivalent the cost of preparing the two types of inputs, i.i.d. Bernoullis <img src="https://latex.codecogs.com/png.latex?B(p)"> on one side and physically independent copies of <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bp%7D"> on the other side, then you have a quantum advantage. It might be a big assumption, but I find it easier to swallow than similar caveats in other quantum advantages that I’ve read about.</p>
</section>
<section id="further-remarks" class="level2">
<h2 class="anchored" data-anchor-id="further-remarks">Further remarks</h2>
<p>The example in this post is from the paper by <a href="https://www.nature.com/articles/ncomms9203">Dale, Jennings, and Rudolph (2015)</a>. The authors further characterize the Bernoulli factories that you can build with only single-qubit operations: they strictly include classical Bernoulli factories and the example from this post. In other words, it is not necessary to use pairs of qubits to build <img src="https://latex.codecogs.com/png.latex?X_%7BC,%5Crho_%7B2%5Cmathrm%7Bqc%7D%7D%7D">. Since then, there has been more work on quantum Bernoulli factories, for instance considering <a href="https://arxiv.org/abs/1712.09817">quantum-to-quantum Bernoulli factories</a>, where the goal is to create independent copies of <img src="https://latex.codecogs.com/png.latex?%5Cket%7Bf(p)%7D"> rather than a stream of Bernoulli random variables.</p>
<p>I thank my <a href="http://rbardenet.github.io/#group">group</a> for valuable comments during the writing of this post. One debatable choice is that I have tried to reduce the quantum formalism to the correspondence in Equation&nbsp;2 between a state-observable pair and a random variable. This has the advantage of keeping the necessary algebra to a minimum, but it forced me to introduce rather abstract observables, with a spectrum that we only use through its indices. A more standard (but arguably lengthier) treatment might have involved <a href="https://en.wikipedia.org/wiki/Projection-valued_measure">projection-valued measures</a>.</p>


</section>

 ]]></description>
  <category>quantum computing</category>
  <category>simulation</category>
  <guid>https://statisfaction-blog.github.io/posts/14-02-2024-quantum-bernoulli-factories/quantum-bernoulli-factories.html</guid>
  <pubDate>Tue, 13 Feb 2024 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Coulomb rhymes with variance reduction</title>
  <dc:creator>Rémi Bardenet</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/01-11-2023-repelled-point-processes/repelled_point_processes.html</link>
  <description><![CDATA[ 




<p>… Well, it does rhyme if you read the title aloud with a French accent, hon hon hon.</p>
<p>To paraphrase Nicolas’s previous <a href="https://statisfaction.wordpress.com/2022/12/22/how-to-beat-monte-carlo-no-qmc/">post</a>, say I want to approximate the integral <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20I(f)%20:=%20%5Cint_%7BS%7D%20f(u)%20du,%0A"> where <img src="https://latex.codecogs.com/png.latex?S"> is a compact set of <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">. I could use plain old Monte Carlo with <img src="https://latex.codecogs.com/png.latex?N"> nodes, <span id="eq-mc"><img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Chat%7BI%7D(f)%20=%20%5Cfrac%201%20N%20%5Csum_%7Bi=1%7D%5EN%20f(U_i),%20%20%20%5Cquad%20U_i%20%5Csim%20%5Cmathrm%7BU%7D(S).%0A%5Ctag%7B1%7D"></span> Intuitively, an i.i.d. uniform sample of quadrature nodes <img src="https://latex.codecogs.com/png.latex?U_1,%20%5Cdots,%20U_N"> will however leave “holes”; see Figure&nbsp;1 (a). In words, given a realization of the nodes, it is possible to insert a few large balls in <img src="https://latex.codecogs.com/png.latex?S"> that do not contain any <img src="https://latex.codecogs.com/png.latex?U_i">. These holes may make us miss some large variations of <img src="https://latex.codecogs.com/png.latex?f">. Part of the variance of the Monte Carlo estimator in Equation&nbsp;1 could intuitively be removed if we managed to fill these holes, using some of the nodes that got lumped together by chance.</p>
<p>Many sampling algorithms, such as randomized quasi-Monte Carlo, impose similar space-filling constraints, yielding a random sample with guarantees of “well-spreadedness”. In the <a href="https://arxiv.org/abs/2308.04825">paper</a> I describe in this post, <a href="https://dhawat.github.io/">Diala Hawat</a> and her two advisors (Raphaël Lachièze-Rey and myself) obtained variance reduction by explicitly trying to fill the holes left by a realization of <img src="https://latex.codecogs.com/png.latex?U_1,%20%5Cdots,%20U_N">. In the remainder of the post, I will describe Diala’s main theoretical result.</p>
<div id="fig-samples" class="quarto-layout-panel">
<figure class="figure">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="quarto-layout-cell quarto-layout-cell-subref" style="flex-basis: 50.0%;justify-content: center;">
<div id="fig-poisson" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><embed src="poisson.pdf" class="img-fluid" data-ref-parent="fig-samples"></p>
<figcaption class="figure-caption">(a) A Poisson sample</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell quarto-layout-cell-subref" style="flex-basis: 50.0%;justify-content: center;">
<div id="fig-repelled" class="quarto-figure quarto-figure-center anchored">
<figure class="figure">
<p><embed src="repelled.pdf" class="img-fluid" data-ref-parent="fig-samples"></p>
<figcaption class="figure-caption">(b) The same sample after repulsion</figcaption>
</figure>
</div>
</div>
</div>
<p></p><figcaption class="figure-caption">Figure&nbsp;1: Note how the repelled sample has fewer visible “holes” and “lumps”. The details of how we implemented the repulsion are interesting in themselves, and can be found in the <a href="https://arxiv.org/abs/2308.04825">paper</a> and the associated <a href="https://github.com/dhawat/MCRPPy">code</a>.</figcaption><p></p>
</figure>
</div>
<p>The basic intuition is to imagine the quadrature nodes <img src="https://latex.codecogs.com/png.latex?U_1,%20%5Cdots,%20U_N"> as electrons. In physics, electrons (like all charged particles) are subject to the <a href="https://en.wikipedia.org/wiki/Coulomb%27s_law">Coulomb force</a>. The Coulomb force exerted by one electron onto another points away from the first electron, with a magnitude that is inversely proportional to the <img src="https://latex.codecogs.com/png.latex?d-1">th power of the Euclidean distance between the two. As a result, electrons tend to repel each other, and electrons close to you will push you away harder than electrons at the other side of the support of <img src="https://latex.codecogs.com/png.latex?f">. This is the behaviour that we would like to emulate, so that our quadrature nodes avoid lumping together and rather go and fill holes where no particle causes any repulsion.</p>
<p>If we solved the differential equation implementing Coulomb’s repulsion on our <img src="https://latex.codecogs.com/png.latex?N"> i.i.d. nodes, however, the points would rapidly leave the support of <img src="https://latex.codecogs.com/png.latex?f"> and “go to infinity”, to make sure that the pairwise distances between nodes are as large as possible. One way to avoid this undesired behaviour is to consider an “infinite” uniform Monte Carlo sample in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, so that, wherever an electron looks, there are an infinite number of electrons preventing it from escaping. To make the situation comparable with our initial <img src="https://latex.codecogs.com/png.latex?N">-point estimator in Equation&nbsp;1, we also require that there are roughly <img src="https://latex.codecogs.com/png.latex?N"> points inside the region <img src="https://latex.codecogs.com/png.latex?S"> where we integrate <img src="https://latex.codecogs.com/png.latex?f">. Formally, we consider a homogeneous Poisson point process <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BP%7D"> of intensity <img src="https://latex.codecogs.com/png.latex?%5Crho%20=%20N/V"> in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, where <img src="https://latex.codecogs.com/png.latex?V"> is the volume of <img src="https://latex.codecogs.com/png.latex?S">. 
Consider the modified Monte Carlo estimator <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Ctilde%7BI%7D(f)%20=%20%5Cfrac%7B1%7D%7BN%7D%20%5Csum_%7Bx%5Cin%20S%5Ccap%5Cmathcal%7BP%7D%7D%20f(x).%0A"> This estimator is very similar to the <img src="https://latex.codecogs.com/png.latex?N">-point crude Monte Carlo estimator <img src="https://latex.codecogs.com/png.latex?%5Chat%7BI%7D(f)">, except the number of evaluations of <img src="https://latex.codecogs.com/png.latex?f"> in the sum is now Poisson-distributed, with mean and variance <img src="https://latex.codecogs.com/png.latex?N">. What we have gained is that we can now intuitively apply the Coulomb force to the points of <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BP%7D">, and hope that both before and after repulsion, about <img src="https://latex.codecogs.com/png.latex?N"> points remain in our integration domain <img src="https://latex.codecogs.com/png.latex?S">. Proving this remains technically thorny, however. For starters, for <img src="https://latex.codecogs.com/png.latex?x"> in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, the series defining the Coulomb force exerted on <img src="https://latex.codecogs.com/png.latex?x"> by a collection <img src="https://latex.codecogs.com/png.latex?C"> of points in <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D%5Ed">, namely <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20F_C(x)%20=%20%5Csum_%7By%5Cin%20C,%20y%5Cneq%20x%7D%20%5Cfrac%7Bx-y%7D%7B%5CVert%20x-y%5CVert%5E%7Bd%7D%7D,%0A"> is not absolutely convergent, so that the order of summation matters. 
However, it was observed as early as 1943 that, if you sum by increasing distance to the reference point <img src="https://latex.codecogs.com/png.latex?x">, and <img src="https://latex.codecogs.com/png.latex?C=%5Cmathcal%7BP%7D"> is a homogeneous Poisson point process, then the (random) series <img src="https://latex.codecogs.com/png.latex?F_%5Cmathcal%7BP%7D(x)"> converges almost surely. Interested readers are referred to a classical <a href="https://arxiv.org/abs/math/0611886">paper</a> by Chatterjee, Peled, Peres, and Romik (2010) on the gravitational allocation of Poisson points, one of the inspirations behind Diala’s work.</p>
<p>Putting (important) technical issues aside, we are ready to state the main result of our paper. We prove that, for <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%5Cin(-1,1)">, the <em>repelled Poisson point process</em> <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5CPi_%5Cepsilon%5Cmathcal%7BP%7D%20=%20%5C%7B%20x+%5Cepsilon%20F_%7B%5Cmathcal%7BP%7D%7D(x),%20%5Cquad%20x%5Cin%5Cmathcal%7BP%7D%20%5C%7D%0A"> is well-defined, and has on average <img src="https://latex.codecogs.com/png.latex?N"> points in <img src="https://latex.codecogs.com/png.latex?S">. Moreover, <img src="https://latex.codecogs.com/png.latex?%0A%20%20%20%20%5Ccheck%7BI%7D(f)%20=%20%5Cfrac%7B1%7D%7BN%7D%20%5Csum_%7Bx%5Cin%20S%5Ccap%20%5CPi_%5Cepsilon%5Cmathcal%7BP%7D%7D%20f(x)%0A"> is an unbiased estimator of <img src="https://latex.codecogs.com/png.latex?I(f)">. Finally, if <img src="https://latex.codecogs.com/png.latex?f"> is <img src="https://latex.codecogs.com/png.latex?C%5E2">, for <img src="https://latex.codecogs.com/png.latex?%5Cepsilon%3E0"> small enough, the variance of <img src="https://latex.codecogs.com/png.latex?%5Ccheck%7BI%7D(f)"> is lower than that of <img src="https://latex.codecogs.com/png.latex?%5Ctilde%7BI%7D(f)">. To sum up, for any <img src="https://latex.codecogs.com/png.latex?C%5E2"> integrand, we can in principle reduce the variance of our Monte Carlo estimator by slightly repelling the quadrature nodes away from each other. This is it: by breaking lumps and filling holes in a postprocessing step, we obtain variance reduction over crude Monte Carlo. The proof is not trivial, and relies on the super-harmonicity of the potential behind the Coulomb force.</p>
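<p>To make the construction concrete, here is a minimal numerical sketch in Python. It is not the implementation from the paper or from the associated code, and it rests on several assumptions that are mine: the domain is the unit square (so that the volume is 1 and the intensity equals <code>N</code>), the Poisson process is only simulated on a padded window, the Coulomb series is replaced by a finite sum over the points within distance <code>radius</code> of each point (a crude stand-in for summing by increasing distance), and <code>eps</code> is an arbitrary small value rather than the closed-form step size discussed in the paper. All function names and the test integrand are hypothetical.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

def sample_poisson(rho, low, high, d, rng):
    """Homogeneous Poisson point process of intensity rho on the cube [low, high]^d."""
    volume = (high - low) ** d
    n_points = rng.poisson(rho * volume)
    return rng.uniform(low, high, size=(n_points, d))

def truncated_coulomb_force(points, radius):
    """Sum of (x - y) / ||x - y||^d over the points y != x within distance radius of x."""
    d = points.shape[1]
    diff = points[:, None, :] - points[None, :, :]   # diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)                   # exclude the term y = x
    weights = np.where(np.less_equal(dist, radius), dist ** (-d), 0.0)
    return np.sum(weights[:, :, None] * diff, axis=1)

def estimate(points, f, n_target):
    """(1 / N) times the sum of f over the points that fall in S = [0, 1]^d."""
    inside = np.less_equal(np.abs(points - 0.5), 0.5).all(axis=1)
    return f(points[inside]).sum() / n_target

def f(x):
    """Hypothetical smooth integrand on the unit square; its integral is (2 / pi)^2."""
    return np.prod(np.sin(np.pi * x), axis=1)

rng = np.random.default_rng(0)
d, N, pad, radius, eps = 2, 100, 0.5, 0.5, 1e-3      # illustrative values only

# Poisson sample on the padded window, so that points in S see neighbours on all sides
poisson = sample_poisson(rho=N, low=-pad, high=1 + pad, d=d, rng=rng)
# Repelled sample: every point moves by eps times the (truncated) force it feels
repelled = poisson + eps * truncated_coulomb_force(poisson, radius)

print("Poisson estimate: ", estimate(poisson, f, N))
print("repelled estimate:", estimate(repelled, f, N))</code></pre></div>
<p>A single run like this says nothing about variance; to compare the two estimators one would repeat the experiment over many independent Poisson samples and compare empirical variances. The quadratic cost of the pairwise force computation is also a limitation of this sketch.</p>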
<p>Let me close with two further pointers to the <a href="https://arxiv.org/abs/2308.04825">paper</a>. First, we discuss a particular value of the “step size” parameter <img src="https://latex.codecogs.com/png.latex?%5Cepsilon"> in the paper, which has an easily-implemented closed form, and reliably led to variance reduction across our experiments. Second, while our theoretical results only cover the Poisson case so far, we also show experiments on (stationary) point processes other than Poisson, which indicate that variance reduction is also achieved across point processes with varying second-order structure. In Monte Carlo terms, and being very optimistic, some sort of repulsion might become a standard postprocessing step in the future, to reduce the variance of one’s estimator, independently of the law of the nodes (Markov chain, thinned PDMP, you name it).</p>



 ]]></description>
  <category>Monte Carlo</category>
  <category>point processes</category>
  <guid>https://statisfaction-blog.github.io/posts/01-11-2023-repelled-point-processes/repelled_point_processes.html</guid>
  <pubDate>Tue, 31 Oct 2023 23:00:00 GMT</pubDate>
</item>
<item>
  <title>particles version 0.4: single-run variance estimates, FFBS variants, nested sampling</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/01-09-2023-particles-v0.4/index.html</link>
  <description><![CDATA[ 




<p>Version 0.4 of <a href="https://github.com/nchopin/particles">particles</a> has just been released. Here are the main changes:</p>
<section id="single-run-variance-estimation-for-waste-free-smc" class="level1">
<h1>Single-run variance estimation for waste-free SMC</h1>
<p>Waste-free SMC <a href="https://academic.oup.com/jrsssb/article/84/1/114/7056097">(Dau &amp; Chopin, 2020)</a> was already implemented in particles (since version 0.3), and even proposed by default. This is a variant of SMC samplers where you resample only <img src="https://latex.codecogs.com/png.latex?M%20%5Cll%20N"> particles, apply to each resampled particle <img src="https://latex.codecogs.com/png.latex?P-1"> MCMC steps, and then gather these <img src="https://latex.codecogs.com/png.latex?M%5Ctimes%20P"> states to form the next particle sample; see the paper if you want to know why this is a good idea (short version: this tends to perform better than standard SMC samplers, and to be more robust to the choice of the number of MCMC steps).</p>
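<p>To fix ideas, here is a toy sketch of a single waste-free step in plain NumPy. It does not use the particles API; the function names, the Gaussian toy target, the multinomial resampling and the random-walk Metropolis kernel are all assumptions made for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

def log_target(x):
    """Log-density (up to a constant) of the toy target N(0, 1)."""
    return -0.5 * x ** 2

def rw_metropolis_step(x, rng, scale=1.0):
    """One random-walk Metropolis step per chain, leaving the target invariant."""
    prop = x + scale * rng.standard_normal(x.shape)
    log_ratio = log_target(prop) - log_target(x)
    accept = np.less(np.log(rng.uniform(size=x.shape)), log_ratio)
    return np.where(accept, prop, x)

def waste_free_step(particles, log_weights, M, P, rng):
    """Resample M particles, run P - 1 MCMC steps on each, keep all M * P states."""
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    chains = [rng.choice(particles, size=M, p=w)]    # multinomial resampling
    for _ in range(P - 1):
        chains.append(rw_metropolis_step(chains[-1], rng))
    return np.concatenate(chains)                    # next particle sample, of size M * P

rng = np.random.default_rng(1)
M, P = 50, 20
N = M * P
particles = 2.0 * rng.standard_normal(N)             # current sample, from N(0, 2^2)
# importance weights: log target minus log proposal, up to a constant
log_weights = log_target(particles) + 0.5 * (particles / 2.0) ** 2
new_particles = waste_free_step(particles, log_weights, M, P, rng)
print(new_particles.shape, new_particles.mean(), new_particles.var())</code></pre></div>
<p>The point of the sketch is the bookkeeping: no intermediate MCMC state is thrown away, so the next sample has again <code>N = M * P</code> particles even though only <code>M</code> of them were resampled.</p>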
<p>What was not yet implemented (but is, in this version) are the <strong>single-run</strong> variance estimates proposed in the same paper. Here is a simple illustration:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/01-09-2023-particles-v0.4/ibis_pima_var_post.png" class="img-fluid figure-img" style="width:60.0%"></p>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://statisfaction-blog.github.io/posts/01-09-2023-particles-v0.4/ibis_pima_var_logLt.png" class="img-fluid figure-img" style="width:60.0%"></p>
</figure>
</div>
<p>Both plots were obtained from <img src="https://latex.codecogs.com/png.latex?10%5E3"> runs of waste-free IBIS (i.e.&nbsp;target at time <img src="https://latex.codecogs.com/png.latex?t"> is the posterior based on the first <img src="https://latex.codecogs.com/png.latex?t+1"> observations, <img src="https://latex.codecogs.com/png.latex?p(%5Ctheta%7Cy_%7B0:t%7D)">) applied to Bayesian logistic regression and the Pima Indians dataset. The red line is the empirical variance of the output, and, since the number of runs is large, it should be close to the true variance. The lower (resp. upper) limit of the grey area is the <img src="https://latex.codecogs.com/png.latex?5%5C%25"> (resp. <img src="https://latex.codecogs.com/png.latex?95%5C%25">) quantile of the single-run variance estimates obtained from these <img src="https://latex.codecogs.com/png.latex?10%5E3"> runs. The considered output is either the posterior mean of the intercept (top) or the log marginal likelihood (bottom).</p>
<p>We can see from these plots that these single-run estimates are quite reliable, and make it possible, in case one uses IBIS, to obtain error bars even from a single run. See the documentation of module <code>smc_samplers</code> (or the scripts in <code>papers/wastefreeSMC</code>) for more details on how you may get such estimates.</p>
</section>
<section id="new-ffbs-variants" class="level1">
<h1>New FFBS variants</h1>
<p>I have already mentioned in a previous <a href="https://statisfaction.wordpress.com/2022/11/09/new-smoothing-algorithms-in-particles/">post</a>, on the old blog, that particles now implements new FFBS algorithms (i.e.&nbsp;particle smoothing algorithms that rely on a backward step) that were proposed in <a href="https://arxiv.org/abs/2207.00976">this paper</a>. On top of that, particles now also includes a hybrid version of the Paris algorithm.</p>
</section>
<section id="nested-sampling" class="level1">
<h1>Nested sampling</h1>
<p>I was invited to <a href="https://www.ipp.mpg.de/maxent2023">this</a> nested sampling workshop in Munich, so this gave me some incentive to:</p>
<ul>
<li><p>clean up and document the “vanilla” nested sampling implementation which was in module <code>nested</code>.</p></li>
<li><p>add to the same module the NS-SMC samplers of <a href="https://arxiv.org/abs/1805.03924">Salomone et al (2018)</a> to play with them and do some numerical experiments to illustrate my talk.</p></li>
</ul>
<p>I will blog shortly about the interesting results I found (which essentially are in line with Salomone et al).</p>
</section>
<section id="other-minor-changes" class="level1">
<h1>Other minor changes</h1>
<p>Several distributions and a dataset (Liver) were added, see the <a href="https://github.com/nchopin/particles/releases/tag/v0.4">change log</a>.</p>
</section>
<section id="logo" class="level1">
<h1>Logo</h1>
<p>I’ve added a <a href="https://github.com/nchopin/particles/blob/master/logo.png">logo</a>. It’s… not great; if anyone has suggestions on how to design a better logo, I am all ears.</p>
</section>
<section id="whats-next" class="level1">
<h1>What’s next?</h1>
<p>I guess what’s still missing from the package are things like:</p>
<ul>
<li><p>the ensemble Kalman filter, which would be reasonably easy to add, and would be useful in various problems;</p></li>
<li><p>advanced methods to design better proposals, such as controlled SMC <a href="https://projecteuclid.org/journals/annals-of-statistics/volume-48/issue-5/Controlled-sequential-Monte-Carlo/10.1214/19-AOS1914.short">(Heng et al, 2020)</a> or the iterated auxiliary particle filter <a href="https://www.tandfonline.com/doi/full/10.1080/01621459.2016.1222291">(Guarniero et al, 2017)</a>.</p></li>
</ul>
<p>If you have other ideas, let me know.</p>
</section>
<section id="feedback" class="level1">
<h1>Feedback</h1>
<p>I have not yet looked into how to enable comments on a quarto blog. You can comment by replying to this <a href="https://mathstodon.xyz/@nchopin/111018449931345157">post</a> on Mastodon, or to the same post on LinkedIn (coming soon); or you can raise an issue on github or send me an e-mail, of course.</p>


</section>

 ]]></description>
  <category>news</category>
  <category>particles</category>
  <category>SMC</category>
  <guid>https://statisfaction-blog.github.io/posts/01-09-2023-particles-v0.4/index.html</guid>
  <pubDate>Thu, 31 Aug 2023 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Better than Monte Carlo (this post is not about QMC)</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/22-08-2023-monte-carlo-rates/monte_carlo_rates.html</link>
  <description><![CDATA[ 




<p>(This is a repost of this December 2022 <a href="https://statisfaction.wordpress.com/2022/12/22/how-to-beat-monte-carlo-no-qmc/">post</a> on the old website, but since math support is so poor on Wordpress, I’d rather have this post published here.)</p>
<p>Say I want to approximate the integral <img src="https://latex.codecogs.com/png.latex?I(f)%20:=%20%5Cint_%7B%5B0,%201%5D%5Es%7D%20f(u)%20du"> based on <img src="https://latex.codecogs.com/png.latex?n"> evaluations of the function <img src="https://latex.codecogs.com/png.latex?f">. I could use plain old Monte Carlo: <img src="https://latex.codecogs.com/png.latex?%5Chat%7BI%7D(f)%20=%20%5Cfrac%201%20n%20%5Csum_%7Bi=1%7D%5En%20f(U_i),%5Cquad%20U_i%20%5Csim%20%5Cmathrm%7BU%7D(%5B0,%0A1%5D%5Es)."> Its RMSE (root mean square error) is <img src="https://latex.codecogs.com/png.latex?O(n%5E%7B-1/2%7D)">.</p>
<p>Can I do better? That is, can I design an alternative estimator/algorithm, which performs <img src="https://latex.codecogs.com/png.latex?n"> evaluations and returns a random output, such that its RMSE converges faster?</p>
<p>Surprisingly, the answer to this question has been known for a long time. If I am ready to focus on functions <img src="https://latex.codecogs.com/png.latex?f%5Cin%5Cmathcal%7BC%7D%5Er(%5B0,%201%5D%5Es)">, Bakhvalov (1959) showed that the best rate I can hope for is <img src="https://latex.codecogs.com/png.latex?O(n%5E%7B-1/2-r/s%7D)."> That is, there exist algorithms that achieve this rate, and algorithms achieving a better rate simply do not exist.</p>
<p>Ok, but how can I actually design such an algorithm? The proof of Bakhvalov contains a very simple recipe. Say I am able to construct a good approximation <img src="https://latex.codecogs.com/png.latex?f_n"> of <img src="https://latex.codecogs.com/png.latex?f">, based on <img src="https://latex.codecogs.com/png.latex?n"> evaluations; assume the approximation error is <img src="https://latex.codecogs.com/png.latex?%5C%7Cf-f_n%5C%7C_%5Cinfty%20=%20O(n%5E%7B-%5Calpha%7D)">, <img src="https://latex.codecogs.com/png.latex?%5Calpha%3E0">. Then I could compute the following estimator, based on a second batch of <img src="https://latex.codecogs.com/png.latex?n"> evaluations: <img src="https://latex.codecogs.com/png.latex?%20%5Chat%7BI%7D(f)%0A:=%20I(f_n)%20+%20%20%5Cfrac%201%20n%20%5Csum_%7Bi=1%7D%5En%20(f-f_n)(U_i),%5Cquad%20U_i%20%5Csim%0A%5Cmathrm%7BUniform%7D(%5B0,%201%5D%5Es)."> It is easy to check that this new estimator is unbiased, that its variance is <img src="https://latex.codecogs.com/png.latex?O(n%5E%7B-1-2%5Calpha%7D)">, and therefore its RMSE is <img src="https://latex.codecogs.com/png.latex?O(n%5E%7B-1/2-%5Calpha%7D)">. (It is based on <img src="https://latex.codecogs.com/png.latex?2n"> evaluations.)</p>
<p>So there is a strong relation between Bakhvalov’s results and function approximation. In fact, the best rate you can achieve for the latter is <img src="https://latex.codecogs.com/png.latex?%5Calpha=r/s">, which explains the rate above for stochastic quadrature. You can see now why I gave this title to this post. QMC is about using points that are better than random points. But here I’m using IID points, and the improved rate comes from the fact that I use a better approximation of <img src="https://latex.codecogs.com/png.latex?f">.</p>
<p>Here is a simple example of a good function approximation. Take <img src="https://latex.codecogs.com/png.latex?s=1">, and <img src="https://latex.codecogs.com/png.latex?%0Af_n(u)%20=%20%5Csum_%7Bi=1%7D%5En%20f(%20%5Cfrac%7B2i-1%7D%7B2n%7D%20)%20%5Cmathbf%7B1%7D_%7B%5B(i-1)/n,%20i/n%5D%7D(u);%0A"> that is, split <img src="https://latex.codecogs.com/png.latex?%5B0,%201%5D"> into <img src="https://latex.codecogs.com/png.latex?n"> intervals <img src="https://latex.codecogs.com/png.latex?%5B(i-1)/n,%20i/n%5D">, and approximate <img src="https://latex.codecogs.com/png.latex?f"> inside a given interval by its value at the centre of the interval. You can quickly check that the approximation error is then <img src="https://latex.codecogs.com/png.latex?O(n%5E%7B-1%7D)"> provided <img src="https://latex.codecogs.com/png.latex?f"> is <img src="https://latex.codecogs.com/png.latex?C%5E1">. So you get a simple recipe to get the optimal rate for <img src="https://latex.codecogs.com/png.latex?s=1"> and <img src="https://latex.codecogs.com/png.latex?r=1">.</p>
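<p>The recipe above, combined with this piecewise-constant approximation, can be sketched in a few lines of Python. This is only an illustration under assumptions of my choosing: dimension 1, the integrand <code>exp</code> (whose integral over the unit interval is <code>e - 1</code>), and hypothetical function names. The crude estimator is given the same budget of <code>2n</code> evaluations, to keep the comparison fair.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import numpy as np

def midpoint_approximation(f, n):
    """Return f_n (piecewise constant on n cells of [0, 1]) and its exact integral."""
    values = f((2 * np.arange(1, n + 1) - 1) / (2 * n))   # f at the n cell centres
    def f_n(u):
        cell = np.minimum((u * n).astype(int), n - 1)     # index of the cell containing u
        return values[cell]
    return f_n, values.mean()                             # I(f_n) = (1 / n) * sum of values

def control_variate_estimate(f, n, rng):
    """I(f_n) + (1 / n) * sum of (f - f_n)(U_i): 2n evaluations of f in total."""
    f_n, integral_f_n = midpoint_approximation(f, n)
    u = rng.uniform(size=n)
    return integral_f_n + np.mean(f(u) - f_n(u))

def crude_estimate(f, n, rng):
    """Plain Monte Carlo with the same budget of 2n evaluations."""
    return np.mean(f(rng.uniform(size=2 * n)))

def rmse(estimator, f, n, truth, rng, reps=200):
    """Empirical root mean square error over reps independent runs."""
    errors = np.array([estimator(f, n, rng) - truth for _ in range(reps)])
    return np.sqrt(np.mean(errors ** 2))

rng = np.random.default_rng(0)
f, truth = np.exp, np.e - 1.0
for n in (10, 100, 1000):
    print(n, rmse(crude_estimate, f, n, truth, rng),
          rmse(control_variate_estimate, f, n, truth, rng))</code></pre></div>
<p>If the rates above hold, multiplying <code>n</code> by 10 should divide the first RMSE by roughly 3, and the second one by roughly 30.</p>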
<p>Is it possible to generalise this type of construction to any <img src="https://latex.codecogs.com/png.latex?r"> and any <img src="https://latex.codecogs.com/png.latex?s">? The answer is in our recent paper with Mathieu Gerber, which you can find <a href="https://arxiv.org/abs/2210.01554">here</a>. You may also want to read <a href="https://arxiv.org/abs/1409.6714">Novak (2016)</a>, which is a very good entry on stochastic quadrature, and in particular gives a nice overview of Bakhvalov’s and related results.</p>



 ]]></description>
  <category>Monte Carlo</category>
  <category>QMC</category>
  <category>rates</category>
  <guid>https://statisfaction-blog.github.io/posts/22-08-2023-monte-carlo-rates/monte_carlo_rates.html</guid>
  <pubDate>Fri, 18 Aug 2023 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Welcome to the new, quarto-based version of Statisfaction</title>
  <dc:creator>Nicolas Chopin</dc:creator>
  <link>https://statisfaction-blog.github.io/posts/welcome/index.html</link>
  <description><![CDATA[ 




<p>Hey! We have just moved this blog from Wordpress to github. The old version is still available <a href="https://statisfaction.wordpress.com/">here</a>. The new version is based on <a href="https://quarto.org/">quarto</a>, which will make it much easier to write mathematics, e.g.&nbsp;<img src="https://latex.codecogs.com/png.latex?%5Cpi(%5Ctheta%7Cx)%20%5Cpropto%20%5Cpi(%5Ctheta)%20L(x%7C%5Ctheta)">, and code, e.g.&nbsp;</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">def</span> fact(n):</span>
<span id="cb1-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> np.prod(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div>



 ]]></description>
  <category>news</category>
  <guid>https://statisfaction-blog.github.io/posts/welcome/index.html</guid>
  <pubDate>Fri, 18 Aug 2023 22:00:00 GMT</pubDate>
</item>
</channel>
</rss>
