
A correction to BTC's difficulty algorithm #39

@zawy12


This discusses an accuracy error in fixed-window difficulty algorithms that was brought to my attention by Pieter Wuille's Twitter poll and its result. Meni Rosenfeld wrote a paper about it in 2016. It also shows that coins and pools are not calculating hashrate quite correctly, which makes a big difference if you're estimating hashrate from a small sample.

Large error due to not taking increasing hashrate into account
BTC difficulty adjustments have historically arrived almost a day before the scheduled 2-week target (coin emission is now about 7 months ahead of schedule) because hashrate usually increases (by an average of about 6% per two weeks) before the next adjustment is made, so emission has been about 6% faster every two weeks. This could be corrected by looking at the past 4 weeks and adding an adjustment for the slope, thereby taking into account the "acceleration" in hashrate instead of just its most recent "velocity".

Commonly known 2015/2016 error
It is well-known that, separate from the above "ramp error", the BTC code has an off-by-one error that would cause each 2016-block adjustment period (at constant hashrate) to take 2 weeks and 10 minutes. It comes from the code being effectively:
next_target = previous_target * sum_2015_solvetimes / (2016 * 600)
This is the same as:
next_target = previous_target * avg(2015 solvetimes) / 600 * 2015/2016
The 2015/2016 factor is the cause of the lower-than-intended target (higher-than-intended difficulty).

There is another ~10-minute error in the same direction as BTC's known code error. It applies to fixed-window difficulty algorithms. The error is much smaller in simple moving averages. I can't figure out how to analytically determine the adjustment for simple moving averages, but a correction I've experimentally found accurate down to N=4 is adjusted_target = target/(1 - 3/N^2).

Satoshi and pretty much everyone else thought for years that the next target (~1/difficulty) should be determined by:

T = desired block time
t = observed solvetimes
HR = hashrate
nextTarget = priorTarget * avg(t) / T = 2^256/(T*HR)

The above basis of all existing difficulty algorithms is wrong because, in estimating HR to adjust the target, you have N samples of previous solvetimes (t), but fast solvetimes are more probable (which is offset in an N=infinite average by some solvetimes being substantially longer). So when you take a smallish sample of N of them, you're more likely than not to have mostly fast blocks. This causes the difficulty calculation to overestimate the hashrate and thereby set the target too low (difficulty too high), which makes the resulting avg solvetime higher than your goal.
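As a sanity check on the "more likely than not" claim, a small Monte Carlo (my own sketch, not from the post; the function name is mine) can estimate how often the average of N exponential solvetimes lands below the true mean T:

```python
import random

def frac_below_mean(N, T, trials, seed=0):
    """Fraction of trials where the average of N exponential
    solvetimes (mean T) comes out below T."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        avg = sum(rng.expovariate(1.0 / T) for _ in range(N)) / N
        if avg < T:
            below += 1
    return below / trials

# With N=10, the sample average is below T noticeably more than half the time.
print(frac_below_mean(10, 600.0, 100000))  # ≈ 0.54
```

The sample average is still an unbiased estimate of T, but because its distribution is skewed, a small sample more often comes out fast than slow.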

The correction turns out to be simple, and it appears to be directly related to the memoryless property (see the bottom section "A Similarly Unexpected Effect"):
nextD = nextD * (N-1)/N
nextTarget = nextTarget * N/(N-1)

I tested it all the way down to N=2 with no discernible error. It has a problem at N=1.
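A quick simulation (my sketch; variable names are mine) of a fixed-window algorithm shows both the bias and the correction. Working in units where 2^256/HR = 1, the expected solvetime at a given target is simply 1/target:

```python
import random

def avg_solvetime(N, T, windows, correction=1.0, seed=1):
    """Fixed-window difficulty: retarget once every N blocks with the
    naive rule next_target = prev_target * avg(t)/T, optionally scaled
    by a correction factor. Returns the overall average solvetime."""
    rng = random.Random(seed)
    target = 1.0 / T              # units chosen so mean solvetime = 1/target
    total = 0.0
    for _ in range(windows):
        times = [rng.expovariate(target) for _ in range(N)]
        total += sum(times)
        target *= (sum(times) / N) / T * correction
    return total / (windows * N)

N, T = 10, 600.0
print(avg_solvetime(N, T, 100000))                          # ≈ T*N/(N-1) ≈ 666.7
print(avg_solvetime(N, T, 100000, correction=N / (N - 1)))  # ≈ 600
```

Without the correction the long-run average solvetime settles at T*N/(N-1); multiplying the target by N/(N-1) brings it back to T.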

Derivations

My derivation below reflects what I needed in order to understand it. Others explain it much more efficiently, in a language I don't grasp. Meni Rosenfeld's paper derives it, and Pieter Wuille's poll thread has an explanation. Pieter says it comes from
E(1/X) where X~ErlangDistribution(N,1)
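That expectation, E[1/X] for X ~ Erlang(N, 1), equals 1/(N-1). A quick Monte Carlo (my own check, not Pieter's) agrees:

```python
import random

def mean_inverse_erlang(N, trials, seed=2):
    """Monte Carlo estimate of E[1/X] for X ~ Erlang(N, 1),
    i.e. X is the sum of N unit-mean exponential variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += 1.0 / sum(rng.expovariate(1.0) for _ in range(N))
    return total / trials

print(mean_inverse_erlang(5, 500000))  # ≈ 1/(5-1) = 0.25
```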

Code to prove the error

Here is code Pieter wrote to show the error occurring in BTC. Note that line 34 sums the past 2015 solvetimes, skipping the first solvetime of the 2016-block window just like BTC, so the avg solvetime shows both the old and new errors combined: 600 * 2016/2015 * 2015/2014 ≈ 600.6 seconds.
https://gist.github.com/sipa/0523b510fcb7576511f0f670e0c6c0a5

A Similarly Unexpected Effect

The above N/(N-1) adjustment seems to be related to a surprising fact that 91% of Pieter Wuille's poll respondents missed. Given a mean time to occurrence of T = 1/λ, when you take a random sample of time 2*T you see on average only 1 complete occurrence. Another way to say it: if you pick a random point in time, the wait to the next block is T by the memoryless property. But memoryless is also history-less, so it works backwards in time as well: the time since the previous block is also T. Yet another way to say it: some solvetimes are long enough to offset the more frequent short solvetimes, so a random point in time often falls inside a long solvetime. See also Russell O'Connor's article. This extends to longer periods: if you take a random sample of time N*T, you will see on average only N-1 blocks.

T = average solvetime = 1/λ
avg of N solvetimes, timed from a block = T (so N blocks take N*T)
blocks seen in a random time window of length N*T = N-1 on average
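This length-biased sampling can be demonstrated directly (a sketch of mine): generate a long Poisson sequence of block arrivals, pick random points in time, and record the solvetime of the block each point lands inside. The average is 2*T, not T:

```python
import bisect
import itertools
import random

random.seed(3)
T = 600.0
gaps = [random.expovariate(1.0 / T) for _ in range(200000)]  # solvetimes, mean T
cum = list(itertools.accumulate(gaps))                       # block arrival times

picked = []
for _ in range(20000):
    u = random.random() * cum[-1]        # random point in time
    i = bisect.bisect_right(cum, u)      # index of the solvetime containing u
    picked.append(gaps[i])

print(sum(gaps) / len(gaps))      # ≈ T = 600 (plain average solvetime)
print(sum(picked) / len(picked))  # ≈ 2*T = 1200 (length-biased average)
```

A random point is more likely to land in a long solvetime, which is exactly why the interval it lands in averages 2*T.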

Deriving Exponential Distribution for PoW solvetimes

If the target is 1/1000th of 2^256, then each hash has a 1/1000 chance of being below the target. The p of a hash not that low is 1-1/1000 which means the p of not succeeding in a sequence of n hashes is (1-1/1000)^n. This can be rewritten as
p = (1 - (n/1000)/n)^n   (1)
For large n, the following is true:
p = (1 - c/n)^n = e^(-c)
We can use c = t * HR * target / 2^256 because the 1/1000 comes from target/2^256 and n = number of hashes = HR * t. So:

p = e^(- HR * t * target/2^256)   (2)
Let λ = HR * target/2^256   
p = e^(-t*λ)
CDF = 1- p = 1 - e^(-t*λ)
PDF = d(CDF)/dt =  λ*e^(-t*λ)  

p is the probability a block will arrive after time t, so 1-p = CDF is the probability it will arrive by t. This is the CDF of an exponential distribution. The form 1 - e^(-t*λ) is commonly seen in engineering and nature because it is the fraction of something that is blocked in time or space for a blocking rate of λ per unit of distance or time (in the same units as t). A larger hashrate means a larger λ, which keeps solves from taking long.
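The derivation can be checked by literally simulating hashing (my sketch): give each hash a 1/1000 success probability, count hashes until success, and compare the mean and the tail against the exponential prediction:

```python
import random

random.seed(4)
p = 1.0 / 1000      # chance that a single hash is below the target
trials = 5000

counts = []
for _ in range(trials):
    n = 1
    while random.random() >= p:   # hash "fails"; keep hashing
        n += 1
    counts.append(n)

mean_hashes = sum(counts) / trials             # ≈ 1/p = 1000
tail = sum(c > 2000 for c in counts) / trials  # ≈ e^(-2) ≈ 0.135
print(mean_hashes, tail)
```

The geometric hash count converges to the exponential distribution for small p, which is why P(no solve after 2000 hashes) ≈ e^(-2000*p) = e^(-2).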

[update in 2022: The following is difficult even for me to follow. I'm sure there's a much better way to present it.]

Derivation of a Difficulty Algorithm

A common way to express a difficulty algorithm that's needed to get avg solvetime = T is to skip most of the following and say
next_target = previous_target * λ_desired / λ_observed
or
next_target = previous_target * HR_expected / HR_observed
which is the same as eq 3 below when replacing HR with λ_observed * 2^256 / previous_target.

To derive a difficulty algorithm, our goal is to set a target that will result in an avg solvetime of T. We do this by applying the expected value integral to the PDF and setting it equal to T. This gives λ = 1/T = HR*target/2^256. Rearranging shows how we should set the target:
next_target = 2^256 / (HR*T) (3)
We need to estimate HR from previous target(s) and solvetime(s), but we need to be careful because the expected value equation is for all possible solvetimes t which we will see only in an infinite number of samples. We can't simply say HR = 2^256 / t / prev_target for a 1-block estimate. It's technically impossible to estimate HR from 1 sample (see below). More generally, we can't use HR = 2^256 / sum_t / sum_target because there is a finite number of samples. It's approximately correct for a large number of samples, if the target is not changing.

In a difficulty algorithm like this:
next_target = prev_target * λ_desired / λ_measured = prev_target * avg(t) / T
the avg(t) will be too small if our sample size is small. To correct for the error, use:
1/λ_measured = avg(t) * N / (N-1)
The math gives an infinity at N=1, so it's technically impossible to get an unbiased hashrate estimate from a single solvetime. But if no solvetimes are < 1 and solvetimes are integers with T=600, experimentally:
1/λ_measured = avg(t) * 6.5
For N=2, the correction factor is 2. See the last section for how this seems to be related to the memoryless property. That is, picking the start of a block to start a timer is the same as picking a random point in time. If you pick a random point in time, the current block began T in the past and the next block will occur T in the future. This means a randomly chosen block takes on average 2*T, due to slow solves offsetting the effect of fast solves.

The solvetime for a block is determined by setting rand() = 1-CDF = x and solving for t in the following:
1-CDF = x = e^(-t*HR*target/2^256)
t = -ln(x) * 2^256 / (HR * target) (5)

This entire issue comes down to avg(1/y) != 1/avg(y). When we apply a HR adjustment to the target many times in a row, we're applying avg_many( 1 / avg_N(-ln(x)) ), which equals N/(N-1). A simple way to confirm this for N=2 is to run the experiment, or to evaluate the definite double integral of 1/[(-ln(x)-ln(y))/2] for x and y, both from 0 to 1. So BTC-type difficulty algorithms (a fixed target for N blocks, adjusting only once per N blocks) require a modification to eq 3:
next_target = prev_target * (timespan/N) / T * N/(N-1) (8)
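The "run the experiment" check generalizes to any N with uniform draws (my sketch; note that for N=2 the integrand has infinite variance, so Monte Carlo convergence is slow):

```python
import math
import random

def mean_inverse_avg_neglog(N, trials, seed=5):
    """Monte Carlo estimate of E[ 1 / avg(-ln x_1, ..., -ln x_N) ]
    for independent uniforms x_i; the analytic value is N/(N-1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], avoiding log(0)
        s = sum(-math.log(1.0 - rng.random()) for _ in range(N))
        total += 1.0 / (s / N)
    return total / trials

print(mean_inverse_avg_neglog(2, 400000))  # ≈ 2/(2-1) = 2 (converges slowly)
print(mean_inverse_avg_neglog(5, 400000))  # ≈ 5/4 = 1.25
```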

BTW, more complex difficulty algorithms can change the target during the block based on the timestamp in the header. The correction factor can again be determined by using eq 5 where HR and target can be functions of t, then solve for t again to isolate the ln(x) function then integrate that from 0 to 1. This is the same as using the expectation integral on t*PDF from 0 to infinity because x is equally probable for all t's from 0 to 1.

Calculating Hashrate in General

As discussed above, if difficulty or target are constant:
HR = priorD / avg(t) * (N-1)/N
HR = 2^256/priorTarget / avg(t) * (N-1)/N
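A quick check of these formulas at constant difficulty (my sketch; names are mine, and units are chosen so the expected solvetime is D/HR): simulate many independent N-block samples at a known hashrate and compare the naive and corrected estimates:

```python
import random

def estimate_hashrate(HR_true, D, N, trials, seed=6):
    """Average the naive estimate HR = D/avg(t) over many independent
    N-block samples taken at constant difficulty D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(HR_true / D) for _ in range(N)]  # mean D/HR
        total += D / (sum(times) / N)
    return total / trials

HR, D, N = 1.0e6, 6.0e8, 10     # expected solvetime D/HR = 600 s
naive = estimate_hashrate(HR, D, N, 50000)
print(naive)                    # ≈ HR * N/(N-1), about 11% high for N=10
print(naive * (N - 1) / N)      # ≈ HR
```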
If D or target changes every block, the following is more accurate for determining hashrate, but the (N-1)/N correction should not be used in simple moving average difficulty algorithms: when you're adjusting every block, the N-sample windows overlap, so they're not independent samples. I tried [(N-1)/N]^(1/N) to account for this, but it was not accurate for smallish N.
HR = harmonicMeanD / avg(t) * (N-1)/N
HR = 2^256/avgTarget / avg(t) * (N-1)/N
The harmonic mean of D is used because of another problem caused by avg(1/x) != 1/avg(x): D = 2^256/target, but avg(D) != 2^256/avgTarget. You can show that
harmonicMeanD = 2^256/avgTarget
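This identity is exact, since harmonicMean(D) = N / sum(1/D_i) = N / sum(target_i/2^256) = 2^256/avgTarget. A quick numeric check (my sketch, with arbitrary random targets):

```python
import random

random.seed(7)
TWO256 = 2 ** 256
targets = [random.randrange(1, TWO256 // 1000) for _ in range(8)]
difficulties = [TWO256 / t for t in targets]   # D = 2^256 / target

harmonic_mean_D = len(difficulties) / sum(1.0 / d for d in difficulties)
avg_target = sum(targets) / len(targets)

print(harmonic_mean_D)          # equals 2^256 / avg_target up to float rounding
print(TWO256 / avg_target)
```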
