
A correction to BTC's difficulty algorithm #39

@zawy12


This discusses an accuracy error in fixed-window difficulty algorithms that was brought to my attention by Pieter Wuille's Twitter poll and its result. Meni Rosenfeld wrote a paper about it in 2016. It also shows that coins and pools are not calculating hashrate quite correctly, which makes a big difference if you're estimating hashrate from a small sample.

Large error due to not taking increasing hashrate into account
BTC difficulty adjustments have historically arrived almost a day before the scheduled 2-week target (coin emission is now about 7 months ahead of schedule) because hashrate usually increases (by an average of about 6% per two weeks) before the next adjustment is made, so emission has been about 6% faster every two weeks. This could be corrected by looking at the past 4 weeks and adding an adjustment for the slope, thereby taking into account the "acceleration" in hashrate instead of just its most recent "velocity".

Commonly known 2015/2016 error
It is well-known that, separate from the above "ramp error", the BTC code has an off-by-one error that would cause each 2016-block adjustment period (at constant hashrate) to take 2 weeks and 10 minutes. It comes from the code being effectively:
next_target = previous_target * sum_2015_solvetimes / (2016 * 600)
This is the same as:
next_target = previous_target * avg(2015 solvetimes) / 600 * 2015/2016
The 2015/2016 factor is the cause of the lower-than-intended target (higher-than-intended difficulty).

There is another ~10-minute error in the same direction as BTC's known code error. It applies to fixed-window difficulty algorithms. The error is much smaller in simple moving averages. I can't figure out how to analytically determine the adjustment for simple moving averages, but a correction I've experimentally found accurate down to N=4 is adjusted_target = target/(1 - 3/N^2).

Satoshi and pretty much everyone else thought for years that the next target (~1/difficulty) should be determined by:

T = desired block time
t = observed solvetimes
HR = hashrate
nextTarget = priorTarget * avg(t) / T = 2^256/(T*HR)

The above basis of all existing difficulty algorithms is wrong because, in estimating HR to adjust the target, you have N samples of previous solvetimes (t), but fast solvetimes are more probable (which is offset in an N=infinite average by some solvetimes being substantially longer). So when you take a smallish sample of N of them, you're more likely than not to have mostly fast blocks. This causes the difficulty calculation to overestimate the hashrate and thereby set the target too low (difficulty too high), which makes the resulting avg solvetime higher than your goal.
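As a sanity check on the "more likely than not" claim, a small Monte Carlo (my own sketch, not from the post; the function name is mine) can estimate how often the average of N exponential solvetimes lands below the true mean T:

```python
import random

def frac_below_mean(N, T, trials, seed=0):
    """Fraction of trials where the average of N exponential
    solvetimes (mean T) comes out below T."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        avg = sum(rng.expovariate(1.0 / T) for _ in range(N)) / N
        if avg < T:
            below += 1
    return below / trials

# With N=10, the sample average is below T noticeably more than half the time.
print(frac_below_mean(10, 600.0, 100000))  # ≈ 0.54
```

The sample average is still an unbiased estimate of T, but because its distribution is skewed, a small sample more often comes out fast than slow.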

The correction turns out to be simple, and it appears to be directly related to the memoryless property (see the bottom section "A Similarly Unexpected Effect"):
nextD = nextD * (N-1)/N
nextTarget = nextTarget * N/(N-1)

I tested it all the way down to N=2 with no discernible error. It has a problem at N=1.
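A quick simulation (my sketch; variable names are mine) of a fixed-window algorithm shows both the bias and the correction. Working in units where 2^256/HR = 1, the expected solvetime at a given target is simply 1/target:

```python
import random

def avg_solvetime(N, T, windows, correction=1.0, seed=1):
    """Fixed-window difficulty: retarget once every N blocks with the
    naive rule next_target = prev_target * avg(t)/T, optionally scaled
    by a correction factor. Returns the overall average solvetime."""
    rng = random.Random(seed)
    target = 1.0 / T              # units chosen so mean solvetime = 1/target
    total = 0.0
    for _ in range(windows):
        times = [rng.expovariate(target) for _ in range(N)]
        total += sum(times)
        target *= (sum(times) / N) / T * correction
    return total / (windows * N)

N, T = 10, 600.0
print(avg_solvetime(N, T, 100000))                          # ≈ T*N/(N-1) ≈ 666.7
print(avg_solvetime(N, T, 100000, correction=N / (N - 1)))  # ≈ 600
```

Without the correction the long-run average solvetime settles at T*N/(N-1); multiplying the target by N/(N-1) brings it back to T.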

Derivations

My derivation below reflects what I needed in order to understand it. Others explain it much more efficiently, in a language I don't grasp. Meni Rosenfeld's paper derives it, and Pieter Wuille's poll thread has an explanation. Pieter says it comes from
E(1/X) where X~ErlangDistribution(N,1)
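That expectation, E[1/X] for X ~ Erlang(N, 1), equals 1/(N-1). A quick Monte Carlo (my own check, not Pieter's) agrees:

```python
import random

def mean_inverse_erlang(N, trials, seed=2):
    """Monte Carlo estimate of E[1/X] for X ~ Erlang(N, 1),
    i.e. X is the sum of N unit-mean exponential variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += 1.0 / sum(rng.expovariate(1.0) for _ in range(N))
    return total / trials

print(mean_inverse_erlang(5, 500000))  # ≈ 1/(5-1) = 0.25
```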

Code to prove the error

Here is code Pieter wrote to show the error occurring in BTC. Note that line 34 sums the past 2015 solvetimes, skipping the first solvetime of the 2016-block window just like BTC, so the avg solvetime shows both the old and new errors combined: 600 * 2016/2015 * 2015/2014 ≈ 600.6 seconds.
https://gist.github.com/sipa/0523b510fcb7576511f0f670e0c6c0a5

A Similarly Unexpected Effect

The above N/(N-1) adjustment seems to be related to a surprising fact that 91% of Pieter Wuille's poll respondents missed. Given a mean time to occurrence of T = 1/λ, when you take a random sample of time 2*T you see on average only 1 complete occurrence. Another way to say it: if you pick a random point in time, the wait to the next block is T by the memoryless property. But memoryless is also history-less, so it works backwards in time as well: the time since the previous block is also T. Yet another way to say it: some solvetimes are long enough to offset the more frequent short solvetimes, so a random point in time often falls inside a long solvetime. See also Russell O'Connor's article. This extends to longer periods: if you take a random sample of time N*T, you will see on average only N-1 blocks.

T = average solvetime = 1/λ
avg of N solvetimes, timed from a block = T (so N blocks take N*T)
blocks seen in a random time window of length N*T = N-1 on average
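This length-biased sampling can be demonstrated directly (a sketch of mine): generate a long Poisson sequence of block arrivals, pick random points in time, and record the solvetime of the block each point lands inside. The average is 2*T, not T:

```python
import bisect
import itertools
import random

random.seed(3)
T = 600.0
gaps = [random.expovariate(1.0 / T) for _ in range(200000)]  # solvetimes, mean T
cum = list(itertools.accumulate(gaps))                       # block arrival times

picked = []
for _ in range(20000):
    u = random.random() * cum[-1]        # random point in time
    i = bisect.bisect_right(cum, u)      # index of the solvetime containing u
    picked.append(gaps[i])

print(sum(gaps) / len(gaps))      # ≈ T = 600 (plain average solvetime)
print(sum(picked) / len(picked))  # ≈ 2*T = 1200 (length-biased average)
```

A random point is more likely to land in a long solvetime, which is exactly why the interval it lands in averages 2*T.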

Deriving Exponential Distribution for PoW solvetimes

If the target is 1/1000th of 2^256, then each hash has a 1/1000 chance of being below the target. The p of a hash not that low is 1-1/1000 which means the p of not succeeding in a sequence of n hashes is (1-1/1000)^n. This can be rewritten as
p = (1 - (n/1000)/n)^n   (1)
For large n, the following is true:
p = (1 - c/n)^n = e^(-c)
We can use c = t * HR * target / 2^256 because the 1/1000 comes from target/2^256 and n = number of hashes = HR * t. So:

p = e^(- HR * t * target/2^256)   (2)
Let λ = HR * target/2^256   
p = e^(-t*λ)
CDF = 1- p = 1 - e^(-t*λ)
PDF = d(CDF)/dt =  λ*e^(-t*λ)  

p is the probability a block will arrive after time t, so 1-p = CDF is the probability it will arrive by t. This is the CDF of an exponential distribution. The form 1 - e^(-t*λ) is commonly seen in engineering and nature because it is the fraction of something that is blocked in time or space for a blocking rate of λ per unit of distance or time (in the same units as t). A larger hashrate means a larger λ, which keeps solves from taking long.
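The derivation can be checked by literally simulating hashing (my sketch): give each hash a 1/1000 success probability, count hashes until success, and compare the mean and the tail against the exponential prediction:

```python
import random

random.seed(4)
p = 1.0 / 1000      # chance that a single hash is below the target
trials = 5000

counts = []
for _ in range(trials):
    n = 1
    while random.random() >= p:   # hash "fails"; keep hashing
        n += 1
    counts.append(n)

mean_hashes = sum(counts) / trials             # ≈ 1/p = 1000
tail = sum(c > 2000 for c in counts) / trials  # ≈ e^(-2) ≈ 0.135
print(mean_hashes, tail)
```

The geometric hash count converges to the exponential distribution for small p, which is why P(no solve after 2000 hashes) ≈ e^(-2000*p) = e^(-2).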

[update in 2022: The following is difficult even for me to follow. I'm sure there's a much better way to present it.]

Derivation of a Difficulty Algorithm

A common way to express a difficulty algorithm that's needed to get avg solvetime = T is to skip most of the following and say
next_target = previous_target * λ_desired / λ_observed
or
next_target = previous_target * HR_expected / HR_observed
which is the same as eq 3 below when replacing HR with λ_observed * 2^256 / previous_target.

To derive a difficulty algorithm, our goal is to set a target that will result in an avg solvetime of T. We do this by applying the expected value integral to the PDF and setting it equal to T. This gives λ = 1/T = HR*target/2^256. Rearranging shows how we should set the target:
next_target = 2^256 / (HR*T) (3)
We need to estimate HR from previous target(s) and solvetime(s), but we need to be careful because the expected value equation is for all possible solvetimes t which we will see only in an infinite number of samples. We can't simply say HR = 2^256 / t / prev_target for a 1-block estimate. It's technically impossible to estimate HR from 1 sample (see below). More generally, we can't use HR = 2^256 / sum_t / sum_target because there is a finite number of samples. It's approximately correct for a large number of samples, if the target is not changing.

In a difficulty algorithm like this:
next_target = prev_target * λ_desired / λ_measured = prev_target * avg(t) / T
the avg(t) will be too small if our sample size is small. To correct for the error, use:
1/λ_measured = avg(t) * N / (N-1)
The math gives an infinity at N=1, so it's technically impossible to get an unbiased hashrate estimate from a single solvetime. But if no solvetimes are < 1 and solvetimes are integers with T=600, experimentally:
1/λ_measured = avg(t) * 6.5
For N=2, the correction factor is 2. See the last section for how this seems to be related to the memoryless property. That is, picking the start of a block to start a timer is the same as picking a random point in time. If you pick a random point in time, the current block began T in the past and the next block will occur T in the future. This means a randomly chosen block takes on average 2*T, due to slow solves offsetting the effect of fast solves.

The solvetime for a block is determined by setting rand() = 1-CDF = x and solving for t in the following:
1-CDF = x = e^(-t*HR*target/2^256)
t = -ln(x) * 2^256 / (HR * target) (5)

This entire issue comes down to avg(1/y) != 1/avg(y). When we apply a HR adjustment to the target many times in a row, we're applying avg_many( 1 / avg_N(-ln(x)) ), which equals N/(N-1). A simple way to confirm this for N=2 is to run the experiment, or to evaluate the definite double integral of 1/[(-ln(x)-ln(y))/2] for x and y, both from 0 to 1. So BTC-type difficulty algorithms (a fixed target for N blocks, adjusting only once per N blocks) require a modification to eq 3:
next_target = prev_target * (timespan/N) / T * N/(N-1) (8)
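The "run the experiment" check generalizes to any N with uniform draws (my sketch; note that for N=2 the integrand has infinite variance, so Monte Carlo convergence is slow):

```python
import math
import random

def mean_inverse_avg_neglog(N, trials, seed=5):
    """Monte Carlo estimate of E[ 1 / avg(-ln x_1, ..., -ln x_N) ]
    for independent uniforms x_i; the analytic value is N/(N-1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # 1 - random() lies in (0, 1], avoiding log(0)
        s = sum(-math.log(1.0 - rng.random()) for _ in range(N))
        total += 1.0 / (s / N)
    return total / trials

print(mean_inverse_avg_neglog(2, 400000))  # ≈ 2/(2-1) = 2 (converges slowly)
print(mean_inverse_avg_neglog(5, 400000))  # ≈ 5/4 = 1.25
```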

BTW, more complex difficulty algorithms can change the target during the block based on the timestamp in the header. The correction factor can again be determined by using eq 5 where HR and target can be functions of t, then solve for t again to isolate the ln(x) function then integrate that from 0 to 1. This is the same as using the expectation integral on t*PDF from 0 to infinity because x is equally probable for all t's from 0 to 1.

Calculating Hashrate in General

As discussed above, if difficulty or target are constant:
HR = priorD / avg(t) * (N-1)/N
HR = 2^256/priorTarget / avg(t) * (N-1)/N
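A quick check of these formulas at constant difficulty (my sketch; names are mine, and units are chosen so the expected solvetime is D/HR): simulate many independent N-block samples at a known hashrate and compare the naive and corrected estimates:

```python
import random

def estimate_hashrate(HR_true, D, N, trials, seed=6):
    """Average the naive estimate HR = D/avg(t) over many independent
    N-block samples taken at constant difficulty D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(HR_true / D) for _ in range(N)]  # mean D/HR
        total += D / (sum(times) / N)
    return total / trials

HR, D, N = 1.0e6, 6.0e8, 10     # expected solvetime D/HR = 600 s
naive = estimate_hashrate(HR, D, N, 50000)
print(naive)                    # ≈ HR * N/(N-1), about 11% high for N=10
print(naive * (N - 1) / N)      # ≈ HR
```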
If D or target changes every block, the following is more accurate for determining hashrate, but the (N-1)/N correction should not be used in simple moving average difficulty algorithms: when you're adjusting every block, the N-sample windows overlap, so they're not independent samples. I tried [(N-1)/N]^(1/N) to account for this, but it was not accurate for smallish N.
HR = harmonicMeanD / avg(t) * (N-1)/N
HR = 2^256/avgTarget / avg(t) * (N-1)/N
The harmonic mean of D is used because of another problem caused by avg(1/x) != 1/avg(x): D = 2^256/target, but avg(D) != 2^256/avgTarget. You can show that
harmonicMeanD = 2^256/avgTarget
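This identity is exact, since harmonicMean(D) = N / sum(1/D_i) = N / sum(target_i/2^256) = 2^256/avgTarget. A quick numeric check (my sketch, with arbitrary random targets):

```python
import random

random.seed(7)
TWO256 = 2 ** 256
targets = [random.randrange(1, TWO256 // 1000) for _ in range(8)]
difficulties = [TWO256 / t for t in targets]   # D = 2^256 / target

harmonic_mean_D = len(difficulties) / sum(1.0 / d for d in difficulties)
avg_target = sum(targets) / len(targets)

print(harmonic_mean_D)          # equals 2^256 / avg_target up to float rounding
print(TWO256 / avg_target)
```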
