Summary or problem description
When many elements are cached in the FIFOSet, its Add and ExceptWith operations become noticeably slower. This is very obvious during performance testing. In the TaskManager's OnRestartTasks (neo/src/neo/Network/P2P/TaskManager.cs, lines 140 to 142 in 44e7f3b):

private void OnRestartTasks(InvPayload payload)
{
    knownHashes.ExceptWith(payload.Hashes);
    ...

the ExceptWith method is used to exclude 16,000 of the 62,800 knownHashes. Each removal takes about 0.015 s, roughly 240 s in total, and the process gets stuck.
That means that under heavy load, if a node falls out of sync because it is missing a large number of transactions, fetching those transactions becomes extremely slow.
The root cause is the computational inefficiency of the underlying OrderedDictionary. The unit test below shows its performance with a large number of elements.
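To see why per-item removal is so expensive, consider how OrderedDictionary is built: it keeps an index list alongside its hash table, so every Remove must also delete the entry from that list and shift all later entries down, which is O(n) per removal. The standalone sketch below demonstrates this cost directly on OrderedDictionary; it is illustrative only and is not the FIFOSet code itself:

```csharp
using System;
using System.Collections.Specialized;
using System.Diagnostics;

// Illustrative sketch: OrderedDictionary pairs a hash table with an internal
// index list, so Remove(key) also deletes from that list and shifts every
// later entry down -- O(n) per removal in the number of cached elements.
class OrderedDictionaryRemovalCost
{
    static void Main()
    {
        var dict = new OrderedDictionary();
        for (int i = 0; i < 100_000; i++)
            dict.Add(i, null);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000; i++)
            dict.Remove(i); // each call shifts the ~100,000 remaining entries
        sw.Stop();

        Console.WriteLine($"1,000 removals from 100,000 entries: {sw.Elapsed.TotalSeconds}s");
        Console.WriteLine(dict.Count); // 99000
    }
}
```

This is the same shape of work ExceptWith performs once per excluded hash, which is why the cost grows with both the cache size and the number of hashes removed.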
[TestMethod]
public void TestFIFOSet()
{
    Stopwatch stopwatch = new Stopwatch();
    var bucket = new FIFOSet<int>(150_000);
    stopwatch.Start();
    for (int i = 1; i <= 550_000; i++)
    {
        bucket.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();
    var items = new int[10000];
    var value = 550_000;
    for (int i = 0; i <= 9999; i++)
    {
        items[i] = value;
        value -= 50;
    }
    stopwatch.Start();
    bucket.ExceptWith(items);
    stopwatch.Stop();
    Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();
    stopwatch.Start();
    var ret = bucket.Contains(140_000);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
    stopwatch.Start();
    ret = bucket.Contains(545_001);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
}
Console prints
Add timespan: 21.3015352s
except with timespan: 10.8901768s
contains with timespan: 0.0001552s result: False
contains with timespan: 5E-07s result: True
Do you have any solution you want to propose?
Currently, we only use FIFOSet to cache UInt256 hashes, and entries are evicted in insertion order. In this case the exact eviction order is not critical, so we can replace the OrderedDictionary with a list of HashSets: create a List containing multiple HashSets and store elements in them in chronological order. When the current HashSet is full, start filling the next one. When all HashSets are full, delete the first HashSet in the List and append a new empty HashSet at the end to hold the newly added elements.
The design of this class is as follows
using System;
using System.Collections;
using System.Collections.Generic;

public class HashSetCache<T> : IEnumerable<T> where T : IEquatable<T>
{
    private readonly int hashSetCapacity; // maximum number of elements in each HashSet
    private readonly List<HashSet<T>> sets = new List<HashSet<T>>();

    public int Size
    {
        get
        {
            int size = 0;
            foreach (var set in sets)
            {
                size += set.Count;
            }
            return size;
        }
    }

    public HashSetCache(int hashSetCapacity, int hashSetCount = 10)
    {
        if (hashSetCapacity <= 0) throw new ArgumentOutOfRangeException(nameof(hashSetCapacity));
        if (hashSetCount <= 0 || hashSetCount > 20) throw new ArgumentOutOfRangeException($"{nameof(hashSetCount)} should be between 1 and 20");
        this.hashSetCapacity = hashSetCapacity;
        for (int i = 0; i < hashSetCount; i++)
        {
            sets.Add(new HashSet<T>());
        }
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;
        foreach (var set in sets)
        {
            if (set.Count < hashSetCapacity)
            {
                return set.Add(item);
            }
        }
        // All sets are full: drop the oldest set and start a new one.
        sets.RemoveAt(0);
        var newSet = new HashSet<T>
        {
            item
        };
        sets.Add(newSet);
        return true;
    }

    public bool Contains(T item)
    {
        foreach (var set in sets)
        {
            if (set.Contains(item)) return true;
        }
        return false;
    }

    public void ExceptWith(IEnumerable<T> items)
    {
        foreach (var item in items)
        {
            foreach (var set in sets)
            {
                if (set.Remove(item)) break;
            }
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var set in sets)
        {
            foreach (var item in set)
            {
                yield return item;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
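As a quick sanity check of the eviction semantics, here is a small usage sketch. It assumes the HashSetCache<T> class above is in scope; the tiny capacities are chosen only to make eviction visible:

```csharp
// Assumes the HashSetCache<T> class defined above is in scope.
var cache = new HashSetCache<int>(2, 3); // 2 items per set, 3 sets => at most 6 items
for (int i = 1; i <= 6; i++)
    cache.Add(i);        // fills the three sets as {1,2}, {3,4}, {5,6}

cache.Add(7);            // all sets full: the oldest set {1,2} is dropped

Console.WriteLine(cache.Contains(1)); // False -- evicted with the oldest set
Console.WriteLine(cache.Contains(7)); // True
Console.WriteLine(cache.Size);        // 5 -- {3,4}, {5,6}, {7}
```

Note that a full rotation evicts a whole bucket at once rather than a single oldest element, which is acceptable here because, as stated above, the exact eviction order of cached hashes is not critical.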
The unit test shows this design is much faster than OrderedDictionary under the same input condition.
[TestMethod]
public void TestHashSetCache()
{
    Stopwatch stopwatch = new Stopwatch();
    var bucket = new HashSetCache<int>(15_000);
    stopwatch.Start();
    for (int i = 1; i <= 550_000; i++)
    {
        bucket.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();
    var items = new int[10000];
    var value = 550_000;
    for (int i = 0; i <= 9999; i++)
    {
        items[i] = value;
        value -= 50;
    }
    stopwatch.Start();
    bucket.ExceptWith(items);
    stopwatch.Stop();
    Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();
    stopwatch.Start();
    var ret = bucket.Contains(140_000);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
    stopwatch.Start();
    ret = bucket.Contains(545_001);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
}
Here is the console output
Add timespan: 0.2357612s
except with timespan: 0.0035483s
contains with timespan: 2.1E-06s result: False
contains with timespan: 4.6E-06s result: True
Neo Version
Where in the software does this update apply to?