-
-
Notifications
You must be signed in to change notification settings - Fork 12k
Description
I also did some basic performance tests in v2.0.0rc2 for the experimental StringDType but it seems to be much slower than "O" or "U" dtypes in certain cases, are you aware of these issues:
import numpy
import random
import timeit
print(numpy.__version__) # '2.0.0rc2'
options = ["a", "bb", "ccc", "dddd"]
lst = random.choices(options, k=1000)
arr_s = numpy.fromiter(lst, dtype="T", count=len(lst))
arr_o = numpy.fromiter(lst, dtype="O", count=len(lst))
arr_u = numpy.fromiter(lst, dtype="U5", count=len(lst))
print(timeit.timeit(lambda: numpy.unique(arr_s), number=10000)) # 4.270 <- why so much slower?
print(timeit.timeit(lambda: numpy.unique(arr_o), number=10000)) # 2.357
print(timeit.timeit(lambda: numpy.unique(arr_u), number=10000)) # 0.502
print(timeit.timeit(lambda: sorted(set(lst)), number=10000)) # 0.078Profiling (will include in a separate comment below) indicates that the overhead is from acquiring and releasing the allocator lock (it does it for every entry in the array because sorting is implemented using getitem).
We could improve performance for sorting specifically by adding a fast getitem path that assumes the allocator lock is already acquired and using it in the sorting implementation after acquiring the lock once.
Optimizing getitem and setitem would also help (maybe by adding a stringdtype scalar that interns the packed string).
Originally posted by @MarkusSintonen in #25693 (comment)