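Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.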
Reproducible Example
import threading
from concurrent.futures import ThreadPoolExecutor
from io import StringIO
from textwrap import dedent
import numpy as np
from pandas import read_csv
# From statsmodels.datasets.longley
longley_csv = StringIO(
dedent(
""""Obs","GNPDEFL","GNP","UNEMP","ARMED","POP","YEAR"
1,83,234289,2356,1590,107608,1947
2,88.5,259426,2325,1456,108632,1948
3,88.2,258054,3682,1616,109773,1949
4,89.5,284599,3351,1650,110929,1950
5,96.2,328975,2099,3099,112075,1951
6,98.1,346999,1932,3594,113270,1952
7,99,365385,1870,3547,115094,1953
8,100,363112,3578,3350,116219,1954
9,101.2,397469,2904,3048,117388,1955
10,104.6,419180,2822,2857,118734,1956
11,108.4,442769,2936,2798,120445,1957
12,110.8,444546,4681,2637,121950,1958
13,112.6,482704,3813,2552,123366,1959
14,114.2,502601,3931,2514,125368,1960
15,115.7,518173,4806,2572,127852,1961
16,116.9,554894,4007,2827,130081,1962
"""
)
)
data = read_csv(longley_csv).iloc[:, [1, 2, 3, 4, 5, 6]].astype(float)
n_threads = 4
b = threading.Barrier(n_threads)

def closure():
    # Release all threads at the same moment to maximize contention
    b.wait()
    np.testing.assert_array_equal(data, data)

with ThreadPoolExecutor(max_workers=n_threads) as tpe:
    futures = [tpe.submit(closure) for _ in range(n_threads)]
    [f.result() for f in futures]
Issue Description
@kumaraditya303 originally discovered this issue while working on free-threaded support for statsmodels, running their test suite under pytest-run-parallel, a pytest plugin that runs tests repeatedly in a thread pool to shake out thread safety issues.
I built pandas and all dependencies, including CPython, using thread sanitizer. See https://py-free-threading.github.io/thread_sanitizer/ for more about using thread sanitizer with Python projects. An annoying aspect of this is that you need to use the same compiler stack to compile everything. On a Mac it's kind of complicated to set this up for Pandas because of the dependency on SciPy, which needs OpenMP, which also means you can't use the Apple clang compilers to build everything. I shared instructions for doing this on the issue tracker our team uses to track ecosystem issues on the free-threaded build.
If you run the above script on the free-threaded build with Pandas compiled using TSan, you may see race reports like this:
Details
==================
WARNING: ThreadSanitizer: data race (pid=56827)
Write of size 4 at 0x0001101d0134 by thread T3:
#0 __pyx_f_6pandas_5_libs_5index_11IndexEngine__ensure_mapping_populated index.pyx.c:15822 (index.cpython-314t-darwin.so:arm64+0x2416c)
#1 __pyx_f_6pandas_5_libs_5index_11IndexEngine_get_loc index.pyx.c:12938 (index.cpython-314t-darwin.so:arm64+0x1c238)
#2 __pyx_pw_6pandas_5_libs_5index_11IndexEngine_3__contains__ index.pyx.c:12430 (index.cpython-314t-darwin.so:arm64+0x454ac)
#3 PySequence_Contains abstract.c:2235 (libpython3.14t.dylib:arm64+0x60844)
#4 _PyEval_EvalFrameDefault generated_cases.c.h:5017 (libpython3.14t.dylib:arm64+0x2849fc)
#5 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#6 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#7 maybe_call_special_one_arg typeobject.c:3036 (libpython3.14t.dylib:arm64+0x1886e8)
#8 slot_sq_contains typeobject.c:10003 (libpython3.14t.dylib:arm64+0x1abf90)
#9 PySequence_Contains abstract.c:2235 (libpython3.14t.dylib:arm64+0x60844)
#10 _PyEval_EvalFrameDefault generated_cases.c.h:5017 (libpython3.14t.dylib:arm64+0x2849fc)
#11 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#12 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#13 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#14 _Py_slot_tp_getattr_hook typeobject.c (libpython3.14t.dylib:arm64+0x196858)
#15 PyObject_GetOptionalAttr object.c (libpython3.14t.dylib:arm64+0x139fd0)
#16 PyArray_FromInterface <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b3430)
#17 _array_from_array_like <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b2bb4)
#18 PyArray_DiscoverDTypeAndShape_Recursive <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1854b0)
#19 PyArray_DiscoverDTypeAndShape <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x18655c)
#20 PyArray_FromAny_int <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b4cdc)
#21 PyArray_CheckFromAny_int <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b5f44)
#22 _array_fromobject_generic <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x2303ac)
#23 array_asanyarray <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x226f98)
#24 cfunction_vectorcall_FASTCALL_KEYWORDS methodobject.c:465 (libpython3.14t.dylib:arm64+0x12fe90)
#25 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#26 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27c1cc)
#27 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#28 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#29 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x909a4)
#30 context_run context.c:722 (libpython3.14t.dylib:arm64+0x2c21f8)
#31 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (libpython3.14t.dylib:arm64+0xa2fcc)
#32 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#33 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27c1cc)
#34 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#35 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#36 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x909a4)
#37 _PyObject_Call call.c:348 (libpython3.14t.dylib:arm64+0x8c228)
#38 PyObject_Call call.c:373 (libpython3.14t.dylib:arm64+0x8c2a0)
#39 thread_run _threadmodule.c:359 (libpython3.14t.dylib:arm64+0x416930)
#40 pythread_wrapper thread_pthread.h:242 (libpython3.14t.dylib:arm64+0x366d04)
Previous write of size 4 at 0x0001101d0134 by thread T6:
#0 __pyx_f_6pandas_5_libs_5index_11IndexEngine__ensure_mapping_populated index.pyx.c:15822 (index.cpython-314t-darwin.so:arm64+0x2416c)
#1 __pyx_f_6pandas_5_libs_5index_11IndexEngine_get_loc index.pyx.c:12938 (index.cpython-314t-darwin.so:arm64+0x1c238)
#2 __pyx_pw_6pandas_5_libs_5index_11IndexEngine_3__contains__ index.pyx.c:12430 (index.cpython-314t-darwin.so:arm64+0x454ac)
#3 PySequence_Contains abstract.c:2235 (libpython3.14t.dylib:arm64+0x60844)
#4 _PyEval_EvalFrameDefault generated_cases.c.h:5017 (libpython3.14t.dylib:arm64+0x2849fc)
#5 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#6 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#7 maybe_call_special_one_arg typeobject.c:3036 (libpython3.14t.dylib:arm64+0x1886e8)
#8 slot_sq_contains typeobject.c:10003 (libpython3.14t.dylib:arm64+0x1abf90)
#9 PySequence_Contains abstract.c:2235 (libpython3.14t.dylib:arm64+0x60844)
#10 _PyEval_EvalFrameDefault generated_cases.c.h:5017 (libpython3.14t.dylib:arm64+0x2849fc)
#11 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#12 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#13 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#14 _Py_slot_tp_getattr_hook typeobject.c (libpython3.14t.dylib:arm64+0x196858)
#15 PyObject_GetOptionalAttr object.c (libpython3.14t.dylib:arm64+0x139fd0)
#16 PyArray_FromInterface <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b3430)
#17 _array_from_array_like <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b2bb4)
#18 PyArray_DiscoverDTypeAndShape_Recursive <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1854b0)
#19 PyArray_DiscoverDTypeAndShape <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x18655c)
#20 PyArray_FromAny_int <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b4cdc)
#21 PyArray_CheckFromAny_int <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x1b5f44)
#22 _array_fromobject_generic <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x2303ac)
#23 array_asanyarray <null> (_multiarray_umath.cpython-314t-darwin.so:arm64+0x226f98)
#24 cfunction_vectorcall_FASTCALL_KEYWORDS methodobject.c:465 (libpython3.14t.dylib:arm64+0x12fe90)
#25 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#26 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27c1cc)
#27 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#28 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#29 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x909a4)
#30 context_run context.c:722 (libpython3.14t.dylib:arm64+0x2c21f8)
#31 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (libpython3.14t.dylib:arm64+0xa2fcc)
#32 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8bf74)
#33 _PyEval_EvalFrameDefault generated_cases.c.h:1619 (libpython3.14t.dylib:arm64+0x27c1cc)
#34 _PyEval_Vector ceval.c:2083 (libpython3.14t.dylib:arm64+0x277eac)
#35 _PyFunction_Vectorcall call.c (libpython3.14t.dylib:arm64+0x8c5b8)
#36 method_vectorcall classobject.c:73 (libpython3.14t.dylib:arm64+0x909a4)
#37 _PyObject_Call call.c:348 (libpython3.14t.dylib:arm64+0x8c228)
#38 PyObject_Call call.c:373 (libpython3.14t.dylib:arm64+0x8c2a0)
#39 thread_run _threadmodule.c:359 (libpython3.14t.dylib:arm64+0x416930)
#40 pythread_wrapper thread_pthread.h:242 (libpython3.14t.dylib:arm64+0x366d04)
Thread T3 (tid=106798504, running) created by main thread at:
#0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64+0x30e28)
#1 do_start_joinable_thread thread_pthread.h:289 (libpython3.14t.dylib:arm64+0x365964)
#2 PyThread_start_joinable_thread thread_pthread.h:331 (libpython3.14t.dylib:arm64+0x3657a8)
#3 do_start_new_thread _threadmodule.c:1868 (libpython3.14t.dylib:arm64+0x4165cc)
#4 thread_PyThread_start_joinable_thread _threadmodule.c:1991 (libpython3.14t.dylib:arm64+0x415298)
#5 cfunction_call methodobject.c:564 (libpython3.14t.dylib:arm64+0x130ad8)
#6 _PyObject_MakeTpCall call.c:242 (libpython3.14t.dylib:arm64+0x8b46c)
#7 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8c008)
#8 _PyEval_EvalFrameDefault generated_cases.c.h:2959 (libpython3.14t.dylib:arm64+0x27fb80)
#9 PyEval_EvalCode ceval.c:975 (libpython3.14t.dylib:arm64+0x277ac0)
#10 run_mod pythonrun.c:1459 (libpython3.14t.dylib:arm64+0x344e70)
#11 _PyRun_SimpleFileObject pythonrun.c:521 (libpython3.14t.dylib:arm64+0x340788)
#12 _PyRun_AnyFileObject pythonrun.c:81 (libpython3.14t.dylib:arm64+0x33fefc)
#13 pymain_run_file main.c:429 (libpython3.14t.dylib:arm64+0x380fd8)
#14 Py_RunMain main.c:775 (libpython3.14t.dylib:arm64+0x380370)
#15 pymain_main main.c:805 (libpython3.14t.dylib:arm64+0x380870)
#16 Py_BytesMain main.c:829 (libpython3.14t.dylib:arm64+0x380970)
#17 main <null> (python3.14:arm64+0x100000738)
Thread T6 (tid=106798507, running) created by main thread at:
#0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64+0x30e28)
#1 do_start_joinable_thread thread_pthread.h:289 (libpython3.14t.dylib:arm64+0x365964)
#2 PyThread_start_joinable_thread thread_pthread.h:331 (libpython3.14t.dylib:arm64+0x3657a8)
#3 do_start_new_thread _threadmodule.c:1868 (libpython3.14t.dylib:arm64+0x4165cc)
#4 thread_PyThread_start_joinable_thread _threadmodule.c:1991 (libpython3.14t.dylib:arm64+0x415298)
#5 cfunction_call methodobject.c:564 (libpython3.14t.dylib:arm64+0x130ad8)
#6 _PyObject_MakeTpCall call.c:242 (libpython3.14t.dylib:arm64+0x8b46c)
#7 PyObject_Vectorcall call.c:327 (libpython3.14t.dylib:arm64+0x8c008)
#8 _PyEval_EvalFrameDefault generated_cases.c.h:3227 (libpython3.14t.dylib:arm64+0x280b44)
#9 PyEval_EvalCode ceval.c:975 (libpython3.14t.dylib:arm64+0x277ac0)
#10 run_mod pythonrun.c:1459 (libpython3.14t.dylib:arm64+0x344e70)
#11 _PyRun_SimpleFileObject pythonrun.c:521 (libpython3.14t.dylib:arm64+0x340788)
#12 _PyRun_AnyFileObject pythonrun.c:81 (libpython3.14t.dylib:arm64+0x33fefc)
#13 pymain_run_file main.c:429 (libpython3.14t.dylib:arm64+0x380fd8)
#14 Py_RunMain main.c:775 (libpython3.14t.dylib:arm64+0x380370)
#15 pymain_main main.c:805 (libpython3.14t.dylib:arm64+0x380870)
#16 Py_BytesMain main.c:829 (libpython3.14t.dylib:arm64+0x380970)
#17 main <null> (python3.14:arm64+0x100000738)
SUMMARY: ThreadSanitizer: data race index.pyx.c:15822 in __pyx_f_6pandas_5_libs_5index_11IndexEngine__ensure_mapping_populated
==================
You may have to run the script repeatedly in a loop to see it. How likely the race is to trigger depends on CPU timing, so some machines may not hit it. I'm testing on an M3 Pro MacBook Pro.
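One way to do the "run it in a loop" step is a small driver that re-runs the script until a report shows up. This is only a sketch: `run_until_race` is a made-up helper name, `repro.py` is a placeholder for wherever you saved the script above, and it assumes TSan writes its reports to stderr (its default) with the `WARNING: ThreadSanitizer` prefix seen in the report above.

```python
import subprocess
import sys


def run_until_race(cmd, max_runs=100, marker="WARNING: ThreadSanitizer"):
    """Run `cmd` up to `max_runs` times.

    Returns (run_number, stderr_text) for the first run whose stderr
    contains `marker`, or None if no run produced it.
    """
    for attempt in range(1, max_runs + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if marker in proc.stderr:
            return attempt, proc.stderr
    return None


# Example call ("repro.py" is a placeholder file name):
#     hit = run_until_race([sys.executable, "repro.py"])
#     print("no report seen" if hit is None else hit[1])
```

`sys.executable` has to be the TSan-instrumented free-threaded interpreter for this to be useful, i.e. run the driver with that same build.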
The issue in this race is that the implementation of _ensure_mapping_populated isn't thread safe (pandas/_libs/index.pyx, lines 347 to 356 at ff0cd9a):
if not self.is_mapping_populated:

    values = self.values
    self.mapping = self._make_hash_table(len(values))
    self.mapping.map_locations(values, self.mask)

    if len(self.mapping) == len(values):
        self.unique = 1

    self.need_unique_check = 0
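To make the check-then-act problem concrete, here is a pure-Python toy with the same shape. It is not pandas code: `RacyLazyMapping`, `builders` and `count_populations` are invented for illustration, and the barrier inside the method exists only to force every thread into the window between the check and the write so the outcome is deterministic.

```python
import threading


class RacyLazyMapping:
    """Toy object with an unsynchronized lazy attribute (not pandas code)."""

    def __init__(self, values, gate):
        self.values = values
        self.mapping = None
        self.is_populated = False
        self.builders = []  # names of threads that ran the initializer
        self._gate = gate

    def ensure_populated(self):
        if not self.is_populated:  # check
            # Demo only: hold each thread here until all have passed the check.
            self._gate.wait()
            self.builders.append(threading.current_thread().name)
            self.mapping = {v: i for i, v in enumerate(self.values)}
            self.is_populated = True  # act


def count_populations(n_threads=2):
    """Return how many threads ended up running the initializer."""
    gate = threading.Barrier(n_threads, timeout=10)
    obj = RacyLazyMapping(["a", "b", "c"], gate)
    threads = [
        threading.Thread(target=obj.ensure_populated) for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(obj.builders)
```

Every thread sees the flag as unset before any of them sets it, so `count_populations(n)` returns `n` rather than 1. In this toy the cost is only repeated work; in the compiled extension the equivalent interleaving means several threads writing the same C-level fields without synchronization, which is presumably what TSan is flagging above.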
In addition, I've also seen races, in cases that involve mutating a DataFrame, when generating the _as_array attribute of BlockPlacement, which isn't thread safe (pandas/_libs/internals.pyx, lines 146 to 153 at 14dbf05):
if not self._has_array:
    start, stop, step, _ = slice_get_indices_ex(self._as_slice)
    # NOTE: this is the C-optimized equivalent of
    # `np.arange(start, stop, step, dtype=np.intp)`
    self._as_array = cnp.PyArray_Arange(start, stop, step, NPY_INTP)
    self._has_array = True

return self._as_array
@kumaraditya303 also saw races in the BlockManager implementation:
- get_slice: The assignment of _blknos in _slice_mgr_rows is not thread safe.
- _rebuild_blknos_and_blklocs: This lazily initializes _blknos, which is not thread safe.
- _ensure_has_slice: Called from a number of places; it lazily initializes _as_slice, which is not thread safe.
I think for all of these patterns, Pandas should be using a low-level single-initialization API that ensures only one thread can populate these lazy attributes: https://cython.readthedocs.io/en/latest/src/userguide/freethreading.html#use-c-for-low-level-synchronization-primitives. It might require using C++ mode in Cython to be able to use the C++ standard library. Unfortunately MSVC support for C standard library threading primitives is very recent and probably can't be relied on in general yet.
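As a rough illustration of what "single initialization" means, here is a pure-Python analogue using a lock with a re-check. It is a sketch of the pattern only, not the low-level API suggested above: the real fix would live in Cython and use a C or C++ primitive (C++'s `std::call_once` is one example of that kind of primitive). `OnceLazyMapping`, `builders` and `populate_concurrently` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class OnceLazyMapping:
    """Toy lazy attribute guarded by a lock (hypothetical, not pandas code)."""

    def __init__(self, values):
        self.values = values
        self.mapping = None
        self.is_populated = False
        self.builders = []  # names of threads that ran the initializer
        self._lock = threading.Lock()

    def ensure_populated(self):
        if self.is_populated:  # fast path once initialization has finished
            return
        with self._lock:
            if not self.is_populated:  # re-check while holding the lock
                self.builders.append(threading.current_thread().name)
                self.mapping = {v: i for i, v in enumerate(self.values)}
                self.is_populated = True  # set the flag last


def populate_concurrently(n_threads=8):
    """Hit ensure_populated() from n_threads at once; return (obj, results)."""
    obj = OnceLazyMapping(["a", "b", "c"])
    start = threading.Barrier(n_threads, timeout=10)

    def worker():
        start.wait()  # release all threads together
        obj.ensure_populated()
        return obj.mapping["c"]

    with ThreadPoolExecutor(max_workers=n_threads) as tpe:
        futures = [tpe.submit(worker) for _ in range(n_threads)]
        results = [f.result() for f in futures]
    return obj, results
```

However the threads interleave, only one of them runs the initializer and all of them see the finished mapping. The unlocked fast-path read is acceptable here because the interpreter synchronizes attribute access; in C or Cython that read would itself need to be atomic or handled by the once primitive.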
Expected Behavior
No data races reported by TSan.
Installed Versions
Details
INSTALLED VERSIONS
commit : 944c527
python : 3.14.2+
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Wed Nov 5 21:33:32 PST 2025; root:xnu-11417.140.69.705.2~1/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 3.0.0.dev0+2809.g944c527c0a
numpy : 2.5.0.dev0+git20260113.60057c0
dateutil : 2.9.0.post0
pip : 25.3
Cython : 3.2.4
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : 2026.1.0
html5lib : None
hypothesis : 6.142.2
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : N/A
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.2
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : 1.18.0.dev0+git20260113.591f83e
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None