A Clang plugin that parses every call to CPython's custom format functions
(PyErr_Format, PyUnicode_FromFormat, etc.), type-checks each argument
against its format specifier, and reports mismatches. Both standard C specs
(%s, %d, %lu, …) and CPython-specific specs (%R, %S, %A, %U,
%T, %N, %V) are understood.

| Tool | Version |
|---|---|
| clang / clang++ | 21 |
| cmake | ≥ 3.15 |
| Python | ≥ 3.11 (for run_checker.py) |
| compile_commands.json | generated by bear -- make or cmake |

LLVM 21 headers are expected at /usr/lib/llvm-21/lib/cmake/. Adjust the
HINTS paths in CMakeLists.txt if your installation differs.

    # From the CPython root
    cd Tools/py-format-checker
    mkdir -p build && cd build
    cmake ..
    make -j$(nproc)

This produces build/PyFormatChecker.so.
If you have not yet generated compile_commands.json for the CPython tree:

    # In the CPython root (after ./configure)
    pip install bear   # or: apt install bear
    bear -- make -j$(nproc)

To run the checker:

    # From the CPython root
    python3 Tools/py-format-checker/run_checker.py [output_file] [--jobs N]

run_checker.py replays every entry in compile_commands.json through
clang-21 with the plugin loaded, in parallel. Results are written to
Tools/py-format-checker/reports/py_format_report.txt by default (or to the path you supply as the first argument).
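
The replay step can be pictured with the following minimal sketch. It is not taken from run_checker.py: the helper name `build_plugin_command` and the choice to drop `-c` and `-o <file>` are assumptions made for illustration. Only the `-fsyntax-only` and `-fplugin=` Clang options and the compile_commands.json entry layout (`arguments` or `command`) are standard. Parallel execution is left out.

```python
import shlex


def build_plugin_command(entry, plugin_path, compiler="clang-21"):
    """Turn one compile_commands.json entry into a syntax-only clang call.

    Hypothetical helper: the real run_checker.py may build its command
    line differently.
    """
    # An entry carries either an "arguments" list or one "command" string.
    args = list(entry.get("arguments") or shlex.split(entry["command"]))
    out = [compiler]
    skip_next = False
    for arg in args[1:]:  # args[0] is the original compiler
        if skip_next:
            skip_next = False
            continue
        if arg == "-o":  # drop "-o <file>": no object file is wanted
            skip_next = True
            continue
        if arg == "-c":
            continue
        out.append(arg)
    out += ["-fsyntax-only", f"-fplugin={plugin_path}"]
    return out


entry = {
    "directory": "/src/cpython",
    "file": "Objects/unicodeobject.c",
    "arguments": ["gcc", "-c", "-I.", "-o", "Objects/unicodeobject.o",
                  "Objects/unicodeobject.c"],
}
cmd = build_plugin_command(entry, "build/PyFormatChecker.so")
print(shlex.join(cmd))
```

Each resulting command would be run from the entry's `directory`, for example through a `concurrent.futures` worker pool.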

| Flag | Default | Description |
|---|---|---|
| output_file | reports/py_format_report[_<target>].txt | Output file (auto-named by target) |
| --jobs N | cpu_count | Parallel workers |
| --plugin PATH | build/PyFormatChecker.so | Override plugin path |
| --db PATH | compile_commands.json | Override compilation database |
| --target TRIPLE | host | Clang target triple for type-size checks |
| --verbose | off | Print all call-sites (sets PY_FMT_ERROR_ONLY=0) |

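
As an illustration of how these flags and the auto-named output file fit together, here is a small argparse sketch. It mirrors the table but is an assumption about the script's structure, not a copy of run_checker.py; `default_report_name` and `make_parser` are hypothetical names.

```python
import argparse
import os


def default_report_name(target=None):
    """reports/py_format_report[_<target>].txt, as described in the table."""
    suffix = f"_{target}" if target else ""
    return f"reports/py_format_report{suffix}.txt"


def make_parser():
    # Hypothetical parser that mirrors the documented flags.
    p = argparse.ArgumentParser(prog="run_checker.py")
    p.add_argument("output_file", nargs="?", default=None)
    p.add_argument("--jobs", type=int, default=os.cpu_count())
    p.add_argument("--plugin", default="build/PyFormatChecker.so")
    p.add_argument("--db", default="compile_commands.json")
    p.add_argument("--target", default=None)
    p.add_argument("--verbose", action="store_true")
    return p


args = make_parser().parse_args(["--target", "i686-linux-gnu", "--jobs", "4"])
output = args.output_file or default_report_name(args.target)
print(output)  # reports/py_format_report_i686-linux-gnu.txt
```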

| Variable | Default | Description |
|---|---|---|
| PY_FMT_ERROR_ONLY | 1 | Set to 0 to print all call-sites, not just those with at least one mismatch |
| PY_FMT_INTEGRAL_CHECK_MODE | standard | Integer width/sign checking. off: accept any integer; standard: bit-width must match (C99, signedness ignored); full: both bit-width and signedness must match |

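
The three PY_FMT_INTEGRAL_CHECK_MODE settings can be summarised by the sketch below. It models a type as a hypothetical `(bit_width, is_signed)` pair and only restates the rules from the table; the plugin's actual comparison happens on Clang types.

```python
def integer_matches(mode, got, want):
    """got and want are (bit_width, is_signed) pairs (illustrative model)."""
    if mode == "off":
        return True                # accept any integer
    if mode == "standard":
        return got[0] == want[0]   # width must match, signedness ignored
    if mode == "full":
        return got == want         # width and signedness must match
    raise ValueError(f"unknown mode: {mode}")


# A 32-bit unsigned int passed where %d expects a 32-bit signed int.
got, want = (32, False), (32, True)
for mode in ("off", "standard", "full"):
    print(mode, integer_matches(mode, got, want))
```

With these inputs only `full` rejects the argument, because the widths agree and only the signedness differs.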

| Status | Meaning |
|---|---|
| ok | Type matches the spec |
| MISMATCH got=X want=<sentinel> | Wrong type |
| MISSING_ARG want=<sentinel> | Fewer arguments than format specs |
| SURPLUS N arg(s) | More arguments than format specs |
| UNKNOWN_SPEC | Unrecognized/unsupported format spec (e.g. %y) |

<sentinel> is either a standard C type (e.g. long, const char *) or a special placeholder like <PyObject*> or <any-int> that the plugin understands and checks for. See the source for the full list of supported sentinels and their checks.
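
To make the status values concrete, the sketch below pairs a list of expected sentinels with the types of the arguments actually passed. It is a simplified model: types are compared as plain strings, UNKNOWN_SPEC is not modelled, and the function name `classify` is hypothetical.

```python
def classify(wants, gots):
    """Return one status string per expected argument, plus SURPLUS if any."""
    lines = []
    for i, want in enumerate(wants):
        if i >= len(gots):
            lines.append(f"MISSING_ARG want={want}")
        elif gots[i] == want:          # string equality stands in for a type check
            lines.append("ok")
        else:
            lines.append(f"MISMATCH got={gots[i]} want={want}")
    surplus = len(gots) - len(wants)
    if surplus > 0:
        lines.append(f"SURPLUS {surplus} arg(s)")
    return lines


# "%s ... %ld" called with (const char *, int, int)
for line in classify(["const char *", "long"], ["const char *", "int", "int"]):
    print(line)
```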
When a mismatch involves an integer conversion spec (%d, %i, %u, %o,
%x, %X) the plugin emits a hint: line showing the corrected
format string with the proper length modifier (e.g. %d → %zd for a
Py_ssize_t argument). In full mode the conversion character is also
corrected for signedness (e.g. %d → %u for an unsigned type).
Passing --target changes how Clang resolves type sizes without requiring the
plugin itself to be cross-compiled. This is useful for catching bugs (such as
using %ld for an off_t that is 64-bit even on 32-bit Linux when
_FILE_OFFSET_BITS=64 is set) that are invisible on a 64-bit host.
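
The off_t case comes down to the data model of each target. The sketch below hard-codes the well-known bit widths for LP64 (x86_64) and ILP32 (i686) Linux, with off_t assumed to be built with _FILE_OFFSET_BITS=64; the table and helper are illustrative, since the plugin asks Clang for the real sizes.

```python
# Bit widths per target; off_t assumes _FILE_OFFSET_BITS=64.
TYPE_BITS = {
    "x86_64-linux-gnu": {"int": 32, "long": 64, "long long": 64, "off_t": 64},
    "i686-linux-gnu":   {"int": 32, "long": 32, "long long": 64, "off_t": 64},
}


def ld_matches(target, arg_type):
    """True if arg_type is as wide as long, the type %ld expects."""
    bits = TYPE_BITS[target]
    return bits[arg_type] == bits["long"]


for target in TYPE_BITS:
    print(target, "ok" if ld_matches(target, "off_t") else "MISMATCH")
```

The same call site is reported as ok for the 64-bit host and as a mismatch for i686, which is why a second run with --target is worthwhile.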
Checking an i686 target requires the 32-bit system headers:

    sudo apt install gcc-multilib libc6-dev-i386

Without these, Clang cannot resolve system typedefs (int64_t, size_t, …)
for the i686 target and the checker will report spurious <dependent type>
mismatches instead of real type names.

    python3 Tools/py-format-checker/run_checker.py \
        --target=i686-linux-gnu -j$(nproc)
    # Output: Tools/py-format-checker/reports/py_format_report_i686-linux-gnu.txt

Compare 64-bit and 32-bit results:

    python3 Tools/py-format-checker/run_checker.py -j$(nproc)
    # Output: reports/py_format_report.txt (host/64-bit)
    diff Tools/py-format-checker/reports/py_format_report.txt \
        Tools/py-format-checker/reports/py_format_report_i686-linux-gnu.txt

The following functions are recognised. Static/file-local helpers include a filename filter so calls from unrelated translation units are not matched.

| Function | Format arg | File constraint |
|---|---|---|
| _PyErr_FormatNote | 0 | any |
| PyUnicode_FromFormat | 0 | any |
| PySys_FormatStdout / PySys_FormatStderr | 0 | any |
| PyErr_Format | 1 | any |
| _PyErr_FormatFromCause | 1 | any |
| _Py_FatalErrorFormat | 1 | any |
| PyUnicodeWriter_Format | 1 | any |
| PyBytesWriter_Format | 1 | any |
| _PyXIData_FormatNotShareableError | 1 | any |
| _abiinfo_raise | 1 | modsupport.c |
| _PyTokenizer_syntaxerror | 1 | any |
| _PyErr_Format | 2 | any |
| _PyErr_FormatFromCauseTstate | 2 | any |
| PyErr_WarnFormat | 2 | any |
| PyErr_ResourceWarning | 2 | any |
| _PyCompile_Error / _PyCompile_Warn | 2 | any |
| _PyTokenizer_parser_warn | 2 | any |
| task_set_error_soon | 3 | _asynciomodule.c |
| format_notshareableerror | 3 | crossinterp |
| _PyTokenizer_syntaxerror_known_range | 3 | any |
| PyErr_WarnExplicitFormat | 5 | any |

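
The lookup with its filename filter can be sketched like this. The dictionary holds a few rows from the table above; the function `format_arg_index` and the plain substring test are a simplified stand-in for whatever the C++ plugin does internally.

```python
# name -> (0-based index of the format argument, filename substring or None)
FORMAT_FUNCS = {
    "PyUnicode_FromFormat": (0, None),
    "PyErr_Format": (1, None),
    "_abiinfo_raise": (1, "modsupport.c"),
    "task_set_error_soon": (3, "_asynciomodule.c"),
}


def format_arg_index(func_name, source_path):
    """Return the format-string index, or None if the call is not checked."""
    entry = FORMAT_FUNCS.get(func_name)
    if entry is None:
        return None
    index, file_filter = entry
    # File-local helpers only match inside their own translation unit.
    if file_filter is not None and file_filter not in source_path:
        return None
    return index


print(format_arg_index("PyErr_Format", "Objects/abstract.c"))
print(format_arg_index("task_set_error_soon", "Modules/_asynciomodule.c"))
print(format_arg_index("task_set_error_soon", "Modules/other.c"))
```

The third call returns None: a function that happens to share the helper's name in an unrelated file is ignored.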

| Spec | Expected C type |
|---|---|
| %R, %S, %A, %U, %T, %#T | PyObject * (any Py*-typed pointer) |
| %N, %#N | PyTypeObject * |
| %V | PyObject * + const char * (two arguments) |
| %lV | PyObject * + const wchar_t * |
| %ls | const wchar_t * |

Standard specs (%s, %d, %u, %ld, %zd, %p, %x, %o, …) are also supported,
as are * width/precision arguments.
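
How many arguments each spec consumes follows directly from the rules above: %V and %lV take two, and every * in the width or precision position takes one extra int. The regex-based sketch below illustrates that counting; it is a rough approximation written for this README, not the plugin's parser, and the name `count_args` is hypothetical.

```python
import re

SPEC = re.compile(
    r"%(?P<flags>[-#0]*)"
    r"(?P<width>\*|\d+)?"
    r"(?:\.(?P<prec>\*|\d+))?"
    r"(?P<len>hh|h|ll|l|z|j|t)?"
    r"(?P<conv>[a-zA-Z%])"
)


def count_args(fmt):
    """Return [(spec, n_args)] for each conversion found in fmt."""
    result = []
    for m in SPEC.finditer(fmt):
        if m.group("conv") == "%":
            continue  # "%%" is a literal percent sign
        n = 2 if m.group("conv") == "V" else 1
        # Each "*" consumes one additional int argument.
        n += (m.group("width") == "*") + (m.group("prec") == "*")
        result.append((m.group(0), n))
    return result


print(count_args("%.*s: %V and %R (%zd) 100%%"))
```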
An argument satisfies <PyObject*> if any of the following hold:
- Its pointee typedef name starts with Py or _Py (covers all public API types before canonical unwrapping).
- Its pointee struct's first field is named ob_base (a structural PyObject_HEAD / PyObject_VAR_HEAD check; this covers internal types such as TaskObj, FutureObj, buffered, ElementObject, etc. without maintaining an explicit name list).
An argument satisfies <PyTypeObject*> if any of the following hold:
- Its pointee typedef name is PyTypeObject (matches the public typedef before canonical unwrapping).
- Its pointee struct's name is _typeobject.
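
Both sentinel rules reduce to simple predicates over the pointee type. The sketch below expresses them over plain strings; in the plugin the typedef name, struct name and first field come from the Clang AST, and the function names here are hypothetical.

```python
def satisfies_pyobject(typedef_name, first_field):
    """<PyObject*>: Py/_Py typedef prefix, or a struct starting with ob_base."""
    if typedef_name and typedef_name.startswith(("Py", "_Py")):
        return True
    return first_field == "ob_base"


def satisfies_pytypeobject(typedef_name, struct_name):
    """<PyTypeObject*>: the public typedef or the underlying struct tag."""
    return typedef_name == "PyTypeObject" or struct_name == "_typeobject"


print(satisfies_pyobject("PyListObject", None))      # True
print(satisfies_pyobject("TaskObj", "ob_base"))      # True
print(satisfies_pyobject("FILE", "_flags"))          # False
print(satisfies_pytypeobject(None, "_typeobject"))   # True
```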
C enum types have an implementation-defined underlying integer type. The
plugin resolves any enum argument to its compiler-chosen underlying integer
type before performing width and signedness checks, so e.g. an enum backed
by unsigned long is correctly matched against %lu and not %u. Incomplete
enums (no underlying type yet) are accepted to avoid false positives.
Edit the kFormatFuncs map near the top of py_format_checker.cpp:

    {"my_format_helper", {2, "mymodule.c"}},
    //                    ^  ^
    //                    |  optional filename substring; nullptr = any file
    //                    0-based index of the format-string argument

If the function takes a va_list instead of ..., add its name to
kVaListFuncs as well (the format string is still parsed, but individual
argument types cannot be checked).
Rebuild the plugin after any source change:

    cd Tools/py-format-checker/build && make -j$(nproc)