
scripts/fix_allowlists.py: bulk-shift heuristic misses entries on multi-edit sequences #606

@aallan

Description

Summary

scripts/fix_allowlists.py --fix is supposed to re-anchor stale line-number-keyed allowlist entries after edits to a documentation file shift its line numbers. In practice, when a file receives multiple edits at different positions in a single session, the bulk-shift heuristic does not reliably re-anchor entries: it misses some and can leave the allowlist in an inconsistent state without reporting an error.

Concrete failure mode

This was hit twice during the PR #601 workflow, both times during routine edits to SKILL.md:

  1. Adding 3 lines at line 626 (an array_fold comment block). The script auto-shifted most subsequent entries by +3 but missed the entry at "Array built-in examples (line 749 → 752)", which had to be patched by hand.
  2. Adding 13 lines at line 415 (a new "Tuples" subsection). The script bulk-shifted some entries by +13 but left others stale. Several entries pointed at lines that no longer contained vera blocks, and one entry was suppressing a parseable example.

In total the workflow needed to: (a) run fix_allowlists.py --fix, (b) write a custom Python script to bump 49 entries by +13, (c) hand-patch two stragglers the bulk script missed (#849 JSON, #1783 ANSI), and (d) delete one redundant entry surfaced by the shift (Non-exhaustive Match section). One mis-anchored entry (#1905 "Wrong: bare @Int + @Int") was caught only by CodeRabbit during PR review: --fix had silently moved the description to the wrong block.
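Independent of how re-anchoring is done, --fix could refuse to finish silently when an entry no longer points at a vera block. A minimal sketch of such a post-fix check follows. It assumes vera examples in SKILL.md are Markdown fences whose info string starts with `vera`, and that each allowlist entry is a hypothetical (line, description) pair whose line is the opening fence; neither assumption is taken from fix_allowlists.py itself.

```python
FENCE = "`" * 3  # a Markdown code fence, built up to avoid a literal fence in this snippet


def vera_block_starts(text: str) -> set[int]:
    """Return the 1-indexed line numbers of every opening fence tagged 'vera'."""
    starts: set[int] = set()
    inside = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside:
            # Any fence line seen while inside a block closes that block.
            inside = False
            continue
        inside = True
        info = stripped[len(FENCE):].split()
        if info and info[0] == "vera":
            starts.add(lineno)
    return starts


def stale_entries(text: str, entries: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return the (hypothetical) entries whose line no longer opens a vera block."""
    starts = vera_block_starts(text)
    return [(line, desc) for line, desc in entries if line not in starts]


doc = "\n".join(["intro", FENCE + "vera", "foo", FENCE, "text", FENCE + "vera", "bar", FENCE])
print(stale_entries(doc, [(2, "first example"), (5, "moved example")]))  # → [(5, 'moved example')]
```

Inside --fix, a non-empty result could be turned into a non-zero exit status rather than an allowlist that is silently out of step with the file.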

Likely root cause

The current heuristic appears to be position-based (line-offset bumping). When two edits land at different positions between runs, entries between the two edit points need a different offset from entries below both, so no single bulk offset is right for all of them, and the script cannot tell which edit applies to which entry. A content-based anchor (find each block by hashing the surrounding ~5 lines, then re-emit the line number from the live file) would be robust to multi-edit sequences.
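To make the suspected failure concrete: when insertions are applied one at a time, each entry's correct new position depends on which edits landed above it. The sketch below uses hypothetical line numbers and handles insertions only; its `bulk_shift` illustrates a single-offset heuristic in general and is not a copy of what fix_allowlists.py does.

```python
def reanchor(lines: list[int], edits: list[tuple[int, int]]) -> list[int]:
    """Apply insertion edits in order. Each edit is (at, count) in the line
    numbering of the file as it stood when that edit was made; every line
    at or after `at` moves down by `count`."""
    current = list(lines)
    for at, count in edits:
        current = [n + count if n >= at else n for n in current]
    return current


def bulk_shift(lines: list[int], at: int, count: int) -> list[int]:
    """A single-offset shift: everything at or after `at` moves by `count`."""
    return [n + count if n >= at else n for n in lines]


entries = [100, 500, 800]
edits = [(300, 3), (600, 13)]  # +3 lines at line 300, then +13 lines at line 600

print(reanchor(entries, edits))      # → [100, 503, 816]
print(bulk_shift(entries, 300, 16))  # → [100, 516, 816]: the entry at 500 overshoots by 13
```

Even the per-edit version only works if the exact sequence of edits is known, which a script run after the fact would have to reconstruct; that is the case for anchoring on content instead.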

Suggested investigation

  1. Audit the existing logic: what does fix_allowlists.py actually do when two edits land between runs?
  2. Replace line-offset bumping with content-fingerprint anchoring: each entry stores a small hash of the block content (or the surrounding context); the script re-derives line numbers by searching the live file for matching hashes.
  3. Consider whether the allowlist should be keyed by content hash directly (instead of line number): slower to look up, but immune to line shifts (an edit to the block itself would still change its hash).
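Suggestions 2 and 3 could look roughly like the sketch below. It assumes each entry stores a hypothetical `hash` field: a truncated SHA-256 of the `context` lines starting at the entry's line. On each run the script hashes every window of the live file and re-derives the line number; entries with no match or more than one match are reported rather than guessed. The field names, window size, and window placement are illustrative, not taken from fix_allowlists.py.

```python
import hashlib


def fingerprint(lines: list[str], start: int, context: int = 5) -> str:
    """Hash `context` lines beginning at 0-indexed `start` (trailing spaces ignored)."""
    window = "\n".join(line.rstrip() for line in lines[start:start + context])
    return hashlib.sha256(window.encode("utf-8")).hexdigest()[:16]


def make_entry(text: str, lineno: int, description: str, context: int = 5) -> dict:
    """Build a hypothetical allowlist entry anchored at 1-indexed `lineno`."""
    lines = text.splitlines()
    return {"line": lineno, "hash": fingerprint(lines, lineno - 1, context),
            "description": description}


def reanchor_by_hash(text: str, entries: list[dict], context: int = 5):
    """Re-derive each entry's line number by searching the live file for its hash.

    Returns (fixed, unresolved); unresolved holds entries whose hash matched
    zero or several windows, so the caller can fail loudly instead of guessing.
    """
    lines = text.splitlines()
    positions: dict[str, list[int]] = {}
    for i in range(len(lines)):
        positions.setdefault(fingerprint(lines, i, context), []).append(i + 1)
    fixed, unresolved = [], []
    for entry in entries:
        matches = positions.get(entry["hash"], [])
        if len(matches) == 1:
            fixed.append({**entry, "line": matches[0]})
        else:
            unresolved.append(entry)
    return fixed, unresolved


old = "\n".join(["intro", "BLOCK", "body1", "body2", "end", "outro"])
entry = make_entry(old, 2, "example block", context=3)

new = "\n".join(["intro", "added1", "added2", "BLOCK", "body1", "body2", "end", "outro"])
fixed, unresolved = reanchor_by_hash(new, [entry], context=3)
print(fixed[0]["line"], unresolved)  # → 4 []
```

If the allowlist were keyed by the hash alone, the `line` field would become derived output; an entry whose block was itself edited would land in `unresolved` and need a human decision.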

Acceptance

Related

  • #601 — PR where this surfaced; mis-anchored entry was caught by CodeRabbit, not by the script.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), tooling (Issue around tooling built for the language, e.g. package managers, IDE plug-ins)
