
Comparing changes

base repository: python/cpython
base: main
head repository: eendebakpt/cpython
compare: gh-149079-find-nfc-index
  • 1 commit
  • 1 file changed
  • 2 contributors

Commits on Jun 2, 2026

  1. gh-149079: Speed up find_nfc_index in unicodedata composition

    find_nfc_index() linearly scanned the nfc_first / nfc_last reindex tables
    (nfc_first has ~225 entries) for every starter character during NFC/NFKC
    composition. The tables are sorted ascending by .start with disjoint
    [start, start+count] ranges, so replace the scan with a galloping
    (exponential + binary) search: it stays as cheap as the linear scan for
    low-codepoint starters near the front of the table (e.g. Latin, the common
    case) while being logarithmic for codepoints deeper in the table.
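
The galloping search described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the `struct reindex` layout, the function name `gallop_find`, and the explicit length parameter `n` are assumptions made for the example. It relies only on the properties stated in the message (entries sorted ascending by `.start`, disjoint inclusive ranges `[start, start+count]`).

```c
#include <stddef.h>

/* Hypothetical stand-in for a reindex entry: it covers the inclusive
   codepoint range [start, start + count] and maps it to consecutive
   indices beginning at `index`. */
struct reindex {
    unsigned int start;
    unsigned int count;
    unsigned int index;
};

/* Galloping search over `n` entries sorted ascending by .start with
   disjoint ranges. Returns the mapped index, or -1 if `code` falls in
   no range. */
int
gallop_find(const struct reindex *table, size_t n, unsigned int code)
{
    if (n == 0 || code < table[0].start) {
        return -1;
    }

    /* Exponential phase: double `hi` until table[hi].start > code or the
       end of the table is passed. Invariant: table[lo].start <= code. */
    size_t lo = 0, hi = 1;
    while (hi < n && table[hi].start <= code) {
        lo = hi;
        hi *= 2;
    }
    if (hi > n) {
        hi = n;
    }

    /* Binary phase: narrow [lo, hi) down to the last entry whose
       start is <= code. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= code) {
            lo = mid;
        }
        else {
            hi = mid;
        }
    }

    /* table[lo] is the only candidate; check its inclusive upper bound. */
    unsigned int delta = code - table[lo].start;
    if (delta <= table[lo].count) {
        return (int)(table[lo].index + delta);
    }
    return -1;
}
```

For a codepoint that belongs to one of the first few entries, the exponential phase stops after one or two comparisons, which is why the cost stays close to a linear scan at the front of the table; a codepoint at position k takes roughly 2*log2(k) comparisons.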
    
    Interleaved benchmark of unicodedata.normalize('NFC', ...) on decomposed
    input (geometric mean 1.80x faster):
      combining-mark runs   1.88x faster
      decomposed Latin      ~flat (no regression)
      decomposed Greek      2.83x faster
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    eendebakpt and claude committed Jun 2, 2026
    Commit a6ce109