Store collector data in own cache, emit only hashes to PHPStan#319

Merged
janedbal merged 11 commits into master from custom-usage-cache on Mar 30, 2026

@janedbal (Member) commented Mar 23, 2026

PHPStan's result cache can grow very large (up to 1 GB for large projects) because DCD collectors emit many JSON-serialized usage strings.

Introduce UsageCacheStorage that stores full collector data in %tmpDir%/dcd/ using content-addressable files. Collectors now emit only md5 hashes through PHPStan's mechanism, dramatically shrinking the result cache. Re-introduce class-boundary batching (flush at ClassMethodsNode) to further reduce the number of emits. GC of orphaned cache files runs automatically after each full analysis.
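
For readers unfamiliar with content-addressable storage, here is a minimal language-neutral sketch in Python. It is not the PHP code from this PR: `store_payload`, `load_payload`, the flat `<hash>.dat` layout, and the sample usage string are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def store_payload(cache_dir: Path, payload: str) -> str:
    """Write payload to a file named after its md5 and return the hash."""
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{digest}.dat").write_text(payload, encoding="utf-8")
    return digest  # only this short string would be emitted to the result cache


def load_payload(cache_dir: Path, digest: str) -> str:
    """Resolve a previously emitted hash back to the full payload."""
    return (cache_dir / f"{digest}.dat").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "dcd"
    usage = '{"caller": "Foo::bar", "callee": "Baz::qux"}'  # made-up usage string
    digest = store_payload(cache_dir, usage)
    roundtrip = load_payload(cache_dir, digest)
```

The emitted value is always a 32-character hex string regardless of payload size, which is where the result cache saving would come from.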

Inspired by my idea in phpstan/phpstan#14074 (comment)

@janedbal (Member, Author) commented Mar 23, 2026

Result cache size measurements:

  • shipmonk core (23k files)
    • Before: 840 MB
    • After: 360 MB (-57%)
  • phpstan-src (2k files)
    • Before: 69 MB
    • After: 31 MB (-55%)

@janedbal (Member, Author) commented Mar 26, 2026

shipmonk core:

First Full Run

| Metric | Before | After | Change | % Change |
|---|---|---|---|---|
| Elapsed Time | 4 min 3 s | 4 min 2 s | -1 s | -0.41% |
| Used Memory | 26.27 GB | 25.26 GB | -1.01 GB | -3.84% |

Reuse Cache Run

| Metric | Before | After | Change | % Change |
|---|---|---|---|---|
| Elapsed Time | 33.64 s | 18.59 s | -15.05 s | -44.74% |
| Used Memory | 4.34 GB | 2.74 GB | -1.60 GB | -36.87% |

phpstan-src:

First Full Run

| Metric | Before | After | Change | % Change |
|---|---|---|---|---|
| Elapsed Time | 21.20 s | 21.28 s | +0.08 s | +0.38% |
| Used Memory | 2.74 GB | 2.69 GB | -0.05 GB | -1.82% |

Reuse Cache Run

| Metric | Before | After | Change | % Change |
|---|---|---|---|---|
| Elapsed Time | 1.78 s | 1.67 s | -0.11 s | -6.18% |
| Used Memory | 430.49 MB | 337 MB | -93.49 MB | -21.72% |

@janedbal (Member, Author) commented Mar 26, 2026

dcd folder stats:

  • File count: 72,832 (in subfolders now)
  • Total size: 603 MB
  • Avg file size: ~5.5 KB

Size distribution:

  • < 1 KB: 37,301 (51%)
  • 1–10 KB: 29,120 (40%)
  • 10–100 KB: 5,843 (8%)
  • 100 KB–1 MB: 561 (~1%)
  • over 1 MB: 7

PHPStan's result cache becomes huge (up to 1GB) because DCD collectors
emit many JSON-serialized usage strings. This slows down every PHPStan
startup, even for partial analysis where DCD is disabled.

Introduce UsageCacheStorage that stores full collector data in
%tmpDir%/dcd/ using content-addressable files. Collectors now emit only
md5 hashes through PHPStan's mechanism, dramatically shrinking the
result cache. Re-introduce class-boundary batching (flush at
ClassMethodsNode) to further reduce the number of emits. GC of
orphaned cache files runs automatically after each full analysis.

Co-Authored-By: Claude Code

Content-addressable storage means identical hashes always produce
identical content, so concurrent writes to the same file are harmless.

Co-Authored-By: Claude Code
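
The reasoning in the commit message above can be illustrated with a small Python sketch (an assumption-laden illustration, not the PR's PHP code; `write_by_hash` and the sample payload are hypothetical). Because the file name is derived from the content, every racing writer targets the same path with the same bytes, so no lock is taken. The sketch only inspects the final state after all writers have finished.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_by_hash(cache_dir: Path, payload: bytes) -> Path:
    """Store payload under its own md5; no lock is taken."""
    target = cache_dir / (hashlib.md5(payload).hexdigest() + ".dat")
    target.write_bytes(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    payload = b'["usage-1", "usage-2"]'  # made-up collector data
    # Eight workers race to store the same payload.
    with ThreadPoolExecutor(max_workers=8) as pool:
        targets = list(pool.map(lambda _: write_by_hash(cache_dir, payload), range(8)))
    distinct_files = {p.name for p in cache_dir.iterdir()}
    final_content = targets[0].read_bytes()
```

All eight writes land in a single file whose final content equals the payload.
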
@janedbal force-pushed the custom-usage-cache branch from 68684e3 to 747cd39 on March 27, 2026 15:38

Split cache files into 256 subdirectories using the first 2 hex chars
of the hash as the folder name (e.g. dcd/ab/cdef1234....dat).

Co-Authored-By: Claude Code
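
A sketch of the path scheme described in the commit message above, in Python for illustration only. `shard_path` is a hypothetical helper, and it assumes, as the example path `dcd/ab/cdef1234....dat` suggests, that the file name is the remainder of the hash after the two-character folder prefix.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(root: str, digest: str) -> PurePosixPath:
    """Place a hash under a subdirectory named by its first two hex characters."""
    return PurePosixPath(root) / digest[:2] / f"{digest[2:]}.dat"


digest = hashlib.md5(b"example payload").hexdigest()
path = shard_path("dcd", digest)
short_example = str(shard_path("dcd", "abcdef1234"))
```

Two hex characters give 16 × 16 = 256 possible folder names, which matches the 256 subdirectories mentioned in the commit message.
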
When shipmonkDeadCode.cache.useOwnCache is true (default), collector
data is stored in DCD's own cache and only hashes are emitted to
PHPStan. When false, serialized data is emitted directly through
PHPStan's collector mechanism (original behavior).

Co-Authored-By: Claude Code

When switching from offloadCollectorData: true to false, the leftover
cache directory should still be cleaned up on the next full analysis.

Co-Authored-By: Claude Code
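
One way to picture the cleanup, as a Python sketch under stated assumptions: `collect_garbage` is a hypothetical helper, "orphaned" is taken to mean "not referenced by any hash from the latest full analysis", and the two-character subfolder layout is assumed. None of this is confirmed as the PR's actual implementation.

```python
import tempfile
from pathlib import Path


def collect_garbage(cache_dir: Path, live_hashes: set[str]) -> int:
    """Delete every *.dat file whose hash is not in live_hashes; return the count."""
    removed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for file in list(cache_dir.rglob("*.dat")):
        digest = file.parent.name + file.stem  # two-char folder + rest of hash
        if digest not in live_hashes:
            file.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "dcd"
    for digest in ("ab" + "0" * 30, "cd" + "1" * 30):
        shard = cache_dir / digest[:2]
        shard.mkdir(parents=True, exist_ok=True)
        (shard / f"{digest[2:]}.dat").write_text("{}")

    live = {"ab" + "0" * 30}
    removed = collect_garbage(cache_dir, live)
    remaining = sorted(p.parent.name + p.stem for p in cache_dir.rglob("*.dat"))

    # With offloading switched off nothing is referenced, so leftovers go too.
    removed_when_disabled = collect_garbage(cache_dir, set())
```

Passing an empty set models the case from the commit message: after the option is turned off, the next full analysis references no hashes, so every leftover file is removed. Empty shard folders are left in place by this sketch.
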
@janedbal marked this pull request as ready for review on March 27, 2026 16:00

Better names since no file I/O happens when offloadCollectorData is
disabled — pack/unpack reflects the serialization semantics.

Co-Authored-By: Claude Code
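
The pack/unpack naming can be sketched as follows, in Python for illustration only. `UsagePacker` is a hypothetical class, a dict stands in for the on-disk cache, and JSON is assumed as the serialization format. With offloading disabled, `pack` returns the serialized data itself and nothing is stored; with it enabled, only the hash is returned.

```python
import hashlib
import json


class UsagePacker:
    """Hypothetical stand-in; a dict replaces the on-disk cache for brevity."""

    def __init__(self, offload: bool) -> None:
        self.offload = offload
        self.store: dict[str, str] = {}

    def pack(self, usages: list[str]) -> str:
        serialized = json.dumps(usages)
        if not self.offload:
            return serialized  # emitted directly, nothing stored
        digest = hashlib.md5(serialized.encode("utf-8")).hexdigest()
        self.store[digest] = serialized
        return digest  # only the hash is emitted

    def unpack(self, emitted: str) -> list[str]:
        serialized = self.store[emitted] if self.offload else emitted
        return json.loads(serialized)


usages = ["Foo::bar", "Baz::qux"]  # made-up usage entries
offloading, direct = UsagePacker(offload=True), UsagePacker(offload=False)
```

Both modes round-trip the same data, so callers do not need to know which one is active.
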
@janedbal merged commit 0adc976 into master on Mar 30, 2026
32 checks passed
@janedbal deleted the custom-usage-cache branch on March 30, 2026 07:33

@staabm (Contributor) commented Mar 30, 2026

nice!

Comment thread src/Cache/UsageCacheStorage.php
@ruudk (Contributor) commented Apr 7, 2026

Awesome. Should this be documented? The readme could maybe also instruct how this should be cached on CI?

@janedbal (Member, Author) commented Apr 7, 2026

> Should this be documented?

I don't think so; it works seamlessly with the officially documented CI setup.
