Skip to content

Improve performance of excluded files filter#5157

Merged
SimplyDanny merged 7 commits intorealm:mainfrom
SimplyDanny:excluded-performance
Dec 25, 2024
Merged

Improve performance of excluded files filter#5157
SimplyDanny merged 7 commits intorealm:mainfrom
SimplyDanny:excluded-performance

Conversation

@SimplyDanny
Copy link
Collaborator

The current algorithm is like "collect all included files and subtract all excluded files". Collecting all included and all excluded files relies on the file system. This can become slow when the patterns used to exclude files resolve to a large number of files.

The new approach only collects all lintable files and checks them against the exclude patterns. This can be done by in-memory string-regex-match and does therefore not require file system accesses.

The most critical part is the conversion of glob patterns to regular expressions. I might have missed cases.

Fixes #5018.

@SimplyDanny SimplyDanny requested a review from keith August 5, 2023 20:04
@SwiftLintBot
Copy link

SwiftLintBot commented Aug 5, 2023

17 Messages
📖 Linting Aerial with this PR took 1.03s vs 1.04s on main (0% faster)
📖 Linting Alamofire with this PR took 1.27s vs 1.27s on main (0% slower)
📖 Linting Brave with this PR took 6.58s vs 6.58s on main (0% slower)
📖 Linting DuckDuckGo with this PR took 5.44s vs 5.5s on main (1% faster)
📖 Linting Firefox with this PR took 10.99s vs 11.03s on main (0% faster)
📖 Linting Kickstarter with this PR took 10.3s vs 10.3s on main (0% slower)
📖 Linting Moya with this PR took 0.53s vs 0.54s on main (1% faster)
📖 Linting NetNewsWire with this PR took 2.94s vs 2.96s on main (0% faster)
📖 Linting Nimble with this PR took 0.79s vs 0.8s on main (1% faster)
📖 Linting PocketCasts with this PR took 8.78s vs 8.71s on main (0% slower)
📖 Linting Quick with this PR took 0.45s vs 0.45s on main (0% slower)
📖 Linting Realm with this PR took 4.51s vs 4.51s on main (0% slower)
📖 Linting Sourcery with this PR took 2.33s vs 2.34s on main (0% faster)
📖 Linting Swift with this PR took 4.54s vs 4.55s on main (0% faster)
📖 Linting VLC with this PR took 1.28s vs 1.26s on main (1% slower)
📖 Linting Wire with this PR took 18.25s vs 18.31s on main (0% faster)
📖 Linting WordPress with this PR took 11.56s vs 11.6s on main (0% faster)

Generated by 🚫 Danger

@ileitch
Copy link

ileitch commented Nov 2, 2023

Periphery had a similar performance issue not long ago, and I noticed there wasn't a solid glob to regex implementation, so I ported Python's fnmatch to Swift: https://github.com/ileitch/swift-filename-matcher. It might be useful here too.

@jpsim
Copy link
Collaborator

jpsim commented Nov 2, 2023

Last time we tried to speed this up, it caused some slight differences in the result of what was matched vs not, so please be super careful here.

@SimplyDanny
Copy link
Collaborator Author

At the moment, I'm rather concerned here that normal runs without any excludes and includes seem to become much slower sometimes.

Periphery had a similar performance issue not long ago, and I noticed there wasn't a solid glob to regex implementation, so I ported Python's fnmatch to Swift: https://github.com/ileitch/swift-filename-matcher. It might be useful here too.

This is a very helpful tip. I don't want to invent a half-backed version myself. Thanks!

@SimplyDanny SimplyDanny force-pushed the excluded-performance branch 3 times, most recently from bfd249c to 6a0585a Compare November 20, 2023 22:30
@SimplyDanny SimplyDanny force-pushed the excluded-performance branch from a4e724b to a95044b Compare January 4, 2024 20:00
@JaviSoto
Copy link
Contributor

Any chance this can land? 🙏

@SimplyDanny SimplyDanny force-pushed the excluded-performance branch from ca66b0c to e5989f1 Compare March 11, 2024 19:29
@SimplyDanny
Copy link
Collaborator Author

Any chance this can land? 🙏

This is a critical change that needs thorough testing. Unfortunately, I'm lacking own projects with nifty included and excluded specifications.

@JaviSoto: In case this change took effect in your projects, I'd appreciate your feedback.

@SimplyDanny SimplyDanny force-pushed the excluded-performance branch from e5989f1 to dad51eb Compare March 20, 2024 22:22
@SimplyDanny SimplyDanny force-pushed the excluded-performance branch from 21c0add to 61acd6d Compare March 22, 2024 07:36
@SimplyDanny SimplyDanny force-pushed the excluded-performance branch from 61acd6d to 6ec4d52 Compare June 12, 2024 21:52
@SimplyDanny SimplyDanny force-pushed the excluded-performance branch 2 times, most recently from 28d212f to b60d907 Compare July 26, 2024 18:32
@nekrich
Copy link

nekrich commented Sep 13, 2024

@SimplyDanny, thanks for drastically improving SwfitLint's speed 💪.

On one of our projects, I noticed that - "Tuist/**/Plugin.swift" no longer excludes Tuist/Plugin.swift.

Project with ~400 files:

3.42s user 0.86s system 194% cpu 2.198 total
vs
34.77s user 8.40s system 222% cpu 19.390 total

Project with ~2000 files:

3.75s user 3.78s system 63% cpu 11.846 total
vs
31.94s user 20.20s system 133% cpu 39.146 total

@SimplyDanny
Copy link
Collaborator Author

On one of our projects, I noticed that - "Tuist/**/Plugin.swift" no longer excludes Tuist/Plugin.swift.

You're saying this is a regression?

Project with ~400 files:

3.42s user 0.86s system 194% cpu 2.198 total
vs
34.77s user 8.40s system 222% cpu 19.390 total

Project with ~2000 files:

3.75s user 3.78s system 63% cpu 11.846 total
vs
31.94s user 20.20s system 133% cpu 39.146 total

Down to ~10% of linting time is impressive. However, one sample with a single ** is not proof enough to merge this. Not sure how to motivate more people to test this.

@nekrich
Copy link

nekrich commented Oct 10, 2024

You're saying this is a regression?

Yes, a little.

Down to ~10% of linting time is impressive. However, one sample with a single ** is not proof enough to merge this. Not sure how to motivate more people to test this.

It's ~15 excludes with 7-8 globs.

Also, I found https://github.com/davbeck/swift-glob which probably will be even faster with async file matching.

@SimplyDanny SimplyDanny mentioned this pull request Oct 25, 2024
2 tasks
@leviathan
Copy link

leviathan commented Dec 2, 2024

Running SwiftLint on a rather large enterprise codebase with lots excluded files in SwiftLint configuration:

excluded:
  - .temp
  - Dependencies
  - .swiftpm
  - "**/*/.build"
  - fastlane
  - .fastlane
  - derivedData
  - "**/Build/**/*"
  - "**/Tests/**/*"
  - "**/Mocks/**/*"
  - "**/TestUtils/**/*"

Here are the results in comparison between SwiftLint main branch in this PR:

=> SwiftLint main branch (15.0 seconds)

❯ mint run swiftlint lint . --quiet
in /development/app-xxxx-ios on master [$] took 15.0s 

=> PR #5157 (3.5 seconds)

❯ mint run swiftlint lint . --quiet
in /development/app-xxxx-ios on master [$] took 3.5s 

@SimplyDanny
Copy link
Collaborator Author

I appreciate your performance reports! This really helps a lot to evaluate this critical change in real-world settings.

On one of our projects, I noticed that - "Tuist/**/Plugin.swift" no longer excludes Tuist/Plugin.swift.

You're saying this is a regression?

Yes, a little.

That's now fixed with the most recent update. Thanks @nekrich and @ileitch for your efforts!

Surprisingly, when removing the last two excluded items (the ones using **) from the config, this branch ran slower.

This concerns me. @CraigSiemens: I assume the project you ran SwiftLint on is not open source, is it? I'm really interested in what causes this slow down. I don't want to make the tool slightly slower in most common settings only to improve performance in edge cases.

@ileitch
Copy link

ileitch commented Dec 18, 2024

2.1 million Swift LOC codebase with 19 excluded patterns.

main at 8e3b50fa6

Time (mean ± σ):     61.294 s ±  0.302 s    [User: 31.682 s, System: 38.439 s]
Range (min … max):   60.934 s … 61.732 s    5 runs

This PR

Time (mean ± σ):     36.925 s ±  0.268 s    [User: 25.933 s, System: 16.765 s]
Range (min … max):   36.499 s … 37.184 s    5 runs

@SimplyDanny SimplyDanny force-pushed the excluded-performance branch 2 times, most recently from 6f63dd8 to 9c3791f Compare December 20, 2024 11:24
@SimplyDanny
Copy link
Collaborator Author

In summary, according to the reports, we see performance improvements down to 6% of main's runtime depending on the number of exclude patterns and the number of files they match. Other tests concluded 10%, 50% or 70%. This is impressive.

However, there are also constellations where the new approach of file matching is slightly slower. In my observations, this happens when the file system (or the file system API) has cached enough information. In this case, collecting all files is very fast, even faster than the regular expression matching. This is usually the case in performance tests and the reason why regex matching appears slower. In real word linting, these optimal conditions are not too common.

Seems like we can give this a try in the next release.

The current algorithm is like "collect all included files and subtract all excluded files".
Collecting all included and all excluded files relies on the file system. This can become slow
when the patterns used to exclude files resolve to a large number of files.

The new approach only collects all lintable files and checks them against the exclude patterns.
This can be done by in-memory string-regex-match and does therefore not require file system accesses.
@SimplyDanny SimplyDanny enabled auto-merge (squash) December 25, 2024 22:09
@SimplyDanny SimplyDanny merged commit 152355e into realm:main Dec 25, 2024
@SimplyDanny SimplyDanny deleted the excluded-performance branch December 26, 2024 11:37
@0xLucasMarcal
Copy link

@SimplyDanny Can we cut a new release? This is a nice fix

@weakfl
Copy link

weakfl commented Jan 14, 2025

@SimplyDanny

Unfortunately the new exclusion algorithm does not work if used with an Xcode build tool plugin that relies on a configuration file. I guess the path matching doesn't work correctly in that case.

Please note that I'm using my own build tool plugin variant (with the official SwiftLintBinary.artifactbundle.zip), which relies on a configuration file to determine which files should be linted.
The official build tool plugin relies on inputFiles of the target, which is not very useful if you want local swift packages to be linted too.

It does still work with the --use-alternative-excluding flag, which probably uses some variation of the old algorithm.

I'm not sure if you consider this a supported use case, so let me know if I should create an issue or not.

SimplyDanny added a commit to SimplyDanny/SwiftLint that referenced this pull request Jan 14, 2025
This reverts commit 152355e.

# Conflicts:
#	tools/oss-check
SimplyDanny added a commit to SimplyDanny/SwiftLint that referenced this pull request Jan 14, 2025
This reverts commit 152355e.

# Conflicts:
#	tools/oss-check
SimplyDanny added a commit that referenced this pull request Jan 15, 2025
This reverts commit 152355e from #5157.

# Conflicts:
#	tools/oss-check
@SimplyDanny
Copy link
Collaborator Author

SimplyDanny commented Jan 19, 2025

@SimplyDanny

Unfortunately the new exclusion algorithm does not work if used with an Xcode build tool plugin that relies on a configuration file. I guess the path matching doesn't work correctly in that case.

Please note that I'm using my own build tool plugin variant (with the official SwiftLintBinary.artifactbundle.zip), which relies on a configuration file to determine which files should be linted. The official build tool plugin relies on inputFiles of the target, which is not very useful if you want local swift packages to be linted too.

It does still work with the --use-alternative-excluding flag, which probably uses some variation of the old algorithm.

I'm not sure if you consider this a supported use case, so let me know if I should create an issue or not.

Everything should work as before with >=0.58.1, as I've reverted this change. Unfortunately, there are much more things to think thoroughly about before this optimization can be re-included in a future release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Excluded files impact the performance of swiftlint