Skip to content

feat(core): targeted file scanner#6614

Merged
arendjr merged 9 commits intobiomejs:nextfrom
arendjr:targeted-file-scanner
Jun 29, 2025
Merged

feat(core): targeted file scanner#6614
arendjr merged 9 commits intobiomejs:nextfrom
arendjr:targeted-file-scanner

Conversation

@arendjr
Copy link
Copy Markdown
Contributor

@arendjr arendjr commented Jun 28, 2025

Summary

Implemented a more targeted version of the scanner, which ensures that if you provide file paths to handle on the CLI, the scanner will exclude directories that are not relevant to those paths.

Note that for many commands, such as biome check and biome format, the file paths to handle are implicitly set to the current working directory if you do not provide any path explicitly. The targeted scanner also works with such implicit paths, which means that if you run Biome from a subfolder, other folders that are part of the project are automatically exempted.

Use cases where you invoke Biome from the root of the project without providing a path, as well as those where project rules are enabled, are not expected to see performance benefits from this.

Fixed #6234.
Fixed #6483.
Fixed #6563.

Test Plan

I have added a bunch of automated tests, but could use a little bit of extra verification from others as well.

Even though we could technically push this as a patch release, there may be subtle behavioral changes. I don't expect any issues, but just in case I would prefer to push this as a minor instead.

@arendjr arendjr requested review from a team June 28, 2025 19:57
@arendjr arendjr added the A-Project Area: project label Jun 28, 2025
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 28, 2025

🦋 Changeset detected

Latest commit: 90d8516

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
@biomejs/biome Minor
@biomejs/cli-win32-x64 Minor
@biomejs/cli-win32-arm64 Minor
@biomejs/cli-darwin-x64 Minor
@biomejs/cli-darwin-arm64 Minor
@biomejs/cli-linux-x64 Minor
@biomejs/cli-linux-arm64 Minor
@biomejs/cli-linux-x64-musl Minor
@biomejs/cli-linux-arm64-musl Minor
@biomejs/wasm-web Minor
@biomejs/wasm-bundler Minor
@biomejs/wasm-nodejs Minor
@biomejs/backend-jsonrpc Patch
@biomejs/js-api Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions Bot added the A-CLI Area: CLI label Jun 28, 2025
@ematipico
Copy link
Copy Markdown
Member

I think the second issue you linked is incorrect

@arendjr
Copy link
Copy Markdown
Contributor Author

arendjr commented Jun 28, 2025

Good catch, fixed.

Copy link
Copy Markdown
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the tests and the implementation, and overall I think that want, good job! I left a comment regarding the --config-path implementation, where I have some doubts.

It would be great if we could add new tests in the monorepo cases, where we use --config-path.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we start documenting the scanner's behaviour on our website? Imho, we should. Not just for the users, but for us too, so we document the business logic of Biome, and when PRs are raised against it, we can verify if issue is in line with the docs or not.

If we shouldn't, then we should at least document it internally somewhere. At the moment, the business logic is scattered across files/modules, so having a central place to document the business logic might help us in the future

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed. I think we can document it on the website, we just need to be careful to be not too specific to the point that people may start to rely on implementation details where it would become a breaking change if we improve the behaviour.

Comment on lines +883 to +884
// directory and configuration directory, we use whichever one is higher
// up in the file system.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's the correct expectations? Let's say we call the CLI from /Users/app with biome check --config-path=webapp/biome.json, it means we have two directories

  • /Users/app which is fs.working_directory
  • /Users/app/webapp which is the directory from --config-path

Based on this logic, /Users/app is chosen as project_dir (based on the comment), however this folder isn't the real biome project. So I would expect /Users/app/webapp to be my working directory where we scan the project and where the root configuration is.

Now, I may have misunderstood the comment or the new functionality, so could you please clarify it for me?

EDIT: I know it's kind of an edge case, however, we shouldn't forget that Biome can be installed globally, which means that via CLI a user could run biome from the root of the project, even though the main configuration is in a nested folder.

Copy link
Copy Markdown
Contributor Author

@arendjr arendjr Jun 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it’s questionable, but there was an existing test case that said /Users/app must be used in such a case, so I stuck to it. It means users are not forced to have biome.json in the root of their repository, which I suppose some people may rely upon.

But indeed, there is a downside that potentially we register the wrong folder in such a case. But indeed in terms of behaviour I think nothing should change.

Comment thread crates/biome_cli/src/commands/mod.rs
return Some(ScanKind::NoScanner);
} else {
return Some(ScanKind::TargetedKnownFiles {
target_paths: vec![BiomePath::new(working_dir.join(path))],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a change in functionality. Until now, we haven't tinkered with the paths passed via stdin, so if this change may be what we want, we should document it (update the docs) and possibly the doc comments of --stdin-file-path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. Pretty sure it is what we want though, as users who provide --stdin-file-path expect the path to have meaning. This allows us to apply the configuration that would be applied if the file at the given path were checked normally (ie. without using stdin).

Comment thread crates/biome_service/src/diagnostics.rs Outdated
Comment thread crates/biome_service/src/workspace/scanner.rs Outdated
Comment on lines +301 to +310
if self
.workspace
.is_path_ignored(IsPathIgnoredParams {
project_key: self.project_key,
path: path.clone(),
// The scanner only cares about the top-level
// `files.includes`
features: FeaturesBuilder::new().build(),
})
.unwrap_or_default()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, not for this PR, but I intend to create a new method inside the workspace that queries only files.includes, and it doesn't check for paths ignored via VCS.

That should increase performance for this particular case.

@arendjr arendjr changed the base branch from main to next June 29, 2025 19:21
@arendjr arendjr merged commit 0840021 into biomejs:next Jun 29, 2025
13 checks passed
@github-actions github-actions Bot mentioned this pull request Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLI Area: CLI A-Project Area: project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 Major regression in minimum formatting time on Biome 2.x 🐛 Monorepo child config extends not working 📎 Implement targeted light scanner

2 participants